Is Mathematics important in Data Science?

3 min read · Mar 13, 2021

Hi there, I see you have been wondering the same thing as I have. In the field of Data Science, mathematics can best be described as a tool for manipulating your data and deriving the best possible analyses of the relationships within it. Some of the most frequently used concepts are Standard Deviation, Variance, Z-score, etc. As an example, we will look at two of these concepts, Standard Deviation and Z-score, and how they can be used by a Financial Analyst or Stockbroker.

We have sample data on the returns of a company's stock over the last 5 years: 10% in 2016, 40% in 2017, -4% in 2018, 80% in 2019 and 65% in 2020. The first step is to summarise these returns using the mean. The mean (average) of the returns over the last 5 years is 38.2%. That is, over the past 5 years the investment yielded an average ROI of 38.2% per year. Next, we would like to determine how far each yearly ROI was from this 5-year average. Subtracting the individual ROIs from the ROIavg (38.2%) gives 28.2%, -1.8%, 42.2%, -41.8% and -26.8% respectively. We then square these deviations, giving 0.079524, 0.000324, 0.178084, 0.174724 and 0.071824. The sum of these is 0.50448. To find the sample variance, we divide the sum by 4 (the sample size minus 1), which gives 0.12612. The Standard Deviation (SD) is the square root of the variance, which is 35.51%.

The SD of the sample is 35.51%, which is less than 50%, so by that rough yardstick the yearly ROIs are not too far off from the ROIavg. A volatile stock has a high SD while a stable blue-chip company usually has a low SD. The SD gives us insight into how volatile a stock can be. The higher the volatility of a stock, the riskier the stock is. That is why stocks with high ROIs usually have higher risks attached. Hence, when advising a customer, the customer’s risk appetite is often considered.
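The steps above (mean, deviations, squared deviations, sample variance, square root) can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_stats` is made up for illustration, and the returns are written as fractions rather than percentages.

```python
import math

# Yearly returns from the example, written as fractions (10% -> 0.10).
returns = [0.10, 0.40, -0.04, 0.80, 0.65]

def sample_stats(values):
    """Return (mean, sample variance, sample standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    # Squared distance of each value from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # The sample variance divides by n - 1 (here 4), not by n.
    variance = sum(squared_deviations) / (n - 1)
    return mean, variance, math.sqrt(variance)

mean, variance, sd = sample_stats(returns)
print(f"mean = {mean:.1%}, variance = {variance:.5f}, SD = {sd:.2%}")
```

In practice, Python's built-in `statistics.mean`, `statistics.variance` and `statistics.stdev` do the same job, so the hand-written version is mainly useful for seeing each step.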

Let’s look at one more concept, the Z-score. The Z-score measures the relationship between an individual value and the mean of the sample. It is measured relative to the standard deviation of the sample.

The formula: Z = (x - µ) / σ

µ = mean of the sample
x = raw score
σ = Standard Deviation of the sample

A positive Z-score means that the value is above the mean while a negative Z-score shows that the value is below the mean. If the Z-score is 0, the value is the same as the mean. If the Z-score is 1, the value is 1 standard deviation above the mean.
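To see the sign of the Z-score in action, here is a small sketch that applies the formula to the five yearly stock returns from the earlier example. The helper name `z_scores` is hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
import statistics

# Yearly returns from the stock example, written as fractions.
returns_by_year = {2016: 0.10, 2017: 0.40, 2018: -0.04, 2019: 0.80, 2020: 0.65}

def z_scores(values):
    """Return the Z-score of each value: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample SD, divides by n - 1
    return [(x - mu) / sigma for x in values]

scores = dict(zip(returns_by_year, z_scores(list(returns_by_year.values()))))

for year, z in scores.items():
    position = "above" if z > 0 else "below"
    print(f"{year}: Z = {z:+.2f} ({position} the mean)")
```

Years with returns above the 38.2% average (2017, 2019, 2020) come out with positive Z-scores, and the others come out negative.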

Z-score vs Standard Deviation

The Standard Deviation measures how spread out the values in a data set are around the mean, while the Z-score measures how far a single value is from the mean in terms of the standard deviation. That is, the Z-score tells us the number of standard deviations a value is away from the mean.

In Summary,

The Standard Deviation of a sample tells us the degree of variability of a data set. That’s why it is often relied upon in risk analysis: it can reveal the degree of volatility of a data set.

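The formula for the Standard Deviation of a population is: σ = √( Σ(x - µ)² / N )

x = each value from the population
µ = the population mean
N = size of the population
σ = Standard Deviation

For a sample, as in the stock example, the sum is divided by the sample size minus 1 instead of N.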
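The formula here divides by N, which is the population version, whereas the stock example earlier divided by 4 (the sample size minus 1). A quick sketch using Python's standard `statistics` module shows how the two differ on the same five returns; treating those returns as a full population is purely an assumption made for the comparison.

```python
import statistics

# Yearly returns from the stock example, written as fractions.
returns = [0.10, 0.40, -0.04, 0.80, 0.65]

population_sd = statistics.pstdev(returns)  # divides by N
sample_sd = statistics.stdev(returns)       # divides by N - 1

print(f"population SD = {population_sd:.2%}")
print(f"sample SD     = {sample_sd:.2%}")
```

The sample version is always a little larger, and the gap shrinks as the number of observations grows.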

The Z-score, or standard score, is the number of standard deviations a value lies above or below the mean.

In finance, a differently defined measure with a similar name, the Altman Z-score, is used to judge whether a company is close to bankruptcy. It is calculated from a company's financial ratios rather than from the formula above. An Altman Z-score of less than 1.8 suggests that the company may be headed for bankruptcy, while a score closer to 3 suggests the company is in a solid financial position.

Hence, Mathematics is an essential tool in Data Science because it is used for proper analysis and decision making.


Written by Bi
