Is Mathematics important in Data Science?

Bi
3 min read · Mar 13, 2021

Hi there, I see you have been wondering the same thing as I have. In the field of Data Science, mathematics can best be described as a tool for manipulating your data and deriving the best possible analyses of the relationships within it. Some of the most frequently used concepts are Standard Deviation, Variance and the Z-score. As an example, we will look at two of these concepts, Standard Deviation and Z-score, and how a Financial Analyst or Stockbroker can use them.

We have sample data of the returns of a company stock over the last 5 years: 10% in 2016, 40% in 2017, -4% in 2018, 80% in 2019 and 65% in 2020. The first course of action is to summarise these returns using the mean. The mean (average) of the returns of the last 5 years is (10 + 40 - 4 + 80 + 65) / 5 = 38.2%. That is, over the past 5 years the investment returned 38.2% per year on average. Now, we would like to determine how far each yearly ROI was from this 5-year average. Subtracting the ROIavg (38.2%) from each individual ROI gives deviations of -28.2%, 1.8%, -42.2%, 41.8% and 26.8% respectively. We then square these deviations, giving 0.079524, 0.000324, 0.178084, 0.174724 and 0.071824. The sum of these is 0.50448. To find the sample variance, we divide the sum by 4 (the sample size minus 1), which gives 0.12612. The Standard Deviation (SD) is the square root of the variance, which is 35.51%.

The SD of the sample is 35.51%, which is below 50%, the rough cut-off used here for saying that the ROIs are not far off from the ROIavg. A volatile stock has a high SD while a stable blue-chip company usually has a low SD. The SD gives us insight into how volatile a stock can be. The higher the volatility of a stock, the riskier the stock is. That is why stocks with high ROIs usually have higher risks attached. Hence, in advising a customer, the customer’s risk appetite is often considered.
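
To make the arithmetic easy to check, here is a minimal Python sketch that recomputes the mean, the deviations and the sample SD from the five returns. The variable names are my own, and the returns are kept in percent. Note that the 2018 deviation is -4 - 38.2 = -42.2, which is what gives a sample SD of about 35.5%.

```python
import statistics

# Yearly returns from the example, in percent (2016 to 2020).
returns = {2016: 10.0, 2017: 40.0, 2018: -4.0, 2019: 80.0, 2020: 65.0}
values = list(returns.values())

mean_return = statistics.mean(values)            # arithmetic mean of the yearly returns
deviations = [x - mean_return for x in values]   # signed distance of each year from the mean
squared = [d ** 2 for d in deviations]

# Sample variance divides by n - 1 (here 4), as in the text.
sample_variance = sum(squared) / (len(values) - 1)
sample_sd = sample_variance ** 0.5

print(f"mean = {mean_return:.1f}%")
print("deviations =", [round(d, 1) for d in deviations])
print(f"sample SD = {sample_sd:.2f}%")

# The standard library computes the same sample SD in one call.
library_sd = statistics.stdev(values)
```

In practice you would probably call statistics.stdev directly; the step-by-step version is only there to mirror the calculation above.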

Let’s look at one more concept, the Z-score. The Z-score measures the relationship between each value and the mean of the sample data. It is measured relative to the standard deviation of the sample data.

The formula: Z = (x - µ)/σ

µ = mean of the sample
x = raw score
σ = Standard Deviation of the sample

A positive Z-score means that the value is above the mean while a negative Z-score shows that the value is below the mean. If the Z-score is 0, this means that the value is the same as the mean value. If the Z-score is 1, it means that the value is 1 standard deviation above the mean, and a Z-score of -1 means it is 1 standard deviation below the mean.
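
As a quick illustration, the sketch below applies the formula to the five yearly returns from the stock example. It assumes the sample SD (dividing by n - 1) is the σ we want, to stay consistent with the worked example; the variable names are my own.

```python
import statistics

years = [2016, 2017, 2018, 2019, 2020]
values = [10.0, 40.0, -4.0, 80.0, 65.0]  # yearly returns in percent

mu = statistics.mean(values)
sigma = statistics.stdev(values)  # sample SD, assumed here to match the worked example

# Z = (x - mu) / sigma for each year's return.
z_scores = {year: (x - mu) / sigma for year, x in zip(years, values)}

for year, z in z_scores.items():
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"{year}: z = {z:+.2f} ({position} the mean)")
```

Years with returns above 38.2% come out with positive Z-scores and the rest with negative ones, and the Z-scores always sum to zero because the deviations from the mean do.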

Z-score vs Standard Deviation

The Standard Deviation measures how far the values in a data set typically lie from the mean, while the Z-score measures how far a single value is from the mean in units of the standard deviation. That is, the Z-score tells us the number of standard deviations a value is away from the mean.

In Summary,

The Standard Deviation of a sample tells us the degree of variability of a data set. That’s why it is often relied upon in risk analysis: it can reveal the degree of volatility of a data set.

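The formula for the Standard Deviation of a population is: σ = √(Σ(x - µ)² / N). For a sample, as in the stock example above, the sum is divided by N - 1 instead of N.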

x=each value from the population
µ=The population mean
N=Size of the population
σ=Standard Deviation
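
The worked example earlier divided by 4 (that is, N - 1) because the five returns were treated as a sample, whereas this formula divides by N. Here is a small sketch of the difference using Python's standard library, with the same five returns in percent:

```python
import statistics

values = [10.0, 40.0, -4.0, 80.0, 65.0]  # yearly returns in percent

population_sd = statistics.pstdev(values)  # divides the squared deviations by N
sample_sd = statistics.stdev(values)       # divides the squared deviations by N - 1

# The population formula written out by hand, for comparison.
mu = sum(values) / len(values)
manual_population_sd = (sum((x - mu) ** 2 for x in values) / len(values)) ** 0.5

print(f"population SD = {population_sd:.2f}%")
print(f"sample SD     = {sample_sd:.2f}%")
```

The sample SD is always the larger of the two, and the gap shrinks as the number of observations grows.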

The Z-score or standard score is the number of standard deviations a value lies above or below the mean.

The Z-score should not be confused with the similarly named Altman Z-score, a separate measure built from a company’s financial ratios that indicates whether the company is close to bankruptcy or not. An Altman Z-score of less than 1.8 suggests that the company may be headed for bankruptcy, while a score closer to 3 suggests the company is in a solid financial position.
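
The bankruptcy thresholds above belong to the Altman Z-score, which is computed from financial ratios rather than from a mean and a standard deviation. As a rough sketch, the classic version for publicly traded manufacturing firms is a weighted sum of five ratios. The function names and the input figures below are made up for illustration, and the 1.8 and 3 cut-offs are the ones quoted in the text.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Classic Altman Z-score for publicly traded manufacturing firms."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def classify(z, distress_below=1.8, safe_above=3.0):
    """Map a score to a zone using the cut-offs quoted in the text."""
    if z < distress_below:
        return "distress"
    if z > safe_above:
        return "safe"
    return "grey"


# Hypothetical balance-sheet figures, in any single currency unit.
z = altman_z(working_capital=200, retained_earnings=300, ebit=150,
             market_value_equity=800, sales=1200,
             total_assets=1000, total_liabilities=500)
print(f"Altman Z = {z:.3f} ({classify(z)})")
```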

Hence, Mathematics is an essential tool in Data Science because it is used for proper analysis and decision making.
