Role Of Probability and Statistics in Machine Learning

Why should an aspiring Data Scientist or Machine Learning Engineer care about Probability and Statistics?

Yash Agarwal


When I started learning Machine Learning (and even now), I kept coming across articles, blogs, videos and tutorials that say you must understand Probability and Statistics before getting into Machine Learning, and all of them start with those concepts.

Now, I personally do not have a problem with that. Sure, I would love to know these fields as well, “A Knowledge Gain Is Never Wasted”, but why should I learn them now, and specifically for Machine Learning? As soon as those same articles, blogs, videos and tutorials move on to Machine Learning itself, they somehow fail to link its concepts back to Probability and Statistics. I am left wondering why I spent my time on that when I could have been learning more about Machine Learning algorithms and how to implement them.

So, after a lot of head-banging on the internet, I finally have the answer to this golden question, and I would like to explain it to you the way I would explain Machine Learning to someone who is still in their 12th standard and is probably learning all these topics right now.

ALERT!!! If you absolutely hate Maths, you should leave now. If you can handle a little bit of Maths, you are going to love this explanation.

So let’s start with our first equation.

y = mx + c

See, it’s a simple and innocent equation, nothing to be scared of. We have all seen this equation since we were in our 9th or 10th standard. And yes, this is the equation with which I will explain everything!!!

Some details about the equation: it is the equation of a straight line whose slope is ‘m’ and whose intercept is ‘c’. We call ‘m’ and ‘c’ the Parameters, since they define how our line will look. I hope you know this from your school days.

Now, remember how in your Maths exam you used to get a question like:

If a line passes through (1,1) and (4,5), what is the equation of the line?

What the question is really asking you to find is the Parameters of the equation, given two points that lie on the line. Say the two given points are (x1, y1) and (x2, y2). We can easily calculate ‘m’ and ‘c’, and hence the line equation.

y1 = mx1 + c
y2 = mx2 + c

Subtracting the second equation from the first:

(y1 - y2) = m(x1 - x2)
(y1 - y2) / (x1 - x2) = m

Put the value of m into either of the equations above to get c:

y1 = x1(y1 - y2) / (x1 - x2) + c
(x1y2 - x2y1) / (x1 - x2) = c

Now for our stupid exam we had x1 = 1, y1 = 1, x2 = 4 and y2 = 5. Let’s plug in these values to get ‘m’ and ‘c’, and with them the equation of the line.

(y1 - y2) / ( x1 - x2) = m
(1 - 5) / (1 - 4) = m
4/3 = m

And
(x1y2 - x2y1) / (x1 - x2) = c
(1*5 - 4*1) / (1 - 4) = c
-1/3 = c

So the line equation is
y = 4x/3 -1/3
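
The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` is made up for this example, and `Fraction` is used so the result stays as 4/3 and -1/3 rather than rounded decimals.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through the points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line has no slope, so y = m*x + c cannot describe it.
        raise ValueError("vertical line: slope is undefined")
    # Same formulas as in the text, kept exact with Fraction.
    m = Fraction(y1 - y2, x1 - x2)
    c = Fraction(x1 * y2 - x2 * y1, x1 - x2)
    return m, c


m, c = line_through((1, 1), (4, 5))
print(m, c)  # 4/3 -1/3
```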

So now that you have done this question, you feel pretty good about yourself and move on to the next one, only to find that your teacher is very clever, because the next question is:

What is the value of ‘y’ from the above equation if ‘x = 3’?

And you are like what the….

So if you are unsuccessful in deriving the correct Parameters from the given (x1,y1) and (x2,y2), you do not have the correct Equation, and therefore you will not be able to answer this question correctly either.

Remember the above statement, it will come up again later :P

But for now we have the Parameters and the Equation with us, so let’s plug in ‘x = 3’ and see what the result is.

y = 4x/3 - 1/3
y = 11/3
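
The same plug-in step, written as a tiny Python sketch. The name `predict` is just a label chosen for this example, and the parameters are the ones worked out for the line through (1,1) and (4,5).

```python
from fractions import Fraction


def predict(m, c, x):
    """Evaluate y = m*x + c at a new x."""
    return m * x + c


# Parameters found from the two given points.
m, c = Fraction(4, 3), Fraction(-1, 3)
print(predict(m, c, 3))  # 11/3
```

If ‘m’ or ‘c’ had been worked out wrongly, this function would still happily return a number, just not the right one.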

I know you guys are pretty bored by now and are asking me why the hell we are doing 9th standard Maths and solving their question paper.

Well, my friends, it is because what we just did (with a super simple example) is essentially the core of Machine Learning, and each algorithm that you study or come across on your journey will do exactly the same.

You will be given some data points, like (x1,y1) and (x2,y2) in the first question (we considered 2; in the real world you will have millions). From these data points you will have to identify the Parameters (again, we considered 2; in the real world you will have hundreds or maybe thousands), which decide how these data points behave and how they are connected (in our case, a super simple line equation).

Machine Learning is like our smart teacher who takes it to the next part of the question: now that we have the Parameters, if we are given a new ‘x_new’, what is the corresponding ‘y_new’? And the world went crazy….. Magic, right!!!! No, Maths!!!!

This is the super simple crux of Machine Learning, and all the algorithms that I have come across till now do exactly this same thing, but on a much, much larger scale than what I have explained. Told you to remember that statement from the section above :D
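
To get a feel for the "many data points" version, here is a rough Python sketch. It rests on assumptions that are not part of the story above: the data is made up (100 noisy points scattered around our line y = 4x/3 - 1/3), and the parameters are estimated with ordinary least squares, which is one common choice and not the only one. The names `fit_line`, `xs` and `ys` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares estimate of (m, c) for y = m*x + c."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up data: points near y = 4x/3 - 1/3 with a little Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [4 * x / 3 - 1 / 3 + random.gauss(0, 0.1) for x in xs]

m, c = fit_line(xs, ys)
print(round(m, 2), round(c, 2))  # should land close to 4/3 and -1/3

# "Question 2": the output for a new, unseen x.
x_new = 3
print(m * x_new + c)
```

With exactly two points this gives back the same line as the exam question. With noisy points no single line passes through all of them, so the parameters can only be estimated.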

Let’s try to understand it from an image and try to refine that image as we move forward.

An Oversimplified Machine Learning Cycle. Image by Author

As explained earlier, in Step#1 we are given some input data, from which we need to estimate the parameters underlying that data, like question 1 in our case. Then in Step#2 we use those parameters to produce the output for new data, like question 2.

Now let’s add some Machine Learning terms to this image for better clarity.

Putting Machine Learning Terms to our Cycle. Image by Author

Step#1 is what is usually called the Training/Learning stage, where the machine learns the parameters from the given data. The output of this stage is a model that holds these learned parameters.

Step#2 is called the Prediction stage, where we give the model that holds these parameters some future/new/unseen data to get the output for that data point.

I know, I know, most of you are well aware of this and you are like, “Hey, you promised to tell us how Probability and Statistics are related to this”.

Well, remember what I said just now: let’s refine the above image to answer the very question that we started with.

Probability and Statistics in Machine Learning Cycle. Image by Author

BBBBAAAAAMMMM!!! This is where both Statistics and Probability come into the picture.

Statistics plays a role in Step#1, where we have some given data and we want to make some statements about the parameters.

Probability plays a role in Step#2, where we are trying to make some statements about the observations, given the parameters.
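
The two directions can be seen side by side in a toy Python sketch. The coin-flip setting, the sample in `flips` and the function names are all invented for illustration and are not part of the line example. The first function goes from data to a parameter (statistics), the second goes from a parameter to a statement about data (probability).

```python
from math import comb


def estimate_p(flips):
    """Statistics: data -> parameter. Fraction of heads (1s) in the sample."""
    return sum(flips) / len(flips)


def prob_heads(k, n, p):
    """Probability: parameter -> data. Binomial chance of k heads in n flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Made-up sample: 7 heads in 10 flips.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

p_hat = estimate_p(flips)
print(p_hat)  # 0.7

# Given that estimated parameter, how likely are 2 heads in 3 new flips?
print(prob_heads(2, 3, p_hat))
```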

Ever wondered why the cost function is what it is, how we get it, and why we see the equations we do in Machine Learning? Well, all of that can come right from the use of statistics and probability. If you start with the maths of it, using statistics in Step#1 and probability in Step#2, you will see all those equations popping out. (All that detailed maths is out of scope for this article, but if you want to know more, contact me or let me know and I can try to write another article about it.)

I also found a similar article on Medium, so I am referring to an image that was posted in that article. Do check out that article as well; here is the link: https://medium.com/towards-data-science/probability-vs-statistics-for-data-science-and-machine-learning-84f00bf67ce1
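
Just as a small taste, and not the full derivation, here is a sketch for our line. It needs an assumption that is not made anywhere above: each observed y_i is the line value plus independent Gaussian noise with variance σ². Under that assumption, writing down the probability of the observations given the parameters and then asking which ‘m’ and ‘c’ make the data most likely leads straight to the familiar squared-error cost.

```latex
% Assumed model: line plus independent Gaussian noise
y_i = m x_i + c + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Probability of the observations given the parameters
p(y_1, \dots, y_n \mid x_1, \dots, x_n; m, c)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left( -\frac{(y_i - m x_i - c)^2}{2\sigma^2} \right)

% Taking the logarithm
\log p = -\frac{n}{2} \log(2\pi\sigma^2)
         - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - m x_i - c)^2

% Maximising over m and c is the same as minimising the squared error
\arg\max_{m, c} \log p = \arg\min_{m, c} \sum_{i=1}^{n} (y_i - m x_i - c)^2
```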

Image by Travis Tang

I would also like to give a mention to Stanford for putting up their entire lecture series on various topics on YouTube, along with the lecture notes, slides and various other material, all made publicly available so that people like me with intense curiosity can learn from the best.

Well, as I said, I tried to explain this to you the way I would love to explain it to someone in 12th standard. So this is my attempt at the simplest possible way to explain it. It can feel over-simplified, but if you sit down for a while and think about it, isn’t this what it is all about?

Hope this gives some intuition to people who are completely new to Machine Learning about what Machine Learning is and how Probability and Statistics are related to it.

REFERENCES

STANFORD YOUTUBE CHANNEL https://www.youtube.com/@stanfordonline

STANFORD YOUTUBE MACHINE LEARNING COURSE https://www.youtube.com/playlist?list=PLoROMvodv4rNH7qL6-efu_q2_bPuy0adh

EXACT LECTURE VIDEO WHERE THIS QUESTION IS ANSWERED https://www.youtube.com/watch?v=Mi8wnYc1m04&list=PLoROMvodv4rNH7qL6-efu_q2_bPuy0adh&index=3

“I would recommend going through the famous CS-229 Machine Learning course by Stanford. It’s super awesome, some fun, and you will learn a lot of things about Machine Learning that you will not get anywhere else. But be prepared, it has a lot of Maths involved in it. :D”


Written by Yash Agarwal

Data Scientist/ Machine Learning Engineer
