Before we begin this tutorial on machine learning using Python, note what lies ahead: we need to translate a few non-trivial algorithms into Python code in order to calculate the best-fit line for a particular dataset.
But before we go down that path, let us ask a very important question: why bother with any of this? Because linear regression is one of the fundamental building blocks of machine learning. It is used in almost all major machine learning algorithms, so understanding it will help you build a strong foundation for most of them. For this very reason, machine learning courses in India are gaining a lot of traction and flourishing around the nation.
For those keen on learning these concepts, the first step is to understand linear regression and linear algebra in general, which will bring you closer to writing your own custom machine learning algorithms on whichever processing system is best for the situation at hand. As processing improves and hardware architectures change, the methods used for machine learning are likely to change as well.
The recent rise of neural networks has had a lot to do with general-purpose graphics processing units (GPUs). But have you ever given a thought to what is at the heart of an artificial neural network? The answer is linear regression.
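This is the calculation for the slope, m, of the best-fit (regression, or 'y-hat') line, where a bar denotes the mean:

\[ m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{(\bar{x})^{2} - \overline{x^{2}}} \]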
If this looks too complex at a glance, do not fret; we will break it down into parts. We begin with a couple of imports:
    from statistics import mean
    import numpy as np
We import mean from statistics so that we can easily get the mean of a list or array. Next, we import numpy as np so that we can build NumPy arrays. There is a lot we can do with plain lists, but we need to perform some simple array operations, such as multiplying two sequences element by element, and these are not available with plain lists. That is why we use NumPy. We will not get too complex with NumPy at this point, but later on it is going to be a good friend to you. The next step is to define some starting data points:
    xs = [1,2,3,4,5]
    ys = [5,4,6,5,6]
These are the data points we are going to use as our xs and ys. You may already be framing it this way: the xs are the features and the ys are the labels. Alternatively, both may be features between which we are establishing a relationship. As mentioned previously, we want these to be NumPy arrays so that we can perform array operations on them. Therefore, we modify these two lines as follows:
    xs = np.array([1,2,3,4,5], dtype=np.float64)
    ys = np.array([5,4,6,5,6], dtype=np.float64)
These are now NumPy arrays, and we are being explicit about the data type here. Without getting in too deep: a data type has attributes that determine how the data is stored in memory and how it can be manipulated. This does not matter much right now, but it will further down the line, if and when we are doing massive operations and hoping to run them on our GPUs rather than our CPUs.
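To see why plain lists are not enough, here is a minimal sketch contrasting list and array behaviour, together with the data type attributes just mentioned. The names xs_list, ys_list, and list_product_works are ours, used only for this comparison:

```python
import numpy as np

# Plain Python lists (hypothetical names, used only for this comparison)
xs_list = [1, 2, 3, 4, 5]
ys_list = [5, 4, 6, 5, 6]

# Multiplying two lists is not defined, so Python raises a TypeError
try:
    xs_list * ys_list
    list_product_works = True
except TypeError:
    list_product_works = False

# NumPy arrays multiply element by element: 1*5, 2*4, 3*6, 4*5, 5*6
xs = np.array(xs_list, dtype=np.float64)
ys = np.array(ys_list, dtype=np.float64)
products = xs * ys

print(list_product_works)      # False
print(products)                # the products 5, 8, 18, 20 and 30
print(xs.dtype, xs.itemsize)   # float64 uses 8 bytes per element
```

The element-wise product is exactly what the slope calculation needs later on when it takes the mean of xs*ys.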
If we graph our data as a scatter plot, the five points drift gently upward from left to right.
Now we are almost ready to build the function that computes m, our regression line’s slope:
    def best_fit_slope(xs,ys):
        return m

    m = best_fit_slope(xs,ys)
This is our skeleton and now we have to fill it in.
The first order of business is to find the mean of the x points and multiply it by the mean of our y points. Here is how it looks as we continue to fill out our skeleton:
    def best_fit_slope(xs,ys):
        m = (mean(xs) * mean(ys))
        return m
That was simple enough so far. You can use the mean function on lists, tuples, or arrays. Take a close look at the use of parentheses here. Python honours the mathematical order of operations, so if you want to guarantee a particular order, be explicit. And do not forget the PEMDAS rule.
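As a small illustration of both points, the sketch below shows that mean accepts several sequence types and that parentheses change the result of an expression. The sample values are assumptions chosen only for the demonstration:

```python
from statistics import mean

# mean works the same way on a list and on a tuple
list_mean = mean([1, 2, 3, 4, 5])
tuple_mean = mean((1, 2, 3, 4, 5))

# Without parentheses, multiplication happens before subtraction
implicit = 10 - 2 * 3      # evaluated as 10 - (2 * 3)
# With parentheses, the subtraction is forced to happen first
explicit = (10 - 2) * 3

print(list_mean, tuple_mean)   # 3 3
print(implicit, explicit)      # 4 24
```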
The next step is to subtract the mean of x*y, which is where our element-wise array operation comes in: mean(xs*ys). In full:
    def best_fit_slope(xs,ys):
        m = ( (mean(xs)*mean(ys)) - mean(xs*ys) )
        return m
The order of operations does not require us to encase the entire calculation in parentheses, but we do so here so that we can add a new line after the division, which makes things easier to read and follow. Without the outer parentheses, we would get a syntax error at the new line. We are almost done; now we just have to build the denominator, which is the square of the mean of the xs minus the mean of the squared x values: mean(xs*xs). Note that we cannot get away with a simple caret (^2) here, because in Python the caret means bitwise XOR rather than exponentiation. Instead, we can multiply the array by itself (or, equivalently, write xs**2) and get the outcome we want. Putting it all together:
    def best_fit_slope(xs,ys):
        m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
             ((mean(xs)**2) - mean(xs*xs)))
        return m
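Two details from this step can be checked in isolation. The sketch below is a standalone illustration, not part of the tutorial's script. It shows that multiplying an array by itself matches the ** operator, that the caret fails on float arrays because it means bitwise XOR, and that an expression wrapped in parentheses may span several lines. The names same_result, caret_works, and total are ours:

```python
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)

# Squaring element-wise: both spellings give the same array
same_result = np.array_equal(xs * xs, xs ** 2)

# The caret is bitwise XOR, which NumPy does not define for floats
try:
    xs ^ 2
    caret_works = True
except TypeError:
    caret_works = False

# Inside parentheses, a line break after an operator is allowed
total = (xs.sum() /
         len(xs))

print(same_result, caret_works, total)   # True False 3.0
```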
So, that makes our full script:
    from statistics import mean
    import numpy as np

    xs = np.array([1,2,3,4,5], dtype=np.float64)
    ys = np.array([5,4,6,5,6], dtype=np.float64)

    def best_fit_slope(xs,ys):
        m = (((mean(xs)*mean(ys)) - mean(xs*ys)) /
             ((mean(xs)**2) - mean(xs**2)))
        return m

    m = best_fit_slope(xs,ys)
    print(m)
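As a sanity check, the slope for this dataset can be worked out by hand and compared against NumPy's own least-squares fit. The sketch below repeats the tutorial's function, with the hand arithmetic in comments, and then calls np.polyfit with degree 1, which returns the slope followed by the intercept. The comparison through np.isclose and the name polyfit_slope are our additions; the tolerance allows for floating-point rounding:

```python
from statistics import mean
import numpy as np

xs = np.array([1, 2, 3, 4, 5], dtype=np.float64)
ys = np.array([5, 4, 6, 5, 6], dtype=np.float64)

def best_fit_slope(xs, ys):
    # numerator:   mean(x)*mean(y) - mean(x*y) = 3*5.2 - 16.2 = -0.6
    # denominator: mean(x)**2 - mean(x*x)      = 9 - 11       = -2
    m = (((mean(xs) * mean(ys)) - mean(xs * ys)) /
         ((mean(xs) ** 2) - mean(xs * xs)))
    return m

m = best_fit_slope(xs, ys)

# np.polyfit(x, y, 1) fits a straight line; coefficients come highest power first
polyfit_slope = np.polyfit(xs, ys, 1)[0]

print(m)                              # approximately 0.3
print(np.isclose(m, polyfit_slope))   # True
```

Dividing -0.6 by -2 gives a slope of about 0.3, so for this data each one-unit step in x raises the fitted y by roughly 0.3.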
The next step is to calculate the y-intercept, b. We will tackle that in the very next tutorial, where we will also complete the overall best-fit line calculation. It is easier to calculate than the slope, so do try to write your own function for it.
If you are not sure how, we highly recommend you stay tuned for the next tutorial from DexLab Analytics, the premier Machine Learning Certification provider in Pune, as we will be doing much more than simply computing b.