In case you haven’t read my blog on Gradient Descent yet, I would recommend reading it first before starting with this one. It will act as a good foundation for this blog.
We know that in a Regression Model we try to fit a line (for Linear Regression) or a plane/hyperplane (for Multivariate Regression), so that we can use that line/plane to make predictions for test/future data.
I already wrote a blog on Linear and Multivariate Regression modeling with Scikit-learn, which you can find here.
You may be wondering: if we already covered it, then why are we doing it again?
The thing is, sklearn.linear_model.LinearRegression uses the Ordinary Least Squares method to find the line that best fits the data. Here, however, we want to see how Gradient Descent works for linear regression.
You may now ask: why do we need to do it with Gradient Descent if we can do it with the Ordinary Least Squares method?
The Ordinary Least Squares method finds the best slope analytically, by solving for the point where the derivative of the cost is zero. However, sometimes it’s not possible to solve for where the derivative equals zero, and this is why Gradient Descent can be used in so many situations. I know this is not the case with Linear Regression — you can easily solve for where the derivative is zero — but Linear Regression makes it easy to understand the usage and power of Gradient Descent. Moreover, Neural Networks use Gradient Descent as well, and the cost function we use here, MSE, is also used in Neural Networks. So this will also help us extend the idea of Gradient Descent to Neural Networks when we discuss them.
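As a quick reminder of what “solving for where the derivative is zero” means here (this is standard textbook material, not something we will need again below), simple linear regression has a closed-form solution:
m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
c = ȳ - m·x̄
Gradient Descent reaches essentially the same m and c without ever solving these equations, by repeatedly stepping downhill on the cost surface, which is why it generalizes to models where no closed form exists.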
Python Implementation
Let’s say we have x and y for the Linear Regression Model y = mx + c
where values of x and y are
x = np.array([1,2,3,4,5])
y = np.array([5,7,9,11,13])
We want to tune the parameters m and c using Gradient Descent so that we can accurately predict y given x.
Before starting with anything, decide on the cost function we will be using. For this implementation we will use the MSE (Mean Squared Error) loss/cost function.
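To make that concrete, MSE is just the average of the squared differences between the actual and predicted values. A minimal helper (named mse here purely for illustration; it is not used in the main loop below) would look like this:
import numpy as np

def mse(y, y_pred):
    # average of the squared residuals
    return np.mean((y - y_pred) ** 2)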
Step-1
We will start by initializing the LearningRate as well as the maximum number of iterations (Steps).
learningRate = 0.01
max_iterations = 10000
n = len(x)
Step-2
We will initialize the parameters we want to tune to some starting values.
m=1 # slope
c=0 # intercept
Step-3
We will make predictions using the current values of the parameters.
y_pred = ((m * x) + c)
Step-4
We then take the partial derivative of the loss function with respect to each parameter. In fancy machine-learning terms, we compute the gradient of the loss function.
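For reference, here is where the formulas in the code come from (a standard derivation, using the same y_pred = m·x + c as above):
MSE = (1/n) Σ (y - (m·x + c))²
∂MSE/∂m = -(2/n) Σ x·(y - y_pred)
∂MSE/∂c = -(2/n) Σ (y - y_pred)
The chain rule brings down a factor of -x for m and -1 for c. The code below computes exactly these two expressions: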
t1 = x * (y-y_pred)
t2 = (y-y_pred)
md = -(2/n)*sum(t1)
cd = -(2/n)*sum(t2)
Step-5
Updating the parameters
m = m - learningRate * md
c = c - learningRate * cd
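To see what a single update does with our data (a quick hand calculation, nothing you need to run): with m = 1 and c = 0 we get y_pred = [1, 2, 3, 4, 5], so the residuals y - y_pred are [4, 5, 6, 7, 8]. Then md = -(2/5)·(4 + 10 + 18 + 28 + 40) = -40 and cd = -(2/5)·(4 + 5 + 6 + 7 + 8) = -12, so the update gives m = 1 - 0.01·(-40) = 1.4 and c = 0 - 0.01·(-12) = 0.12, i.e. both parameters move towards the true slope and intercept.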
Step-6
We are going to repeat Steps 3 to 5 for the chosen number of iterations.
for each in range(max_iterations):
    # Step-3
    # Prediction
    y_pred = (m * x) + c
    # Step-4
    # Calculating the Gradient (vector of partial derivatives)
    t1 = x * (y - y_pred)
    t2 = (y - y_pred)
    md = -(2/n) * sum(t1)
    cd = -(2/n) * sum(t2)
    # Step-5
    # Update the parameters
    m = m - learningRate * md
    c = c - learningRate * cd
print(f"Slope= {m}\nIntercept={c}")
Final Code
import numpy as np

def gradientDescent(x, y):
    # Step-1
    # Initialize the learning rate and the maximum number of iterations
    learningRate = 0.01
    max_iterations = 10000
    n = len(x)
    # Step-2
    # Initialize the parameters we want to tune
    m = 1   # slope
    c = 0   # intercept
    for each in range(max_iterations):
        # Step-3
        # Prediction using the current parameters
        y_pred = (m * x) + c
        # Step-4
        # Calculating the Gradient (vector of partial derivatives)
        t1 = x * (y - y_pred)
        t2 = (y - y_pred)
        md = -(2/n) * sum(t1)
        cd = -(2/n) * sum(t2)
        # Step-5
        # Update the parameters
        m = m - learningRate * md
        c = c - learningRate * cd
    print(f"Slope= {m}\nIntercept={c}")
x = np.array([1,2,3,4,5])
y = np.array([5,7,9,11,13])
gradientDescent(x,y)
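As a quick sanity check (this snippet is just for verification and is not part of the walkthrough above), the data satisfies y = 2x + 3 exactly, and NumPy’s closed-form least-squares fit agrees, so the gradient-descent result should come out very close to a slope of 2 and an intercept of 3:
slope, intercept = np.polyfit(x, y, 1)   # closed-form least-squares fit
print(slope, intercept)                  # approximately 2.0 and 3.0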
Tuning Learning Rate and Iterations (Steps)
We usually start by keeping the number of iterations (steps) small and start with a small learning rate, say 0.001. We then check whether the cost is decreasing with each iteration. If it is, we can increase the learning rate, say from 0.001 to 0.1, and check whether the cost is still decreasing. We can keep increasing the learning rate until we find that our cost starts increasing, which means we are overshooting the global minimum. So we have to stay below the learning rate that makes our cost increase, since our goal is to reduce the cost with each step.
We are trying to find the largest learning rate for which our cost is still decreasing. Once we find it, we can increase the number of iterations (steps).
There is no exact method for finding the optimal values of the learning rate and the number of steps; it’s more a matter of trial and error. The approach above is one way to go about it.
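One simple way to apply this heuristic in code (a sketch of my own, reusing the same x, y, m, c, n, learningRate and max_iterations as above) is to compute the cost at every iteration and stop as soon as it goes up, which tells you the learning rate is too large:
prev_cost = float("inf")
for each in range(max_iterations):
    y_pred = (m * x) + c
    cost = np.mean((y - y_pred) ** 2)   # MSE for the current parameters
    if cost > prev_cost:
        print("Cost increased -- try a smaller learning rate")
        break
    prev_cost = cost
    md = -(2/n) * sum(x * (y - y_pred))
    cd = -(2/n) * sum(y - y_pred)
    m = m - learningRate * md
    c = c - learningRate * cd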
Here are some good YouTube videos doing basically the same thing.
- https://www.youtube.com/watch?v=vsWrXfO3wWw&list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw&index=5
- https://www.youtube.com/watch?v=4PHI11lX11I&t=7s