Introduction:
Linear regression is a fundamental concept in machine learning that lets us model and solve predictive problems with a straight line. At its core, linear regression aims to find the best-fit line that represents the relationship between the independent and dependent variables in a dataset. In this blog post, we'll explore the intricacies of linear regression, from the basic line equation to the mathematical foundations behind fitting it.
Linear line equation:
Using linear regression means using a straight line to solve a machine learning problem, so we have to generalize the given data points with a straight-line equation. The equation of a straight line is given by: \( y = mx + c \)
Here, "m" represents the slope of the line, indicating how much "y" changes when "x" is incremented by 1 unit, and "c" represents the y-intercept, indicating where the line intersects the y-axis.
Visualizing the Relationship:
When working with a 2-D dataset, we can easily visualize it in 2-D space. Linear regression is all about finding a linear relationship between data points. However, there can be multiple straight lines that fit the data. The key is to identify the best-fit line.
Finding the Best-Fit Line:
The best-fit line is the one that minimizes the distance between the line and all the data points. To discover this line, we begin with a random line equation and refine it iteratively until it best represents our data. That is, we start with a line \(h_\theta(x) = \theta_1 x + \theta_0\), which is the same as \(y = mx + c\) with \(\theta_1\) in place of \(m\) and \(\theta_0\) in place of \(c\).
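As a minimal sketch, this starting hypothesis can be written as a Python function with random initial parameters (the names theta0 and theta1 mirror the math but are otherwise arbitrary):

```python
import random

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta1 * x + theta0: a line with slope theta1 and intercept theta0."""
    return theta1 * x + theta0

# Start from a random line; gradient descent will refine these parameters.
theta0 = random.random()
theta1 = random.random()
```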
Tweaking this random straight line means changing the parameters \(\theta_0, \theta_1\). To update these values we will use the gradient descent algorithm:
\( \theta_0^{\text{new}} = \theta_0^{\text{old}} - \alpha \frac{\partial}{\partial \theta_0} (\text{cost function}) \)
\( \theta_1^{\text{new}} = \theta_1^{\text{old}} - \alpha \frac{\partial}{\partial \theta_1} (\text{cost function}) \)
But what are this cost function term and \(\alpha\) in these equations? The cost function can be simply understood as the distance between the data points and the corresponding points on our straight line. \(\alpha\) is the learning rate: it should not be too high, or the updates will overshoot and never converge to the minimum, while a very low learning rate means it will take much longer to converge.
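In code, one such update is just a couple of subtractions; the sketch below uses placeholder gradients, since the actual derivatives of the cost function are worked out next:

```python
def gradient_step(theta0, theta1, grad0, grad1, alpha):
    """One gradient descent update: move each parameter against its gradient.

    grad0 and grad1 stand in for the partial derivatives of the cost
    function (derived below); alpha is the learning rate.
    """
    theta0_new = theta0 - alpha * grad0
    theta1_new = theta1 - alpha * grad1
    return theta0_new, theta1_new
```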
With this definition of the cost function, we can derive its formula. Suppose \(y\) is a data point and \(h_\theta(x)\) is the corresponding point on our line; then the distance between them is \(h_\theta(x) - y\). But this distance can be negative, so to avoid that we square it: \((h_\theta(x) - y)^2\)
But now we have to consider all the data points, so for \(i\) data points the equation becomes \(\sum_{m=1}^{i} (h_\theta(x_m) - y_m)^2\), where \((x_m, y_m)\) is the \(m\)-th data point.
To find the average distance we divide by the number of data points, and the updated equation becomes \(\frac{1}{i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m)^2\)
This already gives us a cost function, but we also need its derivative, and the square term produces a factor of 2 when differentiated (recall the general formula \(\frac{d}{dx}(x^2) = 2x\)). Dividing the equation by 2 cancels that factor and simplifies the derivative, so the final cost function becomes \(J(\theta) = \frac{1}{2i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m)^2\)
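Here is a minimal sketch of this cost function in Python, assuming the dataset is given as two plain lists xs and ys (hypothetical names):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta) = 1/(2i) * sum of (h_theta(x_m) - y_m)^2 over the i data points."""
    i = len(xs)
    total = sum((theta1 * x + theta0 - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * i)
```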
Now that we have the cost function and know how to differentiate it, we can write out the full update equations:
\( \theta_0^{\text{new}} = \theta_0^{\text{old}} - \alpha \frac{\partial}{\partial \theta_0} \frac{1}{2i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m)^2 \)
\( \theta_1^{\text{new}} = \theta_1^{\text{old}} - \alpha \frac{\partial}{\partial \theta_1} \frac{1}{2i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m)^2 \)
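Carrying out the differentiation (the chain rule brings down a factor of 2 that cancels the \(\frac{1}{2}\), and \(\frac{\partial}{\partial \theta_1} h_\theta(x_m) = x_m\)), the two partial derivatives work out to:
\( \frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m) \)
\( \frac{\partial}{\partial \theta_1} J(\theta) = \frac{1}{i} \sum_{m=1}^{i} (h_\theta(x_m) - y_m) \, x_m \)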
Once gradient descent finds the best values, we can draw our best-fit line: \(h_\theta(x) = \theta_1^{\text{new}} x + \theta_0^{\text{new}}\)
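Putting everything together, here is a minimal, self-contained sketch of the whole procedure; the data, learning rate, and iteration count are made-up values chosen only for illustration:

```python
# Toy dataset scattered roughly around y = 2x + 1 (illustrative values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

theta0, theta1 = 0.0, 0.0  # start from an arbitrary line
alpha = 0.01               # learning rate
i = len(xs)

for _ in range(5000):
    # h_theta(x_m) - y_m for each data point
    errors = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
    # Partial derivatives of J(theta) = 1/(2i) * sum(errors^2)
    grad0 = sum(errors) / i
    grad1 = sum(e * x for e, x in zip(errors, xs)) / i
    # Gradient descent updates: theta_new = theta_old - alpha * gradient
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(f"best-fit line: y = {theta1:.2f}x + {theta0:.2f}")
```

On this toy data the loop settles near a slope of 2 and an intercept of 1, which is the best-fit line we set out to find.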
So once the best-fit line is formed, we use it to make predictions and thereby solve our machine learning regression problem.
Conclusion:
Linear regression is a powerful tool in machine learning, allowing us to model relationships between variables using a simple straight line. By understanding the fundamental concepts and the optimization process involved, you can leverage linear regression to solve a wide range of regression problems in the field of data science and machine learning.