Logistic Regression: From Probabilities to Predictions


Introduction:

In the world of machine learning and statistics, Logistic Regression stands as a powerful and versatile tool for classification tasks. Despite its name, it is not a regression method in the usual sense; rather, it estimates probabilities and uses them to make binary or multi-class classifications. In this post, we'll delve into how Logistic Regression works, its mathematical underpinnings, its real-world applications, and how it bridges the gap between regression and classification.

Understanding the Binary Classification Problem:

At the core of Logistic Regression lies the binary classification problem. Given input features, we aim to predict one of two possible outcomes, typically represented as 0 (negative class) and 1 (positive class). Logistic Regression deals with the probability of an instance belonging to the positive class.
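As a minimal illustration (the probabilities below are made up, not produced by any real model), here is how a predicted probability is turned into a hard 0/1 class label with the usual 0.5 threshold:

```python
import numpy as np

# Hypothetical probabilities of belonging to the positive class for five instances.
predicted_proba = np.array([0.12, 0.47, 0.51, 0.88, 0.95])

# Threshold at 0.5: predict class 1 (positive) if P(y = 1 | x) >= 0.5, else class 0.
predicted_class = (predicted_proba >= 0.5).astype(int)

print(predicted_class)  # [0 0 1 1 1]
```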

Can a Regression Algorithm Solve a Classification Problem?

Can we solve a classification problem using a regression algorithm? Suppose we use linear regression to solve our classification problem; what will happen?

Problem with Linear Regression for Classification:

If we use linear regression, there are two scenarios where it may fail:

  1. Outlier: if an outlier is present in the data set, the best-fit line tilts toward that outlier and our hypothesis gives us wrong results.

  2. Beyond range: when we try to predict the class of a new point, its projection on the line may fall above 1 or below 0, which violates our hypothesis for classification. Both failure modes are illustrated in the sketch after this list.
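To make these two failure modes concrete, here is a small sketch on made-up 1-D data: an ordinary least-squares line fit to 0/1 labels produces predictions outside [0, 1], and a single outlier visibly tilts the fit.

```python
import numpy as np

# Toy 1-D data (made up): small x values belong to class 0, large x values to class 1.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Ordinary least-squares fit: y ≈ theta1 * x + theta0.
theta1, theta0 = np.polyfit(x, y, deg=1)
print(theta1 * 10.0 + theta0)        # ≈ 1.49 -> prediction above 1 ("beyond range")

# Add a single extreme point from the positive class.
x_out = np.append(x, 50.0)
y_out = np.append(y, 1.0)
theta1_out, theta0_out = np.polyfit(x_out, y_out, deg=1)
print(theta1, theta1_out)            # slope drops from ≈ 0.18 to ≈ 0.01 -> the line tilts ("outlier")
```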

Squashing: How to resolve the above problem:

We can resolve both problems by squashing our linear regression line so that our hypothesis is not violated and outliers don't disturb our predictions. Squashing is done with the sigmoid (logistic) function, hence the name logistic regression (Linear Regression + squashing with the logistic function), as shown below.
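Here is a minimal sketch of the squashing step (the parameter values are chosen only for illustration): the sigmoid maps any real-valued linear score into the interval (0, 1).

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters for the linear part theta1 * x + theta0.
theta1, theta0 = 1.5, -4.0

x = np.array([-10.0, 0.0, 2.0, 4.0, 100.0])
linear_score = theta1 * x + theta0   # unbounded: can be far below 0 or far above 1
probability = sigmoid(linear_score)  # always strictly between 0 and 1

print(linear_score)   # [-19.  -4.  -1.   2. 146.]
print(probability)    # roughly [0.000  0.018  0.269  0.881  1.000]
```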

Mathematical Intuition Behind Logistic Regression:

Our goal here is the same as in linear regression: to minimize the cost function of our algorithm. But before that, let's derive the cost function of logistic regression.

  1. The cost function for linear regression is given by:

    \(J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \big(h_\theta(x_i) - y_i\big)^2 \quad \text{where} \quad h_\theta(x) = \theta_1 x + \theta_0\)

  2. Now, as per our earlier conclusion, we apply squashing to the hypothesis: \(h_\theta(x) = \frac{1}{1 + e^{-(\theta_1 x + \theta_0)}}\)

  3. However, with this sigmoid hypothesis the squared-error cost function above becomes non-convex, so gradient descent can get stuck in a local minimum instead of reaching the global minimum. Researchers therefore came up with a convex cost for each training example:

  4. \(\text{When } y = 0: \; -\log(1 - h_\theta(x))\)

    \(\text{When } y = 1: \; -\log(h_\theta(x))\)

  5. We can combine the two cases into a single expression:

    \(\text{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1-y) \log(1 - h_\theta(x))\)

  6. We substitute this expression for the squared-error term in the cost function from step 1 (averaging it over all training examples) and minimize it with gradient descent, updating \(\theta\) in each iteration until convergence, as sketched below.
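Putting the six steps together, here is a short sketch of batch gradient descent on this logistic cost. The toy data, learning rate, and iteration count are assumptions for illustration, not part of the derivation above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta1, theta0, x, y):
    """Average logistic cost: mean of -y*log(h) - (1-y)*log(1-h)."""
    h = sigmoid(theta1 * x + theta0)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Toy 1-D training data (made up): small x -> class 0, large x -> class 1.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta1, theta0 = 0.0, 0.0
learning_rate = 0.1                 # assumed value
for _ in range(5000):               # assumed iteration budget
    h = sigmoid(theta1 * x + theta0)
    # Gradients of the average cost with respect to theta1 and theta0.
    theta1 -= learning_rate * np.mean((h - y) * x)
    theta0 -= learning_rate * np.mean(h - y)

print(cost(theta1, theta0, x, y))        # the convex cost has been driven down
print(sigmoid(theta1 * 4.5 + theta0))    # ≈ 0.5 at the midpoint between the two classes
```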

Real-World Applications

Logistic Regression finds applications across various domains:

  • Medical Diagnosis: Predicting disease outcomes based on patient data.

  • Credit Scoring: Assessing creditworthiness and risk prediction.

  • Marketing: Identifying potential customers for targeted campaigns.

  • Natural Language Processing (NLP): Sentiment analysis and text classification.

  • Image Classification: Binary image categorization tasks.

Limitations and Assumptions

No model is without limitations. Logistic Regression assumes a linear decision boundary (the log-odds are a linear function of the input features) and may not perform well on highly non-linear data without additional feature engineering.

Conclusion

Logistic Regression, with its solid mathematical foundation and practicality, remains a fundamental tool in the machine-learning landscape. By mastering its principles and techniques, you gain a powerful ally in tackling classification challenges across diverse domains. This blog post aims to equip you with the knowledge and insights needed to harness the potential of Logistic Regression in your data science endeavors.
