Intro to Linear and Logistic Regression
Machine learning models help us find patterns in data. Two popular models are linear regression and
logistic regression. Although they sound similar, they are used for different types of problems. In this
document, we'll learn what each model does, see some simple code examples, and understand when
to use them.
1. Linear Regression
What Is Linear Regression?
Linear regression is a statistical method used to predict a continuous outcome (or target variable)
based on one or more predictor variables (features). The relationship between the variables is
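modeled as a straight line (shown here for a single feature):
$$y = \beta_0 + \beta_1 x + \epsilon$$
$y$: The outcome (target variable).
$x$: The predictor variable (feature).
$\beta_0$: The intercept (the value of $y$ when $x = 0$).
$\beta_1$: The slope (the change in $y$ for a one-unit change in $x$).
$\epsilon$: The error term (difference between predicted and actual value).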
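To make the straight-line model concrete, here is a minimal sketch that estimates the intercept and slope with the standard ordinary least squares formulas for one feature. The toy data and the variable names (`slope`, `intercept`, `residuals`) are hypothetical and chosen only for illustration; the data lie exactly on the line y = 1 + 2x, so the error term is zero.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 1 + 2x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Ordinary least squares estimates for a single feature:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Residuals play the role of the error term: actual minus predicted
residuals = y - (intercept + slope * x)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("largest absolute residual:", np.abs(residuals).max())
```

With noisy data the same formulas still apply, but the residuals are no longer zero and the estimates only approximate the underlying line.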
When to Use Linear Regression?
Predicting continuous values: For example, predicting house prices, temperature, or stock
prices.
Simple relationships: When the relationship between the input variables and the output is
approximately linear.
Below is a simple example using Python's scikit-learn library to perform linear regression.
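import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate synthetic data: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split into training and testing sets (the 80/20 split is an assumed choice)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model on the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Plot the actual test points and the fitted regression line
plt.scatter(X_test, y_test, color="blue", label="Actual data")
plt.plot(X_test, y_pred, color="red", label="Regression line")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression")
plt.legend()
plt.show()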
Explanation:
Data Generation: We create synthetic data where y is roughly equal to 4 + 3X with some
added noise.
Train-Test Split: We divide the data into training and testing sets.
Model Training: We create a LinearRegression model and fit it to the training data.
Prediction and Visualization: We predict the output on the test set and plot the actual data
points and the regression line.
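Beyond plotting, it can help to check the fit numerically. The sketch below regenerates the same synthetic data, then prints the learned intercept and slope alongside two common metrics from `sklearn.metrics`. The `test_size=0.2` and `random_state=42` arguments are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data as in the example: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# test_size and random_state are assumed values for this sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Learned parameters; these should land near the true values 4 and 3
intercept = model.intercept_[0]
slope = model.coef_[0, 0]
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")

# Mean squared error (lower is better) and R^2 (closer to 1 is better)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")
```

Because the noise has unit variance, the mean squared error is not expected to drop much below 1 even for a well-fitted model.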
2. Logistic Regression
What Is Logistic Regression?
Logistic regression is used for classification problems, where the goal is to predict a categorical
outcome (typically binary). Instead of fitting a line, logistic regression fits an S-shaped curve (sigmoid
function) to model the probability that an instance belongs to a particular class.
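The sigmoid function is $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = \beta_0 + \beta_1 x$ is a linear
combination of the input with intercept $\beta_0$ and coefficient $\beta_1$.
The output $\sigma(z)$ is the probability that the input belongs to class 1.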
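As a quick illustration of how the S-shaped curve turns a score into a probability, here is a minimal sketch of the sigmoid, assuming the standard definition σ(z) = 1 / (1 + e^(-z)). The helper name `sigmoid` and the sample values of `z` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores, from strongly negative to strongly positive
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(z)

for zi, pi in zip(z, probs):
    print(f"z = {zi:+.1f} -> probability of class 1 = {pi:.3f}")
```

A score of 0 maps to a probability of exactly 0.5, large negative scores approach 0, and large positive scores approach 1.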
When to Use Logistic Regression?
Binary classification: For example, spam detection (spam vs. not spam), medical diagnosis
(disease vs. no disease).
Here’s a simple example using Python's scikit-learn library to perform logistic regression.
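import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate synthetic data: class 1 if X > 0, else class 0
np.random.seed(42)
X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int)

# Split into training and testing sets (the 80/20 split is an assumed choice)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model, predict classes for the test set, and report accuracy
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Predicted probability of class 1 over an evenly spaced range of inputs
X_range = np.linspace(X.min(), X.max(), 300).reshape(-1, 1)
y_prob = model.predict_proba(X_range)[:, 1]
# Plot the test points and the predicted probability curve
plt.scatter(X_test, y_test, color="blue", label="Test data")
plt.plot(X_range, y_prob, color="red", label="Predicted probability")
plt.xlabel("X")
plt.ylabel("Probability of class 1")
plt.title("Logistic Regression")
plt.legend()
plt.show()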
Explanation:
Data Generation: We create synthetic data where the label is 1 if X > 0 and 0 otherwise.
Train-Test Split: The data is split into training and testing sets.
Model Training: We create a LogisticRegression model and fit it to the training data.
Prediction and Evaluation: We predict the classes for the test data and compute the
accuracy.
Visualization: We plot the test points and the predicted probability curve, which crosses
0.5 at the decision boundary.
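The decision boundary can also be read directly from the fitted parameters. The sketch below rebuilds the same synthetic data and, for brevity, fits on all of it rather than on a training split (an assumption of this illustration). It then solves for the input where the predicted probability equals 0.5 and checks that thresholding the probabilities by hand reproduces `predict()`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same synthetic data as in the example: class 1 if X > 0, else class 0
np.random.seed(42)
X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int)

# Fit on the full dataset for brevity (no train/test split in this sketch)
model = LogisticRegression().fit(X, y)

# The model computes z = intercept + coef * x and passes it through the sigmoid.
# The probability is 0.5 exactly where z = 0, i.e. at x = -intercept / coef.
coef = model.coef_[0, 0]
intercept = model.intercept_[0]
boundary = -intercept / coef
print(f"estimated decision boundary: x = {boundary:.3f}")

# predict() assigns class 1 whenever the class-1 probability exceeds 0.5
probs = model.predict_proba(X)[:, 1]
manual_labels = (probs > 0.5).astype(int)
matches = np.array_equal(manual_labels, model.predict(X))
print("manual thresholding matches predict():", matches)
```

Since the labels were generated with a true threshold at 0, the estimated boundary should land close to 0, though regularization and sampling noise keep it from being exact.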
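3. Key Differences
Model Output: Linear regression outputs a continuous value, while logistic regression outputs a
probability between 0 and 1 that is then mapped to a class.
Use Cases: Linear regression suits predicting continuous quantities such as house prices or
temperature, while logistic regression suits binary classification such as spam detection or
medical diagnosis.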
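The difference in model output is easy to see side by side. This is a minimal sketch on hypothetical toy data (the arrays `y_continuous`, `y_binary`, and `X_new` are invented for illustration): the same feature is used once to predict a continuous target and once to predict a binary label.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: one feature, a continuous target, and a binary label
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_continuous = np.array([1.5, 3.1, 4.4, 6.2, 7.4, 9.1])
y_binary = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y_continuous)
log = LogisticRegression().fit(X, y_binary)

X_new = np.array([[2.5], [4.5]])

# Linear regression: unbounded real-valued predictions
lin_pred = lin.predict(X_new)
print("linear predictions:  ", lin_pred)

# Logistic regression: probabilities in [0, 1] and discrete class labels
log_prob = log.predict_proba(X_new)[:, 1]
log_class = log.predict(X_new)
print("class-1 probability: ", log_prob)
print("predicted classes:   ", log_class)
```

The linear model returns numbers on the scale of the target, whereas the logistic model returns a probability that `predict()` converts into a class label.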
4. Summary
In this document, we covered:
Linear Regression: How to predict continuous values by fitting a straight line to the data.
Logistic Regression: How to classify data using the sigmoid function to predict probabilities.
Code Examples: Step-by-step Python code using scikit-learn for both methods.
Both methods are fundamental tools in the data science and machine learning toolkit. Experimenting
with them on different datasets will help you understand their strengths and limitations.