Intro to Linear and Logistic Reg

Introduction to Linear Regression and Logistic Regression
Machine learning models help us find patterns in data. Two popular models are linear regression and
logistic regression. Although they sound similar, they are used for different types of problems. In this
document, we'll learn what each model does, see some simple code examples, and understand when
to use them.
1. Linear Regression
What Is Linear Regression?
Linear regression is a statistical method used to predict a continuous outcome (or target variable)
based on one or more predictor variables (features). The relationship between the features and the
target is modeled as a linear function, which is a straight line when there is a single feature:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
• y: The target value being modeled; the model's prediction is ŷ = β₀ + β₁x₁ + ⋯ + βₙxₙ.
• xᵢ: The input features.
• βᵢ: The coefficients that the model learns (β₀ is the intercept).
• ε: The error term (the difference between the actual value and the model's prediction).
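To make "the coefficients that the model learns" concrete, here is a minimal sketch that recovers β₀ and β₁ from synthetic data generated as y = 4 + 3x + noise. It assumes ordinary least squares (minimizing the sum of squared errors), computed with NumPy's np.linalg.lstsq; the helper names fit_least_squares and predict are hypothetical and exist only for this illustration.

```python
import numpy as np

def fit_least_squares(X, y):
    """Hypothetical helper: return (intercept, coefficients) minimizing squared error."""
    # Prepend a column of ones so the first coefficient plays the role of beta_0
    X_design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta[0], beta[1:]

def predict(X, intercept, coefs):
    # y_hat = beta_0 + beta_1 * x_1 + ... + beta_n * x_n (no error term at prediction time)
    return intercept + X @ coefs

rng = np.random.default_rng(0)
X = 2 * rng.random((200, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, size=200)  # true beta_0 = 4, beta_1 = 3

intercept, coefs = fit_least_squares(X, y)
print("beta_0 ~", round(intercept, 2), "| beta_1 ~", np.round(coefs, 2))
```

The estimates land close to the true values 4 and 3; the gap between them and the truth comes from the noise term ε.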
When to Use Linear Regression?
• Predicting continuous values: For example, predicting house prices, temperature, or stock
prices.
• Simple relationships: When the relationship between the input variables and the output is
approximately linear.
Linear Regression Code Example
Below is a simple example using Python's scikit-learn library to perform linear regression.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# Generate some synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) # y = 4 + 3x + noise
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Print model coefficients

print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
# Visualize the results
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression")
plt.legend()
plt.show()
Explanation:
• Data Generation: We create synthetic data where y is roughly equal to 4 + 3X, with some
added noise.
• Train-Test Split: We divide the data into training and testing sets.
• Model Training: We fit a LinearRegression model using the training data.
• Prediction and Visualization: We predict the output on the test set and plot the actual data
points together with the regression line.
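The linear regression script prints the learned coefficients but never scores the model on the test set. One common way to do that, sketched here under the assumption that you reuse the same synthetic setup, is scikit-learn's mean_squared_error and r2_score from sklearn.metrics.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the example above: y = 4 + 3x + standard normal noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# MSE: average squared gap between actual and predicted values (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R^2: share of the variance in y_test explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)
print(f"Test MSE: {mse:.3f}")
print(f"Test R^2: {r2:.3f}")
```

Because the data contain unit-variance noise, the test MSE should not be expected to reach zero even with perfectly learned coefficients.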
2. Logistic Regression
What Is Logistic Regression?
Logistic regression is used for classification problems, where the goal is to predict a categorical
outcome (typically binary). Rather than predicting the outcome directly with a straight line, logistic
regression passes a linear combination of the features through an S-shaped curve (the sigmoid
function) to model the probability that an instance belongs to a particular class.
The sigmoid function is defined as:
\sigma(z) = \frac{1}{1 + e^{-z}}
where z is a linear combination of the input features:
z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
The output σ(z) is the probability that the input belongs to class 1.
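A short numeric sketch of the two formulas above: compute z from coefficients, then squash it with the sigmoid. The values beta_0 = -1 and beta_1 = 2 are hypothetical, chosen only so that z comes out as -5, 0 and 5.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients chosen only for illustration
beta_0, beta_1 = -1.0, 2.0
x = np.array([-2.0, 0.5, 3.0])

z = beta_0 + beta_1 * x   # linear combination: z = [-5, 0, 5]
p = sigmoid(z)            # probability of class 1: about [0.007, 0.5, 0.993]
print(np.round(p, 3))
```

Note that z = 0 always maps to a probability of exactly 0.5, and large positive or negative z pushes the probability toward 1 or 0. For very large negative z, np.exp(-z) can overflow; scipy.special.expit is a numerically safer implementation of the same function.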
When to Use Logistic Regression?
• Binary classification: For example, spam detection (spam vs. not spam) or medical diagnosis
(disease vs. no disease).
• Probability estimation: When you want to estimate the probability that an instance belongs to a
particular class.
Logistic Regression Code Example
Here’s a simple example using Python's scikit-learn library to perform logistic regression.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate synthetic binary classification data
np.random.seed(42)
X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int) # Class 1 if X > 0, else Class 0
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model accuracy:", accuracy)
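# Plot the test points (true labels are 0 or 1)
plt.scatter(X_test, y_test, color='blue', label='Actual')
# Evaluate the predicted probability of class 1 over a grid of X values
X_range = np.linspace(X.min(), X.max(), 300).reshape(-1, 1)
y_prob = model.predict_proba(X_range)[:, 1]
plt.plot(X_range, y_prob, color='red', label='Predicted Probability')
# The decision boundary is where P(class 1) = 0.5, i.e. where z = 0
boundary = -model.intercept_[0] / model.coef_[0, 0]
plt.axvline(boundary, color='gray', linestyle='--', label='Decision boundary')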
plt.xlabel("X")
plt.ylabel("Probability of Class 1")
plt.title("Logistic Regression")
plt.legend()
plt.show()
Explanation:
• Data Generation: We create synthetic data where the label is 1 if X > 0 and 0 otherwise.
• Train-Test Split: The data is split into training and testing sets.
• Model Training: We fit a LogisticRegression model on the training data.
• Prediction and Evaluation: We predict the classes for the test data and compute the
accuracy.
• Visualization: We plot the test points and the predicted probability curve; the decision
boundary is the value of X where that curve crosses 0.5.
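To connect the fitted model back to the sigmoid formula, the sketch below rebuilds z = β₀ + β₁x from scikit-learn's intercept_ and coef_ attributes and checks that applying the sigmoid reproduces predict_proba for class 1. It assumes the default binary LogisticRegression and, for brevity, fits on all of the synthetic data rather than a training split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same synthetic data as the example above: label 1 if X > 0, else 0
np.random.seed(42)
X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Rebuild z = beta_0 + beta_1 * x from the fitted parameters
X_grid = np.linspace(-3, 3, 7).reshape(-1, 1)
z = model.intercept_[0] + X_grid @ model.coef_[0]
manual_prob = 1.0 / (1.0 + np.exp(-z))

# scikit-learn's probability for class 1 should match the sigmoid of z
sk_prob = model.predict_proba(X_grid)[:, 1]
print("Manual sigmoid matches predict_proba:", np.allclose(manual_prob, sk_prob))

# Decision boundary: P(class 1) = 0.5 exactly where z = 0, i.e. x = -beta_0 / beta_1
boundary = -model.intercept_[0] / model.coef_[0, 0]
print("Decision boundary at x =", round(boundary, 3))
```

Because the labels were generated by the rule X > 0, the estimated boundary should sit near zero.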
3. Key Differences and When to Use Each

• Type of Outcome:
o Linear Regression: Used for predicting continuous numerical outcomes.
o Logistic Regression: Used for predicting categorical outcomes (typically binary).
• Model Output:
o Linear Regression: Produces a continuous value that is interpreted directly as the
prediction.
o Logistic Regression: Produces a probability between 0 and 1, which can be converted
into a class label (for example, by predicting class 1 when the probability is at least 0.5).
• Use Cases:
o Linear Regression: Price prediction, temperature forecasting, trend analysis.
o Logistic Regression: Email spam classification, disease diagnosis, customer churn
prediction.
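The "Model Output" difference can be seen directly by fitting both models to the same binary labels and evaluating them outside the training range. This is a sketch on toy data assumed for illustration (label 1 when x > 0); the 0.5 cutoff is one common threshold, not the only possible choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy binary data (assumed for illustration): label 1 when x > 0
X = np.linspace(-3, 3, 61).reshape(-1, 1)
y = (X[:, 0] > 0).astype(int)

lin_model = LinearRegression().fit(X, y)
logit_model = LogisticRegression().fit(X, y)

# Evaluate both models well outside the training range
X_new = np.array([[-10.0], [0.0], [10.0]])
lin_out = lin_model.predict(X_new)              # unbounded continuous values
prob = logit_model.predict_proba(X_new)[:, 1]   # probabilities in [0, 1]
labels = (prob >= 0.5).astype(int)              # threshold at 0.5 to get class labels

print("Linear regression output: ", np.round(lin_out, 2))
print("Logistic P(class 1):      ", np.round(prob, 3))
print("Logistic labels (>= 0.5): ", labels)
```

The linear model's outputs fall below 0 and above 1 at the extremes, so they cannot be read as probabilities, while the logistic model's outputs always stay between 0 and 1 and map cleanly onto class labels.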
4. Summary
In this document, we covered:
• Linear Regression: How to predict continuous outcomes with a linear model (a straight line for a
single feature).
• Logistic Regression: How to classify data by using the sigmoid function to predict probabilities.
• Code Examples: Step-by-step Python code using scikit-learn for both methods.
• When to Use Them: Key differences in their applications and outputs.
Both methods are fundamental tools in the data science and machine learning toolkit. Experimenting
with them on different datasets will help you understand their strengths and limitations.
Happy learning and coding!