
Intro to Linear and Logistic Reg

This document introduces linear regression and logistic regression, explaining their purposes and applications. Linear regression predicts continuous outcomes using a straight line, while logistic regression is used for binary classification by modeling probabilities with a sigmoid function. It also includes code examples in Python's scikit-learn for both methods and highlights their key differences.

Introduction to Linear Regression and Logistic Regression

Machine learning models help us find patterns in data. Two popular models are linear regression and
logistic regression. Although they sound similar, they are used for different types of problems. In this
document, we'll learn what each model does, see some simple code examples, and understand when
to use them.

1. Linear Regression
What Is Linear Regression?

Linear regression is a statistical method used to predict a continuous outcome (or target variable)
based on one or more predictor variables (features). The relationship between the variables is
modeled as a linear function of the features (a straight line when there is a single feature):

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$

  $y$: The target value being predicted.

  $x_i$: The input features.

  $\beta_i$: The coefficients that the model learns ($\beta_0$ is the intercept).

  $\epsilon$: The error term (the difference between the actual value and the linear prediction).
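
To make the formula concrete, here is a minimal sketch that evaluates the linear part of the equation with NumPy. The coefficient values and the two input rows are hypothetical, chosen only for illustration; they are not learned from any data.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
beta_0 = 4.0                    # intercept
beta = np.array([3.0, -1.5])    # one coefficient per feature

# Two example rows, each with two features (x_1, x_2)
X = np.array([[1.0, 2.0],
              [0.5, 0.0]])

# Linear prediction for every row: beta_0 + beta_1 * x_1 + beta_2 * x_2
y_hat = beta_0 + X @ beta
print(y_hat)
```

The error term does not appear in the code because it is unknown at prediction time; it only describes how far real observations fall from this linear prediction.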

When to Use Linear Regression?

 Predicting continuous values: For example, predicting house prices, temperature, or stock
prices.

 Simple relationships: When the relationship between the input variables and the output is
approximately linear.

Linear Regression Code Example

Below is a simple example using Python's scikit-learn library to perform linear regression.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Generate some synthetic data

np.random.seed(42)

X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1) # y = 4 + 3x + noise

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)

# Print model coefficients


print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)

# Visualize the results

plt.scatter(X_test, y_test, color='blue', label='Actual')

plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')

plt.xlabel("X")

plt.ylabel("y")

plt.title("Linear Regression")

plt.legend()

plt.show()

Explanation:

 Data Generation: We create synthetic data where y is roughly equal to 4 + 3X with some
added noise.

 Train-Test Split: We divide the data into training and testing sets.

 Model Training: We fit a LinearRegression model using the training data.

 Prediction and Visualization: We predict the output on the test set and plot the actual data
points and the regression line.
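
The example above plots the predictions but does not score them. As a possible next step, the sketch below rebuilds the same synthetic data and measures test-set error with mean squared error and R². The choice of these two metrics is an assumption made here for illustration; other regression metrics may suit a given problem better.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same synthetic data as in the example above: y = 4 + 3x + noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Lower MSE is better; R^2 closer to 1 means more variance explained
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Test MSE:", mse)
print("Test R^2:", r2)
```

Because the noise has unit variance, the MSE cannot be expected to fall much below 1 on this data, however well the line is fitted.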
2. Logistic Regression
What Is Logistic Regression?

Logistic regression is used for classification problems, where the goal is to predict a categorical
outcome (typically binary). Instead of fitting a line, logistic regression fits an S-shaped curve (sigmoid
function) to model the probability that an instance belongs to a particular class.

The sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z$ is a linear combination of the input features:

$$z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

The output $\sigma(z)$ is the probability that the input belongs to class 1.
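
The sigmoid can be written in a few lines of NumPy. This is a minimal sketch for illustration (the helper name `sigmoid` is our own, and this plain form can emit overflow warnings for very large negative inputs), not the routine scikit-learn uses internally.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A strongly negative, a zero, and a strongly positive input
z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)
print(p)
```

An input of 0 maps to exactly 0.5, and inputs of equal magnitude but opposite sign give probabilities that sum to 1, which is what produces the symmetric S-shape.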

When to Use Logistic Regression?

 Binary classification: For example, spam detection (spam vs. not spam), medical diagnosis
(disease vs. no disease).

  Probability estimation: When you want to estimate the probability of belonging to a
particular class.

Logistic Regression Code Example

Here’s a simple example using Python's scikit-learn library to perform logistic regression.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate synthetic binary classification data

np.random.seed(42)

X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int) # Class 1 if X > 0, else Class 0

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Create and train the logistic regression model

model = LogisticRegression()

model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Model accuracy:", accuracy)

# Plot the test points and the predicted probability curve

plt.scatter(X_test, y_test, color='blue', label='Actual')


# Create a range of values for plotting the probability curve

X_range = np.linspace(X.min(), X.max(), 300).reshape(-1, 1)

y_prob = model.predict_proba(X_range)[:, 1]

plt.plot(X_range, y_prob, color='red', label='Predicted Probability')

plt.xlabel("X")

plt.ylabel("Probability of Class 1")

plt.title("Logistic Regression")

plt.legend()

plt.show()

Explanation:

 Data Generation: We create synthetic data where the label is 1 if X > 0 and 0 otherwise.

 Train-Test Split: The data is split into training and testing sets.

 Model Training: We fit a LogisticRegression model on the training data.

 Prediction and Evaluation: We predict the classes for the test data and compute the
accuracy.

  Visualization: We plot the test points and the predicted probability curve. The decision
boundary is the point where this curve crosses 0.5.
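
For a single feature, the decision boundary can also be computed directly from the fitted parameters: the probability equals 0.5 where z = 0, that is, at x = -intercept / coefficient. The sketch below illustrates this under the assumption that the model is fitted on all 100 points rather than on the training split, purely to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same synthetic data as above: class 1 if X > 0, else class 0
np.random.seed(42)
X = np.random.randn(100, 1)
y = (X[:, 0] > 0).astype(int)

# Assumption: fit on the full dataset for brevity (no train/test split)
model = LogisticRegression().fit(X, y)

w = model.coef_[0, 0]      # learned coefficient for the single feature
b = model.intercept_[0]    # learned intercept
boundary = -b / w          # value of x where z = b + w * x equals 0

p_at_boundary = model.predict_proba([[boundary]])[0, 1]
print("Decision boundary at x =", boundary)
print("P(class 1) at the boundary:", p_at_boundary)

# Thresholding the probability at 0.5 reproduces predict()
labels_from_proba = (model.predict_proba(X)[:, 1] > 0.5).astype(int)
```

The fitted boundary will generally not sit exactly at 0, even though the labels were generated with that cutoff, because it is estimated from a finite sample and scikit-learn applies regularization by default.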

3. Key Differences and When to Use Each


 Type of Outcome:

o Linear Regression: Used for predicting continuous numerical outcomes.

o Logistic Regression: Used for predicting categorical outcomes (typically binary).

 Model Output:

o Linear Regression: Produces a continuous value (which can be interpreted directly as
the prediction).

o Logistic Regression: Produces a probability between 0 and 1 that can be converted
into a class label.

 Use Cases:

o Linear Regression: Price prediction, temperature forecasting, trend analysis.

o Logistic Regression: Email spam classification, disease diagnosis, customer churn
prediction.
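
The difference in model output can be seen by fitting both models on the same feature. The sketch below uses a tiny made-up dataset (all values are hypothetical) and queries both models at points far outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Tiny made-up dataset: one feature, two different kinds of target
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_continuous = np.array([1.0, 3.1, 4.9, 7.2, 9.0, 11.1])  # roughly 1 + 2x
y_binary = np.array([0, 0, 0, 1, 1, 1])                   # class 1 when x >= 3

# Query points, two of them far outside the training range
X_new = np.array([[-10.0], [2.5], [10.0]])

# Linear regression returns unbounded real values
lin = LinearRegression().fit(X, y_continuous)
lin_pred = lin.predict(X_new)
print("Linear predictions:", lin_pred)

# Logistic regression returns probabilities, which predict() turns into labels
log = LogisticRegression().fit(X, y_binary)
proba = log.predict_proba(X_new)[:, 1]
labels = log.predict(X_new)
print("P(class 1):", proba)
print("Labels:", labels)
```

The linear predictions keep growing in magnitude as x moves away from the training data, while the logistic probabilities stay between 0 and 1 no matter how extreme the input is.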

4. Summary
In this document, we covered:

 Linear Regression: How to predict continuous outcomes using a straight line.

 Logistic Regression: How to classify data using the sigmoid function to predict probabilities.

 Code Examples: Step-by-step Python code using scikit-learn for both methods.

 When to Use Them: Key differences in their applications and outputs.

Both methods are fundamental tools in the data science and machine learning toolkit. Experimenting
with them on different datasets will help you understand their strengths and limitations.

Happy learning and coding!
