Lab 1


To load a dataset from a CSV file using Pandas, first make sure that the file exists in the specified directory. Here's a complete example that demonstrates how to load the dataset, perform some basic operations, and visualize the data using Matplotlib.

Let's assume that `Salary.csv` contains columns `YearsExperience` and `Salary`.


### Step-by-Step Example

#### 1. Import Libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### 2. Load the Dataset

Make sure `Salary.csv` is in the same directory as your script, or provide the full path to
the file.

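# Load the dataset
dataset = pd.read_csv('Salary.csv')

# Display the first few rows of the dataset
print(dataset.head())

# In a notebook, a bare dataset.head() on the last line of a cell displays the same rows.
# Related inspection tools: the methods tail(), info(), and describe(),
# and the attributes shape and size (attributes take no parentheses).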
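Since a missing file is the most common failure at this step, the following is a minimal sketch of checking the path before loading. The helper name `load_salary_csv` and the three rows written to the temporary file are made up for illustration; they are not the contents of the real `Salary.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_salary_csv(csv_path):
    """Load a CSV into a DataFrame, failing early with a clear message if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path.resolve()}")
    return pd.read_csv(path)


# Demo with a tiny made-up file (placeholder values, not real salary data)
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "Salary.csv"
demo_file.write_text("YearsExperience,Salary\n1.0,40000\n2.0,45000\n3.0,50000\n")

demo = load_salary_csv(demo_file)
print(demo.head())
```

Passing a full path (absolute or relative) to the helper works the same way as passing a bare file name from the script's working directory.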

#### 3. Explore the Dataset

# Display basic information about the dataset
# (info() prints its report directly and returns None, so it is not wrapped in print)
dataset.info()

# Display summary statistics
print(dataset.describe())
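The other inspection tools can be tried the same way. This is a small sketch on a made-up three-row DataFrame named `demo` (an assumption standing in for the real dataset), showing that `shape` and `size` are attributes while `tail()` and `info()` are methods.

```python
import pandas as pd

# Made-up placeholder rows, not real salary data
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000, 45000, 50000],
})

print(demo.shape)    # (number of rows, number of columns)
print(demo.size)     # total number of cells: rows * columns
print(demo.tail(2))  # last two rows
demo.info()          # prints column names, dtypes, and non-null counts
```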

#### 4. Visualize the Data

Create a scatter plot to visualize the relationship between `YearsExperience` and `Salary`.

# Scatter plot of YearsExperience vs Salary
plt.scatter(dataset['YearsExperience'], dataset['Salary'], color='blue')

# Add title and labels
plt.title('Years of Experience vs Salary')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot
plt.show()

#### 5. Perform Regression Analysis

Let's perform a simple linear regression to predict Salary based on Years of Experience.
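To make the fitted line concrete, the sketch below computes the slope and intercept of a one-feature least-squares fit by hand. This is the same ordinary least squares problem that scikit-learn's `LinearRegression` solves, although the library uses its own numerical routine. The function name `fit_simple_line` and the four data points are made up for illustration and are not taken from `Salary.csv`.

```python
def fit_simple_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations of x and y divided by sum of squared deviations of x
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1
slope, intercept = fit_simple_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

On real salary data the points will not lie exactly on a line, so the fitted slope and intercept describe the best straight-line approximation rather than an exact relationship.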

# train_test_split splits a dataset into training and testing sets
from sklearn.model_selection import train_test_split

# LinearRegression fits an ordinary least squares linear model
from sklearn.linear_model import LinearRegression

# mean_squared_error and r2_score evaluate the performance of a regression model
from sklearn.metrics import mean_squared_error, r2_score

- Mean Squared Error (MSE): Measures the average squared difference between the actual and predicted values. Lower values are better.
- R-squared (R²) score: Represents the proportion of variance in the dependent variable that is predictable from the independent variable(s). Higher values (closer to 1) are better.
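The two metrics can be written out directly from their definitions. The following is a plain-Python sketch; the function names `mse_by_hand` and `r2_by_hand` and the sample values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def mse_by_hand(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_by_hand(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up actual and predicted values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

mse_demo = mse_by_hand(actual, predicted)  # (0.25 + 0 + 0.25 + 0) / 4
r2_demo = r2_by_hand(actual, predicted)    # 1 - 0.5 / 20
print(mse_demo, r2_demo)
```

A model that always predicts the mean of the actual values gets an R² of 0 under this definition, which is why values close to 1 indicate a better fit.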

# Define the features (X) and target (y)

X = dataset[['YearsExperience']]
y = dataset['Salary']

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The arguments passed to `train_test_split` are:

- `X`: The feature(s) of the dataset. In this case, it is `YearsExperience`.
- `y`: The target variable. In this case, it is `Salary`.
- `test_size=0.2`: 20% of the data is held out as the test set.
- `random_state=42`: Ensures reproducibility of the split. Using the same random state on the same data always produces the same split.
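The effect of `test_size` and `random_state` can be checked on a toy input. This sketch uses a made-up list of ten integers (the name `samples` is an assumption, not part of the lab) in place of the real dataset.

```python
from sklearn.model_selection import train_test_split

# Ten made-up sample values standing in for rows of a dataset
samples = list(range(10))

# Two calls with the same random_state on the same data
train_a, test_a = train_test_split(samples, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(samples, test_size=0.2, random_state=42)

print(len(train_a), len(test_a))  # 8 training samples and 2 test samples
print(test_a == test_b)           # the two splits are identical
```

Changing or omitting `random_state` generally yields a different shuffle, so evaluation numbers can vary from run to run unless it is fixed.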

# Create a Linear Regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('Mean Squared Error:', mse)
print('R-squared:', r2)

# Plot the regression line over all data points
plt.scatter(X, y, color='blue')
plt.plot(X, model.predict(X), color='red', linewidth=2)

# Add title and labels
plt.title('Years of Experience vs Salary (with Regression Line)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')

# Display the plot
plt.show()