Lab Manual
Learning Experiments
Experiment 1: Installing and Exploring Libraries
Problem Statement
To explore the features of the NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages, and to read data from a text file, an Excel file, and the web.
Aim
To become familiar with Python libraries essential for data analysis and scientific computation.
Algorithm
1. Install required libraries using pip install numpy scipy pandas statsmodels jupyter.
2. Explore features of each library with basic and advanced examples.
3. Read data from a text file using Pandas.
4. Read data from an Excel file using Pandas.
5. Fetch and analyze data from a web source using Pandas and Requests.
# Imports for the examples below
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import ttest_1samp
from scipy.optimize import minimize

# NumPy Examples
print("NumPy Examples:")
# Create a 1D array and compute basic statistics
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)
print("Mean of array:", np.mean(arr))
print("Standard Deviation:", np.std(arr))
# SciPy Examples
print("\nSciPy Examples:")
# Perform a t-test
print("T-test Example:")
t_stat, p_value = ttest_1samp(arr, 3)
print("T-statistic:", t_stat, "P-value:", p_value)
# Optimization Example
print("\nOptimization Example:")
result = minimize(lambda x: x[0]**2 + 5, x0=[0])  # objective must return a scalar, so index the 1-element vector x
print("Optimization Result:", result)
# Pandas Examples
print("\nPandas Examples:")
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)
print("Summary Statistics:")
print(df.describe())
# Statsmodels Example
print("\nStatsmodels Example:")
# Fit an ordinary least squares regression of Y on X (a constant column is added for the intercept)
X = sm.add_constant([4, 5, 6])
Y = [1, 2, 3]
model = sm.OLS(Y, X).fit()
print(model.summary())
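Steps 3-5 of the algorithm call for reading data from a text file, an Excel file, and the web. A minimal sketch is given below; the file names and the URL are placeholders and should be replaced with the actual data sources.
# Reading data from different sources (file names and URL are placeholders)
import pandas as pd
import requests
from io import StringIO

# Text file (tab-separated in this sketch)
df_text = pd.read_csv('data.txt', sep='\t')
print(df_text.head())

# Excel file (reading .xlsx files requires the openpyxl package)
df_excel = pd.read_excel('data.xlsx', sheet_name=0)
print(df_excel.head())

# Web source: download a CSV over HTTP and parse it with Pandas
url = 'https://example.com/data.csv'
response = requests.get(url)
df_web = pd.read_csv(StringIO(response.text))
print(df_web.head())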
Enhanced Output
Extensive examples of NumPy functionalities including matrix operations.
Statistical analysis using SciPy (e.g., t-tests, optimization).
Comprehensive use of Pandas for data manipulation and reading files.
Regression analysis and model summaries with Statsmodels.
Experiment 2: Descriptive Analytics on a Kaggle Dataset
Problem Statement
To download a dataset from Kaggle and explore various commands for descriptive analytics.
Aim
To perform descriptive analytics on a downloaded dataset using summary statistics, correlation analysis, and basic visualizations.
Algorithm
1. Download a dataset from Kaggle and load it into a Pandas DataFrame.
2. Inspect the dataset structure with info().
3. Compute summary statistics with describe().
4. Compute the correlation matrix for the numeric columns.
5. Visualize the results with a correlation heatmap and a histogram.
# Imports and dataset loading (the file name is a placeholder for the downloaded Kaggle dataset)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('dataset.csv')  # replace with the path to the downloaded Kaggle file

# Dataset Overview
print("Dataset Info:")
df.info()  # info() prints directly and returns None, so no extra print() is needed
# Summary Statistics
print("\nBasic Statistics:")
print(df.describe())
# Correlation Analysis (numeric columns only, since the dataset may also contain text columns)
print("\nCorrelation Matrix:")
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
# Visualization
print("\nVisualizations:")
# Heatmap
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Heatmap of Correlations")
plt.show()
# Histogram (assumes the dataset contains a numeric 'Age' column; substitute any numeric column)
sns.histplot(data=df, x='Age', kde=True, bins=30, color='purple')
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
Enhanced Output
Experiment 3: t-Hypothesis Testing Using Excel
Aim
To understand and conduct t-hypothesis testing using Excel’s built-in data analysis tools.
Algorithm
1. Enter the two samples (Sample A and Sample B) in adjacent columns of an Excel worksheet.
2. Enable the Analysis ToolPak add-in if it is not already active.
3. Choose Data > Data Analysis and select the appropriate t-Test tool.
4. Set the input ranges, the hypothesized mean difference, and the significance level (alpha).
5. Run the analysis and interpret the t-statistic and p-value in the output table.
Enhanced Explanation
The table below shows the example samples used for the test.
Sample A    Sample B
12          14
15          18
14          17
13          19
Key Outputs:
o t-statistic and p-value from Excel output.
o Decision based on significance level.
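For reference, the same test can be cross-checked in Python with SciPy using the sample values from the table above. This is a sketch only; equal_var=True mirrors the two-sample test assuming equal variances, so change it if a different Excel tool was used.
# Cross-check of the Excel t-test in Python (sketch using the sample data above)
from scipy.stats import ttest_ind

sample_a = [12, 15, 14, 13]
sample_b = [14, 18, 17, 19]

# equal_var=True corresponds to the two-sample test assuming equal variances
t_stat, p_value = ttest_ind(sample_a, sample_b, equal_var=True)
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Decision at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: the sample means differ significantly.")
else:
    print("Fail to reject the null hypothesis.")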
Experiment 4: Visualizing the IRIS Dataset
Problem Statement
To download the IRIS dataset and generate a box plot, scatter plot, and histogram.
Aim
To visualize the distribution and relationships between attributes of the IRIS dataset.
Algorithm
1. Load the IRIS dataset into a Pandas DataFrame with appropriate column names.
2. Generate a box plot of the four numeric features.
3. Generate a scatter plot of sepal length versus sepal width, colored by class.
4. Generate a histogram of sepal length, stacked by class.
# Imports for data handling and plotting
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('iris.data', header=None,
                 names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
# Box Plot
sns.boxplot(data=df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']])
plt.title("Box Plot of Iris Features")
plt.show()
# Scatter Plot
sns.scatterplot(data=df, x='sepal_length', y='sepal_width', hue='class', palette='deep')
plt.title("Scatter Plot of Sepal Length vs Width by Class")
plt.show()
# Histogram
sns.histplot(data=df, x='sepal_length', hue='class', multiple='stack', bins=20, palette='muted')
plt.title("Histogram of Sepal Length by Class")
plt.xlabel("Sepal Length")
plt.ylabel("Frequency")
plt.show()
Enhanced Output
Experiment 5: Decision Tree Classification on the Car Evaluation Dataset
Problem Statement
To train a decision tree classification model and generate a decision tree for the Car Evaluation dataset.
Aim
To build and visualize a decision tree for the classification of car evaluations.
Algorithm
1. Load the Car Evaluation dataset and assign column names.
2. Encode the categorical attributes and split the data into training and test sets.
3. Train a DecisionTreeClassifier on the training set.
4. Evaluate the model's accuracy on the test set.
5. Print the classification report and visualize the resulting tree.
# Imports for data handling, model training, and evaluation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load dataset
df = pd.read_csv('car.data', header=None)
df.columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'class']
# Encode the categorical attributes as integer codes so the tree can split on them
X = df.drop(columns=['class']).apply(lambda col: pd.factorize(col)[0])
y = df['class']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
clf = DecisionTreeClassifier(random_state=42, max_depth=5)
clf.fit(X_train, y_train)
# Evaluate model
accuracy = clf.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")
# Classification report
y_pred = clf.predict(X_test)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
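The problem statement also asks for the decision tree itself. A minimal sketch using scikit-learn's plot_tree is shown below; it assumes the clf model and the encoded feature matrix X from the code above.
# Visualize the trained decision tree (uses clf and X from the code above)
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(20, 10))
plot_tree(clf, feature_names=list(X.columns), class_names=list(clf.classes_), filled=True)
plt.title("Decision Tree for the Car Evaluation Dataset")
plt.show()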
Enhanced Output