The document outlines a series of laboratory exercises for a Data Engineering & Analytics program, focusing on data analysis using Python libraries such as Pandas and NumPy. It includes tasks related to data manipulation, statistical evaluation, data representation, and time-series analysis. Additionally, it provides links to datasets for practical applications and emphasizes data munging, aggregation, and visualization techniques.

Data Engineering & Analytics (ELGE-1B)

GE-1 (Electronics)
PROGRAM LIST

LAB 1-2: Basic Data Analysis using the PANDAS Python library

Write programs to perform the following operations on the given Student datasets using the Pandas library:

StudentDataset1

StudentDataset2

1. Create a DataFrame from the given dataset using the DataFrame( ) function in pandas
2. Read data (a chosen number of rows and columns) from a DataFrame using the row locators loc[ ] and iloc[ ]
3. Add a name to each row using ‘index’ and access the data of a row by its row index name
4. Access the data of a student by specifying the ‘Student Name’
5. Read student data from .xls/.xlsx, .csv and .json files
6. Determine the shape (rows, columns) of the dataset using the shape attribute
7. Determine the size (no. of elements) of the given dataset using the size attribute
8. Display the first few/last few rows of a dataframe using the head/tail functions
9. Display information about the type of data in a dataframe using the info function
10. List the names of all the columns in a dataset
11. Insert a new column into an existing dataframe with and without the insert function
12. Insert a new row into an existing dataframe using loc
13. Concatenate two dataframes using the concat function
14. Evaluate the percentage and display it in a separate column
15. Evaluate the following statistical parameters using the describe function: Count, Mean, Min, Max, Standard Deviation, 25th Percentile, 50th Percentile, 75th Percentile
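
The steps in this list can be sketched roughly as follows. The student records, the column names other than 'Student Name', the row labels and the file names in the comments are all made up for illustration, since StudentDataset1 and StudentDataset2 are not reproduced here; treat it as one possible approach rather than the expected solution.

```python
import pandas as pd

# Hypothetical records standing in for the lab's student datasets.
data = {
    "Student Name": ["Asha", "Bilal", "Chen"],
    "Maths": [78, 65, 90],
    "Physics": [82, 70, 88],
}
df = pd.DataFrame(data)

# Files would be read with pd.read_csv("students.csv"), pd.read_excel("students.xlsx")
# or pd.read_json("students.json"); these file names are placeholders.

# Name each row, then read by label (loc) or by position (iloc).
df.index = ["s1", "s2", "s3"]
row_by_label = df.loc["s2"]
top_left_block = df.iloc[0:2, 0:2]

# Look up one student by the value in the 'Student Name' column.
chen = df[df["Student Name"] == "Chen"]

# shape and size are attributes, so they take no parentheses.
n_rows, n_cols = df.shape
n_elements = df.size

print(df.head(2))
print(df.tail(1))
df.info()
print(list(df.columns))

# New column via insert() at position 1, and another via plain assignment.
df.insert(1, "Roll No", [101, 102, 103])
df["Chemistry"] = [74, 68, 95]

# New row: assign to a label that does not exist yet.
df.loc["s4"] = ["Dev", 104, 55, 60, 58]

# Concatenate with a second frame that has the same columns.
df2 = pd.DataFrame(
    {"Student Name": ["Esha"], "Roll No": [105], "Maths": [81],
     "Physics": [77], "Chemistry": [69]},
    index=["s5"],
)
combined = pd.concat([df, df2])

# Percentage, assuming every subject is marked out of 100.
subjects = ["Maths", "Physics", "Chemistry"]
combined["Percentage"] = combined[subjects].sum(axis=1) * 100 / (len(subjects) * 100)

# Count, mean, std, min, quartiles and max for the numeric columns.
print(combined.describe())
```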

LAB 3 - 4: Data Representation using Vectors and Matrices using NUMPY Library

1. Write a program to perform the following functions

• Create matrices A and B as shown in the attachment using the array() function in numpy
• Add, subtract and multiply A and B
• Find the determinant of A and B
2. Determine the Rank and Nullity of matrices 1 to 5

3. Consider the following two vectors

u = (0.5, 0.4, 0.4, 0.5, 0.1, 0.4, 0.1) and v = (-1,-2, 1,-2, 3, 1,-5)
i. Check if u or v is a unit vector.
ii. Calculate the dot product, <u, v>
iii. Are u and v orthogonal?
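
One way to check these three points with NumPy is sketched below. It uses the vectors u and v exactly as given above; the tolerance-based comparison with np.isclose is a choice made here because the components of u are decimals.

```python
import numpy as np

u = np.array([0.5, 0.4, 0.4, 0.5, 0.1, 0.4, 0.1])
v = np.array([-1, -2, 1, -2, 3, 1, -5])

# A unit vector has Euclidean norm 1.
u_is_unit = bool(np.isclose(np.linalg.norm(u), 1.0))
v_is_unit = bool(np.isclose(np.linalg.norm(v), 1.0))

# Dot product <u, v>.
dot_uv = float(np.dot(u, v))

# Two vectors are orthogonal when their dot product is zero.
orthogonal = bool(np.isclose(dot_uv, 0.0))

print(u_is_unit, v_is_unit, dot_uv, orthogonal)
```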

4. Consider the following three vectors


v = (1, 2, 5, 2, -3, 1, 2, 6, 2)
u = (-4, 3, -2, 2, 1, -3, 4, 1, -2)
w = (3, 3, 3, -1, 6, -1, 2, -5, -7)

i. Evaluate the dot product <v, w>
ii. Is any pair of vectors orthogonal, and if so, which?
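
A short sketch for this exercise: it computes the requested dot product and then tests every pair of the three vectors for orthogonality. The dictionary and the use of itertools.combinations are conveniences chosen here, not something the exercise prescribes.

```python
from itertools import combinations

import numpy as np

vectors = {
    "v": np.array([1, 2, 5, 2, -3, 1, 2, 6, 2]),
    "u": np.array([-4, 3, -2, 2, 1, -3, 4, 1, -2]),
    "w": np.array([3, 3, 3, -1, 6, -1, 2, -5, -7]),
}

# Dot product <v, w>.
dot_vw = int(np.dot(vectors["v"], vectors["w"]))

# Integer vectors, so an exact comparison with zero is safe.
orthogonal_pairs = [
    (a, b)
    for a, b in combinations(vectors, 2)
    if np.dot(vectors[a], vectors[b]) == 0
]

print(dot_vw)
print(orthogonal_pairs)
```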

5. Consider the following three matrices A,B and C

Evaluate the following:

i. AᵀB
ii. C + B
iii. Which matrices are full rank?
iv. B⁻¹
*Full-rank matrices are those for which Rank = No. of columns = No. of rows
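
The matrix operations in this lab (sum, difference, product, determinant, rank and nullity, transpose product, inverse) might be sketched as below. The matrices A, B and C used here are hypothetical 3x3 examples, because the ones from the attachment are not included in this text; nullity is taken as the number of columns minus the rank.

```python
import numpy as np

# Hypothetical matrices; replace with the ones given in the lab attachment.
A = np.array([[2, 0, 1], [1, 3, 2], [1, 1, 1]])
B = np.array([[1, 2, 0], [0, 1, 3], [4, 0, 1]])
C = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])

total = A + B
difference = A - B
product = A @ B  # matrix product; A * B would multiply element by element

det_A = np.linalg.det(A)
det_B = np.linalg.det(B)


def rank_and_nullity(M):
    """Return (rank, nullity), with nullity = number of columns - rank."""
    rank = int(np.linalg.matrix_rank(M))
    return rank, M.shape[1] - rank


At_B = A.T @ B      # transpose of A times B
C_plus_B = C + B

# Full rank in the sense of the footnote: rank = rows = columns.
full_rank = {
    name: rank_and_nullity(M)[0] == M.shape[0] == M.shape[1]
    for name, M in {"A": A, "B": B, "C": C}.items()
}

# The inverse exists only for a full-rank (non-singular) square matrix.
B_inv = np.linalg.inv(B)

print(rank_and_nullity(A), rank_and_nullity(B), full_rank)
```

With these example values A turns out to be rank-deficient, so calling np.linalg.inv(A) would either raise LinAlgError or return a numerically meaningless result; checking the rank first avoids that.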

LAB 5: Data Representation using Vectors and Matrices using NUMPY Library

1. Create the dataframe shown below and perform the following operations using the PANDAS library

i. Display the column names and the number of records.
ii. Display the first 4 records of the dataset.
iii. For each numeric attribute, evaluate various statistical parameters using the describe() function.
iv. Check for missing values in the dataset and replace them with some valid numeric value.
v. Find and remove duplicate records (if any) in the dataset.
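
A minimal sketch of these five steps follows. The dataframe is invented for illustration (the one the exercise refers to is not included in this text), and filling gaps with the column mean is only one of several reasonable choices for a "valid numeric value".

```python
import numpy as np
import pandas as pd

# Hypothetical data with deliberate gaps and one repeated record.
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cara", "Ben", "Dan"],
    "Age": [23, 31, np.nan, 31, 27],
    "Score": [88.0, np.nan, 79.0, np.nan, 92.0],
})

# i. Column names and number of records.
print(list(df.columns), len(df))

# ii. First 4 records.
print(df.head(4))

# iii. Summary statistics for the numeric columns.
print(df.describe())

# iv. Count missing values, then fill numeric gaps with the column mean.
missing_per_column = df.isna().sum()
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# v. Count and drop fully duplicated rows.
n_duplicates = int(df.duplicated().sum())
df = df.drop_duplicates().reset_index(drop=True)

print(missing_per_column, n_duplicates, len(df))
```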

2. Download Pima Indians Diabetes Dataset using the link given below:

https://www.kaggle.com/datasets/kumargh/pimaindiansdiabetescsv

Perform the following operations:

i. Display the column names and the number of records.
ii. Display the first 10 records of the dataset.
iii. For each numeric attribute, evaluate various statistical parameters using the describe() function.
iv. Check for missing values in the dataset and replace them with some valid numeric value.
v. Find and remove duplicate records (if any) in the dataset.
vi. Show a scatter plot depicting the relationship between two numeric columns of your choice.
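
For the scatter plot in step vi, something along these lines could work. The column names 'Glucose' and 'BMI' and the values are assumptions made for this sketch; the actual column names depend on the downloaded CSV and should be checked after loading it.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; with the real data this frame would come from pd.read_csv(...).
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
})

fig, ax = plt.subplots()
ax.scatter(df["Glucose"], df["BMI"])
ax.set_xlabel("Glucose")
ax.set_ylabel("BMI")
ax.set_title("Glucose vs BMI")
```

In a notebook or interactive session, drop the matplotlib.use("Agg") line and call plt.show(); fig.savefig with a file name of your choice writes the figure to disk instead.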

LAB 6: Data Munging, Data Aggregation and Grouping Operations (using the PANDAS, MATPLOTLIB and SEABORN libraries)

1. Perform the following operations on the given dataset CaloriesDataSet.csv

• Load the dataset
• Implement descriptive and summary statistics (Count, Mean, Max, Min, Percentile, Variance and Standard Deviation)
• Perform the following DATA MUNGING and DATA CLEANING operations (using PANDAS):
  o Check for missing values in the dataset and replace them with some valid numeric value
  o Find and remove duplicate records (if any) in the dataset
  o Determine the correlation matrix for the different columns (attributes) of the dataset
• Perform the following DATA VISUALISATION operations (using the Matplotlib and Seaborn libraries):
  o Plot a histogram, bar plot and distplot for various features (attributes) of the dataset
  o Plot a heatmap of the correlation between different attributes in the dataset
• Perform the following DATA AGGREGATION and GROUPING operations:
  o Use agg() (the aggregate function) to calculate the sum, min and max of each column
  o Group the dataset by the ‘Duration’ column:
    ▪ display the number (count) of values for each ‘Duration’
    ▪ display the sum of all the values for each ‘Duration’
    ▪ perform various data aggregation functions
    ▪ perform various data aggregation functions for a particular column (attribute)
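
The pandas part of this exercise (summary statistics, cleaning, correlation, aggregation and grouping) could look roughly like the sketch below; plotting is left out. Only the 'Duration' column is named in the exercise, so 'Pulse' and 'Calories' and all the values are assumptions standing in for CaloriesDataSet.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("CaloriesDataSet.csv").
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45, 30, 60],
    "Pulse": [110, 117, 109, 103, 98, 110],
    "Calories": [409.1, 479.0, 282.4, np.nan, 195.1, 409.1],
})

# Descriptive statistics, including variance and percentiles.
summary = df.agg(["count", "mean", "max", "min", "var", "std"])
quartiles = df.quantile([0.25, 0.5, 0.75])

# Cleaning: fill the missing value with the column mean, drop repeated rows.
df["Calories"] = df["Calories"].fillna(df["Calories"].mean())
df = df.drop_duplicates()

# Correlation matrix between the numeric columns.
corr = df.corr()

# Aggregation over whole columns.
column_totals = df.agg(["sum", "min", "max"])

# Grouping by 'Duration'.
by_duration = df.groupby("Duration")
counts = by_duration.count()                      # values per Duration
sums = by_duration.sum()                          # totals per Duration
stats = by_duration.agg(["mean", "min", "max"])   # several aggregations at once
calories_stats = by_duration["Calories"].agg(["mean", "sum", "count"])

print(summary, quartiles, corr, column_totals, sep="\n\n")
print(counts, sums, stats, calories_stats, sep="\n\n")
```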

2. Download Pima Indians Diabetes Dataset using the link given below:
https://www.kaggle.com/datasets/kumargh/pimaindiansdiabetescsv
Perform similar Data Munging and Data Aggregation operations on the Pima Indians Diabetes
Dataset as performed on CaloriesDataSet.csv

3. Download Boston House Prices Dataset using the link given below:
https://www.kaggle.com/datasets/vikrishnan/boston-house-prices
Perform similar Data Munging and Data Aggregation operations on the Boston House Prices
Dataset as performed on CaloriesDataSet.csv in Q.1

LAB 7: TIME-SERIES DATA ANALYSIS

Analysis of Time-dependent data to predict Future Trends from Past Values
STATSMODELS PYTHON LIBRARY

1. Perform the following operations on the given Time-series dataset AirPassengers.csv

• Read the time-series data into a Pandas dataframe
• Plot the time-series data using the plot() function
• ETS Decomposition: MULTIPLICATIVE MODEL
  o Extract the 'TREND' component
  o Extract the 'SEASONAL' component
  o Extract the 'RESIDUAL' component
• ETS Decomposition: ADDITIVE MODEL
  o Extract the 'TREND' component
  o Extract the 'SEASONAL' component
  o Extract the 'RESIDUAL' component

2. Download SuperStore-SalesDataSet using the link given below:


https://www.kaggle.com/datasets/rohitsahoo/sales-forecasting
Perform similar operations on the dataset as performed on AirPassengers.csv dataset
