BDA Lab 9 Manual
Objective: This lab gives students a working understanding of, and hands-on experience in, the
following areas:
Dataset Selection and Preparation: Identify and prepare datasets that are relevant to your
chosen use case.
Implementation of Data Analysis Techniques: Apply at least two different techniques, such
as association rule mining and other methods studied in class, to analyze the data
comprehensively.
Interpretation and Application: Use the results of your data analysis to address specific
research questions or business problems. Interpret the data to provide actionable insights.
Deliverables: Submit a single file on LMS before the due date communicated by the Lab Engineer.
Note: Please ensure the work is your own, add screenshots of each step/activity, and submit a
single lab report in Word or PDF format.
Tasks:
Task 1 - Business Context and Background:
Students are required to select a dataset of their choice from platforms such as Kaggle.com and
Data.world. The dataset should contain more than 10 columns with a mix of quantitative and
qualitative variables and should consist of at least 1000 rows of data.
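The size and variable-mix criteria above can be checked before committing to a dataset. The following is a minimal sketch, assuming the dataset has been loaded into a pandas DataFrame; the function name check_dataset is hypothetical, and treating numeric dtypes as quantitative and all other dtypes as qualitative is a simplifying assumption (numerically coded categories would be counted as quantitative).

```python
import pandas as pd


def check_dataset(df: pd.DataFrame) -> dict:
    """Check a DataFrame against the Task 1 size and variable-mix criteria.

    Hypothetical helper for illustration only.
    """
    n_rows, n_cols = df.shape
    # Assumption: numeric dtypes are quantitative, everything else is qualitative.
    quantitative = df.select_dtypes(include="number").columns
    qualitative = df.select_dtypes(exclude="number").columns
    return {
        "rows_ok": n_rows >= 1000,   # at least 1000 rows
        "columns_ok": n_cols > 10,   # more than 10 columns
        "mixed_types": len(quantitative) > 0 and len(qualitative) > 0,
    }
```

A dataset passes when all three values in the returned dictionary are True. The check is only a screening step and does not replace reading the dataset's documentation.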
Provide an overview of the business context related to the chosen dataset and discuss the relevance
of the dataset to real-world applications. Describe any industry-specific or domain-specific
insights that can be derived from the dataset.
Task 2 - Data Preparation:
Detail the steps taken for data cleaning, including handling missing values, outliers, and duplicates.
Discuss any data transformation techniques applied, such as normalization, standardization, or
encoding categorical variables.
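The cleaning and transformation steps listed above could be sketched as follows. This is one possible pipeline under stated assumptions, not a prescribed method: it assumes pandas, median and mode imputation for missing values, clipping at 1.5 times the interquartile range for outliers, z-score standardization, and one-hot encoding. The function name prepare is hypothetical, and each choice should be justified against your own dataset.

```python
import pandas as pd


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pipeline (hypothetical helper, not a required method)."""
    # Duplicates: drop exact duplicate rows.
    df = df.drop_duplicates().copy()

    num_cols = df.select_dtypes(include="number").columns
    cat_cols = df.select_dtypes(exclude="number").columns

    for col in num_cols:
        s = df[col].astype(float)
        # Missing values: fill numeric gaps with the column median (assumption).
        s = s.fillna(s.median())
        # Outliers: clip values outside the 1.5 * IQR fences (assumption).
        q1, q3 = s.quantile([0.25, 0.75])
        iqr = q3 - q1
        s = s.clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        # Standardization: z-score, skipped for constant columns.
        std = s.std()
        if std > 0:
            s = (s - s.mean()) / std
        df[col] = s

    for col in cat_cols:
        # Missing values: fill categorical gaps with the most frequent value.
        mode = df[col].mode()
        if not mode.empty:
            df[col] = df[col].fillna(mode.iloc[0])

    # Encoding: one-hot encode the categorical columns.
    return pd.get_dummies(df, columns=list(cat_cols))
```

In the report, record how many rows or values each step changed, since the task asks for the steps to be detailed rather than only applied.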
Task 3 - Data Descriptives and Correlation Analysis:
Create a full descriptive analysis of the chosen dataset, with visuals and basic statistics.
Execute correlation analysis using correlation matrices to identify relationships between variables.
Create scatter plots to visualize the relationships between selected variables.
Report on the relationships between variables as they relate to the respective research questions.
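The correlation step above can be illustrated with a short sketch. It assumes pandas and the Pearson coefficient (the default of DataFrame.corr); the function name strongest_pairs is hypothetical. It returns the full correlation matrix together with the variable pairs ranked by absolute correlation, which can help decide which pairs deserve a scatter plot.

```python
import pandas as pd


def strongest_pairs(df: pd.DataFrame, top: int = 5):
    """Return the Pearson correlation matrix and the strongest variable pairs.

    Hypothetical helper for illustration only.
    """
    corr = df.select_dtypes(include="number").corr()  # Pearson by default
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle so that each pair appears once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if pd.notna(r):
                pairs.append((cols[i], cols[j], float(r)))
    # Rank by strength of the relationship, regardless of sign.
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return corr, pairs[:top]
```

Basic statistics are available through df.describe(), and a scatter plot for a selected pair can be drawn with df.plot.scatter(x=..., y=...), assuming matplotlib is installed. Note that Pearson correlation captures only linear relationships, so a low coefficient does not rule out a nonlinear one.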
Task 4 - Research Questions:
Propose at least three research questions (RQs) that test relationships between variables in the
dataset.
Task 6 - Interpretation of Rules:
Interpret the results, discussing the significance of coefficients, goodness-of-fit measures, and any
other relevant metrics.
Critically assess whether the applied technique provided insights that contribute to solving the use
case or answering the research questions.
Discuss any limitations of the technique in the context of the specific dataset and use case.
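Assuming the rules in this task are association rules, as named in the lab objectives, the metrics usually interpreted are support, confidence, and lift. The sketch below computes them for a single rule in plain Python; the function name rule_metrics and the toy transactions are made up for illustration and are not part of any dataset.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, and lift for the rule antecedent -> consequent.

    Hypothetical helper for illustration only.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))

    support = n_both / n                        # P(A and C)
    confidence = n_both / n_a if n_a else 0.0   # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy, made-up transactions used only to show the calculation.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
# → {'support': 0.6, 'confidence': 0.75, 'lift': 0.9375}
```

Here the rule has high confidence (0.75), but the lift is below 1 because milk already appears in 80% of the baskets. This kind of case is worth discussing when assessing whether a rule adds insight beyond the base rates.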