
BDA Lab 9 Manual

This document outlines an open-ended lab for a big data analytics class involving selecting a real-world dataset, preparing and analyzing the data using techniques like association rule mining, and interpreting the results to gain insights.

Department of Computing

CS-404 Big Data Analytics


Class: BESE-11 & BSCS-11
Spring 2024
Lab Manual 09: Open Ended Lab
Date: 16-04-2024
Time: 10:00-12:50 & 14:00-16:50
Instructor: Dr. Syed Imran Ali & Dr. Muhammad Daud Abdullah Asif
Lab Engineer: Engr. Masabah Bint E Islam
Lab 09: Open Ended Lab
Aim: This lab deepens students' understanding of big data analysis techniques through a
comprehensive open-ended exercise that applies these methods to real-world datasets. Using
Python as the primary tool for coding and computation, the lab focuses on the practical
implementation and integration of different data analysis techniques, including association rule
mining and network analysis. Students will explore how to select appropriate datasets, apply
multiple analysis techniques, and interpret their results to gain insight into complex data
structures. This hands-on experience aims to equip participants with the skills needed for
advanced data science tasks such as predictive modeling, customer segmentation, and market
basket analysis.

Objective: The objectives of this lab are to provide students with a deep understanding and hands-
on experience in the following areas:

• Dataset Selection and Preparation: Identify and prepare datasets that are relevant to your
chosen use case.

• Implementation of Data Analysis Techniques: Apply at least two different techniques, such
as association rule mining and other methods studied in class, to analyze the data
comprehensively.

• Interpretation and Application: Use the results of your data analysis to address specific
research questions or business problems. Interpret the data to provide actionable insights.

Tools and Techniques: Python, Hadoop, RapidMiner, etc.

Deliverables: Submit a single file on LMS before the due date communicated by the Lab Engineer.

Note: Ensure that the work is your own, include screenshots of each step/activity, and submit the
lab report as a Word or PDF file.

Tasks:
Task 1: Business Context and Background:
• Students are required to select a dataset of their choice from platforms such as Kaggle.com and
Data.world. The dataset should contain more than 10 columns with a mix of quantitative and
qualitative variables and should consist of at least 1000 rows of data.
• Provide an overview of the business context related to the chosen dataset and discuss the relevance
of the dataset to real-world applications. Describe any industry or domain-specific insights that can
be derived from the dataset.
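
The size and variable-mix requirements above can be checked before committing to a dataset. The following is a minimal, dependency-free sketch; the function name `check_dataset` and the rule "a column is quantitative if every non-empty value parses as a number" are assumptions made for illustration, not part of the lab specification.

```python
import csv
import io


def check_dataset(csv_text, min_rows=1000, min_cols=11):
    """Check CSV text against the lab's size and variable-mix requirements.

    min_cols=11 encodes "more than 10 columns"; min_rows=1000 encodes
    "at least 1000 rows".
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    quantitative, qualitative = [], []
    for col in columns:
        # Ignore missing cells when deciding the column type.
        values = [r[col] for r in rows if r[col] not in ("", None)]
        # Assumption: a column is quantitative only if all values are numeric.
        if values and all(is_number(v) for v in values):
            quantitative.append(col)
        else:
            qualitative.append(col)

    return {
        "rows": len(rows),
        "columns": len(columns),
        "quantitative": quantitative,
        "qualitative": qualitative,
        "meets_requirements": (
            len(rows) >= min_rows
            and len(columns) >= min_cols
            and bool(quantitative)
            and bool(qualitative)
        ),
    }
```

Numeric codes that stand for categories (for example, a region ID) would be misclassified as quantitative by this rule, so the reported split should be reviewed by hand.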
Task 2: Data Preparation:
• Detail the steps taken for data cleaning, including handling missing values, outliers, and duplicates.
• Discuss any data transformation techniques applied, such as normalization, standardization, or
encoding categorical variables.
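
The cleaning and transformation steps listed above could look roughly like the sketch below. It uses only the Python standard library and represents each record as a dictionary; all function names are hypothetical, and median imputation and the 1.5 × IQR outlier rule are assumed choices, not requirements of the lab. In practice a library such as pandas would normally be used for the same steps.

```python
from statistics import mean, median, pstdev, quantiles


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, col):
    """Replace missing (None) values in a numeric column with the column median."""
    fill = median(r[col] for r in rows if r[col] is not None)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


def remove_iqr_outliers(rows, col, k=1.5):
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[col] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[col] <= high]


def standardize(rows, col):
    """Rescale a numeric column to z-scores (assumes the column is not constant)."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, col: (r[col] - mu) / sigma} for r in rows]


def one_hot(rows, col):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[col] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != col}
        for c in categories:
            new[f"{col}_{c}"] = int(r[col] == c)
        encoded.append(new)
    return encoded
```

The order matters: duplicates are removed and missing values filled before outliers are detected, and standardization comes last so that the mean and standard deviation are not distorted by the values that were removed.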
Task 3: Data Descriptives and Correlation Analysis:
• Produce a descriptive analysis of the chosen dataset, with visualizations and basic summary statistics.
• Execute correlation analysis using correlation matrices to identify relationships between variables.
Create scatter plots to visualize the relationships between selected variables.
• Report on the relationships between variables as they relate to the respective research questions.
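
As an illustration of the correlation step, the sketch below computes a Pearson correlation matrix from plain Python lists and then lists the variable pairs worth plotting. The function names and the 0.7 threshold are assumptions chosen for the example; Pearson correlation is one common choice and only captures linear relationships between numeric variables.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Build a nested dict matrix[a][b] from a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def strongest_pairs(matrix, threshold=0.7):
    """List (a, b, r) for each distinct pair with |r| >= threshold, strongest first."""
    names = list(matrix)
    pairs = [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(matrix[a][b]) >= threshold
    ]
    return sorted(pairs, key=lambda p: -abs(p[2]))
```

The pairs returned by `strongest_pairs` are natural candidates for the scatter plots, since a plot shows whether a high coefficient reflects a genuine trend or is driven by a few extreme points.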
Task 4: Research Questions:
• Propose at least three research questions (RQs) that aim to test the relationship between variables in
the dataset.

Task 5: Applying Big Data Analysis (Technique 1):
• Select variables that fit the big data technique requirements, focusing on appropriate data types like
categorical or discretised numerical data.
• Enhance data quality for analysis by removing outliers and performing necessary cleaning.
• Prepare the data through preprocessing steps like binning continuous variables.
• Convert the data into an analysis-ready format, such as creating transactional datasets.
• Apply a relevant algorithm to extract patterns or itemsets, set meaningful thresholds, and use
visualization tools like network graphs to effectively represent and highlight key findings.
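
Assuming association rule mining is the chosen technique, the binning, transaction-building, and pattern-extraction steps above can be sketched as follows. This is a small, unoptimized Apriori written with the standard library for illustration only; the function names, the `column=value` item format, and the bin edges passed in are all assumptions, and an established implementation would be preferable on a dataset of realistic size.

```python
from itertools import combinations


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge is >= value.

    labels must have one more entry than edges; the last label is the
    open-ended top bin.
    """
    for edge, label in zip(edges, labels):
        if value <= edge:
            return label
    return labels[-1]


def to_transactions(rows, bins):
    """Turn each record into a set of 'column=value' items, binning numeric columns."""
    transactions = []
    for row in rows:
        items = set()
        for col, value in row.items():
            if col in bins:
                edges, labels = bins[col]
                value = bin_value(value, edges, labels)
            items.add(f"{col}={value}")
        transactions.append(frozenset(items))
    return transactions


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The minimum support threshold controls how many itemsets survive: a low value produces many rare patterns that are hard to interpret, while a high value may leave only obvious ones, so it is worth trying a few values and reporting the one used.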

Task 6: Interpretation of Rules:
• Interpret the results, discussing the significance of coefficients, goodness-of-fit measures, and any
other relevant metrics.
• Critically assess whether the applied technique provided insights that contribute to solving the use
case or answering the research questions.
• Discuss any limitations of the technique in the context of the specific dataset and use case.
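
If the technique applied was association rule mining, the "relevant metrics" for a rule are usually support, confidence, and lift. The sketch below computes all three for a single rule from a list of transactions; the function name `rule_metrics` is hypothetical, and the code assumes the antecedent occurs in at least one transaction and the consequent in at least one.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, and lift for the rule antecedent -> consequent.

    transactions is a list of sets of items; antecedent and consequent are
    iterables of items.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= t for t in transactions)
    count_c = sum(consequent <= t for t in transactions)
    count_both = sum((antecedent | consequent) <= t for t in transactions)

    support = count_both / n              # share of transactions containing both sides
    confidence = count_both / count_a     # estimate of P(consequent | antecedent)
    lift = confidence / (count_c / n)     # confidence relative to the consequent's base rate
    return {"support": support, "confidence": confidence, "lift": lift}
```

A lift above 1 suggests the antecedent makes the consequent more likely than its base rate, a lift near 1 suggests independence, and a lift below 1 suggests a negative association. A rule can therefore have high confidence and still be uninformative when the consequent is common throughout the dataset.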
