Big Data Analytics
Module 1
1. Define Big data. Explain the classification of big data (page no 55 & 69 to 70)
Big Data is high-volume, high-velocity and high-variety information assets that demand new, cost-effective forms
of processing for enhanced decision making, insight discovery and process optimization.
Big data is commonly classified by structure into structured, semi-structured, multi-structured and unstructured data.
2. Explain the functions of each layer in big data architecture design with a diagram. (92 to 100)
Big Data architecture: Big Data architecture is the logical or physical layout/structure of how Big
Data will be stored, accessed and managed within an IT environment.
Data processing architecture consists of five layers:
(i) identification of data sources,
(ii) acquisition, ingestion, extraction, pre-processing, transformation of data,
(iii) data storage in files, servers, clusters or cloud,
(iv) data processing, and
(v) data consumption by a number of programs and tools such as business intelligence, data
mining, discovering patterns/clusters, artificial intelligence (AI), machine learning (ML),
text analytics, descriptive and predictive analytics, and data visualization.
Layer 1:
L1 considers the following aspects in a design:
◦ Amount of data needed at ingestion layer 2 (L2)
◦ Push from L1 or pull by L2, as per the mechanism required for the usage
◦ Source data types: database, files, web or service
◦ Source formats, i.e., structured, semi-structured or unstructured.
Layer 2:
Ingestion and ETL processes run either in real time, which means the data is stored and used as it is
generated, or in batches.
Batch processing uses discrete datasets at scheduled or periodic intervals of time.
Layer 3:
L3 considers data storage type (historical or incremental), format, compression, incoming data
frequency, querying patterns and consumption requirements for L4 or L5.
Data storage uses the Hadoop distributed file system (HDFS) or NoSQL data stores such as HBase,
Cassandra or MongoDB.
Layer 4:
Data-processing software such as MapReduce, Hive, Pig, Spark, Mahout and Spark Streaming
Processing in scheduled batches or real time or hybrid
Processing as per synchronous or asynchronous processing requirements at L5.
Layer 5:
Data integration
Dataset usage for reporting and visualization
Analytics (real time, near real time, scheduled batches), business processes (BPs), business intelligence (BI), knowledge discovery
Export of datasets to cloud, web or other systems
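To make the data flow across the five layers concrete, here is a minimal, illustrative Java sketch; the class and method names (FiveLayerPipeline, identifySource, ingestAndTransform, store, process) are invented for this example and only model how data moves from source to consumption, not any actual framework API.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Conceptual model of the five-layer Big Data architecture (names are illustrative only).
public class FiveLayerPipeline {

    // L1: identification of data sources (an in-memory list stands in for files/web/services).
    static List<String> identifySource() {
        return List.of("sensor,25", "sensor,30", "clickstream,home", "sensor,28");
    }

    // L2: acquisition/ingestion with simple pre-processing and transformation (trim and split).
    static List<String[]> ingestAndTransform(List<String> raw) {
        return raw.stream()
                  .map(line -> line.trim().split(","))
                  .collect(Collectors.toList());
    }

    // L3: data storage (a map keyed by source stands in for HDFS/NoSQL storage).
    static Map<String, List<String[]>> store(List<String[]> records) {
        return records.stream().collect(Collectors.groupingBy(r -> r[0]));
    }

    // L4: data processing (counting records per source, as MapReduce/Spark would at scale).
    static Map<String, Long> process(Map<String, List<String[]>> stored) {
        return stored.entrySet().stream()
                     .collect(Collectors.toMap(Map.Entry::getKey, e -> (long) e.getValue().size()));
    }

    // L5: data consumption (reporting/visualization/export).
    public static void main(String[] args) {
        Map<String, Long> report = process(store(ingestAndTransform(identifySource())));
        report.forEach((source, count) -> System.out.println(source + " -> " + count + " records"));
    }
}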
3. Define scalability and its types with an example (72 to 79) OR write a short note on analytics
scalability to big data and massively parallel processing platforms.
Scalability
Scalability enables increasing or decreasing the capacity of data storage, processing and analytics.
Scalability is the capability of a system to handle the workload as per the magnitude of the work;
system capability must grow as the workload increases.
Scalability is of two types: vertical scalability (scaling up a single system by adding more CPUs,
memory or disks) and horizontal scalability (scaling out by adding more systems that process the
workload in parallel, as on massively parallel processing (MPP) platforms).
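As an example of horizontal scaling, the Java sketch below partitions one fixed workload (summing a range of numbers, a stand-in chosen only for illustration) across an increasing number of worker threads, mimicking how a massively parallel processing platform spreads work over more nodes; the worker counts 1, 2, 4, 8 are arbitrary.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

// Illustrative only: the same total work is split across more workers, so per-worker load drops.
public class ScaleOutDemo {

    // The "work": sum the numbers in [start, end).
    static long sumRange(long start, long end) {
        return LongStream.range(start, end).sum();
    }

    static long runWithWorkers(long total, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        long chunk = total / workers;
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            long start = i * chunk;
            long end = (i == workers - 1) ? total : start + chunk; // last worker takes the remainder
            Callable<Long> task = () -> sumRange(start, end);
            parts.add(pool.submit(task));
        }
        long result = 0;
        for (Future<Long> f : parts) result += f.get(); // combine partial results
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        long total = 100_000_000L;
        for (int workers : new int[]{1, 2, 4, 8}) {
            long t0 = System.nanoTime();
            long sum = runWithWorkers(total, workers);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(workers + " worker(s): sum=" + sum + ", time=" + ms + " ms");
        }
    }
}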
4. Define data preprocessing. Explain in brief the needs of preprocessing (116 to 120)
5. Discuss the evolution of big data and characteristics of big data (28 to 30 & 56 to 57)
Module 2
1. With a neat diagram, explain Hadoop main components and ecosystem components (28 to 29 & 35
to 38)
OR
What are the core components of Hadoop? Explain each of them in brief.
Hadoop Ecosystem Components
2. Brief out the features of Hadoop HDFS? Also explain the functions of name node and data node
(44 & 49 to 52)
Hadoop HDFS features are as follows:
(i) Create, append, delete, rename, and attribute modification functions.
(ii) Content of an individual file cannot be modified or replaced, but new data can be appended at
the end of the file.
(iii) Write once but read many times during usage and processing.
(iv) Average file size can be more than 500 MB.
NameNode: the master node that maintains the HDFS namespace and metadata (directory tree,
file-to-block mapping, block locations) and coordinates client access; it does not store the file data itself.
DataNode: a slave node that stores the actual data blocks, serves read/write requests from clients,
and reports to the NameNode periodically through heartbeats and block reports.
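The following illustrative Java sketch exercises these HDFS file operations through the Hadoop FileSystem API; the NameNode URI hdfs://localhost:9000 and the /user/demo paths are assumptions for a single-node setup, and append additionally requires the cluster to have append support enabled.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBasics {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address for a local single-node cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        Path file = new Path("/user/demo/log.txt");

        // (i) Create and write once.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("first record\n");
        }

        // (ii) Existing content cannot be modified in place, but new data can be appended at the end.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("appended record\n");
        }

        // (iii) Write once, read many times.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }

        // (i) Rename and delete.
        fs.rename(file, new Path("/user/demo/log-renamed.txt"));
        fs.delete(new Path("/user/demo/log-renamed.txt"), false);

        fs.close();
    }
}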
3. Explain the Apache Sqoop import and export methods with a neat diagram (81 + SVIT_notes 29 to 30)
4. What is Apache Flume? Describe the features, components and working of Apache Flume (SVIT_notes
36 to 37)
5. Explain the MapReduce workflow for the word-count program, with the steps on a request and the
two types of processes (63 to 67 & 68, 69)