Course Code: 21MCP20
Course Title: Big Data Programming and Development
Batch: 2021-2022
Semester: III
Hrs / Week: 4 | L: 4 | T: - | P: - | Credits: 4
COURSE OBJECTIVE
1. To learn the basic concepts of Big Data and to work with Hadoop and its components.
2. To implement scripting with Hive & HBase and programming using MapReduce for Big Data.
3. To implement distributed resource synchronization using ZooKeeper.
4. To analyze how to handle large log files using Flume.
5. To handle workflows using Oozie and to understand popular Big Data platforms.
COURSE OUTCOMES (CO)
S. No | Course Outcome | Blooms Level
CO1 | Defining the basic concepts of Big Data, working with Hadoop and its components. | K1
CO2 | Summarizing the knowledge about scripting with Hive & HBase and programming using MapReduce for Big Data. | K2
CO3 | Examining distributed resource synchronization using ZooKeeper. | K3
CO4 | Analyzing the concept of data loading using Sqoop. | K4
CO5 | Evaluating the knowledge about handling large log files using Flume, handling workflows using Oozie, and understanding popular Big Data platforms. | K5
SYLLABUS
21MCP20 Big Data Programming and Development Sem: III
Unit No. | Topics | Hours
I | Introduction to Big Data: Applicability of Big Data - Introduction to Big Data Technologies - Introduction to Hadoop - Distributed Computing Basics - Evolution of Distributed Systems. Working with Hadoop and Its Components and Concepts: Analysis of Hadoop - HDFS and Hadoop Commands - Introduction to MapReduce - How MapReduce Works - Pig - Hive. | 10
II | Scripting with Hive & HBase: Hive Data Types and File Formats - Hive Query Language - HBase Architecture Details - Working with HBase. Programming using MapReduce for Big Data - 1: Programming Concepts in MapReduce - HDFS Programming in Java - MapReduce Programming in Java - Executing a MapReduce Program - Debugging & Diagnosing a MapReduce Program. | 12
III | Programming using MapReduce for Big Data - 2: Job Chaining & Merging - Input & Output Patterns - NextGen MapReduce using YARN & REST. Distributed Resource Synchronization using ZooKeeper: ZooKeeper in detail. | 10
IV | Data Loading using Sqoop: Sqoop in detail - Introduction to ETL and CDC. Talend: Introduction - Components - ETL Perspective - Installation - Basic Operations. | 8
V | Handling Large Log Files using Flume: Flume in detail. Kafka: Introduction - Architecture and Workflow - Installation - Basic Operations. Handling Workflows using Oozie: Workflow scheduling using Oozie. Understanding Popular Big Data Platforms: Cloudera, Hortonworks, Greenplum, Vertica. | 12
Note: Internal - 50, External - 50.
Teaching methods: Lecturing, PowerPoint Projection through LCD, Assignment.
MAPPING WITH PROGRAM OUTCOMES
CO/PO PO 1 PO 2 PO 3 PO 4 PO 5 PO 6 PO 7
CO1 S M M M M M M
CO2 M S S S S S S
CO3 M M S S M S S
CO4 S M M M M M M
CO5 M M M S M M S
S - Strong, M - Medium, L - Low
ASSESSMENT PATTERN (if deviation from common pattern)