Elective Course on Big Data
Jnaneshwar Bohara
Big Data - Hadoop | BoharaG 1
Know Your Instructor
Jnaneshwar Bohara
M.Sc. in Computer System and Knowledge Engineering, IOE, TU (Gold Medal)
Certified Scrum Master
Senior Java Programmer
Big Data Analyst
Researcher in Big Data and Bioinformatics
https://www.amazon.com/MapReduce-Approach-Longest-Subsequence-BioSequences/dp/3659680508
How Huge Big Data Is!
What is Big Data?
A collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
Topics
Introduction
MapReduce
Hadoop
Hands on Hadoop
NoSQL
MongoDB
HBase
Spark
Introduction
What is Big Data
Characteristics of Big Data
Current Trends in Big Data
Real Life Applications of Big Data
Scope and Challenges of Big Data
Orientation to Practicals (Tools and Techniques)
MapReduce
Functional Programming
What is MapReduce?
How Does MapReduce Work?
Distributed Execution Overview
Data Distribution
Use cases of MapReduce
Anatomy of MapReduce Program
MapReduce Programs in Java
Basic MapReduce API Concepts
Writing MapReduce Driver, Mappers, and Reducers in Java
Hadoop
What is Hadoop?
History of Hadoop
Motivations for Hadoop
The Hadoop Ecosystem
Hadoop Master/Slave Architecture
Hadoop Daemons
Hadoop Configuration Modes
Uses for Hadoop
Hadoop Cluster Setup
Troubleshooting Installation and Running Programs in a Hadoop Cluster
Hands on Hadoop
Basic Concepts of Java Programming for Hadoop Developers
Basic Linux Concepts for Working with Hadoop
Basic HDFS Commands
Compile and Run Hadoop Programs from the Command Line
Use the Eclipse IDE for Hadoop Programming
Use Python in Hadoop
Write Your Own MapReduce Programs to Solve Real-Life Problems
Use Different Data Types and Formats in Hadoop
Analyze Big Data (CSV and JSON) in Your MapReduce Programs
NoSQL
Types of Data
What is NoSQL?
Why NoSQL?
Types of NoSQL Databases
MongoDB
Document vs. Relational Databases
Installing MongoDB
MongoDB – Collections
MongoDB – Documents
Object IDs
Queries on MongoDB
Aggregation Pipeline
Nested Documents
Twitter Data Analysis Using MongoDB
HBase
HBase: Overview
HBase vs. RDBMS
HBase vs. HDFS
HBase Architecture
HBase Data Model
HBase: Keys and Column Families
HBase Regions
Creating a Table
Writing Queries to Insert Data into and Retrieve Data from HBase
Spark
What is Spark?
Spark Core
Spark SQL
Spark SQL – Handling JSON
Spark SQL – Handling CSV
Spark Streaming
Thank You!