Lesson 1
Objectives
By the end of this lesson, you will be able to:
● Explain the need for Big Data
● Define the concept of Big Data
● Describe the basics and benefits of Hadoop
Need for Big Data
90% of the data in the world today has been created in the last two years alone.
Structured formats have limitations with respect to handling such large
quantities of data. Thus, there is a need for a mechanism such as Big Data to
handle these increasing quantities.
Big Data relies on three important aspects of data complexity: volume, velocity,
and variety.
What is Big Data
Defining Big Data
Big Data is the term applied to data sets whose size is beyond the ability of
the commonly used software tools to capture, manage, and process within a
tolerable elapsed time.
Sources of Big Data
● Web logs
● Sensor networks
● Social media
● Internet text and documents
● Internet pages
● Search index data
● Atmospheric science, astronomy, biochemical, and medical records
● Scientific research
● Military surveillance
● Photography archives
Types of Data
Three types of data can be identified:
Unstructured Data
Data that does not have a pre-defined data model
E.g., text files
Semi-structured Data
Data that does not conform to a formal data model but still carries some
organizational markers, such as tags
E.g., XML files
Structured Data
Data that is represented in a tabular format
E.g., databases
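To make the three categories concrete, here is a minimal Python sketch, not part of the original lesson, that represents one invented order record as unstructured text, as semi-structured XML, and as a structured database row:

```python
# A minimal sketch (illustrative only; the order data below is invented)
# showing the same fact as unstructured, semi-structured, and structured data.
import sqlite3
import xml.etree.ElementTree as ET

# Unstructured data: free text with no pre-defined data model.
text = "Order 4521 was shipped on Monday and the customer asked for an invoice."

# Semi-structured data: XML carries tags that mark elements, but no rigid
# relational schema is enforced.
order = ET.fromstring("<order><id>4521</id><status>shipped</status></order>")
print("XML order id:", order.find("id").text)

# Structured data: a fixed schema of rows and columns in a relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
db.execute("INSERT INTO orders VALUES (4521, 'shipped')")
print("Table row:", db.execute("SELECT id, status FROM orders").fetchone())
```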
Handling Limitations of Big Data
How to handle system uptime and downtime
● Commodity hardware for data storage and analysis
● Maintaining a copy of the same data across clusters
How to combine accumulated data from all the systems
● Analyzing data across different machines
● Merging of data
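As a rough illustration of combining accumulated data from several systems, the following Python sketch merges partial word counts that are assumed to have been computed independently on three machines; the machine names and counts are invented for illustration:

```python
# A minimal sketch (invented data): partial results computed on separate
# machines are merged into a single combined result.
from collections import Counter

# Partial word counts, as if each machine analyzed only its own share of logs.
machine_a = Counter({"error": 120, "login": 45})
machine_b = Counter({"error": 80, "logout": 30})
machine_c = Counter({"login": 15, "error": 5})

# Merging the accumulated data from all the systems is a sum of the counters.
merged = machine_a + machine_b + machine_c
print(merged.most_common())  # [('error', 205), ('login', 60), ('logout', 30)]
```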
Introduction to Hadoop
What is Hadoop?
● A free, Java-based programming framework that supports the processing of
large data sets in a distributed computing environment
● Based on the Google File System (GFS)
Why Hadoop?
● Runs applications on distributed systems with thousands of nodes involving
petabytes of data
● Its distributed file system provides fast data transfers among the nodes
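Hadoop's processing model is usually expressed as MapReduce jobs written in Java, but the Hadoop Streaming utility also accepts mappers and reducers written in any language that reads standard input and writes standard output. The following word-count sketch is an illustrative assumption, not code from this lesson, showing what a streaming-style mapper and reducer can look like in Python:

```python
#!/usr/bin/env python3
# Word-count sketch in the Hadoop Streaming style (illustrative only).
# The mapper emits "word<TAB>1" per word; the reducer sums counts per word
# and assumes its input is already sorted by key, as Hadoop guarantees.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(lines):
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    # Run as "wordcount.py map" or "wordcount.py reduce" (file name assumed).
    mapper(sys.stdin) if sys.argv[1] == "map" else reducer(sys.stdin)
```

Locally, the same flow can be imitated with a shell pipe such as cat input.txt | python wordcount.py map | sort | python wordcount.py reduce, which mirrors the map, shuffle-and-sort, and reduce stages that Hadoop runs across the nodes of a cluster.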
History and Milestones of Hadoop
Hadoop originated from Nutch, an open-source search engine project designed to
work over distributed network nodes. Yahoo was the first company to develop and
use Hadoop as a core part of its system operations. Today, Hadoop is a core part
of systems at companies such as Facebook, LinkedIn, and Twitter.
Hadoop Milestones
Organizations Using Hadoop
Amazon
● Cluster specifications: Clusters vary from 1 to 100 nodes
● Uses: To build Amazon's product search indices; to process millions of
sessions daily for analytics
Yahoo
● Cluster specifications: More than 100,000 CPUs in approximately 20,000
computers running Hadoop; the biggest cluster has 2,000 nodes (2*4 CPU boxes
with 4 TB disk each)
● Uses: To support research for ad systems and web search
AOL
● Cluster specifications: 50 machines, Intel Xeon, dual processors, dual core,
each with 16 GB RAM and an 800 GB hard disk, giving a total of 37 TB HDFS
capacity
● Uses: For a variety of functions ranging from data generation to running
advanced algorithms for behavioral analysis and targeting
Facebook
● Cluster specifications: A 320-machine cluster with 2,560 cores and about
1.3 PB raw storage
● Uses: To store copies of internal log and dimension data sources; as a source
for reporting analytics and machine learning
Quiz 1
Which type of data is handled by Hadoop?
a. Structured data
b. Semi-structured data
c. Unstructured data
d. Flexible-structure data
Quiz 1
Which type of data is handled by Hadoop?
a. Structured data
b. Semi-structured data
c. Unstructured data
d. Flexible-structure data
Answer: c.
Explanation: Hadoop handles unstructured data for processing.
Quiz 2
Which of the following is unstructured data?
a. Collection of text files
b. Collection of XML files
c. Collection of tables in databases
d. Collection of tickets
Quiz 2
Which of the following is unstructured data?
a. Collection of text files
b. Collection of XML files
c. Collection of tables in databases
d. Collection of tickets
Answer: a.
Explanation: Text files are usually unstructured data.
Quiz 3
Which of the following is structured data?
a. Collection of text files
b. Collection of tickets
c. Collection of tables in databases
d. Collection of XML files
Quiz 3
Which of the following is structured data?
a. Collection of text files
b. Collection of tickets
c. Collection of tables in databases
d. Collection of XML files
Answer: c.
Explanation: Databases are usually structured data.
Quiz 4
Which of the following is semi-structured data?
a. Collection of tables in databases
b. Collection of text files
c. Collection of tickets
d. Collection of XML files
Quiz 4
Which of the following is semi-structured data?
a. Collection of tables in databases
b. Collection of text files
c. Collection of tickets
d. Collection of XML files
Answer: d.
Explanation: XML files are usually semi-structured data.
Quiz 5
Which of the following aspects of Big Data refers to data size?
a. Volume
b. Velocity
c. Variety
d. Value
Quiz 5
Which of the following aspects of Big Data refers to data size?
a. Volume
b. Velocity
c. Variety
d. Value
Answer: a.
Explanation: Volume in Big Data refers to the size of the data to be processed.
Quiz 6
Which of the following aspects of Big Data refers to the speed of response to data requests generated by the user?
a. Variety
b. Value
c. Velocity
d. Volume
Quiz 6
Which of the following aspects of Big Data refers to the speed of response to data requests generated by the user?
a. Variety
b. Value
c. Velocity
d. Volume
Answer: c.
Explanation: Velocity in Big Data refers to the speed of response to data requests generated by the user.
Quiz 7
Which of the following aspects of Big Data refers to multiple data sources?
a. Variety
b. Value
c. Volume
d. Velocity
Quiz 7
Which of the following aspects of Big Data refers to multiple data sources?
a. Variety
b. Value
c. Volume
d. Velocity
Answer: a.
Explanation: Variety in Big Data refers to multiple data sources.
Summary
Let us summarize the topics covered in this lesson:
● Big Data is the term applied to data sets whose size is beyond the ability
of the commonly used software tools to capture, manage, and process
within a tolerable elapsed time.
● Big Data relies on volume, velocity, and variety with respect to
processing.
● Data can be divided into three types: unstructured data, semi-structured
data, and structured data.
● Hadoop is a free, Java-based programming framework that supports the
processing of large data sets in a distributed computing environment.
● Hadoop is a software framework used by organizations such as Facebook,
Yahoo, Amazon, and AOL.