Understanding MapReduce in Hadoop

MapReduce is a programming framework that processes large datasets in parallel by splitting input data into independent chunks for map tasks, which generate intermediate data that is then combined by reduce tasks. The framework includes a Job Tracker for scheduling and monitoring tasks, and Task Trackers for executing them, ensuring efficient data locality and high throughput. Key components of MapReduce include mappers, combiners, partitioners, and reducers, which work together to transform and aggregate data effectively.


5.11 PROCESSING DATA WITH HADOOP

• MapReduce Programming is a software framework that helps you process massive amounts of data in parallel.
• In MapReduce Programming, the input dataset is split into independent chunks. Map tasks process these independent chunks completely in parallel.
• The output produced by the map tasks serves as intermediate data and is stored on the local disk of that server.
• The output of the mappers is automatically shuffled and sorted by the framework. The MapReduce framework sorts the output based on keys.
• This sorted output becomes the input to the reduce tasks. A reduce task produces the reduced output by combining the output of the various mappers.
• Job inputs and outputs are stored in a file system. The MapReduce framework also takes care of other tasks such as scheduling, monitoring, and re-executing failed tasks.
• The Hadoop Distributed File System and the MapReduce framework run on the same set of nodes. This configuration allows effective scheduling of tasks on the nodes where the data is present (data locality), which in turn results in very high throughput.

 There are two daemons associated with MapReduce Programming: a single master Job Tracker per cluster and one slave Task Tracker per cluster node.

 The Job Tracker is responsible for scheduling tasks on the Task Trackers, monitoring the tasks, and re-executing a task in case its Task Tracker fails. The Task Tracker executes the tasks. Refer Figure 5.21.

 The MapReduce functions and input/output locations are specified by the MapReduce applications, which use suitable interfaces to construct the job.

 The application and the job parameters together are known as the job configuration. The Hadoop job client submits the job (jar/executable, etc.) to the Job Tracker. It is then the responsibility of the Job Tracker to schedule tasks on the slaves. In addition to scheduling, it also monitors the tasks and provides status information to the job client.
MapReduce Framework

Phases:
  Map: converts input into key-value pairs.
  Reduce: combines the output of the mappers and produces a reduced result set.

Daemons:
  Job Tracker: master, schedules tasks.
  Task Tracker: slave, executes tasks.

Figure 5.21 MapReduce Programming phases and daemons

5.11.1 MapReduce Daemons

1. Job Tracker: It provides connectivity between Hadoop and your application. When you submit code to the cluster, the Job Tracker creates the execution plan by deciding which task to assign to which node. It also monitors all the running tasks. When a task fails, it automatically re-schedules the task to a different node after a predefined number of retries. The Job Tracker is a master daemon responsible for executing the overall MapReduce job. There is a single Job Tracker per Hadoop cluster.
2. Task Tracker: This daemon is responsible for executing the individual tasks that are assigned by the Job Tracker. There is a single Task Tracker per slave node, and it spawns multiple Java Virtual Machines (JVMs) to handle multiple map or reduce tasks in parallel.
The Task Tracker continuously sends heartbeat messages to the Job Tracker. When the Job Tracker fails to receive a heartbeat from a Task Tracker, it assumes that the Task Tracker has failed and resubmits the task to another available node in the cluster.
Once the client submits a job to the Job Tracker, the Job Tracker partitions the job and assigns the resulting MapReduce tasks to the Task Trackers in the cluster. Figure 5.22 depicts Job Tracker and Task Tracker interaction.
5.11.2 How Does MapReduce Work?
MapReduce divides a data analysis task into two parts - map and reduce. Figure 5.23 depicts how the
MapReduce Programming works. In this example, there are two mappers and one reducer. Each mapper
works on the partial dataset that is stored on that node and the reducer combines the output from the
mappers to produce the reduced result set.
Figure 5.22 Job Tracker and TaskTracker interaction (the client submits a job to the Job Tracker, which assigns map and reduce tasks to the Task Trackers)


Figure 5.24 describes the working model of MapReduce Programming. The following steps describe how
MapReduce performs its task.

1. First, the input dataset is split into multiple pieces of data (several small subsets).
2. Next, the framework creates a master process and several worker processes and executes the worker processes remotely.
3. Several map tasks work simultaneously, each reading the piece of data assigned to it. The map worker uses the map function to process only the data present on its server and generates key-value pairs for that data.
4. The map worker uses the partitioner function to divide the data into regions. The partitioner decides which reducer should get the output of each mapper.
5. When the map workers complete their work, the master instructs the reduce workers to begin their work. The reduce workers in turn contact the map workers to get the key-value data for their partition. The data thus received is shuffled and sorted by key.
6. The reduce worker then calls the reduce function for every unique key. This function writes the output to a file.
7. When all the reduce workers complete their work, the master transfers control to the user program.

5.11.3 MapReduce Example


The most famous example of MapReduce Programming is Word Count. For example, suppose you need to count the occurrences of each word across 50 files. You can achieve this using MapReduce Programming. Refer Figure 5.25.
Word Count MapReduce Programming using Java
A MapReduce program requires three things.
1. Driver Class: This class specifies Job Configuration details.
2. Mapper Class: This class overrides the Map Function based on the problem statement.
3. Reducer Class: This class overrides the Reduce Function based on the problem statement.
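A minimal Word Count sketch that ties these three pieces together is shown below. It is written against the org.apache.hadoop.mapreduce API (Hadoop 2.x and later); the class names TokenizerMapper and IntSumReducer are illustrative, and the exact driver settings may vary with your Hadoop version.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper class: overrides the map function, tokenizes each input line
  // and emits (word, 1) as an intermediate key-value pair.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer class: overrides the reduce function and sums the counts
  // received for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver class: specifies the job configuration and submits the job.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional combiner (see Section 8.4)
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a jar, such a job is typically submitted with a command of the form "hadoop jar wordcount.jar WordCount <input path> <output path>", where the paths are placeholders for HDFS directories.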
8.1 INTRODUCTION TO MAPREDUCE PROGRAMMING
In MapReduce Programming, jobs (applications) are split into a set of map tasks and reduce tasks. These tasks are then executed in a distributed fashion on the Hadoop cluster.
Each task processes a small subset of the data that has been assigned to it.
This way, Hadoop distributes the load across the cluster. A MapReduce job takes a set of files stored in HDFS (Hadoop Distributed File System) as input.
The map task takes care of loading, parsing, transforming, and filtering. The responsibility of the reduce task is grouping and aggregating the data produced by the map tasks to generate the final output.
Each map task is broken into the following phases:
1. RecordReader.
2. Mapper.
3. Combiner.
4. Partitioner.
• The output produced by a map task is known as intermediate keys and values. These intermediate keys and values are sent to the reducer. Each reduce task is broken into the following phases:
1. Shuffle.
2. Sort.
3. Reducer.
4. Output Format.
• Hadoop assigns map tasks to the DataNode where the actual data to be processed resides.
• This way, Hadoop ensures data locality. Data locality means that data is not moved over the network; only the computational code is moved to process the data, which saves network bandwidth.
8.2 MAPPER

A mapper maps the input key-value pairs into a set of intermediate key-value pairs. Maps are individual tasks that have the responsibility of transforming input records into intermediate key-value pairs.

1. Record Reader: The Record Reader converts a byte-oriented view of the input (as generated by the InputSplit) into a record-oriented view and presents it to the Mapper tasks. It presents the tasks with keys and values. Generally, the key is the positional information and the value is the chunk of data that constitutes the record.

2. Map: The map function works on the key-value pair produced by the Record Reader and generates zero or more intermediate key-value pairs. The MapReduce application decides the key-value pairs based on the context.

3. Combiner: It is an optional function, but it provides high performance in terms of network bandwidth and disk space. It takes the intermediate key-value pairs provided by a mapper and applies a user-specified aggregate function to the output of only that mapper.

4. Partitioner: The partitioner takes the intermediate key-value pairs produced by the mapper, splits them into shards, and sends each shard to a particular reducer as per the user-specific code. Usually, all values with the same key go to the same reducer. The partitioned data of each map task is written to the local disk of that machine and pulled by the respective reducer.

8.3 REDUCER
The primary chore of the Reducer is to reduce a set of intermediate values (the ones that share a common key) to a smaller set of values. The Reducer has three primary phases: Shuffle and Sort, Reduce, and Output Format.
1. Shuffle and Sort: This phase takes the output of all the partitioners and downloads it onto the local machine where the reducer is running. These individual data pipes are then sorted by key, which produces a larger data list. The main purpose of this sort is grouping similar keys (the words, in the word count example) so that their values can be easily iterated over by the reduce task.
2. Reduce: The reducer takes the grouped data produced by the shuffle and sort phase, applies the reduce function, and processes one group at a time. The reduce function iterates over all the values associated with a key. The reducer function provides various operations such as aggregation, filtering, and combining of data. Once it is done, the output (zero or more key-value pairs) of the reducer is sent to the output format.
3. Output Format: The output format separates each key-value pair with a tab (by default) and writes it out to a file using a record writer.
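The default tab separator comes from TextOutputFormat. A brief sketch of how that separator can be overridden in the driver is shown below; the property name is the one used by the newer mapreduce API (older releases used mapred.textoutputformat.separator), so treat it as an assumption to verify against your Hadoop version.

// Sketch: overriding TextOutputFormat's default tab separator in the driver.
// Property name assumed from the Hadoop 2.x+ mapreduce API.
Configuration conf = new Configuration();
conf.set("mapreduce.output.textoutputformat.separator", ",");   // write "key,value" lines
Job job = Job.getInstance(conf, "word count");
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);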
Figure 8.1 describes the chores of Mapper, Combiner, Partitioner, and Reducer for the word count problem.
The Word Count problem has been discussed under "Combiner" and "Partitioner".
8.4 COMBINER
It is an optimization technique for a MapReduce job. Generally, the reducer class is set to be the combiner class. The difference between the combiner class and the reducer class is as follows:
1. The output generated by the combiner is intermediate data, and it is passed to the reducer.
2. The output of the reducer is passed to the output file on disk.
• A Combiner, also known as a semi-reducer, is an optional class that operates by accepting the
inputs from the Map class and thereafter passing the output key-value pairs to the Reducer class.
• The main function of a Combiner is to summarize the map output records with the same key. The
output (key-value collection) of the combiner will be sent over the network to the actual Reducer
task as input.
• The Combiner class is used in between the Map class and the Reduce class to reduce the volume
of data transfer between Map and Reduce. Usually, the output of the map task is large and the
data transferred to the reduce task is high.
• The combiner phase sits in the MapReduce task flow between the map phase and the reduce phase.
How Does a Combiner Work?
• Here is a brief summary of how a MapReduce combiner works:
• A combiner does not have a predefined interface; it must implement the Reducer interface's reduce() method.
• A combiner operates on each map output key. It must have the same output key-value types as the Reducer class.
• A combiner can produce summary information from a large dataset because it replaces the original map output.
• Although the combiner is optional, it helps segregate data into multiple groups for the reduce phase, which makes the data easier to process.
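Since a combiner is just a Reducer implementation applied to each mapper's output, registering one is a single driver call. A minimal sketch, reusing the IntSumReducer class from the Word Count example above (the reuse is valid there because summing counts is associative and commutative):

// Sketch: registering a combiner in the driver (word count reuses its reducer
// as the combiner, since partial sums can be summed again at the reducer).
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);   // runs locally on each mapper's output
job.setReducerClass(IntSumReducer.class);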
MapReduce Combiner Implementation
The following example provides a theoretical idea about combiners. Let us assume we have the following input text file named [Link] for MapReduce.

The important phases of the MapReduce program with Combiner are discussed
below.
Record Reader
This is the first phase of MapReduce where the Record Reader reads every line
from the input text file as text and yields output as key-value pairs.
Input − Line by line text from the input file.
Output − Forms the key-value pairs. The following is the set of expected key-value
pairs.
Map Phase

The Map phase takes input from the Record Reader, processes it, and produces the output as another set of key-value pairs.
Input − The following key-value pair is the input taken from the Record Reader.
The Map phase reads each key-value pair, divides the value into words using StringTokenizer, and treats each word as the key and the count of that word as the value. The Mapper class and the map function are essentially those shown in the Word Count sketch above.
• Output − The expected output is as follows −

Combiner Phase
The Combiner phase takes each key-value pair from the Map phase, processes it, and produces the output
as key-value collection pairs.
Input − The following key-value pair is the input taken from the Map phase.
• Output − The expected output is as follows −
8.5 PARTITIONER
• The partitioning happens after the map phase and before the reduce phase. The default partitioner is the hash partitioner.
• A partitioner works like a condition in processing an input dataset. The partition phase takes place after the Map phase and before the Reduce phase.
• A partitioner in MapReduce distributes the intermediate key-value pairs generated by the mappers to the reducers, ensuring a balanced workload and efficient processing.
• The number of partitions is equal to the number of reducers. That means the partitioner divides the data according to the number of reducers, and the data in a single partition is processed by a single reducer.
Partitioner
A partitioner partitions the key-value pairs of intermediate Map-outputs. It partitions the
data using a user-defined condition, which works like a hash function. Let us take an
example to understand how the partitioner works.
MapReduce Partitioner Implementation
For the sake of convenience, let us assume we have a small table called Employee with the
following data. We will use this sample data as our input dataset to demonstrate how the
partitioner works.
We have to write an application to process the input dataset to find the highest salaried employee by gender in different age groups (for example, below 20, between 21 and 30, and above 30).
Input Data
The above data is saved as [Link] in the /home/hadoop/hadoopPartitioner directory and
given as input.
Based on the given input, the following is the algorithmic explanation of the program.
Map Tasks
The map task accepts key-value pairs as input, since we have the text data in a text file. The input for this map task is as follows −
Input − The key would be a pattern such as any special key + filename + line number (example: key = @input1)
and the value would be the data in that line (example: value = 1201 \t gopal \t 45 \t Male \t 50000).
Method − The operation of this map task is as follows −
Output − You will get the gender data and the record data value as key-value pairs.
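A minimal sketch of such a map task is given below. The field order (id, name, age, gender, salary, separated by tabs) is assumed from the sample record above, and the class name EmployeeMapper is illustrative.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: emits (gender, whole record) so that records of the same gender
// form one group. Field order assumed: id \t name \t age \t gender \t salary.
public class EmployeeMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Text gender = new Text();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split("\t");
    gender.set(fields[3]);            // gender field becomes the output key
    context.write(gender, value);     // whole record becomes the output value
  }
}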
Partitioner Task
The partitioner task accepts the key-value pairs from the map task as its input. Partition
implies dividing the data into segments. According to the given conditional criteria of
partitions, the input key-value paired data can be divided into three parts based on the age
criteria.
Input − The whole data in a collection of key-value pairs.
key = Gender field value in the record.
value = Whole record data value of that gender.
Method − The process of partition logic runs as follows.
Output − The whole data of key-value pairs is segmented into three collections of key-value pairs. The Reducer works individually on each collection.
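A sketch of the partition logic described above is shown below. The class name AgePartitioner and the exact age boundaries (20 and below, 21 to 30, above 30) are assumptions for illustration; the age is read from the third field of the record value.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: routes each (gender, record) pair to one of three reducers based on
// the age field. Field order assumed: id \t name \t age \t gender \t salary.
public class AgePartitioner extends Partitioner<Text, Text> {

  @Override
  public int getPartition(Text key, Text value, int numReduceTasks) {
    int age = Integer.parseInt(value.toString().split("\t")[2]);
    if (numReduceTasks == 0) {
      return 0;                        // no partitioning possible with zero reducers
    }
    if (age <= 20) {
      return 0;                        // age group: 20 and below
    } else if (age <= 30) {
      return 1 % numReduceTasks;       // age group: 21 to 30
    } else {
      return 2 % numReduceTasks;       // age group: above 30
    }
  }
}

In the driver, such a partitioner would be wired in with job.setPartitionerClass(AgePartitioner.class) and job.setNumReduceTasks(3), so that each age group lands on its own reducer.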
NoSQL (NOT ONLY SQL)
The term NoSQL was first coined by Carlo Strozzi in 1998 to name his lightweight, open-source, relational database that did not expose the standard SQL interface. Johan Oskarsson, who was then a developer at Last.fm, reintroduced the term NoSQL in 2009 at an event called to discuss open-source distributed databases. The Twitter hashtag #NoSQL was coined by Eric Evans, and other database people at the event found it suitable to describe these non-relational databases.
Few features of NoSQL databases are as follows:
1. They are open source.
2. They are non-relational.
3. They are distributed.
4. They are schema-less.
5. They are cluster friendly.
6. They are born out of 21st century web applications.
4.1.1 Where is it Used?
NoSQL databases are widely used in big data and other real-time web applications. Refer Figure 4.1. NoSQL databases are used to store log data, which can then be pulled for analysis. Likewise, they are used to store social media data and all such data that cannot be stored and analyzed comfortably in an RDBMS.

4.1.2 What is it?


NoSQL stands for Not Only SQL. These are non-relational, open-source, distributed databases. They are hugely popular today owing to their ability to scale out (scale horizontally) and their adeptness at dealing with a rich variety of data: structured, semi-structured, and unstructured. Refer Figure 4.2 for additional features of NoSQL. NoSQL databases:
1. Are non-relational: They do not adhere to the relational data model. In fact, they are either key-value, document-oriented, column-oriented, or graph-based databases.
2. Are distributed: The data is distributed across several nodes in a cluster constituted of low-cost commodity hardware.
3. Offer no support for ACID properties (Atomicity, Consistency, Isolation, and Durability): They do not offer support for the ACID properties of transactions. On the contrary, they adhere to Brewer's CAP (Consistency, Availability, and Partition tolerance) theorem and are often seen compromising on consistency in favor of availability and partition tolerance.
4. Provide no fixed table schema: NoSQL databases are becoming increasingly popular owing to the schema flexibility they support. They do not mandate that the data strictly adhere to any schema structure at the time of storage.
