02-Wordcount Mapreduce
A word count MapReduce program written in Java has two phases:
Map Phase – the data transformation and pre-processing step. Data is read as key-value pairs and, after processing, is sent to the reduce phase.
Reduce Phase – the data is aggregated and the business logic is applied in this phase; the result is then passed to the next big data tool in the data pipeline for further processing.
For example, in a word count job the map phase emits a (word, 1) pair for every word in its input, and the reduce phase sums these pairs to produce (word, total count).
The standard Hadoop MapReduce model has Mappers, Reducers, Combiners, a Partitioner, and sorting, all of which manipulate the structure of the data to fit the business requirements. To perform these transformation operations, the map and reduce phases make use of data structures such as arrays.
Ex. No. 2:
AIM:
Word count program to demonstrate the use of Map and Reduce tasks
STEPS:
1. Analyze the input file content
2. Develop the code
a. Writing a map function
b. Writing a reduce function
c. Writing the Driver class
3. Compiling the source
4. Building the JAR file
5. Starting the DFS
6. Creating Input path in HDFS and moving the data into Input path
7. Executing the program
$ cd ~
$ sudo mkdir wordcount
$ cd wordcount
$ sudo nano WordCount.java
Wordcount program:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*; // job configuration classes
import org.apache.hadoop.io.*; // Hadoop data types (Text, IntWritable, LongWritable)
import org.apache.hadoop.mapred.*; // old (mapred) MapReduce API: JobConf, Mapper, Reducer, ...
import org.apache.hadoop.util.*; // utilities for running the MapReduce application
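The listing above shows only the imports; a minimal sketch of the rest of WordCount.java, following the classic old-API (org.apache.hadoop.mapred) word count example, could look like this:

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  // Driver: configures and submits the job
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output directory

    JobClient.runJob(conf);
  }
}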
Compile the source with javac:
$ cd ~
$ cd wordcount
$ javac -cp $(hadoop classpath) WordCount.java
Three Java class files are created (one for the outer class and one each for the inner map and reduce classes). To check, type
$ ls
Create a text file and move it into the input folder in the Hadoop file system (HDFS).
$ nano hello.txt
Type some sample text into the file, for example:
Data transformation and pre-processing step. Data is input in terms of key value pairs and
after processing is sent to the reduce phase.
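The commands for the remaining steps (building the JAR, starting the DFS, creating the input path, and executing the program) are not listed above; a typical sequence, assuming the JAR is named wc.jar and the HDFS input directory is named 'input', is:
$ jar cf wc.jar WordCount*.class           # build the JAR from the compiled classes
$ start-dfs.sh                             # start the DFS daemons
$ hdfs dfs -mkdir -p input                 # create the input path in HDFS
$ hdfs dfs -put hello.txt input            # move the data into the input path
$ hadoop jar wc.jar WordCount input out1   # execute the program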
The last two arguments are the input directory, which contains the input file, and the output directory, where the result will be generated in the ‘out1’ folder. ‘out1’ must not exist before the job runs, because Hadoop refuses to overwrite an existing output directory.
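Once the job completes, the result can be read directly from HDFS; with a single reducer the old-API output file is named part-00000 by default:
$ hdfs dfs -cat out1/part-00000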
If any error occurs:
1. Edit the file hadoop-env.sh in /usr/local/hadoop/etc/hadoop and add the following line (typically needed when Hadoop complains that it cannot load its native libraries):
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH
2. If ssh localhost fails (for example, when starting the DFS):
● Enable debugging with ssh -vvv localhost and investigate the error in detail.
● Check the SSH server configuration in /etc/ssh/sshd_config, in
particular the options PubkeyAuthentication (which should be set to yes)
and AllowUsers (if this option is active, add the hduser user to it). If you
made any changes to the SSH server configuration file, you can force a
configuration reload with sudo /etc/init.d/ssh reload.
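● If passwordless SSH has not been set up for hduser at all, a typical key setup for the single-node configuration (commands assumed to run as hduser) is:
$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa            # generate a key pair with an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # authorize the key for localhost logins
$ chmod 600 ~/.ssh/authorized_keys                    # restrict permissions so sshd accepts the file
$ ssh localhost                                       # verify that passwordless login works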