
2. Word Count Program Using the MapReduce API

Aim: To write a word count program using the MapReduce API for the Apache
Hadoop framework.

MapReduce API:
Hadoop MapReduce is a software framework for easily writing applications
that process vast amounts of data (multi-terabyte datasets) in parallel on
large clusters (thousands of nodes) of commodity hardware in a reliable,
fault-tolerant manner.
A MapReduce job usually splits the input dataset into independent chunks
which are processed by the map tasks in a completely parallel manner. The
framework sorts the outputs of the maps, which are then given as input to
the reduce tasks.

Inputs and Outputs


The MapReduce framework operates exclusively on <key, value> pairs, that is,
the framework views the input to the job as a set of <key, value> pairs and
produces a set of <key, value> pairs as the output of the job, conceivably of
different types.

Input and output types of a MapReduce job:

(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
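As a concrete illustration, in this word count job the input key is the byte
offset of each line (supplied by the default TextInputFormat) and the value is
the line itself, so a single input line flows through the stages roughly like this:

(input) <0, "hello world hello"> -> map -> <hello, 1>, <world, 1>, <hello, 1> -> combine -> <hello, 2>, <world, 1> -> reduce -> <hello, 2>, <world, 1> (output)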

Steps to execute the WordCount Java Program:

1. Set CLASSPATH

nano ~/.bashrc

export CLASSPATH=${HADOOP_HOME}/share/hadoop/common/hadoop-common-3.3.1.jar:${HADOOP_HOME}/share/hadoop/mapreduce/*:

source ~/.bashrc
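To confirm the variable is set, you can print it in a new shell session; the
hadoop-common jar version (3.3.1 here) should match the release actually
installed under ${HADOOP_HOME}:

echo $CLASSPATH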

2. WordCount.java program (write the program using gedit)

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in an input line
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      // Split the line on whitespace and emit each token with a count of 1
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums all the counts received for a given word
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "WordCount");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The reducer class is also used as the combiner for local aggregation
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory (args[0])
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (args[1]), must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
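Note that the same IntSumReducer class is registered both as the combiner and as the
reducer. This works because summing counts is associative and commutative: the
combiner pre-aggregates each mapper's output locally, cutting down the data shuffled
across the network without changing the final totals.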

3. Compilation

javac WordCount.java

4. Jar file creation

jar cf wc.jar WordCount*.class
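Compiling WordCount.java produces three class files (one for the outer class and one
for each nested class), and all of them must go into the JAR. To verify, you can list
the archive contents:

jar tf wc.jar

which should show WordCount.class, WordCount$TokenizerMapper.class and
WordCount$IntSumReducer.class.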

5. hello.txt (create this sample input file using gedit)

The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models. It is designed to scale up from single servers to
thousands of machines, each offering local computation and storage. Rather
than rely on hardware to deliver high-availability, the library itself is designed
to detect and handle failures at the application layer, so delivering a highly-
available service on top of a cluster of computers, each of which may be prone
to failures.

The Hadoop Distributed File System (HDFS) is a distributed file system


designed to run on commodity hardware. It has many similarities with existing
distributed file systems. However, the differences from other distributed file
systems are significant. HDFS is highly fault-tolerant and is designed to be
deployed on low-cost hardware. HDFS provides high throughput access to
application data and is suitable for applications that have large data sets. HDFS
relaxes a few POSIX requirements to enable streaming access to file system
data. HDFS was originally built as infrastructure for the Apache Nutch web
search engine project. HDFS is part of the Apache Hadoop Core project. The
project URL is http://hadoop.apache.org/.

6. Execution of the Java program (the JAR file is a group of class files)

hdfs dfs -mkdir input

hdfs dfs -put hello.txt input

hadoop jar wc.jar WordCount input output
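Once the job completes, the word counts can be read back from HDFS. With the default
single reducer, the result is typically written to a file named part-r-00000 inside
the output directory:

hdfs dfs -cat output/part-r-00000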

7. Output

(HDFS) 1
Apache 3
Core 1
Distributed 1
File 1
HDFS 5
Hadoop 3
However, 1
It 2
Nutch 1
POSIX 1
Rather 1
System 1
The 3
URL 1
a 5
access 2
across 1
allows 1
and 4
application 2
applications 1
are 1
as 1
at 1
be 2
built 1
cluster 1
clusters 1
commodity 1
computation 1
computers 1
computers, 1
data 3
data. 1
deliver 1
delivering 1
deployed 1
designed 4
detect 1
differences 1
distributed 4
each 2
enable 1
engine 1
existing 1
failures 1
failures. 1
fault-tolerant 1
few 1
file 4
for 3
framework 1
from 2
handle 1
hardware 1
hardware. 2
has 1
have 1
high 1
high-availability, 1
highly 1
highly-available 1
http://hadoop.apache.org/. 1
infrastructure 1
is 9
itself 1
large 2
layer, 1
library 2
local 1
low-cost 1
machines, 1
many 1
may 1
models. 1
of 7
offering 1
on 4
originally 1
other 1
part 1
processing 1
programming 1
project 1
project. 2
prone 1
provides 1
relaxes 1
rely 1
requirements 1
run 1
scale 1
search 1
servers 1
service 1
sets 1
sets. 1
significant. 1
similarities 1
simple 1
single 1
so 1
software 1
storage. 1
streaming 1
suitable 1
system 2
systems 1
systems. 1
than 1
that 2
the 6
thousands 1
throughput 1
to 10
top 1
up 1
using 1
was 1
web 1
which 1
with 1
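The listing also reflects how the mapper tokenizes input: StringTokenizer splits only
on whitespace, so punctuation stays attached to words and counting is case-sensitive,
which is why entries such as hardware and hardware. (or project and project.) appear
separately.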
