MapReduce Example

This document contains code for a Java MapReduce program that counts the occurrences of the different log levels (TRACE, DEBUG, INFO, WARN, ERROR, FATAL) in input log files. The Map class extracts the log level from each line and emits it as a key along with a count of 1. The Reduce class sums the counts for each unique log level. The main method sets up the job configuration and runs the MapReduce job to process the input path and write results to the output path.
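
For illustration, suppose the input contains log lines that carry the level in square brackets (a hypothetical format; the mapper's split on spaces, '[', and ']' suggests input of roughly this shape):

    2015-03-14 09:26:53 [INFO] Server started
    2015-03-14 09:26:54 [DEBUG] Loading configuration
    2015-03-14 09:26:55 [INFO] Listening on port 8080

For that input the job would emit DEBUG with a count of 1 and INFO with a count of 2, written as one tab-separated line per log level in the output files.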

//Standard Java imports
import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

//Hadoop imports
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

/**
 * Tutorial1
 */
public class Tutorial1 {

    //The Mapper
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        //Log levels to search for
        private static final Pattern pattern =
                Pattern.compile("(TRACE)|(DEBUG)|(INFO)|(WARN)|(ERROR)|(FATAL)");
        private static final IntWritable accumulator = new IntWritable(1);
        private Text logLevel = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> collector, Reporter reporter)
                throws IOException {
            // split on space, '[', and ']'
            final String[] tokens = value.toString().split("[ \\[\\]]");
            if (tokens != null) {
                //now find the log level token
                for (final String token : tokens) {
                    final Matcher matcher = pattern.matcher(token);
                    //log level found
                    if (matcher.matches()) {
                        logLevel.set(token);
                        //Create the key value pairs
                        collector.collect(logLevel, accumulator);
                    }
                }
            }
        }
    }

    //The Reducer
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> collector, Reporter reporter)
                throws IOException {
            int count = 0;
            //code to aggregate the occurrence
            while (values.hasNext()) {
                count += values.next().get();
            }
            System.out.println(key + "\t" + count);
            collector.collect(key, new IntWritable(count));
        }
    }

    //The java main method to execute the MapReduce job
    public static void main(String[] args) throws Exception {
        //Code to create a new Job specifying the MapReduce class
        final JobConf conf = new JobConf(Tutorial1.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        // Combiner is commented out to be used in bonus activity
        //conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        //File Input argument passed as a command line argument
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        //File Output argument passed as a command line argument
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        //statement to execute the job
        JobClient.runJob(conf);
    }
}
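
A minimal sketch of compiling and running the job follows. The jar file name, the hadoop-core jar version, and the HDFS paths are all assumptions for illustration; since the code uses the old org.apache.hadoop.mapred API, a Hadoop 1.x-era classpath is assumed:

    # Compile against the Hadoop classes and package the job
    javac -classpath hadoop-core-1.2.1.jar -d classes Tutorial1.java
    jar -cvf tutorial1.jar -C classes .

    # Run the job; args[0] is the input path, args[1] is the output path
    hadoop jar tutorial1.jar Tutorial1 /logs/input /logs/output

As the bonus activity noted in main suggests, uncommenting conf.setCombinerClass(Reduce.class) would also run Reduce as a combiner on each mapper's local output. That is safe here because summing counts is associative and commutative, and it reduces the amount of data shuffled to the reducers.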
