Run Wordcount

The document outlines 10 steps to run a WordCount program on Hadoop: 1) Install Hadoop and Java, 2) Create input and output directories locally and on HDFS, 3) Add input file, 4) Export Hadoop classpath, 5) Create directories on HDFS, 6) View files on HDFS, 7) Compile WordCount.java, 8) Create Jar file, 9) Run Jar file on Hadoop, 10) Output results. The WordCount.java file provided implements a Mapper, Reducer, and main method to count word frequencies in an input file.

Uploaded by Khushi Patil
© All Rights Reserved

Steps to run WordCount Program on Hadoop:

1. Make sure Hadoop and Java are installed properly:
hadoop version
javac -version

2. Create a working directory in your home directory named
WordCountTutorial, and inside it create two folders: one called
"input_data" and the other called "tutorial_classes".
[You can do this step using the GUI normally or through terminal
commands]
cd /home/hadoop
mkdir WordCountTutorial
mkdir WordCountTutorial/input_data
mkdir WordCountTutorial/tutorial_classes
Place the WordCount.java file in the WordCountTutorial directory.
3. Add the file "input.txt" attached with this document to the directory
WordCountTutorial/input_data.
4. Type the following command to export the Hadoop classpath into bash:
export HADOOP_CLASSPATH=$(hadoop classpath)
Verify that it is now exported:
echo $HADOOP_CLASSPATH
5. It is time to create these directories on HDFS rather than locally. Type the
following commands:
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put WordCountTutorial/input_data/input.txt /WordCountTutorial/Input
6. Go to localhost:9870 in the browser, open "Utilities → Browse File
System", and you should see the directories and files we placed in the file
system.
7. Then, back on the local machine, compile the WordCount.java file.
Assuming we are currently in the home directory:
cd WordCountTutorial
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java
8. Put the compiled class files into one jar file (note the dot at the end):
jar -cvf WordCount.jar -C tutorial_classes .

9. Now, run the jar file on Hadoop:
hadoop jar WordCount.jar WordCount /WordCountTutorial/Input /WordCountTutorial/Output
10. Output the result:
hadoop fs -cat /WordCountTutorial/Output/*

WordCount.java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token in the input line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts emitted for each word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures and submits the job; args[0] is the HDFS input path
  // and args[1] the HDFS output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
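To see what the Mapper and Reducer compute without a cluster, the same tokenize-then-sum logic can be sketched in plain Java. This is a minimal local sketch (the class LocalWordCount is hypothetical and not part of the Hadoop job); it mirrors TokenizerMapper splitting on whitespace and IntSumReducer summing the per-word counts:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Mirrors TokenizerMapper + IntSumReducer for a single in-memory string:
    // tokenize on whitespace, then sum a count of 1 per occurrence of each word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text); // same tokenizer the mapper uses
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum); // the reducer's sum step
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("to be or not to be");
        System.out.println(counts);
    }
}
```

Running it prints each distinct word with its frequency, which is the same (word, count) output the Hadoop job writes to /WordCountTutorial/Output.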
