Run Wordcount
2. Create a directory in the home directory named “WordCountTutorial” and inside it create two
folders: one called “input_data” and the other called “tutorial_classes”.
[You can do this step using the GUI normally or through terminal
commands]
cd /home/hadoop
mkdir WordCountTutorial
mkdir WordCountTutorial/input_data
mkdir WordCountTutorial/tutorial_classes
Copy the WordCount.java file into the WordCountTutorial directory.
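As an alternative to the two mkdir commands above, the -p flag creates both subdirectories (and the parent directory) in one step:
mkdir -p WordCountTutorial/input_data WordCountTutorial/tutorial_classes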
3. Place the file “input.txt” attached with this document in the directory
WordCountTutorial/input_data
4. Type the following command to export the Hadoop classpath into the shell environment.
export HADOOP_CLASSPATH=$(hadoop classpath)
Make sure it is now exported.
echo $HADOOP_CLASSPATH
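Note that this export only lasts for the current shell session. If you want it set automatically on every login, one optional approach (assuming your shell reads ~/.bashrc) is to append the same line to your profile:
echo 'export HADOOP_CLASSPATH=$(hadoop classpath)' >> ~/.bashrc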
5. It is time to create these directories on HDFS rather than locally. Type the
following commands.
hadoop fs -mkdir /WordCountTutorial
hadoop fs -mkdir /WordCountTutorial/Input
hadoop fs -put WordCountTutorial/input_data/input.txt /WordCountTutorial/Input
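Optionally, you can also verify the upload from the terminal before moving on; the listing should show input.txt:
hadoop fs -ls /WordCountTutorial/Input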
6. Go to localhost:9870 in the browser, open “Utilities → Browse File
System”, and you should see the directories and files we placed in the file
system.
7. Then, go back to the local machine, where we will compile the WordCount.java
file. Assuming we are currently in the /home/hadoop directory:
cd WordCountTutorial
javac -classpath $HADOOP_CLASSPATH -d tutorial_classes WordCount.java
Put the output class files into one jar file (note the dot at the end of the command):
jar -cvf WordCount.jar -C tutorial_classes .
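To double-check that packaging worked, you can list the jar’s contents; the compiled WordCount classes should appear:
jar -tf WordCount.jar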
WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;