
MapReduce Programming: Hello World Job

In this Hadoop MapReduce tutorial we will see how to create a Hello World job and walk through the steps for writing a MapReduce program. The steps are as follows.

Step 1: Creating a file
J$ cat > file.txt
hi how are you
how is your job
how is your family
how is your brother
how is your sister
what is the time now
what is the strength of hadoop

Step 2: Loading file.txt from the local file system into HDFS
J$ hadoop fs -put file.txt file
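
To confirm the upload worked, you can list the file and print it back from HDFS:

J$ hadoop fs -ls file
J$ hadoop fs -cat file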

Step 3: Writing the programs

  1.  DriverCode.java
  2.  MapperCode.java
  3.  ReducerCode.java
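
A quick orientation before the code: the driver configures and submits the job, the mapper turns each line of input into (word, 1) pairs, and the reducer adds up the ones for each word. The classes below use the older org.apache.hadoop.mapred API, which matches the hadoop-core.jar used in Step 4.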

Step 4: Compiling the .java files above

J$ javac -classpath $HADOOP_HOME/hadoop-core.jar -d . *.java
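
The hadoop-core.jar name above assumes a Hadoop 1.x layout; on many distributions the jar carries a version suffix (for example hadoop-core-1.2.1.jar). A more portable option is to let Hadoop assemble the classpath itself. The -d . flag makes javac write the class files into their package directory, com/doj/hadoop/driver/:

J$ javac -classpath $(hadoop classpath) -d . *.java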

Step 5: Creating the jar file

J$ jar cvf job.jar com/
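
Because Step 4 compiled with -d ., the class files sit under com/doj/hadoop/driver/ and the jar is built from that directory tree. A quick sanity check is to list the jar contents, which should show the three classes under that path:

J$ jar tf job.jar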

Step 6: Running the above job.jar on the file (which is in HDFS)

J$ hadoop jar job.jar com.doj.hadoop.driver.WordCount file TestOutput
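
When the job finishes, the counts land in the TestOutput directory in HDFS. Assuming the default single reducer, the results are in one part file, with each line holding a tab-separated word and count, sorted by word:

J$ hadoop fs -cat TestOutput/part-00000
are     1
brother 1
family  1
hadoop  1
hi      1
how     5
is      6
job     1
now     1
of      1
sister  1
strength        1
the     2
time    1
what    2
you     1
your    1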

Let's start with the actual code for the steps above.

Hello World Job -> WordCountJob

1. DriverCode (WordCount.java)

package com.doj.hadoop.driver;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author Dinesh Rajput
 *
 */
public class WordCount extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.printf("Usage: %s [generic options] <input> <output>\n",
          getClass().getSimpleName());
      ToolRunner.printGenericCommandUsage(System.err);
      return -1;
    }
    // Pass getConf() along so generic options parsed by ToolRunner take effect
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("Word Count");
    // Input and output paths come from the command line
    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setMapperClass(WordMapper.class);
    // The reducer doubles as a combiner because summing counts is associative
    conf.setCombinerClass(WordReducer.class);
    conf.setReducerClass(WordReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // In the old (mapred) API, JobClient.runJob blocks until the job
    // completes and throws an exception on failure
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new WordCount(), args);
    System.exit(exitCode);
  }
}
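
Extending Configured and implementing Tool is more than boilerplate: ToolRunner runs a GenericOptionsParser over the arguments before run() sees them, so the standard Hadoop options work on the command line without any extra code in the driver. For example, to run the same job with two reducers:

J$ hadoop jar job.jar com.doj.hadoop.driver.WordCount -D mapred.reduce.tasks=2 file TestOutput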

2. MapperCode (WordMapper.java)

package com.doj.hadoop.driver;

import java.io.IOException;
import java.util.StringTokenizer;
 
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/**
 * @author Dinesh Rajput
 *
 */
public class WordMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);

  private Text word = new Text();

  @Override
  public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // The key is the byte offset of the line in the file; the value is the line itself
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    // Emit a (word, 1) pair for every whitespace-separated token
    while (tokenizer.hasMoreTokens()) {
      word.set(tokenizer.nextToken());
      output.collect(word, one);
    }
  }
}
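
To make the mapper's behavior concrete: for the first line of file.txt, "hi how are you", the map method emits the four pairs (hi, 1), (how, 1), (are, 1), and (you, 1). Note that word is a single reused Text instance rather than a new object per token; this is a common Hadoop idiom to reduce object allocation, and it is safe because collect serializes the pair immediately.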

3. ReducerCode (WordReducer.java)

package com.doj.hadoop.driver;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * @author Dinesh Rajput
 *
 */
public class WordReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    // Add up every count emitted for this word by the mappers (and the combiner)
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
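
Note how the shuffle phase feeds the reducer: all values for one key arrive together, so for the sample file the call is effectively reduce("is", [1, 1, 1, 1, 1, 1]), which emits (is, 6). Because this summation is associative and commutative, the very same class can safely run as a combiner on the map side, which is exactly what the driver configures with setCombinerClass.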
