In this MapReduce tutorial we explain a sample MapReduce example together with its flow chart, and how MapReduce works for a job.
A MapReduce program overcomes the problems listed above in just a few lines of code. Let us now look at the MapReduce functions below to understand how they work on a large dataset.
Each emitted (word, 1) pair becomes part of the list that is the output of the mapper.
So who ensures that the file is distributed and that each line of the file is passed to a map function? The Hadoop framework takes care of this, so there is no need to worry about the distributed system.
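The map step described above can be sketched in plain Java without any Hadoop classes. This is only an illustration under stated assumptions: the class name MapStepSketch and its map method are hypothetical, and the method is called by hand here, whereas in a real job the framework calls the mapper once per line.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java illustration of the map step; it does not use the Hadoop API.
// The class and method names are hypothetical.
class MapStepSketch {

    // Emits one (word, 1) pair for every whitespace-separated token in the line.
    // The offset parameter stands in for the input key and is not used.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            output.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return output;
    }

    public static void main(String[] args) {
        // Prints [how=1, is=1, your=1, job=1]
        System.out.println(map(0L, "how is your job"));
    }
}
```

Every word is emitted with the value 1, even when it repeats; adding the values up is left to the reduce step.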
Reduce(Key2, List(Value2)) –> List(Key3, Value3)
The list of (key, value) pairs output by the mappers is shuffled and sorted by key.
The values are then grouped by key, creating a list of values for each key.
So who ensures the shuffle, sort, group by, and so on? Again, the Hadoop framework.
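To make the shuffle, sort, group and reduce steps concrete, here is a small in-memory sketch in plain Java. It is not how Hadoop implements them (the framework does this across machines); the class ShuffleReduceSketch and its methods are hypothetical names, and a TreeMap is assumed as a simple way to keep the keys sorted.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of shuffle/sort/group followed by reduce.
// Hypothetical names; this does not use the Hadoop API.
class ShuffleReduceSketch {

    // Groups mapper output by key; the TreeMap keeps the keys in sorted order.
    static Map<String, List<Integer>> shuffleAndGroup(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce(Key2, List(Value2)): sums the list of values for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs reduce once per grouped key and collects the (key, count) results.
    static Map<String, Integer> countWords(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffleAndGroup(mapperOutput).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated mapper output for the lines "how is your job" and "how is your family".
        List<Map.Entry<String, Integer>> mapperOutput = new ArrayList<>();
        for (String word : "how is your job how is your family".split(" ")) {
            mapperOutput.add(new SimpleEntry<>(word, 1));
        }
        // Prints {family=[1], how=[1, 1], is=[1, 1], job=[1], your=[1, 1]}
        System.out.println(shuffleAndGroup(mapperOutput));
        // Prints {family=1, how=2, is=2, job=1, your=2}
        System.out.println(countWords(mapperOutput));
    }
}
```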
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

// Called once per input line: emits (word, 1) for every token in the line.
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one);
    }
}
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum));
}
Suppose we have a file of about 200 MB with the following content:
file.txt (200 MB)

Split 1 (64 MB):
    hi how are you
    how is your job

Split 2 (64 MB):
    how is your family
    how is your brother

Split 3 (64 MB):
    how is your sister
    what is the time now

Split 4 (8 MB):
    what is the strength of hadoop
The file above is divided into 4 splits: three splits of 64 MB each and a fourth split of 8 MB.
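The arithmetic behind these four splits can be checked with a short plain-Java sketch. It assumes a fixed split size of 64 MB and simply cuts the file into consecutive pieces; the real framework applies additional rules when computing splits, which this sketch ignores. SplitSizeSketch and splitSizes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that cuts a file size into consecutive fixed-size splits.
class SplitSizeSketch {

    // Returns the size of each split in MB; the last split holds the remainder.
    static List<Long> splitSizes(long fileSizeMb, long splitSizeMb) {
        if (splitSizeMb <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        List<Long> sizes = new ArrayList<>();
        long remaining = fileSizeMb;
        while (remaining > 0) {
            long size = Math.min(splitSizeMb, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Prints [64, 64, 64, 8]
        System.out.println(splitSizes(200, 64));
    }
}
```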
Input File Formats:
—————————-
1. TextInputFormat
2. KeyValueTextInputFormat
3. SequenceFileInputFormat
4. SequenceFileAsTextInputFormat
——————————
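As an illustration of the first format in the list, TextInputFormat hands each line to the mapper as a (key, value) record in which the key is the byte offset of the line within the file and the value is the text of the line. That is why the map function shown earlier takes a LongWritable key and a Text value. The plain-Java sketch below mimics this on an in-memory string; it assumes single-byte characters and "\n" line endings, and the class TextRecordSketch is a hypothetical name, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Mimics the (byte offset, line) records a text input format produces.
// Hypothetical class; assumes single-byte characters and "\n" line endings.
class TextRecordSketch {

    static Map<Long, String> toRecords(String fileContent) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContent.split("\n")) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 accounts for the newline character
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints {0=hi how are you, 15=how is your job}
        System.out.println(toRecords("hi how are you\nhow is your job\n"));
    }
}
```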
Let's look at the following figure to understand the MapReduce process.