Hadoop Interview Questions and Answers – What is a combiner in the MapReduce framework?
In Map Reducer Framework Part 1, we discussed the mapper process in depth.
In Map Reducer Framework Part 2, we discussed the reducer process in depth.
In this article I will explain the combiner process in depth, so let's get started.
- A combiner is often called a mini reducer: it aggregates the mapper output (the intermediate key/value data) locally, before that data is sent to the reducer. Each mapper's output is combined separately, although Hadoop treats the combiner as an optimization and may run it zero, one, or several times on a given mapper's output.
- The advantage of a combiner is that it reduces the amount of data transferred between mapper and reducer, and with it the transfer time. The combiner's output becomes the reducer's input.
- Because the reducer receives less data, it spends less time processing it, which improves job performance. Let's see how this works with a diagram.
First consider the scenario where you are not using any combiner class, shown in the diagram below.
In the above diagram there is no combiner. The input file is divided into two splits, one per mapper, and let's say the two mappers generate 15 key/value pairs in total.
These 15 intermediate key/value pairs are sent directly from the mappers to the reducer, and sending them consumes network bandwidth (bandwidth simply means the amount of data that can be transferred from one machine to another per unit of time).
In production the reducer may be running on a different machine, so each mapper sends all of its intermediate data over the network, and the larger that data is, the longer the transfer takes. In the diagram the reducer has to receive and process all 15 key/value pairs. (The reduce function itself is called once per distinct key, with all the values for that key, so without a combiner it has to work through 15 values in total.)
Now if we use a combiner between the mapper and the reducer, each combiner aggregates its own mapper's share of the 15 intermediate key/value pairs before anything is sent to the reducer. Together the combiners emit only 5 key/value pairs as output, as the next diagram shows.
In this diagram the reducer only needs to process 5 key/value pairs, coming from the two combiners, to produce the final output, which improves the performance of the reducer. To get a clear idea of how the combiner works, please study both diagrams carefully.
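The 15-versus-5 comparison can be reproduced with a small plain-Java simulation. This is only a sketch and does not use Hadoop: the class name `CombinerSketch`, the `Pair` helper and the two input splits are made up for illustration, with the splits chosen so that the two mappers emit 15 pairs in total and the two combiners emit 5. It assumes a word-count style job where values are summed per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Hypothetical input splits, one per mapper (8 words and 7 words).
    static final String[] SPLITS = {"a b a c b a a b", "a b a b a b a"};

    // A minimal intermediate key/value pair.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map phase: emit one (word, 1) pair for every word in the split.
    static List<Pair> map(String split) {
        List<Pair> pairs = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            pairs.add(new Pair(word, 1));
        }
        return pairs;
    }

    // Reduce phase: sum all values that share a key.
    static Map<String, Integer> reduce(List<Pair> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Pair p : pairs) {
            totals.merge(p.key, p.value, Integer::sum);
        }
        return totals;
    }

    // Combine phase: the same summing logic, applied to one mapper's output only.
    static List<Pair> combine(List<Pair> mapperOutput) {
        List<Pair> combined = new ArrayList<>();
        for (Map.Entry<String, Integer> e : reduce(mapperOutput).entrySet()) {
            combined.add(new Pair(e.getKey(), e.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Pair> direct = new ArrayList<>();    // what the reducer receives without a combiner
        List<Pair> combined = new ArrayList<>();  // what the reducer receives with a combiner
        for (String split : SPLITS) {
            List<Pair> mapped = map(split);
            direct.addAll(mapped);
            combined.addAll(combine(mapped));
        }
        System.out.println("pairs sent without combiner: " + direct.size());
        System.out.println("pairs sent with combiner: " + combined.size());
        System.out.println("result without combiner: " + reduce(direct));
        System.out.println("result with combiner: " + reduce(combined));
    }
}
```

Both runs produce the same final counts; only the number of pairs that would cross the network changes. That holds here because addition gives the same total whether it is applied once or in stages, which is the property a combiner function needs.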
Syntax for combiner class
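The original listing for this section is not available, so the fragment below is only a sketch of how a combiner is usually registered in the job driver with Hadoop's `org.apache.hadoop.mapreduce` API. `TokenizerMapper` and `IntSumReducer` are hypothetical user classes named after the familiar word-count example; the job name is also an assumption. It is a configuration fragment, not a complete program, and needs the Hadoop libraries on the classpath.

```java
// Fragment of a job driver. Assumed imports:
//   org.apache.hadoop.conf.Configuration
//   org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text
//   org.apache.hadoop.mapreduce.Job
// TokenizerMapper and IntSumReducer are hypothetical classes written by the user.
Job job = Job.getInstance(new Configuration(), "word count");
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);  // combiner runs on the map side
job.setReducerClass(IntSumReducer.class);   // same class reused as the reducer
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
```

A combiner class extends `Reducer`, and its input and output key/value types must both match the mapper's output types. Reusing the reducer class as the combiner, as in this sketch, is only safe when the operation gives the same result if applied in stages (summing does, averaging does not).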
Finally, the reducer output is written to HDFS for reliability. If you have any questions about this article, please post them in the comment box at the end.
Toodey Inc.