By | July 30, 2015

Hadoop Interview Questions and Answers – What is a combiner in the MapReduce framework?

In Map Reducer Framework Part 1, we discussed the mapper process in depth.

In Map Reducer Framework Part 2, we discussed the reducer process in depth.

In this article I will explain the combiner process in depth, so let's get started.

Combiner

  • A combiner is often called a mini reducer: it aggregates the mapper output (the intermediate key/value data) locally, on the mapper's machine, before that data is sent to the reducer. The combiner is applied to each mapper's output separately, and Hadoop treats it as an optimization that it may run zero, one, or several times on that output.
  • The advantage of a combiner is that it reduces the amount of data transferred between mapper and reducer, and therefore the transfer time. The combiner's output becomes the reducer's input.
  • Because the reducer receives less data, it finishes in less time, which improves performance. Let's see how this works with a diagram.
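The local aggregation described in the points above can be sketched in plain Java. This is only an illustration under assumptions: it uses no Hadoop classes, the class name `LocalCombineSketch` and the sample words are hypothetical, and it assumes a word-count style job where the combiner sums the counts for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are used here.
class LocalCombineSketch {

    // Sums the values for each key within ONE mapper's output, which is
    // what a sum-style combiner does before the data leaves the mapper's machine.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output for a word count: one (word, 1) pair per word.
        List<Map.Entry<String, Integer>> mapperOutput = new ArrayList<>();
        for (String word : "cat dog cat cat dog".split(" ")) {
            mapperOutput.add(new SimpleEntry<>(word, 1));
        }
        System.out.println("pairs before combine: " + mapperOutput.size()); // 5
        System.out.println("after combine: " + combine(mapperOutput));      // {cat=3, dog=2}
    }
}
```

Here five intermediate pairs shrink to two before anything would be sent over the network, and the totals per key are unchanged.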

Consider the scenario where you are not using any combiner class; see the diagram below.

In the above diagram there is no combiner. The input file is split across two mappers, and let's say they generate a total of 15 key/value pairs; look at the diagram carefully.

Now we have 15 key/value pairs of intermediate data. The mappers send this data directly to the reducer, and the transfer consumes network bandwidth (bandwidth simply means the amount of data the network can move from one machine to another per unit of time).



In production the reducer may be running on a different machine, so the mappers send all their intermediate data over the network, and the larger the data, the longer the transfer takes. In the diagram, the reducer has to receive and process all 15 key/value pairs.

Now if we use a combiner between the mapper and the reducer, each combiner aggregates its mapper's intermediate data locally before it is sent to the reducer. From the 15 key/value pairs, the combiners generate a total of 5 key/value pairs as output; please see the next diagram.

 

In this diagram the reducer only needs to process the 5 key/value pairs coming from the two combiners, so it has far less data to receive and process before producing the final output, which improves its performance. To get a clear idea of the combiner, please study these diagrams carefully.
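The 15-versus-5 comparison can be reproduced with a small plain-Java simulation. It is a sketch under assumptions, not Hadoop code: the class `CombinerTrafficSketch`, its methods, and the two input splits are hypothetical, chosen so that two mappers emit 15 pairs in total and their combiners emit 5, matching the counts in the text. A word-count job with a summing combiner is assumed.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation only; no Hadoop classes are used here.
class CombinerTrafficSketch {

    // Map step: emit one (word, 1) pair per word in the split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.split(" ")) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Reduce step: sum the values per key across everything received.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Combine step: the same summing logic, applied to one mapper's output only.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> pairs) {
        return new ArrayList<>(reduce(pairs).entrySet());
    }

    // Collects the records that would travel from the mappers to the reducer.
    static List<Map.Entry<String, Integer>> shuffle(boolean useCombiner, String... splits) {
        List<Map.Entry<String, Integer>> sent = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = map(split);
            sent.addAll(useCombiner ? combine(mapped) : mapped);
        }
        return sent;
    }

    public static void main(String[] args) {
        // Hypothetical splits: 8 words (3 distinct) and 7 words (2 distinct).
        String split1 = "apple banana cherry apple banana apple cherry apple";
        String split2 = "apple banana apple banana apple banana apple";

        List<Map.Entry<String, Integer>> plain = shuffle(false, split1, split2);
        List<Map.Entry<String, Integer>> combined = shuffle(true, split1, split2);

        System.out.println("records sent without combiner: " + plain.size());  // 15
        System.out.println("records sent with combiner: " + combined.size());  // 5
        System.out.println("final output: " + reduce(combined)); // {apple=8, banana=5, cherry=2}
        System.out.println("same result either way: " + reduce(plain).equals(reduce(combined))); // true
    }
}
```

The final totals are identical with and without the combiner; only the number of records crossing the network changes. That holds here because summing can safely be applied in stages, which is an assumption of this example rather than a property of every reduce function.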

Syntax for combiner class

job.setCombinerClass(MyCombiner.class);

At the end, the reducer output is written to HDFS for reliability. If you have any doubts about this article, please post your questions in the comment box at the end.

 
