Hadoop interview questions and answers: Why do mappers write their output to local disk and not to HDFS?
Let's explain this with a diagram.
In the diagram you can see two mappers generating intermediate data. As you know, this data contains duplicate keys; see the diagram below.
What we are really interested in is the more accurate, grouped data that is produced after the shuffle and sort phase. Storing the raw intermediate output in the Hadoop Distributed File System (HDFS), with replication, is therefore not a good idea.
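To make the duplicate keys and the shuffle and sort phase concrete, here is a minimal in-memory simulation in plain Java, assuming a word-count job. It is only a sketch: `map`, `shuffleAndSort` and `reduce` are hypothetical helpers written for illustration, not Hadoop's `Mapper`/`Reducer` API, and the input lines are made up. Two mappers emit seven (word, 1) records, several with the same key; after grouping by key, the reducer produces just four final records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Hypothetical map function (word-count style): emits (word, 1) for each word.
    // Not Hadoop's Mapper API; it only mimics the shape of intermediate records.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Simulated shuffle and sort: collects every value for a key from all mappers,
    // with keys kept in sorted order by the TreeMap.
    static TreeMap<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        return grouped;
    }

    // Simulated reduce: sums the grouped values for each key.
    static TreeMap<String, Integer> reduce(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Intermediate (temporary) data: note the repeated keys "bear" and "deer".
        List<Map.Entry<String, Integer>> mapper1 = map("deer bear river");
        List<Map.Entry<String, Integer>> mapper2 = map("car bear deer deer");
        System.out.println("Mapper 1 output: " + mapper1);
        System.out.println("Mapper 2 output: " + mapper2);

        TreeMap<String, List<Integer>> grouped = shuffleAndSort(List.of(mapper1, mapper2));
        System.out.println("After shuffle and sort: " + grouped);
        // prints "After shuffle and sort: {bear=[1, 1], car=[1], deer=[1, 1, 1], river=[1]}"

        System.out.println("Reducer output: " + reduce(grouped));
        // prints "Reducer output: {bear=2, car=1, deer=3, river=1}"
    }
}
```

Only the reducer output is worth keeping permanently; the mapper records exist just long enough to be grouped, which is why replicating them in HDFS buys little.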
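Mapper output is just temporary data: it is meaningful only to the reducers, not to the end user. Writing it to HDFS would occupy extra storage space, because every block is replicated, so to avoid this, mappers write their output to the local disk. If a node fails before the reducers have fetched its map output, Hadoop simply reruns that map task on another node to regenerate the output.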
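A quick back-of-the-envelope calculation shows the storage cost being avoided. This sketch assumes a hypothetical 10 GiB of intermediate map output and HDFS's default replication factor of 3 (`dfs.replication`); the class and method names are illustrative only.

```java
public class MapOutputFootprint {
    // Local disk: a single copy, which can be thrown away once the job completes.
    static long localDiskBytes(long mapOutputBytes) {
        return mapOutputBytes;
    }

    // HDFS: every block is stored replicationFactor times.
    static long hdfsBytes(long mapOutputBytes, int replicationFactor) {
        return mapOutputBytes * replicationFactor;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long mapOutput = 10L * gib; // hypothetical amount of intermediate data
        int replication = 3;        // HDFS default replication factor

        System.out.println("Local disk: " + localDiskBytes(mapOutput) / gib + " GiB");
        System.out.println("HDFS (replication " + replication + "): "
                + hdfsBytes(mapOutput, replication) / gib + " GiB");
        // prints "Local disk: 10 GiB" then "HDFS (replication 3): 30 GiB"
    }
}
```

With replication, the same throwaway data would take three times the space (20 GiB extra in this example), all for records that are discarded after the job finishes.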
Toodey Inc.