August 3, 2015

Hadoop interview questions and answers – What is Hadoop not designed for?

Low latency access to data

Hadoop HDFS is mainly designed for storing large amounts of data divided into blocks; the default block size is 64 MB, and reading an entire 64 MB block within milliseconds is not possible using the MapReduce framework. Low-latency access to data within Hadoop is therefore not a good idea. To overcome this, another popular tool, HBase, is used on top of Hadoop: go for HBase if you need low-latency data access within the Hadoop framework.
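A back-of-the-envelope calculation makes the latency point concrete. The disk throughput and seek-time figures below are illustrative assumptions, not Hadoop constants:

```python
# Why reading a whole HDFS block is a streaming (high-throughput)
# operation, not a low-latency one. Throughput and seek figures are
# assumed for illustration only.

BLOCK_SIZE_MB = 64          # default HDFS block size
DISK_THROUGHPUT_MB_S = 100  # assumed sequential read speed
SEEK_TIME_MS = 10           # assumed disk seek time

# Time to stream one full block sequentially from disk:
stream_ms = BLOCK_SIZE_MB / DISK_THROUGHPUT_MB_S * 1000
print(f"streaming one 64 MB block: ~{stream_ms:.0f} ms")  # ~640 ms

# A single random lookup (the HBase access pattern) is dominated by
# seek time: tens of milliseconds rather than hundreds.
print(f"one random seek: ~{SEEK_TIME_MS} ms")
```

Even before adding MapReduce job-startup overhead, reading a full block takes hundreds of milliseconds, which is why block-oriented HDFS reads cannot serve millisecond-level lookups.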

Storing many small files

If you know the working mechanism of the namenode, it stores all metadata information in memory. As per the definitive guide book, each file, directory, or block takes about 150 bytes. Now consider what happens if we have billions of files: you can imagine how much memory (RAM) would be required to manage all this file information (150 bytes multiplied by billions of files – NOT GOOD). It is very hard to manage details about billions of files in memory, and that metadata can be lost during a failure or system crash.
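The arithmetic above is worth working through. The 150-bytes-per-object figure is the one quoted from the definitive guide; the file count and the one-block-per-file assumption are illustrative:

```python
# Rough namenode memory estimate for the "many small files" problem.
# 150 bytes per filesystem object (file, directory, or block) is the
# figure from the definitive guide; the rest are assumptions.

BYTES_PER_OBJECT = 150
num_files = 1_000_000_000   # assume one billion small files

# Each small file needs at least a file entry and one block entry.
objects = num_files * 2
ram_gb = objects * BYTES_PER_OBJECT / 1024**3
print(f"~{ram_gb:.0f} GB of namenode RAM")  # ~279 GB
```

Hundreds of gigabytes of RAM on a single namenode just for metadata is exactly why storing many small files in HDFS is a bad fit.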

Appending to files

The rule of thumb is that Hadoop supports a single write and multiple reads on a particular file. Suppose you try to write into an existing file, say while running the wordcount example:

1st time run –

hadoop jar mywordcount.jar /usr/input/file.txt /usr/output/out.txt - it will succeed

2nd time run –

hadoop jar mywordcount.jar /usr/input/file.txt /usr/output/out.txt - it will fail

This time you will get an error saying that the output path already exists, because Hadoop does not append to (or overwrite) the existing output file (out.txt).
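The failure on the second run comes from job setup refusing to reuse an existing output path. A local-filesystem analogy of that check (plain Python, not Hadoop code) behaves the same way:

```python
import os
import tempfile

def create_output_dir(path):
    # Mirror Hadoop's behavior: refuse to reuse an existing output
    # path rather than silently overwriting or appending to it.
    os.makedirs(path, exist_ok=False)  # raises FileExistsError on rerun

out = os.path.join(tempfile.mkdtemp(), "out.txt")

create_output_dir(out)           # 1st run: succeeds
try:
    create_output_dir(out)       # 2nd run: fails, path already exists
except FileExistsError:
    print("output path already exists")
```

The fix in practice is the same as in the analogy: delete (or rename) the old output path before rerunning the job.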

Appending to files may be supported by enabling a Hadoop property in hdfs-site.xml:

<property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Allow appends to files.</description>
</property>

With this property you can append to an existing file, but there are still some problems with append functionality in Hadoop. You may need to choose an appropriate Hadoop distribution, because not all distributions support append; in a future release we can expect Hadoop to support append fully.

For Cloudera Hadoop – append is supported from Cloudera Distribution 3 (CDH3) onwards.
