Hadoop Interview Questions and Answers – What is Hadoop not designed for?
Low latency access to data
Hadoop HDFS is mainly designed for storing large amounts of data divided into blocks (the default block size is 64 MB in Hadoop 1.x). Reading an entire 64 MB block within milliseconds is not possible using the MapReduce framework, so low-latency data access within Hadoop is not a good idea. To overcome this, another popular tool, HBase, is used on top of Hadoop; you can go for HBase if you need low-latency data access within the Hadoop framework.
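A back-of-the-envelope calculation shows why block-oriented scans rule out millisecond access. This is a sketch: the 64 MB block size comes from the text above, while the 100 MB/s sequential disk throughput is an illustrative assumption, not a figure from the article.

```python
# Time to sequentially scan one HDFS block from a single disk.
BLOCK_MB = 64          # default HDFS block size (Hadoop 1.x)
DISK_MB_PER_S = 100    # assumed sequential read throughput

scan_ms = BLOCK_MB / DISK_MB_PER_S * 1000
print(scan_ms)  # 640.0 ms just to read one block, before any processing
```

Even before MapReduce job-startup overhead (typically tens of seconds), a single block scan is already hundreds of milliseconds, which is why random, low-latency reads are delegated to HBase instead.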
Storing many small files
If you know the working mechanism of the NameNode, it stores all metadata information in memory. As per the Definitive Guide book, each file, directory, or block takes about 150 bytes, so now consider what happens if we have billions of files.
You can imagine how much memory (RAM) would be required to manage all this file information (150 bytes multiplied by billions of files – NOT GOOD).
It is very hard to manage details about billions of files in memory, and that metadata can be lost during any failure or system crash.
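The memory pressure can be sketched with a quick calculation. This is a rough model: the 150-bytes-per-object figure is from the Definitive Guide as quoted above, the one-block-per-small-file assumption is mine, and real per-object overhead varies by Hadoop version.

```python
# Rough NameNode heap estimate for many small files.
BYTES_PER_OBJECT = 150  # one file, directory, or block entry (Definitive Guide)

def namenode_memory_gb(num_files, blocks_per_file=1):
    """Each small file costs one file object plus its block objects."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 1e9  # decimal GB

# One billion small files, one block each:
print(namenode_memory_gb(1_000_000_000))  # 300.0 GB of metadata in RAM
```

Hundreds of gigabytes of heap for a single JVM is impractical, which is why packing small files into larger containers (e.g. SequenceFiles or HAR archives) is the usual advice.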
Append on file
The rule of thumb is that Hadoop supports a single write and multiple reads on a particular file. You can see this if you try to append to an existing file, let's say during the word-count example:
1st run –
hadoop jar mywordcount.jar /usr/input/file.txt /usr/output/out.txt – it succeeds
2nd run –
hadoop jar mywordcount.jar /usr/input/file.txt /usr/output/out.txt – it fails
This time you will get an error saying that the output path already exists, because Hadoop does not support appending to the existing file (out.txt).
Appending to a file may be supported by enabling a Hadoop property in hdfs-site.xml:
<property>
  <name>dfs.support.append</name>
  <value>true</value>
  <description>Allow appends to files.</description>
</property>
With this property you can append to an existing file, but there are still some problems with the append functionality in Hadoop. You may need to choose an appropriate Hadoop distribution, because not all distributions support append; in a future release we can expect Hadoop to fully support it.
For Cloudera Hadoop – Append is supported from Cloudera distribution 3 (CDH3)