What is the secondary namenode ?
- The secondary namenode is one of the most confusing terms for Hadoop beginners. People often assume it is a standby that takes over when the namenode fails, but the truth is “it’s not”. The secondary namenode does a different job and is not a replacement for the namenode.
- The namenode is a single point of failure: if it fails, the whole Hadoop cluster becomes unavailable. To be able to recover the namenode, you need a backup of its metadata. Hadoop provides two ways to back up this metadata: writing it to a remote NFS mount, and using the secondary namenode.
- It is true that the secondary namenode is a helper node for the namenode. To see how it helps, let’s first look at some namenode internals.
The namenode stores the metadata of HDFS, such as block locations, block sizes, namespace information and more. During startup this information is loaded into the main memory of the namenode, and for long-term storage it is also kept on disk (block locations are the exception: they are rebuilt from datanode block reports rather than persisted).
Let’s understand this diagrammatically.
What is fsimage ?
When the namenode starts, it reads the initial state of the file system, that is, a filesystem snapshot, from the fsimage file. This initial state is nothing but the block information and namespace information.
What are edit logs ?
After the namenode has started, the fsimage contents are available in its main memory. Any later change, such as a file or block being deleted or updated, is recorded in the edit log.
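The relationship between the snapshot, the in-memory state and the edit log can be illustrated with a small toy model. This is only a sketch: the dictionary layout, the `record` and `apply_edit` helpers and the operation names are hypothetical and are not Hadoop's real on-disk format or API.

```python
# Hypothetical toy model: the namespace maps a path to its list of block IDs.
fsimage = {"/data/a.txt": ["blk_1"], "/data/b.txt": ["blk_2", "blk_3"]}

# On startup the snapshot is loaded into memory (copied, so the snapshot stays intact).
namespace = {path: list(blocks) for path, blocks in fsimage.items()}

edit_log = []  # append-only record of changes made after startup


def apply_edit(ns, edit):
    """Apply one logged operation to an in-memory namespace."""
    op, path = edit["op"], edit["path"]
    if op == "create":
        ns[path] = list(edit["blocks"])
    elif op == "delete":
        ns.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def record(edit):
    """Log the change first, then update the in-memory state."""
    edit_log.append(edit)
    apply_edit(namespace, edit)


record({"op": "create", "path": "/data/c.txt", "blocks": ["blk_4"]})
record({"op": "delete", "path": "/data/a.txt"})

print(sorted(namespace))  # in-memory view reflects both changes
print(sorted(fsimage))    # the snapshot is unchanged
print(len(edit_log))      # two edits waiting to be merged
```

Note that the snapshot itself is never touched while the namenode runs; everything that happened since startup lives only in memory and in the edit log.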
Now consider a scenario where many changes (deletions, updates) happen after the namenode has started. The edit log keeps growing and becomes harder to handle. It is merged with the fsimage only when the namenode restarts, and because of its size that merge takes a long time. If the edit log is lost or damaged in a crash in the meantime, all changes made since the last fsimage are lost with it.
That is why we need the secondary namenode: it helps the namenode by merging the edit log into the fsimage, which in turn keeps the edit log small.
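To make the restart cost concrete, here is a minimal sketch of a namenode rebuilding its state without any help from a secondary namenode. The `restart` function, the tuple-based edit format and the workload size are assumptions made for illustration only.

```python
def restart(fsimage, edit_log):
    """Rebuild the in-memory namespace: load the snapshot, then replay every edit."""
    namespace = dict(fsimage)
    replayed = 0
    for op, path in edit_log:
        if op == "create":
            namespace[path] = []
        elif op == "delete":
            namespace.pop(path, None)
        replayed += 1
    return namespace, replayed


fsimage = {"/a": []}
# Hypothetical workload: 50,000 files created since the last snapshot.
edit_log = [("create", f"/tmp/file{i}") for i in range(50_000)]

namespace, replayed = restart(fsimage, edit_log)
print(replayed)        # 50000 edits replayed at startup
print(len(namespace))  # 50001 paths in memory
```

The work done at startup grows with the number of logged edits, so the longer the namenode runs between merges, the longer the next restart takes.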
- The secondary namenode can also be called a checkpoint node or helper node. It runs on a separate machine.
- At a regular interval, the secondary namenode reads the edit log and merges it with the fsimage, which produces a new, updated fsimage. This updated fsimage is then copied back to the namenode.
- After each interval, the namenode therefore has an updated fsimage on its disk. This is applied at the next restart of the namenode: it again reads its initial fsimage from disk, and that copy is now the updated one.
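The checkpoint cycle described in the points above can be sketched as follows. This is a simplified model under stated assumptions, not Hadoop's implementation: `ToyNameNode`, `roll_edits`, `checkpoint` and `apply_edits` are hypothetical names, and the real transfer happens between two machines rather than inside one process.

```python
def apply_edits(image, edits):
    """Return a new image with the edits applied; the inputs are not modified."""
    merged = dict(image)
    for op, path in edits:
        if op == "create":
            merged[path] = []
        elif op == "delete":
            merged.pop(path, None)
    return merged


class ToyNameNode:
    def __init__(self, fsimage):
        self.fsimage = dict(fsimage)  # stands in for the snapshot on disk
        self.edits = []               # stands in for the edit log on disk

    def log(self, op, path):
        self.edits.append((op, path))

    def roll_edits(self):
        """Hand over the current edit log and start a fresh, empty one."""
        rolled, self.edits = self.edits, []
        return rolled


def checkpoint(namenode):
    """One checkpoint as performed by the (toy) secondary namenode."""
    image = dict(namenode.fsimage)         # 1. read the current fsimage
    edits = namenode.roll_edits()          # 2. read the edit log
    new_image = apply_edits(image, edits)  # 3. merge the two
    namenode.fsimage = new_image           # 4. copy the new fsimage back
    return len(edits)


nn = ToyNameNode({"/a": []})
nn.log("create", "/b")
nn.log("delete", "/a")

merged_count = checkpoint(nn)
nn.log("create", "/c")  # a change that arrives after the checkpoint

print(merged_count)        # 2 edits were folded into the fsimage
print(sorted(nn.fsimage))  # ['/b']
print(nn.edits)            # [('create', '/c')]

# At the next restart only the short remaining log has to be replayed.
print(sorted(apply_edits(nn.fsimage, nn.edits)))  # ['/b', '/c']
```

After the checkpoint the edit log contains only the changes made since the merge, which is what keeps the next restart short.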