I hope you all enjoyed Part 1 of this tutorial.
Now let's go further and discuss the three most important parts of Cloudera step by step.
Below is a sample diagram of these three components.
In this tutorial I will introduce CDH, Cloudera Manager, and Cloudera Navigator, so let's get started.
Cloudera Navigator
- Navigator is a data management tool that comes with Cloudera commercial support
- It helps data analysts gain insights from data and structure data of any type properly
- Navigator enables data management and auditing, and it helps data managers work on data governance
Cloudera Manager
- Cloudera Manager comes under commercial support, where you can get end-to-end help with configuration, deployment, enterprise-level IT support, and so on
- Cloudera Manager provides a GUI for admin work, so you can monitor and manage your entire Hadoop cluster
- The Cloudera Manager API is documented on the official Cloudera site; with simple API calls you can get information about your Cloudera Hadoop cluster, such as disk checks and cluster health
- With Cloudera Manager you can maintain a centralized configuration repository for all your DataNodes (hosts)
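As a rough illustration of the API point above, here is a minimal Python sketch that asks Cloudera Manager for its host list and summarizes host health. The base URL, the credentials, and the API version string `v19` are placeholders, not values from this tutorial, and the `hostname` and `healthSummary` field names are what I expect from the hosts endpoint, so check them against the API documentation for your release.

```python
import base64
import json
import urllib.request


def fetch_hosts(base_url, username, password, api_version="v19"):
    """Call GET <base_url>/api/<version>/hosts and return the parsed JSON.

    base_url, the credentials and api_version are placeholders (assumptions).
    """
    url = f"{base_url}/api/{api_version}/hosts"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize_host_health(payload):
    """Map each hostname to its health summary from a hosts response.

    Assumes the response looks like {"items": [{"hostname": ..., "healthSummary": ...}]}.
    """
    return {
        host.get("hostname", "unknown"): host.get("healthSummary", "UNKNOWN")
        for host in payload.get("items", [])
    }


def unhealthy_hosts(summary):
    """Return the sorted names of hosts whose health is anything other than GOOD."""
    return sorted(name for name, health in summary.items() if health != "GOOD")
```

Against a live cluster you would call something like `summarize_host_health(fetch_hosts("http://cm-host.example.com:7180", "admin", "secret"))`, where the host name and credentials are made up for the example. Keeping the parsing separate from the HTTP call makes it easy to try the logic on a saved response first.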
Cloudera Distribution Hadoop (CDH)
- CDH includes many open source components from the Hadoop ecosystem, such as Hive and Pig
- Two of the most important components that come with CDH are Cloudera Impala and Cloudera Search
- Let's look at an overview of these two components
Cloudera Impala
- Impala is generally faster than Hive for interactive queries, and it reads table metadata from the Hive metastore
- Impala is a parallel processing SQL engine for data analytics
- Impala supports SELECT, joins, aggregation, subqueries, and many other SQL-92 features
- Impala offers API support, so you can write a custom connector to Impala and the Hive metastore
- For security, you can manage authorization in Impala with Apache Sentry, the authorization framework used by Cloudera
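To make the Impala SQL features listed above concrete, here is a small sketch of one query that combines a join, an aggregation, and a subquery. The `customers` and `orders` tables are hypothetical. So that the example runs anywhere, it uses Python's built-in SQLite as a stand-in engine; against a real cluster you would pass in a cursor from an Impala client library instead (for example the `impyla` package, which is my suggestion and not something this tutorial covers).

```python
import sqlite3

# Join + aggregation + subquery: total spend per customer, counting only
# orders larger than the average order. Table names are hypothetical.
TOP_CUSTOMERS_SQL = """
SELECT c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.amount > (SELECT AVG(amount) FROM orders)
GROUP BY c.name
ORDER BY total DESC
"""


def run_query(cursor, sql=TOP_CUSTOMERS_SQL):
    """Execute the query on any DB-API style cursor and return all rows."""
    cursor.execute(sql)
    return cursor.fetchall()


def demo_cursor():
    """Build an in-memory SQLite cursor with sample data (stand-in, not Impala)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    cur.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    cur.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")]
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (1, 50.0), (2, 40.0), (2, 5.0)],
    )
    return cur
```

Calling `run_query(demo_cursor())` returns one row per customer, ordered by total. The SQL text is plain standard SQL, so it is the part that carries over; the SQLite setup exists only to make the sketch self-contained.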
Cloudera Search
- Cloudera Search is a full-text search facility built on Apache Solr; commercial support is available for it as well
- Cloudera Search manages indexing internally for faster searches
- The search engine is purely text based, so text matching the search criteria is found and displayed in the search GUI; it is a flexible and reliable solution
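Because Cloudera Search is built on Solr, a text search can also be expressed as a plain Solr query URL. The sketch below builds such a URL and pulls the matching documents out of a Solr-style JSON response. The host, port, and the collection name `logs` are made-up placeholders, and I am assuming the standard Solr `select` handler with the `q`, `rows`, and `wt` parameters is reachable in your setup.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, text, rows=10):
    """Build a Solr select URL for a text query.

    base_url and collection are placeholders for your own deployment.
    """
    params = urlencode({"q": text, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{collection}/select?{params}"


def extract_docs(payload):
    """Return (total_hits, documents) from a Solr-style JSON response.

    Assumes the usual shape {"response": {"numFound": N, "docs": [...]}}.
    """
    response = payload.get("response", {})
    return response.get("numFound", 0), response.get("docs", [])
```

For example, `build_search_url("http://search-host.example.com:8983", "logs", "hadoop cluster", rows=5)` produces a URL you can open in a browser or fetch with any HTTP client, and `extract_docs` can then be applied to the decoded JSON.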
We will discuss Cloudera further in Part 3 of this tutorial.