Hortonworks Hadoop HDP 2.2 – Introduction Part 5

By | July 15, 2015

 

In Part 4, we discussed the Security category available in Hortonworks (HDP) Hadoop. In this tutorial we will focus on the Operations category.

Operations

Operations can be defined in many ways. Broadly, it covers handling or managing the entire cluster or platform. Some of the terms that come under operations are:

  • Jobs and their sub-tasks
  • Managing services
  • Managing processes
  • Monitoring or inspection
  • Database (storage)

Here, in Hortonworks HDP, we will focus on deployment strategy and the management of tasks or jobs.

What is Apache ZooKeeper?

 

  • Mainly used for cluster management, coordination and synchronization of Hadoop services.
  • It also supports group services and centralized management of the entire cluster.
  • ZooKeeper also ensures that tasks across the cluster are synchronized in a serial (ordered) manner.
  • Distributed applications store their metadata or configuration information in ZooKeeper.
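
To make the last point more concrete, here is a minimal in-memory sketch of how a versioned configuration store behaves. It is not a real ZooKeeper client: the class `ToyConfigStore`, the exception `BadVersionError` and the path `/config/batch_size` are all hypothetical names invented for this illustration. The only ZooKeeper behaviour it imitates is that each node carries a version number, and a write that quotes a stale version is rejected.

```python
class BadVersionError(Exception):
    """Raised when a write quotes a version that is no longer current."""


class ToyConfigStore:
    """In-memory stand-in for a tree of config nodes (NOT a ZooKeeper client)."""

    def __init__(self):
        self._nodes = {}  # path -> (data, version)

    def create(self, path, data):
        # A freshly created node starts at version 0.
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        self._nodes[path] = (data, 0)

    def get(self, path):
        # Returns a (data, version) tuple.
        return self._nodes[path]

    def set(self, path, data, expected_version):
        # Accept the write only if the caller saw the latest version.
        _, current_version = self._nodes[path]
        if current_version != expected_version:
            raise BadVersionError(
                f"{path}: expected version {expected_version}, found {current_version}"
            )
        self._nodes[path] = (data, current_version + 1)
        return current_version + 1


store = ToyConfigStore()
store.create("/config/batch_size", "100")
value, version = store.get("/config/batch_size")
store.set("/config/batch_size", "200", expected_version=version)
print(store.get("/config/batch_size"))
```

Because every write must quote the version it read, two applications updating the same setting cannot silently overwrite each other; the slower one gets an error and has to re-read first.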

 

In Hortonworks HDP, Apache Ambari plays a very important role.

What is Apache Ambari? In short:

 

  • It's an open source framework for monitoring, controlling and managing a Hadoop cluster (Hortonworks HDP 2.2).
  • It provides the facility to install Hadoop on any number of hosts from a central location.
  • It makes it easy to manage the Hadoop cluster and its services.
  • It supports a RESTful web API.
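
As a rough idea of what the RESTful API looks like from a script, the sketch below builds (but does not send) a request for the list of services in a cluster and parses a reply. The host `ambari.example.com`, the cluster name `mycluster`, the `admin`/`admin` credentials and the `SAMPLE_RESPONSE` text are all placeholders made up for this example; the sample reply is hand-written to resemble the shape of an Ambari response and is not output from a real server.

```python
import base64
import json
import urllib.request


def build_services_request(host, cluster, user, password, port=8080):
    """Build (without sending) a GET request for a cluster's services."""
    url = (
        f"http://{host}:{port}/api/v1/clusters/{cluster}/services"
        "?fields=ServiceInfo/state"
    )
    # Ambari uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def service_states(response_body):
    """Map each service name to its state from a JSON response body."""
    payload = json.loads(response_body)
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
        for item in payload.get("items", [])
    }


# Hand-written sample reply (an assumption, not captured from a server).
SAMPLE_RESPONSE = """
{
  "items": [
    {"ServiceInfo": {"cluster_name": "mycluster", "service_name": "HDFS", "state": "STARTED"}},
    {"ServiceInfo": {"cluster_name": "mycluster", "service_name": "YARN", "state": "INSTALLED"}}
  ]
}
"""

request = build_services_request("ambari.example.com", "mycluster", "admin", "admin")
print(request.full_url)
print(service_states(SAMPLE_RESPONSE))
```

To talk to a real server you would pass `request` to `urllib.request.urlopen` and feed the body it returns to `service_states`.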

Below is a sample Ambari Dashboard screen of a Hortonworks cluster. I will discuss each component you see on this screen in a later post. For now, just focus on how the dashboard looks and how you can monitor and deploy various things using the Ambari GUI. On the right-hand side you can see the list of services that Hortonworks currently supports through Ambari, such as HDFS, YARN, MapReduce2 and more.

Using this Ambari dashboard you can monitor HDFS disk usage, live DataNodes, memory consumption, CPU utilization, cluster load, network usage and many other things.

Workflow Scheduling

What is Apache Oozie?

 

  • Oozie gives you fine-grained control over complex Hadoop (MapReduce) jobs.
  • It also helps with running jobs repeatedly, up to continuous scheduling of jobs, for example batch jobs.
  • You can also schedule Hive or Pig scripts, and you can specify the scheduling duration.
  • It's a scalable and reliable solution in the Hadoop ecosystem for job scheduling.
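
To picture what "scheduling duration" means for a repeating batch job, the sketch below lists the run times produced by a start time, an end time and a frequency in minutes. It is plain Python, not Oozie itself; the function name `nominal_times` and the example dates are made up for illustration, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """List run times from start (inclusive) up to end (exclusive, by assumption)."""
    if frequency_minutes <= 0:
        raise ValueError("frequency_minutes must be positive")
    step = timedelta(minutes=frequency_minutes)
    times = []
    current = start
    while current < end:
        times.append(current)
        current += step
    return times


# A batch job that repeats every 6 hours for one day.
runs = nominal_times(
    start=datetime(2015, 7, 15, 0, 0),
    end=datetime(2015, 7, 16, 0, 0),
    frequency_minutes=360,
)
for run in runs:
    print(run.isoformat())
```

In a real Oozie setup the same three pieces of information (start, end and frequency) are what you give the scheduler, and it triggers the workflow at each of those times.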

In a later tutorial we will discuss the Fair Scheduler and the Capacity Scheduler, and how they help with MapReduce job scheduling and parallel processing. For now, let's just get an overview of each.

Fair Scheduler

 

  • Resources are shared equally: each job is assigned an equal share of the cluster's resources.
  • It works on priorities, which act as weights when deciding each job's share.
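
The idea of equal sharing with priorities can be sketched with a few lines of arithmetic. This is only a toy calculation, not the scheduler's actual code: the function `fair_shares`, the job names and the 120 GB figure are hypothetical, and priorities are modelled as simple numeric weights.

```python
def fair_shares(total_resource, weights):
    """Split total_resource among jobs in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one job must have a positive weight")
    return {
        job: total_resource * weight / total_weight
        for job, weight in weights.items()
    }


# Three jobs with the same priority share 120 GB of memory equally.
equal = fair_shares(120, {"job_a": 1, "job_b": 1, "job_c": 1})
print(equal)

# Giving job_a twice the weight gives it twice the share of the others.
weighted = fair_shares(120, {"job_a": 2, "job_b": 1, "job_c": 1})
print(weighted)
```

With equal weights every job gets the same slice; raising one job's weight grows its slice at the expense of the others, while the total stays the same.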

 

Capacity Scheduler

 

  • It lets several organizations share one cluster while ensuring that each individual or organization with access to that cluster is given a specific capacity.
  • It has many advantages that help customers gain value from a cost-effective solution.
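
As a small illustration of "specific capacity per organization", the sketch below turns a set of queue percentages into Capacity Scheduler style properties and into guaranteed amounts of memory. The queue names `engineering` and `marketing`, the 200 GB cluster size and both helper functions are assumptions made for this example; the property names follow the `yarn.scheduler.capacity.root...` pattern used in `capacity-scheduler.xml`.

```python
def capacity_properties(queue_percentages):
    """Build capacity-scheduler style properties for queues under root."""
    total = sum(queue_percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"queue capacities must add up to 100, got {total}")
    properties = {
        "yarn.scheduler.capacity.root.queues": ",".join(queue_percentages)
    }
    for queue, percent in queue_percentages.items():
        properties[f"yarn.scheduler.capacity.root.{queue}.capacity"] = str(percent)
    return properties


def guaranteed_memory_gb(cluster_memory_gb, queue_percentages):
    """Translate each queue's percentage into a guaranteed amount of memory."""
    return {
        queue: cluster_memory_gb * percent / 100
        for queue, percent in queue_percentages.items()
    }


queues = {"engineering": 60, "marketing": 40}
properties = capacity_properties(queues)
for name, value in properties.items():
    print(f"{name} = {value}")
print(guaranteed_memory_gb(200, queues))
```

Each organization is guaranteed its slice even when the other is busy, which is the main difference from simply sharing everything equally.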
