By | July 23, 2015

Hive interview questions and answers

  • As you already know hive is SQL on Hadoop which means there must something really needed in Hive to manage all its internal table and schema, partition information in structured format, it’s not the only thing hive stores the data in Hadoop in structured format but it is also responsibility of Hive to handle table metadata management.
  • Let’s understand the secrets behind the Hive metadata, i am assuming that Hadoop is up and running and you have just installed Hive and want to configure its meta-store or want to change current meta-store address from other db to MySQL.
  • The default metadata store in hive is in Derby, it’s a database, you can change configuration to save all your hive metadata information into JDBC supportable databases.
  • Most popular databases for storing metadata is MySQL and PostgreSQL in this article we will discuss on MySQL meta-store for hive.
  • The purpose of storing this metadata information into relational databases because if we store this information at file level then performance of hive will be down, logically file reading will take more time as compared to relational storage that is why the purpose came to store this information in separate database.
  • To configure MySQL for hive meta-store you must know the jdbc url and parameter that is hostname, database name, port name and etc.

For configuration you can refer to hive-site.xml file which is hive configuration file normally it’s present in /etc/hive/conf, you can refer following properties for MySQL.


Important note

  • You need to download MySQL-driver.jar from MySQL official site and place this jar into class-path hive/lib or you can export it through command line.
  • Now your hive is ready to store all metadata information into MySQL database, you can connect to mysql console to look into database schema and how schema and table information get store into db.

There are different meta-store modes are available for storing Hive metadata :

local mode

  • Database service runs using different process(may be on separate host) and driver and meta-store services runs under same jvm instance

 Remote Mode

  • Hive meta-store runs under its own JVM process in remote mode normally experts recommend this mode
  • There can be multiple individual processes which can communicate with hive meta-store using thrift API we need to configure this property in hive-site.xml
  • Metadata info service then communicate with database using available JDBC driver, the advantage to keep Hive process and database provides better availability, JDBC login information is static which can be same for all hive users.

 Embedded Mode

  • In Embedded meta-store Derby is a default metadata storage available for hive if you have not set any alternate relation database like MySQL or PostgreSQL
  • Experts not recommend this for production use because it support only one active user login to database.

Leave a Reply

Your email address will not be published. Required fields are marked *