Category Archives: Hadoop

Big data and Hadoop tutorial

Hive JDBC metastore

In Hive, the default database for storing metadata is Derby, but its limitation is that it serves only one client at a time and cannot handle multiple concurrent client requests. To overcome this limitation, Hive can also connect to a database through a JDBC driver; that is, Hive can store all its metadata…

Why Apache Hive is schema on read, not schema on write

Why is Apache Hive called schema on read rather than schema on write? In relational database systems, when we perform an insert or update operation, the database has full control over the storage and can enforce the schema as the data is written. In relational databases this is called…

Create table in Hive using octal code

The reference example is taken from the book Programming Hive. Let's create a table in Hive:
CREATE TABLE employee (
  name STRING,
  salary FLOAT,
  subordinates ARRAY<STRING>,
  deductions MAP<STRING, FLOAT>,
  address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES…
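To make the role of the three octal delimiters concrete, here is a minimal Python sketch of how one text row for the employee table could be split apart. It is only an illustration and not Hive's own parsing code: the function name parse_employee and the sample values are hypothetical, and the sketch assumes every collection in the row is non-empty.

```python
# Octal escapes '\001', '\002', '\003' are the control characters
# Ctrl-A, Ctrl-B and Ctrl-C used as delimiters in the CREATE TABLE above.
FIELD, ITEM, KEY = "\001", "\002", "\003"


def parse_employee(line):
    """Split one delimited text row into a dict (illustrative helper)."""
    name, salary, subs, deds, addr = line.rstrip("\n").split(FIELD)

    # MAP entries are separated by ITEM; key and value are separated by KEY.
    deductions = {}
    for pair in deds.split(ITEM):
        key, value = pair.split(KEY)
        deductions[key] = float(value)

    # STRUCT members use the same separator as collection items.
    street, city, state, zip_code = addr.split(ITEM)

    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subs.split(ITEM),
        "deductions": deductions,
        "address": {"street": street, "city": city,
                    "state": state, "zip": int(zip_code)},
    }


# Hypothetical sample row built with the same delimiters.
sample = FIELD.join([
    "John Doe",
    "100000.0",
    ITEM.join(["Mary Smith", "Todd Jones"]),
    ITEM.join(["Federal Taxes" + KEY + "0.2", "State Taxes" + KEY + "0.05"]),
    ITEM.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
])
record = parse_employee(sample)
print(record["subordinates"])
```

Because these control characters rarely occur in ordinary text, they are less likely to collide with the data itself than a comma or a tab would be.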

How to add multiple partitions in Hive

A simple example that adds two partitions in one statement:
ALTER TABLE employee ADD IF NOT EXISTS
PARTITION (countryName = 'US') LOCATION '/employee/US'
PARTITION (countryName = 'EU') LOCATION '/employee/EU'
How to drop a partition in Hive:
ALTER TABLE employee DROP IF EXISTS PARTITION (countryName = 'US')
How to rename a column in Hive:
ALTER…

Partition internal or managed tables in Hive

How do you partition a managed (internal) Hive table? Let's say we need to partition the data by country. Note that the partition column is declared only in the PARTITIONED BY clause; Hive rejects a table that repeats it in the regular column list.
hive> CREATE TABLE countryDetails ( cid INT ) PARTITIONED BY (countryName STRING)
In this case a separate directory is created for each country, so US and EU each get their own directory.…
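The one-directory-per-country layout can be sketched in a few lines of Python. This is a hedged illustration, not Hive code: the helper partition_path is hypothetical, the warehouse root /user/hive/warehouse is assumed to be the default value of hive.metastore.warehouse.dir, and the lowercasing of the table and column names is an assumption about how Hive normalizes identifiers.

```python
import posixpath


def partition_path(warehouse_dir, table, partition_spec):
    """Build the directory for one partition (illustrative helper).

    Each partition column contributes one 'column=value' path segment
    under the table's directory.
    """
    segments = [f"{col.lower()}={val}" for col, val in partition_spec.items()]
    return posixpath.join(warehouse_dir, table.lower(), *segments)


# Assumed default warehouse root; a real cluster may configure another one.
WAREHOUSE = "/user/hive/warehouse"

us_dir = partition_path(WAREHOUSE, "countryDetails", {"countryName": "US"})
eu_dir = partition_path(WAREHOUSE, "countryDetails", {"countryName": "EU"})
print(us_dir)
print(eu_dir)
```

A query that filters on countryName only needs to read the matching directory, which is the main reason to partition the table in the first place.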

What are managed tables and external tables in Hive?

Managed tables in Hive: managed tables, also called internal tables, store their data under Hive's default warehouse directory (hive.metastore.warehouse.dir). One disadvantage is that Hive owns the data, which makes it awkward to share with other tools; that is where external tables come into the picture.…