Partition internal or managed tables in hive

By | April 17, 2017

How to partition managed or internal hive table?

Let’s say we need to partition the data country wise.

Hive> CREATE TABLE countryDetails ( cid INT, countryName STRING PARTITIONED BY (countryName STRING)

So in this case the country wise directory partitioned will be get created, so for US and EU there will be two separate directories will get created.

How to retrieve or read data from hive partition table?

Hive> SELECT * FROM employee WHERE countryName = ‘US’;

What is partition filter in hive?

If we add any predicates to WHERE clause that filters on PARTITION values, this predicates are called partition filter

Example – let’s say we have a country column partition and we need to also filter out the state inside that country.

How to display or show partitions in hive?

Hive> SHOW PARTITIONS employee;

How to display specific value partition in hive?

Hive> SHOW PARTITIONS employee PARTITION(country=‘US’);

Creating partition in hive while loading the data into hive table!

Hive> LOAD DATA INPATH ‘${env:HOME}/employee’ INTO TABLE employee PARTITION (countryName = ‘EU’)

The directory structure for this partition would be …./employee/countryName=EU

Hive external table partition

CREATE EXTERNAL TABLE IF NOT EXISTS employee (name STRING, countryName STRINGPARTITIONED BY (countryName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’;

It’s the same syntax like manage table partition, now how to alter partition and location of partition table in hive

ALTER TABLE employee ADD PARTITION(countryName = ‘CH’) LOCATION ‘hdfs”//ip:port/my/hive/partition/employee/CH’

How to update the location of hive partition table, let’s say we need to store this partition data into amazon s3 location.

ALTER TABLE employee PARTITION(countryName= ‘CH’) SET LOCATION ‘s3n://mybuck/employee/CH’

Now, let’s verify the hive partition location

DESCRIBE EXTENDED employee PARTITION (countryName= ‘CH’);




Tips of the day:

In hive record encoding is handle by an input format object.

Java class – org.apache.hadoop.mapred.TextInputFormat

In hive record parsing is handle by a SerDe (Serializer/Deserializer)

Java class – org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

In hive to write the output of queries to files hive uses output format class

Java class – org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

RECOMMENDED BOOK: PROGRAMMING HIVE

Leave a Reply

Your email address will not be published. Required fields are marked *