How to run Hive HQL files in Spark – Hive on Spark

October 25, 2017


In this tutorial I will explain how to execute a .hql or .sql file in Spark. I am using the spark-2.2.0-bin-hadoop2.7 standalone version.

You can download Spark here: Spark Download

Write a sample .hql file. In my case I have the following example file:

tushar@home:~/spark-2.2.0-bin-hadoop2.7$ cat exhiveql.hql

show databases;
use default;
show tables;
create database mydemo1;
create database mydemo2;
show databases;

To run a .hql file on the Spark engine we use the spark-sql shell: the -i option runs the given file as an initialization script, --master selects the cluster manager (local[*] uses all local cores), and --conf spark.ui.port pins the Spark UI to a fixed port.

tushar@home:~/spark-2.2.0-bin-hadoop2.7$ bin/spark-sql --master local[*] --conf spark.ui.port=12345 -i /home/tushar/spark-2.2.0-bin-hadoop2.7/exhiveql.hql

Once you run the above command, you will see the following output on the console.


log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).

log4j:WARN Please initialize the log4j system properly.

log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

17/10/25 10:33:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/10/25 10:33:15 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore

17/10/25 10:33:15 INFO ObjectStore: ObjectStore, initialize called

17/10/25 10:33:15 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored

17/10/25 10:33:15 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored

17/10/25 10:33:21 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"

17/10/25 10:33:22 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:22 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:26 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:26 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:28 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

17/10/25 10:33:28 INFO ObjectStore: Initialized ObjectStore

17/10/25 10:33:28 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0

17/10/25 10:33:28 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException

17/10/25 10:33:28 INFO HiveMetaStore: Added admin role in metastore

17/10/25 10:33:28 INFO HiveMetaStore: Added public role in metastore

17/10/25 10:33:29 INFO HiveMetaStore: No user is added in admin role, since config is empty

17/10/25 10:33:29 INFO HiveMetaStore: 0: get_all_databases

17/10/25 10:33:29 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_all_databases

17/10/25 10:33:29 INFO HiveMetaStore: 0: get_functions: db=default pat=*

17/10/25 10:33:29 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_functions: db=default pat=*

17/10/25 10:33:29 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:30 INFO SessionState: Created local directory: /tmp/3774bc10-edda-4f21-b7f9-d9a5791d734a_resources

17/10/25 10:33:30 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/3774bc10-edda-4f21-b7f9-d9a5791d734a

17/10/25 10:33:30 INFO SessionState: Created local directory: /tmp/zensar/3774bc10-edda-4f21-b7f9-d9a5791d734a

17/10/25 10:33:30 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/3774bc10-edda-4f21-b7f9-d9a5791d734a/_tmp_space.db

17/10/25 10:33:30 WARN Utils: Your hostname, BIDW41N0029 resolves to a loopback address: 127.0.0.1; using 10.42.150.38 instead (on interface enp2s0)

17/10/25 10:33:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

17/10/25 10:33:30 INFO SparkContext: Running Spark version 2.2.0

17/10/25 10:33:30 INFO SparkContext: Submitted application: SparkSQL::10.42.150.38

17/10/25 10:33:30 INFO SecurityManager: Changing view acls to: zensar

17/10/25 10:33:30 INFO SecurityManager: Changing modify acls to: zensar

17/10/25 10:33:30 INFO SecurityManager: Changing view acls groups to:

17/10/25 10:33:30 INFO SecurityManager: Changing modify acls groups to:

17/10/25 10:33:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zensar); groups with view permissions: Set(); users with modify permissions: Set(zensar); groups with modify permissions: Set()

17/10/25 10:33:30 INFO Utils: Successfully started service 'sparkDriver' on port 44975.

17/10/25 10:33:30 INFO SparkEnv: Registering MapOutputTracker

17/10/25 10:33:30 INFO SparkEnv: Registering BlockManagerMaster

17/10/25 10:33:30 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information

17/10/25 10:33:30 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up

17/10/25 10:33:30 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-3d7abda6-8dbe-49cd-992f-cb98304d735d

17/10/25 10:33:30 INFO MemoryStore: MemoryStore started with capacity 366.3 MB

17/10/25 10:33:30 INFO SparkEnv: Registering OutputCommitCoordinator

17/10/25 10:33:30 INFO Utils: Successfully started service 'SparkUI' on port 12345.

17/10/25 10:33:30 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.42.150.38:12345

17/10/25 10:33:30 INFO Executor: Starting executor ID driver on host localhost

17/10/25 10:33:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33788.

17/10/25 10:33:30 INFO NettyBlockTransferService: Server created on 10.42.150.38:33788

17/10/25 10:33:30 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy

17/10/25 10:33:30 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.42.150.38, 33788, None)

17/10/25 10:33:30 INFO BlockManagerMasterEndpoint: Registering block manager 10.42.150.38:33788 with 366.3 MB RAM, BlockManagerId(driver, 10.42.150.38, 33788, None)

17/10/25 10:33:30 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.42.150.38, 33788, None)

17/10/25 10:33:30 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.42.150.38, 33788, None)

17/10/25 10:33:30 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/').

17/10/25 10:33:30 INFO SharedState: Warehouse path is 'file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/'.

17/10/25 10:33:31 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.

17/10/25 10:33:31 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore

17/10/25 10:33:31 INFO ObjectStore: ObjectStore, initialize called

17/10/25 10:33:31 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored

17/10/25 10:33:31 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored

17/10/25 10:33:32 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"

17/10/25 10:33:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:33 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing

17/10/25 10:33:33 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY

17/10/25 10:33:33 INFO ObjectStore: Initialized ObjectStore

17/10/25 10:33:33 INFO HiveMetaStore: Added admin role in metastore

17/10/25 10:33:33 INFO HiveMetaStore: Added public role in metastore

17/10/25 10:33:33 INFO HiveMetaStore: No user is added in admin role, since config is empty

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_all_databases

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_all_databases

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_functions: db=default pat=*

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_functions: db=default pat=*

17/10/25 10:33:33 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_functions: db=mydb1 pat=*

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_functions: db=mydb1 pat=*

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_functions: db=mydb2 pat=*

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_functions: db=mydb2 pat=*

17/10/25 10:33:33 INFO SessionState: Created local directory: /tmp/6d4555e8-ef44-4a54-99a8-e6b1c4c1eb3e_resources

17/10/25 10:33:33 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/6d4555e8-ef44-4a54-99a8-e6b1c4c1eb3e

17/10/25 10:33:33 INFO SessionState: Created local directory: /tmp/zensar/6d4555e8-ef44-4a54-99a8-e6b1c4c1eb3e

17/10/25 10:33:33 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/6d4555e8-ef44-4a54-99a8-e6b1c4c1eb3e/_tmp_space.db

17/10/25 10:33:33 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_database: default

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_database: default

17/10/25 10:33:33 INFO HiveMetaStore: 0: get_database: global_temp

17/10/25 10:33:33 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_database: global_temp

17/10/25 10:33:33 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException

17/10/25 10:33:33 INFO SessionState: Created local directory: /tmp/5d1ea51b-7187-4ea2-828f-27498bc6b4e7_resources

17/10/25 10:33:33 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/5d1ea51b-7187-4ea2-828f-27498bc6b4e7

17/10/25 10:33:33 INFO SessionState: Created local directory: /tmp/zensar/5d1ea51b-7187-4ea2-828f-27498bc6b4e7

17/10/25 10:33:33 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/5d1ea51b-7187-4ea2-828f-27498bc6b4e7/_tmp_space.db

17/10/25 10:33:33 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/

17/10/25 10:33:34 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint

17/10/25 10:33:34 INFO SessionState: Created local directory: /tmp/9ee3267c-5d05-4752-93cb-1f808ea2e774_resources

17/10/25 10:33:34 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/9ee3267c-5d05-4752-93cb-1f808ea2e774

17/10/25 10:33:34 INFO SessionState: Created local directory: /tmp/zensar/9ee3267c-5d05-4752-93cb-1f808ea2e774

17/10/25 10:33:34 INFO SessionState: Created HDFS directory: /tmp/hive/zensar/9ee3267c-5d05-4752-93cb-1f808ea2e774/_tmp_space.db

17/10/25 10:33:34 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/

17/10/25 10:33:34 INFO SparkSqlParser: Parsing command: show databases

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_databases: *

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_databases: *

17/10/25 10:33:35 INFO CodeGenerator: Code generated in 147.993025 ms

default

mydb1

mydb2

17/10/25 10:33:35 INFO CliDriver: Time taken: 1.236 seconds, Fetched 3 row(s)

17/10/25 10:33:35 INFO SparkSqlParser: Parsing command:

use default

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_database: default

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_database: default

17/10/25 10:33:35 INFO CliDriver: Time taken: 0.008 seconds

17/10/25 10:33:35 INFO SparkSqlParser: Parsing command:

show tables

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_database: default

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_database: default

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_database: default

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_database: default

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_tables: db=default pat=*

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_tables: db=default pat=*

17/10/25 10:33:35 INFO CliDriver: Time taken: 0.036 seconds

17/10/25 10:33:35 INFO SparkSqlParser: Parsing command:

create database mydemo1

17/10/25 10:33:35 INFO HiveMetaStore: 0: create_database: Database(name:mydemo1, description:, locationUri:file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo1.db, parameters:{})

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=create_database: Database(name:mydemo1, description:, locationUri:file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo1.db, parameters:{})

17/10/25 10:33:35 WARN ObjectStore: Failed to get database mydemo1, returning NoSuchObjectException

17/10/25 10:33:35 INFO FileUtils: Creating directory if it doesn't exist: file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo1.db

17/10/25 10:33:35 INFO CliDriver: Time taken: 0.168 seconds

17/10/25 10:33:35 INFO SparkSqlParser: Parsing command:

create database mydemo2

17/10/25 10:33:35 INFO HiveMetaStore: 0: create_database: Database(name:mydemo2, description:, locationUri:file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo2.db, parameters:{})

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=create_database: Database(name:mydemo2, description:, locationUri:file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo2.db, parameters:{})

17/10/25 10:33:35 WARN ObjectStore: Failed to get database mydemo2, returning NoSuchObjectException

17/10/25 10:33:35 INFO FileUtils: Creating directory if it doesn't exist: file:/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/spark-warehouse/mydemo2.db

17/10/25 10:33:35 INFO CliDriver: Time taken: 0.036 seconds

17/10/25 10:33:35 INFO SparkSqlParser: Parsing command:

show databases

17/10/25 10:33:35 INFO HiveMetaStore: 0: get_databases: *

17/10/25 10:33:35 INFO audit: ugi=zensar ip=unknown-ip-addr cmd=get_databases: *

default

mydb1

mydb2

mydemo1

mydemo2

17/10/25 10:33:35 INFO CliDriver: Time taken: 0.007 seconds, Fetched 5 row(s)

spark-sql>




We can see the new databases have been created! To run the same file on a YARN cluster, refer to the command below:

$SPARK_HOME/bin/spark-sql --master yarn --conf spark.ui.port=12345 --executor-memory 8g --driver-memory 6g --num-executors 12 -i /home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/myhive.hql

Make sure to add the Hive warehouse configuration and other properties to your Spark configuration; for more information, please refer to https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark
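For reference, the same Hive settings can also be supplied programmatically when building a Spark session. Below is a minimal Scala sketch, assuming a remote metastore at thrift://metastore-host:9083 and a warehouse path of /user/hive/warehouse (both are placeholders; take the real values from your hive-site.xml):

import org.apache.spark.sql.SparkSession

// Minimal sketch: point Spark SQL at an existing Hive metastore.
// The metastore URI and warehouse path are placeholders for your environment.
val spark = SparkSession.builder()
  .appName("HiveOnSparkDemo")
  .config("hive.metastore.uris", "thrift://metastore-host:9083")
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
  .enableHiveSupport() // expose Hive tables to Spark SQL
  .getOrCreate()

spark.sql("show databases").show()

Alternatively, placing your hive-site.xml in Spark's conf/ directory achieves the same result without any code changes.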

Additionally, we can call .hql files from Python or Scala code.

Apache Spark Scala

val contextSQL = sqlContext.sql(scala.io.Source.fromFile("/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/myhive.hql").getLines().mkString("\n"))
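Note that the one-liner above works only when the file holds a single statement, because sqlContext.sql() executes one statement per call. Here is a hedged sketch that splits the file on semicolons and runs each statement in turn; splitting on a bare ';' is naive and will break on quoted semicolons, but it handles simple scripts like exhiveql.hql:

import scala.io.Source

// Sketch: execute a multi-statement .hql file one statement at a time.
// Naive split on ';' -- it fails if a statement contains a quoted semicolon.
val hql = Source.fromFile("/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/myhive.hql").mkString

hql.split(";")
   .map(_.trim)
   .filter(_.nonEmpty)
   .foreach { stmt =>
     println(s"Running: $stmt")
     sqlContext.sql(stmt).show() // sqlContext is predefined in spark-shell
   }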

Apache Spark Python

contextSQL = sqlContext.sql(open("/home/zensar/Downloads/spark-2.2.0-bin-hadoop2.7/myhive.hql").read()).count()

Note: open is a regular Python function; it is not a Spark API. As with the Scala one-liner, this works only when the file contains a single SQL statement.
