Apache Tajo™ catalog supports HiveCatalogStore to integrate with Apache Hive™. This integration allows Tajo to access all tables used in Apache Hive. Depending on your purpose, you can execute either SQL queries or HiveQL queries on the same tables managed in Apache Hive.
To use this feature, you need to build Tajo with the appropriate Maven profile
and then add some settings to the configuration files under conf/.
This section describes how to set up the HiveMetaStore integration.
The setup should take no more than five minutes.
First, set the environment variable HIVE_HOME in conf/tajo-env.sh
to your Hive home directory.
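A minimal sketch of that entry in conf/tajo-env.sh is shown here. The path /usr/local/hive is only a placeholder assumption; replace it with the directory where Hive is actually installed.

```shell
# conf/tajo-env.sh
# Placeholder path (an assumption): point this at your own Hive installation.
export HIVE_HOME=/usr/local/hive
```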
If you need to connect to HiveMetaStore through JDBC, you have to obtain the MySQL JDBC driver. Then set the environment variable HIVE_JDBC_DRIVER_DIR in conf/tajo-env.sh to the path of the MySQL JDBC driver jar file.
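For illustration, the corresponding entry might look like the line below. Both the directory and the jar file name are hypothetical; use the location and version of the MySQL JDBC driver you actually downloaded.

```shell
# conf/tajo-env.sh
# Hypothetical jar location and name (assumptions); substitute your own driver jar.
export HIVE_JDBC_DRIVER_DIR=/usr/local/hive/lib/mysql-connector-java.jar
```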
Finally, specify HiveCatalogStore as the Tajo catalog driver class in
conf/catalog-site.xml as follows:
<property>
  <name>tajo.catalog.store.class</name>
  <value>org.apache.tajo.catalog.store.HiveCatalogStore</value>
</property>
Hive stores a list of partitions for each table in its metastore. If new partitions are
added directly to HDFS, the Hive metastore will not be aware of them unless the user runs
an ALTER TABLE table_name ADD PARTITION command for each newly added partition or
runs the MSCK REPAIR TABLE table_name command.
However, Tajo currently does not provide an
ADD PARTITION command, and Hive does not provide an API for the
MSCK REPAIR TABLE command. Thus, if you insert data into a Hive partitioned
table and want to scan the updated partitions through Tajo, you must run the following command in Hive:
$ MSCK REPAIR TABLE [table_name];
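If you would rather trigger the repair from a shell script than from the Hive prompt, a small wrapper can be sketched as follows. It assumes the hive command-line client is on your PATH and relies on its -e option to run a single statement; the function name repair_hive_partitions is hypothetical and not part of Tajo or Hive.

```shell
# Run MSCK REPAIR TABLE for one table through the Hive CLI.
# repair_hive_partitions is a hypothetical helper name (an assumption).
repair_hive_partitions() {
  if [ -z "$1" ]; then
    echo "usage: repair_hive_partitions <table_name>" >&2
    return 1
  fi
  # hive -e executes the quoted statement and then exits.
  hive -e "MSCK REPAIR TABLE $1;"
}
```

For example, repair_hive_partitions sales would issue the repair for a table named sales (a made-up name), which could be run after each load into the partitioned table and before querying it from Tajo.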