Data Definition Language

CREATE DATABASE

Synopsis

CREATE DATABASE [IF NOT EXISTS] <database_name>

Description

Database is the namespace in Tajo. A database can contain multiple tables which have unique name in it. IF NOT EXISTS allows CREATE DATABASE statement to avoid an error which occurs when the database exists.

DROP DATABASE

Synopsis

DROP DATABASE [IF EXISTS] <database_name>

IF EXISTS allows DROP DATABASE statement to avoid an error which occurs when the database does not exist.

CREATE TABLE

Synopsis

CREATE TABLE [IF NOT EXISTS] <table_name> [(column_list)] [TABLESPACE tablespace_name]
[using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>]

CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (column_list)
using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>'

Description

In Tajo, there are two types of tables, managed table and external table. Managed tables are placed on some predefined tablespaces. The TABLESPACE clause is to specify a tablespace for this table. For external tables, Tajo allows an arbitrary table location with the LOCATION clause. For more information about tables and tablespace, please refer to Overview of Tajo Tables and Tablespaces.

column_list is a sequence of the column name and its type like <column_name> <data_type>, .... Additionally, the asterisk (*) is allowed for external tables when their data format is JSON. You can find more details at JSON.

IF NOT EXISTS allows CREATE [EXTERNAL] TABLE statement to avoid an error which occurs when the table does not exist.

Compression

If you want to add an external table that contains compressed data, you should give ‘compression.code’ parameter to CREATE TABLE statement.

create EXTERNAL table lineitem (
L_ORDERKEY bigint,
L_PARTKEY bigint,
...
L_COMMENT text)

USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')
LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
compression.codec parameter can have one of the following compression codecs:
  • org.apache.hadoop.io.compress.BZip2Codec
  • org.apache.hadoop.io.compress.DeflateCodec
  • org.apache.hadoop.io.compress.GzipCodec
  • org.apache.hadoop.io.compress.SnappyCodec

DROP TABLE

Synopsis

DROP TABLE [IF EXISTS] <table_name> [PURGE]

Description

IF EXISTS allows DROP DATABASE statement to avoid an error which occurs when the database does not exist. DROP TABLE statement removes a table from Tajo catalog, but it does not remove the contents. If PURGE option is given, DROP TABLE statement will eliminate the entry in the catalog as well as the contents.

CREATE INDEX

Synopsis

CREATE INDEX [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WHERE predicate ]

Description

Tajo supports index for fast data retrieval. Currently, index is supported for only plain TEXT formats stored on HDFS. For more information, please refer to Index (Experimental Feature).

Index method

Currently, Tajo supports only one type of index.

Index methods:
  • TWO_LEVEL_BIN_TREE: This method is used by default in Tajo. For more information about its structure, please refer to Index Types.

DROP INDEX

Synopsis

DROP INDEX name