Data Definition Language
CREATE DATABASE
Synopsis
CREATE DATABASE [IF NOT EXISTS] <database_name>
Description
A database is a namespace in Tajo. It can contain multiple tables, each with a name that is unique within that database. IF NOT EXISTS lets the CREATE DATABASE statement succeed without an error when the database already exists.
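As a minimal sketch of the synopsis above, the following statements create a database with and without the IF NOT EXISTS guard. The database name `analytics` is a hypothetical placeholder, not a name defined by Tajo.

```sql
-- Fails with an error if a database named analytics already exists.
CREATE DATABASE analytics;

-- Does not raise an error if analytics already exists.
CREATE DATABASE IF NOT EXISTS analytics;
```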
DROP DATABASE
Synopsis
DROP DATABASE [IF EXISTS] <database_name>
IF EXISTS lets the DROP DATABASE statement succeed without an error when the database does not exist.
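For illustration, dropping a database might look like this. The name `analytics` is again a hypothetical placeholder.

```sql
-- Does not raise an error if analytics does not exist.
DROP DATABASE IF EXISTS analytics;
```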
CREATE TABLE
Synopsis
CREATE TABLE [IF NOT EXISTS] <table_name> [(column_list)] [TABLESPACE tablespace_name]
[USING <storage_type> [WITH (<key> = <value>, ...)]] [AS <select_statement>]
CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (column_list)
USING <storage_type> [WITH (<key> = <value>, ...)] LOCATION '<path>'
Description
Tajo has two types of tables: managed tables and external tables. Managed tables are stored in predefined tablespaces, and the TABLESPACE clause specifies which tablespace a managed table is placed in. For external tables, the LOCATION clause allows an arbitrary table location. For more information about tables and tablespaces, please refer to Overview of Tajo Tables and Tablespaces.
column_list is a comma-separated sequence of column names and their types, in the form <column_name> <data_type>, .... Additionally, the asterisk (*) is allowed for external tables when their data format is JSON. You can find more details at JSON.
IF NOT EXISTS lets the CREATE [EXTERNAL] TABLE statement succeed without an error when the table already exists.
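The two table types can be sketched as follows. The table names, column names, and HDFS path are hypothetical and only illustrate the shape of each statement; the TEXT storage type and the 'text.delimiter' property are the ones used elsewhere on this page.

```sql
-- Managed table: Tajo decides where the data is stored.
CREATE TABLE IF NOT EXISTS events (
  id bigint,
  name text
) USING TEXT WITH ('text.delimiter'='|');

-- External table: the data stays at the location given by LOCATION.
-- The path below is a placeholder for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (
  id bigint,
  name text
) USING TEXT WITH ('text.delimiter'='|')
LOCATION 'hdfs://localhost:9010/tajo/warehouse/events_ext';
```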
Compression
To add an external table that contains compressed data, pass the 'compression.codec' parameter to the CREATE TABLE statement.
CREATE EXTERNAL TABLE lineitem (
L_ORDERKEY bigint,
L_PARTKEY bigint,
...
L_COMMENT text)
USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')
LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
The compression.codec parameter accepts one of the following compression codecs:
- org.apache.hadoop.io.compress.BZip2Codec
- org.apache.hadoop.io.compress.DeflateCodec
- org.apache.hadoop.io.compress.GzipCodec
- org.apache.hadoop.io.compress.SnappyCodec
DROP TABLE
Synopsis
DROP TABLE [IF EXISTS] <table_name> [PURGE]
Description
IF EXISTS lets the DROP TABLE statement succeed without an error when the table does not exist. The DROP TABLE statement removes a table from the Tajo catalog, but it does not remove the table's contents. If the PURGE option is given, DROP TABLE removes both the catalog entry and the contents.
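The difference between the two forms can be sketched as follows. The table names `events` and `events_old` are hypothetical placeholders.

```sql
-- Removes only the catalog entry; the underlying data is left in place.
DROP TABLE IF EXISTS events;

-- Removes the catalog entry and the table's contents.
DROP TABLE IF EXISTS events_old PURGE;
```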
CREATE INDEX
Synopsis
CREATE INDEX [ name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WHERE predicate ]
Description
Tajo supports indexes for fast data retrieval. Currently, indexes are supported only for plain TEXT format tables stored on HDFS. For more information, please refer to Index (Experimental Feature).
Index method
Currently, Tajo supports only one type of index.
- Index methods:
- TWO_LEVEL_BIN_TREE: This method is used by default in Tajo. For more information about its structure, please refer to Index Types.
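As a hedged sketch following the synopsis above, an index on the lineitem table from the compression example might be created like this. The index names are hypothetical, and the statements assume lineitem is a plain TEXT table stored on HDFS.

```sql
-- Uses the default index method.
CREATE INDEX l_orderkey_idx ON lineitem (l_orderkey);

-- Names the index method and the sort order explicitly.
CREATE INDEX l_partkey_idx ON lineitem USING TWO_LEVEL_BIN_TREE (l_partkey ASC);
```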