How to use index?¶
1. Create index¶
The first step for utilizing index is to create an index. You can create an index using SQL (Data Definition Language) or Tajo API (Tajo Client API). For example, the following SQL statement will create a BST index on the lineitem table.
default> create index l_orderkey_idx on lineitem (l_orderkey);
If the index is created successfully, you can see the index information as follows:
default> \d lineitem table name: default.lineitem table path: hdfs://localhost:7020/tpch/lineitem store type: TEXT number of rows: unknown volume: 753.9 MB Options: 'text.delimiter'='|' schema: l_orderkey INT8 l_partkey INT8 l_suppkey INT8 l_linenumber INT8 l_quantity FLOAT4 l_extendedprice FLOAT4 l_discount FLOAT4 l_tax FLOAT4 l_returnflag TEXT l_linestatus TEXT l_shipdate DATE l_commitdate DATE l_receiptdate DATE l_shipinstruct TEXT l_shipmode TEXT l_comment TEXT Indexes: "l_orderkey_idx" TWO_LEVEL_BIN_TREE (l_orderkey ASC NULLS LAST )
For more information about index creation, please refer to the above links.
2. Enable/disable index scans¶
Reading data using index is disabled by default. So, exploiting the created index, you need a further step, enabling ‘index scan’ as following:
default> \set INDEX_ENABLED true
If you don’t want to use index scan anymore, you can simply disable it as follows:
default> \set INDEX_ENABLED false
Once index scan is enabled, Tajo will perform ‘index scan’ if possible. In some cases, it may cause performance degradation. If you always want to get better performance, you should either enable or disable ‘index scan’ according to selectivity. Usually, the performance gain of index will increase when the selectivity is low.