Session Variables

When a Tajo client connects to the Tajo master, the master assigns it a unique session. The session is kept until the client disconnects or the session expires.

To make user configuration more convenient, Tajo provides session variables, which allow each session to have its own configuration.

tsql provides the meta command \set to manipulate session variables. Running \set without arguments shows all session variables.

default> \set
'name1'='val1'
'name2'='val2'
'name3'='val3'
     ...

\set key val sets the session variable named key to the value val.

default> \set
'CURRENT_DATABASE'='default'

default> \set key1 val1

default> \set
'CURRENT_DATABASE'='default'
'key1'='val1'

Likewise, \unset key removes the session variable named key.
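To make the per-session behavior concrete, here is a minimal Python model of what \set and \unset do to one session's variables. The `Session` class and its methods are hypothetical illustrations, not Tajo's client API; the point is only that each session owns a separate key/value map.

```python
class Session:
    """Hypothetical model of one tsql session's variables (not Tajo's API)."""

    def __init__(self, database: str = "default") -> None:
        self.variables = {"CURRENT_DATABASE": database}

    def set(self, key: str, value: str) -> None:
        """Model of `\\set key value`."""
        self.variables[key] = value

    def unset(self, key: str) -> None:
        """Model of `\\unset key`; unknown keys are simply ignored in this sketch."""
        self.variables.pop(key, None)

    def show(self) -> list:
        """Model of a bare `\\set`: one 'name'='value' line per variable."""
        return [f"'{k}'='{v}'" for k, v in self.variables.items()]


a, b = Session(), Session()
a.set("key1", "val1")
print(a.show())  # ["'CURRENT_DATABASE'='default'", "'key1'='val1'"]
print(b.show())  # ["'CURRENT_DATABASE'='default'"]  (other sessions are unaffected)
a.unset("key1")
```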

Currently, Tajo provides the following session variables.

BROADCAST_NON_CROSS_JOIN_THRESHOLD

The threshold for non-cross joins. When a non-cross join query is executed with a broadcast join, the total size of the broadcast tables does not exceed this threshold.

\set BROADCAST_NON_CROSS_JOIN_THRESHOLD 5120

BROADCAST_CROSS_JOIN_THRESHOLD

The threshold for cross joins. When a cross join query is executed, the total size of the broadcast tables does not exceed this threshold.

\set BROADCAST_CROSS_JOIN_THRESHOLD 1024

Warning

In Tajo, the broadcast join is the only way to perform cross joins. Since a cross join is a very expensive operation, this value needs to be tuned carefully.
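The two broadcast thresholds act as a size budget for the tables a join would broadcast. The sketch below is a hypothetical illustration, not Tajo's planner code: it assumes sizes are in KB (so the example values 5120 and 1024 would mean 5 MB and 1 MB, which is an assumption) and that a set of tables qualifies for broadcasting only if their total size stays within the threshold for the join type.

```python
# Hypothetical session values, assumed here to be in KB.
BROADCAST_NON_CROSS_JOIN_THRESHOLD = 5120
BROADCAST_CROSS_JOIN_THRESHOLD = 1024


def can_broadcast(table_sizes_kb: list, is_cross_join: bool) -> bool:
    """Return True if the tables to broadcast fit within the applicable threshold."""
    threshold = (BROADCAST_CROSS_JOIN_THRESHOLD if is_cross_join
                 else BROADCAST_NON_CROSS_JOIN_THRESHOLD)
    return sum(table_sizes_kb) <= threshold


print(can_broadcast([2048, 1024], is_cross_join=False))  # True: 3072 <= 5120
print(can_broadcast([2048, 1024], is_cross_join=True))   # False: 3072 > 1024
```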

JOIN_TASK_INPUT_SIZE

The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of parallelism of the join query.

\set JOIN_TASK_INPUT_SIZE 64
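As a back-of-the-envelope illustration of "degree of parallelism": if each second-stage task processes about JOIN_TASK_INPUT_SIZE of input, the task count is roughly the total input divided by that value, rounded up. This sketch assumes the value is in MB and ignores any other limits the scheduler may apply; `second_stage_tasks` is a hypothetical helper.

```python
import math


def second_stage_tasks(total_input_mb: float, task_input_size_mb: float) -> int:
    """Approximate number of second-stage tasks (at least one)."""
    return max(1, math.ceil(total_input_mb / task_input_size_mb))


# 10 GB of join input with JOIN_TASK_INPUT_SIZE = 64 (MB):
print(second_stage_tasks(10 * 1024, 64))  # 160
# Halving the value doubles the parallelism:
print(second_stage_tasks(10 * 1024, 32))  # 320
```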
JOIN_PER_SHUFFLE_SIZE

The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.

\set JOIN_PER_SHUFFLE_SIZE 128
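The sketch below illustrates how a per-shuffle size turns into a partition count, and why a repartition join can then proceed partition by partition. It assumes MB units and hash partitioning on the join key (the usual scheme for repartition joins, assumed rather than taken from this page); `num_partitions` and `shuffle` are hypothetical helpers.

```python
import math


def num_partitions(stage_output_mb: float, per_shuffle_size_mb: float) -> int:
    """Partition count if each partition holds about per_shuffle_size_mb of output."""
    return max(1, math.ceil(stage_output_mb / per_shuffle_size_mb))


def shuffle(rows, partitions: int):
    """Hash-partition (key, value) rows on the join key."""
    buckets = [[] for _ in range(partitions)]
    for key, value in rows:
        buckets[hash(key) % partitions].append((key, value))
    return buckets


n = num_partitions(1000, 128)  # 1000 MB of first-stage output -> 8 partitions
left = shuffle([(1, "a"), (2, "b"), (3, "c")], n)
right = shuffle([(3, "x"), (1, "y")], n)
# Equal keys from both tables share a partition index, so second-stage
# task i only needs left[i] and right[i] to produce its part of the join.
print(left[1], right[1])  # [(1, 'a')] [(1, 'y')]
```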
HASH_JOIN_SIZE_LIMIT

This value is the criterion for choosing the algorithm that performs a join within a task. If the input data is smaller than this value, the join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

\set HASH_JOIN_SIZE_LIMIT 64

Warning

This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap, its actual size is usually much larger than the configured value, so a threshold that is too large can cause unexpected OutOfMemoryError failures. This value should be tuned carefully.
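The per-task rule above reduces to a single comparison. This is a hypothetical sketch: the function name is illustrative, and it assumes the task input and the limit are measured in the same unit (MB here). Note that an input exactly equal to the limit is not "smaller", so it falls back to the sort-merge join.

```python
def choose_join_algorithm(task_input_mb: float, hash_join_size_limit_mb: float) -> str:
    """Pick the per-task join algorithm as described for HASH_JOIN_SIZE_LIMIT."""
    if task_input_mb < hash_join_size_limit_mb:
        return "in-memory hash join"
    return "sort-merge join"


print(choose_join_algorithm(40, 64))   # in-memory hash join
print(choose_join_algorithm(200, 64))  # sort-merge join
```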

INNER_HASH_JOIN_SIZE_LIMIT

This value is the criterion for choosing the algorithm that performs an inner join within a task. If the input data is smaller than this value, the inner join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

\set INNER_HASH_JOIN_SIZE_LIMIT 64

Warning

This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap, its actual size is usually much larger than the configured value, so a threshold that is too large can cause unexpected OutOfMemoryError failures. This value should be tuned carefully.
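When the input exceeds the limit, the task uses a sort-merge join, which needs sorted inputs instead of an in-memory hash table. The following textbook sort-merge inner join is for illustration only and is not Tajo's implementation; it sorts both inputs in memory for brevity, whereas an engine handling large inputs would sort externally.

```python
def sort_merge_inner_join(left, right):
    """Inner-join two lists of (key, value) pairs on key.

    Both inputs are sorted by key, then scanned once with two cursors.
    Runs of equal keys produce their cross product, as an inner join requires.
    """
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


print(sort_merge_inner_join([(2, "b"), (1, "a"), (2, "c")], [(2, "x"), (3, "y")]))
# [(2, 'b', 'x'), (2, 'c', 'x')]
```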

OUTER_HASH_JOIN_SIZE_LIMIT

This value is the criterion for choosing the algorithm that performs an outer join within a task. If the input data is smaller than this value, the outer join is performed with the in-memory hash join. Otherwise, the sort-merge join is used.

\set OUTER_HASH_JOIN_SIZE_LIMIT 64

Warning

This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap, its actual size is usually much larger than the configured value, so a threshold that is too large can cause unexpected OutOfMemoryError failures. This value should be tuned carefully.

JOIN_HASH_TABLE_SIZE

The initial size of the hash table for the in-memory hash join.

\set JOIN_HASH_TABLE_SIZE 100000
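For context, an in-memory hash join builds a hash table from one input and probes it with the other. JOIN_HASH_TABLE_SIZE sets the initial size of that table so that it need not grow repeatedly during the build phase. The textbook sketch below is illustrative only. Python's dict cannot be presized, so the initial size has no direct counterpart here; in Java it would correspond to something like `new HashMap<>(initialCapacity)` (an analogy, not a claim about Tajo's internals).

```python
from collections import defaultdict


def hash_inner_join(build, probe):
    """Inner-join (key, value) pairs: hash the build side, then probe it."""
    # Build phase: group build-side values by join key.
    table = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    # Probe phase: look up each probe-side key; .get avoids inserting misses.
    out = []
    for key, pv in probe:
        for bv in table.get(key, []):
            out.append((key, bv, pv))
    return out


print(hash_inner_join([(1, "a"), (2, "b")], [(2, "x"), (2, "y"), (3, "z")]))
# [(2, 'b', 'x'), (2, 'b', 'y')]
```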
SORT_TASK_INPUT_SIZE

The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of parallelism of the sort query.

\set SORT_TASK_INPUT_SIZE 64
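One common way to run a sort in two stages, sketched here as an assumption about the general approach rather than a description of Tajo's code, is range partitioning. The first stage routes each row to a key range, and each second-stage task sorts one range, so concatenating the task outputs in range order gives a globally sorted result. Under this model, SORT_TASK_INPUT_SIZE controls how many ranges (tasks) there are. The split points below are hypothetical; a real engine would derive them from the data.

```python
import bisect


def range_partition(values, split_points):
    """Stage 1: route each value to the key range it falls into."""
    ranges = [[] for _ in range(len(split_points) + 1)]
    for v in values:
        ranges[bisect.bisect_right(split_points, v)].append(v)
    return ranges


def two_stage_sort(values, split_points):
    """Stage 2: sort each range independently, then concatenate in range order."""
    result = []
    for part in range_partition(values, split_points):
        result.extend(sorted(part))  # each part could be a separate task
    return result


data = [42, 7, 99, 13, 58, 3, 71]
print(two_stage_sort(data, split_points=[20, 60]))  # [3, 7, 13, 42, 58, 71, 99]
```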
EXTSORT_BUFFER_SIZE

The threshold for choosing the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. Otherwise, the data is sorted in memory.

\set EXTSORT_BUFFER_SIZE 200
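An external sort can be sketched in two phases: sort the input in buffer-sized runs, then merge the sorted runs. This Python version is illustrative, not Tajo's implementation. It measures the buffer in rows rather than in the unit EXTSORT_BUFFER_SIZE uses (likely MB, but that is an assumption), and it keeps the runs in memory where a real external sort would spill them to disk.

```python
import heapq


def external_sort(values, buffer_rows):
    """Sort by splitting into runs of at most buffer_rows, then k-way merging."""
    # Phase 1: produce sorted runs no larger than the buffer.
    runs = [sorted(values[i:i + buffer_rows])
            for i in range(0, len(values), buffer_rows)]
    # Phase 2: merge all runs; heapq.merge consumes them lazily.
    return list(heapq.merge(*runs))


print(external_sort([9, 4, 7, 1, 8, 2, 6], buffer_rows=3))  # [1, 2, 4, 6, 7, 8, 9]
```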
SORT_LIST_SIZE

The initial size of the list for the in-memory sort.

\set SORT_LIST_SIZE 100000

GROUPBY_MULTI_LEVEL_ENABLED

A flag to enable the multi-level algorithm for distinct aggregation. If this value is true, the 3-phase aggregation algorithm is used. Otherwise, the 2-phase aggregation algorithm is used.

\set GROUPBY_MULTI_LEVEL_ENABLED true

GROUPBY_PER_SHUFFLE_SIZE

An aggregation is executed in two stages. When an aggregation query is executed, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.

\set GROUPBY_PER_SHUFFLE_SIZE 256

GROUPBY_TASK_INPUT_SIZE

An aggregation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of parallelism of the aggregation query.

\set GROUPBY_TASK_INPUT_SIZE 64
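The two aggregation stages follow the familiar partial/final pattern. The sketch below is a generic illustration, assuming SUM as the aggregate: each first-stage task pre-aggregates its own input, the partial results are shuffled by group key, and each second-stage partition merges the partials for its keys. The task inputs and the partition count are hypothetical.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Stage 1 (per task): pre-aggregate SUM(value) by key on local input."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return acc


def final_aggregate(partials, partitions):
    """Shuffle partials by key hash, then merge each partition independently."""
    buckets = [defaultdict(int) for _ in range(partitions)]
    for partial in partials:
        for key, subtotal in partial.items():
            buckets[hash(key) % partitions][key] += subtotal
    # Each key lands in exactly one bucket, so each bucket holds final totals.
    result = {}
    for bucket in buckets:
        result.update(bucket)
    return result


task_inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
totals = final_aggregate([partial_aggregate(t) for t in task_inputs], partitions=2)
print(sorted(totals.items()))  # [('a', 4), ('b', 7), ('c', 4)]
```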
HASH_GROUPBY_SIZE_LIMIT

This value is the criterion for choosing the algorithm that performs an aggregation within a task. If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. Otherwise, the sort-based aggregation is used.

\set HASH_GROUPBY_SIZE_LIMIT 64

Warning

This value refers to the size of the input as stored on the file system. When the input data is loaded into the JVM heap, its actual size is usually much larger than the configured value, so a threshold that is too large can cause unexpected OutOfMemoryError failures. This value should be tuned carefully.
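The two per-task strategies that HASH_GROUPBY_SIZE_LIMIT chooses between compute the same result with different memory profiles: hash aggregation keeps one accumulator per group in memory, while sort-based aggregation sorts the input and then only needs the state of the current group. The sketch below is a generic illustration using SUM, and the limit comparison assumes both sizes are in MB.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """In-memory hash aggregation: one accumulator per group."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return sorted(acc.items())


def sort_aggregate(rows):
    """Sort-based aggregation: sort by key, then sum each consecutive run."""
    ordered = sorted(rows, key=itemgetter(0))
    return [(key, sum(v for _, v in grp))
            for key, grp in groupby(ordered, key=itemgetter(0))]


def aggregate(rows, input_mb, hash_groupby_size_limit_mb=64):
    """Choose the strategy by the rule described for HASH_GROUPBY_SIZE_LIMIT."""
    if input_mb < hash_groupby_size_limit_mb:
        return hash_aggregate(rows)
    return sort_aggregate(rows)


rows = [("x", 2), ("y", 1), ("x", 5)]
print(aggregate(rows, input_mb=10))   # [('x', 7), ('y', 1)]
print(aggregate(rows, input_mb=500))  # [('x', 7), ('y', 1)]
```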

AGG_HASH_TABLE_SIZE

The initial size of the hash table for in-memory aggregation.

\set AGG_HASH_TABLE_SIZE 10000

TIMEZONE

Refer to Time Zone.

  • Configuration name: tajo.timezone
  • Property value: Time zone id
  • Default value: Default time zone of JVM
  • Example
\set TIMEZONE GMT+9

DATE_ORDER

Date order specification, for example YMD (year, month, day).

\set DATE_ORDER YMD

PARTITION_NO_RESULT_OVERWRITE_ENABLED

If this value is true, a partitioned table is overwritten even if the subquery returns no result. Otherwise, the existing table data is kept when there is no result.

\set PARTITION_NO_RESULT_OVERWRITE_ENABLED false

TABLE_PARTITION_PER_SHUFFLE_SIZE

In Tajo, storing a partitioned table is executed in two stages. This value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between the two stages.

\set TABLE_PARTITION_PER_SHUFFLE_SIZE 256

ARITHABORT

A flag that indicates how to handle errors caused by invalid arithmetic operations. If true, a running query is terminated when an overflow or a division by zero occurs.

\set ARITHABORT false
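The sketch below illustrates the flag's effect on division. It is hypothetical: this page only states that a true value terminates the query, so returning None (standing in for SQL NULL) when the flag is false is an assumption for illustration, not documented Tajo behavior. `ArithmeticAbort` and `divide` are illustrative names.

```python
class ArithmeticAbort(Exception):
    """Hypothetical error standing in for a terminated query."""


def divide(a: float, b: float, arithabort: bool):
    """Division under an ARITHABORT-like flag."""
    if b == 0:
        if arithabort:
            raise ArithmeticAbort("division by zero")
        return None  # assumption: the invalid result becomes NULL instead
    return a / b


print(divide(6, 3, arithabort=False))  # 2.0
print(divide(1, 0, arithabort=False))  # None
```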
MAX_OUTPUT_FILE_SIZE

The maximum size of each output file. 0 means unlimited.

  • Property value: Integer
  • Unit: MB
  • Default value: 0
  • Example
\set MAX_OUTPUT_FILE_SIZE 0
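One way for a writer to honor a per-file size cap is to roll over to a new file once the next record would push the current file past the limit. The sketch below is hypothetical: it works in bytes on in-memory records rather than in MB on disk, and it treats 0 as "no limit", matching the description above.

```python
def split_into_files(records, max_file_size: int):
    """Group byte records into output 'files' whose total size stays within the cap.

    A max_file_size of 0 means unlimited, so everything goes into one file.
    A single record larger than the cap still gets a file of its own.
    """
    files = [[]]
    current = 0
    for rec in records:
        if max_file_size and files[-1] and current + len(rec) > max_file_size:
            files.append([])  # roll over to a new output file
            current = 0
        files[-1].append(rec)
        current += len(rec)
    return files


recs = [b"aaaa", b"bbbb", b"cccc", b"dd"]
print([len(f) for f in split_into_files(recs, 8)])  # [2, 2]
print([len(f) for f in split_into_files(recs, 0)])  # [4]
```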
SESSION_EXPIRY_TIME

Session expiry time.

  • Property value: Integer
  • Unit: seconds
  • Default value: 3600
  • Example
\set SESSION_EXPIRY_TIME 3600
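A minimal sketch of an expiry check, assuming (this page does not confirm it) that the timer measures the time since the session's last activity. `SessionClock` is a hypothetical name; `now` can be passed explicitly to make the check testable.

```python
import time
from typing import Optional


class SessionClock:
    """Hypothetical idle-timeout tracker for one session."""

    def __init__(self, expiry_seconds: int = 3600) -> None:
        self.expiry_seconds = expiry_seconds
        self.last_access = time.monotonic()

    def touch(self) -> None:
        """Record activity (for example, a query) on the session."""
        self.last_access = time.monotonic()

    def is_expired(self, now: Optional[float] = None) -> bool:
        """True once more than expiry_seconds have passed since the last access."""
        now = time.monotonic() if now is None else now
        return now - self.last_access > self.expiry_seconds


clock = SessionClock(expiry_seconds=3600)
print(clock.is_expired(now=clock.last_access + 10))    # False
print(clock.is_expired(now=clock.last_access + 3601))  # True
```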
CLI_COLUMNS

Sets the width for the wrapped format.

  • Property value: Integer
  • Default value: 120
  • Example
\set CLI_COLUMNS 120

CLI_NULL_CHAR

Sets the string to be printed in place of a null value.

  • Property value: String
  • Default value: ''
  • Example
\set CLI_NULL_CHAR ''

CLI_PAGE_ROWS

Sets the number of rows for paging.

  • Property value: Integer
  • Default value: 100
  • Example
\set CLI_PAGE_ROWS 100

CLI_PAGING_ENABLED

Enables paging of the result display.

  • Property value: Boolean
  • Default value: true
  • Example
\set CLI_PAGING_ENABLED true
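Together, CLI_PAGE_ROWS and CLI_PAGING_ENABLED determine how a result is split for display. The hypothetical sketch below shows only that split; whatever interaction tsql performs between pages is omitted.

```python
from typing import Iterator, Sequence


def pages(rows: Sequence, page_rows: int = 100,
          paging_enabled: bool = True) -> Iterator[Sequence]:
    """Yield the rows page by page; with paging disabled, yield them all at once."""
    if not paging_enabled:
        yield rows
        return
    for start in range(0, len(rows), page_rows):
        yield rows[start:start + page_rows]


result = list(range(250))
print([len(p) for p in pages(result, page_rows=100)])          # [100, 100, 50]
print([len(p) for p in pages(result, paging_enabled=False)])   # [250]
```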
CLI_DISPLAY_ERROR_TRACE

Enables display of the error trace.

  • Property value: Boolean
  • Default value: true
  • Example
\set CLI_DISPLAY_ERROR_TRACE true

CLI_FORMATTER_CLASS

Sets the output formatter class used to display results.

  • Property value: Class name
  • Default value: org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter
  • Example
\set CLI_FORMATTER_CLASS org.apache.tajo.cli.tsql.DefaultTajoCliOutputFormatter

ON_ERROR_STOP

If true, tsql exits when an error occurs.

  • Property value: Boolean
  • Default value: false
  • Example
\set ON_ERROR_STOP false

NULL_CHAR

The null character for text file output. This value is used when the table property text.null is not specified.

  • Property value: String
  • Default value: '\N'
  • Example
\set NULL_CHAR '\\N'

DEBUG_ENABLED

A flag to enable debug mode.

  • Property value: Boolean
  • Default value: false
  • Example
\set DEBUG_ENABLED false

FETCH_ROWNUM

The number of rows fetched from the master at a time.

  • Property value: Integer
  • Default value: 200
  • Example
\set FETCH_ROWNUM 200
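With FETCH_ROWNUM rows per request, a client reading N result rows needs roughly ceil(N / FETCH_ROWNUM) requests that return data. The sketch below simulates that batched fetching. `fetch_batch` and `fake_fetch` are hypothetical stand-ins for the real client call, not part of Tajo's API, and this sketch detects the end of the result with one extra empty request.

```python
from typing import Callable, Iterator, List


def iterate_results(fetch_batch: Callable[[int, int], List[int]],
                    fetch_rownum: int = 200) -> Iterator[int]:
    """Stream rows by repeatedly requesting up to fetch_rownum rows."""
    offset = 0
    while True:
        batch = fetch_batch(offset, fetch_rownum)
        if not batch:
            return
        yield from batch
        offset += len(batch)


# Hypothetical server-side result of 450 rows, served in slices.
server_rows = list(range(450))
calls = []


def fake_fetch(offset: int, limit: int) -> List[int]:
    calls.append(offset)
    return server_rows[offset:offset + limit]


rows = list(iterate_results(fake_fetch, fetch_rownum=200))
print(len(rows), len(calls))  # 450 4  (3 batches with data, then 1 empty batch)
```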