********************** The tajo-site.xml File ********************** You can add more configurations in the ``tajo-site.xml`` file. Note that you should replicate this file to the whole hosts in your cluster once you edited. If you are looking for the configurations for the master and the worker, please refer to :doc:`tajo_master_configuration` and :doc:`worker_configuration`. Also, catalog configurations are found here :doc:`catalog_configuration`. ========================= Join Query Settings ========================= """""""""""""""""""""""""""""""""""""" `tajo.dist-query.join.auto-broadcast` """""""""""""""""""""""""""""""""""""" A flag to enable or disable the use of broadcast join. * Property value type: Boolean * Default value: true * Example .. code-block:: xml tajo.dist-query.join.auto-broadcast true .. _tajo.dist-query.broadcast.non-cross-join.threshold-kb: """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.broadcast.non-cross-join.threshold-kb` """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" A threshold for non-cross joins. When a non-cross join query is executed with the broadcast join, the whole size of broadcasted tables won't exceed this threshold. * Property value type: Integer * Unit: KB * Default value: 5120 * Example .. code-block:: xml tajo.dist-query.broadcast.non-cross-join.threshold-kb 5120 .. _tajo.dist-query.broadcast.cross-join.threshold-kb: """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.broadcast.cross-join.threshold-kb` """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" A threshold for cross joins. When a cross join query is executed, the whole size of broadcasted tables won't exceed this threshold. * Property value type: Integer * Unit: KB * Default value: 1024 * Example .. code-block:: xml tajo.dist-query.broadcast.cross-join.threshold-kb 1024 .. warning:: In Tajo, the broadcast join is only the way to perform cross joins. Since the cross join is a very expensive operation, this value need to be tuned carefully. .. _tajo.dist-query.join.task-volume-mb: """""""""""""""""""""""""""""""""""""" `tajo.dist-query.join.task-volume-mb` """""""""""""""""""""""""""""""""""""" The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the join query. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.dist-query.join.task-volume-mb 64 .. _tajo.dist-query.join.partition-volume-mb: """"""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.join.partition-volume-mb` """"""""""""""""""""""""""""""""""""""""""" The repartition join is executed in two stages. When a join query is executed with the repartition join, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. * Property value type: Integer * Unit: MB * Default value: 128 * Example .. code-block:: xml tajo.dist-query.join.partition-volume-mb 128 .. _tajo.executor.join.common.in-memory-hash-threshold-mb: """""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.executor.join.common.in-memory-hash-threshold-mb` """""""""""""""""""""""""""""""""""""""""""""""""""""""" This value provides the criterion to decide the algorithm to perform a join in a task. If the input data is smaller than this value, join is performed with the in-memory hash join. Otherwise, the sort-merge join is used. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.executor.join.common.in-memory-hash-threshold-mb 64 .. warning:: This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully. .. _tajo.executor.join.inner.in-memory-hash-threshold-mb: """""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.executor.join.inner.in-memory-hash-threshold-mb` """""""""""""""""""""""""""""""""""""""""""""""""""""""" This value provides the criterion to decide the algorithm to perform an inner join in a task. If the input data is smaller than this value, the inner join is performed with the in-memory hash join. Otherwise, the sort-merge join is used. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.executor.join.inner.in-memory-hash-threshold-mb 64 .. warning:: This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully. .. _tajo.executor.join.outer.in-memory-hash-threshold-mb: """""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.executor.join.outer.in-memory-hash-threshold-mb` """""""""""""""""""""""""""""""""""""""""""""""""""""""" This value provides the criterion to decide the algorithm to perform an outer join in a task. If the input data is smaller than this value, the outer join is performed with the in-memory hash join. Otherwise, the sort-merge join is used. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.executor.join.outer.in-memory-hash-threshold-mb 64 .. warning:: This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully. .. _tajo.executor.join.hash-table.size: """"""""""""""""""""""""""""""""""""" `tajo.executor.join.hash-table.size` """"""""""""""""""""""""""""""""""""" The initial size of hash table for in-memory hash join. * Property value type: Integer * Default value: 100000 * Example .. code-block:: xml tajo.executor.join.hash-table.size 100000 ====================== Sort Query Settings ====================== .. _tajo.dist-query.sort.task-volume-mb: """""""""""""""""""""""""""""""""""""" `tajo.dist-query.sort.task-volume-mb` """""""""""""""""""""""""""""""""""""" The sort operation is executed in two stages. When a sort query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the sort query. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.dist-query.sort.task-volume-mb 64 .. _tajo.executor.external-sort.buffer-mb: """""""""""""""""""""""""""""""""""""""" `tajo.executor.external-sort.buffer-mb` """""""""""""""""""""""""""""""""""""""" A threshold to choose the sort algorithm. If the input data is larger than this threshold, the external sort algorithm is used. * Property value type: Integer * Unit: MB * Default value: 200 * Example .. code-block:: xml tajo.executor.external-sort.buffer-mb 200 .. _tajo.executor.sort.list.size: """""""""""""""""""""""""""""""""""""" `tajo.executor.sort.list.size` """""""""""""""""""""""""""""""""""""" The initial size of list for in-memory sort. * Property value type: Integer * Default value: 100000 * Example .. code-block:: xml tajo.executor.sort.list.size 100000 ========================= Group by Query Settings ========================= .. _tajo.dist-query.groupby.multi-level-aggr: """""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.groupby.multi-level-aggr` """""""""""""""""""""""""""""""""""""""""""" A flag to enable the multi-level algorithm for distinct aggregation. If this value is set, 3-phase aggregation algorithm is used. Otherwise, 2-phase aggregation algorithm is used. * Property value type: Boolean * Default value: true * Example .. code-block:: xml tajo.dist-query.groupby.multi-level-aggr true .. _tajo.dist-query.groupby.partition-volume-mb: """""""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.groupby.partition-volume-mb` """""""""""""""""""""""""""""""""""""""""""""" The aggregation is executed in two stages. When an aggregation query is executed, this value indicates the output size of each task at the first stage, which determines the number of partitions to be shuffled between two stages. * Property value type: Integer * Unit: MB * Default value: 256 * Example .. code-block:: xml tajo.dist-query.groupby.partition-volume-mb 256 .. _tajo.dist-query.groupby.task-volume-mb: """""""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.groupby.task-volume-mb` """""""""""""""""""""""""""""""""""""""""""""" The aggregation operation is executed in two stages. When an aggregation query is executed, this value indicates the amount of input data processed by each task at the second stage. As a result, it determines the degree of the parallel processing of the aggregation query. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.dist-query.groupby.task-volume-mb 64 .. _tajo.executor.groupby.in-memory-hash-threshold-mb: """""""""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.executor.groupby.in-memory-hash-threshold-mb` """""""""""""""""""""""""""""""""""""""""""""""""""""""" This value provides the criterion to decide the algorithm to perform an aggregation in a task. If the input data is smaller than this value, the aggregation is performed with the in-memory hash aggregation. Otherwise, the sort-based aggregation is used. * Property value type: Integer * Unit: MB * Default value: 64 * Example .. code-block:: xml tajo.executor.groupby.in-memory-hash-threshold-mb 64 .. warning:: This value is the size of the input stored on file systems. So, when the input data is loaded into JVM heap, its actual size is usually much larger than the configured value, which means that too large threshold can cause unexpected OutOfMemory errors. This value should be tuned carefully. .. _tajo.executor.aggregate.hash-table.size: """""""""""""""""""""""""""""""""""""""""" `tajo.executor.aggregate.hash-table.size` """""""""""""""""""""""""""""""""""""""""" The initial size of hash table for in-memory aggregation. * Property value type: Integer * Default value: 10000 * Example .. code-block:: xml tajo.executor.aggregate.hash-table.size 10000 ====================== Date/Time Settings ====================== .. _tajo.timezone: """"""""""""""""""" `tajo.timezone` """"""""""""""""""" Refer to :doc:`/time_zone`. * Property value type: Time zone id * Default value: Default time zone of JVM * Example .. code-block:: xml tajo.timezone GMT+9 .. _tajo.datetime.date-order: """"""""""""""""""""""""""" `tajo.datetime.date-order` """"""""""""""""""""""""""" Date order specification. * Property value type: One of YMD, DMY, MDY. * Default value: YMD * Example .. code-block:: xml tajo.datetime.date-order YMD ====================== Table partitions ====================== .. _tajo.partition.overwrite.even-if-no-result: """""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.partition.overwrite.even-if-no-result` """""""""""""""""""""""""""""""""""""""""""""""""""" If this value is true, a partitioned table is overwritten even if a subquery leads to no result. Otherwise, the table data will be kept if there is no result. * Property value type: Boolean * Default value: false * Example .. code-block:: xml tajo.partition.overwrite.even-if-no-result false .. _tajo.dist-query.table-partition.task-volume-mb: """""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.dist-query.table-partition.task-volume-mb` """""""""""""""""""""""""""""""""""""""""""""""""""" In Tajo, storing a partition table is executed in two stages. This value indicates the output size of a task of the former stage, which determines the number of partitions to be shuffled between two stages. * Property value type: Integer * Unit: MB * Default value: 256 * Example .. code-block:: xml tajo.dist-query.table-partition.task-volume-mb 256 ====================== Arithmetic Settings ====================== .. _tajo.behavior.arithmetic-abort: """""""""""""""""""""""""""""""""""""""""""""""""""" `tajo.behavior.arithmetic-abort` """""""""""""""""""""""""""""""""""""""""""""""""""" A flag to indicate how to handle the errors caused by invalid arithmetic operations. If true, a running query will be terminated with an overflow or a divide-by-zero. * Property value type: Boolean * Default value: false * Example .. code-block:: xml tajo.behavior.arithmetic-abort false