Compression

Compression reduces data size, which makes more efficient use of network bandwidth and storage. Most Tajo data formats support compression. Currently, the compression configuration applies only to the stored data format, and it is enabled when a table is created with the appropriate table property (see Create Table).

Compression Properties for Each Data Format

Compression Properties
Data Format                        Property Name         Available Values
text/json/rcfile/sequencefile [1]  compression.codec     Fully qualified class name in Hadoop [2]
parquet                            parquet.compression   uncompressed/snappy/gzip/lzo
orc                                orc.compression.kind  none/snappy/zlib

Footnotes

[1] For sequence files, specify 'compression.type' in addition to 'compression.codec'. Refer to SequenceFile.
[2] Any class that implements org.apache.hadoop.io.compress.CompressionCodec can be used.
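
As an illustration of the properties listed above, the following sketch shows how they might be set in the WITH clause of a CREATE TABLE statement. The table names and columns are hypothetical, and the codec class is only one example of a Hadoop class implementing CompressionCodec; whether a given codec works depends on what is installed in your Hadoop environment.

```sql
-- Hypothetical tables; names and columns are placeholders for illustration.

-- Text format: compression.codec takes a fully qualified Hadoop codec class name.
CREATE TABLE logs_text (id int, name text, score float)
USING TEXT WITH ('compression.codec'='org.apache.hadoop.io.compress.GzipCodec');

-- Parquet format: parquet.compression takes one of uncompressed/snappy/gzip/lzo.
CREATE TABLE logs_parquet (id int, name text, score float)
USING PARQUET WITH ('parquet.compression'='snappy');

-- ORC format: orc.compression.kind takes one of none/snappy/zlib.
CREATE TABLE logs_orc (id int, name text, score float)
USING ORC WITH ('orc.compression.kind'='zlib');
```

Because compression is configured at table creation time, each table carries its own setting, so tables in the same database may use different codecs.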