Storage Plugin Overview

Overview

Tajo supports various storage systems, such as HDFS, Amazon S3, Openstack Swift, HBase, and RDBMS. Tajo already embeds HDFS, S3, Openstack, HBase, RDBMS storage plugins, and also Tajo allows users to register custom storages and data formats to Tajo cluster instances. This section describes how you register custom storages and data types.

Register custom storage

First of all, your storage implementation should be packed as a jar file. Then, please copy the jar file into tajo/extlib directory. Next, you should copy conf/storage-site.json.template into conf/storage-site.json and modify the file like the below.

Configuration

Tajo has a default configuration for builtin storages, such as HDFS, local file system, and Amazon S3. it also allows users to add custom storage plugins

conf/storage-site.json file has the following struct:

{
  "storages": {
    "${scheme}": {
      "handler": "${class name}"
    }
  }
}

Each storage instance (i.e., Tablespaces) is identified by an URI. The scheme of URI plays a role to identify storage type. For example, hdfs:// is used for Hdfs storage, jdbc:// is used for JDBC-based storage, and hbase:// is used for HBase storage.

You should substitute a scheme name without :// for ${scheme}.

See an example for HBase storage.

{
  "storages": {
    "hbase": {
      "handler": "org.apache.tajo.storage.hbase.HBaseTablespace",
      "default-format": "hbase"
    }
  }
}