Cluster Setup

Fully Distributed Mode

Fully distributed mode enables a Tajo instance to run on the Hadoop Distributed File System (HDFS). In this mode, Tajo workers run across the physical nodes where the HDFS data nodes run.

This section explains how to set up the cluster mode.


Add the following configs to the tajo-site.xml file:


  <description>TajoMaster binding address between master and workers.</description>

  <description>TajoMaster binding address between master and remote clients.</description>
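
As a rough illustration of where these descriptions belong, the sketch below writes two property entries to a scratch file. The property name tajo.master.client-rpc.address appears on this page. The name tajo.master.umbilical-rpc.address, the host master.example.com, and the ports 26001 and 26002 are assumptions, so check them against Cluster Service Configuration Defaults before use.

```shell
# Scratch file standing in for part of tajo-site.xml (hypothetical location).
FRAGMENT_FILE="$(mktemp)"

# The quoted heredoc delimiter keeps the XML literal (no shell expansion).
cat > "$FRAGMENT_FILE" <<'EOF'
<property>
  <!-- Assumed property name; verify it for your Tajo version. -->
  <name>tajo.master.umbilical-rpc.address</name>
  <!-- Hypothetical host and assumed port. -->
  <value>master.example.com:26001</value>
  <description>TajoMaster binding address between master and workers.</description>
</property>

<property>
  <name>tajo.master.client-rpc.address</name>
  <!-- Hypothetical host and assumed port. -->
  <value>master.example.com:26002</value>
  <description>TajoMaster binding address between master and remote clients.</description>
</property>
EOF

# Print the fragment so it can be reviewed before copying it anywhere.
cat "$FRAGMENT_FILE"
```

In a real setup, entries like these would go inside the top-level configuration element of tajo-site.xml, with the host replaced by the machine that runs TajoMaster.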


The file conf/workers lists the host names of all workers, one per line. By default, this file contains the single entry localhost. You can add worker host names with any text editor.

For example:

$ cat > conf/workers

<ctrl + d>
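
A non-interactive way to produce the same kind of file is sketched below. The host names are hypothetical placeholders, and the file is written to a scratch directory rather than the real conf directory, so nothing is overwritten by accident.

```shell
# Scratch directory standing in for the Tajo conf directory (assumption).
CONF_DIR="$(mktemp -d)"
WORKERS_FILE="$CONF_DIR/workers"

# One host name per line, as the workers file expects.
# These names are placeholders; use the hosts that run your Tajo workers.
printf '%s\n' \
  worker1.example.com \
  worker2.example.com \
  worker3.example.com > "$WORKERS_FILE"

# Show the result for review.
cat "$WORKERS_FILE"
```

After reviewing the output, the file can be copied over conf/workers on the master node.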

Make base directories and set permissions

For more detail on Tajo’s configuration, see the Configuration page. Before launching Tajo, create the Tajo root directory and set its permissions as follows:

$ $HADOOP_HOME/bin/hadoop fs -mkdir       /tajo
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tajo

Launch a Tajo cluster

Then, execute

$ $TAJO_HOME/bin/


By default, each worker is configured with very little resource capacity. To increase the degree of parallelism, see Worker Configuration.


By default, TajoMaster listens on localhost/ for clients. To allow remote clients to access TajoMaster, set the tajo.master.client-rpc.address config in tajo-site.xml. To change the listen port, refer to Cluster Service Configuration Defaults.