Apache Tajo: A big data warehouse system on Hadoop
Apache Tajo is a robust, relational, distributed big data warehouse system for Apache Hadoop. Tajo is designed for low-latency, scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.
- Fast and Efficient
- Fully distributed SQL query processing engine
- Advanced query optimization, such as cost-based and progressive query optimization
- Interactive analysis on reasonably sized data sets
- Fault tolerance and dynamic scheduling for long-running queries
- Out-of-core algorithms for data sets larger than main memory
- ANSI/ISO SQL standard compliance
- Hive MetaStore access support
- JDBC driver support
- Support for various file formats, such as CSV, RCFile, RowFile, SequenceFile, and Parquet
- User-defined functions
- Interactive shell
- Convenient Backup/Restore utility
- Asynchronous/Synchronous Java API
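
The SQL and JDBC support listed above can be illustrated with a minimal sketch that uses only the standard `java.sql` API. The host, port (26002), database name, `jdbc:tajo://` URL layout, and the `lineitem` / `l_returnflag` table and column names are assumptions made for illustration, not values taken from this page; check them against your own Tajo installation. The sketch also assumes a Tajo JDBC driver is on the classpath. Without one, the connection attempt fails and the error is reported.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class TajoJdbcSketch {

    // Builds a JDBC URL. The "jdbc:tajo://host:port/database" layout is an
    // assumption; confirm it against the documentation of your driver version.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:tajo://" + host + ":" + port + "/" + database;
    }

    // Builds a simple ad-hoc aggregation query. The table and column names
    // passed in are hypothetical placeholders.
    static String adHocQuery(String table, String groupColumn) {
        return "SELECT " + groupColumn + ", COUNT(*) AS cnt FROM " + table
                + " GROUP BY " + groupColumn + " ORDER BY cnt DESC";
    }

    // Runs the query through plain JDBC and prints each group with its count.
    static void runQuery(String url, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }

    public static void main(String[] args) {
        // Assumed connection settings for a locally running server.
        String url = jdbcUrl("localhost", 26002, "default");
        String sql = adHocQuery("lineitem", "l_returnflag");
        try {
            runQuery(url, sql);
        } catch (SQLException e) {
            // Reached when no driver is registered or no server is listening.
            System.err.println("Could not run query: " + e.getMessage());
        }
    }
}
```

Because the sketch goes through `DriverManager` rather than any Tajo-specific class, the same code shape applies to other JDBC-accessible systems. Only the URL and the driver on the classpath change.
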
- [2014-04-01] Min Zhou was invited to become a new committer.
- [2014-03-24] Apache Tajo became an Apache Top-Level Project.
- [2014-02-27] Keuntae Park was invited to ApacheCon 2014.
- [2014-01-02] Keuntae Park was invited to become a new committer.
- [2013-11-20] Tajo 0.2.0-incubating was released and is now available for download.
- [2013-10-15] Tajo was presented at Bay Area Hadoop User Group - LinkedIn Special Event.
- [2013-10-15] Tajo was introduced at Deview 2013.
- [2013-05-27] Two projects were accepted to Google Summer of Code 2013.
- [2013-04-09] Tajo was demonstrated at IEEE ICDE 2013.
- [2013-03-07] The Tajo project entered incubation.
- [2012-10-15] A demonstration paper of Tajo was accepted to IEEE ICDE 2013.