Queries

Overview

Synopsis

SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...]
  [FROM <table reference> [[AS] <table alias name>] [, ...]]
  [WHERE <condition>]
  [GROUP BY <expression> [, ...]]
  [HAVING <condition>]
  [ORDER BY <expression> [ASC|DESC] [NULLS (FIRST|LAST)] [, ...]]

From Clause

Synopsis

[FROM <table reference> [[AS] <table alias name>] [, ...]]

The FROM clause specifies one or more other tables given in a comma-separated table reference list. A table reference can be a relation name , or a subquery, a table join, or complex combinations of them.

Table and Table Aliases

A temporary name can be given to tables and complex table references to be used for references to the derived table in the rest of the query. This is called a table alias.

To create a a table alias, please use AS:

FROM table_reference AS alias

or

FROM table_reference alias

The AS keyword can be omitted, and Alias can be any identifier.

A typical application of table aliases is to give short names to long table references. For example:

SELECT * FROM long_table_name_1234 s JOIN another_long_table_name_5678 a ON s.id = a.num;

Joined Tables

Tajo supports all kinds of join types.

Join Types

Cross Join
FROM T1 CROSS JOIN T2

Cross join, also called Cartesian product, results in every possible combination of rows from T1 and T2.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2.

Qualified joins

Qualified joins implicitly or explicitly have join conditions. Inner/Outer/Natural Joins all are qualified joins. Except for natural join, ON or USING clause in each join is used to specify a join condition. A join condition must include at least one boolean expression, and it can also include just filter conditions.

Inner Join

T1 [INNER] JOIN T2 ON boolean_expression
T1 [INNER] JOIN T2 USING (join column list)

INNER keyword is the default, and so INNER can be omitted when you use inner join.

Outer Join

T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 ON boolean_expression
T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 USING (join column list)

One of LEFT, RIGHT, or FULL must be specified for outer joins. Join conditions in outer join will have different behavior according to corresponding table references of join conditions. To know outer join behavior in more detail, please refer to Advanced outer join constructs.

Natural Join

T1 NATURAL JOIN T2

NATURAL is a short form of USING. It forms a USING list consisting of all common column names that appear in both join tables. These common columns appear only once in the output table. If there are no common columns, NATURAL behaves like CROSS JOIN.

Subqueries

A subquery is a query that is nested inside another query. It can be embedded in the FROM and WHERE clauses.

Example:

FROM (SELECT col1, sum(col2) FROM table1 WHERE col3 > 0 group by col1 order by col1) AS alias_name
WHERE col1 IN (SELECT col1 FROM table1 WHERE col2 > 0 AND col2 < 100) AS alias_name

For more detailed information, please refer to Joins.

Where Clause

The syntax of the WHERE Clause is

Synopsis

WHERE search_condition

search_condition can be any boolean expression. In order to know additional predicates, please refer to Predicates.

Groupby and Having Clauses

Synopsis

SELECT select_list
    FROM ...
    [WHERE ...]
    GROUP BY grouping_column_reference [, grouping_column_reference]...
    [HAVING boolean_expression]

The rows which passes WHERE filter may be subject to grouping, specified by GROUP BY clause. Grouping combines a set of rows having common values into one group, and then computes rows in the group with aggregation functions. HAVING clause can be used with only GROUP BY clause. It eliminates the unqualified result rows of grouping.

grouping_column_reference can be a column reference, a complex expression including scalar functions and arithmetic operations.

SELECT l_orderkey, SUM(l_quantity) AS quantity FROM lineitem GROUP BY l_orderkey;

SELECT substr(l_shipdate,1,4) as year, SUM(l_orderkey) AS total2 FROM lineitem GROUP BY substr(l_shipdate,1,4);

If a SQL statement includes GROUP BY clause, expressions in select list must be either grouping_column_reference or aggregation function. For example, the following example query is not allowed because l_orderkey does not occur in GROUP BY clause.

SELECT l_orderkey, l_partkey, SUM(l_orderkey) AS total FROM lineitem GROUP BY l_partkey;

Aggregation functions can be used with DISTINCT keywords. It forces an individual aggregate function to take only distinct values of the argument expression. DISTINCT keyword is used as follows:

SELECT l_partkey, COUNT(distinct l_quantity), SUM(distinct l_extendedprice) AS total FROM lineitem GROUP BY l_partkey;

Orderby and Limit Clauses

Synopsis

FROM ... ORDER BY <sort_expr> [(ASC|DESC)] [NULLS (FIRST|LAST) [,...]

sort_expr can be a column reference, aliased column reference, or a complex expression. ASC indicates an ascending order of sort_expr values. DESC indicates a descending order of sort_expr values. ASC is the default order.

NULLS FIRST and NULLS LAST options can be used to determine whether nulls values appear before or after non-null values in the sort ordering. By default, null values are dealt as if larger than any non-null value; that is, NULLS FIRST is the default for DESC order, and NULLS LAST otherwise.

Window Functions

A window function performs a calculation across multiple table rows that belong to some window frame.

Synopsis

SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), ....,  FROM

The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

We will briefly explain some examples using window functions.

Examples

Multiple window functions can be used in a SQL statement as follows:

SELECT l_orderkey, sum(l_discount) OVER (PARTITION BY l_orderkey), sum(l_quantity) OVER (PARTITION BY l_orderkey) FROM LINEITEM;

If OVER() clause is empty as following, it makes all table rows into one window frame.

SELECT salary, sum(salary) OVER () FROM empsalary;

Also, ORDER BY clause can be used without PARTITION BY clause as follows:

SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

Also, all expressions and aggregation functions are allowed in ORDER BY clause as follows:

select
  l_orderkey,
  count(*) as cnt,
  row_number() over (partition by l_orderkey order by count(*) desc)
  row_num
from
  lineitem
group by
  l_orderkey

Note

Currently, Tajo does not support multiple different partition-expressions in one SQL statement.