Predicates

IN Predicate

IN predicate provides a comparison of row, array, and result of a subquery.

Synopsis

column_reference (NOT) IN (val1, val2, ..., valN)
column_reference (NOT) IN (SELECT ... FROM ...) AS alias_name

Examples are as follows:

-- this statement filters lists down all the records where col1 value is 1, 2 or 3:
SELECT col1, col2 FROM table1 WHERE col1 IN (1, 2, 3);

-- this statement filters lists down all the records where col1 value is neither 1, 2 nor 3:
SELECT col1, col2 FROM table1 WHERE col1 NOT IN (1, 2, 3);

You can use IN clause on text data domain as follows:

SELECT col1, col2 FROM table1 WHERE col2 IN ('tajo', 'hadoop');

SELECT col1, col2 FROM table1 WHERE col2 NOT IN ('tajo', 'hadoop');

Finally, you can use subqueries in the IN clause.

SELECT col1, col2
FROM table1
WHERE col3 IN (
  SELECT avg(col2) as avg_col2
  FROM table2
  GROUP BY col1
  HAVING avg_col2 > 100);

SELECT col1, col2
FROM table1
WHERE col3 NOT IN (
  SELECT avg(col2) as avg_col2
  FROM table2
  GROUP BY col1
  HAVING avg_col2 > 100);

String Pattern Matching Predicates

LIKE

LIKE operator returns true or false depending on whether its pattern matches the given string. An underscore (_) in pattern matches any single character. A percent sign (%) matches any sequence of zero or more characters.

Synopsis

string LIKE pattern
string NOT LIKE pattern

ILIKE

ILIKE is the same to LIKE, but it is a case insensitive operator. It is not in the SQL standard. We borrow this operator from PostgreSQL.

Synopsis

string ILIKE pattern
string NOT ILIKE pattern

SIMILAR TO

Synopsis

string SIMILAR TO pattern
string NOT SIMILAR TO pattern

It returns true or false depending on whether its pattern matches the given string. Also like LIKE, SIMILAR TO uses _ and % as metacharacters denoting any single character and any string, respectively.

In addition to these metacharacters borrowed from LIKE, ‘SIMILAR TO’ supports more powerful pattern-matching metacharacters borrowed from regular expressions:

metacharacter description
| denotes alternation (either of two alternatives).
denotes repetition of the previous item zero or more times.
denotes repetition of the previous item one or more times.
? denotes repetition of the previous item zero or one time.
{m} denotes repetition of the previous item exactly m times.
{m,} denotes repetition of the previous item m or more times.
{m,n} denotes repetition of the previous item at least m and not more than n times.
[] A bracket expression specifies a character class, just as in POSIX regular expressions.
() Parentheses can be used to group items into a single logical item.

Note that .` is not used as a metacharacter in SIMILAR TO operator.

Regular expressions

Regular expressions provide a very powerful means for string pattern matching. In the current Tajo, regular expressions are based on Java-style regular expressions instead of POSIX regular expression. The main difference between java-style one and POSIX’s one is character class.

Synopsis

string ~ pattern
string !~ pattern

string ~* pattern
string !~* pattern
operator Description
~ It returns true if a given regular expression is matched to string. Otherwise, it returns false.
!~ It returns false if a given regular expression is matched to string. Otherwise, it returns true.
~* It is the same to ‘~’, but it is case insensitive.
!~* It is the same to ‘!~’, but it is case insensitive.

Here are examples:

'abc'   ~   '.*c'               true
'abc'   ~   'c'                 false
'aaabc' ~   '([a-z]){3}bc       true
'abc'   ~*  '.*C'               true
'abc'   !~* 'B.*'               true

Regular expressions operator is not in the SQL standard. We borrow this operator from PostgreSQL.

Synopsis for REGEXP and RLIKE operators

string REGEXP pattern
string NOT REGEXP pattern

string RLIKE pattern
string NOT RLIKE pattern

But, they do not support case-insensitive operators.