Predicates¶
IN Predicate¶
IN predicate provides row and array comparison.
Synopsis
column_reference IN (val1, val2, ..., valN)
column_reference NOT IN (val1, val2, ..., valN)
Examples are as follows:
-- this statement filters lists down all the records where col1 value is 1, 2 or 3:
SELECT col1, col2 FROM table1 WHERE col1 IN (1, 2, 3);
-- this statement filters lists down all the records where col1 value is neither 1, 2 nor 3:
SELECT col1, col2 FROM table1 WHERE col1 NOT IN (1, 2, 3);
You can use ‘IN clause’ on text data domain as follows:
SELECT col1, col2 FROM table1 WHERE col2 IN ('tajo', 'hadoop');
SELECT col1, col2 FROM table1 WHERE col2 NOT IN ('tajo', 'hadoop');
String Pattern Matching Predicates¶
LIKE¶
LIKE operator returns true or false depending on whether its pattern matches the given string. An underscore (_) in pattern matches any single character. A percent sign (%) matches any sequence of zero or more characters.
Synopsis
string LIKE pattern
string NOT LIKE pattern
ILIKE¶
ILIKE is the same to LIKE, but it is a case insensitive operator. It is not in the SQL standard. We borrow this operator from PostgreSQL.
Synopsis
string ILIKE pattern
string NOT ILIKE pattern
SIMILAR TO¶
Synopsis
string SIMILAR TO pattern
string NOT SIMILAR TO pattern
It returns true or false depending on whether its pattern matches the given string. Also like LIKE, SIMILAR TO uses _ and % as metacharacters denoting any single character and any string, respectively.
In addition to these metacharacters borrowed from LIKE, ‘SIMILAR TO’ supports more powerful pattern-matching metacharacters borrowed from regular expressions:
metacharacter | description |
---|---|
| | denotes alternation (either of two alternatives). |
denotes repetition of the previous item zero or more times. | |
denotes repetition of the previous item one or more times. | |
? | denotes repetition of the previous item zero or one time. |
{m} | denotes repetition of the previous item exactly m times. |
{m,} | denotes repetition of the previous item m or more times. |
{m,n} | denotes repetition of the previous item at least m and not more than n times. |
[] | A bracket expression specifies a character class, just as in POSIX regular expressions. |
() | Parentheses can be used to group items into a single logical item. |
Note that .` is not used as a metacharacter in SIMILAR TO operator.
Regular expressions¶
Regular expressions provide a very powerful means for string pattern matching. In the current Tajo, regular expressions are based on Java-style regular expressions instead of POSIX regular expression. The main difference between java-style one and POSIX’s one is character class.
Synopsis
string ~ pattern
string !~ pattern
string ~* pattern
string !~* pattern
operator | Description |
---|---|
~ | It returns true if a given regular expression is matched to string. Otherwise, it returns false. |
!~ | It returns false if a given regular expression is matched to string. Otherwise, it returns true. |
~* | It is the same to ‘~’, but it is case insensitive. |
!~* | It is the same to ‘!~’, but it is case insensitive. |
Here are examples:
'abc' ~ '.*c' true
'abc' ~ 'c' false
'aaabc' ~ '([a-z]){3}bc true
'abc' ~* '.*C' true
'abc' !~* 'B.*' true
Regular expressions operator is not in the SQL standard. We borrow this operator from PostgreSQL.
Synopsis for REGEXP and RLIKE operators
string REGEXP pattern
string NOT REGEXP pattern
string RLIKE pattern
string NOT RLIKE pattern
But, they do not support case-insensitive operators.