ClickHouse SELECT Advances

Dynamic column selection (also known as a COLUMNS expression) allows you to match some columns in a result with a re2 regular expression.

Matchers

Select all columns:

SELECT * FROM hits;

Select only specific subset of columns:

SELECT COLUMNS('URL.*ID') FROM hits;

┌─URLCategoryID─┬─URLRegionID─┐
│         15664216 │
│             00 │
│             00 │
│             00 │
│             00 │
└───────────────┴─────────────┘
SELECT COLUMNS(WatchID, UserID) FROM hits;

┌─────────────WatchID─┬───────────────UserID─┐
│ 53655340254667443682213399965098237040  │
│ 52875591965280520482213399965098237040  │
│ 90579612216795728132213399965098237040  │
│ 55205080087864745729141107111321352513

Column transformers

Apply transformations for selected columns or remove some columns from selections:

APPLY: Allows you to invoke some function for each row returned by an outer table expression of a query.

SELECT * APPLY toString FROM hits;
SELECT COLUMNS('URL.*ID') APPLY toString FROM hits;
SELECT COLUMNS('URL.*ID') APPLY x -> toString(x) FROM hits;

┌─toString(URLCategoryID)─┬─toString(URLRegionID)─┐
│ 15664216                   │
│ 00                     │
│ 00                     │
│ 00                     │
│ 00                     │
└─────────────────────────┴───────────────────────┘

EXCEPT: exclude one or more columns from the result.

SELECT * EXCEPT (UserID, URLRegionID) FROM hits;
SELECT COLUMNS('URL.*ID') EXCEPT URLCategoryID FROM hits;

┌─URLRegionID─┐
│         216 │
│           0 │
│           0 │
│           0 │
│           0 │
└─────────────┘

REPLACE: Specifies one or more expression aliases

SELECT COLUMNS('URL.*ID') REPLACE (URLCategoryID * 10 AS URLCategoryID)
FROM hits;

┌─URLCategoryID─┬─URLRegionID─┐
│        156640216 │
│             00 │
│             00 │
│             00 │
│             00 │
└───────────────┴─────────────┘

SELECT COLUMNS('URL.*ID') REPLACE (leftPad(toString(URLRegionID), 10, '*') AS URLRegionID)
FROM hits;

┌─URLCategoryID─┬─URLRegionID─┐
│         15664*******216  │
│             0*********0  │
│             0*********0  │
│             0*********0  │
│             0*********0  │
└───────────────┴─────────────┘

We can also combine them:

SELECT COLUMNS('URL.*ID') APPLY(toString) APPLY(length) APPLY(max) FROM hits;

┌─max(length(toString(URLCategoryID)))─┬─max(length(toString(URLRegionID)))─┐
│                                    55 │
└──────────────────────────────────────┴────────────────────────────────────┘

Select from multiple tables

merge() create a temporary Merge table with structure is taken from the first table encountered that matches the regular expression.

When upgrading ClickHouse usually rename system table if schema changed in a new release: system.query_log, system.query_log_0, system.query_log_1, ... The query below help querying from all them:

SELECT * FROM merge(system, '^query_log')

ClickHouse Series

DataDataClickHouseClickHouse on Kubernetes