ClickHouse Schema Design Best Practices
Tip
This is a work-in-progress document for collecting ClickHouse schema and querying best practices based on experiences running ClickHouse at scale at Sentry. It is subject to change, and if something doesn't seem right, please submit a PR to Snuba.
Columns based on dictionary data (tag promotion)
ClickHouse is a columnar datastore, and at run-time it loads columns on-demand based on the columns referenced in the query (both the columns SELECTed and those that are part of the WHERE clause). The ability to store different columns independently and not load them for every row for every query is part of the performance advantage that ClickHouse provides over a traditional RDBMS (like PostgreSQL).
Commonly, a data schema contains a flexible key:value pair mapping (canonically at Sentry: tags or contexts) and stores that data in a Nested column that contains two arrays, where the first array contains the keys of the dictionary and the second array contains the values. To make queries faster, a column like this can be indexed with bloom filters as described in Bloom filter indexing on dictionary-like columns. In general we construct this index across datasets for tags but not for other columns.
This works well when your dataset and query design give you the ability to filter for exact matches and a large number of rows will NOT be an exact match. Often, however, a ClickHouse query filters for rows that contain a substring match or regular expression match for a tag value of a given key. This makes bloom filter indexes unusable for the query and, depending on the other selectivity attributes of your query, can necessitate moving (or promoting) the relevant values for a given tag key to a new, separate column [1].
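As a concrete sketch (the table and column names here are hypothetical, not an existing Snuba schema), a table might store the dictionary as a Nested column and promote one frequently-searched tag, release, into its own dedicated column so that substring and regular-expression filters can read a single small column instead of unpacking the tag arrays:

-- Hypothetical table illustrating tag promotion. The Nested tags column is
-- stored as two parallel arrays (tags.key, tags.value); release is a promoted
-- copy of the corresponding tag value, written explicitly by the consumer.
CREATE TABLE widget_events_local
(
    project_id UInt64,
    timestamp  DateTime,
    tags Nested (
        key   String,
        value String
    ),
    -- Promoted tag: queries can use LIKE or match() on this column directly.
    release String DEFAULT ''
)
ENGINE = MergeTree()
ORDER BY (project_id, timestamp);

A filter like release LIKE '1.2.%' then only has to read the promoted column, whereas the equivalent filter against the tag arrays would have to materialize every row's keys and values.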
Selectivity in queries and indices
Queries are much more efficient when they are low-selectivity on the table indexes, meaning the query conditions on indexed columns filter the dataset down to a very small proportion of the overall number of rows. High-selectivity queries can break the efficiency of the bloom-filter style index on dictionary columns (see Bloom filter indexing on dictionary-like columns). In the case of a high-selectivity query, there is a negative performance impact on both bloom-filter indexed columns and promoted tag value columns (when searching for a key=value pair exact match). The promoted column makes the penalty somewhat less severe because it does not load tag values from unrelated keys. Still, an effort should be made to avoid high-selectivity queries.
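For example, reusing the hypothetical widget_events_local table from above, the first query below is low-selectivity (it narrows the scan using the primary-key prefix and an exact key=value match), while the second matches a large share of rows and gains little from any index:

-- Low selectivity: the primary-key prefix (project_id, timestamp) plus an
-- exact key=value condition restrict the scan to a small slice of the table.
SELECT count()
FROM widget_events_local
WHERE project_id = 42
  AND timestamp >= now() - INTERVAL 1 HOUR
  AND tags.value[indexOf(tags.key, 'environment')] = 'canary';

-- High selectivity: almost every row carries an environment tag, so this
-- condition filters out very little and forces a near-full scan.
SELECT count()
FROM widget_events_local
WHERE has(tags.key, 'environment');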
Bloom filter indexing on dictionary-like columns
To facilitate faster searching on dictionary columns, we tend to create bloom filter indices on hashes of both the unique key values of each row as well as hashes of all the key=value pairs of each row. The bloom filter registers these in a stochastic data structure designed to quickly determine which elements do NOT exist in a set. So that it can model the entire unbounded keyspace in a fixed amount of memory, a bloom filter is designed to allow false positives. This means that there is actually a performance penalty if the value is often present in the underlying set: first, the value must be tested against the bloom filter (which will always return “maybe present”), and after that a full scan of the column must be performed.
Due to their structure, bloom filters are only good for exact value searching. They cannot be used for “is user-value a prefix of column?” or “does column match regex?” style queries. Those styles of queries require a separate column to search.
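As a sketch of how this is typically wired up (the hash expressions and index names below are illustrative, not the exact Snuba definitions), materialized arrays of key hashes and key=value hashes can each be covered by a bloom_filter data-skipping index:

-- Hypothetical bloom-filter setup for the Nested tags column above.
ALTER TABLE widget_events_local
    ADD COLUMN _tags_key_hash Array(UInt64)
        MATERIALIZED arrayMap(k -> cityHash64(k), tags.key),
    ADD COLUMN _tags_hash_map Array(UInt64)
        MATERIALIZED arrayMap((k, v) -> cityHash64(concat(k, '=', v)), tags.key, tags.value),
    ADD INDEX bf_tags_key_hash _tags_key_hash TYPE bloom_filter GRANULARITY 1,
    ADD INDEX bf_tags_hash_map _tags_hash_map TYPE bloom_filter GRANULARITY 1;

-- An exact-match tag filter can then probe the index instead of scanning:
SELECT count()
FROM widget_events_local
WHERE has(_tags_hash_map, cityHash64('environment=canary'));

Note that the index only helps when the probed value is absent from most granules; as described above, a value that exists nearly everywhere pays for both the bloom filter check and the subsequent column scan.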
Aggregate Tables and Materialization
A common use case for ClickHouse and Snuba is to ingest raw data and automatically roll it up to aggregate values (keyed by a custom set of dimensions). This lets a dataset owner simplify their write logic while getting the query benefits of rows that are pre-aggregated. This is done with what we’ll call a raw table (the table the consumer writes to), an aggregate table (the table the API reads from) and a materialized view (which describes how the data should be transformed from raw to aggregate).
Sample usage of a materialized view/aggregate table from the official ClickHouse Documentation. Note that contrary to this example, the aggregate table definition in Snuba is always separate from the materialized view definition (which is just a ClickHouse SQL transformation, similar to a PostgreSQL trigger).
In general, Snuba follows the naming conventions here (see the sketch after this list):

- (widgets_raw_local, widgets_raw_dist) for the raw (local, distributed) tables
- widgets_aggregation_mv for the materialized view (this only exists on storage nodes)
- (widgets_aggregated_local, widgets_aggregated_dist) for the roll-up/aggregated (local, distributed) tables
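A minimal sketch of that trio, using hypothetical widget columns and the naming convention above (only the local tables are shown; the _dist tables are plain Distributed wrappers):

-- Raw table: the consumer writes one row per widget event.
CREATE TABLE widgets_raw_local
(
    project_id UInt64,
    timestamp  DateTime,
    duration   Float64
)
ENGINE = MergeTree()
ORDER BY (project_id, timestamp);

-- Aggregate table: one row per (project_id, hour) holding aggregate states.
CREATE TABLE widgets_aggregated_local
(
    project_id  UInt64,
    time_bucket DateTime,
    num_events  AggregateFunction(count),
    duration_quantiles AggregateFunction(quantiles(0.5, 0.9, 0.99), Float64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (project_id, time_bucket);

-- Materialized view: the raw-to-aggregate transformation, applied to every
-- block inserted into widgets_raw_local.
CREATE MATERIALIZED VIEW widgets_aggregation_mv
TO widgets_aggregated_local
AS SELECT
    project_id,
    toStartOfHour(timestamp) AS time_bucket,
    countState() AS num_events,
    quantilesState(0.5, 0.9, 0.99)(duration) AS duration_quantiles
FROM widgets_raw_local
GROUP BY project_id, time_bucket;

Readers of widgets_aggregated_dist then finalize the states with the -Merge combinators, e.g. countMerge(num_events) and quantilesMerge(0.5, 0.9, 0.99)(duration_quantiles).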
Materialized views are immutable, so it's normal to have multiple versions of widgets_aggregation_mv when behavior is updated, with suffixes like widgets_aggregation_mv_v1, widgets_aggregation_mv_v2, etc. Migration between materialized view versions is described in the next section, but in general old materialized views should be discarded once they are no longer used.
Schema migrations using materialization_version
As we discussed at the end of the prior section, materialized view logic cannot be updated in place. In order to continuously roll up input data without data loss or duplication in the aggregate table, we control logic changes with a column on the raw table, materialization_version, and make the materialized view logic depend on specific values of that column. To update MV logic, you create a new materialized view that looks for the last used value of materialization_version plus one and then, after that view has been created in all relevant environments, update the consumer to write the new materialization_version to the raw column.
Here is how this might look in practice:
Context:

- There is a raw table click_events_raw_local that has a field named click_duration, of type Float64. A Snuba consumer is setting this to 0 for certain types of click events.
- There is a materialized view click_events_aggregation_mv that is writing a quantilesState() value for a click_duration column in click_events_aggregated_local, including those zero values. This materialized view looks for the value of materialization_version = 0 in its WHERE condition.
- Users querying the dataset are being surprised by p90, p95, and p99 values that take zero-duration click events into account, which doesn't make sense for the use case.

To resolve this confusion, we don't want to set quantilesState for click_duration if the incoming click_duration is 0.
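Before the fix, the existing materialized view might look roughly like the sketch below (the grouping columns are hypothetical and the column list is trimmed to the relevant parts):

-- Current materialized view (simplified): rolls up every raw row written
-- with materialization_version = 0, including zero-duration click events.
CREATE MATERIALIZED VIEW click_events_aggregation_mv
TO click_events_aggregated_local
AS SELECT
    project_id,
    toStartOfHour(timestamp) AS time_bucket,
    quantilesState(0.9, 0.95, 0.99)(click_duration) AS click_duration
FROM click_events_raw_local
WHERE materialization_version = 0
GROUP BY project_id, time_bucket;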
Steps to resolve the issue:
1. Create a new materialized view click_events_aggregation_mv_v1 via the migration system. This new materialized view will use the WHERE clause or some other filtering to avoid setting quantilesState(0) in the write for the click_duration column. It will only operate on inputs in click_events_raw_local where materialization_version = 1 (see the sketch after these steps).
2. Test that this fixes the issue in your local environment by changing your consumer to use materialization_version = 1. It can make sense to control this via the settings file (in snuba/settings/__init__.py).
3. Run the migration in all relevant environments.
4. Change the materialization_version setting mentioned above in a specific environment to set materialization_version = 1 on write.
5. Validate that the consumer is writing rows with the new materialization version, and that it produces the expected roll-up results.
6. Write a migration to remove the now-unused materialized view (click_events_aggregation_mv).
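The replacement view from step 1 could look roughly like this (again with hypothetical grouping columns), selecting only rows written with the new materialization_version and excluding zero durations from the quantile state:

-- New materialized view: same roll-up, but only for rows written with
-- materialization_version = 1, and zero-duration click events are filtered
-- out before quantilesState() sees them.
CREATE MATERIALIZED VIEW click_events_aggregation_mv_v1
TO click_events_aggregated_local
AS SELECT
    project_id,
    toStartOfHour(timestamp) AS time_bucket,
    quantilesState(0.9, 0.95, 0.99)(click_duration) AS click_duration
FROM click_events_raw_local
WHERE materialization_version = 1
  AND click_duration > 0
GROUP BY project_id, time_bucket;

Both views write to the same aggregate table; the materialization_version value on each raw row (switched in steps 2-4) determines which view processes it.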