===========================================
Snuba Data Slicing (under development)
===========================================

*This feature is under active development and is subject to change.*

To support a higher volume of data, we are building out support for datasets
and storages that span multiple physical resources (Kafka clusters, Redis
instances, Postgres databases, ClickHouse clusters, etc.) with the same
schema. Across Sentry, data records will have a logical partition assignment
based on the data's organization_id. In Snuba, we maintain a mapping of
logical partitions to physical slices in ``settings.LOGICAL_PARTITION_MAPPING``.

In a future revision, ``settings.LOGICAL_PARTITION_MAPPING`` will be used
along with ``settings.SLICED_STORAGE_SETS`` to map queries and incoming data
from consumers to different ClickHouse clusters, using a
``(StorageSetKey, slice_id)`` pairing that exists in configuration.

===========================
Configuring a slice
===========================

Mapping logical partitions to physical slices
----------------------------------------------

To add a slice to a storage set's logical-to-physical mapping (that is, to
repartition), increment the slice count in ``settings.SLICED_STORAGE_SETS``
for the relevant storage set, and update the mapping of that storage set's
logical partitions in ``settings.LOGICAL_PARTITION_MAPPING``. Every logical
partition **must** be assigned to a slice, and valid slice values are in the
range ``[0, settings.SLICED_STORAGE_SETS[storage_set])``.

Defining ClickHouse clusters in a sliced environment
-----------------------------------------------------

Given a storage set, there can be three different cases:

1. The storage set is not sliced
2. The storage set is sliced and no mega-cluster is needed
3. The storage set is sliced and a mega-cluster is needed

A mega-cluster is needed when partial data may reside on different sliced
ClickHouse clusters. This can happen, for example, when a logical partition
to slice mapping changes: writes of new data are routed to the new slice,
but reads of existing data need to span multiple clusters. Because queries
now need to work across different slices, a mega-cluster query node is
required.

Each of the cases above requires a different type of ClickHouse cluster
configuration.

For case 1, simply define clusters as usual in ``settings.CLUSTERS``.

For cases 2 and 3, add cluster definitions to ``settings.SLICED_CLUSTERS``
in the desired environment's settings, one per associated
``(storage set key, slice)`` pair. Follow the same structure as regular
cluster definitions in ``settings.CLUSTERS``. In the ``storage_set_slices``
field, sliced storage sets should be added in the form
``(StorageSetKey, slice_id)``, where ``slice_id`` is in the range
``[0, settings.SLICED_STORAGE_SETS[storage_set])`` for the relevant
``StorageSetKey``. An illustrative sketch of these settings appears below,
after the storage sharding subsection.

Preparing the storage for sharding
-----------------------------------

A storage that should be sharded requires setting the partition key column,
which is used to calculate the logical partition and, ultimately, the slice
ID used to route queries to the destination data. This is done with the
``partition_key_column_name`` property in the storage schema (we do not
support sharded storages for non-YAML based entities). You can see an
example of how one might shard by organization_id in the
``generic_metrics_sets`` and ``generic_metrics_distributions`` dataset YAML
files.
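Putting the pieces above together, the following is a minimal sketch of what
the slicing settings might look like for a single storage set split across
two slices. The storage set name, partition count, hosts, and cluster fields
are illustrative assumptions rather than values from a real deployment;
follow the structure of ``settings.CLUSTERS`` in your environment for the
exact cluster definition fields::

    # Hypothetical example: one storage set served by two slices.
    # All names, counts, and connection details are illustrative.
    SLICED_STORAGE_SETS = {
        "generic_metrics_distributions": 2,
    }

    # Every logical partition (256 of them in this sketch) must map to a
    # slice in the range [0, SLICED_STORAGE_SETS[storage_set]).
    LOGICAL_PARTITION_MAPPING = {
        "generic_metrics_distributions": {
            **{i: 0 for i in range(0, 128)},    # partitions 0-127   -> slice 0
            **{i: 1 for i in range(128, 256)},  # partitions 128-255 -> slice 1
        },
    }

    # One cluster definition per (storage set key, slice) pair, following
    # the same structure as the regular definitions in settings.CLUSTERS.
    SLICED_CLUSTERS = [
        {
            "host": "clickhouse-slice-0",  # illustrative host
            "port": 9000,
            "storage_set_slices": {("generic_metrics_distributions", 0)},
            "single_node": True,
        },
        {
            "host": "clickhouse-slice-1",  # illustrative host
            "port": 9000,
            "storage_set_slices": {("generic_metrics_distributions", 1)},
            "single_node": True,
        },
    ]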
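For intuition, the routing described in this section can be thought of as a
two-step lookup: a record's organization_id is first reduced to a logical
partition, and the logical partition is then looked up in the configured
mapping to find its physical slice. The helper names, the modulo scheme, and
the logical partition count below are simplified, hypothetical illustrations
rather than Snuba's actual API::

    # Simplified, hypothetical sketch of the two-step routing.
    LOGICAL_PARTITION_COUNT = 256  # assumed number of logical partitions

    # Illustrative mapping for one storage set: partitions 0-127 live on
    # slice 0, partitions 128-255 on slice 1.
    mapping = {i: (0 if i < 128 else 1) for i in range(LOGICAL_PARTITION_COUNT)}

    def org_id_to_logical_partition(org_id: int) -> int:
        # A record's logical partition is derived from its organization_id.
        return org_id % LOGICAL_PARTITION_COUNT

    def logical_partition_to_slice(logical_partition: int) -> int:
        # The physical slice is read from the configured mapping.
        return mapping[logical_partition]

    # organization 4123 -> logical partition 27 -> slice 0
    slice_id = logical_partition_to_slice(org_id_to_logical_partition(4123))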
Adding sliced Kafka topics
---------------------------------

To define a "sliced" Kafka topic, add ``(default logical topic name,
slice id)`` to ``settings.SLICED_KAFKA_TOPIC_MAP``. This tuple should be
mapped to a custom physical topic name of the form
``logical_topic_name-slice_id``. Make sure to add the corresponding broker
configuration details to ``settings.SLICED_KAFKA_BROKER_CONFIG``. Here, use
the same ``(default logical topic name, slice id)`` tuple as the key, and
the broker configuration as the value.

Example configuration::

    SLICED_KAFKA_TOPIC_MAP = {("snuba-generic-metrics", 1): "snuba-generic-metrics-1"}
    SLICED_KAFKA_BROKER_CONFIG = {("snuba-generic-metrics", 1): BROKER_CONFIG}

The following types of topics can be sliced: raw topics, replacements
topics, commit log topics, and subscription scheduler topics. Note that the
slicing boundary stops at this point: the results topics for subscriptions
cannot be sliced. An expanded version of the example above, with the broker
configuration written out, appears at the end of this page.

=================================
Working in a Sliced Environment
=================================

Starting a sliced consumer
-----------------------------

First, ensure that your slicing configuration is set up properly:
``SLICED_STORAGE_SETS``, ``SLICED_CLUSTERS``, ``SLICED_KAFKA_TOPIC_MAP``,
and ``SLICED_KAFKA_BROKER_CONFIG``. See above for details.

Then, start ``snuba consumer`` as usual, with the extra ``--slice-id`` flag
set to the slice number you are reading from.

TODO: handling subscriptions, scheduler and executor, etc.
-----------------------------------------------------------
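As referenced in the Kafka topics subsection above, here is an expanded
sketch of that example with the broker configuration written out. The
physical topic name follows the ``logical_topic_name-slice_id`` convention;
the broker address is an illustrative assumption::

    # Slice 1 of the "snuba-generic-metrics" logical topic lives on the
    # physical topic "snuba-generic-metrics-1".
    SLICED_KAFKA_TOPIC_MAP = {
        ("snuba-generic-metrics", 1): "snuba-generic-metrics-1",
    }

    # The same (logical topic name, slice id) key maps to the broker
    # configuration for the Kafka cluster hosting that physical topic.
    # The address below is illustrative.
    SLICED_KAFKA_BROKER_CONFIG = {
        ("snuba-generic-metrics", 1): {"bootstrap.servers": "kafka-slice-1:9092"},
    }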