Two-tier storage backend with size-based routing and redirect tombstones.
TieredStorage routes objects to a high-volume or long-term backend based
on size and maintains redirect tombstones so that reads never need to probe
both backends. See the crate-level documentation for the high-level
motivation, and the TieredStorage struct docs for routing and tombstone
semantics.
§Cross-Tier Consistency
A single logical object may span both backends: a tombstone in the high-volume
backend (HV) pointing to a payload in the long-term backend (LT). Mutations keep
the two in sync through compare-and-swap
on the high-volume backend (see HighVolumeBackend::compare_and_write).
Each operation reads the current HV revision, performs its work, then
atomically swaps the HV entry only if the revision is still current —
rolling back on conflict.
§Revision Keys
Every large-object write stores its payload at a revision key in the
long-term backend: {original_key}/{uuid}. The UUID suffix is random (no
monotonicity is guaranteed), so each write targets a distinct LT path
regardless of whether another write to the same logical key is in progress.
The tombstone in HV then points to this specific revision. Because each
writer owns its own LT blob, the compare-and-swap on the tombstone becomes
an atomic pointer swap: the winner’s revision is committed and the loser
can safely delete its own blob without affecting the winner.
See new_long_term_revision for the key construction.
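As a rough illustration of the `{original_key}/{uuid}` layout, the sketch below builds a revision key from the standard library alone. The names `random_suffix` and `revision_key_sketch` are hypothetical and are not the crate's `new_long_term_revision`; the suffix is 128 random-ish bits in hex standing in for a real UUID.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical stand-in for a random UUID: 32 hex characters.
/// Each `RandomState::new()` carries different hash keys, so hashing
/// nothing still yields a different value per call in practice.
fn random_suffix() -> String {
    let a = RandomState::new().build_hasher().finish();
    let b = RandomState::new().build_hasher().finish();
    format!("{a:016x}{b:016x}")
}

/// Builds `{original_key}/{suffix}` (illustrative only). Two calls for the
/// same logical key produce distinct LT paths, so concurrent writers never
/// share a blob.
fn revision_key_sketch(original_key: &str) -> String {
    format!("{original_key}/{}", random_suffix())
}
```

Because the suffix carries no ordering, nothing about a revision key says which write is newer; only the tombstone in HV decides which revision is live.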
§Compare-and-Swap
All mutating operations follow a common pattern of reading the current revision, performing the upload, atomically swapping the revision (commit point), and cleaning up the now-unreferenced LT blob in the background:
§Large-Object Write (> 1 MiB)
1. Read HV to capture the current revision (existing tombstone target, or absent).
2. Write payload to LT at a unique revision key.
3. Compare-and-swap in HV: write a tombstone pointing to the new
   revision, only if the current revision still matches step 1.
   - OK: schedule background deletion of the old LT blob, if any.
   - Conflict: another writer won the race; schedule background deletion of our new LT blob.
   - Error: reload the tombstone and delete the unreferenced blob or blobs.
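The large-object steps above can be sketched with both tiers modelled as in-memory maps. Everything here (`HvEntry`, `Tiers`, `stage_large`, `commit_large`) is a hypothetical model rather than the crate's API: cleanup runs inline instead of in the background, and the Error branch and idempotent retries are left out.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical model of what HV holds for one key.
#[derive(Clone, Debug, PartialEq)]
enum HvEntry {
    Inline(Vec<u8>),
    /// Redirect to the LT revision key that holds the payload.
    Tombstone(String),
}

/// Both tiers as in-memory maps (illustrative only).
#[derive(Default)]
struct Tiers {
    hv: HashMap<String, HvEntry>,
    lt: HashMap<String, Vec<u8>>,
}

impl Tiers {
    /// Steps 1 and 2: capture the current HV entry, then upload the payload
    /// to LT under the writer's own revision key.
    fn stage_large(&mut self, key: &str, revision: &str, payload: Vec<u8>) -> Option<HvEntry> {
        let observed = self.hv.get(key).cloned();
        self.lt.insert(revision.to_string(), payload);
        observed
    }

    /// Step 3: swap in the tombstone only if HV still matches `observed`,
    /// then delete whichever LT blob is now unreferenced.
    /// Returns true if this writer won the swap.
    fn commit_large(&mut self, key: &str, observed: Option<HvEntry>, revision: &str) -> bool {
        if self.hv.get(key) == observed.as_ref() {
            self.hv
                .insert(key.to_string(), HvEntry::Tombstone(revision.to_string()));
            // OK: the previous revision, if any, is no longer referenced.
            if let Some(HvEntry::Tombstone(old)) = observed {
                self.lt.remove(&old);
            }
            true
        } else {
            // Conflict: another writer won; drop only our own blob.
            self.lt.remove(revision);
            false
        }
    }
}
```

Splitting the flow into a staging call and a commit call is only a device for interleaving two writers in a single thread; the loser removes its own revision and never touches the winner's blob.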
§Small-Object Write (≤ 1 MiB)
1. Write inline to HV, skipping the write if a tombstone is present.
   - OK: done; the object is stored entirely in HV.
   - Tombstone present: a large object already occupies this key; continue with step 2.
2. Compare-and-swap in HV: replace the tombstone with inline data, only
   if the tombstone’s revision still matches.
   - OK: schedule background deletion of the old LT blob.
   - Conflict: another writer won the race; they will clean up the LT blob, and we have no new LT blob to clean up.
   - Error: reload the tombstone and delete the unreferenced blob if the write went through.
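A minimal sketch of the small-object path, again over an in-memory model. The types and the method names `put_inline_unless_tombstone` and `replace_tombstone` are assumptions made for illustration; the Error branch is not modelled and cleanup is synchronous.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical model of what HV holds for one key.
#[derive(Clone, Debug, PartialEq)]
enum HvEntry {
    Inline(Vec<u8>),
    Tombstone(String),
}

#[derive(Default)]
struct Tiers {
    hv: HashMap<String, HvEntry>,
    lt: HashMap<String, Vec<u8>>,
}

impl Tiers {
    /// First step: write inline unless a tombstone occupies the key.
    /// Returns the tombstone's revision when the write was skipped.
    fn put_inline_unless_tombstone(&mut self, key: &str, data: &[u8]) -> Option<String> {
        if let Some(HvEntry::Tombstone(rev)) = self.hv.get(key) {
            return Some(rev.clone());
        }
        self.hv.insert(key.to_string(), HvEntry::Inline(data.to_vec()));
        None
    }

    /// Second step: replace the tombstone with inline data only if it still
    /// points at `observed_rev`. On success the old LT blob is unreferenced
    /// and is deleted; on conflict nothing is cleaned up here.
    fn replace_tombstone(&mut self, key: &str, observed_rev: &str, data: &[u8]) -> bool {
        let still_current =
            matches!(self.hv.get(key), Some(HvEntry::Tombstone(r)) if r == observed_rev);
        if still_current {
            self.hv.insert(key.to_string(), HvEntry::Inline(data.to_vec()));
            self.lt.remove(observed_rev);
        }
        still_current
    }
}
```

The conflict branch does no cleanup because a small write never creates an LT blob of its own; the old blob belongs to whichever writer replaced the tombstone.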
§Delete
1. Delete from HV if the entry is not a tombstone.
   - OK: done; there is no LT data to clean up.
   - Tombstone present: a large object is stored here; continue with step 2.
2. Compare-and-swap in HV: remove the tombstone, only if its revision
   still matches.
   - OK: schedule background deletion of the LT blob.
   - Conflict: another writer won the race; they will clean up.
   - Error: reload the tombstone and delete the unreferenced blob if the write went through.
Tombstone removal is the commit point for deletes. If the subsequent LT cleanup fails, an orphan blob remains but the object is already unreachable through the normal read path.
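The sketch below illustrates why tombstone removal is the commit point. It uses a hypothetical in-memory model (`Tiers`, `get`, `delete_unless_tombstone`, `remove_tombstone`, and `cleanup` are all assumed names) and keeps LT cleanup as a separate call so that an orphaned blob can be observed.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical model of what HV holds for one key.
#[derive(Clone, Debug, PartialEq)]
enum HvEntry {
    Inline(Vec<u8>),
    Tombstone(String),
}

#[derive(Default)]
struct Tiers {
    hv: HashMap<String, HvEntry>,
    lt: HashMap<String, Vec<u8>>,
}

impl Tiers {
    /// Normal read path: inline data, or the LT payload behind a tombstone.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        match self.hv.get(key)? {
            HvEntry::Inline(data) => Some(data.clone()),
            HvEntry::Tombstone(rev) => self.lt.get(rev).cloned(),
        }
    }

    /// First step: delete from HV unless the entry is a tombstone.
    /// Returns the tombstone's revision when the delete was skipped.
    fn delete_unless_tombstone(&mut self, key: &str) -> Option<String> {
        if let Some(HvEntry::Tombstone(rev)) = self.hv.get(key) {
            return Some(rev.clone());
        }
        self.hv.remove(key);
        None
    }

    /// Second step (commit point): remove the tombstone only if it still
    /// points at `observed_rev`. LT cleanup is deliberately not done here.
    fn remove_tombstone(&mut self, key: &str, observed_rev: &str) -> bool {
        let still_current =
            matches!(self.hv.get(key), Some(HvEntry::Tombstone(r)) if r == observed_rev);
        if still_current {
            self.hv.remove(key);
        }
        still_current
    }

    /// Stand-in for the background deletion of an unreferenced LT blob.
    fn cleanup(&mut self, revision: &str) {
        self.lt.remove(revision);
    }
}
```

In this model, once `remove_tombstone` returns true, `get` yields `None` for the key even if `cleanup` never runs; the leftover LT entry is an orphan that costs storage but is invisible to readers.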
§Last-Writer-Wins
Concurrent mutations on the same key are inherently a race. Even a write
that returns Ok may be immediately overwritten by another caller — there
is no ordering guarantee and objectstore cannot provide a read-your-writes
promise.
CAS conflicts are therefore not errors: the losing writer’s data is
cleaned up and Ok is returned, because the result is indistinguishable
from having succeeded a moment earlier and then been overwritten.
§Idempotency
compare_and_write is idempotent: if the row is already in the target state, it
returns true without re-applying the mutation. This is critical for retry
safety. If the server commits a write but the response is lost, a retry sees the
already-mutated state and still returns true — so callers do not mistakenly
treat a successful commit as a lost race and clean up data that was actually
persisted.
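To make the retry argument concrete, here is a minimal sketch of an idempotent compare-and-write over a single row. The `Row` type and this signature are assumptions for illustration; the source does not show the real signature of `HighVolumeBackend::compare_and_write`.

```rust
/// Hypothetical single-row model: the row holds an optional revision string.
#[derive(Default)]
struct Row {
    current: Option<String>,
}

impl Row {
    /// Writes `target` if the row still equals `expected`.
    /// Returns true when the row ends up in the target state, including the
    /// case where it was already there (for example, a retried request).
    fn compare_and_write(&mut self, expected: Option<&str>, target: Option<&str>) -> bool {
        let current = self.current.as_deref();
        if current == target {
            // Already in the target state: report success without re-applying.
            return true;
        }
        if current != expected {
            // Someone else changed the row: this caller lost the race.
            return false;
        }
        self.current = target.map(str::to_string);
        true
    }
}
```

A retry after a lost response passes the same `expected` and `target` as the first attempt. Because each revision key is unique to its writer, finding the row already equal to `target` can only mean the earlier attempt committed, so returning true is safe under that assumption.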
Structs§
- TieredStorage - Two-tier storage backend that routes objects by size.
- TieredStorageConfig - Configuration for TieredStorage.