Rust applications

Hybrid applications

PR: https://github.com/getsentry/streams/pull/177

User story: I want to rewrite a pipeline step in the getsentry monolith in Rust.

Currently, Rust-ification within the monolith is done by adding new pyo3-based Python dependencies to getsentry’s requirements. We’ll follow the same path: users define pipeline steps using pyo3, but with our helper functions and “framework”.

Here is how a function definition works:

// mypackage/src/lib.rs
sentry_streams::rust_function!(
    RustTransformMsg,
    IngestMetric,
    TransformedIngestMetric,
    |msg: Message<IngestMetric>| -> TransformedIngestMetric {
        let (payload, _) = msg.take();
        TransformedIngestMetric {
            metric_type: payload.metric_type,
            name: payload.name,
            value: payload.value,
            tags: payload.tags,
            timestamp: payload.timestamp,
            transformed: true,
        }
    }
);

This is packaged up in a pyo3-based crate and can then be referenced from the regular pipeline definition like this:

.apply(Map("transform", function=mypackage.RustTransformMsg()))

Message payloads

IngestMetric and TransformedIngestMetric types have to be defined by the user in both Rust and Python.

// mypackage/src/lib.rs
#[derive(Serialize, Deserialize)]
struct IngestMetric { ... }

class IngestMetric(TypedDict): ...

The user has to write their own Python .pyi stub file to declare that RustTransformMsg takes IngestMetric and returns TransformedIngestMetric:

# mypackage/mypackage.pyi
class RustTransformMsg(RustFunction[IngestMetric, Any]):
    def __init__(self) -> None: ...
    def __call__(self, msg: Message[IngestMetric]) -> Any: ...

Then, the user has to define how conversion works between these types. They can implement the conversion manually or use a builtin conversion provided by us. The only builtin conversion we currently provide round-trips via JSON:

// mypackage/src/lib.rs
sentry_streams::convert_via_json!(IngestMetric);

…and the same procedure has to be repeated for the output type TransformedIngestMetric.
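
To make the round trip concrete, here is a minimal Python-only sketch of what converting via JSON amounts to. It is not the code the macro generates: the field types and the `to_wire`/`from_wire` helpers are assumptions made up for illustration, and `from_wire` merely stands in for the Rust side, which would deserialize the same bytes into a struct.

```python
import json
from typing import Any, Dict, TypedDict


class IngestMetric(TypedDict):
    # Field names follow the Rust example above; the field types are assumptions.
    metric_type: str
    name: str
    value: float
    tags: Dict[str, str]
    timestamp: int


def to_wire(payload: IngestMetric) -> bytes:
    """Python side (hypothetical helper): serialize the payload to JSON bytes."""
    return json.dumps(payload).encode("utf-8")


def from_wire(raw: bytes) -> Dict[str, Any]:
    """Stand-in for the Rust side, which would parse the bytes into a struct."""
    return json.loads(raw)


metric: IngestMetric = {
    "metric_type": "c",
    "name": "requests",
    "value": 1.5,
    "tags": {"env": "prod"},
    "timestamp": 1753315200,
}
round_tripped = from_wire(to_wire(metric))
```

Every message is serialized once and parsed once per direction, and only values that JSON can represent survive the trip.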

What happens at runtime

The rust_function macro currently just generates a simple Python function for the given Rust function. The GIL is released while the user’s Rust code is running, but there is still some GIL overhead when entering and exiting the function.

In the future, we can optimize this transparently, without users having to change their applications. For example, we could batch function calls to amortize GIL overhead, holding the GIL only while entering and exiting the batch.
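
The batching idea can be illustrated with a small simulation. This is only a sketch of the amortization argument, not how the runtime is implemented: `CountingLock` is a hypothetical stand-in for the GIL, and `call_one_by_one` and `call_batched` are made-up names. In both variants the user function runs without the lock, mirroring the GIL being released while Rust code runs.

```python
import threading
from typing import Callable, Iterable, List


class CountingLock:
    """Hypothetical stand-in for the GIL that counts how often it is acquired."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self) -> "CountingLock":
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc_info: object) -> None:
        self._lock.release()


def call_one_by_one(
    lock: CountingLock, func: Callable[[int], int], items: Iterable[int]
) -> List[int]:
    results: List[int] = []
    for item in items:
        with lock:  # entering the function: take the argument
            arg = item
        out = func(arg)  # user code runs without the lock
        with lock:  # exiting the function: hand the result back
            results.append(out)
    return results


def call_batched(
    lock: CountingLock, func: Callable[[int], int], items: Iterable[int]
) -> List[int]:
    with lock:  # entering the batch: take all arguments at once
        args = list(items)
    outs = [func(arg) for arg in args]  # user code runs without the lock
    with lock:  # exiting the batch: hand all results back at once
        return outs


per_call_lock = CountingLock()
batched_lock = CountingLock()
per_call_results = call_one_by_one(per_call_lock, lambda x: x * 2, range(100))
batched_results = call_batched(batched_lock, lambda x: x * 2, range(100))
```

For 100 messages, the per-call variant takes the lock 200 times and the batched variant takes it twice, while both produce the same results.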

What we want to improve in the future

- improve performance of calling convention/reduce overhead
  - take inspiration from https://github.com/ealmloff/sledgehammer_bindgen
- automatically generate type stubs for user’s Rust code; pyo3 does have something like that, but it doesn’t work perfectly (exposes internals of our Rust macro)
- improve ergonomics of message types and their conversion, add protobuf or msgpack as a way to roundtrip
- each team at Sentry would have to maintain a new Python package for their Rust functions, set up pyo3 and CI from scratch, etc. We can streamline this.
  - we already have: sentry_relay (relay integration), ophio (grouping engine), vroomrs (profiles), symbolic (stacktrace processing)
  - easiest: we provide a “monorepo” and “monopackage” where all Rust functions for getsentry go. We maintain CI for this monorepo.
  - medium: repository template
  - also, ideally this is aligned with devinfra’s “golden path” for Python devenv
  - in practice some team will have to provide support for questions about pyo3, since its entire API surface is exposed to product teams (although we can templatize and abstract a lot)

Pure-Rust pipelines

A lot of the complexity mentioned above is only really necessary when you want to mix Python and Rust code. For pure-Rust applications, we could do something entirely different:

- The runner does not have to be started from Python at all. If we started it from Rust, we would have a much easier time optimizing function calls.
- The pipeline definition does not have to be Python. We could have it be YAML or even Rust as well.
- Type stubs are not really necessary. We can easily validate that the types match during startup, or if the pipeline definition is in Rust, let the compiler do that job for us.

Any of these will, however, split the ecosystem. I think we have plenty of ergonomic improvements we can make for hybrid applications that would benefit pure-Rust users as well. We should focus on those first.
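
As an illustration of the startup validation mentioned in the list above, here is a minimal sketch of checking that adjacent steps agree on their message types. Nothing here is an existing API: `Step`, `validate_pipeline`, and the empty placeholder payload classes are hypothetical, and a real check would have to obtain the declared types from the Rust functions themselves.

```python
from dataclasses import dataclass
from typing import Sequence


class IngestMetric:
    """Placeholder payload type (hypothetical, fields omitted)."""


class TransformedIngestMetric:
    """Placeholder payload type (hypothetical, fields omitted)."""


@dataclass(frozen=True)
class Step:
    """Hypothetical description of a pipeline step and its declared types."""

    name: str
    input_type: type
    output_type: type


def validate_pipeline(steps: Sequence[Step]) -> None:
    """Raise TypeError if a step's output type is not the next step's input type."""
    for upstream, downstream in zip(steps, steps[1:]):
        if upstream.output_type is not downstream.input_type:
            raise TypeError(
                f"step {upstream.name!r} produces {upstream.output_type.__name__}, "
                f"but step {downstream.name!r} expects {downstream.input_type.__name__}"
            )


valid_pipeline = [
    Step("transform", IngestMetric, TransformedIngestMetric),
    Step("sink", TransformedIngestMetric, TransformedIngestMetric),
]
validate_pipeline(valid_pipeline)  # passes silently
```

A mismatch fails once at startup, before any message is processed. With a pipeline definition written in Rust, the compiler could reject the same mismatch at build time.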

Meeting notes July 24, 2025

- a better pure-Rust story
  - we have too much boilerplate, and now especially for pure-Rust apps
  - build a Rust runner, and try to get rid of as much pyo3 junk as possible
    - reference: the Rust Arroyo runtime
  - maybe hybrid will get better through this rearchitecture
  - maybe denormalize Parse steps into Map (@Filippo Pacifici)
```
// mypackage/src/lib.rs as pyo3
use sentry_streams;

sentry_streams::rust_function!(...);

sentry_streams::main_function!();

// or, in bin target:
pub use sentry_streams::main;
```
```
mypackage.run_streams()
```
- concerns:
  - user can freely downgrade/upgrade version, since they “own” the runtime (as they are statically linking it)
  - ability to opt out of message conversion trait requirements
- message type conversion
  - boilerplate is an issue
    - integration with existing schema repos, or copy schema-to-type generation into streams for “inline schemas”
  - better performance
- better runtime semantics for Rust functions
  - map chains, but in Rust?
  - no multiprocessing!