
Stateful vs Stateless Stream Processing: Watermarks, Barriers, and Performance Trade-offs

The differences between stateful and stateless distributed stream processing



Distributed stream processing fundamentally involves choosing between stateful stream processing—used extensively in event-time-based analytics and windowed computations—and stateless stream processing, commonly used for highly parallelizable tasks like data ingestion or batch ETL workflows.

Systems like Apache Flink, Spark Structured Streaming, RisingWave, and Kafka Streams represent stateful processing. They hold state in memory (such as windows or join buffers) across streaming events, necessitating careful synchronization and global coordination to ensure correctness. In contrast, stateless tasks, such as parallel writers to Apache Iceberg tables, map-style transformations, and data quality checks, operate without maintaining persistent state across events, relying instead on external coordination mechanisms like atomic commit protocols.

This deep dive will compare how these two paradigms manage progress tracking, synchronization, and consistency. We’ll explore real-world examples—stateful stream joins and stateless distributed table writes—highlighting their distinct approaches to progress tracking, global coordination needs, performance, scalability, fault tolerance, correctness challenges, and critical design trade-offs. This guide will help you select the optimal approach for your streaming workload.

Progress Tracking Mechanisms

Stateful Streaming (e.g., Flink, Kafka Streams)

Stateful stream processors use watermarks or similar markers to track event-time progress within the data flow. A watermark is a special timestamped marker that flows with the data and signals that no events with earlier timestamps are expected to follow. Watermarks are crucial for event-time synchronization in stateful stream processing.

In stream-processing systems like Apache Flink, sources periodically emit watermarks to signal how far event time has advanced. This lets operators know when all events up to time T have arrived. For example, a join or window operator uses the minimum watermark across its inputs as its own watermark, meaning it waits until each input partition has seen time T before considering time T complete. This ensures the operator doesn’t finalize a window or join until all streams are caught up through that timestamp. This prevents early results when some partitions are lagging.

Watermarks are generated at the sources and propagate through the topology, “carrying” the event-time clock forward. The watermark mechanism effectively creates a global notion of event-time progress: an operator’s watermark is the min of its inputs’ watermarks, meaning it implicitly waits for the slowest source. This is a form of distributed coordination – all parallel tasks broadcast their progress so that none advances past the slowest. As Kafka Streams' engineer Matthias Sax explains, watermarks “broadcast time information of all upstream operators” to downstream consumers, which prevents closing a window too early because a lagging partition will withhold its watermark. In other words, the entire dataflow is time-synchronized: a window is only closed when every parallel source has advanced beyond its end, guaranteeing that no late data was missed. Similarly, Flink’s checkpoint barriers act as a global synchronization point for state consistency. 
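To make this concrete, here is a minimal sketch of a bounded-out-of-orderness watermark strategy feeding an event-time window in Flink's Java DataStream API. This is illustrative only: the toy elements, field positions, and one-minute window are made up, and exact signatures vary slightly across Flink versions (this follows the 1.15+ style).

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkedWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Toy input: (userId, eventTimeMillis, count). A real job would read from Kafka etc.
        DataStream<Tuple3<String, Long, Integer>> clicks = env
                .fromElements(
                        Tuple3.of("u1", 1_000L, 1),
                        Tuple3.of("u1", 7_000L, 1),
                        Tuple3.of("u2", 65_000L, 1))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                // Watermark trails the max observed timestamp by 5s,
                                // i.e. tolerate events up to 5 seconds out of order.
                                .<Tuple3<String, Long, Integer>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                .withTimestampAssigner((event, ts) -> event.f1));

        clicks
                .keyBy(event -> event.f0)
                // The window fires only once the operator's watermark (the minimum
                // across its input partitions) passes the window's end timestamp.
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .sum(2)
                .print();

        env.execute("watermarked-window");
    }
}
```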

Kafka Streams, similarly, tracks progress by committing Kafka offsets and using local state stores; it doesn’t use explicit watermarks but advances a stream-time clock based on data timestamps and a configured grace period for out-of-order data. In effect, Kafka Streams waits a fixed interval for late events rather than coordinating on a global watermark. However, it still uses periodic commit markers (changelog flushes and offset commits) to record progress in processing.
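For comparison, here is a minimal Kafka Streams sketch of the same idea, where a grace period bounds how long a window stays open for out-of-order records instead of a global watermark. The topic name, window size, and grace period are placeholders; the windowing API shown is the Kafka 3.0+ style.

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Printed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class GracePeriodExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey()
                // Keep each 1-minute window open for out-of-order records until stream
                // time has advanced 30s past the window end; later records are dropped.
                // No watermark is exchanged between tasks.
                .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofSeconds(30)))
                .count()
                .toStream()
                .print(Printed.toSysOut());

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "grace-period-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        new KafkaStreams(builder.build(), props).start();
    }
}
```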

Stateless Distributed Tasks (e.g., Iceberg Writers)

In stateless systems, individual workers generally process independent chunks of data without maintaining a cross-record state in memory. Instead of keeping track of shared state or intermediate results, they monitor their progress by referencing positions in the input or output, coordinating at the finalization stage.

For example, in Apache Iceberg’s distributed writing, multiple writers consume portions of the dataset, such as partitions or batches of records, and produce output files independently. These writers do not need to share state or intermediate results during processing. Rather than using watermarks to track progress, they rely on task completion and associated metadata. Each writer knows how far it has progressed in its input (using file offsets or batch counts) and generates data files as output accordingly.

Coordination occurs during the commit phase: once each writer has finished producing their files, the system must atomically commit all these outputs to the table’s metadata. This process is managed by a commit protocol instead of in-stream markers. For instance, Iceberg employs an optimistic commit approach, using the compare-and-swap (CAS) mechanism where each writer—or a driver acting on their behalf—proposes a new table state that includes all new files. A central metadata store then performs an atomic swap of the old table snapshot for the new one, provided no conflicts are detected.

Progress is documented in the table’s metadata as a new snapshot with an incremented sequence number or version. If the writers follow a multi-writer, single-committer topology, you might need a lightweight coordination mechanism to identify the Parquet file candidates for commit in the following snapshot. However, no continuous watermarks or barriers are required during the writing process—writers can operate and complete their tasks independently. Ultimately, a single atomic commit captures their collective progress (for example, “all tasks have written X files; these files are now part of snapshot N”). 
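A rough sketch of this "write independently, commit once" pattern with Iceberg's Java API is shown below. The catalog, warehouse path, and table name are placeholders, and in practice each engine wires this up through its own Iceberg sink; the point is that one committer gathers the files from all writers and commits them in a single atomic snapshot.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;

public class SingleCommitter {
    /** Commits the files produced by all parallel writer tasks as one atomic snapshot. */
    public static void commitAll(List<DataFile> filesFromAllWriters) {
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "s3://warehouse/path");
        Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

        AppendFiles append = table.newAppend();
        filesFromAllWriters.forEach(append::appendFile);

        // One atomic commit: the table either gains all of these files (as a new
        // snapshot) or none of them. Under the hood, Iceberg swaps the metadata
        // pointer with a compare-and-swap and retries if another commit won.
        append.commit();
    }
}
```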

In summary, stateless jobs track progress through more straightforward methods, such as counters for records processed or outputs written to files, and rely on an external commit or coordination step to merge results.

Need for Global Consistency and Coordination with Barriers

In stateful streaming systems, barriers and watermarks are complementary mechanisms ensuring consistency and coordination across distributed tasks. Since you’re already familiar with watermarks, which track event-time progress to manage out-of-order events and finalize computations, this section will focus on barriers, explaining their role compared to watermarks and how they contribute to global coordination in systems like Apache Flink or RisingWave.

What are Barriers?

Barriers are special markers injected into the data stream to establish synchronization points across all parallel tasks in a streaming job. Unlike watermarks, which are tied to event time (when events occurred), barriers are tied to the system’s own processing progress rather than to the data’s timestamps. They flow through the pipeline, ensuring all tasks reach a consistent state at specific points, typically for taking state snapshots during checkpoints.

In contrast to watermarks, which propagate based on event timestamps to signal when all events up to a specific time have likely arrived, barriers propagate based on their position in the stream. They are periodically introduced at the source tasks and move downstream, coordinating the system’s progress independently of the data’s temporal content.

Stateful Systems

Stateful streaming systems, such as those performing joins or time-based windows, rely on global coordination to maintain correctness. Watermarks achieve this for event-time by ensuring that operators like window aggregators only close a window when every parallel source has advanced beyond its end—effectively waiting for the slowest partition. 

Barriers, on the other hand, provide a complementary form of global coordination for state consistency. When a barrier for a checkpoint (e.g., checkpoint N) is injected, it flows through the system, and each operator:

  • Waits for the barrier from all input channels.
  • Snapshots its current state (e.g., window contents or join buffers).
  • Forwards the barrier downstream.

This process ensures that the snapshot reflects a consistent state across all operators as if the system had paused at the same logical point in the stream. Downstream operators may buffer data from faster inputs until slower ones catch up to the barrier, introducing temporary synchronization. This global barrier creates a well-defined cut in the infinite event stream, enabling the system to recover from failures by rolling back to this consistent state without duplicates or missing data.
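The following is a conceptual sketch, not Flink's actual implementation, of how an operator with several input channels aligns on a barrier: it buffers records from channels that have already delivered the barrier, snapshots once every channel has delivered it, forwards the barrier, and then drains the buffered records into the next checkpoint's epoch.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Conceptual sketch of aligned checkpoint barriers (not real framework code). */
class AlignedOperator {
    private final int numInputChannels;
    private final Set<Integer> channelsAtBarrier = new HashSet<>();
    private final Queue<Object> bufferedRecords = new ArrayDeque<>();

    AlignedOperator(int numInputChannels) {
        this.numInputChannels = numInputChannels;
    }

    void onRecord(int channel, Object record) {
        if (channelsAtBarrier.contains(channel)) {
            // This channel is "ahead": hold its records until alignment completes.
            bufferedRecords.add(record);
        } else {
            process(record);
        }
    }

    void onBarrier(int channel, long checkpointId) {
        channelsAtBarrier.add(channel);
        if (channelsAtBarrier.size() == numInputChannels) {
            snapshotState(checkpointId);       // consistent cut: all pre-barrier records applied
            emitBarrierDownstream(checkpointId);
            channelsAtBarrier.clear();
            while (!bufferedRecords.isEmpty()) {
                process(bufferedRecords.poll()); // these records belong to the next checkpoint
            }
        }
    }

    void process(Object record) { /* apply to state, emit results */ }
    void snapshotState(long checkpointId) { /* write state to durable storage */ }
    void emitBarrierDownstream(long checkpointId) { /* forward barrier to outputs */ }
}
```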

Stateless Systems

In contrast, stateless distributed tasks often do not require global barriers during processing. Since each worker operates independently on disjoint data segments and doesn’t maintain state that must align with others, they can run to completion without a synchronous handshake with other workers. There is typically no need for a global “watermark” or in-band barrier for correctness because tasks aren’t waiting on each other’s internal state. For example, two Spark or Flink tasks writing to different partitions of an Iceberg table can produce files concurrently without coordinating; they do not share an intermediate state that needs alignment. Global consistency is deferred to the commit stage: a coordinated step occurs only when finalizing the output. At that point, a single transaction (or commit protocol) gathers all results. In Iceberg’s case, the commit is achieved via an atomic swap of table metadata, often using a compare-and-swap (CAS) operation on a pointer to the metadata file. This operation is effectively a small global barrier at the end – it ensures that all writers’ outputs become visible together or not at all. 

However, unlike the streaming checkpoint barrier, this does not require pausing data processing; it happens off the critical path once data writing is done. There is no continuous global synchronization during the write; tasks don’t periodically stop and align with each other. Because only one commit can succeed at a time, Iceberg’s design serializes these metadata updates to maintain consistency. This means global consistency is achieved without long-lived coordination across all tasks – it’s handled by the ACID commit protocol in a relatively short critical section. In summary, stateless architectures avoid global barriers in steady-state operation, leveraging atomic commits or distributed consensus only at logical transaction boundaries. They assume that as long as the final commit is consistent, the lack of coordination during processing won’t compromise correctness.

Performance, Scalability, and Fault Tolerance Implications

The stateful vs. stateless choice leads to different trade-offs in performance, fault tolerance, and horizontal scalability:

Performance (Latency & Throughput)

Stateful streaming systems incur some overhead from coordination. Waiting for watermarks means latency is added for out-of-order or slow streams – the system must delay emitting results until it is confident all earlier data has arrived. For example, a join might buffer events and only output matches when the watermark passes the event’s timestamp plus allowed lateness, which could add delay. Checkpoint barriers can also introduce brief pauses (or buffer bloat) if one source is slower, causing others to wait during alignment. This coordination overhead can manifest as backpressure and longer checkpoint intervals if there is a skew in source speeds. That being said, frameworks like Flink mitigate this with asynchronous snapshots (unaligned checkpoints to reduce waiting) and by advancing watermarks intelligently (marking idle sources, etc.). 
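As a rough sketch, these mitigations are mostly configuration in Flink; the method names below follow recent releases and may differ slightly across versions, and the intervals are illustrative.

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuning {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(60_000);                          // snapshot every 60s
        env.getCheckpointConfig().enableUnalignedCheckpoints();   // let barriers overtake buffered
                                                                  // data instead of waiting for alignment
        env.getCheckpointConfig().setCheckpointTimeout(120_000);  // give up on checkpoints that stall

        // Watermark side: declare a partition idle after 1 minute of silence so a
        // quiet source doesn't hold back the global event-time clock indefinitely.
        WatermarkStrategy<String> strategy = WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withIdleness(Duration.ofMinutes(1));
        // ... attach `strategy` to the job's sources as in the earlier example.
    }
}
```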

The throughput of stateful jobs can be very high (processing millions of events per second). However, sustaining exactly-once guarantees costs some CPU and network for barrier handling and state backup. In contrast, stateless tasks typically achieve lower latency per task because they simply process and output data without waiting on others. There’s no need to buffer for global time coordination, so they can pipeline data as fast as the data source allows. For instance, each Iceberg writer can immediately flush records to a file without coordinating with other writers until the end. The only performance cost comes at the commit point, and it is usually small (updating a metadata file or catalog). The runtime overhead is negligible if commits are infrequent relative to data volume. However, if commit frequency or contention is high, it can become a bottleneck: only one commit can succeed at a time, so many small commits can throttle overall ingest throughput. This issue is noted in Iceberg’s design – while atomic commits ensure consistency, they serialize updates and can limit throughput under high concurrency. In practice, stateful streaming might introduce a steady overhead (a few milliseconds of added latency per window, periodic checkpoint sync), whereas stateless batch tasks run freely and incur a one-time sync cost.

Fault Tolerance

Stateful frameworks use checkpointing to track the overall progress and support fault tolerance. In Flink, a coordinator periodically injects a checkpoint barrier into each source stream. These barriers flow down the pipelines (much like watermarks) and never overtake regular data. When an operator receives a barrier for checkpoint N on all its input channels, it snapshots its state (e.g., contents of windows or join buffers) as part of checkpoint N. The barrier thus creates a consistent cut across the distributed job: all state updates before the barrier are included in the checkpoint, and any events after belong to the next checkpoint. This mechanism (based on the Chandy-Lamport algorithm) aligns all tasks at the same input position for a snapshot, ensuring a globally consistent state image. For example, in a Flink streaming join, each side’s operator state (buffered events waiting to be matched) will be checkpointed at the same logical time so that no duplicates or missed pairs occur on recovery. 

Stateful systems invest heavily in fault tolerance via their checkpointing mechanism. In Apache Flink, if a failure occurs, the system will roll back to the latest checkpoint: all operator states are restored to that snapshot, and source positions (like Kafka offsets) are reset to that point. This approach guarantees exactly-once state consistency – the state reflects no duplicate processing even if some events were processed twice (the state reset “erases” any trace of the second processing for those events). Because a consistent checkpoint captured progress across all tasks, the cluster can continue as if the failure never happened, apart from some reprocessing of buffered events after recovery. Kafka Streams similarly uses changelog topics to back up state stores and commits input offsets in a way that, combined with idempotent and transactional producers, achieves effectively-once processing. The important point is that stateful jobs must restore input positions and in-memory state (e.g., contents of windows, join tables). This can be a heavy operation if the state is large, but it is necessary for correctness.
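In Kafka Streams, the effectively-once behavior described above is switched on through the processing-guarantee setting; a minimal configuration sketch (Kafka 3.0+, where the transactional mode is called exactly_once_v2; the application id and brokers are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class EffectivelyOnceConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-detector");    // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder brokers
        // Ties state-store changelogs, input-offset commits, and output production
        // together in Kafka transactions so results are published atomically.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```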

Stateless tasks have a simpler fault model: since they keep no substantial state in memory, a failed task can typically just be restarted and reprocess its input partition from scratch (or from the last known successful offset). For example, if an Iceberg writer task fails halfway, the overall job can relaunch that task and re-read the portion of data it was handling. There’s no user-visible state to restore beyond ensuring it doesn’t duplicate what a successful task already did. Exactly-once delivery in stateless output often relies on the commit protocol: if a task partially writes some files before failing, those files won’t be committed, so they won’t affect the table. The restarted task will produce a new set of files and attempt the commit again. Because the commit is atomic, the table will include either the first successful attempt or the retried attempt, but not a mix of both. This yields at-least-once processing with exactly-once output from the user’s perspective.

One challenge is cleaning up any orphaned outputs from failed attempts (e.g., temporary files that were written but never committed). Systems like Iceberg provide mechanisms such as snapshot expiration and orphan-file removal to clean these up, and they design the commit so that uncommitted data is never referenced (making it harmless aside from the storage it occupies). Stateless tasks simplify fault recovery (just re-run) but rely on external atomicity to avoid duplicates. In contrast, stateful systems handle recovery internally through coordinated snapshots.
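A small sketch of that housekeeping with Iceberg's table API: expiring old snapshots lets their unreferenced files be removed, while orphan files from failed attempts are typically cleaned by an engine-specific action (e.g., Spark's deleteOrphanFiles). The 7-day cutoff here is illustrative, and the table would be loaded as in the earlier committer example.

```java
import java.util.concurrent.TimeUnit;

import org.apache.iceberg.Table;

public class TableMaintenance {
    /** Expire snapshots older than 7 days so their unreferenced data files can be removed. */
    public static void expireOldSnapshots(Table table) {
        long sevenDaysAgo = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
        table.expireSnapshots()
                .expireOlderThan(sevenDaysAgo)
                .commit();
    }
}
```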

Scalability

Stateful streaming can scale to high throughput by partitioning the state and processing across many operators. However, scaling out (changing the parallelism) can be complex. Since state is tied to specific key partitions, increasing parallelism might require state redistribution (e.g., splitting a RocksDB store or repartitioning Kafka partitions). Flink supports rescaling from a saved state (key groups can be reallocated to more workers), typically via a stop-and-restart from a savepoint, which is a coordinated operation. Moreover, stateful jobs might be limited by how much state each machine can handle; extremely large state (think terabytes) can be sharded, but managing and checkpointing it adds overhead. Stateless systems are generally easier to scale: if you have more data or need more throughput, you can add more workers and split the input accordingly. Because tasks don’t need to coordinate or share state, you can run N tasks in parallel and scale throughput almost linearly.

For example, if ingesting 1,000 files with 10 workers is slow, you can ramp up to 50 workers, and each will write 20 files in parallel, speeding up the job almost perfectly. There’s no need to repartition in-memory state or merge state across workers – new tasks just take new input slices. The main limitation is often upstream or downstream bottlenecks: e.g., the storage system must handle the parallel writes, and the commit must handle an increased number of output files. Iceberg handles large numbers of files by organizing metadata (manifests), but extremely high task counts could lead to a huge commit proposal (thousands of file entries), which increases commit time. Additionally, the single-writer commit model means that no matter how many parallel tasks there are, they serialize at commit time. If hundreds of writers all finish around the same second, they must take turns committing. This can cause contention at very high concurrency (commit throughput might flatten out as each commit is done individually).

In summary, stateless tasks offer easier horizontal scaling during processing, whereas scaling stateful jobs is more involved due to the need to manage and redistribute state. Stateful systems, however, can achieve continuous scalability for streaming workloads that run 24/7 by careful partitioning and using reliable brokers (e.g., Kafka partitions ensure input scalability and state backends ensure storage scalability).

Correctness and Coordination Challenges

Both paradigms face challenges to ensure correctness, but the nature of those challenges differs:

Stateful Systems – Ordering, Completeness, and Duplicate Handling 

Because stateful operators depend on the history of events, handling out-of-order data and event-time skew is a primary concern. Watermarks are critical: if watermarks are too aggressive (advance too fast), the system might close a window or finish a join too early, missing late events. If they are too slow, the pipeline lags unnecessarily. Ensuring that watermarks correctly reflect when “all data up to time T” has been seen is hard in heterogeneous environments. For example, if one source or one key is much slower (skewed), it drags down the global progress; this event-time skew can cause many timers to pile up waiting and slow down checkpoints as well. Engineers must often tune watermark strategies (and allowed lateness) to balance latency vs. completeness.

Another challenge is exactly-once output: Flink’s mechanism makes the operator state exactly-once, but when outputting to external systems, one must integrate with the checkpoint protocol. Flink solves this for some sinks via a two-phase commit pattern: on each checkpoint, the sink tasks “pre-commit” their data (e.g., write to a temp location or transaction), and only on checkpoint success do they publish it. This ensures that if a checkpoint is aborted, those side effects are invisible, preventing duplicates. For example, a Flink sink to Kafka might write messages in a transaction tied to checkpoint X and only commit that Kafka transaction when checkpoint X is acknowledged globally. This kind of coordinated commit is complex, but it is what extends the exactly-once guarantee end-to-end. Similarly, a join operator must avoid emitting duplicate join results after a failure; with checkpointing, the operator’s state is reset to a point before it emitted those results, so any re-emission is correct reprocessing, not a logical duplicate.
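Here is roughly what that looks like with Flink's Kafka sink, where the exactly-once delivery guarantee ties Kafka transactions to checkpoint completion (the bootstrap servers and topic are placeholders, and the connector API shown is the current KafkaSink builder style):

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceKafkaSink {
    public static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")            // placeholder brokers
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.builder()
                                .setTopic("join-results")         // placeholder topic
                                .setValueSerializationSchema(new SimpleStringSchema())
                                .build())
                // Records are written inside a Kafka transaction that is pre-committed
                // with each checkpoint and committed only once that checkpoint completes.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("join-results-sink")
                .build();
    }
}
```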

Deadlocks or stalls can occur if coordination isn’t handled carefully – e.g., if one operator awaits a barrier that never arrives because of a stuck source. Modern frameworks include timeouts or mark idle sources (so watermarks can advance even if one partition is silent). Stateful systems must also manage state evolution and upgrades, which require careful coordination (savepoints) to ensure that the new code interprets the old state correctly. In short, correctness in stateful streaming demands carefully designed coordination (watermarks, barriers, and recovery protocols) to handle all the tricky cases of ordering and failures.

Stateless Systems – Atomicity, Consistency, and Idempotency

In stateless distributed writes, the main challenge is ensuring that concurrent or repeated operations don’t corrupt the final result. Since tasks run independently, two tasks might inadvertently attempt to write to the same location or update the same record – the system design must prevent conflicts. Apache Iceberg’s approach to multi-writer correctness is to perform a conflict check before committing and to allow only one commit at a time for a given table. The commit operation uses an atomic compare-and-swap: it verifies that the table’s metadata has not changed since the writers started (no other commit has happened), and if it has changed, the commit is rejected. The failing job then must rebase – it reloads the latest table state, re-checks that its data doesn’t conflict, and tries again.

For example, if two jobs both try to add files to the same table snapshot, one will succeed; the other will see its commit fail, but since its operation was an append, it can safely merge its new files with the latest snapshot and retry (no true data conflict).

If it were a delete operation, the conflict logic would check if the data it intended to delete still exists in the new snapshot. This optimistic concurrency control is powerful but requires careful engineering: writers must correctly detect conflicts (e.g., overlapping data) and handle retries. If conflicts are frequent, starvation or long wait times are possible. 
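Below is a conceptual sketch of that rebase-and-retry loop around an Iceberg append. Note that Iceberg's built-in operations already retry simple conflicts internally; this only makes the logic explicit for illustration, and the retry budget is arbitrary.

```java
import java.util.List;

import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.exceptions.CommitFailedException;

public class OptimisticCommitter {
    public static void appendWithRetry(Table table, List<DataFile> files, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                AppendFiles append = table.newAppend();
                files.forEach(append::appendFile);
                append.commit();   // compare-and-swap on table metadata; fails if another commit won
                return;
            } catch (CommitFailedException e) {
                // Another writer committed first: reload the latest snapshot ("rebase")
                // and try again. An append has no true data conflict, so retrying is safe.
                table.refresh();
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " commit attempts");
    }
}
```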

Another challenge is atomic visibility: we must ensure that readers never see a half-committed state. Iceberg’s design of a single metadata pointer update (or atomic rename of a file) ensures that a table moves from snapshot N to N+1 in one indivisible step. If the commit fails, none of the new data is considered part of the table; if it succeeds, all new files are visible. Achieving this atomic swap often relies on underlying storage guarantees (like atomic rename on HDFS or compare-and-swap in a database/catalog). This can be tricky in distributed file systems without transactions (some systems use global locks or Hive metastores to coordinate). Idempotency is also essential: writing data files should not produce divergent results if a commit fails and is retried. Typically, each writer uses a unique file name or partition, so even if it writes files twice, only one set will be referenced in the final metadata (the other set can be garbage-collected). This means designing the job such that its output is deterministic or has a way to prune duplicates. 

Finally, stateless systems must consider visibility timing: with no global barrier, one task might finish much earlier than another – if the commit waits for all, that’s fine (no one sees partial data). However, if intermediate results were somehow visible (which should not happen in a well-designed pipeline), it could cause consistency issues. In batch writes, intermediate results are kept isolated (e.g., under a temporary folder or not added to the table) until the commit. Overall, the stateless approach pushes coordination to the edges (commit time), which simplifies the runtime but puts the burden on the transactional guarantees and conflict-resolution logic.

Conclusion

In summary, stateful distributed systems (like stream processors maintaining joins or windows) use in-band mechanisms such as watermarks and checkpoint barriers to coordinate progress, achieving strong consistency (event-time alignment and exactly-once state) at the cost of additional synchronization and state management overhead. They require careful design to handle out-of-order events and to snapshot state across many nodes, but they enable powerful continuous computations on unbounded data with correctness guarantees. Stateless distributed systems (like parallel batch writers or microservices that don’t carry state across calls) push complexity to the edges: they favor independent, parallel operation with minimal runtime coordination and rely on atomic commit protocols or external consensus only at decision points (like committing a transaction or publishing results). This yields easier scalability and often lower per-event overhead, but it demands robust commit logic to avoid consistency anomalies.

Each approach has its strengths and use cases: Stateful synchronization is ideal for streaming analytics where event order matters and one needs results that reflect a globally consistent cut of the streams, such as financial transaction monitoring or real-time fraud detection. Stateless, loosely coordinated processing is well-suited for embarrassingly parallel tasks like ETL writes or partitioned computations where each unit can be isolated. Many real-world systems blend the two – for example, using stateless processing for throughput but adding just enough coordination (watermarks, grace periods, or commit protocols) to meet correctness requirements. Understanding these mechanisms helps in choosing the proper framework and design: if you need exact event-time joins across streams, a stateful engine with watermarks and checkpoints is appropriate; if you need to ingest or transform data across a cluster into a data lake, a stateless approach with a reliable commit protocol (Iceberg, Delta) will simplify scaling and fault recovery while still providing atomicity. The trade-offs ultimately revolve around the perennial question in distributed systems: how to balance consistency, availability, and performance – stateful vs. stateless paradigms offer different answers along this spectrum, and modern data infrastructure often employs a combination to get the best of both worlds based on the use case.

References

Time and Watermarks in Confluent Cloud for Apache Flink | Confluent Documentation 

Why Kafka Streams Does Not Use Watermarks ft. Matthias J. Sax

How mitigating event-time skewness can reduce checkpoint failures

Understanding Apache Iceberg’s Consistency Model Part 2

Architecture for Fast, Concurrent, Append-Only Iceberg Writes

Iceberg and Hudi ACID Guarantees – Ryan Blue (Tabular, Medium)

Apache Flink benchmarks: https://github.com/apache/flink-benchmarks
