Event Sourcing patterns: Replay side effect handling

In last post of the series we've discussed how to handle side effects that might happen as a result of event stream processing. In some cases we don't want to trigger these side effects at all and these include event stream replays to rebuild the projection.

Problem statement

How to perform a replay of an event stream and don't retrigger already performed side effects?

Let's consider following example. In our fraud use case the requirements have changed and now we need to look at larger window of transactions. Instead of looking at last 5 minutes and only 2 transactions, now the window was changed to 72 hours and all transactions that happened during this time period.

In order deliver a solution supporting these new requirements the team has decided to roll out new, separe consumer handling the process. The projection supporting this process will have to be built from scratch. Fortunately event sourced systems are designed to do it. When the new component is released, the consumer needs to process all events from the start and backfill the data.

The problem that we have to face now is how to make sure the fraud alerts are not sent for old events, but only for the newly created ones? There is a couple of ways of handling this problem that can be summed up with:

  • replace the component triggering side effects with a noop component
  • make the event handler smart and aware when to trigger side effects based on event metadata

Solutions

Noop component

The way noop (no operation) component works is that we change the configuration of the service to make sure it's aware of a current operation mode. The mode can be either set to a normal or replay.

In a normal mode the service passes (injects, curries) an actual working instance of a side-effect triggering component and everything works as usual. In a replay mode instead of a normal component, a noop component is passed around and the side effects are not triggered.

In our example the side-effects are triggered by message producer (that is sending fraud notifications to some message broker), so this is the component that needs to be replaced with a noop implementation.

The downside of this approach is that we probably have to manually switch the configuration of the service in order to enable or disable publication of messages. It is usually fine when we start the replay, but might be a bit tricky to time when the replay ends and we start receiving regular messages.

One of the possible ways of working around it is to consume the replayed messages from a separate queue/topic and wait until it's drained. After than we can resume consumption from the usual source and start triggering side effects. This approach should help to mitigate the issue of switching back from replay to normal model.

Smart handler

If the Noop component approach is too limited for the needs we have an alternative. The idea behind smart handler is that the code processing the events (and then triggering side effects) is aware whether the message was received because of a replay or because it was newly created.

This information can be passed around in couple of different ways. Some of the event store implementations have metadata field, so the information about replay can be dynamically added there. Other way would be to add it to the headers of the message received from a broker. The specific way of achieving this will depend on how the event delivery is implemented.

In any case the handler will now be able to branch the logic and decide if the side effects should be performed or not. This is more work than the noop alternative but also gives us more flexibility.

Fist benefit is that there are no configuration changes required to switch the consumer to and from replay mode. As the messages are consumer the logic is smart enough to figure out what needs to happen. When the replay ends it can carry on consuming regular messages.

Second benefit is that now we are able to replay a single fine-grained stream, instead of the whole event store. This will not work for every use case but only for those where data and logic is partitioned per stream. It will also probably require some work to dynamically clean up data stored for that specific stream.

It might seem not likely that such approach will be worth the investment, but for some large scale systems replays can take a very long time. The extreme case I've worked on was 2 weeks of full-throttle replay. If instead we can identify streams that need replaying (because of a bug for example) it can greatly reduce the excution time.

The challenge of the smart handler approach is that something needs to be responsible for adding the extra metadata or header to the event. Depending on specific event store implementation it can be either done in the component pushing messages to the message queue, or it the consumer itself. If the event store implementation gives global ordering of messages, then deciding if message is replayed can be done based on the global offset. If we don't have global ordering then the alternative would be to have an offset per stream, or make the decision based on the event timestamp.

Summary

In order to deal with the challange of not triggering side effects during the replay of the event stream either Noop component or Smart handler strategies can be used. The former is much simpler in implementation, but might be a bit challenging when it comes to switching back to normal mode after replay is done. The latter requires more changes but gives greater flexibility and allows for fine-grained stream replays.