Everyone today is thinking about and building microservices, myself included. A microservice architecture, by its core principles and in its true context, is a distributed system.
In a microservice architecture, a distributed transaction is an outdated approach that causes severe scalability issues. Modern patterns avoid these problems by relying on asynchronous data replication, or by modeling distributed write operations as orchestrated or choreographed sagas. In this article, I will explain the orchestrated saga in detail.
A distributed transaction is one that spans multiple databases across the network while preserving ACID properties. Suppose a transaction requires services A and B to each write to their own database, and both writes must be rolled back if either A or B fails. That is a distributed transaction.
To see why a distributed transaction is hard, let's look at an extremely common real-life example: an e-commerce application.
Our e-commerce application consists of several microservices, each with its own database. Some business transactions, however, span multiple services, so we need a mechanism to ensure data consistency across services.
Let's say we have an order service, an inventory service, and a payment service. The boundaries are clear: the order service takes the order, the inventory service allocates the stock, and the payment service deals only with payment- and refund-related issues.
A single order transaction consists of creating an order, reserving stock, and taking payment, in any order. A failure at any point during the transaction should revert everything done before it.
For example, a payment failure should cause the inventory service to release the reserved stock and the order service to cancel the order.
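To make the requirement concrete, here is a minimal sketch of one possible naive design. It assumes plain in-process Python classes stand in for the three services; all class and method names are hypothetical. The order service calls the inventory and payment services directly and undoes earlier steps by hand when a later step raises an exception. This direct coupling is what the following paragraphs criticize.

```python
class InventoryService:
    def __init__(self, stock):
        self.stock = stock          # item -> available units
        self.reserved = {}          # order_id -> (item, qty)

    def reserve_stock(self, order_id, item, qty):
        if self.stock.get(item, 0) < qty:
            raise RuntimeError("insufficient stock")
        self.stock[item] -= qty
        self.reserved[order_id] = (item, qty)

    def release_stock(self, order_id):
        item, qty = self.reserved.pop(order_id)
        self.stock[item] += qty


class PaymentService:
    def __init__(self, balance):
        self.balance = balance

    def pay(self, order_id, amount):
        if self.balance < amount:
            raise RuntimeError("payment declined")
        self.balance -= amount


class OrderService:
    """Naive design: knows about, and synchronously calls, every other service."""

    def __init__(self, inventory, payment):
        self.inventory = inventory
        self.payment = payment
        self.orders = {}            # order_id -> status

    def place_order(self, order_id, item, qty, amount):
        self.orders[order_id] = "CREATED"
        try:
            self.inventory.reserve_stock(order_id, item, qty)
            try:
                self.payment.pay(order_id, amount)
            except Exception:
                self.inventory.release_stock(order_id)   # undo the reservation
                raise
        except Exception:
            self.orders[order_id] = "CANCELLED"          # undo the order
            return False
        self.orders[order_id] = "COMPLETED"
        return True


inventory = InventoryService({"book": 5})
payment = PaymentService(balance=30)
orders = OrderService(inventory, payment)
ok = orders.place_order("o1", "book", 1, amount=20)        # True
failed = orders.place_order("o2", "book", 2, amount=50)    # False: payment declined, rolled back
```

This works only while every call returns promptly with a clear success or failure.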
If our e-commerce application is designed as described above, the approach has some serious flaws:
- The fallacies of distributed computing – it relies heavily on the network staying stable throughout the transaction.
- Transactions could end up in an indeterminate state.
- Fragile to topology changes – each system has explicit knowledge of its dependencies.
Imagine the payment service calls a third-party API such as PayPal or Stripe. That part of the transaction is effectively out of your control. What happens if the API is down or throttled? Or if there is a disruption somewhere along the network path? Or if one of the three services is down?
If the inventory service managed to reserve stock but the payment service timed out for whatever reason, we cannot say that the payment has failed.
If we treat the timeout as a failure, we roll back the stock reservation and cancel the order. However, the payment may actually have gone through. Perhaps the external payment API took longer than usual, or a network disruption cut off the connection before the payment service had a chance to respond. Now the transaction is in the Paid and Stock Released states simultaneously.
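The timeout scenario can be reproduced in a few lines. This sketch assumes a hypothetical payment gateway that records the charge and then raises `TimeoutError`, as if the response were lost on the way back. The naive handler treats the timeout as a failure and ends up with a paid order whose stock was released. `call_payment` shows the alternative: classify a timeout as an unknown outcome, not a failure.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"   # the remote side may or may not have acted


class FlakyPaymentGateway:
    """Hypothetical external API: the charge succeeds, but the reply is lost."""

    def __init__(self):
        self.charged = set()

    def charge(self, order_id):
        self.charged.add(order_id)           # the money really moves...
        raise TimeoutError("no response")    # ...but the caller never hears back


def naive_place_order(gateway, order_id, state):
    state["stock"] = "RESERVED"
    try:
        gateway.charge(order_id)
    except Exception:                        # naive: timeout treated as failure
        state["stock"] = "RELEASED"
        state["order"] = "CANCELLED"
        return
    state["order"] = "COMPLETED"


def call_payment(gateway, order_id):
    try:
        gateway.charge(order_id)
        return Outcome.SUCCEEDED
    except TimeoutError:
        return Outcome.UNKNOWN               # a timeout is not proof of failure
    except Exception:
        return Outcome.FAILED


gateway = FlakyPaymentGateway()
state = {}
naive_place_order(gateway, "o1", state)
paid = "o1" in gateway.charged               # True, yet the order was cancelled
outcome = call_payment(FlakyPaymentGateway(), "o2")   # Outcome.UNKNOWN
```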
This is really painful, isn't it? Your production support team will be very busy handling failed-transaction tickets if your buyers frequently hit such issues while placing orders. What if your buyers get fed up and order from a competitor instead? Small incidents like these can lead to HUGE financial losses.
As a software architect, you must anticipate such problems and design your microservices and application so that a transaction never leaves data in an inconsistent state.
We can overcome this data-consistency problem across databases by using the Saga pattern. It models the global distributed transaction as a series of local ACID transactions, with compensation as the rollback mechanism. The global transaction moves between defined states depending on the result of each local transaction.
There are two ways of coordinating sagas:
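Here is a minimal in-process sketch of the idea. It assumes each step is a pair of plain functions: a forward action standing in for one service's local transaction, and its compensation. The `SagaStep` and `run_saga` names are hypothetical. Completed steps are compensated in reverse order. The failed step itself is not compensated, because its own local transaction is assumed to have rolled back.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]        # stands in for one local ACID transaction
    compensation: Callable[[dict], None]  # semantic undo of that transaction


def run_saga(steps: List[SagaStep], ctx: dict) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns the recorded state transitions of the global transaction.
    """
    history: List[str] = []
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception:
            history.append(f"{step.name}:FAILED")
            for done in reversed(completed):
                done.compensation(ctx)
                history.append(f"{done.name}:COMPENSATED")
            return history
        completed.append(step)
        history.append(f"{step.name}:DONE")
    return history


def fail_payment(ctx):
    raise RuntimeError("card declined")


steps = [
    SagaStep("CreateOrder", lambda c: c.update(order="CREATED"),
             lambda c: c.update(order="CANCELLED")),
    SagaStep("ReserveStock", lambda c: c.update(stock="RESERVED"),
             lambda c: c.update(stock="RELEASED")),
    SagaStep("Pay", fail_payment, lambda c: c.update(payment="REFUNDED")),
]
ctx = {}
history = run_saga(steps, ctx)
# history: CreateOrder:DONE, ReserveStock:DONE, Pay:FAILED,
#          ReserveStock:COMPENSATED, CreateOrder:COMPENSATED
```

In a real system, each action runs in a different service and is triggered by a message, not a direct function call. The `history` list is a stand-in for the persisted saga state.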
- Choreography – each local transaction publishes domain events that trigger local transactions in other services
- Orchestration – an orchestrator (object) tells the participants what local transactions to execute
The difference lies in how state transitions are driven. In this post, we will talk about orchestration.
Orchestration-Based Saga
An orchestration-based saga has an orchestrator that tells the saga’s participants what to do. The saga orchestrator communicates with the participants using request/asynchronous response-style interaction. To execute a saga step, it sends a command message to a participant telling it what operation to perform. After the saga participant has performed the operation, it sends a reply message to the orchestrator. The orchestrator then processes the reply message and determines which saga step to perform next.
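The command/reply loop can be simulated in a single process. In this sketch, a hypothetical `Broker` class with one in-memory queue per channel stands in for a real message broker. Each participant replies with a success flag, and the orchestrator decides the next step from each reply. The step and channel names are assumptions. Local business logic is omitted, and compensation after a failure is left out to keep the focus on the message flow.

```python
from collections import deque


class Broker:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.channels = {}

    def send(self, channel, message):
        self.channels.setdefault(channel, deque()).append(message)

    def receive(self, channel):
        pending = self.channels.get(channel)
        return pending.popleft() if pending else None


class Participant:
    """Receives a command, runs its local transaction, replies to the orchestrator."""

    def __init__(self, name, broker, fail_on=()):
        self.name, self.broker, self.fail_on = name, broker, set(fail_on)

    def handle_next(self):
        command = self.broker.receive(self.name)
        if command is None:
            return False
        # Local business logic omitted; failure is simulated via `fail_on`.
        success = command["command"] not in self.fail_on
        self.broker.send("orchestrator", {"saga_id": command["saga_id"],
                                          "command": command["command"],
                                          "success": success})
        return True


class Orchestrator:
    # Hypothetical saga definition: (participant channel, command name).
    STEPS = [("order", "CreateOrder"), ("inventory", "ReserveStock"), ("payment", "Pay")]

    def __init__(self, broker):
        self.broker = broker
        self.current = {}   # saga_id -> index of the step awaiting a reply
        self.status = {}    # saga_id -> RUNNING / COMPLETED / FAILED
        self.log = []

    def start(self, saga_id):
        self.current[saga_id] = 0
        self.status[saga_id] = "RUNNING"
        self._send_command(saga_id, 0)

    def _send_command(self, saga_id, index):
        channel, command = self.STEPS[index]
        self.broker.send(channel, {"saga_id": saga_id, "command": command})

    def handle_reply(self):
        reply = self.broker.receive("orchestrator")
        if reply is None:
            return False
        saga_id = reply["saga_id"]
        self.log.append((reply["command"], reply["success"]))
        if not reply["success"]:
            self.status[saga_id] = "FAILED"     # compensation would start here
        elif self.current[saga_id] + 1 < len(self.STEPS):
            self.current[saga_id] += 1          # reply processed: send next command
            self._send_command(saga_id, self.current[saga_id])
        else:
            self.status[saga_id] = "COMPLETED"
        return True


broker = Broker()
participants = [Participant("order", broker),
                Participant("inventory", broker),
                Participant("payment", broker, fail_on={"Pay"})]
orchestrator = Orchestrator(broker)
orchestrator.start("saga-1")

# Pump messages until no channel has anything left to process.
while any([p.handle_next() for p in participants]) or orchestrator.handle_reply():
    pass
```

Note that the participants never call each other; only the orchestrator knows the order of the steps.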
This type of Saga is a natural evolution from the naive implementation because it can be incrementally adopted.
The orchestrator (or transaction manager) is a coarse-grained service that exists only to facilitate the saga. It is responsible for coordinating the global transaction flow: communicating with the services involved in the transaction and orchestrating the necessary compensating actions. The orchestrator is aware of the global distributed transaction, but the individual services are only aware of their own local transactions.
A service’s local ACID transaction should ideally consist of two steps:
- Local business logic
- Notify the broker that its work is done
Instead of calling another service in the middle of the transaction, let the service do its job within its own scope and publish the status through a message broker. That's all. There is no long, synchronous, blocking call somewhere in the middle of the transaction. You can use any message broker (such as Azure Event Hubs or Kafka), depending on your needs and your cloud platform.
To ensure that the two steps happen in a single ACID transaction, we can make use of the Event Sourcing pattern. When we write the result of the local transaction to the database, the work-done message is written into an event store table as part of the same transaction.
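Here is a minimal sketch that assumes a SQLite database stands in for the order service's own database, with hypothetical `orders` and `event_store` tables. The business row and the work-done event are written inside one transaction, so either both are committed or neither is. A separate relay process (not shown) would then read new rows from `event_store` and publish them to the broker. Writing messages to a table in the same local transaction like this is often described as the transactional outbox pattern.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE event_store (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
""")


def create_order(conn, order_id, crash_before_event=False):
    """Step 1 (business logic) and step 2 (work-done event) in one transaction."""
    with conn:  # commits both writes together, or rolls both back on an exception
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)",
                     (order_id, "CREATED"))
        if crash_before_event:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO event_store (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (order_id, "OrderCreated", json.dumps({"order_id": order_id})),
        )


create_order(conn, "o1")
try:
    create_order(conn, "o2", crash_before_event=True)
except RuntimeError:
    pass  # the half-written "o2" order was rolled back

saved_orders = conn.execute("SELECT id FROM orders").fetchall()
saved_events = conn.execute("SELECT aggregate_id, event_type FROM event_store").fetchall()
```

Because the "o2" order row is rolled back together with the missing event, the broker can never announce an order that does not exist, and vice versa.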
NOTE: Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
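The event store API described in the note can be illustrated with a toy in-memory version. The `InMemoryEventStore` class and its `append`, `load`, and `subscribe` methods are hypothetical names, not a real library's API. Delivery here is synchronous and non-durable, unlike a production event store.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryEventStore:
    """Toy event store: append/load an entity's events and notify subscribers."""

    def __init__(self):
        self._events: Dict[str, List[dict]] = defaultdict(list)
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def append(self, entity_id: str, event: dict) -> None:
        self._events[entity_id].append(event)
        for handler in self._subscribers[event["type"]]:
            handler(event)                 # deliver to every interested subscriber

    def load(self, entity_id: str) -> List[dict]:
        return list(self._events[entity_id])

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)


store = InMemoryEventStore()
received = []
store.subscribe("OrderCreated", received.append)
store.append("order-1", {"type": "OrderCreated", "order_id": "order-1"})
store.append("order-1", {"type": "OrderCancelled", "order_id": "order-1"})
event_types = [e["type"] for e in store.load("order-1")]
# event_types == ["OrderCreated", "OrderCancelled"]; only the first was delivered
```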
Once a service has done its work, it publishes a message to the broker (either a success or a failure message). If the payment service publishes a failure message, the orchestrator must be able to "roll back" the actions done by the order and inventory services.
In this case, each service must implement its own compensating method. The order service, which provides an OrderCreate method, must also provide an OrderCancel compensating method. The inventory service, which provides a ReserveStock method, must also provide a ReleaseStock compensating method. The payment service, which provides a Pay method, must also provide a Refund compensating method.
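The paired methods could look like the following sketch. It assumes in-memory state and snake_case versions of the method names above (`order_create`/`order_cancel`, `reserve_stock`/`release_stock`, `pay`/`refund`). One design assumption is worth stating: each compensation is written to be safe to call twice, since a message may be delivered more than once. Also, a refund marks the payment as refunded instead of erasing it.

```python
# Hypothetical in-memory participants; each forward method has a compensating twin.
class OrderService:
    def __init__(self):
        self.orders = {}                        # order_id -> status

    def order_create(self, order_id):
        self.orders[order_id] = "CREATED"

    def order_cancel(self, order_id):           # compensates order_create
        if self.orders.get(order_id) == "CREATED":
            self.orders[order_id] = "CANCELLED"


class InventoryService:
    def __init__(self, stock):
        self.stock = stock                      # units available
        self.reservations = {}                  # order_id -> reserved units

    def reserve_stock(self, order_id, qty):
        if qty > self.stock:
            raise ValueError("insufficient stock")
        self.stock -= qty
        self.reservations[order_id] = qty

    def release_stock(self, order_id):          # compensates reserve_stock
        # pop() with a default makes a repeated release a no-op
        self.stock += self.reservations.pop(order_id, 0)


class PaymentService:
    def __init__(self):
        self.payments = {}                      # order_id -> {"status", "amount"}

    def pay(self, order_id, amount):
        self.payments[order_id] = {"status": "PAID", "amount": amount}

    def refund(self, order_id):                 # compensates pay
        payment = self.payments.get(order_id)
        if payment is not None and payment["status"] == "PAID":
            payment["status"] = "REFUNDED"      # recorded, not erased


inventory = InventoryService(stock=10)
inventory.reserve_stock("o1", 3)
inventory.release_stock("o1")
inventory.release_stock("o1")                   # duplicate message: no effect
```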
The orchestrator then listens for failure events and publishes the corresponding compensating events. The image above shows how the orchestrator publishes the respective compensation events and how each service rolls back its operation to compensate for a failed payment.
This is not a way to apply a "traditional transaction" at the level of a distributed system. Rather, it models the transaction as a state machine, with each service's local transaction acting as a state transition function.
It guarantees that the transaction is always in one of the defined states. In the event of a network disruption, you can fix the problem and resume the transaction from the last known state.
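Here is a sketch of the orchestrator's failure handling. It assumes events arrive as dictionaries with hypothetical `saga_id`, `step`, and `success` fields, and that `publish` is whatever function sends a message to the broker. On a failure event, it publishes the compensating event for every completed step, most recent first.

```python
from collections import defaultdict

# Hypothetical mapping from each forward step to its compensating event.
COMPENSATION_FOR = {
    "OrderCreate": "OrderCancel",
    "ReserveStock": "ReleaseStock",
    "Pay": "Refund",
}


class CompensatingOrchestrator:
    def __init__(self, publish):
        self.publish = publish                # callable(event_name, saga_id)
        self.completed = defaultdict(list)    # saga_id -> steps that succeeded

    def on_event(self, event):
        saga_id = event["saga_id"]
        if event["success"]:
            self.completed[saga_id].append(event["step"])
            return
        # A step failed: publish compensations for completed steps, newest first.
        for step in reversed(self.completed.pop(saga_id, [])):
            self.publish(COMPENSATION_FOR[step], saga_id)


published = []
orchestrator = CompensatingOrchestrator(
    lambda name, saga_id: published.append((name, saga_id)))
for event in [
    {"saga_id": "s1", "step": "OrderCreate", "success": True},
    {"saga_id": "s1", "step": "ReserveStock", "success": True},
    {"saga_id": "s1", "step": "Pay", "success": False},
]:
    orchestrator.on_event(event)
# published == [("ReleaseStock", "s1"), ("OrderCancel", "s1")]
```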
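The state machine view can be sketched as an explicit transition table. The state and event names below are hypothetical. An undefined (state, event) pair is rejected, so the saga cannot drift into a state that was never designed. Replaying the recorded events reconstructs the last known state, from which processing can resume.

```python
# Hypothetical saga states; each (state, event) pair maps to exactly one next state.
TRANSITIONS = {
    ("STARTED", "OrderCreated"): "ORDER_CREATED",
    ("ORDER_CREATED", "StockReserved"): "STOCK_RESERVED",
    ("STOCK_RESERVED", "PaymentSucceeded"): "COMPLETED",
    ("STOCK_RESERVED", "PaymentFailed"): "COMPENSATING_STOCK",
    ("COMPENSATING_STOCK", "StockReleased"): "COMPENSATING_ORDER",
    ("COMPENSATING_ORDER", "OrderCancelled"): "CANCELLED",
}


def transition(state: str, event: str) -> str:
    """Each local transaction's result event acts as a state transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def replay(events, start="STARTED"):
    """Rebuild the saga's state by applying recorded events in order."""
    state = start
    for event in events:
        state = transition(state, event)
    return state


# A network disruption interrupted the saga after these events were recorded.
last_known = replay(["OrderCreated", "StockReserved", "PaymentFailed"])
# last_known == "COMPENSATING_STOCK"; once the problem is fixed, resume from there:
final = replay(["StockReleased", "OrderCancelled"], start=last_known)
# final == "CANCELLED"
```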
I hope this helps!
NOTE: References taken from Microservices.io.