Distributed Transaction in Microservices using SAGA Pattern

Everyone today is thinking about and building microservices – me included. A microservice architecture, by its core principles and in its true context, is a distributed system.

In a microservice architecture, a distributed transaction is an outdated approach that causes severe scalability issues. Modern patterns instead rely on asynchronous data replication, or model distributed write operations as orchestrated or choreographed sagas, and so avoid these problems. I will try to explain the orchestrated saga in detail in this article.

A distributed transaction is one that spans multiple databases across the network while preserving ACID properties. If a transaction requires services A and B to both write to their own databases, and to roll back if either write fails, then it is a distributed transaction.

Problem statement

To see why a distributed transaction is hard, let’s look at an extremely common real-life example: an e-commerce application.

Our e-commerce application contains different microservices, each with its own database. Some business transactions, however, span multiple services, so you need a mechanism to ensure data consistency across services.

Let’s say we have an order service, an inventory service, and a payment service. The boundaries are clear: the order service takes the order, the inventory service allocates the stock, and the payment service deals only with payments and refunds.

A single order transaction = creating an order + reserving stock + payment, in any order. A failure at any point during the transaction should revert everything before it.

Payment failure should cause the inventory service to release the reserved stocks, and the order service to cancel the order.
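To make that failure handling concrete, here is a minimal sketch of the naive synchronous flow. This is an illustration only: every function here (create_order, reserve_stock, pay, and their undo counterparts) is a hypothetical in-memory stub, not a real service call.

```python
# Naive synchronous flow: the caller invokes each service in turn and
# must manually undo every completed step when a later step fails.

def create_order(order):
    order["status"] = "CREATED"

def cancel_order(order):
    order["status"] = "CANCELLED"

def reserve_stock(order):
    order["stock"] = "RESERVED"

def release_stock(order):
    order["stock"] = "RELEASED"

def pay(order, payment_ok):
    if not payment_ok:
        raise RuntimeError("payment failed")
    order["payment"] = "PAID"

def place_order(order, payment_ok=True):
    create_order(order)
    try:
        reserve_stock(order)
        pay(order, payment_ok)
    except RuntimeError:
        # Ad-hoc rollback: the caller has explicit knowledge of every
        # dependency and how to undo it -- one of the flaws listed above.
        release_stock(order)
        cancel_order(order)
    return order
```

Note how the rollback logic lives in the caller, which is exactly the fragility the bullet points above describe.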

If our e-commerce application is designed as above, there are some serious flaws with this approach:

  • The fallacies of distributed computing – it relies heavily on the stability of the network throughout the transaction.
  • Transactions could end up in an indeterminate state.
  • Fragile to topology changes – Each system has explicit knowledge of its dependency.

Imagine the payment service calls some third-party API like PayPal or Stripe; the transaction is then effectively out of your control. What happens if the API is down or throttled? Or there is a disruption somewhere along the network path? Or one of the three services is down?

If the inventory service managed to reserve some stock, but the payment service timed out for whatever reason, we cannot say that the payment has failed.

If we treat the timeout as a failure, we roll back the stock reservation and cancel the order, but the payment may actually have gone through. Perhaps the external payment API was taking more time than usual, or a network disruption cut the connection before the payment service had a chance to respond. Now the transaction is in the Paid and Stock Released states simultaneously.

This is really painful, isn’t it? Your production support team will be very busy handling failed-transaction tickets if your buyers face such issues frequently while placing orders. What if a buyer gets fed up and orders from a competitor instead? Such small incidents can lead to HUGE financial losses.

As a software architect, you must think about such problems and design your microservices and application so that a transaction never leaves the data in an inconsistent state.

Solution

We can overcome this problem of data consistency between databases by using the Saga pattern. It models the globally distributed transaction as a series of local ACID transactions, with compensation as the rollback mechanism. The global transaction moves between defined states depending on the result of each local transaction’s execution.

There are two ways of coordinating sagas:

  • Choreography – each local transaction publishes domain events that trigger local transactions in other services
  • Orchestration – an orchestrator (object) tells the participants what local transactions to execute

The difference is the method of state transition; we will talk about orchestration in this post.

Orchestration Based Saga

An orchestration-based saga has an orchestrator that tells the saga’s participants what to do. The saga orchestrator communicates with the participants using request/asynchronous response-style interaction. To execute a saga step, it sends a command message to a participant telling it what operation to perform. After the saga participant has performed the operation, it sends a reply message to the orchestrator. The orchestrator then processes the reply message and determines which saga step to perform next.
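The command/reply loop described above can be sketched as follows. This is a minimal illustration, not a production orchestrator: the step and participant names are assumptions, and replies are returned synchronously here instead of arriving as messages from a broker.

```python
# A minimal saga orchestrator sketch. Each step pairs a command with the
# compensating command used to undo it if a later step fails.

SAGA_STEPS = [
    ("CreateOrder", "CancelOrder"),
    ("ReserveStock", "ReleaseStock"),
    ("Pay", "Refund"),
]

def run_saga(participants):
    """Send each command in order. On a failure reply, send the
    compensating command for every already-completed step, in reverse."""
    done = []
    for command, compensation in SAGA_STEPS:
        reply = participants[command]()  # request/async-reply, simulated
        if reply == "FAILED":
            for _, comp in reversed(done):
                participants[comp]()
            return "ROLLED_BACK"
        done.append((command, compensation))
    return "COMPLETED"
```

The key point the sketch shows: only the orchestrator knows the global flow; each participant just executes the one command it receives.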

This type of Saga is a natural evolution from the naive implementation because it can be incrementally adopted.

Orchestrator

The orchestrator, also called a transaction manager, is a coarse-grained service that exists only to facilitate the saga. It is responsible for coordinating the global transaction flow: communicating with the appropriate services involved in the transaction and orchestrating the necessary compensating actions. The orchestrator is aware of the globally distributed transaction, but the individual services are aware only of their local transactions.

Message broker

A service’s local ACID transaction should ideally consist of two steps:

  1. Execute the local business logic
  2. Notify the broker that the work is done

Instead of calling another service in the middle of the transaction, let each service do its job within its own scope and publish its status through a message broker. That’s all. No long, synchronous, blocking call in the middle of the transaction. You can use any message broker (Azure Event Hubs, Apache Kafka, etc.) depending on your needs and your cloud platform.

Event sourcing

To ensure that the two steps happen in a single ACID transaction, we can make use of the Event Sourcing pattern. When we write the result of the local transaction to the database, the work-done message is included as part of the same transaction, written into an event store table.

NOTE: Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
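As a sketch of this idea, the following uses an in-memory SQLite database to write the business record and its work-done event in one local ACID transaction. The table layout and event names are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# In-memory database standing in for the service's own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE event_store ("
             "id INTEGER PRIMARY KEY AUTOINCREMENT,"
             "entity TEXT, event TEXT)")

def create_order(conn, order_id):
    # One local ACID transaction: the business row and the work-done
    # event commit together, or neither does.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'CREATED')",
                     (order_id,))
        conn.execute("INSERT INTO event_store (entity, event) "
                     "VALUES (?, ?)",
                     (f"order:{order_id}", "OrderCreated"))

create_order(conn, 1)
# A relay (or the event store itself) later delivers OrderCreated
# to all interested subscribers.
```

Because the event row is written in the same transaction as the order row, a crash can never leave a completed local transaction with no pending notification, or vice versa.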

Compensation

Once a service has done its work, it publishes a message to the broker (either a success or a failure message). If the payment service publishes a failure message, the orchestrator must be able to “roll back” the actions done by the order and inventory services.

In this case, each service must implement its own compensating method. The order service, which provides an OrderCreate method, must also provide an OrderCancel compensating method. The inventory service, which provides a ReserveStock method, must also provide a ReleaseStock compensating method. The payment service, which provides a Pay method, must also provide a Refund compensating method.

The orchestrator then listens for failure events and publishes the corresponding compensating events. The image above shows the orchestrator publishing the respective compensation events and each service rolling back its operation to compensate for the failed payment.

Conclusion

This is not a remedy that applies a “traditional transaction” at the level of a distributed system. Rather, it models the transaction as a state machine, with each service’s local transaction acting as a state-transition function.

It guarantees that the transaction is always in one of a set of defined states. In the event of a network disruption, you can always fix the problem and resume the transaction from the last known state.
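That state-machine view can be sketched as follows, with illustrative state names. Because the current state is always well defined (and, in practice, persisted), the saga can resume from wherever it stopped:

```python
# The happy-path transitions of the order saga, as a simple lookup.
# State names are illustrative assumptions for this sketch.
TRANSITIONS = {
    "STARTED":        "ORDER_CREATED",
    "ORDER_CREATED":  "STOCK_RESERVED",
    "STOCK_RESERVED": "PAID",
}

def resume(state):
    """Advance the saga from its last persisted state to completion.
    Each hop corresponds to one local transaction."""
    while state in TRANSITIONS:
        state = TRANSITIONS[state]
    return state
```

If the process crashed after reserving stock, reloading the persisted state and calling resume("STOCK_RESERVED") continues from that point instead of restarting the whole transaction.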

I hope this will help !!!

NOTE — References taken from Microservices.io

ASP.NET Core : Performance Testing Techniques

In this article I will share some useful links and a video from the ON.NET show that explain performance testing techniques. I hope they will be useful when you run performance tests on your own applications.

ASP.NET Core Series: Performance Testing Techniques video

Reference links

You need to study this material to get more insight into performance testing techniques for modern .NET Core 3.x applications.

I hope this helps you get started !!!

SQL Server – Order By with a CASE parameter

Let us discuss a scenario where we want to sort a table ascending or descending based on a sort direction, and also by a column name passed in a variable. How would you do that?

Answer: using ORDER BY with a CASE expression.

Here is a sample SQL script that shows how we can do that easily.

DECLARE @SortDirection VARCHAR(10);
DECLARE @SortBy VARCHAR(100);
SET @SortDirection = 'D';
SET @SortBy = 'InvoiceID';
SELECT *
FROM [Invoices]
ORDER BY
    CASE WHEN @SortDirection = 'A' THEN
        CASE
           WHEN @SortBy = 'OrderID' THEN OrderID
           WHEN @SortBy = 'InvoiceID' THEN InvoiceID 
        END
    END ASC
    , CASE WHEN @SortDirection = 'D' THEN
        CASE
           WHEN @SortBy = 'OrderID' THEN OrderID
           WHEN @SortBy = 'InvoiceID' THEN InvoiceID  
        END
    END DESC;

Simple and very useful, isn’t it? Note that the columns mixed inside one CASE expression should have compatible data types, or SQL Server may raise a conversion error.

I hope this will help !!!

Choose between Azure Event Grid and Event Hubs

This article gives a basic understanding of these two services and helps us decide which one to choose for our application. Let us start with the basics of each service.

Event Grid

Event Grid is a fully-managed event routing service and the first of its kind. Azure Event Grid greatly simplifies the development of event-based applications and simplifies the creation of serverless workflows. Using a single service, Azure Event Grid manages all routing of events from any source, to any destination, for any application.

It uses a publish-subscribe model. Publishers emit events, but have no expectation about which events are handled. Subscribers decide which events they want to handle.
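To illustrate the publish-subscribe model in the abstract (this is a generic sketch, not the Azure SDK; the class and event names are made up for illustration):

```python
# A generic publish-subscribe sketch: publishers emit events without
# knowing who handles them; subscribers pick the event types they want.

class EventRouter:
    def __init__(self):
        self.subscriptions = []  # list of (event_type, handler) pairs

    def subscribe(self, event_type, handler):
        self.subscriptions.append((event_type, handler))

    def publish(self, event):
        # The publisher has no expectation about handling; the router
        # delivers the event to every matching subscriber.
        for event_type, handler in self.subscriptions:
            if event["type"] == event_type:
                handler(event)
```

A subscriber interested only in, say, "ItemShipped" events never sees the other traffic, which is the decoupling Event Grid provides at cloud scale.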

Event Grid supports dead-lettering for events that aren’t delivered to an endpoint.

It has the following characteristics:

  • Dynamically scalable
  • Low cost
  • Serverless
  • At-least-once delivery

Event Hubs

Event Hubs is a big data streaming platform and event ingestion service, capable of receiving and processing millions of events per second. Event Hubs can process and store events, data, or telemetry produced by distributed software and devices.

It facilitates the capture, retention, and replay of telemetry and event stream data. The data can come from many concurrent sources. Event Hubs allows telemetry and event data to be made available to a variety of stream-processing infrastructures and analytics services. It is available either as data streams or bundled event batches.

This service provides a single solution that enables rapid data retrieval for real-time processing as well as repeated replay of stored raw data. It can capture the streaming data into a file for processing and analysis.

It has the following characteristics:

  • Low latency
  • Capable of receiving and processing millions of events per second
  • At-least-once delivery

In some cases, we use these services side by side to fulfill distinct roles. For example, an e-commerce site can use Service Bus to process orders, Event Hubs to capture site telemetry, and Event Grid to respond to events such as an item being shipped.

Choose Event Grid when

  • Simplicity: It is straightforward to connect sources to subscribers in Event Grid.
  • Advanced filtering: Subscriptions have close control over the events they receive from a topic.
  • Fan-out: You can subscribe to an unlimited number of endpoints to the same events and topics.
  • Reliability: Event Grid retries event delivery for up to 24 hours for each subscription.
  • Pay-per-event: Pay only for the number of events that you transmit.

Choose Event Hubs when

  • You need to support authenticating a large number of publishers.
  • You need to save a stream of events to Data Lake or Blob storage.
  • You need aggregation or analytics on your event stream.
  • You need reliable messaging or resiliency.

I hope this will help !!!

NOTE — Reference taken from Microsoft Learning Site

Choose between Azure Queue services – Queue Storage, Service Bus Queue, Service Bus Topics

Suppose you are planning the architecture for a music-sharing application. You want to ensure that music files are uploaded reliably from the mobile app to the web API. You then want to deliver details about new songs directly to the app when an artist adds new music to their collection. This is a perfect use case for a message-based system, and Azure offers three solutions to this problem:

  • Azure Queue Storage
  • Azure Service Bus Queue
  • Azure Service Bus Topics

Each has a slightly different feature set, which means you can choose one or the other, or use both, depending on the problem you are solving.

Choose Service Bus Topics if

  • You need multiple receivers to handle each message.

Choose Service Bus queues if

  • You need an at-most-once delivery guarantee.
  • You need a FIFO guarantee.
  • You need to group messages into transactions.
  • You want to receive messages without polling the queue.
  • You need to provide a role-based access model to the queues.
  • You need to handle messages larger than 64 KB but smaller than 256 KB.
  • Your queue size will not grow larger than 80 GB.
  • You would like to be able to publish and consume batches of messages.

Queue storage isn’t quite as feature-rich, but if you don’t need any of those features, it can be a simpler choice. In addition, it’s the best solution if your app has any of the following requirements.

Choose Queue storage if

  • You need an audit trail of all messages that pass through the queue.
  • You expect the queue to exceed 80 GB in size.
  • You want to track progress for processing a message inside the queue.

A queue is a simple, temporary storage location for messages sent between the components of a distributed application. Use a queue to organize messages and gracefully handle unpredictable surges in demand.

Summary

Use Storage queues when you want a simple and easy-to-code queue system. For more advanced needs, use Service Bus queues. If you have multiple destinations for a single message, but need queue-like behavior, use topics.

I hope this will help !!!

NOTE — Reference taken from Microsoft Learning Site