Distributed Transaction in Microservices using SAGA Pattern

Everyone today is thinking about and building microservices – I included. Microservices, from its core principles and in its true context, is a distributed system.

In a microservice architecture, a distributed transaction is an outdated approach that causes severe scalability issues. Modern patterns that rely on asynchronous data replication or model distributed write operations as orchestrated or choreographed SAGAs avoid these problems. I will try to explain orchestrated saga in great detail in this article.

Distributed Transaction is one that spans multiple databases across the network while preserving ACID properties. If a transaction requires service A and B both write to their own database, and rollback if either A or B fails, then it is a distributed transaction.

Problem statement

To see why a distributed transaction is hard, let’s take a look at an extremely common real-life examples: e-Commerce application.

Our e-Commerce application contains different microservices. Each service has its own database. Some business transactions, however, span multiple service so you need a mechanism to ensure data consistency across services.

Let’s say, we have an order service, an inventory service and a payment service. The boundary is clear, order service takes the order, inventory service allocates the stock, while the payment service deals only with payment and refund related issues.

A single order transaction = creating an order + reserve stock + payment, in any order. Failure at any point during the transaction should revert everything before it.

Payment failure should cause the inventory service to release the reserved stocks, and the order service to cancel the order.

if our e-commerce application designed as per above then there are some serious flaws with this approach, which are:

  • The fallacy of the distributed system – Relies heavily on the stability of the network throughout the transaction.
  • Transactions could end up in an indeterminate state.
  • Fragile to topology changes – Each system has explicit knowledge of its dependency.

Imagine payment service calls some 3rd party API like PayPal or Stripe, the transaction is effectively out of your control. What happens if the API is down or throttled. Or a network disruption along the network path. Or one of the 3 services is down.

if the inventory service managed to reserve some stocks, but the payment service timed out for whatever reason, we cannot say that the payment has failed.

If we treat timeout as a failure, we would have rolled back the stock reservation and cancel the order, but the payment actually did go through, perhaps the external payment API is taking more time than usual or network disruption, so we cut off the connection before payment service has a chance to respond. Now the transaction is in Paid and Stock Released state simultaneously.

This is really painful, isn’t it? Your Production support team will be very busy handling such failed transaction tickets if your buyers face such issues frequently while placing an order. What if your buyer get fade up and orders from other competitors. Such small incidents can lead to HUGE financial loss.

As an Software Architect, you must think of such a problem and design your microservices & application in such a way that it does not leave data inconsistency during the transaction.

Solution

We can overcome this problem of data consistency between databases by using Saga Pattern. It models the globally distributed transaction as a series of local ACID transactions, with compensation as a rollback mechanism. The global transaction move between different defined states depending on the result of the local transaction execution.

There are two ways of coordination sagas:

  • Choreography – each local transaction publishes domain events that trigger local transactions in other services
  • Orchestration – an orchestrator (object) tells the participants what local transactions to execute

The difference is the method of state transition, we will talk about the “Orchestration” in this post.

Orchestration Based Saga

An orchestration-based saga has an orchestrator that tells the saga’s participants what to do. The saga orchestrator communicates with the participants using request/asynchronous response-style interaction. To execute a saga step, it sends a command message to a participant telling it what operation to perform. After the saga participant has performed the operation, it sends a reply message to the orchestrator. The orchestrator then processes the reply message and determines which saga step to perform next.

This type of Saga is a natural evolution from the naive implementation because it can be incrementally adopted.

Orchestrator (Or)

Or a transaction manager is a coarse-grained service that exists only to facilitate the Saga. It is responsible for coordinating the global transaction flow, that is, communicating with the appropriate services that involve in the transaction, and orchestrate the necessary compensation action. The orchestrator is aware of the globally distributed transaction, but the individual services are only aware of their local transaction.

Message broker

A service’s local ACID transaction should ideally consist of two steps:

  1. Local business logic
  2. Notify broker of its work done

Instead of calling another service in the middle of the transaction, let the service do its job within its scope and publishes the status through a message broker. That’s all. No long, synchronous, blocking call somewhere in the middle of the transaction. You can use any message broker (Event Hub or Kafka) as per your need and dependency on your cloud platform.

Event sourcing

To ensure that the two steps are in a single ACID transaction, we can make use of the Event sourcing pattern. When we write the result of the local transaction into the database, the work done message is included as part of the transaction as well, into an event store table.

NOTE: Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.

Compensation

Once a service has done its work, it publishes a message to the broker (could be a success or failure message). If the Payment service publishes a failure message, then the orchestrator must be able to “rollback” actions done by the Order and Inventory service.

In this case, each service must implement its version of the compensating method. Order service which provides a OrderCreate method must also provide a OrderCancel compensating method. Inventory service which provides a ReserveStock method must also provide a ReleaseStock compensating method. Payment service which provides a Pay method must also provide a Refund compensating method.

The orchestrator then listens to the failure events and publishes a corresponding compensating event. The above image shows Orchestrator publishes respective compensation events and how each services rollback their operation to compensate payment failed requests.

Conclusion

This is not a remedy to apply “traditional transaction” at the level of a distributed system. Rather, it models transactions as a state machine, with each service’s local transaction acting as a state transition function.

It guarantees that the transaction is always in one of the many defined states. In the event of network disruption, you can always fix the problem and resume the transaction from the last known state.

I hope this will help !!!

NOTE — References taken from Microservices.io

Choose whether to use message or event

What are the messages?

  • A message contains raw data, produced by one component, that will be consumed by another component.
  • A message contains the data itself, not just a reference to that data.
  • The sending component expects the message content to be processed in a certain way by the destination component. The integrity of the overall system may depend on both sender and receiver doing a specific job.

For example, suppose a user uploads a new song by using the mobile music-sharing app. The mobile app must send that song to the web API that runs in Azure. The song media file itself must be sent, not just an alert that indicates that a new song has been added. The mobile app expects that the web API will store the new song in the database and make it available to other users. This is an example of a message.

What are Events?

  • An event is a lightweight notification that indicates that something happened.
  • The event may be sent to multiple receivers, or to none at all.
  • Events are often intended to “fan out,” or have a large number of subscribers for each publisher.
  • The publisher of the event has no expectation about the action a receiving component takes.
  • Some events are discrete units and unrelated to other events.
  • Some events are part of a related and ordered series.

For example, suppose the music file upload has been completed, and the new song has been added to the database. In order to inform users of the new file, the web API must inform the web front end and mobile app users of the new file. The users can choose whether to listen to the new song, so the initial notification does not include the music file but only notifies users that the song exists. The sender does not have a specific expectation that the event receivers will do anything particular in the responsiveness of receiving this event.

How to choose messages or events?

A single application is likely to use events for some purposes and messages for others. Before you choose, you must analyze your application’s architecture and all its use cases, to identify all the different purposes where its components have to communicate with each other.

For each communication, consider the following question: Does the sending component expect the communication to be processed in a particular way by the destination component?

If the answer is yes, choose to use a message. If the answer is no, you may be able to use events.

Q1: You have a distributed application with a web service that authenticates users. When a user logs on, the web service notifies all the client applications so they can display that user’s status as “Online”. Is the login notification an example of a message or an event?

A1: The login notification is an event. It contains only a simple piece of status data and there is no expectation by the authentication service for the client applications to react to the notice in any particular way.

Q2: you have a distributed application with a web service that lets users manage their accounts. Users can sign up, edit their profile, and delete their account. When a user deletes their account, your web service notifies your data layer so the user’s data will be removed from the database. Is the delete-account notification an example of a message or an event?

A2: The delete-account notification is a message. The key factor is that the web service has an expectation about how the data layer will process the message. The data layer must remove the user’s data from the database for the system to function correctly. Note that the message itself contains only simple information so this aspect of the communication could be considered an event. However, the fact that the web service requires the data layer to handle the notification in a specific way is sufficient to make this a message.

I hope this will help !!!

NOTE — Reference taken from Microsoft Learning Site

Choosing between Apache Kafka, Amazon Kinesis, Microsoft Event Hubs and Google Pub/Sub

Distributed log technologies such as Apache Kafka, Amazon Kinesis, Microsoft Event Hubs and Google Pub/Sub have matured in the last few years, and have added some great new types of solutions when moving data around for certain use cases.

To help you choose which one to use when, here is a decision flow chart. It is more of a general guide to which technologies to consider, and a few decision points to help you eliminate some technologies.

Ultimately Apache Kafka is the most flexible, and has great performance characteristics. But Apache Kafka clusters are challenging to setup, scale, and manage in production.

If you wish to go for a fully managed solution – Kinesis, Event Hubs and pub/sub offer alternative options depending on whether ordering and blob size are important to you.

On AWS, Amazon Managed Streaming for Apache Kafka (Amazon MSK) service available which is fully managed, highly available, and secure. You can also evaluate this option along with above all if you are using AWS Cloud for your workloads.

Confluent cloud has also re-engineered Apache Kafka for the cloud. They have given more focus on building apps and not managing clusters with a scalable, resilient and secure event streaming platform. It has made event streaming with Kafka simple on AWS, Azure and GCP clouds.

Hopefully this blog post will help you choose the technology that is right for you.

Patterns for Great Architecture

These concepts can be applied to any Application which is a good candidate for a SaaS application.

Contents:

  1. SaaS- Introduction
  2. SaaS -Challenges
  3. SaaS -Solution to challenges

SaaS- Introduction

A SaaS application can be defines as any “Software deployed as a service and accessed using internet technologies”. In order to realize a SaaS solution (in fact any solution) we need to do two things:

1) Build our application which can be used as a service over internet by different clients. This could range from a programmatic service (accessed programmatically by other softwares) to a stand-alone application used by different clients. This step is involves us to follow certain steps that are different than developing a normal “On-Premise” or simply called a non-SaaS application.

Typically when we develop an application (a non-SaaS application) then we do assume two things:

  • Application will be installed at client side (e.g., a desktop application) be it through a CD or downloading via internet. This also implies that the client will have to do the maintenance (like upgrading to new version, apply patches, maintain databases etc.) of application even after he has bought the application (not true for Application Service Provider model). This fact is not true in terms of a web application. So we can say that any web application forms a different category of application since it is not installed at client side. But still we cannot call ANY web application a SaaS application.
  • Application will be served to requirement of a particular client. If another client needs some changes in the application, we will make changes in the application source code and then run another instance of that application for the new client. So in nutshell, we assume that one application is meant for one client only. This assumption is true even for a web application.

Here comes the difference! The core of a SaaS application is based on the opposites of these two (above mentioned) assumptions. The following are assumptions that are true for a SaaS application.

  • Application will be typically deployed and maintained by a hosting provider. So, clients don’t have to invest in terms of buying hardware resources and employing the IT staff to manage the application. This will be done by the hosting provider.
  • Application will be accessed by clients using internet. In special cases where a SaaS application is hosted inside the enterprise itself this is not true but still the clients accessing the hosted application will still use internet technologies to access the application.
  • A single SaaS application (in fact a single database too) serves more than one client having different needs. So the application will be designed in such a way that only a single application instance will be able to provide different functionalities to different clients. This model is more popularly known as Multi-tenant model.

Among these assumptions, the last assumption related to multi-tenancy requires a SaaS application to be architected in a special way keeping in mind about certain aspects of a multi-tenant application. These aspects form the challenges of building a SaaS application and also our next topic of discussion.

2) SaaS world does not end just by building the application that satisfies SaaS characteristics. Deploying a SaaS application forms another set of challenges. Typically a SaaS application is deployed by a SaaS hosting provider and the hosting provider is responsible of maintaining the application. So, not only clients get rid of maintaining the application but also ISVs get rid of that aspect.

Microsoft provides end to end resources in developing and deploying a SaaS application. It provides resources ranging from development frameworks and technical resources used to build and design the application to hosting options that assist in deploying the application.

Read more

Shout it

Duwamish 7.0

Duwamish 7.0 is a multi-tier, distributed enterprise application built specifically for the Microsoft .NET Platform. Its design, development, and deployment provide insight into how developers can leverage various features of the .NET Platform to build reliable, scalable, and well-performing applications.

Overview

Duwamish Books Inc. is a fictitious company that sells books online. Since its model is an e-commerce business-to-consumer pattern that is most prevalent in the typical online shopping experience, it includes basic features such as membership, account management, shopping cart, search, and checkout processes.

Duwamish 7.0 is a functional port (using 100 percent .NET technologies) of the popular Duwamish series of applications developed by MSDN. Although the sample itself is built around a fictitious online bookstore, the primary focus areas of this sample are performance, issues related to porting technology from Windows DNA to the .NET developer platform, design patterns, and real-life deployment scenarios in a distributed computing environment.

Functionally, it is a complete implementation of the pattern without the fulfillment processes fully implemented (that is, credit card account decrements, checking inventory, and shipping). However, the features are sufficiently complex and cover a broad range of .NET technological areas to illustrate the primary goals.

read more : http://msdn.microsoft.com/en-us/library/aa288561%28VS.71%29.aspx

Hope this will help

Jay Ganesh