Blue/Green deployment is a deployment model in which we keep two production-like environments running side by side. One environment always serves production traffic while the other is idle or used for testing features. As a result, one environment contains the latest code that needs to go to production, while the other contains the older production code.
Getting the latest changes into production is as simple as swapping the DNS to point to the environment containing the latest code. Rolling back a deployment that doesn’t meet expectations is as simple as switching back to the previous environment containing the older production code.
Let’s discuss how we can use Azure Web App Deployment Slots and Azure DevOps Tools like Repos, Pipelines (Build/Release) to automate this process.
Azure Boards provides backlogs and work item tracking to help development teams collaborate and coordinate their work.
Azure Repos fires a trigger to launch a Build Pipeline. The Build Pipeline includes jobs and tasks that clone the repo, install tools, build the solution, and then package and publish artifacts to Azure Artifacts.
The Release Pipeline is responsible for deploying the application artifacts to development, QA, and production environments. It is organized into stages which, although executed sequentially, act independently of each other. In this scenario, the Dev stage deploys the application to a Dev environment. This environment is typically hosted in a non-production subscription and may share an App Service Plan with other non-production environments such as QA.
Between stages, you use approvals and gates to control when the next stage is executed. This allows your team to perform testing and validation in each stage before moving onto the next.
In a Blue-Green deployment, the staging slot represents your “green” deployment and the production slot represents your “blue” deployment. Once you validate that everything has been successfully deployed to the staging slot (i.e. green), the Prod stage performs a swap of green and blue. This makes the green deployment live for end users and moves the blue deployment to your staging slot, where it remains until you remove it. If problems arise with the new green deployment, you can swap again to move blue back to production.
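The swap and rollback described above can be sketched as a small in-memory model. This is only an illustration in Python, not Azure's API: the WebApp class, its slot attributes, and the version strings are hypothetical, and a real swap would be performed by Azure App Service (for example from a release pipeline stage).

```python
from dataclasses import dataclass


@dataclass
class WebApp:
    """Hypothetical in-memory model of a web app with two deployment slots."""
    production: str  # build currently serving end users ("blue")
    staging: str     # build waiting to go live ("green")

    def deploy_to_staging(self, build: str) -> None:
        # New code always lands in the staging slot first.
        self.staging = build

    def swap(self) -> None:
        # Swapping exchanges the two slots; nothing is redeployed.
        self.production, self.staging = self.staging, self.production


app = WebApp(production="v1", staging="v1")
app.deploy_to_staging("v2")   # green = v2, blue = v1
app.swap()                    # v2 goes live, v1 is parked in staging
live_after_release = app.production
app.swap()                    # rollback: swap again to bring v1 back
live_after_rollback = app.production
```

The point of the model is that a rollback is the same operation as a release: the old build is still sitting in the other slot, so nothing has to be rebuilt.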
Recently, I came across a very good Azure sample on GitHub that implements an orchestration-based saga on serverless. It looks very promising for solving real-world business problems with serverless architectures. I am sharing the details here, along with links to the code repositories. I hope this will be useful to you.
Contoso Bank is building a new payment platform leveraging the development of microservices to rapidly offer new features in the market, where legacy and new applications coexist. Operations are now distributed across applications and databases, and Contoso needs a new architecture and implementation design to ensure data consistency on financial transactions.
The traditional ACID approach is no longer suited to Contoso Bank because operation data is now spread across isolated databases. Instead of ACID transactions, a Saga addresses the challenge by coordinating a workflow through a message-driven sequence of local transactions to ensure data consistency.
The solution simulates a money transfer scenario, where an amount is transferred between bank accounts through credit/debit operations and an operation receipt is generated for the requester. It is a Saga pattern implementation reference through an orchestration approach in a serverless architecture on Azure. The solution leverages Azure Functions for the implementation of Saga participants, Azure Durable Functions for the implementation of the Saga orchestrator, Azure Event Hubs as the data streaming platform and Azure Cosmos DB as the database service.
The implementation reference addresses the following challenges and concerns from Contoso Bank:
Developer experience: A solution that allows developers to focus only on the business logic of the Saga participants and simplifies the implementation of stateful workflows on the Saga orchestrator. The proposed solution leverages the Azure Functions programming model, reducing the overhead of state management, checkpointing (the mechanism that updates the offset of a messaging partition when a consumer service processes a message) and restarts in case of failures.
Resiliency: A solution capable of handling a set of potential transient failures (e.g. operation retries on databases and message streaming platforms, timeout handling). The proposed solution applies a set of design patterns (e.g. Retry and Circuit Breaker) on operations with Event Hubs and Cosmos DB, as well as timeout handling on the production of commands and events.
Idempotency: A solution where each Saga participant can execute multiple times and provide the same result to reduce side effects, as well as to ensure data consistency. The proposed solution relies on validations on Cosmos DB for idempotency, making sure there is no duplication on the transaction state and no duplication on the creation of events.
Observability: A solution that is capable of monitoring and tracking the Saga workflow states per transaction. The proposed solution leverages Cosmos DB collections that allow the workflow to be tracked with a single query.
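To make the Resiliency and Idempotency points more concrete, here is a minimal sketch in Python. It is not taken from the sample repository: the TransientError type, the retry helper, and the IdempotentParticipant class are hypothetical, an in-memory dictionary stands in for the Cosmos DB validation, and only the Retry pattern is shown (no Circuit Breaker).

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying."""


def retry(operation, attempts=3, base_delay=0.01):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * 2 ** attempt)


class IdempotentParticipant:
    """Applies each transaction at most once, keyed by transaction id."""

    def __init__(self):
        self.processed = {}   # stands in for a database lookup by transaction id
        self.balance = 100

    def debit(self, transaction_id, amount):
        if transaction_id in self.processed:
            # Duplicate delivery: return the stored result, change nothing.
            return self.processed[transaction_id]
        self.balance -= amount
        self.processed[transaction_id] = self.balance
        return self.balance


participant = IdempotentParticipant()
calls = {"count": 0}


def flaky_debit():
    # Fails once before succeeding, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] == 1:
        raise TransientError("simulated broker hiccup")
    return participant.debit("tx-1", 30)


first = retry(flaky_debit)
second = participant.debit("tx-1", 30)  # redelivered message, no second debit
```

Retries and idempotency go together: once an operation may be retried, the participant has to tolerate seeing the same transaction more than once.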
Everyone today is thinking about and building microservices, myself included. A microservice architecture, by its core principles and in its true context, is a distributed system.
In a microservice architecture, a distributed transaction is an outdated approach that causes severe scalability issues. Modern patterns that rely on asynchronous data replication or model distributed write operations as orchestrated or choreographed SAGAs avoid these problems. I will try to explain orchestrated saga in great detail in this article.
A distributed transaction is one that spans multiple databases across the network while preserving ACID properties. If a transaction requires services A and B to each write to their own database, and to roll back if either A or B fails, then it is a distributed transaction.
To see why a distributed transaction is hard, let’s take a look at an extremely common real-life example: an e-commerce application.
Our e-commerce application contains different microservices. Each service has its own database. Some business transactions, however, span multiple services, so you need a mechanism to ensure data consistency across services.
Let’s say we have an order service, an inventory service and a payment service. The boundaries are clear: the order service takes the order, the inventory service allocates the stock, and the payment service deals only with payment and refund related issues.
A single order transaction = creating an order + reserving stock + payment, in any order. Failure at any point during the transaction should revert everything before it.
A payment failure should cause the inventory service to release the reserved stock, and the order service to cancel the order.
If our e-commerce application is designed as described above, this approach has some serious flaws:
The fallacy of the distributed system – Relies heavily on the stability of the network throughout the transaction.
Transactions could end up in an indeterminate state.
Fragile to topology changes – Each system has explicit knowledge of its dependency.
Imagine the payment service calls a third-party API like PayPal or Stripe. The transaction is then effectively out of your control. What happens if the API is down or throttled, if there is a disruption along the network path, or if one of the three services is down?
If the inventory service managed to reserve some stock but the payment service timed out for whatever reason, we cannot say that the payment has failed.
If we treat a timeout as a failure, we roll back the stock reservation and cancel the order. But the payment may actually have gone through: perhaps the external payment API took more time than usual, or there was a network disruption, so we cut off the connection before the payment service had a chance to respond. Now the transaction is in the Paid and Stock Released states simultaneously.
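The timeout problem can be reproduced with a tiny simulation. This is a sketch under simplifying assumptions: FakePaymentGateway and naive_checkout are hypothetical names, latency is a plain number rather than real elapsed time, and the gateway always commits the charge even when its reply arrives too late.

```python
class FakePaymentGateway:
    """Hypothetical payment provider whose reply can arrive too late."""

    def __init__(self, latency_seconds):
        self.latency_seconds = latency_seconds
        self.charged_orders = set()

    def charge(self, order_id, timeout_seconds):
        # The provider commits the charge regardless of how long the caller waits.
        self.charged_orders.add(order_id)
        if self.latency_seconds > timeout_seconds:
            raise TimeoutError("no reply within the timeout")
        return "PAID"


def naive_checkout(order_id, gateway, reserved_stock):
    """Treats a timeout as a failure, which is the mistake described above."""
    reserved_stock.add(order_id)
    try:
        gateway.charge(order_id, timeout_seconds=2)
        return "COMPLETED"
    except TimeoutError:
        reserved_stock.discard(order_id)  # roll back the reservation
        return "CANCELLED"


gateway = FakePaymentGateway(latency_seconds=5)
reserved_stock = set()
status = naive_checkout("order-42", gateway, reserved_stock)
customer_was_charged = "order-42" in gateway.charged_orders
stock_still_reserved = "order-42" in reserved_stock
```

After this run the order is cancelled and the stock is released, yet the customer has been charged. A timeout only tells us that the outcome is unknown, not that the payment failed.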
This is really painful, isn’t it? Your production support team will be very busy handling failed-transaction tickets if your buyers face such issues frequently while placing an order. What if your buyers get fed up and order from a competitor instead? Such small incidents can lead to huge financial losses.
As a software architect, you must think about such problems and design your microservices and application in such a way that a transaction never leaves the data in an inconsistent state.
We can overcome this problem of data consistency between databases by using the Saga pattern. It models the global distributed transaction as a series of local ACID transactions, with compensation as the rollback mechanism. The global transaction moves between different defined states depending on the result of each local transaction.
There are two ways of coordinating sagas:
Choreography – each local transaction publishes domain events that trigger local transactions in other services
Orchestration – an orchestrator (object) tells the participants what local transactions to execute
The difference between them is the method of state transition. We will talk about orchestration in this post.
Orchestration Based Saga
An orchestration-based saga has an orchestrator that tells the saga’s participants what to do. The saga orchestrator communicates with the participants using request/asynchronous response-style interaction. To execute a saga step, it sends a command message to a participant telling it what operation to perform. After the saga participant has performed the operation, it sends a reply message to the orchestrator. The orchestrator then processes the reply message and determines which saga step to perform next.
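The command and reply loop described above can be sketched as follows. This is an illustrative Python model, not a real broker integration: the participant functions, the step names, and the two deque objects standing in for message channels are all assumptions, and every participant simply replies OK.

```python
from collections import deque


# Hypothetical participants: each handles one command and returns a reply.
def order_service(command):
    return {"step": command["step"], "status": "OK"}


def inventory_service(command):
    return {"step": command["step"], "status": "OK"}


def payment_service(command):
    return {"step": command["step"], "status": "OK"}


PARTICIPANTS = {
    "CreateOrder": order_service,
    "ReserveStock": inventory_service,
    "Pay": payment_service,
}
SAGA_STEPS = ["CreateOrder", "ReserveStock", "Pay"]


def run_saga(order_id):
    """Send one command at a time and choose the next step from each reply."""
    command_channel = deque()  # stands in for a message broker
    reply_channel = deque()
    log = []
    step_index = 0
    while step_index < len(SAGA_STEPS):
        step = SAGA_STEPS[step_index]
        command_channel.append({"order_id": order_id, "step": step})

        # The participant picks up the command and posts a reply.
        command = command_channel.popleft()
        reply_channel.append(PARTICIPANTS[command["step"]](command))

        # The orchestrator processes the reply and decides what comes next.
        reply = reply_channel.popleft()
        log.append((reply["step"], reply["status"]))
        if reply["status"] != "OK":
            break
        step_index += 1
    return log


saga_log = run_saga("order-1")
```

Only run_saga knows the order of the steps. Each participant sees a single command and has no knowledge of the other services.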
This type of Saga is a natural evolution from the naive implementation because it can be incrementally adopted.
The orchestrator, or transaction manager, is a coarse-grained service that exists only to facilitate the Saga. It is responsible for coordinating the global transaction flow, that is, communicating with the services involved in the transaction and orchestrating the necessary compensating actions. The orchestrator is aware of the globally distributed transaction, but the individual services are only aware of their local transactions.
A service’s local ACID transaction should ideally consist of two steps:
Local business logic
Notify broker of its work done
Instead of calling another service in the middle of the transaction, let the service do its job within its scope and publish the status through a message broker. That’s all. No long, synchronous, blocking call somewhere in the middle of the transaction. You can use any message broker (for example Event Hubs or Kafka), depending on your needs and your cloud platform.
To ensure that the two steps are in a single ACID transaction, we can make use of the Event sourcing pattern. When we write the result of the local transaction into the database, the work done message is included as part of the transaction as well, into an event store table.
NOTE: Applications persist events in an event store, which is a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
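Here is a minimal sketch of writing the business change and the work-done event in one local ACID transaction, using Python's built-in sqlite3 module. The table names, the StockReserved event type, and the fail_before_commit flag are assumptions made for the example. Note that this simplified version is closer to an outbox table than to a full event-sourced model, and it does not show the delivery of events to subscribers.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, reserved INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE event_store ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT NOT NULL, payload TEXT NOT NULL)"
)
conn.execute("INSERT INTO stock VALUES ('sku-1', 0)")
conn.commit()


def reserve_stock(conn, sku, quantity, fail_before_commit=False):
    """Update the stock row and record the event in one local transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE stock SET reserved = reserved + ? WHERE sku = ?",
                (quantity, sku),
            )
            conn.execute(
                "INSERT INTO event_store (type, payload) VALUES (?, ?)",
                ("StockReserved", json.dumps({"sku": sku, "quantity": quantity})),
            )
            if fail_before_commit:
                raise RuntimeError("simulated crash before commit")
        return True
    except RuntimeError:
        return False


ok = reserve_stock(conn, "sku-1", 2)
failed = reserve_stock(conn, "sku-1", 5, fail_before_commit=True)
reserved = conn.execute("SELECT reserved FROM stock WHERE sku = 'sku-1'").fetchone()[0]
event_count = conn.execute("SELECT COUNT(*) FROM event_store").fetchone()[0]
```

The second call fails before commit, so neither the stock update nor its event is persisted. The state and the event can never disagree.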
Once a service has done its work, it publishes a message to the broker (this could be a success or a failure message). If the Payment service publishes a failure message, then the orchestrator must be able to “roll back” the actions done by the Order and Inventory services.
In this case, each service must implement its own compensating method. The Order service, which provides an OrderCreate method, must also provide an OrderCancel compensating method. The Inventory service, which provides a ReserveStock method, must also provide a ReleaseStock compensating method. The Payment service, which provides a Pay method, must also provide a Refund compensating method.
The orchestrator then listens for failure events and publishes the corresponding compensating events. Each service rolls back its own operation in response, which compensates for the failed payment request.
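The compensation flow can be sketched like this. It is an in-process Python illustration and not a real orchestrator: the run_order_saga helper and the PaymentDeclined exception are hypothetical, plain function calls replace broker messages, and the method names follow the ones used above (OrderCreate/OrderCancel, ReserveStock/ReleaseStock, Pay/Refund).

```python
class PaymentDeclined(Exception):
    """Hypothetical failure raised by the payment step."""


def run_order_saga(actions):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    history = []
    for name, action, compensation in actions:
        try:
            action()
            history.append(name)
            completed.append((name, compensation))
        except PaymentDeclined:
            history.append(name + " failed")
            for done_name, undo in reversed(completed):
                undo()
                history.append("compensated " + done_name)
            return "ROLLED_BACK", history
    return "COMPLETED", history


state = {"order": None, "stock_reserved": 0}


def order_create():
    state["order"] = "CREATED"


def order_cancel():
    state["order"] = "CANCELLED"


def reserve_stock():
    state["stock_reserved"] += 1


def release_stock():
    state["stock_reserved"] -= 1


def pay():
    raise PaymentDeclined("card declined")


def refund():
    pass  # never reached here because Pay did not complete


result, history = run_order_saga([
    ("OrderCreate", order_create, order_cancel),
    ("ReserveStock", reserve_stock, release_stock),
    ("Pay", pay, refund),
])
```

Compensations run in reverse order of the completed steps, so the stock is released before the order is cancelled. Refund is not called because the payment never succeeded.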
This is not a way to apply a “traditional transaction” at the level of a distributed system. Rather, it models the transaction as a state machine, with each service’s local transaction acting as a state transition function.
It guarantees that the transaction is always in one of the many defined states. In the event of network disruption, you can always fix the problem and resume the transaction from the last known state.
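The state machine view can be written down as a transition table. The state and event names below are hypothetical, chosen to match the order example, and persistence is reduced to keeping the last state in a variable.

```python
# Hypothetical saga states and the events that move between them.
TRANSITIONS = {
    ("ORDER_PENDING", "OrderCreated"): "STOCK_PENDING",
    ("STOCK_PENDING", "StockReserved"): "PAYMENT_PENDING",
    ("STOCK_PENDING", "StockUnavailable"): "ORDER_CANCELLED",
    ("PAYMENT_PENDING", "PaymentSucceeded"): "COMPLETED",
    ("PAYMENT_PENDING", "PaymentFailed"): "STOCK_RELEASING",
    ("STOCK_RELEASING", "StockReleased"): "ORDER_CANCELLED",
}


def apply_events(state, events):
    """Fold events over the state; unknown (state, event) pairs are rejected."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not valid in state {state!r}")
        state = TRANSITIONS[key]
    return state


# First run stops after the stock is reserved (for example, the network drops).
saved_state = apply_events("ORDER_PENDING", ["OrderCreated", "StockReserved"])

# Later, the saga resumes from the persisted state instead of starting over.
final_state = apply_events(saved_state, ["PaymentFailed", "StockReleased"])
```

Because every transition is listed explicitly, the transaction can only ever be in one of the defined states, and resuming is just a matter of applying the next event to the last saved state.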
Suppose you are planning the architecture for your music-sharing application. You want to ensure that music files are uploaded reliably from the mobile app to the web API. You then want to deliver the details about new songs directly to the app when an artist adds new music to their collection. This is a perfect use case for a message-based system, and Azure offers three solutions to this problem:
Azure Queue Storage
Azure Service Bus Queue
Azure Service Bus Topics
Each has a slightly different feature set, which means you can choose one of them or combine several, depending on the problem you are solving.
Choose Service Bus Topics if
You need multiple receivers to handle each message.
Choose Service Bus queues if
You need an At-Most-Once delivery guarantee.
You need a FIFO guarantee.
You need to group messages into transactions.
You want to receive messages without polling the queue.
You need to provide a role-based access model to the queues.
You need to handle messages larger than 64 KB but less than 256 KB.
Your queue size will not grow larger than 80 GB.
You would like to be able to publish and consume batches of messages.
Queue storage isn’t quite as feature-rich, but if you don’t need any of those features, it can be a simpler choice. In addition, it’s the best solution if your app has any of the following requirements.
Choose Queue storage if
You need an audit trail of all messages that pass through the queue.
You expect the queue to exceed 80 GB in size.
You want to track progress for processing a message inside of the queue.
A queue is a simple, temporary storage location for messages sent between the components of a distributed application. Use a queue to organize messages and gracefully handle unpredictable surges in demand.
Use Storage queues when you want a simple and easy-to-code queue system. For more advanced needs, use Service Bus queues. If you have multiple destinations for a single message, but need queue-like behavior, use topics.
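The selection criteria above can be turned into a small decision helper. This is just one possible reading of the guidance: the Requirements fields and the choose_queue function are made up for the sketch, only some of the listed criteria are covered, and the order in which conflicting requirements are resolved is my own assumption.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist; field names are made up for this sketch."""
    multiple_receivers_per_message: bool = False
    needs_fifo: bool = False
    needs_transactions: bool = False
    needs_at_most_once: bool = False
    max_message_kb: int = 64
    expected_queue_gb: int = 1
    needs_audit_trail: bool = False


def choose_queue(req: Requirements) -> str:
    # Topics are the option for fan-out to several receivers.
    if req.multiple_receivers_per_message:
        return "Service Bus topics"
    # Very large queues and audit trails point to Queue storage.
    if req.expected_queue_gb > 80 or req.needs_audit_trail:
        return "Queue storage"
    # The advanced guarantees listed above point to Service Bus queues.
    if (req.needs_fifo or req.needs_transactions or req.needs_at_most_once
            or 64 < req.max_message_kb < 256):
        return "Service Bus queues"
    # Nothing special required: pick the simpler service.
    return "Queue storage"


fan_out_choice = choose_queue(Requirements(multiple_receivers_per_message=True))
fifo_choice = choose_queue(Requirements(needs_fifo=True))
large_queue_choice = choose_queue(Requirements(expected_queue_gb=120))
default_choice = choose_queue(Requirements())
```

In a real project some requirements will pull in different directions (for example FIFO and a queue larger than 80 GB), so treat the function as a starting point for the discussion rather than a final answer.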
I hope this will help !!!
NOTE — Reference taken from Microsoft Learning Site
What are Messages?
A message contains raw data, produced by one component, that will be consumed by another component.
A message contains the data itself, not just a reference to that data.
The sending component expects the message content to be processed in a certain way by the destination component. The integrity of the overall system may depend on both sender and receiver doing a specific job.
For example, suppose a user uploads a new song by using the mobile music-sharing app. The mobile app must send that song to the web API that runs in Azure. The song media file itself must be sent, not just an alert that indicates that a new song has been added. The mobile app expects that the web API will store the new song in the database and make it available to other users. This is an example of a message.
What are Events?
An event is a lightweight notification that indicates that something happened.
The event may be sent to multiple receivers, or to none at all.
Events are often intended to “fan out,” or have a large number of subscribers for each publisher.
The publisher of the event has no expectation about the action a receiving component takes.
Some events are discrete units and unrelated to other events.
Some events are part of a related and ordered series.
For example, suppose the music file upload has been completed, and the new song has been added to the database. To let users know about the new file, the web API must notify the web front end and the mobile app. The users can choose whether to listen to the new song, so the initial notification does not include the music file but only notifies users that the song exists. The sender does not have a specific expectation that the event receivers will do anything in particular in response to receiving this event.
How to choose messages or events?
A single application is likely to use events for some purposes and messages for others. Before you choose, you must analyze your application’s architecture and all its use cases, to identify all the different purposes where its components have to communicate with each other.
For each communication, consider the following question: Does the sending component expect the communication to be processed in a particular way by the destination component?
If the answer is yes, choose to use a message. If the answer is no, you may be able to use events.
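The difference between the two can be shown with the music-sharing example. The sketch below is a simplified in-process Python model: SongUploadMessage, SongAddedEvent, EventBus, and web_api_store are hypothetical names, and a real system would use a queue or an eventing service instead of direct function calls.

```python
from dataclasses import dataclass


@dataclass
class SongUploadMessage:
    """A message carries the data itself, here the media bytes."""
    title: str
    media: bytes


@dataclass
class SongAddedEvent:
    """An event is a lightweight notification with no payload."""
    song_id: int
    title: str


class EventBus:
    """Hypothetical in-process publisher; zero or many subscribers are fine."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)
        return len(self.subscribers)


def choose_communication(sender_expects_specific_processing: bool) -> str:
    # The question from the text, expressed as a function.
    return "message" if sender_expects_specific_processing else "event"


database = {}
bus = EventBus()
notifications = []
bus.subscribe(lambda e: notifications.append(("web", e.title)))
bus.subscribe(lambda e: notifications.append(("mobile", e.title)))


def web_api_store(message: SongUploadMessage) -> int:
    # The sender relies on this happening: the song must be stored.
    song_id = len(database) + 1
    database[song_id] = message.media
    bus.publish(SongAddedEvent(song_id=song_id, title=message.title))
    return song_id


song_id = web_api_store(SongUploadMessage(title="Demo", media=b"\x00\x01"))
```

The upload is a message because the mobile app depends on the web API storing the song. The notification is an event because the publisher does not care how many subscribers there are or what they do with it.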
Q1: You have a distributed application with a web service that authenticates users. When a user logs on, the web service notifies all the client applications so they can display that user’s status as “Online”. Is the login notification an example of a message or an event?
A1: The login notification is an event. It contains only a simple piece of status data and there is no expectation by the authentication service for the client applications to react to the notice in any particular way.
Q2: You have a distributed application with a web service that lets users manage their accounts. Users can sign up, edit their profile, and delete their account. When a user deletes their account, your web service notifies your data layer so the user’s data will be removed from the database. Is the delete-account notification an example of a message or an event?
A2: The delete-account notification is a message. The key factor is that the web service has an expectation about how the data layer will process the message. The data layer must remove the user’s data from the database for the system to function correctly. Note that the message itself contains only simple information so this aspect of the communication could be considered an event. However, the fact that the web service requires the data layer to handle the notification in a specific way is sufficient to make this a message.
I hope this will help !!!
NOTE — Reference taken from Microsoft Learning Site