How to maintain data consistency across Microservices?
Because Microservice applications are built as a set of modular components, they are easier to understand, simpler to test, and easier to maintain over the life of the application. This modularity helps organizations achieve agility and shorten the time it takes to get working enhancements into production.
On the downside, there are quite a few challenges as well. Each service has its own database, so whenever a business transaction spans multiple services you need a mechanism to ensure data consistency across them. For example, suppose you are building a web store application where customers have a credit limit. The application must ensure that a new order will not exceed the customer's credit limit, but since Orders and Customers live in different databases, the application cannot simply use a local ACID transaction.
In this post, let’s look at the available options to implement data consistency across Microservices.
Cross-posted: ContainerJournal
#1. Compensating Transactions Pattern
Implement each business transaction that spans multiple services as a sequence of local transactions. Each local transaction updates the database and publishes a message or event to trigger the next local transaction. If a local transaction fails because it violates a business rule, execute a series of compensating transactions that undo the changes made by the preceding local transactions.
There are two ways to do it:
- For each local transaction, publish domain events that trigger local transactions in the other services.
  For example, if you are building a store application that uses this approach, creating an order would consist of the following steps:
  - The Order Service creates an Order in a pending state and publishes an ORDER_CREATED event.
  - The Customer Service receives the event and attempts to reserve credit for that Order. It publishes either a CREDIT_RESERVED event or a CREDIT_LIMIT_EXCEEDED event.
  - The Order Service receives the event and changes the state of the order to either approved or cancelled.
- Use an orchestrator service to tell the participants which local transactions to execute. In the same store application, the flow would be as follows (a minimal sketch of this style appears after this list):
  - The Order Service creates an Order in a pending state and creates a CreateOrderOrchestration.
  - The CreateOrderOrchestration sends a ReserveCredit command to the Customer Service.
  - The Customer Service attempts to reserve credit for that Order and sends back a reply.
  - The CreateOrderOrchestration receives the reply and sends either an ApproveOrder or RejectOrder command to the Order Service.
  - The Order Service changes the state of the order to either approved or cancelled.
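To make the orchestration style concrete, here is a minimal, self-contained sketch in Python. The class names, the in-memory state, and the direct method calls are hypothetical simplifications of the flow above; real services would exchange commands and replies over a message broker rather than calling each other in-process.

```python
# Hypothetical sketch of the orchestration-style saga described above.

class OrderService:
    def __init__(self):
        self.orders = {}  # order_id -> {"customer_id", "amount", "state"}

    def create_order(self, order_id, customer_id, amount):
        # Local transaction 1: create the order in a pending state.
        self.orders[order_id] = {"customer_id": customer_id,
                                 "amount": amount, "state": "PENDING"}

    def set_state(self, order_id, state):
        # Local transaction 3: approve or cancel the order.
        self.orders[order_id]["state"] = state


class CustomerService:
    def __init__(self, credit_limits):
        self.credit_limits = credit_limits  # customer_id -> remaining credit

    def reserve_credit(self, customer_id, amount):
        # Local transaction 2: reserve credit and reply with the outcome.
        if self.credit_limits.get(customer_id, 0) >= amount:
            self.credit_limits[customer_id] -= amount
            return "CREDIT_RESERVED"
        return "CREDIT_LIMIT_EXCEEDED"


class CreateOrderOrchestration:
    """Orchestrator that tells each participant which local transaction to run."""

    def __init__(self, order_service, customer_service):
        self.order_service = order_service
        self.customer_service = customer_service

    def run(self, order_id, customer_id, amount):
        self.order_service.create_order(order_id, customer_id, amount)
        reply = self.customer_service.reserve_credit(customer_id, amount)
        if reply == "CREDIT_RESERVED":
            self.order_service.set_state(order_id, "approved")
        else:
            # Compensating path: the pending order is cancelled,
            # since nothing else has been committed yet.
            self.order_service.set_state(order_id, "cancelled")
        return self.order_service.orders[order_id]["state"]


if __name__ == "__main__":
    orders = OrderService()
    customers = CustomerService({"c1": 100})
    saga = CreateOrderOrchestration(orders, customers)
    print(saga.run("o1", "c1", 80))   # approved
    print(saga.run("o2", "c1", 80))   # cancelled (credit limit exceeded)
```

The choreography variant would look similar, except that each service would react to the published events (ORDER_CREATED, CREDIT_RESERVED, CREDIT_LIMIT_EXCEEDED) instead of being driven by an orchestrator.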
Issues and considerations
- Determination of when a step that implements eventual consistency has failed. A step might not fail immediately, but instead, it could block. It may be necessary to implement some form of time-out mechanism.
- Compensation logic is not easily generalized. A compensating transaction is application-specific; it relies on the application having sufficient information to be able to undo the effects of each step in a failed operation.
- Steps in a compensating transaction should be defined as idempotent commands. This enables the steps to be repeated if the compensating transaction itself fails.
- A compensating transaction does not necessarily return the data in the system to the state it was in at the start of the original operation. Instead, it compensates for the work performed by the steps that completed successfully before the operation failed.
- The order of the steps in the compensating transaction does not necessarily have to be the mirror opposite of the steps in the original operation. For example, one data store may be more sensitive to inconsistencies than another, and so the steps in the compensating transaction that undo the changes to this store should occur first.
- Consider using retry logic that is more forgiving than usual to minimize failures that trigger a compensating transaction. If a step in an operation that implements eventual consistency fails, try handling the failure as a transient exception and repeat the step, as illustrated in the sketch below.
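The points about time-outs, idempotency, and forgiving retries can be sketched briefly. The refund_payment compensating command and its in-memory bookkeeping below are hypothetical; the idea is simply that a compensating step keyed by a unique command id can be repeated safely and retried a few times before the saga gives up.

```python
import time

# Hypothetical in-memory record of refunds that have already been applied.
# Keying compensation by a unique command id makes the step idempotent:
# repeating it after a partial failure does not refund twice.
_applied_refunds = set()

def refund_payment(command_id, account, amount):
    if command_id in _applied_refunds:
        return "already-applied"          # safe to repeat
    # ... call the payment service here ...
    _applied_refunds.add(command_id)
    return "applied"

def run_compensation(step, retries=3, delay_seconds=1.0):
    """Retry a compensating step a few times, treating failures as transient."""
    for attempt in range(1, retries + 1):
        try:
            return step()
        except TimeoutError:
            # A step may block rather than fail outright, so callers should
            # enforce a time-out and treat it like any other transient error.
            if attempt == retries:
                raise
            time.sleep(delay_seconds)

if __name__ == "__main__":
    print(run_compensation(lambda: refund_payment("cmd-42", "acct-1", 25.0)))  # applied
    # Re-running the same command id is a no-op, which is what idempotency buys us.
    print(refund_payment("cmd-42", "acct-1", 25.0))                            # already-applied
```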
To address these issues, eBay has developed the GRIT protocol for globally consistent distributed transactions. It combines ideas from optimistic concurrency control (OCC), two-phase commit (2PC), and deterministic databases to achieve, for the first time, high-performance, globally consistent transactions across microservices with multiple underlying databases.
In the next section, we look at how eBay's new protocol addresses the above issues.
#2. GRIT: a protocol for distributed transactions
The following diagram illustrates the GRIT protocol in a microservices application with two databases.
To understand better, let’s look at key components that make up globally consistent distributed transactions.
Key components
- Global Transaction Manager (GTM): It coordinates global transactions across multiple databases. There can be one or more GTMs.
- Global Transaction Log (GTL): It represents the transaction request queue for a GTM. The order of transaction requests in a GTL determines the relative serializability order among global transactions. Persistence of GTLs is optional.
- Database Transaction Manager (DBTM): The transaction manager at each database realm. It performs conflict checking and resolution, i.e., the local commit decision is made here.
- Database Transaction Log (DBTL): The transaction log at each database realm that logs logically committed transactions that relate to this database (including single database transactions and multi-database transactions). The order of transactions in a DBTL determines the serializability order of the whole database system, including the global order dictated by the GTM. A log sequence number (LSN) is assigned to each log entry.
- LogPlayer: This component sends log entries, in sequence, to the backend storage servers so that they can apply the updates. Each DB server applies the log entries of logically committed transactions in order (a toy illustration follows).
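To illustrate how a LogPlayer consumes a DBTL, the toy sketch below applies logically committed entries to a key-value store strictly in LSN order. The data structures are hypothetical and only mirror the component descriptions above; they are not eBay's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    lsn: int      # log sequence number assigned by the DBTL
    writes: dict  # key -> new value for a logically committed transaction

@dataclass
class LogPlayer:
    """Applies DBTL entries to the backend store in LSN order."""
    store: dict = field(default_factory=dict)
    next_lsn: int = 1

    def apply(self, entries):
        # Entries within a batch may arrive out of order; apply them
        # strictly in LSN order, starting from the next expected LSN.
        pending = {e.lsn: e for e in entries}
        while self.next_lsn in pending:
            for key, value in pending[self.next_lsn].writes.items():
                self.store[key] = value   # the physical apply happens here
            self.next_lsn += 1

if __name__ == "__main__":
    player = LogPlayer()
    player.apply([LogEntry(2, {"order:1": "approved"}),
                  LogEntry(1, {"credit:c1": 20})])
    print(player.store, player.next_lsn)  # both entries applied, next_lsn == 3
```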
Now that we have covered the components, let's look at the steps of a distributed transaction.
Steps for a distributed transaction
The following diagram shows the main steps of a distributed transaction.
In GRIT, a distributed transaction goes through three phases (a generic sketch of the phase-2 conflict check follows this list):
- Optimistic execution (steps 1-4): As the application executes the business logic via microservices, the database services capture the read-set and write-set of the transaction. No actual data modification occurs in this phase.
- Logical commit (steps 5-11): Once the application requests the transaction commit, the read-set and write-set at each database service point are submitted to its DBTM. The DBTM uses the read-set and write-set for conflict checking to reach a local commit decision. The GTM then makes the global commit decision after collecting the local decisions of all DBTMs for the transaction. A transaction is logically committed once its write-sets are persisted in the log stores (DBTLs) of the databases involved. This involves minimal coordination between the GTM and the DBTMs.
- Physical apply (steps 12-13): The log players asynchronously send DBTL entries to the backend storage servers. The actual data modification occurs in this phase.
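To make the logical-commit phase more tangible, here is a hedged, generic sketch of the kind of OCC-style validation a DBTM could perform: the read-set captured during optimistic execution is checked against the versions committed since, and the GTM commits globally only if every DBTM votes yes. This illustrates the general OCC idea, not GRIT's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Dbtm:
    """Toy DBTM: validates read-sets against committed versions (OCC-style)."""
    committed_versions: dict = field(default_factory=dict)  # key -> version

    def local_commit_decision(self, read_set, write_set):
        # read_set: key -> version observed during optimistic execution.
        # Reject if any key the transaction read has since been re-written.
        for key, version_read in read_set.items():
            if self.committed_versions.get(key, 0) != version_read:
                return False
        # Bump versions for the keys being written. In a real protocol this
        # would only take effect after the global decision; here it is
        # simplified to keep the sketch short.
        for key in write_set:
            self.committed_versions[key] = self.committed_versions.get(key, 0) + 1
        return True

def global_commit(local_decisions):
    # The GTM commits the global transaction only if every DBTM voted yes.
    return all(local_decisions)

if __name__ == "__main__":
    orders_dbtm, customers_dbtm = Dbtm(), Dbtm()
    # Transaction T1 touches one key in each database and sees no conflicts.
    d1 = orders_dbtm.local_commit_decision({}, {"order:1": "approved"})
    d2 = customers_dbtm.local_commit_decision({}, {"credit:c1": 20})
    print(global_commit([d1, d2]))   # True

    # Transaction T2 read credit:c1 at version 0, but it is now at version 1.
    d3 = customers_dbtm.local_commit_decision({"credit:c1": 0}, {"credit:c1": 10})
    print(d3)                        # False -> the GTM would abort globally
```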
Issues and considerations
- GRIT is able to achieve consistent high throughput and serializable distributed transactions for applications invoking microservices with minimum coordination.
- GRIT fits well for transactions with few conflicts and provides a critical capability for applications that otherwise need complex mechanisms to achieve consistent transactions across microservices with multiple underlying databases.
Conclusion
There is no single approach that fits every situation. Cloud applications typically use data that is dispersed across data stores, and managing and maintaining data consistency in this environment can become a critical aspect of the system, particularly in terms of the concurrency and availability issues that arise. You will often need to trade strong consistency for availability, which means designing some aspects of your solution around the notion of eventual consistency and accepting that the data your applications use might not be completely consistent all of the time.