When building applications in a microservice architecture, managing state becomes a distributed systems problem. Instead of managing state with transactions inside the boundaries of a single monolithic application, a microservice must manage consistency using transactions that are distributed across a network of many different applications and databases.
In this article we will explore the problems of data consistency and high availability in microservices. We will start by taking a look at some of the important concepts and themes behind handling data consistency in distributed systems.
Throughout this article we will use a reference application of an online store that is built with microservices using Spring Boot and Spring Cloud. We’ll then look at how to use reactive streams with Project Reactor to implement event sourcing in a microservice architecture. Finally, we’ll use Docker and Maven to build, run, and orchestrate the multi-container reference application.
Eventual Consistency
When building microservices, we are forced to start reasoning about state in an architecture where data is eventually consistent. This is because each microservice exclusively exposes resources from a database that it owns. Further, each of these databases is typically configured for high availability, and each type of database comes with its own consistency guarantees.
Eventual consistency is a model used to describe some operations on data in a distributed system, where state is replicated and stored across multiple nodes of a network. Typically, eventual consistency comes up when running a database in high-availability mode, where replicas are maintained by coordinating writes between multiple nodes of a database cluster. The challenge for the database cluster is that writes must be applied to all replicas in the exact order they were received. When this holds, each replica is considered eventually consistent: the state of every replica is guaranteed to converge toward a consistent state at some point in the future.
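To make that convergence guarantee a little more concrete, here is a minimal sketch in plain Java; the Replica class and the stream of writes are hypothetical stand-ins, not part of Spring or any database driver. It shows that two replicas applying the same writes in the same order end up with identical state, which is exactly the promise eventual consistency makes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A hypothetical in-memory replica that applies writes in the order they arrive.
class Replica {
    private final Map<String, String> state = new LinkedHashMap<>();

    void apply(Map.Entry<String, String> write) {
        state.put(write.getKey(), write.getValue());
    }

    Map<String, String> state() {
        return state;
    }
}

public class EventualConsistencyDemo {
    public static void main(String[] args) {
        // An ordered stream of writes, as coordinated across the cluster.
        List<Map.Entry<String, String>> writes = List.of(
                Map.entry("order-1", "CREATED"),
                Map.entry("order-1", "SHIPPED"),
                Map.entry("order-2", "CREATED"));

        Replica primary = new Replica();
        Replica follower = new Replica();

        // The primary applies every write as it happens.
        writes.forEach(primary::apply);

        // The follower may lag behind, but it replays the same writes in the same order...
        writes.forEach(follower::apply);

        // ...so both replicas converge to the same state.
        System.out.println(primary.state().equals(follower.state())); // true
    }
}
```

The only thing the cluster has to get right is the ordering: if the follower applied the same writes in a different order, the two replicas could settle on different values for the same key.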
When teams first build microservices, eventual consistency is a frequent point of contention among developers, DBAs, and architects. The head scratching becomes more frequent when architecture design discussions turn to the topic of data and handling state in a distributed system, and it usually boils down to one question.
How can we guarantee high availability while also guaranteeing data consistency?
To answer this question, we need to understand how best to handle transactions in a distributed system. It just so happens that most distributed databases have this problem nailed down with a healthy helping of science.
Transaction Logs
Nearly all databases today support some form of high-availability clustering, and most database products provide a list of easy-to-understand guarantees about the system's consistency model. A first step toward achieving the safety guarantees of stronger consistency models is to maintain an ordered log of database transactions. The approach is simple in theory: a transaction log is an ordered record of every update the database has transacted, and when the transactions are replayed in the exact order they were recorded, an exact replica of the database can be generated.
The diagram above represents three databases in a cluster that are replicating data using a shared transaction log. The zipper labeled Primary is the authority in this case and has the most current view of the database. The difference between the zippers represents the consistency of each replica, and as the transactions are replayed, each replica converges to a consistent state with the Primary. The basic idea here is that, with eventual consistency, all zippers will eventually be zipped all the way up.
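As a rough illustration of the replay idea (a sketch only, not how any particular database implements its log), the example below rebuilds a key-value replica from an ordered log of hypothetical Transaction records. Replaying the log in sequence order regenerates the same view of the data that the Primary holds.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A hypothetical transaction record: one ordered update to a single key.
record Transaction(long sequence, String key, String value) {}

public class TransactionLogReplay {

    // Replaying the log in the exact order it was recorded regenerates the replica.
    static Map<String, String> replay(List<Transaction> log) {
        Map<String, String> replica = new LinkedHashMap<>();
        log.stream()
           .sorted(Comparator.comparingLong(Transaction::sequence))
           .forEach(tx -> replica.put(tx.key(), tx.value()));
        return replica;
    }

    public static void main(String[] args) {
        List<Transaction> log = List.of(
                new Transaction(1, "account-42", "balance=100"),
                new Transaction(2, "account-42", "balance=75"),  // a later update wins
                new Transaction(3, "account-7", "balance=10"));

        // A replica rebuilt from the log matches the Primary's current view:
        // account-42 -> balance=75, account-7 -> balance=10
        System.out.println(replay(log));
    }
}
```

A replica that has only processed part of the log is simply a partially zipped zipper: as the remaining transactions are replayed, it converges to the Primary's state.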