This article introduces you to a sample application that combines multiple microservices with a graph processing platform to rank communities of users on Twitter. We’re going to use a collection of popular tools as a part of this article’s sample application. The tools we’ll use, in the order of importance, will be:
A graph processing platform is an application architecture that provides a general-purpose job scheduling interface for analyzing graphs. The application we’ll build will make use of a graph processing platform to analyze and rank communities of users on Twitter. For this we’ll use Neo4j Mazerunner, an open source project that I started that connects Neo4j’s database server to Apache Spark.
The diagram below illustrates a graph processing platform similar to Neo4j Mazerunner.
The graph processing platform I’ve described will provide us with a general purpose API for submitting PageRank jobs to Apache Spark’s GraphX module from Neo4j. The PageRank results from GraphX will be automatically applied back to Neo4j without any additional work to manually handle data loading. The workflow for this is extremely simple for our purposes. From a backend service we will only need to make a simple HTTP request to Neo4j to begin a PageRank job.
I’ve also taken care of making sure that the graph processing platform is easily deployable to a cloud provider using Docker containers. In a previous article, I describe how to use Docker Compose to run Mazerunner as a multi-container application. We’ll do the same for this sample application but extend the Docker Compose file to include additional Spring Boot applications that will become our backend microservices.
|
By default, Docker Compose will orchestrate containers on a single virtual machine. If we were to build a truly fault tolerant and resilient cloud-based application, we’d need to be sure to scale our system to multiple virtual machines using a cloud platform. This is the subject of a later article.
|
Now that we understand how we will use a graph processing platform, let’s talk about how to build a microservice architecture using Spring Boot and Spring Cloud to rank profiles on Twitter.
Building Microservices
I’ve talked a lot about microservices in past articles. When we talk about microservices we are talking about developing software in the context of continuous delivery. Microservices are not just smaller services that scale horizontally. When we talk about microservices, we are talking about being able to create applications that are the product of many teams delivering continuously in independent release cycles. Josh Long and I describe at length how to untangle the patterns of building and operating JVM-based microservices in O’Reilly’s Cloud Native Java.
In this sample, we’ll build 4 microservices, each as a Spring Boot application. If we were to build this architecture as microservices in an authentic scenario, each microservice would be owned and managed by a different team. This is an important differentiation in this new practice, as there is much confusion around what a microservice is and what it is not. A microservice is not just a distributed system of small services. The practice of building microservices should never be without the discipline of continuous delivery.
For the purposes of this article, we’ll focus on scenarios that help us gain experience and familiarity with building distributed systems that resemble a microservice architecture.
Overview
Now let’s do a quick overview of the concepts we’re going to cover as a part of this sample application. We will apply the same recipe from previous articles on similar topics for building microservices with Spring Boot and Spring Cloud. The key difference from my previous articles is that we are going to create a data service that does both batch processing tasks as well as exposing data as HTTP resources to API consumers.
System Architecture Diagram
The diagram below shows each component and microservice that we will create as a part of this sample application. Notice how we’re connecting the Spring Boot applications to the graph processing platform we looked at earlier. Also, notice the connections between the services, these connections define communication points between each service and what protocol is used.
The three applications that are colored in blue are stateless services. Stateless services will not attach a persistent backing service or need to worry about managing state locally. The application that is colored in green is the Twitter Crawler service. Components that are colored in green will typically have an attached backing service. These backing services are responsible for managing state locally, and will either persist state to disk or in-memory.