Resilient Microservice Design With Spring Boot — Bulkhead Pattern
In this article, let's talk about the Bulkhead pattern for designing resilient microservices. This is the fourth article in the Resilient design patterns series. If you have not read the previous articles, I suggest you take a look at them first.
Need For Resiliency:
Microservices are distributed in nature, so they have more components and moving parts. In a distributed architecture, dealing with unexpected failures is one of the biggest challenges to solve. It could be a hardware failure, a network failure, etc. The ability of a system to recover from a failure and remain functional is what makes it resilient. It also helps avoid cascading failures.
Why Bulkhead?
A ship is split into multiple small compartments using bulkheads. Bulkheads seal off parts of the ship so that flooding in one compartment does not sink the entire ship. Similarly, failures should be expected when we design software. The application should be split into multiple components, and resources should be isolated in such a way that the failure of one component does not affect the others.
For example, let's assume that there are 2 services, A and B. Some of the APIs of A depend on B. For some reason B is very slow. When we get multiple concurrent requests to A which depend on B, A's threads get blocked waiting for B, so A's performance is affected as well. Because of that, A might not be able to serve other requests which do NOT depend on B. So the idea here is to isolate resources: allocate only some of A's threads for calls to B, so that we do not consume all the threads of A and A does not hang for all requests!
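One way to picture "allocating some threads in A for B" is a small dedicated pool that only ever runs calls to B. The following is a minimal plain-JDK sketch of that idea, not the Resilience4j setup used later in this article; the class name, the pool size of 2 and the "fallback" string are assumptions made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class DedicatedPoolSketch {

    // Threads of A that are reserved for calls to B (hypothetical sizing).
    private final ThreadPoolExecutor poolForB;

    DedicatedPoolSketch(int threadsForB) {
        // SynchronousQueue keeps no backlog: once every thread is busy, new calls are rejected.
        this.poolForB = new ThreadPoolExecutor(
                threadsForB, threadsForB, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
    }

    /** Runs the call to B on the dedicated pool, or answers with the fallback if the pool is full. */
    CompletableFuture<String> callB(Supplier<String> remoteCall, String fallback) {
        try {
            return CompletableFuture.supplyAsync(remoteCall, poolForB);
        } catch (RejectedExecutionException e) {
            return CompletableFuture.completedFuture(fallback);
        }
    }

    void shutdown() {
        poolForB.shutdown();
    }

    /** Two calls occupy both threads while B is slow; returns what a third call gets. */
    static String thirdCallWhileBIsSlow() {
        DedicatedPoolSketch serviceA = new DedicatedPoolSketch(2);
        CountDownLatch bResponds = new CountDownLatch(1); // stays closed while B is "slow"
        Supplier<String> slowB = () -> {
            try {
                bResponds.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "response from B";
        };
        try {
            serviceA.callB(slowB, "fallback"); // occupies thread 1
            serviceA.callB(slowB, "fallback"); // occupies thread 2
            return serviceA.callB(slowB, "fallback").join(); // pool is full, so this is rejected
        } finally {
            bResponds.countDown(); // let the two blocked calls finish
            serviceA.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(thirdCallWhileBIsSlow()); // prints "fallback"
    }
}
```

However slow B gets, at most 2 threads are ever stuck waiting for it, and every other thread of A stays free for requests that do not need B.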
Sample Application:
We are going to use the same application that we used in the previous articles.
Source Code is here.
To understand the use of the bulkhead pattern, let's consider it in our application. Our product-service has 2 endpoints.
- /product/{id} → an endpoint which gives more details about a specific product, along with its ratings. It depends on the results from rating-service. Users updating their ratings, leaving comments and replying to comments all go via this endpoint.
- /products → an endpoint which gives the list of products we have in our catalog based on some search criteria. It does not depend on any other service. Users can directly order a product (add to cart) from the list.
Product-service is a typical web application with multiple threads. We are going to limit the number of threads for the application to 15. It means product-service can handle up to 15 concurrent requests. If all the threads are busy serving users who are reading product details, leaving comments, checking reviews etc., users who are searching for products and trying to order them might experience application slowness. This is a problem.
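To make the problem concrete, here is a small simulation, assuming a fixed pool of 15 threads stands in for the Tomcat worker threads. It is only a sketch: the class and method names are hypothetical, and the latch merely imitates a rating-service that does not respond.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolExhaustionDemo {

    /**
     * Fills every worker thread with a request that is stuck on rating-service,
     * then submits one request that has no dependency at all. Returns true if
     * that request was still waiting for a free thread while the others were blocked.
     */
    static boolean listRequestIsStarved(int workerThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(workerThreads); // stands in for Tomcat's threads
        CountDownLatch allWorkersBusy = new CountDownLatch(workerThreads);
        CountDownLatch ratingServiceResponds = new CountDownLatch(1);       // closed = rating-service is slow
        try {
            for (int i = 0; i < workerThreads; i++) {
                pool.submit(() -> {                  // simulates GET /product/{id}
                    allWorkersBusy.countDown();
                    ratingServiceResponds.await();   // blocked on the slow dependency
                    return "product details";
                });
            }
            allWorkersBusy.await();                  // every worker thread is now occupied

            Future<String> listRequest = pool.submit(() -> "product list"); // simulates GET /products
            boolean starved = !listRequest.isDone(); // it sits in the queue: no thread is free

            ratingServiceResponds.countDown();       // rating-service finally answers
            listRequest.get();                       // only now can the list request complete
            return starved;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(listRequestIsStarved(15)); // prints "true"
    }
}
```

The list request does no slow work itself, yet it cannot even start until one of the 15 threads is released.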
Product Controller:
@RestController
@RequestMapping("v1")
public class ProductController {

    @Autowired
    private ProductService productService;

    @GetMapping("/product/{id}")
    public ProductDTO getProduct(@PathVariable int id){
        return this.productService.getProduct(id);
    }

    @GetMapping("/products")
    public List<ProductDTO> getProducts(){
        return this.productService.getProducts();
    }

}
ProductService internally calls the RatingService whose implementation is as shown below.
ProductService’s application.yaml is updated as shown below.
server:
  tomcat:
    max-threads: 15
If I run a performance test using JMeter to simulate a large number of users accessing specific product details while some users access the list of products, I get the results shown here. We were able to make only 26 /products requests, and with an average response time of 3.6 seconds, even though that endpoint does not have any dependency.
Let's see how a bulkhead implementation can save us here!
Bulkhead Implementation:
- I am using the same Resilience4j library which I had used in the previous article on CircuitBreaker Pattern.
application.yaml changes
- We allow max 10 concurrent requests to the rating service even when we have 15 threads.
- maxWaitDuration applies when additional requests for the rating service arrive while all 10 allowed calls are already in progress: we wait for only 10 ms for a slot to free up, and fail the request if none does.
server:
  tomcat:
    max-threads: 15
  port: 8082
rating:
  service:
    url: http://localhost:8081/v1/ratings
resilience4j.bulkhead:
  instances:
    ratingService:
      maxConcurrentCalls: 10
      maxWaitDuration: 10ms
RatingServiceImpl changes
- @Bulkhead uses the instance we have defined in the application.yaml.
- fallbackMethod is optional. It will be used when a request is rejected because we already have 10 concurrent requests in progress.
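To show what the two settings and the fallback mean in practice, here is a minimal sketch that mimics them with a plain java.util.concurrent.Semaphore. It is not Resilience4j's code and not the RatingServiceImpl from this application; the class name, the method names and the "default rating" value are assumptions for illustration only.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SemaphoreBulkheadSketch {

    private final Semaphore permits;        // one permit per allowed concurrent call
    private final Duration maxWaitDuration; // how long a caller may wait for a permit

    SemaphoreBulkheadSketch(int maxConcurrentCalls, Duration maxWaitDuration) {
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitDuration = maxWaitDuration;
    }

    /** Runs the call if a permit frees up within maxWaitDuration; otherwise returns the fallback. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitDuration.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            acquired = false;
        }
        if (!acquired) {
            return fallback.get(); // bulkhead is full: answer with the default instead of blocking
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Holds every permit with a slow call, then returns what one extra call receives. */
    static String callWhenFull(int maxConcurrentCalls) {
        SemaphoreBulkheadSketch bulkhead =
                new SemaphoreBulkheadSketch(maxConcurrentCalls, Duration.ofMillis(10));
        CountDownLatch allPermitsHeld = new CountDownLatch(maxConcurrentCalls);
        CountDownLatch ratingServiceResponds = new CountDownLatch(1);
        for (int i = 0; i < maxConcurrentCalls; i++) {
            new Thread(() -> bulkhead.execute(() -> {
                allPermitsHeld.countDown();
                awaitQuietly(ratingServiceResponds); // simulates a slow rating-service
                return "real rating";
            }, () -> "default rating")).start();
        }
        awaitQuietly(allPermitsHeld); // all permits are now in use
        String result = bulkhead.execute(() -> "real rating", () -> "default rating");
        ratingServiceResponds.countDown(); // let the blocked calls finish
        return result;
    }

    public static void main(String[] args) {
        System.out.println(callWhenFull(10)); // prints "default rating"
    }
}
```

With 10 permits and 15 request threads, at least 5 threads always remain available for requests that never touch the rating service.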
Now, after starting our services, running the same test produces the result below, which is very interesting.
- The average response time of the /products requests is 106 milliseconds, compared to 3.6 seconds without the bulkhead implementation. This is because we do not exhaust the resources of product-service.
- By using the fallback method, any additional requests for product/1 are answered with a default response.
Summary:
Using the bulkhead pattern, we allocate resources for a specific component so that it cannot consume all the resources of the application. Our application remains functional even under unexpected load.
There are other design patterns which can be combined with the bulkhead pattern to handle this even better. Please take a look at these articles.
- Resilient MicroService Design — Timeout Pattern
- Resilient MicroService Design — Retry Pattern
- Resilient MicroService Design — Circuit Breaker Pattern
- Resilient MicroService Design — Rate Limiter Pattern
Happy learning 🙂