Resilient Microservice Design With Spring Boot — Circuit Breaker Pattern

Vinoth Selvaraj
6 min read · Oct 25, 2020


In this article, I would like to show you yet another design pattern for designing resilient microservices: the Circuit Breaker pattern. This is the third article in the Resilient Design Patterns series. If you have not read the previous articles, I would suggest taking a look at them first.

Need For Resiliency:

Microservices are distributed in nature, with many components and moving parts. In a distributed architecture, dealing with unexpected failures is one of the biggest challenges: it could be a hardware failure, a network failure, etc. The ability of a system to recover from failure and remain functional makes it more resilient; it also avoids cascading failures.

Why Circuit Breaker?

There are some problems with the Timeout pattern.

  • We discussed the timeout pattern here. Even though it partially solves the problem, for every request that comes to the product-service we still send a request to the rating-service with a 3-second timeout.
  • When the rating-service is completely unavailable, this affects the response time of every product-service request.

There are some problems with the Retry pattern as well.

  • The Retry pattern seems to work well with the Timeout pattern, and sometimes retrying does solve the problem. But notice that when the service is unavailable, after the first timeout we send another request as part of the retry. So we send 2 requests to the rating-service for every request to the product-service.
  • The Retry pattern can therefore worsen the response time of the product-service when the rating-service is unavailable.

Circuit Breaker

  • The Circuit Breaker pattern is based on the idea of an electrical switch that protects an electrical circuit from damage caused by excess current.
  • The idea here is: why send subsequent requests to the rating-service when we already know the service is unavailable? If we know the rating-service is down, do not call it at all; respond to the product-service request directly instead.

Sample Application:

We are going to use the same application we considered in the previous articles.

Source Code is here.

Circuit Breaker:

Most online examples seem to use the Hystrix library, which is no longer actively developed. In this article, I am using the Resilience4j library, which is lightweight and easy to use.

Circuit Breaker States:

  • CLOSED: The dependent service is up. Requests to the product-service are forwarded to the rating-service.
  • OPEN: The dependent service is unavailable. Requests to the product-service are NOT forwarded to the rating-service. This happens when the failure rate reaches a certain threshold.
  • HALF_OPEN: Once the state becomes OPEN, we wait for some time in that state. After a certain duration, the state becomes HALF_OPEN. During this period, we send a limited number of product-service requests to the rating-service to check whether we still get proper responses. If the failure rate is below the threshold, the state becomes CLOSED; if it is above the threshold, the state becomes OPEN once again. This cycle continues.
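The three states above form a small state machine. The class below is a minimal illustration in plain Java, not the Resilience4j implementation; all names are made up, and for simplicity it opens on a fixed number of consecutive failures rather than a sliding-window failure rate, and lets a single HALF_OPEN trial call decide the next state.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative circuit breaker state machine (hypothetical names, not Resilience4j). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private final int failureThreshold;  // consecutive failures before opening
    private final Duration waitInOpen;   // how long to stay OPEN before trying again
    private Instant openedAt;

    SimpleCircuitBreaker(int failureThreshold, Duration waitInOpen) {
        this.failureThreshold = failureThreshold;
        this.waitInOpen = waitInOpen;
    }

    /** Returns true if a call to the dependent service is allowed right now. */
    boolean allowRequest(Instant now) {
        if (state == State.OPEN && Duration.between(openedAt, now).compareTo(waitInOpen) >= 0) {
            state = State.HALF_OPEN;     // wait duration elapsed: allow trial calls
        }
        return state != State.OPEN;
    }

    void recordSuccess() {
        failures = 0;
        state = State.CLOSED;            // a successful (trial) call closes the circuit
    }

    void recordFailure(Instant now) {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;          // trip the breaker and remember when
            openedAt = now;
            failures = 0;
        }
    }

    State state() { return state; }
}
```

With a threshold of 3 failures and a 10-second wait, the breaker rejects calls while OPEN and allows a trial call once the wait duration has elapsed.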

Configuration:

  • sliding window: A time-based or count-based window to store and aggregate call results, used to check whether the failure rate is above or below the threshold.
  • wait duration: Time to wait in the OPEN state before making any request to the dependent service. During this period, we assume the rating-service is down.
  • permitted number of calls in half-open state: The number of calls allowed while the circuit breaker is HALF_OPEN.
  • failure rate threshold: The failure rate in %, above which the circuit opens.
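For intuition, the count-based sliding window and failure-rate threshold can be sketched as follows. This is an illustrative plain-Java sketch with made-up names, not Resilience4j's internal implementation; it keeps the outcomes of the last N calls and opens only once the window is full.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative count-based sliding window (hypothetical names, not Resilience4j internals). */
class FailureRateWindow {
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private final int windowSize;

    FailureRateWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    void record(boolean failed) {
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst(); // evict the oldest outcome
        }
    }

    /** Failure rate in percent over the current window. */
    double failureRate() {
        if (outcomes.isEmpty()) return 0.0;
        long failures = outcomes.stream().filter(f -> f).count();
        return 100.0 * failures / outcomes.size();
    }

    /** Open only once the window is full and the rate reaches the threshold. */
    boolean shouldOpen(double thresholdPercent) {
        return outcomes.size() >= windowSize && failureRate() >= thresholdPercent;
    }
}
```

For example, with a window of 5 calls and a 60% threshold, 3 failures out of the last 5 calls would trip the breaker, while 2 out of 5 would not.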

I am going to highlight only the circuit-breaker-specific changes I made to the existing application.

Rating — MicroService:

RatingController:

Product — MicroService:

  • Let's start with the Maven dependencies in the POM file.
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Add the Resilience4j circuit breaker configuration to application.yaml.

  • We configure the circuit breaker properties in the application.yaml file as shown below.
  • We maintain a default configuration. We can have multiple circuit breaker instances, and any configuration can be overridden under instances.
rating:
  service:
    url: http://localhost:8081/v1/ratings

resilience4j.circuitbreaker:
  configs:
    default:
      slidingWindowType: COUNT_BASED
      slidingWindowSize: 100
      permittedNumberOfCallsInHalfOpenState: 10
      waitDurationInOpenState: 10s
      failureRateThreshold: 60
      registerHealthIndicator: true
  instances:
    ratingService:
      baseConfig: default

RatingServiceImpl:
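The original code listing is embedded in the source repository; the circuit-breaker-specific change typically looks like the sketch below. The `@CircuitBreaker` annotation and the instance name `ratingService` correspond to the Resilience4j setup and the YAML above, while the method names, the `ProductRatingDto` type, and the `ratingClient` field are assumptions for illustration.

```java
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.stereotype.Service;

// Sketch only: ProductRatingDto, RatingClient, and the method names are hypothetical.
@Service
public class RatingServiceImpl implements RatingService {

    private final RatingClient ratingClient; // wraps the HTTP call to rating-service

    public RatingServiceImpl(RatingClient ratingClient) {
        this.ratingClient = ratingClient;
    }

    // "ratingService" matches the instance name configured in application.yaml.
    @CircuitBreaker(name = "ratingService", fallbackMethod = "getDefaultRatings")
    public ProductRatingDto getRatings(int productId) {
        return ratingClient.getRatings(productId); // remote call with its own timeout
    }

    // Fallback signature: same parameters as the protected method, plus a Throwable.
    private ProductRatingDto getDefaultRatings(int productId, Throwable t) {
        return ProductRatingDto.empty(); // default response when the circuit is open or the call fails
    }
}
```

When the circuit is OPEN, Resilience4j short-circuits the call and invokes the fallback directly, which is why the product-service responds in milliseconds instead of waiting out the timeout.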

Run the services and send many concurrent requests to the product-service to observe the behavior.

Performance Test Results:

As part of the performance testing, we consider the worst-case scenario in which the rating-service is unavailable most of the time.

Circuit Breaker:

  • I used JMeter to simulate 15 concurrent users for 2 minutes, sending multiple requests to the product-service.
  • The above results are great. First of all, the error rate is 0%.
  • There are cases where we do hit the rating-service and wait for the response (the 90th percentile is 3 seconds).
  • When the service is unavailable or the failure rate is above the threshold, we do not wait for 3 seconds. Check the median, which is 2 milliseconds: more than 50% of the time, we never hit the rating-service.
  • Throughput is 18.5/sec, which is great!

Timeout Pattern:

If you are curious what would have happened if we had used only the timeout pattern for the same test:

  • Compare the average, median, throughput, etc. against the circuit breaker results above.
  • As we know, the timeout pattern waits for 3 seconds. Even though the error rate is 0%, throughput is reduced by almost 75% because of the poor response time.

Retry Pattern:

Let's test the retry pattern as well.

  • As expected, the Retry pattern retries the request every time the response is not received within the timeout, so the response time doubles compared to the timeout pattern.
  • When the service is unavailable, the Retry pattern makes the problem worse, reducing throughput by 88%.

Summary:

Among all the patterns we have discussed so far, the Circuit Breaker pattern seems to work best. However, if the wait duration is long and the dependent service becomes available again in the meantime, we will still not send any requests while the circuit is OPEN. So we need to choose the window size and wait duration carefully.

There are other design patterns that could handle this better along with the Circuit Breaker pattern. Please take a look at these articles.

Happy learning 🙂


Vinoth Selvaraj

Principal Software Engineer — passionate about software architectural design, microservices.