
Why you need circuit breaker patterns in your LLM-powered application


I have worked in AI for about six years, since well before OpenAI released ChatGPT, integrating LLMs and ML models into user-friendly software applications. My lead at the time held the tech-lead title but had limited software engineering experience; his background was academic, with graduate-level coursework in software engineering and professional experience in research. Had he been clearer about his own strengths and weaknesses, my life would have been much easier. In any case, I am no longer under his management. This article uses that experience to introduce one resilience pattern: the circuit breaker pattern in AI application flows.

One day, after observing recurring failures in our service, I told him we were receiving many 429 (rate limit) errors at one of our LLM agent steps. I suggested asking OpenAI support to increase our rate limit and, in the meantime, using a queue to slow our API usage. The application was natively asynchronous anyway, so we should not have started with a synchronous approach.

He replied, “Just do a retry.”

I explained that retrying would not help in this case because we were hitting a rate limit; plain retries mainly help with transient 500-series server errors (we did see 500 errors too, but they were only about 1–2% of all errors). He insisted on his position and essentially dismissed my suggestion. I tried to explain again, but he did not seem to understand what a 429 error is or how it differs from a 500.

I reluctantly implemented retry logic that retried on any OpenAI API failure, whether a 500 or a 429. As expected, that did not solve the problem. I raised the issue again and eventually implemented a simple queue to smooth out peak requests, which resolved it.

Communicating and collaborating with him was very difficult. I explained to him and the team many times that retries do not solve rate-limit issues. This was just one example; his lack of industry experience drove many engineers to leave the company. I no longer work with him.

The point of this story is to introduce the circuit breaker pattern.

Circuit breaker patterns are especially important in AI / LLM-powered applications because these systems depend on external, probabilistic, and resource-intensive services. Without circuit breakers, failures can cascade quickly and take down your entire product.


1. LLMs Are External Dependencies (and They Fail)

Most AI apps rely on third-party APIs like the OpenAI API or other model providers.

These can fail due to:

  • Rate limits
  • Temporary outages
  • Network latency
  • Internal model errors
  • Regional incidents

Without a circuit breaker, your app keeps retrying → amplifying the failure.

Circuit breaker benefit:
Stops repeated calls once failure thresholds are hit and gives the service time to recover.
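The core of the pattern is small. Here is a minimal synchronous sketch (the `CircuitBreaker` name and the thresholds are illustrative, not a specific library's API): after a run of consecutive failures it opens and rejects calls outright until a cooldown passes.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and rejects
    calls for `reset_timeout` seconds so the service can recover."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: close and try again (section 6 covers
            # the gentler half-open variant of this step).
            self._opened_at = None
            self._failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success resets the count
        return result
```

Wrap every provider call in `breaker.call(...)`; once the breaker opens, callers get an immediate `CircuitOpenError` they can handle, instead of another doomed request.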


2. AI Calls Are Expensive (Failures Cost Money)

LLM requests cost:

  • Money (tokens)
  • Time (latency)
  • Compute (serialization, retries)

If a downstream model is unhealthy and you keep retrying:

  • You burn budget
  • You increase latency
  • You overload your own infrastructure

Circuit breaker benefit:
Fails fast instead of wasting tokens and compute on doomed requests.
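The budget impact is easy to estimate with back-of-the-envelope arithmetic. A tiny helper (the numbers in the usage below are entirely hypothetical):

```python
def wasted_cost_usd(failing_requests, retries_per_request,
                    tokens_per_call, usd_per_1k_tokens):
    """Rough cost of blind retries against a dead model: every failing
    request pays for the original call plus each retry."""
    calls = failing_requests * (1 + retries_per_request)
    return calls * tokens_per_call / 1000 * usd_per_1k_tokens
```

For example, 10,000 failing requests with 3 retries each, at 2,000 tokens per call and $0.01 per 1K tokens, burns roughly $800 on answers that never arrive; a breaker that fails fast spends almost none of that.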


3. Latency Spikes Break User Experience

LLM calls already have higher variance than traditional APIs.

When a model degrades:

  • Responses can jump from 500ms → 30s
  • Your UI hangs
  • Async queues back up
  • Threads and workers get exhausted

Circuit breaker benefit:
Short-circuits slow calls and allows you to:

  • Return cached answers
  • Use a smaller/faster model
  • Respond with a graceful fallback
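Those fallbacks compose into a simple wrapper. A sketch, with all names hypothetical (`circuit_is_open` stands in for whichever breaker you use): try the primary model when its circuit is closed, then a cached answer, then a smaller fallback model.

```python
def answer_with_fallback(prompt, primary, fallback, cache, circuit_is_open):
    """Serve the best answer available: the primary model when its
    circuit is closed, else a cached answer, else a cheaper model."""
    if not circuit_is_open():
        try:
            answer = primary(prompt)
            cache[prompt] = answer  # warm the cache for future outages
            return answer
        except Exception:
            pass  # fall through to the degraded paths
    if prompt in cache:
        return cache[prompt]
    return fallback(prompt)
```

The user sees a slightly worse answer in a few hundred milliseconds instead of a spinner for thirty seconds.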

4. Prevent Cascading Failures in AI Pipelines

AI apps are often multi-step pipelines:

  • Retrieval (vector DB)
  • Prompt assembly
  • LLM inference
  • Tool calls
  • Post-processing

If one step fails and retries blindly:

  • Queues explode
  • Downstream services overload
  • Entire pipeline collapses

Circuit breaker benefit:
Contains failure to one component instead of letting it spread.
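Containment usually means one breaker per stage rather than one global one. A minimal sketch (the `StageBreakers` name and threshold are illustrative): each stage keeps its own consecutive-failure count, so a vector-DB outage trips only "retrieval" and leaves "inference" and post-processing untouched.

```python
class StageBreakers:
    """One consecutive-failure counter per pipeline stage, so tripping
    one stage's breaker never blocks the others."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._failures = {}

    def allow(self, stage):
        """True while this stage is under its failure threshold."""
        return self._failures.get(stage, 0) < self.threshold

    def record(self, stage, ok):
        """Success resets the stage's count; failure increments it."""
        self._failures[stage] = 0 if ok else self._failures.get(stage, 0) + 1
```

The pipeline checks `breakers.allow("retrieval")` before each stage and can skip or substitute just that step.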


5. Enables Smart Degradation (Critical for AI UX)

AI apps don’t need to be all-or-nothing.

With circuit breakers, you can:

  • Switch to a cheaper or local model
  • Reduce context size
  • Disable tools or agents
  • Return partial or approximate answers
  • Fall back to rules or templates

Circuit breaker benefit:
Your app still works — just with reduced intelligence instead of total failure.
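One way to wire that up is a degradation ladder: try strategies from smartest to dumbest and return the first that succeeds. A sketch under stated assumptions (every strategy is a callable that raises on failure; names are hypothetical):

```python
def degrade(prompt, strategies,
            last_resort="Sorry, I can't answer right now."):
    """Walk an ordered list of (name, strategy) pairs and return
    (name, answer) from the first strategy that succeeds."""
    for name, strategy in strategies:
        try:
            return name, strategy(prompt)
        except Exception:
            continue  # this rung failed; step down the ladder
    return "static", last_resort
```

Each rung can be "full model with tools", "smaller model, trimmed context", "template answer", and so on; the app only hits the static message when everything above it is down.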


6. Protects You From Thundering Herd Problems

When an AI provider recovers:

  • Thousands of queued requests retry at once
  • Rate limits trigger again
  • The outage drags on

Circuit breaker benefit:
Controls retry behavior and re-opens gradually (half-open state).
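The half-open state is a small state machine. A minimal sketch (names and timings illustrative): after the cooldown, exactly one probe request is let through; if it succeeds the circuit closes, if it fails the circuit re-opens, so a recovering provider sees a trickle of traffic instead of the whole backed-up queue.

```python
import time


class HalfOpenBreaker:
    """CLOSED -> OPEN after `threshold` failures; after `cooldown`
    seconds one probe is allowed (HALF_OPEN). Probe success closes
    the circuit; probe failure re-opens it."""

    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self.state = self.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def allow(self):
        """Should the caller attempt a request right now?"""
        if self.state == self.OPEN:
            if self._clock() - self._opened_at >= self.cooldown:
                self.state = self.HALF_OPEN
                return True          # exactly one probe gets through
            return False
        if self.state == self.HALF_OPEN:
            return False             # probe already in flight
        return True

    def record(self, ok):
        """Report the outcome of an attempted request."""
        if ok:
            self.state = self.CLOSED
            self._failures = 0
            return
        self._failures += 1
        if self.state == self.HALF_OPEN or self._failures >= self.threshold:
            self.state = self.OPEN
            self._opened_at = self._clock()
            self._failures = 0
```

The injectable `clock` is just there to make the timing testable; production code can keep the `time.monotonic` default.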


In Short

You need circuit breaker patterns in AI/LLM apps because:

LLMs are slow, expensive, external, rate-limited, and probabilistic — and failure is normal.

Circuit breakers turn those failures into controlled, predictable behavior instead of outages.
