I have worked in AI for about six years, since well before OpenAI released ChatGPT, integrating LLMs and ML models into user-friendly software applications. My then-lead held the tech-lead title, but his background was academic rather than industrial: graduate-level training in software engineering and professional experience in research, with little hands-on engineering experience. If he had been clearer about his strengths and weaknesses, my life would have been much easier. In any case, I am no longer under his management. This article uses that experience to discuss one pattern: the circuit breaker pattern in AI application flows.
One day, after observing recurring failures in our service, I told him we were receiving many 429 (rate limit) errors at one of our LLM agent steps. I suggested asking OpenAI support to raise our rate limit and, in the meantime, using a queue to smooth out our API usage. Our application is natively asynchronous, so we should never have started with a synchronous approach in the first place.
He replied, “Just do a retry.”
I explained that retrying would not help here: a 429 means we were over our rate limit, so immediate retries just keep us over it. Retries mainly help with transient 500-series server errors, and those made up only about 1–2% of our errors. He insisted on his position and essentially dismissed my suggestion. I tried to explain again, but he did not seem to understand what a 429 error is or how it differs from a 500.
I reluctantly implemented retry logic that retried on any OpenAI API failure, whether 500 or 429. As expected, that did not solve the problem. I raised the issue again and eventually implemented a simple queue to smooth out peak request volume, which resolved it.
Communicating and collaborating with him was very difficult. I explained many times, to him and to the team, that retries do not solve rate-limit issues. This was just one example; his lack of industry experience drove many engineers to leave the company. I no longer work with him.
The point of this story is to introduce the circuit breaker pattern.
Circuit breaker patterns are especially important in AI/LLM-powered applications because these systems depend on external, probabilistic, and resource-intensive services. Without circuit breakers, failures can cascade quickly and take down your entire product.
1. LLMs Are External Dependencies (and They Fail)
Most AI apps rely on third-party APIs like the OpenAI API or other model providers.
These can fail due to:
- Rate limits
- Temporary outages
- Network latency
- Internal model errors
- Regional incidents
Without a circuit breaker, your app keeps retrying → amplifying the failure.
Circuit breaker benefit:
Stops repeated calls once failure thresholds are hit and gives the service time to recover.
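The core mechanism fits in a small class. The sketch below is illustrative, not a production library (real implementations add locking, metrics, and per-error handling); `CircuitOpenError` and the thresholds are names invented for this example.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are short-circuited."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Fail fast: don't even attempt the upstream call.
                raise CircuitOpenError("upstream marked unhealthy")
            self.opened_at = None  # cooldown elapsed: allow a trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip open
            raise
        self.failures = 0  # any success resets the count
        return result
```

Once the failure threshold is hit, callers get an immediate `CircuitOpenError` instead of hammering the provider, which is what gives the service time to recover.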
2. AI Calls Are Expensive (Failures Cost Money)
LLM requests cost:
- Money (tokens)
- Time (latency)
- Compute (serialization, retries)
If a downstream model is unhealthy and you keep retrying:
- You burn budget
- You increase latency
- You overload your own infrastructure
Circuit breaker benefit:
Fails fast instead of wasting tokens and compute on doomed requests.
3. Latency Spikes Break User Experience
LLM calls already have higher variance than traditional APIs.
When a model degrades:
- Responses can jump from 500 ms to 30 s
- Your UI hangs
- Async queues back up
- Threads and workers get exhausted
Circuit breaker benefit:
Short-circuits slow calls and allows you to:
- Return cached answers
- Use a smaller/faster model
- Respond with a graceful fallback
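Those fallbacks can be sketched as a simple decision ladder. Everything here is illustrative: the cache, the model functions, and the error type are stand-ins, not a specific provider's API.

```python
# Sketch: route around an unhealthy primary model instead of hanging the UI.

cache = {"capital of France?": "Paris"}  # pretend answer cache

def primary_model(prompt):
    raise TimeoutError("primary model degraded")  # simulate an outage

def small_fast_model(prompt):
    return f"[fast model] short answer to: {prompt}"

def answer(prompt):
    try:
        return primary_model(prompt)
    except TimeoutError:
        # Short-circuited or failed: degrade instead of blocking.
        if prompt in cache:
            return cache[prompt]           # 1) cached answer
        return small_fast_model(prompt)    # 2) smaller/faster model

print(answer("capital of France?"))  # Paris
print(answer("something novel"))     # [fast model] short answer to: something novel
```

In a real system the breaker would raise before the slow call is even attempted, so the user sees a degraded answer in milliseconds rather than a 30-second hang.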
4. Prevent Cascading Failures in AI Pipelines
AI apps are often multi-step pipelines:
- Retrieval (vector DB)
- Prompt assembly
- LLM inference
- Tool calls
- Post-processing
If one step fails and retries blindly:
- Queues explode
- Downstream services overload
- Entire pipeline collapses
Circuit breaker benefit:
Contains failure to one component instead of letting it spread.
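One way to get that containment is a breaker per pipeline stage, so a vector-DB outage degrades retrieval without taking down inference. The stage names, threshold, and fallbacks below are illustrative.

```python
# Sketch: per-stage breaker state so one unhealthy dependency is contained.

FAILURE_THRESHOLD = 3
failures = {"retrieval": 0, "llm": 0, "tools": 0}

def guarded(stage, fn, fallback):
    """Run one stage; after repeated failures, skip it and use the fallback."""
    if failures[stage] >= FAILURE_THRESHOLD:
        return fallback  # stage tripped: don't retry, don't pile up queues
    try:
        result = fn()
        failures[stage] = 0
        return result
    except Exception:
        failures[stage] += 1
        return fallback

def broken_retrieval():
    raise ConnectionError("vector DB down")

# The vector DB is down, but the pipeline still produces an answer:
docs = guarded("retrieval", broken_retrieval, fallback=[])
prompt = f"Answer without retrieved context ({len(docs)} docs)."
print(prompt)
```

The LLM and tool stages never see the retrieval failure; they just receive an empty context, which is a product decision rather than an outage.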
5. Enables Smart Degradation (Critical for AI UX)
AI apps don’t need to be all-or-nothing.
With circuit breakers, you can:
- Switch to a cheaper or local model
- Reduce context size
- Disable tools or agents
- Return partial or approximate answers
- Fall back to rules or templates
Circuit breaker benefit:
Your app still works — just with reduced intelligence instead of total failure.
6. Protects You From Thundering Herd Problems
When an AI provider recovers:
- Thousands of queued requests retry at once
- Rate limits trigger again
- The outage is prolonged
Circuit breaker benefit:
Controls retry behavior and re-opens gradually (half-open state).
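The half-open state can be sketched as a small state machine: open for a cooldown period, then let exactly one probe request through, and only close again if that probe succeeds. This is an illustrative sketch, not a library API.

```python
import time

class HalfOpenBreaker:
    """closed -> open on repeated failure; after a cooldown, half-open:
    admit a single probe, and only close again if it succeeds."""

    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self):
        if self.state == "closed":
            return True
        if self.state == "half_open":
            return False  # a probe is already in flight: hold the herd back
        # state == "open"
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.state = "half_open"
            return True  # let exactly one probe through
        return False

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

Because only one probe is admitted after the cooldown, a recovering provider sees a trickle of traffic instead of the entire backlog at once.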
In Short
You need circuit breaker patterns in AI/LLM apps because:
LLMs are slow, expensive, external, rate-limited, and probabilistic — and failure is normal.
Circuit breakers turn those failures into controlled, predictable behavior instead of outages.
