Implementing Retry Strategies in Google ADK: Handling 429 Errors Gracefully
When I first deployed my multi-agent system to production, I quickly ran into a frustrating problem: HTTP 429 "Too Many Requests" errors that would break user conversations mid-flow. At first, I thought these were rare edge cases, but as usage grew, they became a significant reliability issue that I needed to solve.
The Problem: Production Reality Hits
Google AI APIs implement rate limiting to ensure fair usage. When my application made too many requests in a short period, I'd get HTTP 429 errors. Without a retry mechanism, these errors would propagate to my users, causing failed operations and a terrible user experience. I'd see conversations just stop working, and users would have to reprompt the agent or restart their sessions - not exactly the seamless AI experience I was aiming for.
The Solution: HttpRetryOptions
After some research, I discovered that Google ADK supports HttpRetryOptions through the underlying GenAI client as of ADK v1.9.0. This was exactly what I needed: configurable retry logic with exponential backoff. Here's how I implemented it in my system:
from google.genai import types
from google.adk.models import Gemini

# Configure retry options - this solved my production issues!
HTTP_RETRY_OPTIONS = types.HttpRetryOptions(
    attempts=5,            # Number of retry attempts
    initial_delay=1.0,     # Initial delay before first retry in seconds
    max_delay=120.0,       # Maximum delay in seconds
    exp_base=2,            # Exponential backoff base
    jitter=0.1,            # Add 10% jitter to prevent thundering herd
    http_status_codes=[408, 429, 500, 502, 503, 504],  # Status codes to retry
)

# Apply to your model instances
MODEL = Gemini(
    model="gemini-2.5-flash",
    retry_options=HTTP_RETRY_OPTIONS,
)
This simple change dramatically improved my system's reliability. Instead of failing on rate limits, my agents would automatically retry with exponential backoff, giving the API time to recover while still providing a good user experience.
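For context, the configured model is what gets passed to an ADK agent. The sketch below shows that wiring - the agent name and instruction are placeholders for illustration, not the real production setup:

from google.adk.agents import Agent

# Placeholder agent - name and instruction are illustrative only.
# Every LLM call this agent makes goes through MODEL and therefore
# inherits the retry behavior configured above.
root_agent = Agent(
    name="sourcing_orchestrator",
    model=MODEL,
    instruction="Coordinate the sourcing workflow and delegate to sub-agents.",
)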
Key Parameters Explained
attempts
This controls how many retry attempts are made after the initial request fails. I set it to 5, which means the system tries up to 6 times total (1 initial + 5 retries). In my experience, this handles most transient issues without making users wait too long.
initial_delay and max_delay
- initial_delay: Starting delay before the first retry (in seconds)
- max_delay: Upper bound for the delay, to prevent excessive waiting times (in seconds)
A 1.0-second initial delay with a 120-second maximum (the defaults) works well here - it's fast enough not to impact user experience significantly, but gives the API enough time to recover.
exp_base
The exponential backoff base. With exp_base=2, delays follow: 1s, 2s, 4s, 8s, 16s (capped at max_delay). This exponential growth is crucial because it gives the API progressively more time to recover while preventing rapid-fire retries.
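To make that schedule concrete, here's a small sketch of how initial_delay, exp_base, and max_delay interact - my own illustration of the arithmetic, not the GenAI client's internal code:

# Illustration only: roughly how retry delays grow before jitter is applied.
# The actual computation happens inside the GenAI client.
initial_delay = 1.0
exp_base = 2
max_delay = 120.0

for retry in range(5):
    delay = min(initial_delay * exp_base ** retry, max_delay)
    print(f"retry {retry + 1}: wait ~{delay:g}s")
# retry 1: ~1s, retry 2: ~2s, retry 3: ~4s, retry 4: ~8s, retry 5: ~16s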
jitter
A float between 0 and 1 that adds randomness to the delays. With the default of jitter=0.1, up to 10% of random variation is added to each retry delay, which prevents "thundering herd" problems.
What's the thundering herd problem? Imagine all your agents hit a rate limit at the same time. Without jitter, they'd all retry after exactly 1 second, then 2 seconds, then 4 seconds - creating synchronized waves of requests that can overwhelm the API again. With jitter, they retry at slightly different times (1.05s, 0.95s, 1.02s, etc.), spreading out the load much more evenly.
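Here's a rough sketch of what jitter does to a single delay - a mental model of the behavior, not the client's exact implementation:

import random

# Sketch: spread out otherwise identical delays by up to +/- 10%.
# The real jitter logic lives inside the GenAI client.
def jittered(delay: float, jitter: float = 0.1) -> float:
    return delay * (1 + random.uniform(-jitter, jitter))

# Three agents that would all have waited exactly 1 second now wait
# slightly different amounts, e.g. [1.04, 0.97, 1.02].
print([round(jittered(1.0), 2) for _ in range(3)])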
http_status_codes
Specifies which HTTP status codes should trigger retries. I chose the following (a short sketch after the list shows the retry decision they encode):
- 408: Request timeout
- 429: Rate limit exceeded (this was my main problem)
- 500, 502, 503, 504: Server-side errors
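Here's that sketch - illustrative only, not the client's internal logic:

# Sketch of the retry filter implied by http_status_codes.
RETRYABLE_STATUS_CODES = {408, 429, 500, 502, 503, 504}

def should_retry(status_code: int) -> bool:
    # Retry only transient failures; other 4xx codes mean the request itself is bad.
    return status_code in RETRYABLE_STATUS_CODES

print(should_retry(429))  # True  - rate limited, worth retrying
print(should_retry(400))  # False - bad request, retrying won't help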
Best Practices I Learned the Hard Way
Configure for Your Use Case
Adjust attempts and delays based on your application's tolerance for latency vs. failure rates. For my sourcing agents, I prioritized reliability over speed since users expect comprehensive results.
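As an illustration, a latency-sensitive chat flow and a patient background job might get different settings. The values below are made-up examples to show the trade-off, not recommendations:

from google.genai import types

# Hypothetical profiles - tune to your own latency/failure tolerance.
CHAT_RETRY = types.HttpRetryOptions(
    attempts=2,          # fail fast so the user isn't left waiting
    initial_delay=0.5,
    max_delay=5.0,
    exp_base=2,
    jitter=0.1,
    http_status_codes=[429, 503],
)

BATCH_RETRY = types.HttpRetryOptions(
    attempts=7,          # background work can afford to wait out rate limits
    initial_delay=2.0,
    max_delay=120.0,
    exp_base=2,
    jitter=0.1,
    http_status_codes=[408, 429, 500, 502, 503, 504],
)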
Don't Retry Other Client Errors
I learned to avoid retrying 4xx errors (except 429) since they typically indicate real issues with my requests rather than transient problems. This prevents wasting retries on requests that will never succeed.
Use Across All Models
Apply a consistent retry configuration across all Gemini model instances so that the root orchestrator and the specialized sub-agents all benefit from the same robust error handling. No more inconsistent behavior between different agents!
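One way to enforce this is with a small helper like the sketch below; make_model is a hypothetical factory for illustrating the pattern, not part of ADK's API:

from google.adk.models import Gemini

# Hypothetical factory: every agent's model goes through one function,
# so the retry configuration can never drift between agents.
def make_model(model_name: str = "gemini-2.5-flash") -> Gemini:
    return Gemini(model=model_name, retry_options=HTTP_RETRY_OPTIONS)

ORCHESTRATOR_MODEL = make_model()
RESEARCH_MODEL = make_model()   # sub-agents reuse the same retry behavior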
TL;DR: My Key Takeaway
The retry_options feature, available since ADK v1.9.0, transformed transient API failures from user-facing issues into automatically resolved background events. By implementing proper retry strategies with exponential backoff, my ADK application now gracefully handles rate limits and network issues without compromising user experience.
This single change had one of the biggest impacts on my production system's reliability - highly recommended for anyone running ADK agents in production!
This is part 3 of my series reflecting on building production AI agents with Google ADK and AWS. Check out the other posts in the series:
