How to control LLM calls with a rate limiter

JoeMukherjee · September 13, 2025, 12:02pm

How can I use a process queue and a rate limiter with Agno? Say I have a rate limit of 3, and one agent makes 2 LLM calls while another makes 1. With a default rate limiter, a third agent will then fail or get exponentially backed off, so multiple API calls are attempted and wasted.

Monali · September 15, 2025, 6:52am

Hey @JoeMukherjee, thank you for your question.

Agno doesn’t have a built-in global queue for this yet, so the recommended approach is to add a shared process queue + rate limiter around your agents/teams.

Instead of each agent calling the LLM directly, they submit requests into the queue. A single worker then pulls tasks out and executes them under the shared rate limit (e.g., 3 calls/sec).

So the fix is to introduce a queue that enforces the global limit, rather than letting each agent back off on its own.
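A minimal sketch of that pattern in plain asyncio, in case it helps. This is not an Agno API: `RateLimitedQueue` and `fake_llm_call` are hypothetical names made up for illustration, and the sliding-window limit is just one reasonable way to enforce "N calls per period". In practice you would pass in whatever coroutine actually runs your agent or model call.

```python
import asyncio
import time
from collections import deque


class RateLimitedQueue:
    """Hypothetical helper: one worker runs queued calls under a shared limit."""

    def __init__(self, max_calls: int, period: float = 1.0):
        self.max_calls = max_calls      # e.g. 3 calls ...
        self.period = period            # ... per 1 second
        self._queue = asyncio.Queue()
        self._starts = deque()          # start times of recent calls
        self._worker = None

    async def submit(self, fn, *args):
        """Enqueue an async call and wait for its result."""
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((fn, args, fut))
        return await fut

    async def _wait_for_slot(self):
        # Sliding window: allow at most max_calls starts in any `period` seconds.
        while True:
            now = time.monotonic()
            while self._starts and now - self._starts[0] >= self.period:
                self._starts.popleft()
            if len(self._starts) < self.max_calls:
                self._starts.append(now)
                return
            await asyncio.sleep(self.period - (now - self._starts[0]))

    async def _run(self):
        # Single worker: pulls tasks in order and waits instead of failing.
        while True:
            fn, args, fut = await self._queue.get()
            await self._wait_for_slot()
            try:
                result = await fn(*args)
            except Exception as exc:
                if not fut.done():
                    fut.set_exception(exc)
            else:
                if not fut.done():
                    fut.set_result(result)


async def fake_llm_call(agent_name: str, prompt: str) -> str:
    # Stand-in for the real agent/model call (assumption, not Agno code).
    await asyncio.sleep(0.01)
    return f"{agent_name}: {prompt}"


async def main():
    limiter = RateLimitedQueue(max_calls=3, period=1.0)
    return await asyncio.gather(
        limiter.submit(fake_llm_call, "agent_a", "first"),
        limiter.submit(fake_llm_call, "agent_a", "second"),
        limiter.submit(fake_llm_call, "agent_b", "third"),
        limiter.submit(fake_llm_call, "agent_c", "fourth"),  # waits for a free slot
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the design is that the fourth call is delayed inside the queue rather than sent and rejected, so no request is spent on a call that would hit the limit. Because there is only one worker, calls also run one at a time; if you need concurrency within the limit, you would have to extend this.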

Let me know if you have any follow-up questions.
