Why is Agent.run() calling the model four times, and why does num_history_responses=2 still include more history?

ank · March 13, 2025, 11:31am

Hi Agno team and community,

I’m using the Agent class and running into a few issues with one Agent.run() call for my query (“how many refunds i did”). I’d really appreciate your help with these:

Why does it call the model four times (token counts: 4,861, 3,700, 2,601, 2,665) for one query?
I set num_history_responses=2, but it’s adding 4 messages from history—why isn’t it capping at 2?
I’m using ~9,600 tokens total—any tips on reducing this to fewer tokens (maybe ~1,000)?
I tried adding table schemas to the knowledge base instead of the prompt, but the accuracy got worse—any advice?

Here’s my setup and logs with some extra debugging. Thanks so much for any insights or suggestions!

Here’s my config:

@dataclass
class Config:
    model: str =  gpt-4o-mini
    retries: int =3
    stream: bool = True
    num_history_responses: int = 2
    add_history_to_messages: bool = True
    read_chat_history: bool = True  
    read_tool_call_history: bool = False 
    add_references: bool = False
    num_documents: int =2
    debug_mode: bool = True 
    create_user_memories: bool = True  
    update_user_memories_after_run: bool = False  
    create_session_summary: bool = False  
    update_session_summary_after_run: bool = False
    add_datetime_to_instructions = True

Here’s my logs:

https://pastebin.com/embed_iframe/nn6gVJZN

Monali · March 17, 2025, 3:42am

Hi @ank
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 24 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon!

yash · March 20, 2025, 4:47am

Hey @ank !

Whenever the model requests a tool call, a separate call to the api is made with the tool call result. So, for a message with a single tool call, you will observe a total of 2 api calls. One initial call and a following call with the tool call result.
num_history_responses param captures the number of runs, not the number of messages. The reason for this implementation is that messages like tool calls and tool call results are tired together and the api will throw an error if only 1 of them is sent with the request. So, we maintain a pair of messages by linking them via a run instead of single messages.
I noticed that your system message is quite verbose, looks like that is the main reason for the high token count
Getting SQL right with an Agent is a tricky use case, in my experience, adding the schema to the prompts results in the best outcome. Take a look at this example. It uses the knowledge base to get the required context for a table.

Please let us know if you have any questions

Topic		Replies	Views
Tool call limit parameter for agent and team General tool-call , feature-requests	3	62	April 30, 2025
Tool calls returned as messages General tool-call	4	70	February 25, 2025
How to access tokens of the specific prompt General agent , knowledge	3	31	May 29, 2025
Using chat history instead of Run history General bug	5	50	June 26, 2025
Issue when add_history_to_messages=True General agent , memory	5	102	February 18, 2025

Why is Agent.run() calling the model four times, and why does num_history_responses=2 still include more history?

Related topics