Why is Agent.run() calling the model four times, and why does num_history_responses=2 still include more history?

Hi Agno team and community,

I’m using the Agent class and running into a few issues with one Agent.run() call for my query (“how many refunds i did”). I’d really appreciate your help with these:

  1. Why does it call the model four times (token counts: 4,861, 3,700, 2,601, 2,665) for one query?
  2. I set num_history_responses=2, but it’s adding 4 messages from history—why isn’t it capping at 2?
  3. I’m using ~9,600 tokens total—any tips on reducing this to fewer tokens (maybe ~1,000)?
  4. I tried adding table schemas to the knowledge base instead of the prompt, but the accuracy got worse—any advice?

Here’s my setup and logs with some extra debugging. Thanks so much for any insights or suggestions!

Here’s my config:

@dataclass
class Config:
    model: str =  gpt-4o-mini
    retries: int =3
    stream: bool = True
    num_history_responses: int = 2
    add_history_to_messages: bool = True
    read_chat_history: bool = True  
    read_tool_call_history: bool = False 
    add_references: bool = False
    num_documents: int =2
    debug_mode: bool = True 
    create_user_memories: bool = True  
    update_user_memories_after_run: bool = False  
    create_session_summary: bool = False  
    update_session_summary_after_run: bool = False
    add_datetime_to_instructions = True

Here’s my logs:

https://pastebin.com/embed_iframe/nn6gVJZN

Hi @ank
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 24 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon! :smile: