Hey Agno team,
I’m running into unexpectedly high token usage when enabling ThinkingTools, even on relatively simple agent runs. For example, I have an RFQ agent that looks up the relevant spreadsheet and calculates the correct quote:
- RFQ Agent Without ThinkingTools – 3,517 tokens
- RFQ Agent With ThinkingTools – 15,974 tokens
- I also had a single agent run consume ~70k tokens, which is dangerously close to the 128k limit for what should be a lightweight RFQ reply task.
I’m trying to design an agentic RAG loop (rough sketch below), where the agent:
- Makes a custom retrieval tool call: fetch_file_chunks(query="JFK to LAX")
- Reflects on whether it has enough information (via ThinkingTools)
- If not, calls the tool again: fetch_file_chunks(query="fuel surcharges for JFK to LAX")
- Repeats until it has all the info needed, then calculates the quote
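In plain Python, the behavior I’m after looks roughly like this (conceptual sketch only; the helper functions below are placeholders for decisions the LLM itself makes, not real Agno or project APIs):

# Conceptual sketch only: in practice the LLM drives this loop via tool calls.
def fetch_file_chunks(query: str) -> list[str]:
    # placeholder for my real Supabase hybrid-search tool
    return [f"chunk matching {query!r}"]

def have_enough_info(chunks: list[str]) -> bool:
    # placeholder for the agent reflecting via ThinkingTools
    return len(chunks) >= 2

def refine_query(chunks: list[str]) -> str:
    # placeholder for the LLM narrowing its search
    return "fuel surcharges for JFK to LAX"

chunks: list[str] = []
query = "JFK to LAX"
while not have_enough_info(chunks):
    chunks += fetch_file_chunks(query)
    query = refine_query(chunks)
# ...then calculate the quote from the collected chunks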
To do this, I’m using a custom fetch_file_chunks tool (sketched below) instead of Agno’s built-in knowledge base, because we store and embed our data in a Postgres table with a pgvector vector_embedding column. So far this works, but once ThinkingTools is enabled, the token usage jumps massively and unpredictably.
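The tool looks roughly like this (simplified sketch; hybrid_search_chunks is just an illustrative name for our own Supabase RPC, and the client setup is abbreviated):

import os
from supabase import create_client
from agno.tools import tool

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

@tool()
def fetch_file_chunks(query: str) -> str:
    """Hybrid keyword + vector search over public.source_file_chunks."""
    response = supabase.rpc(
        "hybrid_search_chunks",  # illustrative name for our custom RPC
        {"query_text": query, "match_count": 10},
    ).execute()
    # Concatenate the matched chunk contents into a single string for the model
    return "\n\n".join(row["content"] for row in response.data)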
A few questions:
- Is ThinkingTools appending too much intermediate context at each step (e.g. past tool call results)?
- What’s the best practice to implement this type of loop without blowing the token budget?
- Would love any advice for making this kind of agent work more efficiently using Agno.
- Any plans on supporting custom RAG pipelines? I’d love to take advantage of Agno’s hybrid search capabilities, but this is blocking me since my pgvector table in Supabase doesn’t follow the column names Agno hardcodes (e.g. the name column).
Appreciate the help — this is such a powerful framework and I’m excited to push it further!
Agent Setup (simplified version):
from textwrap import dedent
import random
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools import tool
from agno.tools.reasoning import ReasoningTools
from agno.tools.thinking import ThinkingTools
import json
from pprint import pprint
@tool()
def fetch_chunks(query: str) -> str:
    # redacted -- it's a basic hybrid search RPC function on supabase
    return result
def main():
    reasoning_agent = Agent(
        model=OpenAIChat(id="gpt-4o"),
        tools=[
            fetch_chunks,
            # uncommenting this line makes the token usage jump
            # ThinkingTools(add_instructions=True),
        ],
        instructions=dedent(
            """\
            You are a professional specializing in RFQs.
            Your task is to draft a professional, concise, and context-aware response to the latest inbound email(s), keeping the tone friendly.
            You will be given the following information:
            1. System Prompt: This is the company's system prompt. Use this to understand the company's policies and procedures.
            2. Email Details: This is the latest inbound email(s) and any attachments.
            3. Email Thread History: This is the history of the email thread. Use this to understand the context of the conversation.
            The fetch_chunks tool can perform keyword/hybrid search on the source file chunks for you. Source file chunks are the company's knowledge base about the different rates and services.
            Only use information from the source file chunks when calculating rates for quotes or answering questions. Do not make up information; if unsure, make your best guess based on the source file chunks and include your assumption in the final quote email.
            When calling the fetch_chunks tool, provide a query that is relevant to the email details and email thread history.
            Here are examples of the tool calls:
            - Tool call: fetch_chunks(query="FRA")
            - Tool call: fetch_chunks(query="HKG")
            After each tool call, read the result and check whether you have enough information to calculate the rate. If not, make another tool call with a different query.
            \
            """
        ),
        add_datetime_to_instructions=True,
        stream_intermediate_steps=True,
        show_tool_calls=True,
        markdown=True,
        monitoring=True,
    )
prompt = """
=== Email Details ===
Subject: RFQ: Shipping Quote Request for Container from HKG to FRA
From: test@example.com
Body: Hello,
I need a quote for shipping 200kg worth of goods from HKG to FRA.
Please provide your best rates and transit time.
Best regards,
"""
result = reasoning_agent.print_response(prompt, stream=True)
if __name__ == "__main__":
main()
Postgres table w/ pgvector setup:
create table public.source_file_chunks (
  id uuid not null default extensions.uuid_generate_v4 (),
  source_file_id uuid not null,
  content text not null,                      -- raw text content
  vector_embedding public.vector(1536) null   -- 1536-dimension pgvector embedding
) TABLESPACE pg_default;
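For context, chunks get embedded and written into this table roughly like this (simplified sketch; the chunking itself is omitted, and the 1536-dimension OpenAI embedding model is just what we use to match the column size):

import os
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def insert_chunk(source_file_id: str, content: str) -> None:
    # Embed the chunk text (text-embedding-3-small returns 1536 dimensions)
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=content,
    ).data[0].embedding
    # Store the chunk and its embedding in the pgvector column
    supabase.table("source_file_chunks").insert({
        "source_file_id": source_file_id,
        "content": content,
        "vector_embedding": embedding,
    }).execute()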