Issue when add_history_to_messages=True

Hello Agno community! I’m encountering an issue with my document review agent.

I’m using an agent that processes document chunks of fixed size and returns results using Pydantic models. I have add_history_to_messages=True and num_history_responses=2 configured, but I’m getting a Claude token limit error:

Error code: 400 - prompt is too long: 210198 tokens > 200000 maximum

What’s strange is that everything works fine if I remove these history settings. Given that I’m processing fixed-size chunks and only keeping the last 2 responses in history, I’d expect the input size to remain relatively constant. Each message should just be current chunk + previous 2 responses.

Is it possible that the history is accumulating recursively? Like, if previous messages also contain their own history, could it create a Fibonacci-like growth pattern where each message contains its previous two, which each contain their previous two, and so on?
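
To make the hypothesis concrete, here is a toy model comparing the two cases. It is only a sketch: the chunk and response sizes are made-up numbers, the function names are hypothetical, and the recursive rule is a guess at what could be happening, not a description of how Agno actually builds its messages.

```python
from typing import List


def constant_history_size(chunk: int, response: int, num_history: int) -> int:
    """Input size if history holds only the last `num_history` plain runs
    (each run = one chunk plus one response)."""
    return chunk + num_history * (chunk + response)


def recursive_history_sizes(chunk: int, response: int, iterations: int) -> List[int]:
    """Input size per call if each stored run also carries the full input
    it was sent with (the hypothesised recursive case, two runs kept)."""
    sizes: List[int] = []
    for i in range(iterations):
        # A stored run = the whole input of that call + its response.
        prev1 = sizes[i - 1] + response if i >= 1 else 0
        prev2 = sizes[i - 2] + response if i >= 2 else 0
        sizes.append(chunk + prev1 + prev2)
    return sizes


# Illustrative token counts only (assumptions, not measured values).
CHUNK, RESPONSE, LIMIT = 1500, 500, 200_000

flat = constant_history_size(CHUNK, RESPONSE, num_history=2)
sizes = recursive_history_sizes(CHUNK, RESPONSE, iterations=12)
# Zero-based index of the first call whose input exceeds the limit.
first_over = next(i for i, s in enumerate(sizes) if s > LIMIT)

print("constant model:", flat)
print("recursive model:", sizes)
print("first call over limit (0-based):", first_over)
```

With these made-up numbers the constant model stays at 5,500 tokens per call, while the recursive model passes 200,000 on the tenth call. That does not prove anything about my agent, but it shows that Fibonacci-like growth could hit the limit after only a handful of iterations.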

Hi @Southern_Push2935
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 24 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon! :smile:

Hi @Southern_Push2935, thanks for bringing this issue to our attention!

Could you provide more details about the specific models where you’re encountering this issue? Also, could you share your agent configuration? It would help us debug more effectively and identify any potential causes of the recursive history accumulation.

Thank you for the response! Here’s my agent configuration:

from agno.agent import Agent
from agno.models.anthropic import Claude
from pydantic import BaseModel, Field
from typing import List, Literal

class Error(BaseModel):
    severity: Literal["Critical", "Major", "Minor"] = Field(description="...")
    error_type: Literal["grammar", "spelling", "formatting", "consistency", 
                       "legal", "reference", "structure"] = Field(description="...")

class ChunkReport(BaseModel):
    chunk_index: int = Field(description="...")
    errors: List[Error] = Field(default_factory=list, description="...")
    total_errors: int = Field(default=0, description="...")

# CLAUDE_SONNET_MODEL_ID and ANTHROPIC_API_KEY are defined elsewhere (not shown).
chunk_reviewer = Agent(
    name="Chunk Reviewer",
    model=Claude(
        id=CLAUDE_SONNET_MODEL_ID,
        api_key=ANTHROPIC_API_KEY,
        temperature=0.1,
        max_tokens=2048,
    ),
    role="Document quality control.",
    description="Review documents for quality control.",
    instructions=[
        "..."
    ],
    response_model=ChunkReport,
    add_history_to_messages=True,
    num_history_responses=2,
)

As you can see, I’m using Claude Sonnet with add_history_to_messages=True and num_history_responses=2. The token limit error only occurs with these history settings enabled, which makes me suspect there might be some recursive accumulation of history happening.

I’m processing documents in fixed-size chunks, calling the reviewer sequentially for each chunk. The token limit error only appears around iterations 10-12.

What’s strange is that with num_history_responses=2, each iteration should only include the current chunk plus the last 2 responses. But the error suggests the input is growing much larger. Could each response in the history be carrying its own history recursively? Is there a way to inspect the actual input message being sent to Claude? Since Agno is managing the Anthropic client, I can’t see the full prompt to verify if there’s indeed a recursive history accumulation happening.
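
For comparison, this is the behavior I expected, sketched without Agno at all. It is a minimal illustration under assumptions: `fake_review` is a hypothetical stand-in for the real agent call, `build_prompt` and `run` are names I made up, and character counts stand in for tokens. The point is that if only the replies are stored in a bounded queue, the prompt size levels off once the queue is full.

```python
from collections import deque
from typing import Deque, List


def build_prompt(chunk: str, history: Deque[str]) -> str:
    """Combine the retained previous responses with the current chunk."""
    previous = "\n".join(f"Previous response: {r}" for r in history)
    current = f"Current chunk: {chunk}"
    return f"{previous}\n{current}" if previous else current


def fake_review(prompt: str) -> str:
    """Hypothetical stand-in for the agent call; returns a fixed-size reply."""
    return "report:" + "x" * 50


def run(chunks: List[str], num_history: int = 2) -> List[int]:
    """Process chunks sequentially and record the prompt size of each call."""
    history: Deque[str] = deque(maxlen=num_history)  # oldest reply is dropped
    prompt_sizes: List[int] = []
    for chunk in chunks:
        prompt = build_prompt(chunk, history)
        prompt_sizes.append(len(prompt))
        # Store only the reply, never the prompt that produced it.
        history.append(fake_review(prompt))
    return prompt_sizes


# Twelve fixed-size chunks, mirroring the 10-12 iterations mentioned above.
sizes = run(["c" * 200] * 12)
print(sizes)
```

Here the sizes grow for the first two calls while the queue fills, then stay flat. If the real prompt keeps growing past that point, something other than the last two responses is being carried along.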

Hi @Southern_Push2935, thanks for sharing the information. The latest version of the SDK, 1.1.4, should fix this issue.
Also, could you please let us know which version of the Agno SDK you are on?

Thank you so much! I’m currently using v0.8, and I’ll make sure to update to the latest version as soon as possible.