Issue when add_history_to_messages=True

Southern_Push2935 · February 13, 2025, 4:16pm

Hello Agno community! I’m encountering an issue with my document review agent.

I’m using an agent that processes document chunks of fixed size and returns results using Pydantic models. I have add_history_to_messages=True and num_history_responses=2 configured, but I’m getting a Claude token limit error:

Error code: 400 - prompt is too long: 210198 tokens > 200000 maximum

What’s strange is that everything works fine if I remove these history settings. Given that I’m processing fixed-size chunks and only keeping the last 2 responses in history, I’d expect the input size to remain relatively constant. Each message should just be current chunk + previous 2 responses.

Is it possible that the history is accumulating recursively? Like, if previous messages also contain their own history, could it create a Fibonacci-like growth pattern where each message contains its previous two, which each contain their previous two, and so on?

Monali · February 14, 2025, 6:28am

Hi @Southern_Push2935
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 24 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon!

monalisha · February 17, 2025, 8:24am

Hi @Southern_Push2935, thanks for bringing this issue to our attention!

Could you provide more details about the specific models where you’re encountering this issue? Also can you share your agent ? It would help us debug more effectively and identify any potential causes of the recursive history accumulation.

Southern_Push2935 · February 17, 2025, 1:35pm

Thank you for the response! Here’s my agent configuration:

from agno.agent import Agent
from agno.models.anthropic import Claude
from pydantic import BaseModel, Field
from typing import List, Literal

class Error(BaseModel):
    severity: Literal["Critical", "Major", "Minor"] = Field(description="...")
    error_type: Literal["grammar", "spelling", "formatting", "consistency", 
                       "legal", "reference", "structure"] = Field(description="...")

class ChunkReport(BaseModel):
    chunk_index: int = Field(description="...")
    errors: List[Error] = Field(default_factory=list, description="...")
    total_errors: int = Field(default=0, description="...")

chunk_reviewer = Agent(
    name="Chunk Reviewer",
    model=Claude(
        id=CLAUDE_SONNET_MODEL_ID, 
        api_key=ANTHROPIC_API_KEY, 
        temperature=0.1, 
        max_tokens=2048
    ),
    role="Document quality control.",
    description="Review documents for quality control.",
    instructions=[
        "..."
    ],
    response_model=ChunkReport,
    add_history_to_messages=True,
    num_history_responses=2,
)

As you can see, I’m using Claude Sonnet with add_history_to_messages=True and num_history_responses=2. The token limit error only occurs with these history settings enabled, which makes me suspect there might be some recursive accumulation of history happening.

I’m processing documents in fixed-size chunks, calling the reviewer sequentially for each chunk. The token limit error only appears around iterations 10-12.

What’s strange is that with num_history_responses=2, each iteration should only include the current chunk plus the last 2 responses. But the error suggests the input is growing much larger. Could each response in the history be carrying its own history recursively? Is there a way to inspect the actual input message being sent to Claude? Since Agno is managing the Anthropic client, I can’t see the full prompt to verify if there’s indeed a recursive history accumulation happening.

monalisha · February 18, 2025, 8:09am

Hi @Southern_Push2935 ,Thanks for sharing the information.The latest version 1.1.4 of the sdk should fix this issue.
Also can you please let us know the agno sdk version you are on ?

Southern_Push2935 · February 18, 2025, 1:16pm

Thank you so much! I’m currently using v0.8, and I’ll make sure to update to the latest version as soon as possible.

Topic		Replies	Views
Why is Agent.run() calling the model four times, and why does num_history_responses=2 still include more history? General agent , tool-call , bug	2	59	March 20, 2025
After conducting multiple conversations in the same session, my question tokens keep exceeding the limit General agent , memory , knowledge	3	53	May 16, 2025
Error: 'text content blocks must be non-empty' when using Claude Agent with Pydantic Response Model General agent	4	52	February 6, 2025
Initialize agent with custom chat history General agent , storage	3	37	May 30, 2025
Token over error for Document Chunking on Japanese pdf and csv documents General knowledge	6	45	March 19, 2025

Issue when add_history_to_messages=True

Related topics