Agno Agent (GPT-4o) Ignoring Explicit First Tool Call Instruction in Procedural Flow

Hi All,

I’m encountering persistent issues getting an Agno Agent (using GPT-4o) to strictly follow a procedural instruction, specifically the one governing its first tool call.

Goal:
I have an onboarding_evaluator agent designed to:

  1. Greet the user and get their work email.
  2. MANDATORILY call a specific tool (check_or_save_user) with the email first to check if the user exists in our database.
  3. Based on the tool result:
    • If user exists: Respond with “Welcome back” and stop.
    • If user is new: Call a second tool (trigger_research_tool) and then immediately proceed with asking follow-up questions (division, role, etc.).
  4. Gather remaining details and finally call check_or_save_user again to save all info.
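
For context, the two custom tools behind this flow are ordinary Python functions roughly like the stubs below (heavily simplified; the real ones query our database and enqueue the research job, so signatures and return payloads may differ slightly):

import json

def check_or_save_user(email: str) -> str:
    """Check whether the email belongs to a known user; called again later in the flow to save the gathered details."""
    exists = False  # stand-in for the real database lookup
    return json.dumps({"status": "existing_user" if exists else "new_user"})

def trigger_research_tool(email: str) -> str:
    """Kick off background research for a brand-new user."""
    return json.dumps({"status": "triggered", "email": email})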

Agent Setup:

  • Model: OpenAIChat(id="gpt-4o")
  • Instructions: Detailed, multi-step conversational flow emphasizing the mandatory nature of Step 3 (calling check_or_save_user, and nothing else, immediately after getting the email). Example instruction snippet:
    3. **MANDATORY FIRST TOOL CALL: Check User Status (Internal Action)**
       Immediately after the transition phrase in Step 2, your **ONLY ALLOWED ACTION** is to call the `check_or_save_user` tool with the email.
       `check_or_save_user(email: str)`
       **DO NOT** perform any other actions, searches, or ask any other questions before calling this tool and receiving its response. WAIT for the tool's result.
    
  • Tools: Currently configured only with check_or_save_user (custom tool) and trigger_research_tool (custom tool). (Note: DuckDuckGoTools was initially present but removed during troubleshooting).
  • Invocation: The agent is invoked via a FastAPI route (/v1/agents/onboarding_evaluator/runs). A get_agent function routes the request and calls a factory function (get_onboarding_evaluator) which creates a new Agent instance on each request, passing the globally defined instructions and tools list.
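
Trimmed down, the invocation path looks roughly like this (the route and the two function names match our code; the instructions constant, request model, and response shape are simplified placeholders, and the tool functions are the ones sketched under Goal above):

from fastapi import FastAPI
from pydantic import BaseModel
from agno.agent import Agent
from agno.models.openai import OpenAIChat

app = FastAPI()

ONBOARDING_INSTRUCTIONS = "..."  # placeholder for the full multi-step instructions shown above

def get_onboarding_evaluator() -> Agent:
    # Factory: a fresh Agent per request, wired to the globally defined instructions and tools.
    return Agent(
        model=OpenAIChat(id="gpt-4o"),
        instructions=ONBOARDING_INSTRUCTIONS,
        tools=[check_or_save_user, trigger_research_tool],
    )

def get_agent(agent_id: str) -> Agent:
    # Routes the agent_id from the URL to the matching factory.
    if agent_id == "onboarding_evaluator":
        return get_onboarding_evaluator()
    raise ValueError(f"Unknown agent: {agent_id}")

class RunRequest(BaseModel):
    message: str

@app.post("/v1/agents/{agent_id}/runs")
async def run_agent(agent_id: str, payload: RunRequest):
    agent = get_agent(agent_id)
    run = agent.run(payload.message)  # run result object; .content holds the assistant text (simplified)
    return {"content": run.content}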

Problem:
Despite numerous refinements to the instructions emphasizing the mandatory first step, the agent consistently fails to call check_or_save_user after receiving the email.

Instead, it often:

  • Attempts to call duckduckgo_search (even when this tool is removed from the agent’s tools list in the code!).
  • Makes assumptions based on the email domain and proceeds to later steps incorrectly (e.g., assuming the user exists without checking, or asking follow-up questions without triggering research).
  • Sometimes uses the correct response text for a scenario (e.g., the “new user” text) but without actually performing the required preceding tool calls (check_or_save_user, trigger_research_tool).

Troubleshooting Performed:

  1. Multiple iterations of refining instructions, adding keywords like “MANDATORY”, “ONLY ALLOWED ACTION”, “DO NOT”.
  2. Temporarily removed potentially distracting tools (DuckDuckGoTools) from the agent’s configuration.
  3. Successfully role-played the exact same instructions with the base GPT-4o model via the ChatGPT interface. In that context, the model correctly identified and simulated the mandatory check_or_save_user call first, followed the conditional logic for the trigger_research_tool, and proceeded correctly. This strongly suggests the issue lies within the Agno execution environment.
  4. Confirmed the API route uses a factory function (get_onboarding_evaluator) to instantiate the agent per-request, rather than using a single global instance.

Question:
Why might the agent running within the Agno framework consistently ignore the explicit, mandatory instruction for its first tool call, even when the base model understands the instruction correctly outside the framework?

Are there known nuances or best practices for:

  • Ensuring strict procedural adherence with multi-step instructions and tool calls in Agno?
  • How tools are presented to the LLM (tool descriptions, potential conflicts)?
  • Potential differences in agent behavior when instantiated via a factory function per-request versus using a singleton/global instance within Agno?
  • Any Agno-specific context management or prompting that might interfere with these kinds of strict instructions?

Any insights or suggestions on how to debug or resolve this discrepancy would be greatly appreciated!

Thanks!

Hi @Tristan

Thanks for reaching out and supporting Agno! We’ve shared this with the team and are working through requests one by one; we’ll get back to you as soon as we can. We’ve just kicked off the Global Agent Hackathon, so things are a bit busier than usual. If you’re up for it, we’d love for you to join: it’s a great chance to build, win some exciting prizes, and connect with the agent community. If it’s urgent, just let us know. Thanks for your patience!

Hello!

Ideally that should not be the case. I ran some tests and gpt-4o seems to be following the instructions correctly.

Here’s a test script for reference. Could you try adding proper logging to your code and pinpointing the issue?

Optionally, you could share the code snippets and the debug logs here; if they are too big, a gist URL would work as well. Hope this helps narrow the issue down.

Here’s the test script:

import os
import sys
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.function import Function
from pydantic import BaseModel, Field
from typing import List, Dict, Optional, Any
import json

# Add basic logging configuration
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# --- 1. Dummy Tool Definitions ---
# Define input/output schemas using Pydantic
class CheckUserInput(BaseModel):
    email: str = Field(..., description="The email address of the user.")

class CheckUserOutput(BaseModel):
    status: str = Field(..., description="Either 'existing_user' or 'new_user'.")
    message: str = Field(..., description="A message confirming the action.")

class TriggerResearchInput(BaseModel):
    email: str = Field(..., description="The email address for research context.")

class TriggerResearchOutput(BaseModel):
    status: str = Field(..., description="Status of the research trigger.")
    message: str = Field(..., description="A message confirming the action.")

# Define the tool functions (using logging)
def check_or_save_user_func(email: str) -> str:
    logging.info(f"--- TOOL CALLED: check_or_save_user ---")
    logging.info(f"Input Email: {email}")
    # Simulate checking a database
    if "existing" in email:
        result = {"status": "existing_user", "message": "User found in database."}
    else:
        result = {"status": "new_user", "message": "User not found, proceed with onboarding."}
    logging.info(f"Output Dict: {result}")
    logging.info(f"------------------------------------")
    # Return result as a JSON string
    return json.dumps(result)

def trigger_research_tool_func(email: str) -> str:
    logging.info(f"--- TOOL CALLED: trigger_research_tool ---")
    logging.info(f"Input Email: {email}")
    result = {"status": "triggered", "message": f"Research triggered for {email}."}
    logging.info(f"Output Dict: {result}")
    logging.info(f"---------------------------------------")
    # Return result as a JSON string
    return json.dumps(result)

# Create Agno Function objects (changed from Tool to Function)
check_user_tool = Function(
    name="check_or_save_user",
    description="Checks if a user exists in the database based on their email. THIS MUST BE CALLED FIRST immediately after getting the user's email.",
    entrypoint=check_or_save_user_func,
    # input_model and output_model might not be direct args for Function,
    # It infers from the entrypoint's type hints and pydantic models if used.
    # Let's rely on inference for now, or check Function constructor if needed.
    # input_model=CheckUserInput, # Potentially remove if inferred
    # output_model=CheckUserOutput, # Potentially remove if inferred
)

research_tool = Function(
    name="trigger_research_tool",
    description="Triggers background research for a NEW user based on their email. Only call this if check_or_save_user returns 'new_user'.",
    entrypoint=trigger_research_tool_func,
    # input_model=TriggerResearchInput, # Potentially remove if inferred
    # output_model=TriggerResearchOutput, # Potentially remove if inferred
)

tools = [check_user_tool, research_tool]

# --- 2. Instructions ---
instructions = """
You are an onboarding assistant. Follow these steps precisely:
1. Greet the user and ask for their work email address.
2. Wait for the user to provide their email address. Respond with a brief acknowledgment like "Got it, thanks!" or "Okay, checking that now.".
3. **MANDATORY FIRST TOOL CALL: Check User Status (Internal Action)**
   Immediately after the acknowledgment in Step 2, your **ONLY ALLOWED ACTION** is to call the `check_or_save_user` tool with the email provided by the user.
   Use the tool like this: `check_or_save_user(email: str)`
   **DO NOT** perform any other actions, searches, respond further, or ask any other questions before calling this tool and receiving its response. WAIT for the tool's result. Your response should ONLY contain the tool call.
4. **Process Tool Result:**
   - If `check_or_save_user` returns `status: 'existing_user'`: Respond ONLY with "Welcome back!" and stop the conversation.
   - If `check_or_save_user` returns `status: 'new_user'`:
     a. Immediately call the `trigger_research_tool` with the same email. Your response should ONLY contain this tool call.
     b. WAIT for the `trigger_research_tool` result.
     c. After the `trigger_research_tool` has executed, THEN respond by asking the user for their division/department (e.g., "Thanks! To help tailor the onboarding, could you let me know which division or department you're joining?").
5. Continue the onboarding conversation to gather role, etc. (We will stop after step 4c for this test).
Remember: The first action AFTER acknowledging the email MUST be calling `check_or_save_user`. No exceptions. No chat before the call.
"""

# Subclass OpenAIChat to add logging
class LoggingOpenAIChat(OpenAIChat):
    def _prepare_openai_kwargs(
        self, messages: List[Dict[str, Any]], tools: Optional[List[Dict[str, Any]]] = None, tool_choice: Optional[str | Dict[str, Any]] = None, **kwargs: Any
    ) -> Dict[str, Any]:
        """Prepares arguments for OpenAI API call and logs them."""
        openai_kwargs = super()._prepare_openai_kwargs(messages, tools=tools, tool_choice=tool_choice, **kwargs)

        logging.info("\n--- DEBUG: Preparing OpenAI Request ---")
        logging.info(f"Model: {openai_kwargs.get('model')}")
        logging.info(f"Messages: {openai_kwargs.get('messages')}")
        logging.info(f"Tools: {openai_kwargs.get('tools')}")
        logging.info(f"Tool Choice: {openai_kwargs.get('tool_choice')}")
        logging.info("-------------------------------------\n")
        return openai_kwargs

    # Override invoke or the method that calls the API to log the response
    # Based on potential Agno structure, let's assume invoke calls a lower-level method
    # If this doesn't log, we might need to override _invoke or _call
    def invoke(self, messages: list[dict], tools: list[dict] | None = None, tool_choice: str | dict | None = None) -> dict:
        # Note: Logging of the request happens in _prepare_openai_kwargs now
        try:
            # Call super().invoke with only the arguments it expects; the base invoke
            # picks up tools and tool_choice from instance state / prepared kwargs.
            response = super().invoke(messages)
            logging.info("\n--- DEBUG: Raw Response from OpenAI ---")
            logging.info(response)
            logging.info("-------------------------------------\n")
            return response
        except Exception as e:
            # This block should ideally not be hit if the invoke call is correct
            logging.error(f"Error during OpenAI API call: {e}", exc_info=True)
            # Return a dummy error structure or re-raise
            return {"error": str(e), "choices": [{"message": {"role": "assistant", "content": f"Error calling model: {e}"}}]}


# Configure the model instance
# Make sure you have OPENAI_API_KEY set in your environment
api_key = os.getenv("OPENAI_API_KEY")
model = LoggingOpenAIChat(id="gpt-4o", api_key=api_key)

# --- 4. Agent Creation Function ---
def create_test_agent(agent_description: str):
    return Agent(
        model=model,
        instructions=instructions,
        tools=tools,
        show_tool_calls=True, # Show Agno's interpretation of tool calls/results
        description=agent_description,
        # Add memory clear or ensure state doesn't leak if needed,
        # but new instance per test should suffice.
    )

# --- 5. Run Test Function ---
def run_test_case(agent_instance: Agent, user_email: str):
    logging.info(f">>> Simulating user providing email: {user_email}")
    logging.info(">>> Agent's response:")
    # Use print_response to handle the turn, including potential tool calls/responses
    # print_response adds the user message to history and gets the agent's full turn response
    agent_instance.print_response(user_email)
    logging.info("<<< End Agent Turn")


# --- 6. Execute Test Cases ---
if __name__ == "__main__":
    user_email_new = "test.new@example.com"
    user_email_existing = "test.existing@example.com"

    # --- Test Case 1: New User ---
    logging.info("="*20 + " TEST CASE 1: NEW USER " + "="*20)
    # We assume the agent has already greeted and asked for the email.
    # The conversation history starts effectively empty for this test run,
    # and the user's *first* message is the email.
    agent_new = create_test_agent("Test agent - New User")
    run_test_case(agent_new, user_email_new)
    logging.info("="*50 + "\n")


    # --- Test Case 2: Existing User ---
    logging.info("="*20 + " TEST CASE 2: EXISTING USER " + "="*20)
    # Create a new agent instance to ensure clean state/history
    agent_existing = create_test_agent("Test agent - Existing User")
    run_test_case(agent_existing, user_email_existing)
    logging.info("="*50 + "\n")

    logging.info("Test script finished.")

Thank you for this, Mustafa. Will try making some modifications and get back to you.