Running evals on stored runs

aberk · July 10, 2025, 12:48pm

We are looking to run evals on stored runs; I believe this data is stored in sessions. Is there a standard way to do this? For example, would we query the DB and rebuild the run object to run an eval?

Monali · July 11, 2025, 4:58am

Hi @aberk, thanks for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all requests one by one and will get back to you soon.

In the meantime, please refer to our docs: Simple Agent Evals - Agno

If it’s urgent, please let us know. We appreciate your patience!

manu · July 14, 2025, 8:52am

Hey there @aberk

Which type of evaluations are you looking to run?
– Our Accuracy evaluations can run on an already generated response, as you are proposing
– Our Reliability evaluations always expect a RunResponse (the stored runs you are thinking about) to run
– Our Performance evaluations are designed to measure the performance of functions, so they are probably not what you are looking for right now
For both Accuracy and Reliability evaluations, yes, we can use stored runs!
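To make the distinction concrete: an accuracy-style check only needs the generated text, while a reliability-style check needs the full run, because it looks at which tools were called. The sketch below is a stdlib-only illustration under those assumptions. `StoredRun`, `accuracy_check`, and `reliability_check` are hypothetical names, and the normalized string match stands in for the model-graded scoring that Agno's AccuracyEval uses.

```python
from dataclasses import dataclass, field


@dataclass
class StoredRun:
    """Hypothetical stand-in for a stored run, holding only the fields this sketch needs."""
    content: str
    tool_calls: list[str] = field(default_factory=list)  # names of the tools the agent called


def accuracy_check(output: str, expected_output: str) -> bool:
    """Accuracy-style check: needs only the generated text.

    Stand-in for model-graded scoring; here a case- and whitespace-insensitive match.
    """
    return output.strip().lower() == expected_output.strip().lower()


def reliability_check(run: StoredRun, expected_tool_calls: list[str]) -> bool:
    """Reliability-style check: needs the full run, because it inspects tool calls."""
    return all(name in run.tool_calls for name in expected_tool_calls)


run = StoredRun(content="The sum is 42.", tool_calls=["add"])
print(accuracy_check(run.content, "the sum is 42."))  # True
print(reliability_check(run, ["add"]))  # True
print(reliability_check(run, ["add", "multiply"]))  # False
```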
To store and retrieve runs, you will want to use memory with your agents. Here is a simple example:

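# Imports (Agno 1.x module paths)
from agno.agent import Agent
from agno.memory.v2.db.sqlite import SqliteMemoryDb
from agno.memory.v2.memory import Memory
from agno.storage.sqlite import SqliteStorage

# Setup memory
memory_db = SqliteMemoryDb(table_name="memory", db_file="tmp/memory.db")
memory = Memory(db=memory_db)

# Setup an agent that keeps its sessions (and their runs) in SQLite
agent = Agent(
    memory=memory,
    storage=SqliteStorage(table_name="sessions", db_file="tmp/memory.db"),
    enable_user_memories=True,
)

# Run the agent
agent.run(
    ...,
    session_id="session_1",
)

# Access the runs in memory: a list of the runs stored for this session
session_runs = memory.runs["session_1"]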
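The original question also asked about querying the DB and rebuilding the run objects directly. If the `Memory` object from the agent's process is not available where the evals run, that route could look roughly like the sketch below, written with Python's built-in `sqlite3`. The table name, columns, and JSON layout are hypothetical, not Agno's documented schema: check what your storage actually writes (for example with `sqlite3 tmp/memory.db .schema`) before adapting it. The `StoredRun` dataclass is a stand-in for Agno's own run object.

```python
import json
import sqlite3
from dataclasses import dataclass


@dataclass
class StoredRun:
    """Hypothetical stand-in for a rebuilt run object."""
    run_id: str
    content: str


def load_runs(conn: sqlite3.Connection, session_id: str) -> list[StoredRun]:
    """Read one session row and rebuild its runs.

    ASSUMPTION: a table `sessions(session_id TEXT, memory TEXT)` whose `memory`
    column holds JSON shaped like {"runs": [{"run_id": ..., "content": ...}]}.
    This layout is hypothetical; adjust it to your real table.
    """
    row = conn.execute(
        "SELECT memory FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is None:
        return []  # unknown session: nothing to evaluate
    payload = json.loads(row[0])
    return [StoredRun(run_id=r["run_id"], content=r["content"]) for r in payload.get("runs", [])]


# Demo with an in-memory database standing in for tmp/memory.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT, memory TEXT)")
conn.execute(
    "INSERT INTO sessions VALUES (?, ?)",
    ("session_1", json.dumps({"runs": [{"run_id": "r1", "content": "Hello"}]})),
)
runs = load_runs(conn, "session_1")
print(runs[-1].content)  # Hello
```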

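Then you can run your evaluations on those runs:

from agno.eval.accuracy import AccuracyEval
from agno.eval.reliability import ReliabilityEval

session_run = session_runs[-1]  # memory.runs holds a list per session; pick one run (here, the latest)

reliability_evaluation = ReliabilityEval(agent_response=session_run)  # plus your other arguments
reliability_evaluation_results = reliability_evaluation.run(print_results=True)

accuracy_evaluation = AccuracyEval(...)
accuracy_evaluation_result = accuracy_evaluation.run_with_output(output=session_run.content)  # plus your other arguments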
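Because `memory.runs` is keyed by session ID, with a list of runs per session, the same check can be applied to every stored run in one pass. The sketch below assumes a dict of that shape filled with a hypothetical `StoredRun` stand-in, and uses an inline tool-call check in place of building one `ReliabilityEval` per run. It illustrates the loop only and is not Agno's API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredRun:
    """Hypothetical stand-in for a stored run."""
    content: str
    tool_calls: list[str] = field(default_factory=list)


def evaluate_all(
    runs_by_session: dict[str, list[StoredRun]], expected_tool_calls: list[str]
) -> dict[str, list[bool]]:
    """Apply a reliability-style check to every stored run, grouped by session."""
    results: dict[str, list[bool]] = {}
    for session_id, session_runs in runs_by_session.items():
        # One pass/fail flag per run, in the order the runs were stored
        results[session_id] = [
            all(name in run.tool_calls for name in expected_tool_calls)
            for run in session_runs
        ]
    return results


runs_by_session = {
    "session_1": [StoredRun("4", ["add"]), StoredRun("6", ["multiply"])],
    "session_2": [StoredRun("10", ["add", "multiply"])],
}
results = evaluate_all(runs_by_session, ["add"])
print(results)  # {'session_1': [True, False], 'session_2': [True]}
```

Swapping the inline check for a per-run `ReliabilityEval` (or `AccuracyEval.run_with_output`) keeps the same loop structure.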

Let me know if I can help with anything else, and thanks for using Agno!
