We are looking to be able to run evals on stored runs - I believe this data is stored in sessions. Is there a standard way to do this? Like would we query the DB and rebuild the run object to run an eval?
Hi @aberk, thanks for reaching out and supporting Agno. I’ve shared this with the team. We’re working through all requests one by one and will get back to you soon.
In the meantime, please refer to our docs: Simple Agent Evals - Agno
If it’s urgent, please let us know. We appreciate your patience!
Hey there @aberk
- Which type of evaluations are you looking to run?
– Our Accuracy evaluations can run with an already generated response, as you are proposing
– Our Reliability evaluations always expect a RunResponse (the stored runs you are thinking about) to run
– Our Performance evaluations are designed to evaluate functions, and are probably not what you are looking for right now
- For both Accuracy and Reliability evaluations, yes, we can use stored runs!
- To store and retrieve runs, use memory with your agents. Here is a simple example:
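# Imports (module paths as in Agno 1.x; adjust to your installed version)
from agno.agent import Agent
from agno.memory.v2.db.sqlite import SqliteMemoryDb
from agno.memory.v2.memory import Memory
from agno.storage.sqlite import SqliteStorage

# Set up memory
memory_db = SqliteMemoryDb(table_name="memory", db_file="tmp/memory.db")
memory = Memory(db=memory_db)

# Set up an agent
agent = Agent(
    memory=memory,
    storage=SqliteStorage(table_name="sessions", db_file="tmp/memory.db"),
    enable_user_memories=True,
)

# Run the agent
agent.run(
    ...,
    session_id="session_1",
)

# Access the runs in memory
session_runs = memory.runs["session_1"]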
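- Then you can run your evaluations with those runs. Here session_run is a single run taken from session_runs:
from agno.eval.accuracy import AccuracyEval
from agno.eval.reliability import ReliabilityEval

session_run = session_runs[0]  # pick one stored run from the session
reliability_evaluation = ReliabilityEval(agent_response=session_run, ...)
reliability_evaluation_results = reliability_evaluation.run(print_results=True)
accuracy_evaluation = AccuracyEval(...)
accuracy_evaluation_result = accuracy_evaluation.run_with_output(output=session_run.content, ...)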
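If you want to evaluate every stored run rather than a single one, the loop could look roughly like the sketch below. It is library-free on purpose: `StoredRun` and `evaluate_runs` are hypothetical names used only for illustration (not part of Agno), and the toy `evaluate` callable stands in for whichever evaluation call you wrap around each run. The only assumption carried over from the example above is that runs are held in a mapping from session id to a list of runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class StoredRun:
    """Hypothetical stand-in for a stored run; only `content` is used here."""

    content: str


def evaluate_runs(
    runs_by_session: Dict[str, List[Any]],
    evaluate: Callable[[Any], Any],
) -> Dict[str, List[Any]]:
    """Apply `evaluate` to every run, keeping the results grouped by session."""
    return {
        session_id: [evaluate(run) for run in runs]
        for session_id, runs in runs_by_session.items()
    }


# Toy data shaped like a session-id -> list-of-runs mapping.
runs_by_session = {
    "session_1": [StoredRun("4"), StoredRun("five")],
    "session_2": [StoredRun("4")],
}

# Toy check: does the stored output equal the expected answer "4"?
# In practice this callable would wrap one of the evaluation calls shown above.
results = evaluate_runs(runs_by_session, lambda run: run.content == "4")
print(results)
```

Keeping the results keyed by session id makes it easy to trace a failing evaluation back to the session it came from.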
Let me know if I can help with anything else, and thanks for using Agno!