Why do AccuracyEval and ReliabilityEval store model_id/model_provider of the evaluated Agent/Team instead of the judge model?

While reviewing the Agno evaluation module, I noticed that when AccuracyEval and ReliabilityEval log results to the database, the stored model_id and model_provider correspond to the evaluated Agent/Team’s model, not the judge (evaluation) model.

Relevant source:

  • @.venv/Lib/site-packages/agno/eval/reliability.py:264-292
    if self.db:
        if self.agent_response is not None:
            agent_id = self.agent_response.agent_id
            team_id = None
            model_id = self.agent_response.model
            model_provider = self.agent_response.model_provider
        elif self.team_response is not None:
            agent_id = None
            team_id = self.team_response.team_id
            model_id = self.team_response.model
            model_provider = self.team_response.model_provider

        eval_input = {
            "expected_tool_calls": self.expected_tool_calls,
        }

        log_eval_run(
            db=self.db,
            run_id=self.eval_id,  # type: ignore
            run_data=asdict(self.result),
            eval_type=EvalType.RELIABILITY,
            name=self.name if self.name is not None else None,
            agent_id=agent_id,
            team_id=team_id,
            model_id=model_id,
            model_provider=model_provider,
            eval_input=eval_input,
        )
  • @.venv/Lib/site-packages/agno/eval/accuracy.py:567-602
    # Log results to the Agno DB if requested
    if self.agent is not None:
        agent_id = self.agent.id
        team_id = None
        model_id = self.agent.model.id if self.agent.model is not None else None
        model_provider = self.agent.model.provider if self.agent.model is not None else None
        evaluated_component_name = self.agent.name
    elif self.team is not None:
        agent_id = None
        team_id = self.team.id
        model_id = self.team.model.id if self.team.model is not None else None
        model_provider = self.team.model.provider if self.team.model is not None else None
        evaluated_component_name = self.team.name

    if self.db:
        log_eval_input = {
            "additional_guidelines": self.additional_guidelines,
            "additional_context": self.additional_context,
            "num_iterations": self.num_iterations,
            "expected_output": self.expected_output,
            "input": self.input,
        }

        log_eval_run(
            db=self.db,
            run_id=self.eval_id,  # type: ignore
            run_data=asdict(self.result),
            eval_type=EvalType.ACCURACY,
            agent_id=agent_id,
            team_id=team_id,
            model_id=model_id,
            model_provider=model_provider,
            name=self.name if self.name is not None else None,
            evaluated_component_name=evaluated_component_name,
            eval_input=log_eval_input,
        )

My understanding:

  • In some setups, the evaluation itself is performed by a separate judge model.

  • However, the model_id and model_provider fields written to the DB identify the evaluated Agent/Team's model, not the judge.

Questions:

  1. Is this an intentional design choice (i.e., eval results should be attributed to the evaluated model)?

  2. If so, where should the judge model metadata be captured, if at all?

  3. If not intentional, would it make sense to add judge model info to eval logs?
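
To make question 3 concrete, here is a minimal sketch of one option: keep model_id/model_provider attributed to the evaluated model and add the judge's identity to the eval_input dict that is already passed to log_eval_run. Everything in it is hypothetical: ModelInfo, build_eval_input, and the judge_model_id/judge_model_provider keys are names I made up for illustration and are not part of Agno, and it assumes eval_input is stored as a free-form dict.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelInfo:
    """Hypothetical stand-in for a model object exposing `id` and `provider`."""

    id: str
    provider: str


def build_eval_input(
    base_input: Dict[str, Any],
    judge_model: Optional[ModelInfo] = None,
) -> Dict[str, Any]:
    """Return a copy of the eval input extended with judge model metadata.

    The `judge_model_id` / `judge_model_provider` keys are hypothetical;
    they are not fields that Agno defines.
    """
    eval_input = dict(base_input)  # shallow copy so the caller's dict is untouched
    if judge_model is not None:
        eval_input["judge_model_id"] = judge_model.id
        eval_input["judge_model_provider"] = judge_model.provider
    return eval_input


# Placeholder values, for illustration only.
base = {"input": "What is 2 + 2?", "expected_output": "4", "num_iterations": 1}
judge = ModelInfo(id="judge-model", provider="ExampleProvider")
log_eval_input = build_eval_input(base, judge)
```

With something like this, the existing model_id/model_provider columns would keep their current meaning, and the judge would only appear inside eval_input, so no schema change would be needed. Dedicated judge columns would be the alternative if filtering by judge model matters.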

Thanks for any clarification!