I hope this message finds you well. I am writing to seek clarification regarding token calculation, specifically how to retrieve the number of tokens used by the LLM when streaming with team.arun.
Prior to version 1.6, the streaming responses included comprehensive information about all the messages sent and received by the team. This data was extremely useful because it allowed us to calculate the tokens used during the interaction with the LLM. However, after updating to version 1.6, this functionality seems to have disappeared.
This change has brought about significant inconvenience in our work. We rely on accurate token usage data for various purposes, such as cost estimation, resource allocation, and performance optimization. Without this information, it becomes challenging to effectively manage our usage of the LLM.
I would greatly appreciate guidance on how to calculate tokens in version 1.6 when streaming with team.arun. Any relevant code snippets, API calls, or detailed instructions would be extremely helpful.
Thank you very much for your time and assistance. I look forward to your prompt response.
Hi @Dreamer, thanks for reaching out and supporting Agno. I've shared this with the team; we're working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!
Hi @Dreamer
When running a team, you can access the TeamRunResponse after the run completes (via team.run_response). It includes metrics for that particular run, as well as member_responses, a list of the individual member run responses, each of which carries its own metrics. From there you can also access the individual messages and the metrics on each message.
I will update our docs to make this clearer.
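To illustrate the idea of combining those metrics, here is a minimal sketch of summing token counts across a run and its member responses. It assumes each metrics object behaves like a dict with integer input_tokens / output_tokens / total_tokens entries; the helper name sum_token_metrics is hypothetical, not part of the Agno API.

```python
# Hypothetical helper (not part of the Agno API): sum token counts
# across the team-level run metrics and each member's run metrics.
# Assumes each metrics object is dict-like with integer token counts.
def sum_token_metrics(all_metrics):
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for metrics in all_metrics:
        for key in totals:
            totals[key] += metrics.get(key, 0)
    return totals


# Stand-in values shaped like run metrics, for demonstration only:
team_metrics = {"input_tokens": 120, "output_tokens": 80, "total_tokens": 200}
member_metrics = [
    {"input_tokens": 50, "output_tokens": 30, "total_tokens": 80},
    {"input_tokens": 40, "output_tokens": 20, "total_tokens": 60},
]
print(sum_token_metrics([team_metrics, *member_metrics]))
# {'input_tokens': 210, 'output_tokens': 130, 'total_tokens': 340}
```

Note that depending on how Agno aggregates usage, the team-level metrics may already include member usage, so check for double counting before summing both levels.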
Hey @Dreamer, here is an example of getting token metrics when streaming with team.arun:
import asyncio

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.team.team import Team


async def stream_with_metrics():
    # Set up the team
    team = Team(
        model=OpenAIChat("gpt-4o"),
        members=[
            Agent(
                name="Research Agent",
                model=OpenAIChat("gpt-4o"),
                role="Research information",
            ),
            Agent(
                name="Research Agent 2",
                model=OpenAIChat("gpt-4o"),
                role="Research information 2",
            ),
        ],
    )

    try:
        # Run with streaming
        stream = await team.arun("What is AI?", stream=True)

        # Process stream chunks
        response_content = ""
        async for chunk in stream:
            # Accumulate content from chunks that carry it
            if hasattr(chunk, "content") and chunk.content:
                response_content += str(chunk.content)

        # Get final metrics after the stream completes
        if team.run_response and team.run_response.metrics:
            metrics = team.run_response.metrics
            print("=== Token Usage Metrics ===")
            print(f"Input tokens: {metrics.get('input_tokens', 0)}")
            print(f"Output tokens: {metrics.get('output_tokens', 0)}")
            print(f"Total tokens: {metrics.get('total_tokens', 0)}")

            # Additional metrics that might be available
            if "cached_tokens" in metrics:
                print(f"Cached tokens: {metrics['cached_tokens']}")
            if "cache_write_tokens" in metrics:
                print(f"Cache write tokens: {metrics['cache_write_tokens']}")
            return metrics
        else:
            print("No metrics available")
            return None
    except Exception as e:
        print(f"Error during streaming: {e}")
        return None


# Run the example
metrics = asyncio.run(stream_with_metrics())