Regarding Token Calculation in Version 1.6 When Using team.arun's Stream Method

I hope this message finds you well. I am writing to seek clarification regarding the calculation of tokens, specifically how to retrieve the number of tokens used by the LLM when utilizing the stream method from team.arun.

Prior to the version 1.6 update, the streaming responses included comprehensive information about all the messages sent and received by the team. This data was extremely useful because it allowed us to calculate the tokens used during the interaction with the LLM. However, after updating to version 1.6, this functionality seems to have disappeared.

This change has brought about significant inconvenience in our work. We rely on accurate token usage data for various purposes, such as cost estimation, resource allocation, and performance optimization. Without this information, it becomes challenging to effectively manage our usage of the LLM.

I would greatly appreciate it if you could provide guidance on how to calculate tokens in the new version 1.6 when using the stream method from team.arun. Any relevant code snippets, API calls, or detailed instructions would be extremely helpful.

Thank you very much for your time and assistance. I look forward to your prompt response.

Hi @Dreamer, thanks for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!

This matter is urgent for us, and we would be deeply grateful if you could assist in resolving it at your earliest convenience.

Hi @Dreamer
When running a team, you can access the TeamRunResponse after the run is completed (via team.run_response). This includes the metrics for that particular run, as well as member_responses, a list of the individual member run responses, each of which also carries its own metrics. From there you can drill down further into the individual messages and access the metrics on each message.
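As a rough sketch of the traversal described above, here is how run-level and member-level token counts could be rolled up into a total. Note that the classes below are minimal stand-ins, not the real Agno objects; the attribute names (metrics, member_responses, messages) simply follow the description in this thread:

```python
# Minimal stand-in objects mirroring the structure described above.
# These are NOT the real Agno classes; attribute names are assumptions
# based on the explanation in this thread.

class MockMemberResponse:
    def __init__(self, metrics):
        self.metrics = metrics  # e.g. {"input_tokens": 8, "output_tokens": 20}

class MockTeamRunResponse:
    def __init__(self, metrics, member_responses):
        self.metrics = metrics                    # team-level run metrics
        self.member_responses = member_responses  # per-member run responses

def total_tokens(run_response):
    """Sum token counts across the team run and all member runs."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for key in totals:
        totals[key] += run_response.metrics.get(key, 0)
        for member in run_response.member_responses:
            totals[key] += member.metrics.get(key, 0)
    return totals

run_response = MockTeamRunResponse(
    metrics={"input_tokens": 12, "output_tokens": 30},
    member_responses=[
        MockMemberResponse({"input_tokens": 8, "output_tokens": 20}),
        MockMemberResponse({"input_tokens": 5, "output_tokens": 15}),
    ],
)
print(total_tokens(run_response))  # {'input_tokens': 25, 'output_tokens': 65}
```

With the real objects, the same loop shape applies: read team.run_response.metrics for the team-level run, then iterate its member_responses (and, if needed, their messages) to attribute tokens per member.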

I will add documentation to our docs to make this clearer.

Hey @Dreamer, here is an example of how to get token metrics from team.arun:

import asyncio
from agno.team.team import Team
from agno.models.openai import OpenAIChat
from agno.agent import Agent

async def stream_with_metrics():
    # Set up team
    team = Team(
        model=OpenAIChat("gpt-4o"),
        members=[
            Agent(
                name="Research Agent",
                model=OpenAIChat("gpt-4o"),
                role="Research information"
            ),
            Agent(
                name="Research Agent 2",
                model=OpenAIChat("gpt-4o"),
                role="Research information 2",
            ),
        ]
    )
    
    try:
        # Run with streaming
        stream = await team.arun("What is AI?", stream=True)
        
        # Process stream chunks
        response_content = ""
        async for chunk in stream:
            # Handle different chunk types
            if hasattr(chunk, 'content') and chunk.content:
                response_content += str(chunk.content)
                
        # Get final metrics after stream completion
        if team.run_response and team.run_response.metrics:
            metrics = team.run_response.metrics
            
            print("=== Token Usage Metrics ===")
            print(f"Input tokens: {metrics.get('input_tokens', 0)}")
            print(f"Output tokens: {metrics.get('output_tokens', 0)}")
            print(f"Total tokens: {metrics.get('total_tokens', 0)}")
            
            # Additional metrics that might be available
            if 'cached_tokens' in metrics:
                print(f"Cached tokens: {metrics['cached_tokens']}")
            if 'cache_write_tokens' in metrics:
                print(f"Cache write tokens: {metrics['cache_write_tokens']}")
            
            return metrics
        else:
            print("No metrics available")
            return None
            
    except Exception as e:
        print(f"Error during streaming: {e}")
        return None

# Run the example
metrics = asyncio.run(stream_with_metrics())

Let me know if you have any other questions.

I have pushed an update to our docs. See here.

Thank you. You have helped me a great deal.

I really appreciate you taking the time.