Does Agno currently support streaming input to the LLM (i.e., sending user input incrementally or as deltas before the full prompt is finalized), similar to realtime / session-based LLM APIs?
From what I understand, Agno assembles the complete prompt (system, role, tenant, history, user input) and then makes a single LLM call, with support for streaming output only.
Please confirm if this understanding is correct, or if there is any existing or planned support for true input streaming.
Hi @Aboobacker, thank you for reaching out and supporting Agno. I've shared this with the team; we're working through all queries one by one and will get back to you soon. If it's urgent, please let us know. We appreciate your patience!
### Realtime Audio Processing with Delta Inputs (Low-Latency Approach)

Instead of the traditional flow (audio → transcription → text → LLM → text → TTS), OpenAI's realtime models support delta-based audio inputs, enabling end-to-end audio processing with significantly lower latency.
### Traditional Pipeline (Higher Latency)

1. Capture audio
2. Transcribe audio to text (ASR)
3. Send text to LLM
4. Generate text response
5. Convert text to audio (TTS)

Each step adds delay and context-switch overhead.
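To make the latency stacking concrete, here is a minimal sketch of the sequential pipeline above. All function names and delay figures are illustrative placeholders, not Agno or OpenAI APIs; the point is that end-to-end latency is the sum of the stage latencies because each stage blocks on the previous one.

```python
import time

# Illustrative stage latencies in seconds (made-up numbers for demonstration).
STAGE_LATENCY = {"asr": 0.8, "llm": 1.2, "tts": 0.6}

def transcribe(audio: bytes) -> str:
    time.sleep(STAGE_LATENCY["asr"])   # simulated ASR delay
    return "transcribed text"

def generate(text: str) -> str:
    time.sleep(STAGE_LATENCY["llm"])   # simulated LLM delay
    return "response text"

def synthesize(text: str) -> bytes:
    time.sleep(STAGE_LATENCY["tts"])   # simulated TTS delay
    return b"audio"

def traditional_pipeline(audio: bytes) -> bytes:
    # Each stage must fully finish before the next starts, so the user
    # hears nothing until ASR + LLM + TTS latencies have all elapsed.
    return synthesize(generate(transcribe(audio)))

start = time.monotonic()
traditional_pipeline(b"\x00" * 320)
elapsed = time.monotonic() - start
print(f"end-to-end latency: {elapsed:.1f}s")  # ~2.6s = 0.8 + 1.2 + 0.6
```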
### Realtime / Delta Audio Pipeline (Low Latency)

1. Stream audio chunks (delta frames) directly to the realtime model
2. The model processes the audio incrementally
3. The model internally handles understanding, reasoning, and response generation
4. Output is streamed back as audio deltas (and optionally text)

No separate transcription or TTS step is required unless you explicitly request text output.
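A sketch of the client side of this flow, using the event shapes from OpenAI's Realtime API (`input_audio_buffer.append`, `input_audio_buffer.commit`, `response.create`). The WebSocket transport itself is omitted here; this only builds the JSON events a client would send, and the frame size is an illustrative choice:

```python
import base64
import json

def audio_delta_event(chunk: bytes) -> str:
    """Wrap one raw PCM16 audio chunk as a realtime 'append' event.

    Event shape follows OpenAI's Realtime API (base64-encoded audio in an
    'input_audio_buffer.append' event); sending it over the WebSocket is
    omitted in this sketch.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    })

def commit_and_respond_events() -> list[str]:
    # After the last delta: commit the input buffer, then ask the model
    # to start generating a (streamed) response.
    return [
        json.dumps({"type": "input_audio_buffer.commit"}),
        json.dumps({"type": "response.create"}),
    ]

# Stream a capture buffer as small delta frames
# (320 bytes ≈ 10 ms of 16 kHz mono PCM16).
capture = b"\x00" * 3200
frames = [capture[i:i + 320] for i in range(0, len(capture), 320)]
events = [audio_delta_event(f) for f in frames] + commit_and_respond_events()
print(len(events))  # 10 delta events + commit + response.create = 12
```

In a real client these events are sent over the persistent connection as the microphone produces audio, and the server streams back `response.audio.delta` events in parallel.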
### What Are Delta Inputs?

Delta inputs are small, continuous audio frames sent over a persistent connection (WebSocket / WebRTC). The model consumes these frames as they arrive, rather than waiting for the full audio. This allows:

- Early intent detection
- Partial reasoning
- Streaming responses
### Key Benefits

- Ultra-low latency (near human conversation speed)
- No intermediate transcription overhead
- Continuous, natural conversation flow
- Reduced infrastructure complexity