Agno and LLM input streaming support

Does Agno currently support streaming input to the LLM (i.e., sending user input incrementally or as deltas before the full prompt is finalized), similar to realtime / session-based LLM APIs?

From what I understand, Agno assembles the complete prompt (system, role, tenant, history, user input) and then makes a single LLM call, with support for streaming output only.

Please confirm if this understanding is correct, or if there is any existing or planned support for true input streaming.

Hi @Aboobacker, thank you for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all queries one by one and will get back to you soon. If it’s urgent, please let us know. We appreciate your patience!

If possible, please just say whether it’s available or not, so I can integrate another way directly.

Hey @Aboobacker! We don’t support streaming input to the LLM. We’d love to understand your use case.

Realtime Audio Processing with Delta Inputs (Low-Latency Approach)

Instead of the traditional flow (audio → transcription → text → LLM → text → TTS), OpenAI’s realtime models support delta-based audio inputs, enabling end-to-end audio processing with significantly reduced latency.

Traditional Pipeline (Higher Latency)

  1. Capture audio

  2. Transcribe audio to text (ASR)

  3. Send text to LLM

  4. Generate text response

  5. Convert text to audio (TTS)

Each step adds delay and context-switch overhead.

Realtime / Delta Audio Pipeline (Low Latency)

  1. Stream audio chunks (delta frames) directly to the realtime model

  2. Model processes audio incrementally

  3. Model internally handles understanding, reasoning, and response generation

  4. Output is streamed back as audio deltas (and optionally text)

No separate transcription or TTS step is required unless you explicitly request text output.
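Concretely, the realtime exchange is a stream of JSON events over one persistent connection. Below is a minimal Python sketch of the input side of the pipeline above. The event names (`input_audio_buffer.append`, `input_audio_buffer.commit`) follow OpenAI’s Realtime API; the `send` callable is a stand-in for a real WebSocket connection’s `.send`, so treat the wiring as an assumption, not a drop-in client.

```python
import asyncio
import base64
import json


def append_event(pcm_chunk: bytes) -> str:
    """Wrap one raw PCM16 audio delta as a Realtime-API append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


async def stream_audio(send, audio_chunks) -> None:
    """Send each delta frame as soon as it is captured.

    `send` is any awaitable sender (e.g. a websockets connection's .send).
    The model starts processing before the utterance is complete — that is
    where the latency win comes from.
    """
    for chunk in audio_chunks:
        await send(append_event(chunk))
    # Commit signals the end of the user turn (server-side VAD can also do this).
    await send(json.dumps({"type": "input_audio_buffer.commit"}))


if __name__ == "__main__":
    sent: list[str] = []

    async def collect(msg: str) -> None:  # stand-in for ws.send
        sent.append(msg)

    asyncio.run(stream_audio(collect, [b"\x00" * 960, b"\x01" * 960]))
    print(len(sent))  # 2 append events + 1 commit
```

On the output side, the same connection delivers `response.audio.delta` events that you play back as they arrive, mirroring step 4 of the pipeline.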


What Are Delta Inputs?

  • Delta inputs are small, continuous audio frames sent over a persistent connection (WebSocket / WebRTC).

  • The model consumes these frames as they arrive, rather than waiting for the full audio.

  • This allows:

    • Early intent detection

    • Partial reasoning

    • Streaming responses
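To make “small, continuous frames” concrete, here is a sketch that slices a raw PCM16 buffer into ~20 ms base64-encoded delta frames ready to send over the persistent connection. The 24 kHz mono PCM16 format matches what OpenAI’s realtime models accept, but the frame size is a tuning choice, not a requirement.

```python
import base64

SAMPLE_RATE = 24_000      # 24 kHz mono, as used by OpenAI realtime pcm16
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # frame duration — a common, tunable choice
FRAME_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 960 bytes


def delta_frames(pcm: bytes):
    """Yield base64-encoded delta frames from a raw PCM16 buffer.

    In a live system you would send each frame the moment the microphone
    produces it, rather than buffering the whole utterance first.
    """
    for start in range(0, len(pcm), FRAME_BYTES):
        yield base64.b64encode(pcm[start:start + FRAME_BYTES]).decode("ascii")
```

Because the model consumes frames as they arrive, it can begin intent detection and partial reasoning while the user is still speaking.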


Key Benefits

  • Ultra-low latency (near human conversation speed)

  • No intermediate transcription overhead

  • Continuous, natural conversation flow

  • Reduced infrastructure complexity