Streaming is not supported for Ollama models

Ahmad · April 8, 2025, 7:48am

I am using an Ollama QWEN model for a RAG agent application. When I select the o3-mini model, the agent's response arrives in chunks, but when I select the Ollama QWEN model, the response arrives as a single complete string.

This behaviour makes the wait before any text appears very long, especially when the response is large.

Here is the code:

    if last_message and last_message.get("role") == "user":
        question = last_message["content"]
        with st.chat_message("assistant"):
            # Create container for tool calls
            tool_calls_container = st.empty()
            resp_container = st.empty()
            with st.spinner("🤔 Thinking..."):
                response = ""
                try:
                    # Run the agent and stream the response
                    run_response = agentic_rag_agent.run(question, stream=True)
                    for _resp_chunk in run_response:
                        # Display tool calls if available
                        if _resp_chunk.tools and len(_resp_chunk.tools) > 0:
                            display_tool_calls(tool_calls_container, _resp_chunk.tools)

                        print(_resp_chunk, _resp_chunk.content)
                        # Display response
                        if _resp_chunk.content is not None:
                            response += _resp_chunk.content
                            resp_container.markdown(response)

                    add_message(
                        "assistant", response, agentic_rag_agent.run_response.tools
                    )
                except Exception as e:
                    error_message = f"Sorry, I encountered an error: {str(e)}"
                    add_message("assistant", error_message)
                    st.error(error_message)

o3-mini response:

RunResponse(content='', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) 
RunResponse(content='Hello', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) Hello
RunResponse(content='!', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) !
RunResponse(content=' Welcome', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287)  Welcome

Ollama QWEN response:

RunResponse(content="Hello! I'm here to assist you with any IT-related issues. Whether it's a login problem, software installation, system crash, or anything else, I can help create a ticket and get it resolved. How can I assist you today? If this is your first time interacting with me, just let me know what you need help with! 😊", content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='qwen2.5:72b-instruct-q2_K-16k', run_id='8917c70c-cc1d-4f8d-9200-883ea30c5848', agent_id='192d3fcd-9dba-47ea-a704-e30d4c265f14', session_id='20673cd6-7b0b-4d50-838e-13b7156ef628', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) Hello! I'm here to assist you with any IT-related issues. Whether it's a login problem, software installation, system crash, or anything else, I can help create a ticket and get it resolved. How can I assist you today? If this is your first time interacting with me, just let me know what you need help with! 😊
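To put numbers on the difference between the two logs above, a small helper can count the content-bearing chunks and record when the first and last one arrive. This is only a minimal sketch: `timed_chunks` and `summarize_stream` are hypothetical names, not part of Agno, and the only assumption about the chunks is that each one has a `content` attribute, as the `RunResponse` objects printed above do.

```python
import time
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def timed_chunks(chunks: Iterable[Any]) -> Iterator[Tuple[float, Any]]:
    """Yield (seconds_since_start, chunk) for every chunk in the stream."""
    start = time.perf_counter()
    for chunk in chunks:
        yield time.perf_counter() - start, chunk


def summarize_stream(chunks: Iterable[Any]) -> Dict[str, Any]:
    """Consume a stream and report how many text chunks arrived and when.

    Hypothetical helper: it only assumes each chunk exposes `.content`.
    Chunks whose content is None or an empty string are ignored.
    """
    arrivals: List[float] = []
    text_parts: List[str] = []
    for elapsed, chunk in timed_chunks(chunks):
        content = getattr(chunk, "content", None)
        if isinstance(content, str) and content:
            arrivals.append(elapsed)
            text_parts.append(content)
    return {
        "chunk_count": len(arrivals),
        "first_chunk_s": arrivals[0] if arrivals else None,
        "last_chunk_s": arrivals[-1] if arrivals else None,
        "text": "".join(text_parts),
    }
```

It could be called as `summarize_stream(agentic_rag_agent.run(question, stream=True))`. A `chunk_count` of 1, with `first_chunk_s` close to `last_chunk_s`, would match the Ollama log above; many chunks spread over time would match the o3-mini log.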

Monali · April 9, 2025, 5:36am

Hi @Ahmad

Thanks for reaching out and supporting Agno. I’ve shared this with the team. We’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!

WillemdeJongh1 · April 15, 2025, 8:04am

Hi @Ahmad

In our experience, performance with these models is not always consistent. We recommend trying LMStudio to see whether it improves things for you.
There are instructions in the LMStudio cookbook README:

cookbook/models/lmstudio/README.md
