I am using an Ollama Qwen model for a RAG agent application. When I select the o3-mini model, the agent's response arrives in chunks, but when I select the Ollama Qwen model, the response arrives as one complete string.
This behaviour makes the blank time (the wait before any text appears) very long, especially when the response is a large block of text.
Here is the code:
    if last_message and last_message.get("role") == "user":
        question = last_message["content"]
        with st.chat_message("assistant"):
            # Create container for tool calls
            tool_calls_container = st.empty()
            resp_container = st.empty()
            with st.spinner("🤔 Thinking..."):
                response = ""
                try:
                    # Run the agent and stream the response
                    run_response = agentic_rag_agent.run(question, stream=True)
                    for _resp_chunk in run_response:
                        # Display tool calls if available
                        if _resp_chunk.tools and len(_resp_chunk.tools) > 0:
                            display_tool_calls(tool_calls_container, _resp_chunk.tools)
                        print(_resp_chunk, _resp_chunk.content)
                        # Display response
                        if _resp_chunk.content is not None:
                            response += _resp_chunk.content
                            resp_container.markdown(response)
                    add_message(
                        "assistant", response, agentic_rag_agent.run_response.tools
                    )
                except Exception as e:
                    error_message = f"Sorry, I encountered an error: {str(e)}"
                    add_message("assistant", error_message)
                    st.error(error_message)
o3-mini response (output of the print call in the loop, one RunResponse per chunk):
RunResponse(content='', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287)
RunResponse(content='Hello', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) Hello
RunResponse(content='!', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) !
RunResponse(content=' Welcome', content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='o3-mini', run_id='e8e64837-3e68-4231-b304-7581d6c9a97b', agent_id='86090364-1c65-4c49-926a-0ad725c0dad4', session_id='8baae117-b6a8-4e5f-9c1a-50d35c7b0bc5', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) Welcome
Ollama Qwen response (a single RunResponse containing the whole text):
RunResponse(content="Hello! I'm here to assist you with any IT-related issues. Whether it's a login problem, software installation, system crash, or anything else, I can help create a ticket and get it resolved. How can I assist you today? If this is your first time interacting with me, just let me know what you need help with! 😊", content_type='str', thinking=None, event='RunResponse', messages=None, metrics=None, model='qwen2.5:72b-instruct-q2_K-16k', run_id='8917c70c-cc1d-4f8d-9200-883ea30c5848', agent_id='192d3fcd-9dba-47ea-a704-e30d4c265f14', session_id='20673cd6-7b0b-4d50-838e-13b7156ef628', workflow_id=None, tools=None, images=None, videos=None, audio=None, response_audio=None, extra_data=None, created_at=1744093287) Hello! I'm here to assist you with any IT-related issues. Whether it's a login problem, software installation, system crash, or anything else, I can help create a ticket and get it resolved. How can I assist you today? If this is your first time interacting with me, just let me know what you need help with! 😊
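To put numbers on the difference, here is a minimal sketch of a helper I could wrap around the stream. It counts the non-empty chunks and measures the delay before the first one, which is the blank time described above. The names `StreamStats`, `summarize_stream` and `fake_stream` are hypothetical and not part of any library. The only assumption about the chunks is that each has a `content` attribute, as the `RunResponse` objects printed above do.

```python
import time
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional


@dataclass
class StreamStats:
    chunk_count: int                    # number of chunks with non-empty content
    first_chunk_delay: Optional[float]  # seconds until the first non-empty chunk
    text: str                           # all chunk contents joined together


def summarize_stream(chunks: Iterable[object]) -> StreamStats:
    """Consume a stream of chunk objects and report how it was delivered."""
    start = time.perf_counter()
    first_delay: Optional[float] = None
    count = 0
    parts = []
    for chunk in chunks:
        # Assumption: each chunk exposes its text as a `content` attribute.
        content = getattr(chunk, "content", None)
        if not content:
            continue  # skip None and empty-string chunks
        if first_delay is None:
            first_delay = time.perf_counter() - start
        count += 1
        parts.append(content)
    return StreamStats(count, first_delay, "".join(parts))


def fake_stream() -> Iterator[SimpleNamespace]:
    """Hypothetical stand-in that mimics the o3-mini chunks shown above."""
    for piece in ["", "Hello", "!", " Welcome"]:
        yield SimpleNamespace(content=piece)


if __name__ == "__main__":
    print(summarize_stream(fake_stream()))
```

In the app, the same helper could be called as `summarize_stream(agentic_rag_agent.run(question, stream=True))` once per model. A `chunk_count` of 1 with a large `first_chunk_delay` would confirm that the whole answer is delivered in a single chunk.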