I’m currently working on integrating Mistral Small 3.1 (running locally via Ollama) with Agno, and I’m using the MultiMCPTools functionality to interact with multiple MCP servers. My goal is to make tool calls to two different MCP servers within a single request to the model—specifically:
1. Call the playwright server to retrieve some data.
2. Call the filesystem server to write the retrieved data to the filesystem.
What I’ve Observed
The model can successfully make multiple, different tool calls to a single MCP server (e.g., multiple calls to the playwright server).
However, when I attempt to make a second tool call to a different MCP server (e.g., server-filesystem), the model doesn’t initiate the second call at all.
My Setup
Model: Mistral Small 3.1 (running via Ollama)
Tooling: Agno agent configured with MultiMCPTools
Servers: Two MCP servers (playwright and server-filesystem)
My Questions
Is this behavior expected given my current tooling and setup?
Should Mistral Small 3.1, when running via Ollama, be capable of making tool calls to multiple MCP servers within a single request?
Or is this functionality unsupported in this configuration?
If this behavior is not expected, what might be causing the issue?
Could it be related to how Agno interfaces with Ollama or how tools are presented/namespaced?
Are there any specific prompt engineering techniques or configuration changes I should try to enable cross-server tool calling?
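On the namespacing question: I don't know how Agno presents the merged toolset internally, but a quick pure-Python sketch (hypothetical tool names, not Agno's real code) shows why it matters whether tools from different servers are prefixed. An unprefixed name collision can silently hide one server's tool from the model:

```python
# Illustrative sketch (NOT Agno's actual implementation): when tools from
# several MCP servers are flattened into the single tool list the model sees,
# a name collision can silently drop one server's tool. Prefixing tool names
# with the server name is one common way to keep them distinct.

def merge_tools(servers: dict[str, list[str]], prefix: bool = False) -> dict[str, str]:
    """Map each exposed tool name to the server that provides it."""
    merged: dict[str, str] = {}
    for server, tools in servers.items():
        for tool in tools:
            name = f"{server}.{tool}" if prefix else tool
            merged[name] = server  # without prefixes, a later server overwrites
    return merged

# Hypothetical tool names, chosen only to force a collision:
servers = {
    "playwright": ["navigate", "screenshot", "read_file"],
    "filesystem": ["read_file", "write_file"],
}

flat = merge_tools(servers)             # collision: only one read_file survives
namespaced = merge_tools(servers, prefix=True)

print(len(flat))          # 4 tools visible instead of 5
print(len(namespaced))    # 5
print(flat["read_file"])  # "filesystem" won the collision
```

If the model only ever "sees" four tools, no amount of prompting will make it call the fifth, so this is worth ruling out before blaming the model.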
Additional Context
The documentation for Mistral Small 3.1 on Ollama is quite sparse, and I haven’t been able to find clear guidance on whether cross-server tool calling is supported. I’ve reviewed Agno’s documentation (docs.agno.com) and examples like multiple_servers.py, but I’m still unsure if my expectations align with the model’s capabilities.
Any insights, suggestions, or confirmations would be greatly appreciated! Thanks in advance for your help.
Hey @voidnovember
Thanks for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!
My problem is that I can’t quite tell where things are breaking down. I’m fairly certain Agno is doing everything it should be, and the problem is on the model side. I’ve experimented with basically all of the tool-supporting models available on Ollama that will fit on my RTX 4090. I expected Mistral to do well, but it just doesn’t seem to make tool calls when I expect it to (it sometimes even appears to believe it has made a call when it hasn’t).
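One way to localize this kind of failure is to bypass the agent framework and inspect the raw response from Ollama's /api/chat endpoint: if `message.tool_calls` is empty while the text claims a tool was used, the model hallucinated the call and the framework is off the hook. A sketch of that check against captured responses (both sample payloads below are hand-written illustrations, not real model output):

```python
# Sketch: distinguish a real tool call from a hallucinated one in an
# Ollama /api/chat response. Ollama puts structured calls under
# message.tool_calls; prose claiming success is not a tool call.

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) pairs the model actually emitted."""
    message = response.get("message", {})
    return [
        (call["function"]["name"], call["function"].get("arguments", {}))
        for call in message.get("tool_calls", [])
    ]

# A response where the model *claims* success but emitted no tool_calls:
hallucinated = {"message": {"role": "assistant",
                            "content": "I have written the file to disk."}}

# A response with a genuine structured tool call:
real = {"message": {"role": "assistant", "content": "",
                    "tool_calls": [{"function": {
                        "name": "write_file",
                        "arguments": {"path": "/tmp/out.txt"}}}]}}

print(extract_tool_calls(hallucinated))  # [] -> the call never happened
print(extract_tool_calls(real))          # [('write_file', {'path': '/tmp/out.txt'})]
```

Logging this for every turn makes it obvious whether the second server's tool call was never emitted at all, or was emitted and then mishandled downstream.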
As another data point I used gemini-2.0-flash, and that worked.
I have increased the context window to 16k (and even 32k) by setting num_ctx as a model parameter and rebuilding the model with Ollama. It’s just very difficult to diagnose these problems concretely, and I’m curious what techniques you recommend for doing so. I’m an experienced computer engineer, and agents seem to me to have tremendous potential, but the nondeterminism is difficult to contend with.
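For concreteness, the num_ctx rebuild I describe above looks roughly like this (the model tag is whatever your local install uses; mine is illustrative):

```
# Modelfile
FROM mistral-small3.1
PARAMETER num_ctx 16384
```

followed by `ollama create mistral-small-16k -f Modelfile`, and then pointing the Agno agent at the new `mistral-small-16k` tag instead of the base model.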
Will do, that definitely mirrors my experience. It seems Gemini 2.0 Flash, as well as the newly released gpt-4.1-nano, handles the task without issue.
I’m pretty sure Llama 3.1 worked, but not completely reliably. I think I tried QwQ and it failed, though I’m not sure I was using an extended context. I’ll report back, thank you.
That’s surprising, because Mistral is specifically advertised as supporting tool calling.
Where did you end up? Which LLM did you finally settle on, or are you still looking, trying, and testing? We’ve found that there are issues with repeatability: you can get two different results simply by making a second run without any changes. And yes, memory and context window can play a part. The completeness of the prompts also plays a big role; if there is any ambiguity about what something means or how to do it, you can get wildly different results from LLM to LLM, and even across repeated runs. One other surprising finding is how much influence the system and GPU running the LLM have. It can all be very frustrating if you are trying to create some sort of production system or service, but extremely interesting if you are doing basic research. And as Agno brings out new releases, those too influence the results.
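A small note on the repeatability point above: Ollama requests accept a seed alongside temperature in the options object, which can make repeated runs of the same prompt on the same machine much more consistent (it does not guarantee determinism across different GPUs or builds). An illustrative /api/chat request body:

```
{
  "model": "mistral-small3.1",
  "messages": [{"role": "user", "content": "..."}],
  "options": {"temperature": 0, "seed": 42}
}
```

The same options can also be baked into a Modelfile with `PARAMETER temperature` and `PARAMETER seed` if you prefer a rebuilt tag over per-request settings.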