I’m currently working on integrating Mistral Small 3.1 (running locally via Ollama) with Agno, and I’m using the MultiMCPTools functionality to interact with multiple MCP servers. My goal is to make tool calls to two different MCP servers within a single request to the model—specifically:
1. Call the playwright server to retrieve some data.
2. Call the filesystem server to write the retrieved data to the filesystem.
What I’ve Observed
The model can successfully make multiple, different tool calls to a single MCP server (e.g., multiple calls to the playwright server).
However, when I attempt to make a second tool call to a different MCP server (e.g., server-filesystem), the model doesn’t initiate the second call at all.
My Setup
Model: Mistral Small 3.1 (running via Ollama)
Tooling: Agno agent configured with MultiMCPTools
Servers: Two MCP servers (playwright and server-filesystem)
My Questions
Is this behavior expected given my current tooling and setup?
Should Mistral Small 3.1, when running via Ollama, be capable of making tool calls to multiple MCP servers within a single request?
Or is this functionality unsupported in this configuration?
If this behavior is not expected, what might be causing the issue?
Could it be related to how Agno interfaces with Ollama or how tools are presented/namespaced?
Are there any specific prompt engineering techniques or configuration changes I should try to enable cross-server tool calling?
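On the namespacing question: I don't know how Agno presents the merged toolset internally, but a quick pure-Python sketch (hypothetical tool names, not Agno's real code) shows why it matters whether tools from different servers are prefixed. An unprefixed name collision can silently hide one server's tool from the model:

```python
# Illustrative sketch (NOT Agno's actual implementation): when tools from
# several MCP servers are flattened into the single tool list the model sees,
# a name collision can silently drop one server's tool. Prefixing tool names
# with the server name is one common way to keep them distinct.

def merge_tools(servers: dict[str, list[str]], prefix: bool = False) -> dict[str, str]:
    """Map each exposed tool name to the server that provides it."""
    merged: dict[str, str] = {}
    for server, tools in servers.items():
        for tool in tools:
            name = f"{server}.{tool}" if prefix else tool
            merged[name] = server  # without prefixes, a later server overwrites
    return merged

# Hypothetical tool names, chosen only to force a collision:
servers = {
    "playwright": ["navigate", "screenshot", "read_file"],
    "filesystem": ["read_file", "write_file"],
}

flat = merge_tools(servers)             # collision: only one read_file survives
namespaced = merge_tools(servers, prefix=True)

print(len(flat))          # 4 tools visible instead of 5
print(len(namespaced))    # 5
print(flat["read_file"])  # "filesystem" won the collision
```

If the model only ever "sees" four tools, no amount of prompting will make it call the fifth, so this is worth ruling out before blaming the model.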
Additional Context
The documentation for Mistral Small 3.1 on Ollama is quite sparse, and I haven’t been able to find clear guidance on whether cross-server tool calling is supported. I’ve reviewed Agno’s documentation (docs.agno.com) and examples like multiple_servers.py, but I’m still unsure if my expectations align with the model’s capabilities.
Any insights, suggestions, or confirmations would be greatly appreciated! Thanks in advance for your help.
Hey @voidnovember
Thanks for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!
My problem is that I can’t quite tell where things are breaking down. I’m fairly certain Agno is doing everything it should be, and the problem is on the model side. I’ve experimented with basically all of the tool-supporting models available on Ollama that will fit on my RTX 4090. I expected Mistral to do well, but it just doesn’t seem to make tool calls when I expect it to (it sometimes even appears to believe it has made a call when it hasn’t).
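One way to localize this kind of failure is to bypass the agent framework and inspect the raw response from Ollama's /api/chat endpoint: if `message.tool_calls` is empty while the text claims a tool was used, the model hallucinated the call and the framework is off the hook. A sketch of that check against captured responses (both sample payloads below are hand-written illustrations, not real model output):

```python
# Sketch: distinguish a real tool call from a hallucinated one in an
# Ollama /api/chat response. Ollama puts structured calls under
# message.tool_calls; prose claiming success is not a tool call.

def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) pairs the model actually emitted."""
    message = response.get("message", {})
    return [
        (call["function"]["name"], call["function"].get("arguments", {}))
        for call in message.get("tool_calls", [])
    ]

# A response where the model *claims* success but emitted no tool_calls:
hallucinated = {"message": {"role": "assistant",
                            "content": "I have written the file to disk."}}

# A response with a genuine structured tool call:
real = {"message": {"role": "assistant", "content": "",
                    "tool_calls": [{"function": {
                        "name": "write_file",
                        "arguments": {"path": "/tmp/out.txt"}}}]}}

print(extract_tool_calls(hallucinated))  # [] -> the call never happened
print(extract_tool_calls(real))          # [('write_file', {'path': '/tmp/out.txt'})]
```

Logging this for every turn makes it obvious whether the second server's tool call was never emitted at all, or was emitted and then mishandled downstream.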
As another data point I used gemini-2.0-flash, and that worked.
I have increased the context window to 16k (and even 32k) by setting num_ctx as a model parameter and rebuilding the model with Ollama. It’s just very difficult to diagnose these problems concretely, and I’m curious what techniques you recommend for doing so. I’m an experienced computer engineer, and agents seem to me to have tremendous potential, but the nondeterminism is difficult to contend with.
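For concreteness, the num_ctx rebuild I describe above looks roughly like this (the model tag is whatever your local install uses; mine is illustrative):

```
# Modelfile
FROM mistral-small3.1
PARAMETER num_ctx 16384
```

followed by `ollama create mistral-small-16k -f Modelfile`, and then pointing the Agno agent at the new `mistral-small-16k` tag instead of the base model.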
Will do, that definitely mirrors my experience. It seems Gemini 2.0 Flash, as well as the newly released gpt-4.1-nano, handles the task without issue.
I’m pretty sure Llama 3.1 worked, but not completely reliably. I think I tried QwQ and it failed, though I’m not sure I was using an extended context. I’ll report back, thank you.
That’s surprising, because Mistral is specifically advertised as supporting tool calling.
Where did you end up? Which LLM did you finally settle on, or are you still looking, trying, and testing? We’ve found that there are issues with repeatability: you can get two different results simply by making a second run without any changes. And yes, memory and context window can play a part. The completeness of the prompts also plays a big role; if there is any ambiguity about what something means or how to do it, you can get wildly different results from LLM to LLM, and even across repeated runs. One other surprising finding is how much influence the system and GPU running the LLM have. It can all be very frustrating if you are trying to create some sort of production system or service, but extremely interesting if you are doing basic research. And as Agno brings out new releases, those too influence the results.
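A small note on the repeatability point above: Ollama requests accept a seed alongside temperature in the options object, which can make repeated runs of the same prompt on the same machine much more consistent (it does not guarantee determinism across different GPUs or builds). An illustrative /api/chat request body:

```
{
  "model": "mistral-small3.1",
  "messages": [{"role": "user", "content": "..."}],
  "options": {"temperature": 0, "seed": 42}
}
```

The same options can also be baked into a Modelfile with `PARAMETER temperature` and `PARAMETER seed` if you prefer a rebuilt tag over per-request settings.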