Is Cross-Server Tool Calling Expected Behavior with Mistral Small 3.1 in Ollama?

Hi everyone,

I’m currently working on integrating Mistral Small 3.1 (running locally via Ollama) with Agno, and I’m using the MultiMCPTools functionality to interact with multiple MCP servers. My goal is to make tool calls to two different MCP servers within a single request to the model (a sketch of my setup follows the list below)—specifically:

  1. Call to the playwright server: Retrieve some data.
  2. Call to the filesystem server: Write the retrieved data to the filesystem.
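Here’s a minimal sketch of my setup, adapted from Agno’s multiple_servers.py example. The server commands, the filesystem path, and the Ollama model tag are stand-ins for my actual values, and the exact MultiMCPTools signature may differ between Agno versions:

```python
import asyncio

from agno.agent import Agent
from agno.models.ollama import Ollama
from agno.tools.mcp import MultiMCPTools


async def run_agent(message: str) -> None:
    # Launch both MCP servers; the commands and path are placeholders.
    async with MultiMCPTools(
        [
            "npx -y @playwright/mcp@latest",
            "npx -y @modelcontextprotocol/server-filesystem /tmp/scratch",
        ]
    ) as mcp_tools:
        agent = Agent(
            model=Ollama(id="mistral-small3.1"),
            tools=[mcp_tools],
            show_tool_calls=True,
        )
        await agent.aprint_response(message)


if __name__ == "__main__":
    asyncio.run(
        run_agent(
            "Get the title of https://example.com with playwright, "
            "then write it to a file named title.txt"
        )
    )
```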

What I’ve Observed

  • The model can successfully make multiple, different tool calls to a single MCP server (e.g., multiple calls to the playwright server).
  • However, when I attempt to make a second tool call to a different MCP server (e.g., server-filesystem), the model doesn’t initiate the second call at all.

My Setup

  • Model: Mistral Small 3.1 (running via Ollama)
  • Tooling: Agno agent configured with MultiMCPTools
  • Servers: Two MCP servers (playwright and server-filesystem)

My Questions

  1. Is this behavior expected given my current tooling and setup?

    • Should Mistral Small 3.1, when running via Ollama, be capable of making tool calls to multiple MCP servers within a single request?
    • Or is this functionality unsupported in this configuration?
  2. If this behavior is not expected, what might be causing the issue?

    • Could it be related to how Agno interfaces with Ollama or how tools are presented/namespaced?
  3. Are there any specific prompt engineering techniques or configuration changes I should try to enable cross-server tool calling?

Additional Context

The documentation for Mistral Small 3.1 on Ollama is quite sparse, and I haven’t been able to find clear guidance on whether cross-server tool calling is supported. I’ve reviewed Agno’s documentation (docs.agno.com) and examples like multiple_servers.py, but I’m still unsure if my expectations align with the model’s capabilities.

Any insights, suggestions, or confirmations would be greatly appreciated! Thanks in advance for your help.

Hey @voidnovember, thanks for reaching out and supporting Agno. I’ve shared this with the team; we’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!


My problem is that I can’t quite tell where things are breaking down. I’m fairly certain Agno is doing everything it should, and that the problem is on the model side. I’ve experimented with essentially every tool-capable model available on Ollama that fits on my RTX 4090. I expected Mistral to do well, but it just doesn’t seem to make tool calls when I expect it to (or it even seems to believe it has made a call when it hasn’t).

As another data point, I used gemini-2.0-flash, and that worked.

I have increased the context window to 16k (and even 32k) by setting num_ctx as a model parameter and rebuilding the model with Ollama. It’s just very difficult to diagnose these problems concretely, so I’m curious what techniques you’d recommend for doing so. I’m an experienced computer engineer, and agents seem to me to have tremendous potential, but the nondeterminism is difficult to contend with.
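In case it helps anyone reproduce this: the same setting can also be passed per request through the Ollama Python client, which avoids rebuilding the model with a Modelfile. A quick sketch (the model tag is a stand-in for whatever `ollama list` shows locally):

```python
import ollama

# Per-request context-window override: equivalent to putting
# `PARAMETER num_ctx 16384` in a Modelfile and running `ollama create`.
response = ollama.chat(
    model="mistral-small3.1",  # stand-in tag; use the one you pulled
    messages=[{"role": "user", "content": "Say hello."}],
    options={"num_ctx": 16384},
)
print(response["message"]["content"])
```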

Hi @voidnovember

In our experience, Mistral models, especially Small 3.1, do not always perform well with tool calling.

Can you please try running your agent with Llama or Qwen and let us know if that improves your experience?
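For example, a sketch along these lines (the Ollama tags here are assumptions, so substitute whatever you have pulled; `mcp_tools` is the MultiMCPTools instance from the setup sketch above):

```python
from agno.agent import Agent
from agno.models.ollama import Ollama

# Same agent, different model: only the Ollama tag changes.
agent = Agent(model=Ollama(id="llama3.1"), tools=[mcp_tools])
# or: Agent(model=Ollama(id="qwen2.5"), tools=[mcp_tools])
```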


Will do; that definitely mirrors my experience. It seems both Gemini 2.0 Flash and the newly released gpt-4.1-nano handle the task without issue.

I’m pretty sure Llama 3.1 worked, though not completely reliably. I think I tried QwQ and it failed, but I’m not sure I was using an extended context. I’ll report back. Thank you!

Surprising, because Mistral seemed to be advertised as specifically supporting tool calling.