Audio generation using Gemini TTS models

Prajwal · August 20, 2025, 2:48pm

I tried, but I'm getting errors. I also can't find any example for Google's TTS models in the cookbook or in the Agno docs.

```python
from pathlib import Path

import requests
from agno.agent import Agent
from agno.media import Audio
from agno.models.google import Gemini
from agno.utils.audio import write_audio_to_file
from dotenv import load_dotenv

load_dotenv()

# Fetch a sample WAV recording as raw bytes
url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-preview-tts",
        speech_config="Kore",  # attempted voice setting
    ),
    markdown=True,
)

# agent.run() returns the run response, which carries any generated audio
run_response = agent.run(
    "What's in this recording?", audio=[Audio(content=wav_data, format="wav")]
)

if run_response.response_audio is not None:
    Path("tmp").mkdir(exist_ok=True)  # make sure the output folder exists
    write_audio_to_file(
        audio=run_response.response_audio.content, filename="tmp/result.wav"
    )
```
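A hedged note on the saving step: according to Google's Gemini API docs, `gemini-2.5-flash-preview-tts` accepts text-only input and returns raw signed 16-bit PCM (mono, 24 kHz) rather than a finished WAV file. So sending a WAV recording in, and writing the returned bytes straight to `result.wav`, may not give a playable file. Whether Agno's `write_audio_to_file` adds a WAV header is not confirmed here. The stdlib-only sketch below wraps raw PCM bytes in a standard WAV header; the helper names `pcm_to_wav_bytes` and `save_pcm_as_wav` are hypothetical, and the sample rate, channel count, and sample width are assumptions taken from Google's docs.

```python
import io
import wave
from pathlib import Path

# Assumption (per Google's Gemini TTS docs): output is raw signed 16-bit
# little-endian PCM, mono, at 24 kHz.
SAMPLE_RATE = 24_000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def pcm_to_wav_bytes(pcm: bytes, rate: int = SAMPLE_RATE,
                     channels: int = CHANNELS,
                     sample_width: int = SAMPLE_WIDTH) -> bytes:
    """Prepend a standard WAV (RIFF) header to raw PCM samples."""
    buf = io.BytesIO()
    # wave does not close file objects it did not open, so buf stays usable.
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def save_pcm_as_wav(pcm: bytes, filename: str) -> None:
    """Write raw PCM to `filename` as a playable .wav, creating parent dirs."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pcm_to_wav_bytes(pcm))
```

Usage: `save_pcm_as_wav(pcm, "tmp/result.wav")`, where `pcm` holds the raw audio bytes after any base64 decoding. The header adds a fixed 44 bytes, so the result is only playable if the rate, channel, and width values match what the model actually produced.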

I also tried the `response_modalities=["text", "audio"]` parameter, but I think Gemini only supports text and image there. Passing `speech_config={"voice": "Kore", "format": "wav"}` or `audio={"voice": "Kore", "format": "wav"}` doesn't work either.
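For reference while Agno support was pending: Google's REST API for `gemini-2.5-flash-preview-tts` expects the output modality as uppercase `AUDIO`, and the voice nested under `speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName`, which is why flat `{"voice": ..., "format": ...}` dicts don't map onto it. Google's docs describe no output-format field; the audio comes back base64-encoded under `inlineData.data`. The sketch below calls the REST endpoint directly with the standard library, bypassing Agno entirely. The field names and endpoint follow Google's Gemini API docs as best understood here (treat them as assumptions), and `build_tts_request`, `extract_pcm`, and `synthesize` are hypothetical helpers, not Agno APIs.

```python
import base64
import json
import urllib.request

# Assumption: model id and REST field names follow Google's Gemini API docs
# for generateContent; none of these are Agno parameters.
TTS_MODEL = "gemini-2.5-flash-preview-tts"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{TTS_MODEL}:generateContent"
)


def build_tts_request(text: str, voice_name: str = "Kore") -> dict:
    """Build a request body asking a Gemini TTS model to speak `text`."""
    return {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            # Request audio output; the modality name is uppercase.
            "responseModalities": ["AUDIO"],
            # The voice is nested, not a flat {"voice": ...} key.
            "speechConfig": {
                "voiceConfig": {
                    "prebuiltVoiceConfig": {"voiceName": voice_name}
                }
            },
        },
    }


def extract_pcm(response_json: dict) -> bytes:
    """Decode the base64 audio from the first candidate of a response."""
    part = response_json["candidates"][0]["content"]["parts"][0]
    return base64.b64decode(part["inlineData"]["data"])


def synthesize(text: str, api_key: str, voice_name: str = "Kore") -> bytes:
    """POST the request and return raw PCM bytes (needs network + API key)."""
    body = json.dumps(build_tts_request(text, voice_name)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_pcm(json.load(resp))
```

Example call: `pcm = synthesize("Say cheerfully: Have a wonderful day!", api_key="...")`. The returned bytes are raw PCM with no container, so most players will need a WAV header added before they can open them.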

Monali · August 21, 2025, 6:05am

Hi @Prajwal, thanks for reaching out and supporting Agno. I've shared this with the team; we're working through all requests one by one and will get back to you soon. If it's urgent, please let us know. We appreciate your patience!

mustafa · August 22, 2025, 4:12pm

Hey @Prajwal! We are adding support for TTS models soon. We'll send updates here.

mustafa · September 10, 2025, 8:26pm

Hey @Prajwal! We've added support for TTS, and it will ship in our next release.
