Real-Time cascading Speech-to-Speech Chatbot: Whisper, Agno (Llama 3.1), Kokoro, and Silero VAD 🚀

Hey everyone! I just launched Vocal-Agent — an open-source, real-time cascading speech-to-speech chatbot!

It combines:

  • Whisper for speech recognition
  • Silero VAD for voice activity detection
  • Extensible agent framework powered by Agno
  • Kokoro ONNX for natural voice synthesis

Key features include:

  • Ultra low-latency audio processing
  • Seamless web integration (Google Search, Wikipedia, Arxiv)

It’s built for smooth, real-time conversations and is easy to extend for custom applications.

:backhand_index_pointing_right: Check it out on GitHub: GitHub - tarun7r/Vocal-Agent: A cutting-edge Cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.

Would love to hear your thoughts, feedback, and ideas!

1 Like

@martian7r This is awesome — huge congrats on the launch! :tada:
I’ll definitely share this with the team.

If you’ve posted about it on social media, feel free to share the link — we’d be happy to boost it from our side too!

Also, you should totally submit Vocal-Agent to our Global Agent Hackathon — there’s a $25k prize pool up for grabs! :money_bag: We’ll be giving feedback along the way to help you level it up even more. Would love to see this in the mix!

And if you haven’t already, join the Agno Discord to connect with fellow builders working on similar projects.

Thanks so much! :tada: I’ll send over the link once it’s live.
The hackathon sounds awesome — I’ll definitely check it out! Just joined the Discord too. Appreciate all the support!

1 Like