RAG multimodal (TEXT, IMAGE, TABLES)

sathishkumar.chin · July 1, 2025, 8:48pm

Hi Team
I am trying to build a multimodal RAG using Ollama scout and the Ollama nomic text embedder.
Results are not good when too many images are involved. Any suggestions on which open-source embedder to use to get better results?

Regards
Sathish

Monali · July 2, 2025, 3:48am

Hey @sathishkumar.chin, thanks for reaching out and supporting Agno. I’ve shared this with the team. We’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!

monalisha · July 2, 2025, 6:37am

Hi @sathishkumar.chin, you can try these options instead.

CLIP via SentenceTransformerEmbedder
Great for image–text alignment. You can use a model like sentence-transformers/clip-ViT-B-32 to embed both image captions and text in the same vector space.
BLIP-2 + Text Embedder (Two-stage)
Use BLIP-2 or GritCaption to generate detailed captions for images, then embed those captions using a strong text model like bge-small-en-v1.5 via FastEmbedEmbedder or SentenceTransformerEmbedder.
Thanks
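
For illustration, here is a minimal sketch of the two-stage idea above: caption each image, then embed captions and text chunks with the same text embedder so both land in one searchable index. Everything named here is hypothetical and not part of Agno. `caption_image` stands in for a captioning model such as BLIP-2, `embed_text` stands in for a text embedder such as bge-small-en-v1.5, and the `toy_*` helpers exist only so the sketch runs without any model downloads.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Entry = Tuple[str, str, Vector]  # (kind, reference, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(
    text_chunks: Sequence[str],
    image_paths: Sequence[str],
    caption_image: Callable[[str], str],
    embed_text: Callable[[str], Vector],
) -> List[Entry]:
    """Embed text chunks directly and images via their captions."""
    index: List[Entry] = []
    for chunk in text_chunks:
        index.append(("text", chunk, embed_text(chunk)))
    for path in image_paths:
        # Stage 1: image -> caption. Stage 2: caption -> text embedding.
        index.append(("image", path, embed_text(caption_image(path))))
    return index


def search(
    index: Sequence[Entry],
    query: str,
    embed_text: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, kind, reference) hits for the query."""
    query_vec = embed_text(query)
    scored = [(cosine(query_vec, vec), kind, ref) for kind, ref, vec in index]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:top_k]


# --- Toy stand-ins (assumptions for the demo, not real models) ---
VOCAB = ["revenue", "chart", "quarterly", "cat", "sofa", "policy"]
TOY_CAPTIONS = {
    "fig1.png": "quarterly revenue chart",
    "pet.jpg": "cat on a sofa",
}


def toy_embed(text: str) -> Vector:
    """Bag-of-words counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def toy_caption(path: str) -> str:
    """Look up a hard-coded caption instead of running a captioning model."""
    return TOY_CAPTIONS[path]
```

With the toy helpers, `search(build_index(["refund policy for customers"], ["fig1.png", "pet.jpg"], toy_caption, toy_embed), "revenue chart", toy_embed, top_k=1)` ranks `fig1.png` first. Because every vector comes from the same embedder, text and image entries always share one dimension and one vector space, which is what lets them be stored in a single table.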

sathishkumar.chin · September 7, 2025, 8:01am

Thank you! I tried it, and it works when I use the image knowledge base or the text knowledge base separately, but not when I use a combined knowledge base (image + text). Both are 768-dimensional.

combined_kb = CombinedKnowledgeBase(
    sources=[text_knowledge_base, image_knowledge_base]
)
# print(f"Combined KBAgentKnowledge created: {combined_kb}")

# Create the Agent
return Agent(
    name="agentic_rag_agent",
    session_id=session_id,  # Track session ID for persistent conversations
    user_id=user_id,
    model=model,
    memory=memory,
    storage=PostgresStorage(
        table_name="agentic_rag_agent_sessions", db_url=db_url
    ),  # Persist session data
    knowledge=combined_kb,  # Use combined knowledge base only

I get no results from RAG. Sometimes I get a warning like "no vector db defined" when I use the combined KB. Can you please help me here?

monalisha · September 10, 2025, 6:07am

Hi @sathishkumar.chin, you will have to define vector_db along with sources.
You can refer to the following example for more details: Combined Knowledge Base - Agno
Thanks
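
A rough sketch of what that could look like, for readers who want a starting point. The import paths, the `PgVector` parameters, and the `load(recreate=False)` call follow the Agno 1.x documentation example as best recalled, so treat them as assumptions and check them against the linked page for your installed version. The helper name `build_combined_kb` and the table name `combined_documents` are made up for illustration.

```python
def build_combined_kb(text_kb, image_kb, db_url, embedder=None):
    """Wrap two existing knowledge bases in one CombinedKnowledgeBase
    that has its own vector_db (hypothetical helper, not an Agno API)."""
    # Local imports keep this sketch importable without Agno installed.
    # Module paths are assumed from the Agno 1.x docs.
    from agno.knowledge.combined import CombinedKnowledgeBase
    from agno.vectordb.pgvector import PgVector

    vector_db_kwargs = {
        "table_name": "combined_documents",  # hypothetical table name
        "db_url": db_url,
    }
    if embedder is not None:
        # Assumption: pass the same 768-dimensional embedder the sources
        # use, so stored vectors and query vectors share one dimension.
        vector_db_kwargs["embedder"] = embedder

    return CombinedKnowledgeBase(
        sources=[text_kb, image_kb],
        vector_db=PgVector(**vector_db_kwargs),
    )


# Usage sketch (needs Agno and a reachable Postgres with pgvector):
# combined_kb = build_combined_kb(text_knowledge_base, image_knowledge_base, db_url)
# combined_kb.load(recreate=False)  # populate the combined table before querying
```

If the combined table is never loaded, searches can still come back empty even after the warning disappears, so it is worth confirming that rows exist in the table before debugging the agent itself.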
