Techniques to Speed Up Ingesting 3 Million Records

Hi agno Team,

I’m in the process of loading a JSON dataset of roughly 3 million documents into our PostgreSQL/vector store via JSONKnowledgeBase and PgVector. At this scale, the ingestion is taking much longer than is practical.

If you have any methods, best practices, or configuration tweaks that can significantly accelerate this bulk import—whether through parallelization, specialized bulk‐load paths, client settings, or other optimizations—I’d greatly appreciate any pointers or examples you can share.

Thank you for your help!

@Monali
Hello Monali, would it be possible for you to transfer this ticket to your engineering team? Thanks in advance for your assistance.

Hey @Kenniferm, thanks for reaching out and supporting Agno. I’ve shared this with the team. We’re working through all requests one by one and will get back to you soon.
If it’s urgent, please let us know. We appreciate your patience!

Hi @Kenniferm! The simplest speed-up would be to use async ingestion. Here is a small example snippet:

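    import asyncio

    from agno.knowledge.json import JSONKnowledgeBase
    from agno.vectordb.pgvector import PgVector, SearchType

    pg = PgVector(
        table_name="big_corpus",
        db_url="",  # your PostgreSQL connection string
        search_type=SearchType.vector,
        vector_index=None,  # intended to skip building a vector index during the bulk load
    )

    kb = JSONKnowledgeBase(
        path="/data/large_corpus/",  # directory containing the JSON files
        vector_db=pg,
        num_documents=5,  # documents returned per search; does not affect ingestion
    )

    async def main():
        # aload is the async counterpart of load
        await kb.aload(
            recreate=True,       # drop and recreate the table before loading
            skip_existing=True,  # skip documents that are already stored
        )

    if __name__ == "__main__":
        asyncio.run(main())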

Hope this helps!
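
For context, here is a rough, library-independent sketch of the pattern that async ingestion relies on: the corpus is split into batches, and a bounded number of batches are in flight at once so that time spent waiting on embedding or database I/O overlaps. It uses only the standard library. `insert_batch`, `make_batches`, `ingest`, `batch_size`, and `max_concurrency` are hypothetical names for illustration, not part of agno, and the in-memory list merely stands in for the vector store.

```python
import asyncio


async def insert_batch(batch, store):
    # Hypothetical stand-in for "embed and insert one batch"; not an agno API.
    await asyncio.sleep(0)  # simulates waiting on network or database I/O
    store.extend(batch)


def make_batches(items, size):
    # Split the corpus into fixed-size chunks (the last one may be shorter).
    return [items[i:i + size] for i in range(0, len(items), size)]


async def ingest(items, batch_size=1000, max_concurrency=8):
    store = []  # in-memory placeholder for the real vector store
    limit = asyncio.Semaphore(max_concurrency)  # caps the number of in-flight batches

    async def worker(batch):
        async with limit:
            await insert_batch(batch, store)

    # Schedule every batch; the semaphore keeps concurrency bounded.
    await asyncio.gather(*(worker(b) for b in make_batches(items, batch_size)))
    return store


if __name__ == "__main__":
    docs = [{"id": i} for i in range(10_000)]
    stored = asyncio.run(ingest(docs))
    print(len(stored))
```

In a real pipeline the right values for `batch_size` and `max_concurrency` depend on the embedding service's rate limits and the database's connection pool, so they are best found by measuring.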

@mustafa @Monali

Hi,

Frankly, it feels like you’re just running ChatGPT on autopilot and tossing out examples without ever verifying they work. It’s painfully obvious there is no async_read method in your JSONReader, so your “async ingestion” snippet is fundamentally broken. Could you point out a clear solution, please?

Hi @Kenniferm! I verified in the codebase, as well as with the team, that we do have “async ingestion”. You can refer to this example: agno/cookbook/agent_concepts/knowledge/json_kb_async.py at main · agno-agi/agno · GitHub

Another general tip: try updating to the latest version of agno and run this.

You could combine this with the techniques I mentioned above to speed up ingestion even more!