Talk with code kind of agent

I have the following case: a huge codebase with, let's say, approximately 10k source files. I want to build an AI agent that answers “How does something work?” or “Is something supported by the system?” kinds of questions.

Given that I can’t just load the entire codebase into a model because of context limitations:

Do I need a RAG?
Do I need a vector database or not?
Do I only have to store the general code structure, i.e. something like a repository map, similar to what https://ctags.io does?

How should I approach this, and what would an Agno flow look like?

Thanks in advance

Hi @dishev
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 48 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon! :smile:

Hi @dishev, you can upload the codebase in any format into our knowledgeBase and perform RAG on it using our agents.
You can refer to our documentation for more details.

Well, is this the best approach? If I upload the whole codebase contents into a vector database, what about the code changes that happen every day? Would I probably need to reindex the whole db?
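On the reindexing worry, one common way to avoid a full rebuild is to keep a content hash per file and only re-embed what changed. Here is a minimal sketch of that idea, assuming Python sources and the standard library only. The names `file_digest` and `diff_index` are hypothetical, not part of Agno, and the actual upsert/delete calls against the vector database are left out because they depend on the store you pick.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_index(root: Path, previous: dict[str, str], pattern: str = "*.py"):
    """Compare the files under `root` with the digests saved on the last run.

    Returns (current, changed, removed):
      current: mapping of relative path -> digest, to persist for the next run
      changed: new or modified files that need (re)embedding
      removed: files whose vectors should be deleted from the store
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob(pattern)
        if p.is_file()
    }
    changed = sorted(k for k, v in current.items() if previous.get(k) != v)
    removed = sorted(k for k in previous if k not in current)
    return current, changed, removed
```

On each run you would load the previous digests (for example from a JSON file), call `diff_index`, re-embed only `changed`, delete `removed`, and save `current`. With daily commits touching a small share of 10k files, that should keep the update small.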

Isn’t it an option to first make a map of the code without the whole content, then analyze the question to select which files will be needed from the codebase to answer the question, load them into the AI model context, and then run the question?
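To make the idea concrete, here is a rough sketch of that map-then-select flow, assuming the codebase is Python and using only the standard library. `build_repo_map`, `tokenize`, and `select_files` are hypothetical helper names, and the keyword-overlap scoring is a deliberately naive stand-in for whatever selection step (an LLM call, embeddings over the map, ctags output) would be used in practice.

```python
import ast
import re
from pathlib import Path


def build_repo_map(root: Path) -> dict[str, list[str]]:
    """Map each Python file to the function and class names it defines."""
    repo_map: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        symbols = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        repo_map[path.relative_to(root).as_posix()] = symbols
    return repo_map


def tokenize(text: str) -> set[str]:
    """Split text into lowercase words, breaking snake_case and CamelCase."""
    parts = re.findall(r"[A-Za-z][a-z0-9]*", text)
    return {p.lower() for p in parts if len(p) > 2}


def select_files(repo_map: dict[str, list[str]], question: str, top_k: int = 5) -> list[str]:
    """Rank files by how many question words appear in their path or symbols."""
    wanted = tokenize(question)
    scored = []
    for path, symbols in repo_map.items():
        score = len(wanted & tokenize(path + " " + " ".join(symbols)))
        if score > 0:
            scored.append((-score, path))
    return [path for _, path in sorted(scored)[:top_k]]
```

The selected files would then be read and placed in the model's context together with the question. The map itself holds only paths and names, so it stays small and is cheap to rebuild after each change.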

Hi @dishev This is a cool use case!
I like the idea of first creating a map and then using it to index your files, but I am curious how you would know with certainty that you are identifying the correct files when selecting them from the map. As you said, code changes daily, so it's very possible that a map created today will not be relevant tomorrow.