Retrieve Reranking Score

Kenniferm · March 11, 2025, 9:03am

Dear Agno Support Team,

I am currently using the Agno library to search internal documents for Company X. My goal is to first retrieve 100 candidate documents from our combined PDF and CSV knowledge base, then group the results by file—so that each file appears only once (selecting the entry with the highest score)—and finally narrow down the results to the top 5 entries.

I have already investigated the issue by reviewing the Document structure and the CohereReranker implementation. In our earlier internal discussion, I confirmed that each Document instance should include a “reranking_score” attribute (populated by the reranker), yet when I run my code, the scores remain as None.

Below is a simplified version of my code (with sensitive names and file paths anonymized):

from pathlib import Path
from agno.agent import Agent
from agno.knowledge.pdf import PDFKnowledgeBase, PDFReader
from agno.knowledge.csv import CSVKnowledgeBase
from agno.knowledge.combined import CombinedKnowledgeBase
from agno.vectordb.pgvector import PgVector, SearchType
from agno.models.openai import OpenAIChat
from agno.embedder.openai import OpenAIEmbedder
from agno.reranker.cohere import CohereReranker

DATABASE_URL = "DB_URL"

agent_description = "You are an agent that searches internal documents for Company X."

# PDF Knowledge Base: Load PDF files from a local directory.
pdf_kb = PDFKnowledgeBase(
    path="/path/to/pdf_files",
    vector_db=PgVector(
        table_name="pdf_documents",
        db_url=DATABASE_URL
    ),
    reader=PDFReader(chunk=True),
)

# CSV Knowledge Base: Load a CSV file from a specified path.
csv_kb = CSVKnowledgeBase(
    path=Path("/path/to/filtered_qast.csv"),
    vector_db=PgVector(
        table_name="csv_documents",
        db_url=DATABASE_URL
    ),
)

# Combined Knowledge Base: Combine both PDF and CSV sources.
knowledge_base = CombinedKnowledgeBase(
    sources=[
        pdf_kb,
        csv_kb,
    ],
    vector_db=PgVector(
        table_name="combined_documents",
        db_url=DATABASE_URL,
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(id="text-embedding-3-small"),
        reranker=CohereReranker(model="rerank-v3.5")
    ),
)

# Create the agent.
agent = Agent(
    name="company_agent",
    model=OpenAIChat(id="gpt-4o"),
    description=agent_description,
    knowledge=knowledge_base,
    tools=[],
    show_tool_calls=True,
    markdown=True
)

# Optionally load the Knowledge Base if necessary.
# if agent.knowledge is not None:
#     agent.knowledge.load()

while True:
    query = input("Enter your query (type 'exit' to quit): ")
    if query.strip().lower() == "exit":
        break

    # Retrieve 100 candidate documents (reranker is applied internally).
    results = agent.knowledge.vector_db.search(query, limit=100)

    # Print the file name and reranking score for each of the 100 results.
    print("=== Initial Search Results (100 entries) ===")
    for doc in results:
        print(f"Name: {doc.name}, Score: {doc.reranking_score}")

To summarize, I have already:

Verified the Document structure and noted the existence of the reranking_score attribute.
Reviewed the CohereReranker implementation, which should assign the relevance score from the reranker response to each Document.
Confirmed that my search query returns 100 candidate documents, yet all documents show a None value for their score.

Could you please advise on the following:

What additional configuration or modifications are necessary to ensure that valid reranking scores are populated?
How can I implement grouping by file (selecting the highest scored entry per file) and then filter the results to the top 5 entries?

Thank you so much for your time in advance.

Monali · March 12, 2025, 6:49am

Hi @Kenniferm
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 48 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon!

Kenniferm · March 12, 2025, 6:49am

Hello Monali, thanks a lot for reaching out. This is an urgent matter, and I’d appreciate it if you could prioritize it.

Kenniferm · March 14, 2025, 3:23am

@Monali
Hi, why am I not getting any support on this matter? Please assist as soon as possible. Thanks.

windrunnner · March 14, 2025, 4:38am

hi @Kenniferm
I am also a developer, not an official staff member. I took a look at the Agno source code. It seems that the reranker is only applied in vector search; the hybrid search you are using does not call it. If you need a score, you could try modifying the source code to assign hybrid_score to the Document's reranking_score.
Alternatively, try vector search instead of hybrid search; maybe they will optimize this later.
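
For the grouping part of the question (one entry per file, then the top 5), here is a minimal sketch in plain Python. It assumes each result exposes `name` and `reranking_score`, as in the snippet in the original post. `Doc` is a hypothetical stand-in for Agno's Document and `best_per_file` is an illustrative helper, not part of Agno.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for Agno's Document (only the fields used here)."""
    name: str
    reranking_score: Optional[float] = None


def _score(doc: Doc) -> float:
    # Treat a missing score as the lowest possible value so it sorts last.
    return doc.reranking_score if doc.reranking_score is not None else float("-inf")


def best_per_file(docs: Iterable[Doc], top_k: int = 5) -> List[Doc]:
    """Keep the highest-scored entry per file name, then return the top_k files."""
    best: Dict[str, Doc] = {}
    for doc in docs:
        current = best.get(doc.name)
        if current is None or _score(doc) > _score(current):
            best[doc.name] = doc
    return sorted(best.values(), key=_score, reverse=True)[:top_k]
```

With the loop from the original post this would be called as `best_per_file(results)`. It only gives a meaningful ranking once the scores are actually populated.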

Kenniferm · March 14, 2025, 4:50am

Hi @windrunnner ,

Thanks so much for the clarification—I really appreciate your insights!

Just to confirm, does this mean that to utilize the reranking feature, I must change my configuration from search_type=SearchType.hybrid to search_type=SearchType.vector? Also, could you point me to the specific documentation or source code you referenced that indicates reranking only works with vector search and not hybrid search?

I’d like to fully understand the reasoning behind this limitation.

Thanks again for your support!

windrunnner · March 14, 2025, 5:39am

You can refer to agno/vectordb/pgvector/pgvector.py in the source code:
vector_search calls self.reranker (line 513).
hybrid_search does not use self.reranker.
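
If you want to keep hybrid search, one possible workaround is to run the reranker yourself on the returned documents. This is a sketch under an assumption I have not verified against your Agno version: that the reranker object exposes a `rerank(query=..., documents=...)` method returning the scored documents. `ensure_reranked` is a hypothetical helper name.

```python
from typing import Any, List, Sequence


def ensure_reranked(query: str, docs: Sequence[Any], reranker: Any) -> List[Any]:
    """Run the reranker manually when the search path left the scores unset."""
    docs = list(docs)
    # Nothing to do if every document already carries a score (or the list is empty).
    if all(getattr(d, "reranking_score", None) is not None for d in docs):
        return docs
    # Assumption: rerank(query=..., documents=...) exists and returns scored documents.
    return reranker.rerank(query=query, documents=docs)
```

In the original script this would wrap the search call, for example `results = ensure_reranked(query, results, CohereReranker(model="rerank-v3.5"))`.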

Kenniferm · March 14, 2025, 7:57am

@windrunnner
Thank you very much for clarifying and pointing me to the exact part of the source code—this was very helpful!

If possible, may I ask one additional question?

Currently, I’m using the Agentic Chunking feature with ada-002 to vectorize and store CSV content into Pgvector. However, I noticed that chunks are being split at unexpected places. What I would ideally like is for each CSV row to become exactly one chunk, ensuring one row per chunk.

Could you please advise how I can achieve this?

Thanks again—I truly appreciate your continued support!

windrunnner · March 14, 2025, 8:13am

Sorry, I don’t have any experience with this.
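
That said, independent of Agno's chunking strategies, the one-row-per-chunk idea can be sketched with the standard library alone. `rows_as_chunks` is a hypothetical helper and the sample data is made up; wiring the resulting strings into a knowledge base (and turning off any further chunking) is not shown and not something I have checked against Agno.

```python
import csv
import io
from typing import Iterator, TextIO


def rows_as_chunks(handle: TextIO) -> Iterator[str]:
    """Yield one text chunk per CSV data row, formatted as 'column: value' pairs."""
    for row in csv.DictReader(handle):
        yield ", ".join(f"{key}: {value}" for key, value in row.items())


# Made-up sample data standing in for the real CSV file.
sample = io.StringIO("question,answer\nWhat is X?,A company\nWhere is X?,Tokyo\n")
chunks = list(rows_as_chunks(sample))
```

Each element of `chunks` corresponds to exactly one row, so embedding them one by one keeps row boundaries intact.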

Kenniferm · March 14, 2025, 8:37am

No problem, thanks a lot for your assistance with this.
