Dear Agno Support Team,
I am currently using the Agno library to search internal documents for Company X. My goal is to retrieve 100 candidate documents from our combined PDF and CSV knowledge base, group the results by file so that each file appears only once (keeping its highest-scoring entry), and then narrow the results down to the top 5 entries.
I have already investigated the issue by reviewing the Document structure and the CohereReranker implementation. In our earlier internal discussion, I confirmed that each Document instance should include a “reranking_score” attribute (populated by the reranker), yet when I run my code, the scores remain None.
Below is a simplified version of my code (with sensitive names and file paths anonymized):
from pathlib import Path

from agno.agent import Agent
from agno.knowledge.pdf import PDFKnowledgeBase, PDFReader
from agno.knowledge.csv import CSVKnowledgeBase
from agno.knowledge.combined import CombinedKnowledgeBase
from agno.vectordb.pgvector import PgVector, SearchType
from agno.models.openai import OpenAIChat
from agno.embedder.openai import OpenAIEmbedder
from agno.reranker.cohere import CohereReranker

DATABASE_URL = "DB_URL"
agent_description = "You are an agent that searches internal documents for Company X."

# PDF Knowledge Base: Load PDF files from a local directory.
pdf_kb = PDFKnowledgeBase(
    path="/path/to/pdf_files",
    vector_db=PgVector(
        table_name="pdf_documents",
        db_url=DATABASE_URL,
    ),
    reader=PDFReader(chunk=True),
)

# CSV Knowledge Base: Load a CSV file from a specified path.
csv_kb = CSVKnowledgeBase(
    path=Path("/path/to/filtered_qast.csv"),
    vector_db=PgVector(
        table_name="csv_documents",
        db_url=DATABASE_URL,
    ),
)

# Combined Knowledge Base: Combine both PDF and CSV sources.
knowledge_base = CombinedKnowledgeBase(
    sources=[
        pdf_kb,
        csv_kb,
    ],
    vector_db=PgVector(
        table_name="combined_documents",
        db_url=DATABASE_URL,
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(id="text-embedding-3-small"),
        reranker=CohereReranker(model="rerank-v3.5"),
    ),
)

# Create the agent.
agent = Agent(
    name="company_agent",
    model=OpenAIChat(id="gpt-4o"),
    description=agent_description,
    knowledge=knowledge_base,
    tools=[],
    show_tool_calls=True,
    markdown=True,
)

# Optionally load the Knowledge Base if necessary.
# if agent.knowledge is not None:
#     agent.knowledge.load()

while True:
    query = input("Enter your query (type 'exit' to quit): ")
    if query.strip().lower() == "exit":
        break

    # Retrieve 100 candidate documents (reranker is applied internally).
    results = agent.knowledge.vector_db.search(query, limit=100)

    # Print the file name and reranking score for each of the 100 results.
    print("=== Initial Search Results (100 entries) ===")
    for doc in results:
        print(f"Name: {doc.name}, Score: {doc.reranking_score}")

To summarize, I have already:
- Verified the Document structure and noted the existence of the reranking_score attribute.
- Reviewed the CohereReranker implementation, which should assign the relevance score from the reranker response to each Document.
- Confirmed that my search query returns 100 candidate documents, yet all documents show a None value for their score.
Could you please advise on the following:
- What additional configuration or modifications are necessary to ensure that valid reranking scores are populated?
- How can I implement grouping by file (selecting the highest scored entry per file) and then filter the results to the top 5 entries?
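
For reference, the grouping step I have in mind looks roughly like the sketch below. It is only an illustration under stated assumptions, not Agno's API: Doc is a hypothetical stand-in for the search results (I rely only on the name and reranking_score attributes shown in my code above), and top_per_file is a helper name I made up. A missing score is treated as the lowest possible value, which is my own choice for handling the None case.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a search result (not the Agno Document class)."""
    name: str
    reranking_score: Optional[float] = None


def top_per_file(docs: Iterable[Doc], k: int = 5) -> List[Doc]:
    """Keep the highest-scoring entry per file name, then return the k best overall."""
    best: Dict[str, Tuple[float, Doc]] = {}
    for doc in docs:
        # Assumption: a missing score ranks below every real score.
        score = doc.reranking_score if doc.reranking_score is not None else float("-inf")
        current = best.get(doc.name)
        if current is None or score > current[0]:
            best[doc.name] = (score, doc)
    # Sort the per-file winners by score, best first, and keep the first k.
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]
```

In my script this would be called as top_per_file(results, k=5) on the 100 candidates, but it only gives a meaningful ordering once the scores are actually populated.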
Thank you in advance for your time.