Hello,
I’m experiencing an issue with the Agno setup that uses the PDFKnowledgeBase. When I run the agent, the response does not include any PDF document references, and the process fails with the following error:
Traceback (most recent call last):
  File "/path/to/agent_with_pdf.py", line 66, in <module>
    show_agent_response(question, agent)
  File "/path/to/agent_with_pdf.py", line 28, in show_agent_response
    for message_references in response.extra_data.references:
AttributeError: 'NoneType' object has no attribute 'references'
Below is the relevant code snippet (sensitive details have been masked for privacy):
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.embedder.openai import OpenAIEmbedder
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.knowledge.pdf import PDFKnowledgeBase
from agno.vectordb.lancedb import LanceDb, SearchType


def create_agent(model: OpenAIChat, vector_db: LanceDb) -> Agent:
    agent = Agent(
        model=model,
        description=agent_description,
        instructions=instructions,
        knowledge=PDFKnowledgeBase(
            path="[MASKED_PATH]",
            vector_db=vector_db,
        ),
        tools=[],
        show_tool_calls=True,
        add_references=True,
        markdown=True
    )
    return agent


def show_agent_response(question: str, agent: Agent):
    response = agent.run(question)
    print("--------------------------------")
    print(response.content)
    for message_references in response.extra_data.references:
        for ref in message_references.references:
            print(f"name: {ref['name']}, meta_data: {ref['meta_data']}")


def show_docs(question: str, vector_db: LanceDb):
    docs = vector_db.search(question)
    print("--------------------------------")
    for doc in docs:
        print(f"name: {doc.name}, meta_data: {doc.meta_data}")


if __name__ == "__main__":
    # Specify a directory containing PDFs or a specific PDF file.
    # If left empty, all PDFs in the current directory will be considered.
    path = "[MASKED_PATH]"
    agent_description = "You are a customer support expert!"
    instructions = [
        "Search your knowledge base",
    ]
    table_name = "[MASKED_TABLE_NAME]"
    questions = [
        "Question 1",
        "Question 2",
        "Question 3",
    ]
    vector_db = LanceDb(
        uri="tmp/lancedb",
        table_name=table_name,
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(id="text-embedding-3-small"),
    )
    agent = create_agent(OpenAIChat(id="gpt-4o"), vector_db)
    for question in questions:
        show_agent_response(question, agent)
    print(show_docs("Some keyword", vector_db))
Please note that both the documents and the questions are entirely in Japanese. It appears that response.extra_data is None, causing the iteration over .references to fail. I suspect this may be due to a misconfiguration in the PDFKnowledgeBase setup or an issue with how the Japanese PDFs are being indexed.
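As a stopgap, the crash itself can probably be avoided by guarding against a missing extra_data before iterating. The following is only a minimal sketch under stated assumptions: iter_references is a hypothetical helper name, and the SimpleNamespace objects are stand-ins that mimic the attribute layout used in my snippet (extra_data, references), not real Agno classes. It does not explain why no references are returned in the first place.

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_references(response: Any) -> Iterator[dict]:
    """Yield each reference dict; yield nothing if extra_data or references is missing."""
    extra_data = getattr(response, "extra_data", None)
    # getattr on None falls back to the default, so a None extra_data is safe here.
    message_references_list = getattr(extra_data, "references", None) or []
    for message_references in message_references_list:
        for ref in getattr(message_references, "references", None) or []:
            yield ref


# Hypothetical stand-ins for the two cases: no extra_data, and one reference.
empty_response = SimpleNamespace(content="answer", extra_data=None)
full_response = SimpleNamespace(
    content="answer",
    extra_data=SimpleNamespace(
        references=[
            SimpleNamespace(
                references=[{"name": "a.pdf", "meta_data": {"page": 1}}]
            )
        ]
    ),
)

for ref in iter_references(empty_response):
    print(f"name: {ref['name']}, meta_data: {ref['meta_data']}")  # never reached

for ref in iter_references(full_response):
    print(f"name: {ref['name']}, meta_data: {ref['meta_data']}")
```

With this guard, show_agent_response could loop over iter_references(response) instead of response.extra_data.references, so a run without references prints only the content rather than raising AttributeError.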
Has anyone encountered a similar problem or could offer guidance on resolving this issue? Any insight or suggestions would be greatly appreciated.
Thank you!