Issue with PDF Knowledge Base for Japanese Documents: 'NoneType' object has no attribute 'references' Error

Hello,

I’m experiencing an issue with the Agno setup that uses the PDFKnowledgeBase. When I run the agent, the response does not include any PDF document references, and the process fails with the following error:

Traceback (most recent call last):
  File "/path/to/agent_with_pdf.py", line 66, in <module>
    show_agent_response(question, agent)
  File "/path/to/agent_with_pdf.py", line 28, in show_agent_response
    for message_references in response.extra_data.references:
AttributeError: 'NoneType' object has no attribute 'references'

Below is the relevant code snippet (sensitive details have been masked for privacy):

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.embedder.openai import OpenAIEmbedder
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.knowledge.pdf import PDFKnowledgeBase
from agno.vectordb.lancedb import LanceDb, SearchType

def create_agent(model: OpenAIChat, vector_db: LanceDb) -> Agent:
    agent = Agent(
        model=model,
        description=agent_description,
        instructions=instructions,
        knowledge=PDFKnowledgeBase(
            path="[MASKED_PATH]",
            vector_db=vector_db,
        ),
        tools=[],
        show_tool_calls=True,
        add_references=True,
        markdown=True
    )
    return agent

def show_agent_response(question: str, agent: Agent):
    response = agent.run(question)
    print("--------------------------------")
    print(response.content)
    for message_references in response.extra_data.references:
        for ref in message_references.references:
            print(f"name: {ref['name']}, meta_data: {ref['meta_data']}")

def show_docs(question: str, vector_db: LanceDb):
    docs = vector_db.search(question)
    print("--------------------------------")
    for doc in docs:
        print(f"name: {doc.name}, meta_data: {doc.meta_data}")

if __name__ == "__main__":
    # Specify a directory containing PDFs or a specific PDF file.
    # If left empty, all PDFs in the current directory will be considered.
    path = "[MASKED_PATH]"
    agent_description = "You are a customer support expert!"
    instructions = [
        "Search your knowledge base",
    ]
    table_name = "[MASKED_TABLE_NAME]"
    questions = [
        "Question 1",
        "Question 2",
        "Question 3",
    ]
    vector_db = LanceDb(
        uri="tmp/lancedb",
        table_name=table_name,
        search_type=SearchType.hybrid,
        embedder=OpenAIEmbedder(id="text-embedding-3-small"),
    )
    agent = create_agent(OpenAIChat(id="gpt-4o"), vector_db)
    for question in questions:
        show_agent_response(question, agent)

    show_docs("Some keyword", vector_db)

Please note that both the documents and the questions are entirely in Japanese. It appears that response.extra_data is None, causing the iteration over .references to fail. I suspect this may be due to a misconfiguration in the PDFKnowledgeBase setup or an issue with how the Japanese PDFs are being indexed.

Has anyone encountered a similar problem or could offer guidance on resolving this issue? Any insight or suggestions would be greatly appreciated.

Thank you!

Hi @Kenniferm
Thanks for reaching out and for using Agno! I’ve looped in the right engineers to help with your question. We usually respond within 48 hours, but if this is urgent, just let us know, and we’ll do our best to prioritize it.
Appreciate your patience—we’ll get back to you soon! :smile:

Hey @Kenniferm ,
This is coming from this chunk of the code:

for message_references in response.extra_data.references:
    for ref in message_references.references:
        print(f"name: {ref['name']}, meta_data: {ref['meta_data']}")

This happens because no references were created for the run, so response.extra_data is None. Could you help me debug this by:

  • checking that the path to the data is correct
  • setting debug_mode=True on the agent and checking whether the documents were loaded into the vector db
  • printing the response and checking whether any references exist

If the vector db is loaded, this shouldn’t happen.
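
While you check those, the loop itself can be made tolerant of a missing extra_data so the script keeps running and shows which questions came back without references. This is only a rough sketch: iter_references and format_references are hypothetical helper names, not part of Agno, and the SimpleNamespace objects are stand-ins that mimic the response shape implied by your traceback rather than real Agno responses.

```python
from types import SimpleNamespace
from typing import Any, Iterator, List


def iter_references(response: Any) -> Iterator[dict]:
    """Yield each reference dict, or nothing when none were attached."""
    extra_data = getattr(response, "extra_data", None)
    if extra_data is None:
        return  # this is the case that raised AttributeError in the original loop
    for message_references in getattr(extra_data, "references", None) or []:
        for ref in getattr(message_references, "references", None) or []:
            yield ref


def format_references(response: Any) -> List[str]:
    """Build printable lines, with a placeholder line when there are no references."""
    lines = [
        f"name: {ref.get('name')}, meta_data: {ref.get('meta_data')}"
        for ref in iter_references(response)
    ]
    return lines or ["(no references attached to this response)"]


# Stand-in objects (assumed shape, for illustration only).
no_refs = SimpleNamespace(content="answer", extra_data=None)
with_refs = SimpleNamespace(
    content="answer",
    extra_data=SimpleNamespace(
        references=[
            SimpleNamespace(references=[{"name": "doc.pdf", "meta_data": {"page": 1}}])
        ]
    ),
)

for response in (no_refs, with_refs):
    for line in format_references(response):
        print(line)
```

In show_agent_response, the two nested loops could then be replaced by printing each line from format_references(response). This does not fix the missing references; it only makes the failure visible without a crash.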