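# Imports assumed for Phidata (phi) 2.x; adjust to your installed version
from phi.agent import Agent
from phi.embedder.openai import OpenAIEmbedder
from phi.knowledge.pdf import PDFKnowledgeBase, PDFReader
from phi.model.openai import OpenAIChat
from phi.tools.file import FileTools
from phi.vectordb.lancedb import LanceDb, SearchType

pdf_knowledge_base = PDFKnowledgeBase(
    path="C:/prg",
    # Use LanceDB as the vector database
    vector_db=LanceDb(
        table_name="pdf_documents",
        uri="data/lancedb",
        search_type=SearchType.vector,
        embedder=OpenAIEmbedder(model="text-embedding-3-small"),
    ),
    reader=PDFReader(chunk=True),
)
# Comment out after the first run, once the knowledge base is loaded
pdf_knowledge_base.load(recreate=False)

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    knowledge=pdf_knowledge_base,
    tools=[FileTools()],
    show_tool_calls=True,
    add_context=True,
    search_knowledge=True,
    markdown=True,
)
agent.print_response("List Net Amount Due from the PDF docs", stream=True)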
Hi @gsriniva
Thank you for reaching out and using Phidata! I’ve tagged the relevant engineers to assist you with your query. We aim to respond within 48 hours.
If this is urgent, please feel free to let us know, and we’ll do our best to prioritize it.
Thanks for your patience!
It works now, thank you. However, it retrieves only 5 records from the local knowledge base even though it holds 1000s of records. Does the agent have any configuration parameter to increase this limit?
I noticed that you did not provide a system prompt using the instructions parameter when creating the Agent. Can you please add this and clearly tell the Agent to use its knowledge base when determining answers? Let me know how this goes!
I have added the instructions parameter with the following instructions:
instructions=["Use pdf_knowledge_base",
    "use the local lancedb",
    "include all the records from the lancedb",
    "Always include sources in your response"],
It still picks only 5 records (5 relevant documents), as the debug output shows:
DEBUG Getting 5 relevant documents for query: Total Amount Due by Year with Week Ending calculation
Any thoughts on how to include all the documents and records in the search?
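One possible explanation, offered as a sketch and not a confirmed answer: the "Getting 5 relevant documents" debug line suggests the limit is applied in the retrieval step, before the model sees anything, so the wording of the instructions is unlikely to change it. In the Phidata versions I have seen, the knowledge base takes a `num_documents` setting (for example `PDFKnowledgeBase(..., num_documents=50)`); please treat that parameter name as an assumption and check it against your installed version. The toy Python below uses no Phidata code at all. `search`, `records`, and the `net_amount_due` field are hypothetical names that only illustrate how a top-k limit behaves and why a total over every record needs a full scan instead of a similarity search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, num_documents=5):
    """Return the num_documents records most similar to query_vec (top-k)."""
    ranked = sorted(
        records,
        key=lambda r: cosine(r["vector"], query_vec),
        reverse=True,
    )
    return ranked[:num_documents]


# 1000 hypothetical records with toy 2-d embeddings (not real PDF chunks)
records = [
    {"id": i, "vector": [1.0, i / 1000.0], "net_amount_due": float(i)}
    for i in range(1000)
]
query = [1.0, 0.0]

top5 = search(records, query)                      # default limit of 5
top50 = search(records, query, num_documents=50)   # raised limit

# An aggregate over every record is a full scan, not a similarity search
total_all = sum(r["net_amount_due"] for r in records)

print(len(top5), len(top50), total_all)
```

Raising the limit sends more chunks to the model, but a request such as "list every Net Amount Due" is an aggregation over the whole data set. With thousands of records it is usually more reliable to extract the values into a table and compute over them directly than to retrieve every chunk into the prompt.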