I’m trying to use Sentence Transformers embeddings with Agno (this is my first attempt at building a RAG pipeline with it), but the embedding vectors stored in LanceDb all seem to be zeros. I’ve included the code below; hopefully it is reproducible.
from agno.knowledge.knowledge import Knowledge
from agno.vectordb.lancedb import LanceDb
from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
from agno.knowledge.reader.pdf_reader import PDFReader
# Create a vector db
vector_db = LanceDb(
    uri="tmp/lancedb",
    table_name="pdf_docs2",
    embedder=SentenceTransformerEmbedder(),
)
# Use page-based chunking for your PDF
pdf_reader = PDFReader(
    name="Page Chunking Reader",
    chunk_by="page",  # chunks by page
)
# Define Knowledge
knowledge = Knowledge(
    vector_db=vector_db,
)
# Add content
knowledge.add_content(
    url="https://agno-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf",
    reader=pdf_reader,
)
# Peek at the vector_db
df = vector_db.table.to_pandas()  # named df so it does not shadow the usual pandas alias
print(df)
print(df.loc[0]["vector"])
And I get something like this:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
...
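To check whether every row is affected and not only the first one, a small helper along these lines could count the all-zero vectors. This is only a sketch: `count_zero_vectors` and the toy `sample` data are names made up for illustration, and it assumes each entry of the `vector` column is an array-like of numbers.

```python
from typing import Iterable, Sequence


def count_zero_vectors(vectors: Iterable[Sequence[float]]) -> tuple[int, int]:
    """Return (number of all-zero vectors, total number of vectors)."""
    zero = 0
    total = 0
    for vec in vectors:
        total += 1
        # A vector counts as "zero" only if every component equals 0.
        if all(float(x) == 0.0 for x in vec):
            zero += 1
    return zero, total


# Toy data standing in for the `vector` column of the table (hypothetical values).
sample = [[0.0, 0.0, 0.0], [0.1, -0.2, 0.0], [0.0, 0.0, 0.0]]
zero, total = count_zero_vectors(sample)
print(f"{zero} of {total} vectors are all zeros")
```

Against the table above it would be called as `count_zero_vectors(vector_db.table.to_pandas()["vector"])`.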
Is it a bug? Or am I missing something?