Hello everyone!
I like agno and I’m diving deep into everything, especially memory.
However, I’ve spent days debugging why user memory keeps exploding. In some chats within a session, the model apparently starts calling the memory tools again and again, adding the same duplicate lines. The user experience also suffers because everything is blocked for minutes until the memory update finishes. Afterwards there are many duplicate lines that are never cleaned up (even though they are often identical or express exactly the same thing with a changed preposition).
This makes user memories unusable for long-running sessions (I just need a single session that could run for months).
I think fixing this would massively help all developers.
After a 20-message exchange, about 15k tokens are injected into the system prompt and the MemoryManager, which makes no sense because only 2k of them are non-duplicates.
If I were you, I’d implement the following:
- Check that memory tools do not update the memory more than n times per chat (or maybe something fancier that checks similarity between just-written sentences… it’s important that the pruning is effective and fairly deterministic). You can probably do it all deterministically, no LLMs.
- Periodically delete memories that are not used at all (memory_ids that are old and called far less often than others… such a memory was likely not that important, and over a long chat session this becomes evident). Just keep a counter of how many times each memory id is called: the fewer times it’s called and the older it is, the more it should be pruned. You can do it all deterministically, no LLMs, so it is very effective and high quality.
- Allow configuring pruning aggressiveness (I want to be able to choose how often and how much old memories are pruned). Again, this is deterministic: it depends on how much of the distribution we want to cut and how often.
- Allow a max_cap parameter that tells the model to “summarize” the user memories as a whole (which might mean deleting memories that look similar or don’t add much value) when the token cap is reached. As an example, Anthropic would do the same for the Pokémon challenge that was streamed on Twitch.
- Same thing for the Summarizer. You can’t just throw the full chat history into the summarizer; after 20 messages that is a whole lot of context, and after 100 it is definitely too much and mostly useless. I’m having a hard time using any of these features. Something very easy that you could do is take the previous session summary plus the new messages and summarize them together (maybe you are already doing this, but from the logs I see the whole message history being dumped from scratch to get summarized).
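To make the last point concrete, here is a minimal sketch of the incremental summary idea. It is not agno's API: `update_summary` and the `summarize` callable are hypothetical names, and `summarize` stands in for whatever model call produces the summary. The only point is that the prompt contains the previous summary plus the messages that arrived since, never the full history.

```python
from typing import Callable, List, Optional


def update_summary(
    previous_summary: Optional[str],
    new_messages: List[str],
    summarize: Callable[[str], str],
) -> str:
    """Fold only the not-yet-summarized messages into the existing summary.

    `summarize` is a hypothetical stand-in for the model call; it receives
    one prompt string and returns the new summary text.
    """
    if not new_messages:
        # Nothing new to fold in, so skip the model call entirely.
        return previous_summary or ""

    parts = []
    if previous_summary:
        parts.append("Summary so far:\n" + previous_summary)
    parts.append("New messages:\n" + "\n".join(new_messages))
    return summarize("\n\n".join(parts))
```

With this shape the summarizer input grows with the number of new messages and the size of one summary, not with the length of the whole session.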
I have a few other suggestions, so feel free to ask.
This is high priority for all devs. Memories should be more controllable (and most of this can be done deterministically and cheaply with simple steps like the ones above, or by using semantic search or even keyword matching to detect identical memories).
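As a rough illustration of the usage-based pruning, the aggressiveness knob, and the token cap, here is a sketch under stated assumptions. Nothing in it is agno's API: `MemoryRecord`, `hits`, `aggressiveness`, and `max_tokens` are hypothetical names, the score (hits per day of age) is just one possible choice, and the token estimate is a crude 4-characters-per-token heuristic.

```python
from dataclasses import dataclass
from typing import List, Optional

SECONDS_PER_DAY = 86400.0


@dataclass
class MemoryRecord:  # hypothetical record, not an agno class
    memory_id: str
    text: str
    created_at: float  # Unix timestamp in seconds
    hits: int = 0      # how many times this memory was retrieved


def usage_score(record: MemoryRecord, now: float) -> float:
    """Hits per day of age; old, rarely used memories score lowest."""
    age_days = max((now - record.created_at) / SECONDS_PER_DAY, 1.0)
    return record.hits / age_days


def estimate_tokens(text: str) -> int:
    """Very rough estimate (assumes about 4 characters per token)."""
    return max(1, len(text) // 4)


def prune_memories(
    records: List[MemoryRecord],
    now: float,
    aggressiveness: float = 0.2,
    max_tokens: Optional[int] = None,
) -> List[MemoryRecord]:
    """Drop the lowest-scoring fraction, then enforce an optional token cap."""
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must be between 0 and 1")

    # Lowest score first; ties are broken by age (older first).
    ranked = sorted(records, key=lambda r: (usage_score(r, now), r.created_at))
    kept = ranked[int(len(ranked) * aggressiveness):]

    if max_tokens is not None:
        total = sum(estimate_tokens(r.text) for r in kept)
        while kept and total > max_tokens:
            # Keep removing the least useful memory until under the cap.
            total -= estimate_tokens(kept.pop(0).text)

    # Return survivors in chronological order.
    return sorted(kept, key=lambda r: r.created_at)
```

How often this runs (every n messages, or once per day) would be the second half of the aggressiveness setting; the function itself makes no LLM calls.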
Please let me know if you intend to solve this in the next few weeks.
Thanks!
Davide
PS: Here is an extract (the full memory is much longer, with more duplication) from my user memory as injected into the system prompt or the MemoryManager:
<memories_from_previous_interactions>
- AI's purpose is to manage user's key information.
- User's name is Davide.
- User is concerned that transportation costs impact their experience and overall expenses.
- Today's date is July 20, 2025.
- Today's date is July 20, 2025.
- Today's date is July 20, 2025
- Today's date is July 20, 2025.
- Today's date is July 20, 2025.
- Today's date is July 20, 2025.
- Today's date is July 20, 2025.
- The invoice issue date is today and the due date is in 30 days.
- Today's date is July 20, 2025.
- The invoice issue dates are July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- User is considering breaking the 30,000 CHF invoice to OECD into two parts, one for the onboarding project and one for MCP, as they are about to sign the contract.
- The invoice for OECD is a total of 30,000 CHF and consists of an onboarding project for deployment of a statistical search engine on the OECD website and MCP for internal employees.
- The invoice issue dates will be July 20, 2025, October 20, 2025, and December 20, 2025, with a 30-day payment period for each.
- Invoice items are a combined project consisting of an onboarding project for deployment of a statistical search engine on the OECD website and MCP for internal employees.
- The invoice total amount is 30,000 CHF.
- Invoice item is an onboarding project for deployment of a statistical search engine on the OECD website.
- Invoice item is MCP for internal employees.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- The invoice total amount is 30,000 CHF.
- The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
- Invoice item is MCP for internal employees.
- The invoice total amount is 30,000 CHF.
- Invoice item is an onboarding project for deployment of a statistical search engine on the OECD website.
- Invoice item is MCP for internal employees.
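
For what it’s worth, even a small deterministic pass collapses entries like the ones above. The following is only a sketch of the idea, not agno code: `normalize`, `dedupe`, and the 0.9 `threshold` are my own hypothetical choices, and the similarity check uses Python’s standard `difflib` rather than embeddings.

```python
import re
from difflib import SequenceMatcher
from typing import List


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing punctuation."""
    text = re.sub(r"\s+", " ", text.strip().lower())
    return text.rstrip(".!?;, ")


def dedupe(memories: List[str], threshold: float = 0.9) -> List[str]:
    """Keep the first occurrence of each memory; skip exact and near duplicates."""
    kept: List[str] = []
    kept_norm: List[str] = []
    for memory in memories:
        norm = normalize(memory)
        is_duplicate = any(
            norm == seen or SequenceMatcher(None, norm, seen).ratio() >= threshold
            for seen in kept_norm
        )
        if not is_duplicate:
            kept.append(memory)
            kept_norm.append(norm)
    return kept


dates = (
    "July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, "
    "with a 30-day payment period for each."
)
sample = [
    "Today's date is July 20, 2025.",
    "Today's date is July 20, 2025",
    "The invoice issue dates are " + dates,
    "The invoice issue dates will be " + dates,
    "The invoice total amount is 30,000 CHF.",
    "Invoice item is MCP for internal employees.",
    "The invoice total amount is 30,000 CHF.",
]
unique = dedupe(sample)  # 7 entries in, 4 out
```

The same check could run before a memory is written, so the duplicate never gets stored in the first place. The threshold is a trade-off: entries that differ in real content (for example the three-date and four-date versions of the invoice schedule) may also score as very similar, so it would need tuning or a stricter rule for lines containing numbers and dates. The comparison is quadratic in the number of memories, which should be fine for a few hundred entries.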