If you solve memory, you are the best agent platform!

Hello everyone!

I like agno and I’m diving deep into everything, especially memory.

However, I’ve spent days debugging why the user memory is clearly exploding. In some chats during a conversation session, the model apparently starts calling the memory tools again and again, adding the same duplicate lines. The user experience suffers too, because everything is blocked for minutes until the memory update finishes. Afterwards, there are many duplicate lines that are never cleaned up, even though they are often identical or express exactly the same thing with only a preposition changed.

This makes user memories unusable for a long-term session (I just need a single session that could run for months).

I think fixing this would massively help all developers.
After a 20-message chat, about 15k tokens are injected into the system prompt and into the MemoryManager, which is total nonsense because only 2k of them are non-duplicates.

If I were you, I’d implement the following:

  1. Check that memory tools do not update the memory more than n times per chat (or use something fancier, such as checking the similarity between just-written sentences; what matters is that pruning is effective and fairly deterministic). You can probably do it all deterministically, without LLMs.
  2. Periodically delete memories that are not used at all (memory_ids that are old and used far less often than others; that means the memory was likely not that important, and over a long chat session this becomes evident). Just keep a counter of how many times each memory id is used: the fewer times it is used and the older it is, the more it should be pruned. This can all be done deterministically, without LLMs, so it is cheap and effective.
  3. Allow configuring pruning aggressiveness (I want to choose how often and how much to prune old memories). Again, this is deterministic: it depends on how much of the distribution we want to cut and how often.
  4. Allow setting a max_cap parameter that tells the model to “summarize” the user memories as a whole when the token cap is reached (this might mean deleting memories that look similar or don’t add much value). For example, Anthropic did the same in the Pokémon challenge that was streamed on Twitch.
  5. Same thing for the Summarizer. You can’t just throw the full chat history into the summarizer: after 20 messages that is a lot of context, and after 100 it is definitely too much and mostly useless. I’m having a hard time using any of these features. A very easy fix would be to take the previous session summary plus the new messages and summarize them together (maybe you already do this, but in the logs I see the whole message history being dumped from scratch to be summarized).
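Suggestions 2 to 4 above can be sketched deterministically in a few lines. This is a minimal sketch, not Agno’s API: `MemoryRecord`, `retention_score`, `prune` and `enforce_token_cap` are hypothetical names, and it assumes each stored memory carries a creation timestamp and a use counter. The score grows with use and decays exponentially with age; `aggressiveness` is the fraction of lowest-scoring memories to cut. For the token cap it simply keeps the best-scoring memories that fit, approximating tokens by whitespace-separated words; the model-driven “summarize when the cap is reached” step from suggestion 4 would run at that point instead.

```python
import math
import time
from dataclasses import dataclass
from typing import List, Optional

DAY_S = 86_400.0


@dataclass
class MemoryRecord:
    # Hypothetical memory row: the text plus the bookkeeping suggestion 2 asks for.
    memory_id: str
    text: str
    created_at: float  # Unix timestamp (seconds) when the memory was written
    use_count: int = 0  # how many times the memory was retrieved/injected


def retention_score(rec: MemoryRecord, now: float, half_life_days: float = 30.0) -> float:
    # Frequently used memories score high; the score halves every half_life_days.
    age_days = max(0.0, (now - rec.created_at) / DAY_S)
    return (1 + rec.use_count) * 0.5 ** (age_days / half_life_days)


def _ranked(records: List[MemoryRecord], now: float) -> List[MemoryRecord]:
    # Best-scoring memories first.
    return sorted(records, key=lambda r: retention_score(r, now), reverse=True)


def prune(records: List[MemoryRecord], aggressiveness: float,
          now: Optional[float] = None) -> List[MemoryRecord]:
    # aggressiveness in [0, 1]: 0 keeps everything, 0.25 drops the lowest-scoring 25%.
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must be between 0 and 1")
    now = time.time() if now is None else now
    n_keep = len(records) - math.floor(len(records) * aggressiveness)
    kept_ids = {r.memory_id for r in _ranked(records, now)[:n_keep]}
    return [r for r in records if r.memory_id in kept_ids]  # preserve original order


def enforce_token_cap(records: List[MemoryRecord], max_tokens: int,
                      now: Optional[float] = None) -> List[MemoryRecord]:
    # Greedily keep the best-scoring memories that still fit in the budget.
    # Tokens are approximated by whitespace-separated words (an assumption).
    now = time.time() if now is None else now
    kept_ids, used = set(), 0
    for rec in _ranked(records, now):
        cost = len(rec.text.split())
        if used + cost <= max_tokens:
            kept_ids.add(rec.memory_id)
            used += cost
    return [r for r in records if r.memory_id in kept_ids]


# Example data: two 90-day-old memories (one used often, one never) and a fresh one.
now = 1_000 * DAY_S
mems = [
    MemoryRecord("a", "User's name is Davide.", now - 90 * DAY_S, use_count=40),
    MemoryRecord("b", "Today's date is July 20, 2025.", now - 90 * DAY_S, use_count=0),
    MemoryRecord("c", "Invoice total is 30,000 CHF.", now - 1 * DAY_S, use_count=2),
]
print([m.memory_id for m in prune(mems, aggressiveness=0.34, now=now)])  # ['a', 'c']
```

How often to call `prune` (every run, daily, every N messages) is the other half of suggestion 3 and would be a scheduling decision in the application.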

I have a few other suggestions, so feel free to ask.
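Suggestion 5 (fold the previous summary and only the new messages into the next summary, instead of re-summarizing the whole history) can be sketched without any framework. This is a hypothetical helper, not Agno’s summarizer: `RollingSummary` is an illustrative name, and `summarize` is any callable you supply that sends a prompt to your model and returns its text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RollingSummary:
    # Hypothetical helper; `summarize` is injected so the sketch does not depend on any SDK.
    summarize: Callable[[str], str]
    summary: str = ""
    summarized_upto: int = 0  # how many messages are already folded into `summary`

    def build_prompt(self, messages: List[str]) -> str:
        # Only the previous summary and the messages after `summarized_upto` are included.
        new_messages = messages[self.summarized_upto:]
        return (
            "Update the running summary with the new messages. Keep it concise.\n\n"
            f"<previous_summary>\n{self.summary}\n</previous_summary>\n\n"
            "<new_messages>\n" + "\n".join(new_messages) + "\n</new_messages>"
        )

    def update(self, messages: List[str]) -> str:
        if self.summarized_upto >= len(messages):
            return self.summary  # nothing new since the last call
        self.summary = self.summarize(self.build_prompt(messages))
        self.summarized_upto = len(messages)
        return self.summary
```

Each call sends roughly the summary plus the unsummarized tail, so its size stays bounded as the session grows instead of scaling with the full history. The sketch assumes the message list is append-only; if older messages are edited or deleted, `summarized_upto` would need resetting.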

This is high priority for all devs. Memories should be more controllable, and most of this can be done deterministically and cheaply with simple steps like the ones above, or by using semantic search or even keywords to detect identical memories.
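The deterministic checks from suggestion 1 and the paragraph above can be sketched as a guard placed in front of whatever stores a memory. `MemoryWriteGuard` is hypothetical, not an Agno component, and it swaps in a purely lexical check (Jaccard similarity over lowercased word sets, ignoring a small stopword list) for the semantic-search variant; an embedding-based similarity could take `jaccard`’s place. It also caps the number of memory writes per run.

```python
import re
from typing import List, Set

# Small illustrative stopword list, so "are" vs "will be" phrasings compare equal.
_STOPWORDS = {"a", "an", "the", "is", "are", "will", "be", "of", "for",
              "to", "and", "on", "in", "with"}


def _tokens(text: str) -> Set[str]:
    # Lowercase, split on anything that is not a letter or digit, drop stopwords.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in _STOPWORDS}


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


class MemoryWriteGuard:
    # Hypothetical guard: call start_run() per agent run, should_write() before each write.
    def __init__(self, threshold: float = 0.85, max_writes_per_run: int = 3):
        self.threshold = threshold
        self.max_writes_per_run = max_writes_per_run
        self.writes_this_run = 0

    def start_run(self) -> None:
        self.writes_this_run = 0

    def should_write(self, candidate: str, existing: List[str]) -> bool:
        if self.writes_this_run >= self.max_writes_per_run:
            return False  # cap on memory writes per run
        if any(jaccard(candidate, e) >= self.threshold for e in existing):
            return False  # near-duplicate of a stored memory
        self.writes_this_run += 1
        return True
```

On the extract in the PS below, the repeated “Today’s date is July 20, 2025.” lines and the “are”/“will be” invoice-date variants score 1.0 against each other. A lexical threshold can also catch genuine updates, though: the three-date and four-date invoice lines score about 0.87, so in practice a match might trigger a replace rather than a plain skip.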

Please let me know if you intend to solve this in the next few weeks.

Thanks!
Davide

PS: Here is an extract (the full list is much longer and full of duplicates) from my user memory as injected into the system prompt or the MemoryManager:

      <memories_from_previous_interactions>
      - AI's purpose is to manage user's key information.
      - User's name is Davide.
      - User is concerned that transportation costs impact their experience and overall expenses.
      - Today's date is July 20, 2025.
      - Today's date is July 20, 2025.
      - Today's date is July 20, 2025
      - Today's date is July 20, 2025.
      - Today's date is July 20, 2025.
      - Today's date is July 20, 2025.
      - Today's date is July 20, 2025.
      - The invoice issue date is today and the due date is in 30 days.
      - Today's date is July 20, 2025.
      - The invoice issue dates are July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - User is considering breaking the 30,000 CHF invoice to OECD into two parts, one for the onboarding project and one for MCP, as they are about to sign the contract.
      - The invoice for OECD is a total of 30,000 CHF and consists of an onboarding project for deployment of a statistical search engine on the OECD website and MCP for internal employees.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, and December 20, 2025, with a 30-day payment period for each.
      - Invoice items are a combined project consisting of an onboarding project for deployment of a statistical search engine on the OECD website and MCP for internal employees.
      - The invoice total amount is 30,000 CHF.
      - Invoice item is an onboarding project for deployment of a statistical search engine on the OECD website.
      - Invoice item is MCP for internal employees.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - The invoice total amount is 30,000 CHF.
      - The invoice issue dates will be July 20, 2025, October 20, 2025, December 20, 2025, and January 20, 2026, with a 30-day payment period for each.
      - Invoice item is MCP for internal employees.
      - The invoice total amount is 30,000 CHF.
      - Invoice item is an onboarding project for deployment of a statistical search engine on the OECD website.
      - Invoice item is MCP for internal employees.

Hi @Davide, thank you so much for the detailed write-up and for diving deep into Agno! Really appreciate you taking the time to not only surface this issue but also outline such thoughtful suggestions.

We’ve shared your feedback with the team. If you’re open to it, we’d love to keep you looped in as we iterate here — and feel free to share any more thoughts!

Thanks again for supporting Agno and helping us make it better!


@Davide thanks for the fantastic feedback!
We are currently busy revamping and gearing up for a major release 2.0.0. We are including your suggestions!


@Monali @Dirk

Sure, I’m down to stay in the loop!

Just ping me for insights or if there is anything to test. I’m also open to sharing more details about what I’m building with my startup, if it’s useful for diving into actual use cases.

I’ll leave short additional suggestions here in the thread as they come to mind:

  • I’ve used Gemini-2.5-flash for tool use, with relatively short and straightforward user messages, so it should be a good enough model.

  • Something else that matters is the ability to set a memory timeout after n seconds, and maybe even memory retries. Memory is very tightly built into agno, but in production I must be able to “give up” on something, or retry it, if it doesn’t work. I haven’t tried it yet, but I understand that I can build custom timeouts and retries for tools (which is great); memory tools, however, are harder to control even when writing additional workflow functions. A built-in memory timeout/retry (that reports the failure as an error) would let me build a default flow that informs the customer about the unsuccessful response (while the system perhaps retries in the background). The point is: we don’t want to leave the end user hanging for minutes without any clue when we know the memory update has most likely failed after the first 20 seconds.
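Until something like this is built in, a timeout with retries can be wrapped around the memory step in application code. A minimal sketch using only `asyncio`, assuming the memory update is exposed as a coroutine you can call yourself; `with_timeout_and_retries` and `MemoryUpdateFailed` are hypothetical names, not Agno settings. `op` is a factory (a zero-argument function returning a fresh coroutine) because a coroutine object cannot be awaited twice.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class MemoryUpdateFailed(Exception):
    """Hypothetical error raised when every attempt timed out or failed."""


async def with_timeout_and_retries(
    op: Callable[[], Awaitable[T]],
    timeout_s: float = 20.0,
    retries: int = 1,
    backoff_s: float = 0.5,
) -> T:
    last_exc: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            # wait_for cancels the attempt if it runs longer than timeout_s.
            return await asyncio.wait_for(op(), timeout=timeout_s)
        except Exception as exc:  # TimeoutError from wait_for, or an error raised by op
            last_exc = exc
        if attempt < retries:
            await asyncio.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise MemoryUpdateFailed(
        f"memory update failed after {retries + 1} attempt(s)"
    ) from last_exc
```

On `MemoryUpdateFailed`, the handler can send the user a fallback message right away instead of leaving them waiting; the original error stays available as `__cause__` for logging.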

Rooting for a greatly successful 2.0 version of agno!

What I also didn’t like about Agno was that the memory system was uncontrollable (i.e., you couldn’t control which prompts are used or how notes are provided to the agent). And most importantly, all of this requires a separate LLM call for each note…
At the beginning of the year, I tried to create a function that lets the active agent manage notes on its own. Many updates have landed since then, and I’m not sure whether it still works out of the box. But back then it worked perfectly: the agent on Gemini Flash 2.0 kept notes independently (naturally, it was instructed to call the function alongside its usual message output).
Later, I slightly refactored this code to work with memory 2.0; it seemed to work as well.

I’m also curious about what Agno 2.0 will bring, but I doubt they will move away from a separate memory manager. I hope that at least the code below can be adapted to the updated system; such functionality is quite sufficient for basic bot notes in Discord.


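# Imports for the snippets below (Agno 1.x paths). MemoryManager and MemoryRow are
# only used as local-variable annotations, so they need no import at runtime.
from typing import AsyncIterator, List, Optional

from agno.agent import Agent, RunResponse
from agno.tools import tool

...
# Inside the async Discord message handler:
agent.context = {"user_id": message.author.id}  # agent.user_id didn't exist back then
# Get the notes and add them to the system prompt (additional_context may be None).
# If the same agent object is reused across messages, reset additional_context first
# so the notes are not appended twice.
agent.additional_context = (agent.additional_context or "") + await get_user_memory_string(agent)

# With stream=True, arun returns an async iterator of RunResponse chunks
run_response: AsyncIterator[RunResponse] = await agent.arun(
    message_content,
    stream=True,
)
...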

async def get_user_memory_string(agent: Agent) -> str:
    """
    Retrieves all notes about the user from the agent's memory, using agent.memory.get_user_memories,
    and formats them as a numbered list for the system prompt.

    Args:
        agent (Agent): The agent instance.

    Returns:
        str: A string with a numbered list of notes, wrapped in <memories_about_current_user> tags,
             or an empty string if there are no notes.
    """
    # Read the raw value first: str(None) would be the truthy string "None"
    raw_user_id = (agent.context or {}).get("user_id")
    if raw_user_id is None:
        print("User ID not found in agent context for getting memory string.")
        return ""
    user_id = str(raw_user_id)

    if not hasattr(agent, "memory") or not hasattr(agent.memory, "get_user_memories"):
        print("Memory component or get_user_memories method not found in agent.")
        return ""

    try:
        # Get a list of UserMemory objects directly from memory
        existing_memories = agent.memory.get_user_memories(user_id)  # get_user_memories is not async

        if not existing_memories:
            return ""  # No notes

        output_lines = ["<memories_about_current_user>"]
        added_memories_count = 0
        for memory_obj in existing_memories:
            memory_text = memory_obj.memory
            if memory_text:
                # Use added_memories_count, not the list index,
                # so that numbering is continuous if there are empty notes
                output_lines.append(f"{added_memories_count + 1}. {memory_text}")
                added_memories_count += 1

        if added_memories_count == 0:  # If all notes turned out to be empty
            return ""

        output_lines.append("</memories_about_current_user>")
        output_lines.append("(to manage notes, use the `update_user_memory` function)")
        return "\n".join(output_lines)  # Build the string fresh on every call

    except Exception as e:
        print(f"Error retrieving memories for user {user_id} using agent.memory.get_user_memories: {e}")
        return "Error retrieving user notes."


@tool(stop_after_tool_call=True)
async def update_user_memory(agent: Agent, action: str, num: Optional[int] = None, data: Optional[str] = None) -> str:
    """
    Manages long-term memory about the current user by creating, modifying, or deleting notes.
    This helps remember information beyond the standard 30-message context window.
    Use it to record significant user traits (communication style, preferences), key facts, or important interaction history.
    Call this function when specific information about the user needs to be preserved for future dialogues.
    This is your internal, confidential tool; do not mention it to the user.

    Args:
        action (str): The action: "add", "replace", or "delete".
        num (int, optional): The sequential number of the entry (starting from 1). Required for "replace" and "delete". Not used for "add".
        data (str, optional): The note text. Required for "add" and "replace". Not used for "delete".

    Returns:
        str: The result of the operation ("success" or an error message).
    """
    # Check for user_id in agent context (read the raw value: str(None) is the truthy string "None")
    raw_user_id = (agent.context or {}).get("user_id")
    if raw_user_id is None:
        print("User ID not found in agent context.")
        return "error: User ID not found in agent context."
    user_id = str(raw_user_id)

    # getattr also covers the case where agent.memory.manager exists but is None
    memory_manager: MemoryManager = getattr(getattr(agent, "memory", None), "manager", None)
    if memory_manager is None:
        print("Memory manager not found in agent.")
        return "error: Memory manager configuration error."
    memory_manager.user_id = user_id

    action = action.lower()  # Convert action to lowercase for reliability

    if action == "add":
        if not data:
            return "error: 'data' is required for action 'add'."
        try:
            # Use the existing add_memory method
            result = memory_manager.add_memory(memory=data)
            print(f"Note added for user: {data}")
            return result  # Should return "Memory added successfully" or an error
        except Exception as e:
            print(f"Error adding memory for user {user_id}: {e}")
            return f"error: Failed to add memory - {e}"

    elif action in ("replace", "delete"):
        if num is None or num < 1:
            return f"error: A valid 'num' (>= 1) is required for action '{action}'."
        if action == "replace" and not data:
            return "error: 'data' is required for action 'replace'."

        try:
            # 1. Get existing notes for the num -> id mapping (get_existing_memories is not async).
            #    This list is not filtered like the one get_user_memory_string numbers,
            #    so an empty note could shift the numbering.
            existing_memories: Optional[List[MemoryRow]] = memory_manager.get_existing_memories()

            if not existing_memories or num > len(existing_memories):
                return f"error: Memory item number {num} not found for this user."

            # 2. Find the ID of the required entry (0-based indexing)
            target_memory: MemoryRow = existing_memories[num - 1]
            target_id: str = target_memory.id
            if not target_id:
                # This should not happen if the ID is always generated by the database
                print(f"Memory item {num} (index {num - 1}) for user {user_id} has no ID.")
                return f"error: Internal error - memory item {num} has no ID."

            # 3. Perform the action
            if action == "replace":
                # Use the existing update_memory method
                result = memory_manager.update_memory(id=target_id, memory=data)
                print(f"User's note changed: {data}")
                return result  # Should return "Memory updated successfully" or an error
            # action == "delete": use the existing delete_memory method
            result = memory_manager.delete_memory(id=target_id)
            print(f"User's note deleted with ID: {target_id}")
            return result  # Should return "Memory deleted successfully" or an error

        except Exception as e:
            print(f"Error performing '{action}' on memory item {num} for user {user_id}: {e}")
            return f"error: Failed to {action} memory item {num} - {e}"

    else:
        return "error: Invalid action specified. Use 'add', 'replace', or 'delete'."
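One detail to watch with numbered notes: in the code above, get_user_memory_string numbers only the non-empty notes, while update_user_memory resolves num against the unfiltered list returned by get_existing_memories, so an empty note could make the number the model sees point at a different entry. Below is a framework-free sketch of the same add/replace/delete contract that resolves numbers against exactly the list it rendered. `NoteStore` is a hypothetical in-memory stand-in for the memory database, not an Agno class.

```python
import uuid
from typing import Dict, List, Optional


class NoteStore:
    # Hypothetical in-memory stand-in for the memory database: note id -> text.
    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}  # dicts keep insertion order

    def visible(self) -> List[str]:
        # Single source of truth for numbering: non-empty notes in insertion order.
        return [nid for nid, text in self._notes.items() if text.strip()]

    def render(self) -> str:
        ids = self.visible()
        if not ids:
            return ""
        lines = [f"{i}. {self._notes[nid]}" for i, nid in enumerate(ids, start=1)]
        return "\n".join(["<memories_about_current_user>", *lines, "</memories_about_current_user>"])

    def apply(self, action: str, num: Optional[int] = None, data: Optional[str] = None) -> str:
        action = action.lower()
        if action == "add":
            if not data:
                return "error: 'data' is required for action 'add'."
            self._notes[uuid.uuid4().hex] = data
            return "success"
        if action not in ("replace", "delete"):
            return "error: Invalid action specified. Use 'add', 'replace', or 'delete'."
        ids = self.visible()  # resolve num against the same list render() numbered
        if num is None or not 1 <= num <= len(ids):
            return f"error: Memory item number {num} not found."
        if action == "replace":
            if not data:
                return "error: 'data' is required for action 'replace'."
            self._notes[ids[num - 1]] = data  # reassigning a key keeps its position
        else:
            del self._notes[ids[num - 1]]
        return "success"
```

Because rendering and resolution share `visible()`, a blank note can never shift which entry “number 2” refers to.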

@Davide
So our current memory implementation is overloaded in that it also caters for managing runs. In our upcoming 2.0 release, this is resolved.

In your case, do you need a single session over time, or does session management not matter for your use case? We ask because we are also going to remove the “sticky session”, to avoid exactly these cases of memory (RAM) growth and to make it easier to manage (or not manage) sessions.

@Alex88
We do still plan to have the memory manager, but only to manage the user memories. Its prompt and execution can be fully overridden, though: you can either replace the system prompt, so you can instruct the memory manager to manage memories differently, OR replace the entire memory manager with one of your own.

We are making it smaller and more purpose built as well, to make it much simpler to replace if needed.

@Dirk
For my use case, feel free to go in whichever direction makes the session easier to manage (or not manage)!
I’m flexible. I think solving these memory growth problems is the priority, so use your best tools to achieve that; then I’ll figure out how to make it work for my use case if I need to customize or handle something!

I think primarily I need a well managed user memory.

Currently I’ve decided to use agno memory by customizing it a bit, but I’m waiting for 2.0 to write the perfect memory code for my use case (maybe it will come already perfect as of the new version, so I’ll have minimal work to do, which would be great!).

If my answer was not clear (or too general), feel free to ask more in detail!