Your AI Assistant Keeps Forgetting: A Full Breakdown of AI Agent Memory

📖 This is Part 2 of the “Dissecting AI Agents” series. ← Previous: An AI Agent Is Not AI

In the old movie 50 First Dates, the female lead wakes up every morning and forgets everything that happened the day before. The male lead has to introduce himself and win her over again every day.

An AI Agent is in a surprisingly similar situation, and it is even more extreme: a language model restarts every conversation, far more often than once a day.

Why Does the Model Remember Nothing?

Back to the basics: a language model continues text. You pass in some text, it continues it, and then the call ends. Next time you pass text in, it has no idea what happened last time.

No memory, no buffer, just a fresh start every time.

How Does an AI Agent Create the Illusion of “Remembering”?

The trick is almost stupidly simple: feed everything from previous conversations back into the model every time.

When you chat with an AI Agent in a messaging app, it looks like a normal back-and-forth conversation. Behind the scenes, the message the Agent sends to the model looks more like this:

[System Prompt (identity, rules, tool descriptions)]
+ [All previous conversation history]
+ [What the user just said]

This giant chunk of text is sent to the language model. After reading it, the model continues the text, so the result naturally looks like it “remembers what we talked about.”
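A minimal sketch of that assembly step may make it concrete. Everything here is an assumption for illustration: the `Conversation` class, the `User:`/`Assistant:` line format, and `fake_model`, which stands in for a real language model call.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Holds everything the Agent must resend on every model call."""
    system_prompt: str
    history: list[str] = field(default_factory=list)

    def build_prompt(self, user_message: str) -> str:
        # The model sees only this string; nothing carries over between calls.
        parts = [self.system_prompt, *self.history, f"User: {user_message}"]
        return "\n".join(parts)

    def record_turn(self, user_message: str, reply: str) -> None:
        # "Memory" is just appending to the list that gets resent next time.
        self.history.append(f"User: {user_message}")
        self.history.append(f"Assistant: {reply}")


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model call."""
    return f"(reply based on {len(prompt)} characters of input)"


convo = Conversation(system_prompt="You are a helpful assistant named Mochi.")
first = convo.build_prompt("My name is Sam.")
convo.record_turn("My name is Sam.", fake_model(first))
second = convo.build_prompt("What is my name?")
print(second)
```

The second prompt contains the first exchange word for word, which is the only reason the model could answer the question about the name.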

Does it really remember? No. It just read the records again, like flipping through a diary.

System Prompt: The AI’s ID Card

So how does the AI know its own name or personality?

The answer is the System Prompt. This is a long piece of text the Agent puts at the front every time it calls the language model. It includes:

The AI’s name, identity, and personality settings
Which tools it can use, and how to use them
Behavioral rules and limits
The owner’s preferences and habits
Where memory files are stored

This System Prompt can be very long: thousands or even tens of thousands of tokens. That is also why AI Agents can be expensive. Even if you only ask “How are you?”, the text sent to the model behind the scenes may already be thousands of words.
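A back-of-the-envelope calculation shows how the System Prompt dominates the bill. All numbers below are assumptions chosen for illustration: the 8,000-token prompt, the 200 calls per day, the price per million tokens, and the rough "4 characters per token" heuristic, which varies by tokenizer and language.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate; real tokenizers vary by model and language."""
    return max(1, round(len(text) / chars_per_token))


def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of sending this many input tokens at a given price."""
    return tokens / 1_000_000 * usd_per_million_tokens


system_prompt_tokens = 8_000                     # assumed System Prompt size
user_tokens = estimate_tokens("How are you?")    # the part the user actually typed
calls_per_day = 200                              # assumed usage
price = 3.0                                      # assumed USD per million input tokens

tokens_per_day = (system_prompt_tokens + user_tokens) * calls_per_day
daily_cost = input_cost_usd(tokens_per_day, price)
print(f"{tokens_per_day} input tokens per day, about ${daily_cost:.2f}")
```

Under these assumptions, nearly all of the input tokens come from the System Prompt, not from what the user typed.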

The Context Window Problem

This “repeat everything every time” strategy has a fatal flaw: text length has a limit.

The input length of a language model is limited by something called a Context Window. Better models today can handle around a million tokens, which sounds huge, but if an Agent runs 24/7 and conversations keep accumulating, that window fills up fast.

And problems do not start only when the window is full. Research on long inputs suggests that model performance tends to degrade as the input grows, well before the limit is reached. It is like asking someone about page three after making them read an entire book; the answer will usually be worse than if they had only read the first five pages.

Compressing Memory: The AI Version of Note-Taking

To extend an Agent’s useful lifespan, frameworks usually include Context Compression.

The method is intuitive: when the conversation history is close to the window limit, the Agent sends older conversation chunks to the language model and asks it to summarize them. The summary replaces the original full records.

This compression can happen repeatedly. The first summary gets compressed into the second summary, like nesting dolls. Every compression loses more early detail.

That explains why an AI Agent remembers the last day or two clearly, because the original records are still there, but older events become fuzzy after several rounds of compression.
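The compression loop can be sketched in a few lines. This is a simplified illustration, not how any particular framework does it: it measures length in characters instead of tokens, and `toy_summarize` is a hypothetical stand-in for the model call that writes the summary.

```python
from typing import Callable


def compress_history(
    history: list[str],
    summarize: Callable[[list[str]], str],
    max_chars: int,
    keep_recent: int = 4,
) -> list[str]:
    """Replace older messages with one summary when history grows too long."""
    total = sum(len(message) for message in history)
    if total <= max_chars or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    # An earlier summary sits inside `old`, so it gets summarized again.
    return [f"[Summary] {summarize(old)}"] + recent


def toy_summarize(messages: list[str]) -> str:
    """Hypothetical stand-in for a model call: keeps 20 characters per message."""
    return " | ".join(message[:20] for message in messages)


history = [f"message {i}: " + "x" * 50 for i in range(10)]
compressed = compress_history(history, toy_summarize, max_chars=300)
compressed_twice = compress_history(compressed, toy_summarize, max_chars=300)
print(compressed[0])
print(compressed_twice[0])
```

The last four messages survive untouched, while the second pass summarizes the first summary. That is the nesting-doll effect: recent records stay sharp and early detail shrinks with every round.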

Some approaches are even more aggressive: truncating the middle of tool output and keeping only the beginning and end, or replacing the whole tool output with “there used to be content here.”
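Both of these tricks are easy to picture in code. The function names, the default sizes, and the placeholder wording below are assumptions for illustration, not taken from any specific framework.

```python
def truncate_middle(text: str, head: int = 200, tail: int = 200) -> str:
    """Keep the beginning and end of a long tool output and drop the middle."""
    if len(text) <= head + tail:
        return text
    omitted = len(text) - head - tail
    ending = text[len(text) - tail:]
    return f"{text[:head]}\n[... {omitted} characters omitted ...]\n{ending}"


def drop_output(text: str) -> str:
    """Replace the whole tool output with a placeholder."""
    return f"[tool output removed: {len(text)} characters]"


log = "a" * 10 + "b" * 80 + "c" * 10
print(truncate_middle(log, head=10, tail=10))
print(drop_output(log))
```

Either way the model later sees that something was there, but not what it said.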

Persistent Memory: The Habit of Keeping a Diary

Smarter Agent frameworks train the AI to keep a diary.

The System Prompt includes an instruction like: “Your memory is wiped every time. To make sure important things are not forgotten, proactively write them into files.”

So during a conversation, the model decides what is worth keeping, calls a file-writing tool, and saves the memory as a .md file. When the Agent restarts next time, it loads those files into the System Prompt, and the model can “recall” what happened before.
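Here is a minimal sketch of that write-then-reload cycle, assuming a single Markdown file with one bullet per note. The function names, the `# Long-term memory` heading, and the file layout are hypothetical; real frameworks organize their memory files differently.

```python
from pathlib import Path


def remember(memory_file: Path, note: str) -> None:
    """The 'file-writing tool': append one note as a Markdown bullet."""
    memory_file.parent.mkdir(parents=True, exist_ok=True)
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")


def build_system_prompt(base_prompt: str, memory_file: Path) -> str:
    """On startup, the Agent reads the memory file into the System Prompt."""
    if not memory_file.exists():
        return base_prompt
    notes = memory_file.read_text(encoding="utf-8")
    return f"{base_prompt}\n\n# Long-term memory\n{notes}"
```

Calling `remember(Path("memory/notes.md"), "Sam's birthday is March 3")` in one session and `build_system_prompt(...)` in the next is all the “recall” amounts to: the note is text on disk that gets pasted back into the prompt.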

Which things should become long-term memory and which should only be short-term diary entries? The model decides. For example, if the user tells it a birthday, a good model will proactively write that to a memory file even if the user did not say “remember this.”

“Remembered Nothing”: The Most Common Trap

Here is a very practical reminder:

If the AI only says “I remembered it” but does not actually write to a file, it remembered nothing.

Weaker models are especially prone to this. You tell it “remember this,” it replies “No problem, I will remember it firmly,” and then it never calls any write tool. Next conversation, it forgets everything.

How do you verify it? Check whether it really opened a tool and edited a memory file. If not, it was empty talk.
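If your framework exposes a log of each turn, this check can be automated. The sketch below assumes a made-up turn structure (`AgentTurn`, `ToolCall`) and made-up tool names (`write_file`, `edit_file`); substitute whatever your framework actually records.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class AgentTurn:
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


WRITE_TOOLS = {"write_file", "edit_file"}  # hypothetical tool names


def actually_saved(turn: AgentTurn, memory_suffix: str = ".md") -> bool:
    """True only if the turn includes a write to a memory file."""
    return any(
        call.name in WRITE_TOOLS
        and str(call.arguments.get("path", "")).endswith(memory_suffix)
        for call in turn.tool_calls
    )


empty_talk = AgentTurn(text="No problem, I will remember it firmly.")
real_save = AgentTurn(
    text="Saved.",
    tool_calls=[ToolCall("write_file", {"path": "memory/notes.md"})],
)
print(actually_saved(empty_talk), actually_saved(real_save))
```

The reply text plays no part in the check; only the tool calls count.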

Memory Retrieval: Can It Recall Things at the Right Moment?

Storing memory is only the first step. The key is whether the Agent can find it when needed.

Most Agent frameworks use RAG (Retrieval-Augmented Generation). They split memory into chunks, then when the model needs to recall something, they use keywords or semantic similarity to search for the most relevant fragments and put those fragments into the Context.

This works well in the ideal case, but search quality depends on chunking and matching algorithms. With basic settings, the last few days may be accurate, while older memories start to get mixed up or missed.
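To see why chunking and matching matter so much, here is a toy keyword-based retriever. It is a deliberately simple sketch: it splits on blank lines and scores chunks by shared words, whereas real RAG systems usually add embeddings for semantic similarity. All function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_into_chunks(memory: str) -> list[str]:
    """Chunk on blank lines; real systems often use fixed sizes with overlap."""
    return [chunk.strip() for chunk in re.split(r"\n\s*\n", memory) if chunk.strip()]


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the chunks that share the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(chunk)), chunk) for chunk in chunks]
    scored = [(score, chunk) for score, chunk in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


memory = (
    "Sam's birthday is March 3.\n\n"
    "Sam prefers short replies.\n\n"
    "The deploy script lives in tools/deploy.sh."
)
chunks = split_into_chunks(memory)
print(retrieve("When is Sam's birthday?", chunks, top_k=1))
```

A query that uses different wording from the stored note, such as “the day Sam was born,” shares only the word “Sam” with the birthday chunk and ties with the unrelated chunk about replies, so this retriever has no reason to rank the right one first. That is the kind of miss that basic settings produce.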

Once you understand the memory mechanism, you can judge more accurately what the AI really remembers and what it is inventing.

Figure: Memory compression is like nesting dolls.

Penchan’s Experience

I run multiple agents on OpenClaw: Opus, Sonnet, and Codex. In practice, memory is definitely the painful part. Handling memory well without letting the agent forget is hard. The trick is keeping the core files clean: the simpler the files, the higher the chance the agent remembers what matters. I do not connect a vector database; everything uses a .md file system. The main reason is that my memory volume is not huge, and I need to edit it manually at any time. A RAG stack would only add friction to this workflow.

FAQ

Q: Do language models really have no memory?

Yes. Every time a language model is called, it only sees the text passed in at that moment. It has no memory of previous conversations. An Agent has to stuff conversation history back in every time so the model can “pretend” to remember.

Q: What is a Context Window?

A Context Window is the maximum amount of text a language model can process in a single call, measured in tokens. If input plus output exceeds that length, the call fails or earlier text has to be dropped. Even million-token windows can become insufficient when an Agent runs for a long time.

Q: When AI says “I remembered it,” did it really remember?

Not necessarily. If the AI only replies “I remembered it” but does not actually call a tool to write a memory file, it remembered nothing, and it will have forgotten by the next conversation. Check whether it really wrote to a .md file.

Concepts reference Professor Hung-yi Lee’s public NTU course. — Penchan