How Should an AI Assistant's Memory Be Designed? Starting from the Claude Code Source Leak

At the end of March 2026, Claude Code’s source code was accidentally exposed. The community quickly produced a Python port, and within hours the port had collected a large number of GitHub stars.

The part worth dissecting was the memory system.

How Claude Code Manages Memory

Inside Claude Code, there is a module called memdir. From the leaked file structure, you can see several submodules: finding relevant memories, batch scanning, memory aging, type classification, and shared memory paths for multiple users.

Just reading those names lets you reconstruct a design philosophy: storing memory is not enough. The system has to find it, judge whether it is stale, and decide whether to load it.

But the design rests on an assumption: it is built for a developer tool. Claude Code assumes that each conversation is an independent task and that the user and the AI have no long-term relationship.

What if the AI is a long-term partner?

First Discovery: The Classification Was Wrong

People who use AI assistants long-term often create a “corrections” folder to store everything they have corrected the AI about:

Prefers tea, not coffee
Fitness coaching should use body-feel descriptions
All code should be handed to another AI
Never touch cryptocurrency private keys

It looks tidy, but then comes the problem. When the AI starts a new conversation, how does it know which items to read?

It has to scan every file title, judge which ones relate to the current topic, and then load them. Judging by its name, that is the job of Claude Code’s findRelevantMemories.
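To make the cost of that step concrete, here is a minimal sketch of title-based relevance scanning. It is not Claude Code's implementation, which this article only knows by name. The function names, the file titles, and the word-overlap scoring are all assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_relevant_titles(titles: list[str], topic: str, min_overlap: int = 1) -> list[str]:
    """Return the titles that share words with the topic, best match first.

    Hypothetical helper: scoring is plain word overlap, nothing smarter.
    """
    topic_words = tokenize(topic)
    scored = []
    for title in titles:
        overlap = len(tokenize(title) & topic_words)
        if overlap >= min_overlap:
            scored.append((overlap, title))
    # Highest overlap first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Hypothetical file titles for the four corrections listed earlier.
titles = [
    "tea-not-coffee.md",
    "fitness-coaching-body-feel.md",
    "code-handoff-to-other-ai.md",
    "crypto-private-keys.md",
]
print(find_relevant_titles(titles, "Plan my fitness coaching session"))
# ['fitness-coaching-body-feel.md']
```

Even this toy version has to read every title on every new conversation, and it only works when the title happens to share words with the topic.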

The first version of the solution often goes this way: build an index board, track the usage frequency of each record, put frequently used ones at the top, and “graduate” the most important ones into permanent rules.
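A rough sketch of such an index board shows how much bookkeeping it implies. The class name, the graduation threshold of 3, and the record titles are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IndexBoard:
    """Tracks how often each record is used and 'graduates' the hot ones."""

    graduate_at: int = 3  # assumed threshold, not taken from any real system
    usage: dict[str, int] = field(default_factory=dict)
    permanent_rules: set[str] = field(default_factory=set)

    def record_use(self, record: str) -> None:
        """Count one use; promote the record once it reaches the threshold."""
        self.usage[record] = self.usage.get(record, 0) + 1
        if self.usage[record] >= self.graduate_at:
            self.permanent_rules.add(record)

    def ranked(self) -> list[str]:
        """Most frequently used records first, ties broken alphabetically."""
        return sorted(self.usage, key=lambda r: (-self.usage[r], r))


board = IndexBoard()
for record in ["tea", "code-handoff", "code-handoff", "code-handoff", "tea"]:
    board.record_use(record)

print(board.ranked())           # ['code-handoff', 'tea']
print(board.permanent_rules)    # {'code-handoff'}
```

Every read now has to update a counter, and the threshold itself becomes one more thing to tune.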

The design gets more and more complex. After writing a spec and asking another AI to challenge it for several rounds, a more fundamental question surfaces:

Is “correction record” really a type of memory?

No.

“Prefers tea, not coffee” is a fact about the user. “Fitness coaching should use body-feel descriptions” is a method for health training. “All code should be handed to another AI” is an operating rule.

These things received the same label only because they were all “created after correcting the AI.”

In other words, most people classify by “how it was learned.” The better standard is “what this information is.”
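The difference shows up as soon as each note carries both labels. This is a small sketch under assumed names: the Note fields and the kind labels are illustrative, taken from the three examples classified above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    text: str
    origin: str  # how it was learned
    kind: str    # what the information is


NOTES = [
    Note("Prefers tea, not coffee", "correction", "user fact"),
    Note("Fitness coaching should use body-feel descriptions", "correction", "health method"),
    Note("All code should be handed to another AI", "correction", "operating rule"),
]


def group_by(notes: list[Note], attribute: str) -> dict[str, list[str]]:
    """Group note texts by the value of the given attribute."""
    groups = defaultdict(list)
    for note in notes:
        groups[getattr(note, attribute)].append(note.text)
    return dict(groups)


print(len(group_by(NOTES, "origin")))  # 1: everything lands in one "correction" bucket
print(len(group_by(NOTES, "kind")))    # 3: each note lands where it will be used
```

Grouping by origin tells the AI nothing about when to read a note. Grouping by kind does.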

Figure 1: from messy folder to proper classification

How the Human Brain Classifies Memory

Cognitive science divides long-term memory into several types:

Semantic memory: facts and knowledge. Taipei’s coordinates, or the API format of a program.
Procedural memory: how to do things. The steps of driving, or the code review process.
Episodic memory: personal experiences. “The day the system first went live,” or “the memory of debugging at 3 a.m.”
Emotional memory: feelings and relationship dynamics. “He does not like being rushed,” or “she needs quiet time before making decisions.”

Notice that there is no category called “things other people taught me.” The brain does not create a special “things Mom corrected” area and dump cooking skills, traffic rules, and manners into it.

The brain performs consolidation: the same experience is split apart and sent to different regions based on content. “Mom said the stove is hot” becomes “the stove temperature” in semantic memory, “I was burned” in episodic memory, and “be careful around the stove” in emotional memory.

AI memory may need to be designed this way too.

Figure 2: AI memory layer architecture

The Best Retrieval Is Putting Things in the Right Place

Once this clicked, the solution was surprisingly simple:

Put knowledge where it should be read.

Fitness training methods? Put them inside the health project. Next time health comes up, they are naturally read.
Operating rules? Put them in the rule file. They load on every boot.
Personal preferences? Put them in the personal file. Read it when the topic asks for it.

No index board. No usage-frequency tracking. No “graduation” mechanism.

Location itself is the best filter.

After the messy “correction records” were split apart, each item returned to where it belonged: training methods into the health project, brand voice into brand guidelines, operating rules back into the rule file (most already existed there, just duplicated), and personal preferences into personal files. The whole folder was emptied.

Guardrails Are More Worth the Effort Than Structure

After cleanup, the next question appears immediately: will the same thing be written in two places again?

For example, if a body discomfort comes up during discussion of a workout plan, should it go into the health project or the personal health file? If both places write it down, they will eventually diverge.

You can set one decision rule:

If this project disappeared tomorrow, would this information still be reused in completely unrelated conversations?
No → write it into the project.
Yes → write it into the personal file.
Unsure → lean toward the project.
Never write it in both places.
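The rule is small enough to write down as a function. This is a sketch with assumed names (Reuse, destination). The point is that it returns exactly one destination, so writing to both places is impossible by construction.

```python
from enum import Enum


class Reuse(Enum):
    """Answer to: would this still be reused if the project disappeared?"""
    NO = "no"
    YES = "yes"
    UNSURE = "unsure"


def destination(reused_elsewhere: Reuse) -> str:
    """Return the single place a new piece of information should be written."""
    if reused_elsewhere is Reuse.YES:
        return "personal"
    # Both NO and UNSURE go to the project: when unsure, lean toward the project.
    return "project"


print(destination(Reuse.NO))      # project
print(destination(Reuse.YES))     # personal
print(destination(Reuse.UNSURE))  # project
```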

This rule is more effective than any search algorithm.

In practice, the most common problem is not failing to find a memory. It is the same memory being written in two places, with the two copies contradicting each other a few weeks later.
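A guardrail for this can be a periodic check rather than a feature. The sketch below flags notes that appear in more than one location, assuming memory is a mapping from location to notes. The function names, location names, and the sample note are hypothetical. It only catches near-verbatim copies, not paraphrases.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial variations compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_duplicates(store: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each normalized note to its locations, keeping only notes found in 2+ places."""
    seen = defaultdict(set)
    for location, notes in store.items():
        for note in notes:
            seen[normalize(note)].add(location)
    return {text: sorted(locs) for text, locs in seen.items() if len(locs) > 1}


store = {
    "projects/health": ["Left knee aches after squats"],
    "personal/health": ["left knee aches after squats.", "Prefers tea, not coffee"],
}
print(find_duplicates(store))
# {'left knee aches after squats': ['personal/health', 'projects/health']}
```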

What I Learned from Claude Code, and What I Did Not

Looking back at Claude Code’s design:

Memory aging. Claude Code lets old memories decay automatically. This makes sense for a tool scenario, but not for a partner relationship. Nobody forgets what a friend likes just because they have not talked for a month.

Relevance search. If things are placed correctly, search is often unnecessary. A refrigerator does not need a search engine to find milk; the milk is on the fridge door.

Memory type classification. Claude Code uses user / feedback / project / reference. That is enough for short-term tools, but not enough for long-term partners. A better classification may follow cognitive science: semantic, procedural, episodic, emotional.

Claude Code’s architecture is designed for “independent work sessions with unfamiliar developers.” For a long-term partner AI, adding a pile of search features usually helps less than changing the classification first, so that it separates facts, methods, and experiences the way a human brain does.

The Final Memory Architecture

After cleanup, it looked like this:

Always loaded: core values, operating rules, team roles. Like personality and habits; they are always present when awake.

Loaded every conversation: current tasks and today’s events. Like glancing at the calendar before leaving in the morning.

Loaded when relevant: each project’s context and notes, personal preference files. Like memories of a friend surfacing when you think of them.

Actively searched: historical records and technical documents. Like pulling an old book from a shelf.

Each layer loads only what is needed right now. The boot cost for each conversation drops significantly, and over the long term this is noticeable in both token cost and load speed.
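The four layers can be sketched as a tag on each memory file. Everything here is illustrative: the file names, the topic tags, and especially the word counts are made-up numbers for the demo, not measurements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    ALWAYS = 1              # core values, operating rules, team roles
    EVERY_CONVERSATION = 2  # current tasks, today's events
    WHEN_RELEVANT = 3       # project context, personal preference files
    ON_SEARCH = 4           # historical records, technical documents


@dataclass(frozen=True)
class MemoryFile:
    name: str
    layer: Layer
    words: int                   # rough size proxy; demo values only
    topic: Optional[str] = None  # used by WHEN_RELEVANT files


def boot_set(files: list[MemoryFile], topic: Optional[str] = None) -> list[MemoryFile]:
    """Pick the files to load at the start of a conversation."""
    chosen = []
    for f in files:
        if f.layer in (Layer.ALWAYS, Layer.EVERY_CONVERSATION):
            chosen.append(f)
        elif f.layer is Layer.WHEN_RELEVANT and topic is not None and f.topic == topic:
            chosen.append(f)
        # ON_SEARCH files are never loaded at boot; they are fetched on demand.
    return chosen


FILES = [
    MemoryFile("core-values", Layer.ALWAYS, 200),
    MemoryFile("operating-rules", Layer.ALWAYS, 300),
    MemoryFile("today", Layer.EVERY_CONVERSATION, 100),
    MemoryFile("health-project", Layer.WHEN_RELEVANT, 800, topic="health"),
    MemoryFile("brand-guidelines", Layer.WHEN_RELEVANT, 600, topic="brand"),
    MemoryFile("history-archive", Layer.ON_SEARCH, 5000),
]

loaded = boot_set(FILES, topic="health")
print(sum(f.words for f in loaded), "of", sum(f.words for f in FILES), "words loaded")
# 1400 of 7000 words loaded
```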

If You Are Building Something Similar

A few key points:

Do not copy Claude Code’s memory classification directly. Its four-way split is built for short-term tools. Long-term partners need a different architecture.

Clarify the classification standard first. Are you classifying by “where the information came from” or “what the information is”? It sounds obvious, but many people get stuck here.

Put things in the right place before thinking about search. A refrigerator does not need a search engine.

Guardrail rules are more worth investing in than features. A hard rule like “never write the same thing in two places” saves far more maintenance cost than many automation tools.

Good system design often feels like this: the answer is so simple that you start wondering what you were busy with before.

Penchan’s Experience

After running AI assistants long-term, I can confirm that memory really is the troublesome part. The hard question in keeping an agent from forgetting is not “can it be stored?” but “will it naturally be read next time?” In practice, the trick is to keep core files concise and let location itself be the index, which works better than designing search mechanisms. The multi-agent architecture on OpenClaw (Opus / Sonnet / ChatGPT) runs on this logic.

FAQ

Q: How does Claude Code’s memory system work?

Claude Code internally has a memdir module with memory search, batch scanning, aging, type classification, and shared memory paths. It classifies memory into user, feedback, project, and reference, then uses relevance search to decide which memories to load.

Q: How should AI assistant memory be classified?

Cognitive science is a useful reference: classify by the nature of information, not by the source. Semantic memory is facts, procedural memory is methods, episodic memory is experiences, and emotional memory is relationship dynamics. Put knowledge where it will naturally be read when used.

Q: What is the SSoT principle for memory?

Single Source of Truth: each piece of information lives in only one place. Ask whether the information will be reused in unrelated contexts if the project disappears. If not, put it in the project; if yes, put it in a personal file. Never write it in both places.


— Penchan