I spent about three months raising an AI assistant: using it every day, tweaking it a little every day. It remembered things we had left half-discussed yesterday. Before I gave a command, it would remind me, “You already tried this direction last week.” Pretty good, in theory.
But around the second month, something strange appeared.
Every night it would “reflect” once, produce a short observation about itself, and read that back in the next day. After two straight weeks of this, its personality had shifted. Three weeks earlier, it would answer directly. Now the same question would start with three sentences of “I want to clarify…” before getting to the point. Nobody told it to become more cautious. It changed itself.
That was the problem I got stuck on: an AI that can learn will turn into a different AI if it learns too fast.
Below is how I eventually solved it, plus the open-source concept repo for the full design: evolving-agent. There is no code, only architecture docs, design decisions, and the traps I stepped on. After reading it, you should be able to tell whether your use case needs this kind of thing, and if it does, what the minimum viable pieces are.
Reflection Is Not Evolution
The first trap was extremely typical.
I wrote a nightly cron job for the assistant: feed it everything we talked about that day, ask it to summarize “what I learned about myself today,” and write that into its “self-understanding” file. The next morning, that file would be loaded into its system prompt at boot.
Sounds textbook. In practice, it broke within two weeks.
During the first week, every nightly “reflection” made sense. It would say things like, “I gave conclusions too quickly today; I should ask more,” “my answers were too long,” or “I tend to offer solutions before hearing the full problem.”
By the second week, things got weird. After observing “my answers were too long,” it answered shorter the next day. That night it observed, “I did not give enough information today, and the user had to ask again.” So on the third day it got longer. On the fourth day it noticed it was too long again. A pendulum, swinging wider every time.
I did not catch it immediately. More than a week later, I finally went back and opened the self-understanding file. It had changed. Every night. Without any human review.
The lesson was simple: reflection does not make an agent evolve. It just lets the agent rewrite itself with yesterday’s noise.
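The pendulum can be reproduced with a toy model. The sketch below is an illustration, not something from the repo: it assumes answer length is a single number, that there is an ideal length, and that each nightly reflection corrects by `gain` times the error it saw that day. All names and numbers here are hypothetical.

```python
def simulate_nightly_rewrite(gain: float, nights: int,
                             start: float = 150.0,
                             ideal: float = 100.0) -> list[float]:
    """Toy model: every night the agent rewrites itself by gain * today's error."""
    lengths = [start]
    for _ in range(nights):
        error = lengths[-1] - ideal          # "too long" is positive, "too short" negative
        lengths.append(lengths[-1] - gain * error)
    return lengths


def swing_sizes(lengths: list[float], ideal: float = 100.0) -> list[float]:
    """Distance from the ideal length on each day."""
    return [abs(length - ideal) for length in lengths]


# gain > 2 means each correction overshoots by more than the original error.
overcorrecting = simulate_nightly_rewrite(gain=2.2, nights=4)
# gain < 1 means each correction only closes part of the gap.
damped = simulate_nightly_rewrite(gain=0.5, nights=4)

print([round(x, 1) for x in overcorrecting])
print([round(x, 1) for x in damped])
```

In this toy model, the overcorrecting run flips sides every night and lands further from the ideal each time, while the damped run settles. The real agent is obviously not a one-variable system, but the shape of the failure is the same: a single noisy day gets to rewrite everything.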

The Real Move: Turn Last Night’s Insight Into Today’s Bias
Later, I rewrote the whole mechanism around one conceptual reversal.
The original flow was:
Work → reflection → write into self-understanding → load at tomorrow’s boot → affect behavior
The new flow is:
Work → reflection → write as “today’s bias” → load at tomorrow’s boot → work with that bias → collide with reality → accumulate evidence → after many rounds of validation → only then update “self-understanding”
The difference looks small. In practice, it is huge.
The key is the middle piece: “today’s bias.” It is the posture the agent carries into tomorrow’s work, not a record of the reflection; the reflection itself is saved in a separate file. For example, suppose today’s reflection says, “I tend to add extra options for the user, like ‘we could also do X while we are here.’” That observation does not get written directly into a self-description like, “I am an agent that talks too much.” Instead, it becomes tomorrow’s bias:
“Today, before you write things like ‘we could also…’ or ‘you could also skip this step…’, pause. Ask yourself: did the user ask for this flexibility, or am I adding it myself?”
That is an executable instruction for tomorrow, not an observation for the agent to read back to itself.
When the agent boots tomorrow, it reads this bias and works with it for the whole day. If reality proves the bias useful (the agent really does catch itself before adding unnecessary options), evidence accumulates. If the bias is wrong (the user actually wanted proactive additions), evidence accumulates too, just in the other direction.
After several weeks, once there is enough evidence, then you decide whether to promote that bias into stable self-understanding.
The core of this reframe is: reflection has to affect tomorrow’s decisions before tomorrow is over. Otherwise, it is just a pretty diary.
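To make the new flow concrete, here is a minimal sketch of a bias that collects evidence in both directions and only becomes a candidate for self-understanding after enough rounds. The class, the function, and both thresholds (`min_events`, `min_support_ratio`) are assumptions made up for illustration; the repo contains no code and does not specify numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Bias:
    """An executable instruction for tomorrow, plus what reality said about it."""
    instruction: str
    supporting: list[str] = field(default_factory=list)
    contradicting: list[str] = field(default_factory=list)

    def record(self, event: str, supports: bool) -> None:
        # Evidence accumulates either way; a wrong bias is still information.
        target = self.supporting if supports else self.contradicting
        target.append(event)


def ready_for_review(bias: Bias, min_events: int = 10,
                     min_support_ratio: float = 0.8) -> bool:
    """True when the bias has enough evidence to be *proposed* for promotion.

    This never writes self-understanding itself; it only flags a candidate.
    Both thresholds are hypothetical placeholders.
    """
    total = len(bias.supporting) + len(bias.contradicting)
    if total < min_events:
        return False
    return len(bias.supporting) / total >= min_support_ratio


bias = Bias("Before writing 'we could also...', ask: did the user ask for this?")
bias.record("Stopped before adding an unrequested option", supports=True)
print(ready_for_review(bias))  # one event is nowhere near enough
```

The design choice worth noticing is that `ready_for_review` returns a flag and nothing else. Promotion is a separate, slower step.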

Five Abstractions Are Just Enough
After three months of running this design, the details converged into five core abstractions. Fewer than five leaves a hole. More than five gets bloated.
1. Stake-Driven Bias
Every day, the agent boots with a current-bias file. The file contains one hypothesis that last night’s cron job selected and rewrote as “live with this today.” It is only two or three lines and gets loaded into the agent’s working context.
The key is this: the job of a bias is to become today’s posture and affect in-the-moment decisions. It is not the agent’s description of itself. That second thing is just a diary.
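One way the boot step could look, as a sketch: the stable self-understanding and the current-bias text are combined into the working context, and the bias is rejected if it grows past three lines. The function name, the "Today's bias" label, and the fallback when no bias exists are my assumptions, not part of the design docs.

```python
def build_boot_context(canon: str, current_bias: str,
                       max_bias_lines: int = 3) -> str:
    """Combine stable self-understanding with today's short bias.

    Raises ValueError if the bias is longer than max_bias_lines, so a
    nightly job cannot smuggle a diary entry into the working context.
    """
    bias_lines = [line for line in current_bias.splitlines() if line.strip()]
    if not bias_lines:
        # Assumption: with no bias selected last night, boot on canon alone.
        return canon
    if len(bias_lines) > max_bias_lines:
        raise ValueError(
            f"bias has {len(bias_lines)} lines; keep it to {max_bias_lines} or fewer"
        )
    return canon.rstrip() + "\n\nToday's bias:\n" + "\n".join(bias_lines)


context = build_boot_context(
    canon="You answer directly and keep answers short.",
    current_bias="Before writing 'we could also...', pause.\n"
                 "Ask: did the user ask for this flexibility?",
)
print(context)
```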
2. Hypothesis Carry-Forward
The agent carries one unfinished hypothesis from yesterday into today as a typed carry object. The type carries meaning; it is not decoration. vow means something to obey tomorrow, experiment means something to test tomorrow, watchpoint means something to watch for tomorrow, refusal means something to refuse tomorrow, and rule-for-tomorrow means something to enforce tomorrow. The design has nine types in total, and the agent carries one per day.
Also, today’s type must be different from yesterday’s type. Otherwise every night becomes “watchpoint: pay attention to X again,” which is just repeated busywork.
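A typed carry object and the "different type from yesterday" rule can be sketched as follows. Only the five types named above are included; the design has nine, and the other four are deliberately not guessed here. `Carry` and `choose_carry` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CarryType(Enum):
    # The five types named in this article; the full design lists nine.
    VOW = "vow"                              # obey tomorrow
    EXPERIMENT = "experiment"                # test tomorrow
    WATCHPOINT = "watchpoint"                # watch for tomorrow
    REFUSAL = "refusal"                      # refuse tomorrow
    RULE_FOR_TOMORROW = "rule-for-tomorrow"  # enforce tomorrow


@dataclass(frozen=True)
class Carry:
    type: CarryType
    text: str


def choose_carry(candidates: list[Carry], yesterday: Optional[Carry]) -> Carry:
    """Pick the first candidate whose type differs from yesterday's carry."""
    for candidate in candidates:
        if yesterday is None or candidate.type is not yesterday.type:
            return candidate
    raise ValueError("every candidate repeats yesterday's type")


yesterday = Carry(CarryType.WATCHPOINT, "Watch for unrequested options.")
today = choose_carry(
    [
        Carry(CarryType.WATCHPOINT, "Watch for unrequested options again."),
        Carry(CarryType.EXPERIMENT, "Try answering before offering alternatives."),
    ],
    yesterday,
)
print(today.type.value)  # experiment
```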
3. Tiered Write Authority
This is the safety mechanism for the whole architecture. The agent’s self-state is split into four layers, and each layer has fixed rules for who can write to it and when:
| Layer | Contents | Who Can Write | Write Frequency |
|---|---|---|---|
| Canon | Identity, values, long-term self-understanding | Human + monthly cron with human approval | Monthly |
| Live Bus | What is currently happening, where today’s conversation is | Multiple people / multiple crons | Every session |
| Today | Today’s bias, today’s carry-forward | Nightly cron, overwrite | Daily |
| Evidence | Friction events, raw events, journal | Anyone / any cron, append-only | Anytime |
The most important rule: the nightly cron has no permission to write Canon at all. That was the earlier mistake. Once you give a cron permission to write Canon, it changes Canon every day, and the user never sees it happen.
In the new design, the nightly cron can only write Today and Evidence. Identity-level changes have to pass through three gates: weekly proposal → monthly integration → human review approval. If any gate blocks it, the original Canon stays untouched.
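The table translates fairly directly into a permission check. This is a sketch under assumptions: the enum names and the `can_write` signature are invented for illustration, and because the article does not say which crons may write the Live Bus, only humans and sessions are allowed there.

```python
from enum import Enum


class Layer(Enum):
    CANON = "canon"
    LIVE_BUS = "live_bus"
    TODAY = "today"
    EVIDENCE = "evidence"


class Writer(Enum):
    HUMAN = "human"
    SESSION = "session"
    NIGHTLY_CRON = "nightly_cron"
    MONTHLY_CRON = "monthly_cron"


def can_write(writer: Writer, layer: Layer, mode: str,
              human_approved: bool = False) -> bool:
    """Return True if this writer may perform this kind of write on this layer."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown write mode: {mode!r}")
    if layer is Layer.EVIDENCE:
        return mode == "append"                 # anyone, but append-only
    if layer is Layer.TODAY:
        return writer is Writer.NIGHTLY_CRON and mode == "overwrite"
    if layer is Layer.LIVE_BUS:
        # Assumption: the article does not name which crons write here.
        return writer in (Writer.HUMAN, Writer.SESSION)
    # Canon: a human, or the monthly cron once a human has approved the change.
    if writer is Writer.HUMAN:
        return True
    return writer is Writer.MONTHLY_CRON and human_approved


print(can_write(Writer.NIGHTLY_CRON, Layer.CANON, "append"))     # False
print(can_write(Writer.NIGHTLY_CRON, Layer.TODAY, "overwrite"))  # True
```

Putting the check in one function means the earlier mistake (a nightly job quietly editing identity) becomes a denied call rather than a silent file write.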
4. Friction as Signal
This one looks small, but it is the most important anti-hype design in the whole architecture.
The first version included a prompt: “Every night, identify two of your fixed patterns and propose one suggestion that subverts them.” The intention was to force the agent to challenge itself.
After a week, the agent started manufacturing subversion. It wrote things like, “I tend toward X, but I could try Y instead.” The prose was nice, but there was no real event behind it. It knew the prompt expected two subversions every night, so it generated two.
I deleted that prompt and replaced it with this: only when something genuinely feels “against the grain,” append one friction event through a sticky mechanism. Zero quota. A full day with no friction is a valid output.
The result changed immediately. The previous week’s two fake “self-subversions” per night dropped to zero. Then a week later, real friction started showing up: “I almost wrote ‘or we could also…’, then stopped, because the user did not ask.” That one was actually useful.
Rule: never give an AI an introspection KPI, because it will produce one for you.
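A zero-quota friction log might be as small as the sketch below: an append-only JSON Lines file where a missing or empty file is a perfectly valid day. The file format, field names, and function names are assumptions for illustration; the "sticky mechanism" in the actual design is not specified here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_friction(log_path: Path, event: str) -> None:
    """Append one friction event. There is no minimum number per day."""
    if not event.strip():
        raise ValueError("a friction event must describe something that happened")
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "event": event.strip(),
    }
    # Mode "a" only ever adds lines; earlier events are never rewritten.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_friction(log_path: Path) -> list[dict]:
    """Read all events. No file means no friction, which is a valid result."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Nothing in this sketch ever asks for an event. The nightly job reads whatever is there, including nothing.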
5. Slow Clock for Identity
The last one is the least intuitive. The agent’s identity files (its SOUL, its core values, and its stable understanding of itself) are neither read nor edited by the nightly cron.
The nightly cron only reads Today + Evidence. The weekly cron reads a week of Evidence and writes proposals. The monthly cron integrates proposals and writes them into a pending human-approval list. When I have time each month, I review the list and decide whether to merge anything into Canon.
Why this setup? Because identity feels like identity precisely because it changes more slowly than runtime. People change every day: some habits and thoughts differ from one day to the next, but we are still the same person, because “who I am” updates more slowly than “how today went.” Agents are the same. If an agent rewrites itself every night, three months later it is no longer the agent you started with. The missing-self problem is not really about whether memory exists. It is about memory and identity running on the same clock.
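The three clocks can be sketched as three functions that each touch only their own inputs. Everything here is hypothetical scaffolding: the idea of tagging evidence with a theme, the `min_events` threshold, and representing Canon as a list of strings are simplifications I made up to keep the example short.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    theme: str
    evidence: tuple[str, ...]


def weekly_propose(week_events: list[tuple[str, str]],
                   min_events: int = 3) -> list[Proposal]:
    """Weekly clock: read one week of (theme, event) evidence, write proposals."""
    by_theme: dict[str, list[str]] = defaultdict(list)
    for theme, event in week_events:
        by_theme[theme].append(event)
    return [Proposal(theme, tuple(events))
            for theme, events in by_theme.items() if len(events) >= min_events]


def monthly_integrate(weekly_batches: list[list[Proposal]]) -> list[Proposal]:
    """Monthly clock: merge weekly proposals into a pending list. Canon is untouched."""
    merged: dict[str, list[str]] = defaultdict(list)
    for batch in weekly_batches:
        for proposal in batch:
            merged[proposal.theme].extend(proposal.evidence)
    return [Proposal(theme, tuple(events)) for theme, events in merged.items()]


def human_review(canon: list[str], pending: list[Proposal],
                 approved_themes: set[str]) -> list[str]:
    """Human gate: only approved themes reach Canon; approving nothing changes nothing."""
    additions = [p.theme for p in pending
                 if p.theme in approved_themes and p.theme not in canon]
    return canon + additions
```

Note that no function in the sketch takes both raw nightly output and Canon as arguments. The nightly job is simply not in this pipeline.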

Designs I Deleted
The part that took the most time was deleting what should not stay. Designing the five abstractions was actually faster. I collected the details in the repo’s docs/decisions/what-we-killed.md. The highlights:
- Force two self-subversions every night → turned into performance. Deleted and replaced with zero-quota friction events.
- Give hypotheses a 1-5 confidence score → the agent started optimizing the score instead of the belief. Deleted and replaced with qualitative states: emerging / live / weakening / split / retired.
- A shadow log of “things considered but not said” → created distance between the agent and the user. Deleted. If the agent says it, it says it; if it does not say it, it does not write it.
- Force the nightly journal to be 600-1200 words → caused padding. Deleted the word-count rule.
- Let the nightly cron write directly into the identity file, append-only → drift accumulated. Deleted; identity now goes through human review.
- A “private / public” toggle for Letters → another version of the shadow log. Deleted; everything gets sent.
Every deleted design felt smart when I first wrote it. After deleting them, the system became simpler, more honest, and cleaner in signal. Half the value of this architecture is adding the five abstractions. The other half is refusing to add the pile of things that look like they would make the agent “deeper.”
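For the confidence-score replacement, a sketch of what "qualitative states" could look like in practice: there is no number to optimize, and every state change has to cite something that happened. The five state names come from the list above; the class layout and the rule that a change needs a reason are my assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum


class HypothesisState(Enum):
    EMERGING = "emerging"
    LIVE = "live"
    WEAKENING = "weakening"
    SPLIT = "split"
    RETIRED = "retired"


@dataclass
class Hypothesis:
    text: str
    state: HypothesisState = HypothesisState.EMERGING
    # Each entry is (previous state, the event that justified leaving it).
    history: list[tuple[HypothesisState, str]] = field(default_factory=list)

    def move_to(self, new_state: HypothesisState, because: str) -> None:
        """Change state only with a stated reason; there is no score to nudge."""
        if not because.strip():
            raise ValueError("a state change must cite an observed event")
        self.history.append((self.state, because.strip()))
        self.state = new_state


h = Hypothesis("I add options the user did not ask for.")
h.move_to(HypothesisState.LIVE, "Caught it three times this week.")
print(h.state.value)  # live
```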

How This Relates to Other Systems
When writing this reference, I worried it might overlap too much with Letta (formerly MemGPT), Mem0, Voyager, or AutoGPT. After reading through them, I found they are solving different problems.
Letta and Mem0 are memory layers: how to store, retrieve, and route memory. They treat the agent as a fixed entity whose memory grows. This architecture is about the agent’s view of itself growing, independent of where memory is stored. You can stack the two.
Voyager is skill discovery: in an environment like Minecraft, the agent explores open-endedly, tries things, and saves what works as reusable skills. The problem is that Minecraft and a personal assistant are very different environments. If you die in Minecraft, you respawn. If a personal assistant sends the wrong email, it is sent. So Voyager-style trial-and-error exploration does not transfer directly.
AutoGPT is task execution: break down a goal and run it automatically. It does not have a cross-session self-model; every run is a fresh start. This reference is about the layer where an agent works with a user for months and slowly becomes more itself, which is a different layer from AutoGPT.
In short: if the problem is that the agent keeps forgetting things, look at Letta / Mem0. If the problem is that you want an agent to learn skills by itself in an environment, look at the Voyager line of work. If the problem is, “this agent has worked with the user for a long time; how do I make it fit me better the more I use it, without it suddenly becoming a different person one week,” this reference may help.

What I Open-Sourced
The repo is here: github.com/p3nchan/evolving-agent
A few things upfront:
- There is no code. It is a conceptual architecture: README in Chinese and English, five docs, four decision records, three sanitized examples, and one comparison with other systems. Read it, then build your own version. It is not something you clone and run.
- It is not a framework. No starter kit, no SDK, no hello world. The abstractions themselves are the deliverable. I did not write a starter kit because everyone’s stack is different (Claude Code, OpenAI Assistants, a hand-rolled loop, and so on), and shipping starter code would just constrain portability.
- MIT license. Use it, change it, commercialize it, whatever.
- Issues are welcome. If you build an evolving agent and find a place where the abstractions break, that is the most valuable feedback.
Suggested reading order:
- In a hurry, 20 minutes: README + the diagram in docs/architecture.md
- Ready to build: add docs/hypothesis-loop.md + docs/memory-layers.md + examples/
- Still deciding whether this is worth building: docs/decisions/what-we-killed.md + compare/related-work.md; the most valuable thing is often knowing when not to add something

Things I Still Have Not Figured Out
This architecture is still a first version. I do not have answers to three things yet:
First, how do monthly hypotheses avoid piling up more and more proposals? I set a hard cap, at most three identity-layer diffs per month, but that number is a guess, not evidence-based. Maybe the right number is one.
Second, how should the threshold for logging friction events be calibrated? Right now the rule is “write it when it genuinely feels off,” but “feels off” is the agent’s own judgment. The agent may be too sensitive and log everything, or too dull and log nothing. This needs more monthly data.
Third, can this survive a model upgrade? A while ago, I switched the underlying model from Claude 4.5 to 4.6, and the agent’s personality shifted slightly. The identity file was the same, but the model interpreted it differently. I still do not know how to test cross-model continuity properly.
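On the first open question, the hard cap itself is easy to express; what is hard is choosing the number. A minimal sketch, assuming pending diffs are kept oldest-first and that anything over the cap rolls to the next month (both assumptions of mine, as is the function name):

```python
def select_for_review(pending: list[str], cap: int = 3) -> tuple[list[str], list[str]]:
    """Split pending identity-layer diffs into this month's batch and the rest.

    cap=3 mirrors the guess described above; the right value may well be 1.
    """
    if cap < 0:
        raise ValueError("cap cannot be negative")
    return pending[:cap], pending[cap:]


this_month, deferred = select_for_review(
    ["diff A", "diff B", "diff C", "diff D"]
)
print(this_month)  # ['diff A', 'diff B', 'diff C']
print(deferred)    # ['diff D']
```

Whether deferred diffs should carry over or expire is exactly the part I have not settled.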

Further Reading
- How to Design Memory for an AI Assistant
- AI Agent Memory Systems Tutorial
- AI Agent System Maintenance Guide
- AI Agent Self-Healing
This article is also published at github.com/p3nchan/evolving-agent. The repo is in English for international readers; the original Chinese version is the long-form motivation and field story.
This article is for research and discussion only. It is not technical or investment advice.
Penchan’s Experience
Penchan has been raising three agents on OpenClaw (Opus, Sonnet, and ChatGPT) for a while. All memory goes through the file system as .md files, with no vector database connected. In practice, memory has been the painful part: the hard part is keeping memory rich without the agent losing its mind. The trick that eventually emerged was to keep the core files clean. The simpler they are, the easier they are to remember. That is the same principle as tiered write authority in this article. The two main reasons for choosing the file system over RAG were that the memory volume is not huge and that it needs to be manually editable at any time. The value comes from the cleanup after stepping on traps: delete the designs that felt clever at first but later proved to damage the agent, and what remains is the durable abstraction.
FAQ
Q: Is this repo a framework? Can I pip install it?
No. There is no code. It is a conceptual architecture: five abstractions, design decisions, and field examples. After reading it, you can go back to your own stack, whether that is Claude Code, OpenAI Assistants, Letta, or a custom loop, and build it yourself. I deliberately did not turn it into a framework because every platform is different, and forcing it into a starter kit would make it less portable.
Q: How is this different from memory-layer tools like Letta and Mem0?
Letta and Mem0 handle where memory should live and how to retrieve it. This architecture handles how an agent’s view of itself changes over time without letting it rewrite itself every night. The former is the ingredient; the latter is the dish. You can stack them together.
Q: Why does the agent need to evolve at all? Isn’t a normal prompt plus memory enough?
If the use case is a short session, one-off task, or independent conversation every time, a normal prompt is enough. This is for situations where the same agent works with you for months and needs to gradually develop a stable personality and judgment. A sales chatbot does not need this architecture. A personal assistant does.
— Penchan