After you have been running AI agents for a while, the working directory often looks like this: temp files, scheduled-task logs, screenshots, and outdated memory notes scattered everywhere. You spend 20 minutes cleaning up by hand, and the next day the clutter is back.
This is a structural problem. AI agent workspaces naturally expand. Every session produces temp files. Every scheduled task leaves logs. Every media task creates screenshots. Without automatic cleanup, you are fighting entropy in a battle you will lose.
Core Idea: Scripts Do Labor, AI Makes Judgments

Before designing this system, I tried “let AI clean everything.” The result:
| Method | Cost | Problem |
|---|---|---|
| All manual cleanup | No money, but costs time | 10-20 minutes every day, easy to forget |
| All AI cleanup | Pay tokens every time | Most cleanup does not need judgment |
| Scripts + AI hybrid | Very low cost per run | Scripts handle deterministic work; AI steps in only when judgment is needed |
Core principle:
“Delete temp files older than 3 days” does not need AI judgment. A shell script is enough. “These two memory notes overlap 80%; should they be merged?” needs AI.
Four-Tier Architecture

graph TD
T0["⏱ Tier 0: Hourly health check<br/>Pure Shell"] --> T1["🧹 Tier 1: Daily cleanup<br/>Pure Shell"]
T1 --> T2["🔍 Tier 2: Weekly scan<br/>Shell + AI"]
T2 --> T3["📋 Tier 3: Monthly audit<br/>Shell + AI"]
| Tier | Frequency | Executor | What it does |
|---|---|---|---|
| 0 | Hourly | Shell | Health check, sentinel monitoring, error dedupe |
| 1 | Daily | Shell | Delete temp files, clean media, clean logs |
| 2 | Weekly | Shell + AI | Refine topic files, compress notes |
| 3 | Monthly | Shell + AI | Compare environment manifests, audit complexity |
The higher the tier, the lower the frequency, the more judgment required, and the higher the cost. If even “delete temp files older than N days” goes through AI, the accumulated token cost will exceed the value of the disk space it frees.
Tier 0: Hourly Health Check
This is the heartbeat of the whole system. It does not clean anything. It only does three things:
1. Sentinel File Check

After each tier finishes, it updates a “sentinel file”:
# Written when Tier 1 completes
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) OK" > .last-daily-ok
# Tier 0 checks the sentinel
last_daily=$(cat .last-daily-ok 2>/dev/null)
# If it has not updated for more than 36 hours → warning
You do not need to monitor whether the scheduler itself is healthy. Just check whether the sentinel file is fresh enough, and you know whether the system is running.
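The freshness check itself can be sketched as follows. This is a minimal illustration, not the repo's actual script: the function name `sentinel_is_stale` is hypothetical, and it relies on `find -mmin`, a non-POSIX test that both GNU and BSD find provide.

```shell
# Returns 0 (stale) if the sentinel is missing or older than max_minutes.
# Assumption: find supports -mmin (true for GNU find and macOS/BSD find).
sentinel_is_stale() {
    sentinel=$1
    max_minutes=$2
    # A missing sentinel is treated the same as a stale one.
    [ -f "$sentinel" ] || return 0
    # find prints the path only when its mtime is more than max_minutes ago.
    [ -n "$(find "$sentinel" -mmin +"$max_minutes" -print)" ]
}

# 36 hours = 2160 minutes, matching the warning window described above.
if sentinel_is_stale .last-daily-ok 2160; then
    echo "WARNING: .last-daily-ok is missing or older than 36 hours" >&2
fi
```

Checking the file's modification time rather than parsing the timestamp written inside it avoids date-parsing differences between platforms.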
2. Error Dedupe
Report the same warning only after it appears several times in a row. This prevents notification fatigue. If you receive “too many temp files” every hour, you will quickly start ignoring all warnings.
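One way to implement this kind of dedupe is a small counter file per warning, as in the sketch below. The names `STATE_DIR`, `REPORT_AFTER`, `record_warning`, and `clear_warning` are assumptions made for illustration, and the threshold of 3 is an arbitrary stand-in for "several times in a row".

```shell
# Hypothetical state directory: one counter file per warning key.
STATE_DIR=${STATE_DIR:-.healthcheck-state}
# Assumed threshold: report on the 3rd consecutive occurrence.
REPORT_AFTER=${REPORT_AFTER:-3}

# Call when a check fails. Prints the warning exactly once, at the moment
# the streak of consecutive failures reaches REPORT_AFTER.
record_warning() {
    key=$1
    message=$2
    mkdir -p "$STATE_DIR"
    count=$(cat "$STATE_DIR/$key.count" 2>/dev/null || echo 0)
    count=$((count + 1))
    echo "$count" > "$STATE_DIR/$key.count"
    if [ "$count" -eq "$REPORT_AFTER" ]; then
        echo "WARNING ($count runs in a row): $message"
    fi
}

# Call when the same check passes, so the streak starts over.
clear_warning() {
    rm -f "$STATE_DIR/$1.count"
}
```

A health check would call `record_warning tmp_full "too many temp files"` when the threshold is exceeded and `clear_warning tmp_full` when it is not, so one-off spikes never reach the notification channel.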
3. Early Exit
When the workspace is clean, with no files above thresholds and no missing sentinels, the entire script exits in a few milliseconds. No unnecessary scans.
Tier 1: Daily Cleanup
This is where cleanup actually starts. Every rule is deterministic: no judgment, only age and type.
Cleanup Rules
| Target | Rule | Default threshold |
|---|---|---|
| Temp files (tmp/) | Delete after N days | 3 days |
| Media files (media/) | Delete after N days | 7 days |
| Scheduled-task logs (cron/runs/) | Delete after N days | 14 days |
| Empty directories | Remove automatically | n/a |
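The age-based rules above can be sketched as a single function. This is an illustration under assumptions: the function name `clean_older_than` and the `DRY_RUN` switch are hypothetical, and it calls `rm` directly for brevity where a real run would go through a safer delete step.

```shell
# Assumed default: only report what would be deleted.
DRY_RUN=${DRY_RUN:-1}

# Lists (dry run) or deletes regular files under dir that find's -mtime +days
# test considers old, then removes any directories left empty.
clean_older_than() {
    dir=$1
    days=$2
    # A missing directory simply means there is nothing to clean.
    [ -d "$dir" ] || return 0
    if [ "$DRY_RUN" = "1" ]; then
        find "$dir" -type f -mtime +"$days" -print
    else
        find "$dir" -type f -mtime +"$days" -exec rm -f {} +
        # -mindepth 1 keeps the top-level directory itself in place.
        find "$dir" -mindepth 1 -type d -empty -delete
    fi
}
```

With the default thresholds, a daily run would call `clean_older_than tmp 3`, `clean_older_than media 7`, and `clean_older_than cron/runs 14`. Note that `-mtime +3` matches files whose age, counted in whole 24-hour periods, is greater than 3.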

Two-Layer Early Exit
Tier 1 has one important design: two-layer early exit.
# Layer 1: overall check
total_candidates=$(find tmp/ media/ cron/runs/ -type f | wc -l)
if [ "$total_candidates" -eq 0 ]; then
    echo "Nothing to clean. Exiting."
    exit 0
fi
# Layer 2: check each category separately
old_temps=$(find tmp/ -mtime +3 -type f | wc -l)
if [ "$old_temps" -eq 0 ]; then
    echo "tmp/ is clean. Skipping."
    # Continue to the next category; do not exit the whole script
fi
This makes a clean workspace cost almost no I/O.
Safe Delete
All deletion uses trash on macOS instead of rm, leaving a regret window. If the trash command is unavailable, the script falls back to moving files into ~/.Trash/.
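A minimal sketch of that fallback logic is shown below. The function name `safe_delete` and the `TRASH_DIR` override are assumptions for illustration; the timestamp prefix is also an addition, intended to reduce the chance that two files with the same name collide in the trash.

```shell
# Moves a path to the trash instead of deleting it permanently.
# Assumption: a `trash` CLI may or may not be on PATH.
# TRASH_DIR is a hypothetical override; it defaults to ~/.Trash.
safe_delete() {
    target=$1
    trash_dir=${TRASH_DIR:-$HOME/.Trash}
    if command -v trash >/dev/null 2>&1; then
        trash "$target"
    else
        mkdir -p "$trash_dir"
        # Timestamp prefix reduces name collisions between same-named files.
        mv "$target" "$trash_dir/$(date +%Y%m%dT%H%M%S)-$(basename "$target")"
    fi
}
```

Usage is a drop-in replacement for `rm`, for example `safe_delete tmp/session-output.json` (a made-up path).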
Tier 2: Weekly AI-Assisted Scan
At this level, judgment starts to matter. The shell script collects data, and AI makes decisions.
What the Script Does
- Lists all topic files and last modified dates
- Calculates each project directory size
- Finds notes highly duplicated with other files
- Lists long-inactive projects
What AI Does
After receiving the scan report, AI decides:
- These two notes overlap 80%; merge them? → Merge
- This project has been untouched for 30 days; paused or ended? → Mark as paused
- This topic file is over 200 lines; split it? → Split
Why Split It This Way
Shell scripts collect data almost for free, in milliseconds. One AI judgment pass over the resulting report is cheap. If AI does the scanning itself, it has to read dozens of files, every one of them billed as tokens, and the cost multiplies.
Scripts filter first; AI reads only the essence. The weekly AI cost of the whole system can stay very low.
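The "script filters, AI reads the summary" split could look roughly like this. The directory layout (`topics/*.md` and `projects/*/`) and the function name `scan_report` are assumptions; the 200-line and 30-day thresholds come from the examples above. Duplicate detection is left out because it needs more than a few lines.

```shell
# Hypothetical layout: topic notes in topics/*.md, one directory per project
# under projects/. Prints a short Markdown report for the AI to read.
scan_report() {
    root=$1
    echo "## Topic files over 200 lines"
    for f in "$root"/topics/*.md; do
        [ -f "$f" ] || continue
        lines=$(wc -l < "$f" | tr -d ' ')
        if [ "$lines" -gt 200 ]; then
            echo "- $f ($lines lines)"
        fi
    done
    echo "## Projects with no file modified in the last 30 days"
    for p in "$root"/projects/*/; do
        [ -d "$p" ] || continue
        # Inactive means find sees no file modified within the last 30 days.
        if [ -z "$(find "$p" -type f -mtime -30 -print | head -n 1)" ]; then
            echo "- $p"
        fi
    done
}
```

The report is a few lines long regardless of how many files the workspace holds, which is what keeps the AI pass cheap.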
Tier 3: Monthly Deep Audit
Once a month, do a full checkup.
Environment Manifest
Generate an environment snapshot every month:
## Environment Manifest: 2026-03
- Projects: N (last month M, difference)
- Memory notes: N (last month M, difference)
- Topic files: N (last month M, difference)
- Scheduled tasks: N (last month M, difference)
- Total disk usage: X GB (last month Y GB, difference)
Compare with last month, and you can see where things are expanding or shrinking.
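Producing one manifest line with its month-over-month difference can be sketched as below. The helper names `manifest_line` and `count_files` are hypothetical, and where last month's numbers are stored is left open.

```shell
# Prints one manifest entry: "- label: N (last month M, difference D)".
manifest_line() {
    label=$1
    current=$2
    previous=$3
    diff=$((current - previous))
    # Show an explicit plus sign for growth so expansion stands out.
    if [ "$diff" -gt 0 ]; then
        diff="+$diff"
    fi
    echo "- $label: $current (last month $previous, difference $diff)"
}

# Counts regular files under a directory (0 if the directory is missing).
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `manifest_line "Projects" 12 10` prints `- Projects: 12 (last month 10, difference +2)`.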
Complexity Trap
The maintenance system itself can also bloat. The monthly audit checks:
- Are cleanup rules multiplying? (If above a threshold, simplify.)
- Are scripts getting too long? (If above a line count threshold, split them.)
- Are you maintaining “the maintenance system of the maintenance system”? (Time to step back.)
Feedback Loop: The System Evolves by Itself

The elegant part of the system is its self-improvement mechanism:
- Tier 1 finds an anomaly: “For 5 days straight, more than 10 temp files needed cleanup. Maybe the threshold is too long?” → write to optimization-suggestions.md
- Tier 2 evaluates the suggestion: AI reads it and decides whether it makes sense
- Tier 3 adopts the rule: if AI and human both accept it, update thresholds in config.sh
The system learns from its own operation instead of relying on humans to remember “what we tuned last time.”
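The Tier 1 half of this loop might be sketched as follows. It assumes a hypothetical history file with one line per day holding the number of temp files cleaned; the function name `suggest_if_persistent` is also made up, while the "5 days" and "more than 10" figures come from the example above.

```shell
# history: hypothetical file with one cleaned-file count per line, newest last.
# suggestions: the Markdown file that Tier 2 later reads.
suggest_if_persistent() {
    history=$1
    suggestions=$2
    recent=$(tail -n 5 "$history" 2>/dev/null)
    # Count how many of the last 5 entries exist, and how many exceed 10.
    total=$(printf '%s\n' "$recent" | awk 'NF { n++ } END { print n + 0 }')
    over=$(printf '%s\n' "$recent" | awk 'NF && $1 > 10 { n++ } END { print n + 0 }')
    if [ "$total" -eq 5 ] && [ "$over" -eq 5 ]; then
        echo "- $(date -u +%Y-%m-%d): more than 10 temp files cleaned 5 days in a row; review THRESHOLD_TEMP_DAYS" >> "$suggestions"
    fi
}
```

The script only records the observation. Whether the threshold actually changes stays a Tier 2 and Tier 3 decision.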
Lessons from Pitfalls
Several of these rules were learned the hard way before they became script logic:
null-byte bug
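Some AI tools occasionally write null bytes (\x00) into files. The file looks normal, but grep treats it as binary and reports only that the binary file matches instead of printing the matching lines, so searches miss its content. Fix: add a null-byte scan step during cleanup.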
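A portable way to detect such files is to compare the byte count with and without NUL bytes, as in this sketch. The function name `has_null_bytes` is hypothetical; `tr -d '\000'` and `wc -c` behave the same on GNU and BSD userlands.

```shell
# Returns 0 if the file contains at least one NUL byte, 1 otherwise.
has_null_bytes() {
    total=$(wc -c < "$1" | tr -d ' ')
    # Strip every NUL byte and count what is left.
    without_nul=$(tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$total" -ne "$without_nul" ]
}
```

A cleanup pass could loop over note files and print a warning for each one where `has_null_bytes "$file"` succeeds, leaving the repair decision to a human or to the AI tier.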
-newermt trap
-newermt is a non-POSIX find extension and cannot be relied on to behave the same across find implementations (it failed on macOS here). Fix: use -mtime +N, or use stat -f%m to get epoch time and calculate manually. Platform differences are wrapped in helper functions inside config.sh.
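A helper of the kind described could look like the sketch below. The names `file_mtime_epoch` and `file_age_days` are hypothetical and not taken from the repo's config.sh. It tries the GNU `stat -c %Y` form first and falls back to the BSD `stat -f %m` form.

```shell
# Prints a file's modification time as epoch seconds.
# GNU stat (Linux) takes -c %Y; BSD stat (macOS) takes -f %m.
file_mtime_epoch() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1" 2>/dev/null
}

# Prints the file's age in whole days; fails if the file cannot be read.
file_age_days() {
    mtime=$(file_mtime_epoch "$1") || return 1
    now=$(date +%s)
    echo $(( (now - mtime) / 86400 ))
}
```

Callers can then write `[ "$(file_age_days "$f")" -gt 3 ]` without caring which platform they run on.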
Value of Early Exit
At first, hourly health checks took 2-3 seconds because there was no early exit. After adding early exit, a clean workspace takes only a few dozen milliseconds. It sounds small, but across 24 runs a day the difference is noticeable.
Quick Start
Only four steps:
1. Clone + Set Path
git clone https://github.com/p3nchan/auto-optimization.git
cd auto-optimization
export WORKSPACE_ROOT="$HOME/.my-agent-workspace"
2. Adjust Thresholds
Edit config/config.sh:
THRESHOLD_TEMP_DAYS=3 # Temp file retention days
THRESHOLD_MEDIA_DAYS=7 # Media file retention days
THRESHOLD_CRON_LOG_DAYS=14 # Scheduled-task log retention days
3. Schedule
# crontab -e
0 * * * * /path/to/auto-optimization/scripts/hourly-healthcheck.sh
0 3 * * * /path/to/auto-optimization/scripts/daily-cleanup.sh
0 4 * * 0 /path/to/auto-optimization/scripts/weekly-scan.sh
0 5 1 * * /path/to/auto-optimization/scripts/monthly-scan.sh
4. Observe
After a few days, check whether optimization-suggestions.md has accumulated suggestions. Tune thresholds based on those suggestions.
Summary
| Situation | Recommendation |
|---|---|
| Just started using AI agents | Add Tier 1 (daily cleanup) first |
| Multiple projects are running | Add Tier 0 (health check) + Tier 1 |
| Scheduled tasks + lots of logs | Use all four tiers |
| Want minimum effort | Add daily-cleanup.sh to cron and ignore the rest for now |
The full scripts, configs, and docs are in the open-source repo: https://github.com/p3nchan/auto-optimization.git
MIT licensed. Clone it and modify as needed.
FAQ
Q: Why does an AI agent workspace need automatic cleanup?
Every AI agent session creates temp files, logs, screenshots, and other artifacts. In a workspace with multiple projects and scheduled tasks running every day, file count grows quickly. Without regular cleanup, disk space is wasted, file search slows down, and the AI is more likely to get confused.
Q: Why not let AI clean everything?
Every AI judgment has a cost in API tokens, while most cleanup is deterministic: delete temp files older than N days, delete logs older than N days. Shell scripts handle that for free, quickly, and reliably. Only judgment-heavy work, such as whether a note is still useful or two files should be merged, should go to AI.
Q: What does this system require?
Only bash and cron (or any other scheduler). It runs on both macOS and Linux. AI-assisted Tier 2 and Tier 3 are optional. Even Tier 0 and Tier 1 alone, using pure shell scripts, can greatly improve workspace hygiene.
Penchan’s Take
OpenClaw has run the full three-tier auto-optimization loop: daily cron runs cleanup scripts, weekly cron scans logs for patterns and writes to optimization-suggestions.md, and only monthly do I manually review whether suggestions should become formal rules. The two most common pitfalls in practice: first, thresholds that are too tight keep deleting useful temp files; second, running deletion without a dry-run mode will lose things in week one. Start with an observation-only phase, run it for two weeks, check whether the distribution makes sense, then enable actions.
— Penchan