After you run AI agents for a while, the working directory tends to look like this: temp files, scheduled-task logs, screenshots, and outdated memory notes everywhere. You spend 20 minutes cleaning up by hand, and by the next day the clutter is back.

This is a structural problem. AI agent workspaces naturally expand. Every session produces temp files. Every scheduled task leaves logs. Every media task creates screenshots. Without automatic cleanup, you are fighting entropy in a battle you will lose.


Core Idea: Scripts Do Labor, AI Makes Judgments


Before designing this system, I tried letting AI clean everything. Here is how the three approaches compare:

| Method | Cost | Notes |
|---|---|---|
| All manual cleanup | No money, but costs time | 10-20 minutes every day; easy to forget |
| All AI cleanup | Tokens on every run | Most cleanup does not need judgment |
| Scripts + AI hybrid | Very low cost per run | Scripts handle deterministic work; AI steps in only when judgment is needed |

Core principle:

“Delete temp files older than 3 days” does not need AI judgment. A shell script is enough. “These two memory notes overlap 80%; should they be merged?” needs AI.


Four-Tier Architecture


```mermaid
graph TD
    T0["⏱ Tier 0: Hourly health check<br/>Pure Shell"] --> T1["🧹 Tier 1: Daily cleanup<br/>Pure Shell"]
    T1 --> T2["🔍 Tier 2: Weekly scan<br/>Shell + AI"]
    T2 --> T3["📋 Tier 3: Monthly audit<br/>Shell + AI"]
```

| Tier | Frequency | Executor | What it does |
|---|---|---|---|
| 0 | Hourly | Shell | Health check, sentinel monitoring, error dedupe |
| 1 | Daily | Shell | Delete temp files, clean media, clean logs |
| 2 | Weekly | Shell + AI | Refine topic files, compress notes |
| 3 | Monthly | Shell + AI | Compare environment manifests, audit complexity |

The higher the tier, the lower the frequency, the more judgment it requires, and the higher the cost per run. If even “delete temp files older than N days” went through AI, the accumulated token cost would soon exceed the cost of the disk space being reclaimed.


Tier 0: Hourly Health Check

This is the heartbeat of the whole system. It does not clean anything. It only does three things:

1. Sentinel File Check


After each tier finishes, it updates a “sentinel file”:

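```bash
# Written when Tier 1 completes
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) OK" > .last-daily-ok

# Tier 0 checks the sentinel's mtime (GNU stat, falling back to BSD/macOS stat)
last_daily=$(stat -c %Y .last-daily-ok 2>/dev/null || stat -f %m .last-daily-ok 2>/dev/null)
# Missing, or not updated for more than 36 hours → warning
if [ -z "$last_daily" ] || [ $(( $(date +%s) - last_daily )) -gt $(( 36 * 3600 )) ]; then
    echo "WARN: daily cleanup has not reported success in the last 36 hours"
fi
```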

You do not need to monitor the scheduler itself. If the sentinel file is fresh enough, the system is running.

2. Error Dedupe

Report the same warning only after it appears several times in a row. This prevents notification fatigue. If you receive “too many temp files” every hour, you will quickly start ignoring all warnings.
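A minimal sketch of the dedupe idea, assuming one counter file per warning in a hypothetical .dedupe/ directory and a threshold of three consecutive runs (both illustrative, not taken from the repo). A warning fires once, on the run where its streak reaches the threshold, and the streak resets as soon as the condition disappears:

```bash
#!/usr/bin/env bash
# Hypothetical dedupe helpers: one counter file per warning key.
# Keys should be simple words (no slashes), since they become file names.
DEDUPE_DIR="${DEDUPE_DIR:-.dedupe}"
DEDUPE_THRESHOLD="${DEDUPE_THRESHOLD:-3}"

# warn_seen KEY: call when the condition is present on this run.
# Succeeds only on the run where the streak reaches DEDUPE_THRESHOLD,
# so a persistent problem is reported once instead of every hour.
warn_seen() {
    local file count=0
    mkdir -p "$DEDUPE_DIR"
    file="$DEDUPE_DIR/$1.count"
    if [ -f "$file" ]; then count=$(cat "$file"); fi
    count=$((count + 1))
    echo "$count" > "$file"
    [ "$count" -eq "$DEDUPE_THRESHOLD" ]
}

# warn_cleared KEY: call when the condition is absent; this breaks the streak.
warn_cleared() {
    rm -f "$DEDUPE_DIR/$1.count"
}
```

In the hourly script, call warn_seen too-many-temp when the temp-file count is over its limit and warn_cleared too-many-temp when it is not, and print the warning only when warn_seen succeeds. Reporting once per streak, rather than on every run past the threshold, is a design choice; change the final comparison to -ge if you prefer repeated reminders.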

3. Early Exit

When the workspace is clean, with no files above thresholds and no missing sentinels, the entire script exits within milliseconds, without any unnecessary scans.
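One way to keep that clean-workspace path fast, sketched under assumptions: the directory names and day thresholds mirror the Tier 1 rules, and has_old_files and cleanup_needed are hypothetical helper names. The key detail is find's -print -quit, which stops at the first matching file instead of walking the whole tree:

```bash
#!/usr/bin/env bash
# has_old_files DIR DAYS: succeed as soon as one file in DIR matches -mtime +DAYS.
# -print -quit makes find stop at the first hit, so a dirty tree is detected cheaply.
has_old_files() {
    local dir="$1" days="$2"
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -type f -mtime +"$days" -print -quit 2>/dev/null)" ]
}

# cleanup_needed ROOT: succeed if any age threshold is exceeded
# or the daily sentinel is missing; otherwise return 1.
cleanup_needed() {
    local root="$1"
    has_old_files "$root/tmp" 3 && return 0
    has_old_files "$root/media" 7 && return 0
    has_old_files "$root/cron/runs" 14 && return 0
    [ -f "$root/.last-daily-ok" ] || return 0
    return 1
}
```

At the top of the hourly script, cleanup_needed "$WORKSPACE_ROOT" || exit 0 ends the run immediately when there is nothing to report. The sentinel freshness check still runs separately; this sketch only tests that the sentinel exists.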


Tier 1: Daily Cleanup

This is where cleanup actually starts. Every rule is deterministic: no judgment, only age and type.

Cleanup Rules

| Target | Rule | Default threshold |
|---|---|---|
| Temp files (tmp/) | Delete after N days | 3 days |
| Media files (media/) | Delete after N days | 7 days |
| Scheduled-task logs (cron/runs/) | Delete after N days | 14 days |
| Empty directories | Remove automatically | N/A |
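The rules in the table reduce to one age-based pass per directory. A minimal sketch, assuming the three threshold variables from config/config.sh shown in Quick Start; DRY_RUN, TRASH_DIR, clean_dir and run_daily_cleanup are hypothetical names, not confirmed repo options. It defaults to a dry run that only prints candidates, and otherwise moves files into a timestamped folder under the trash directory instead of deleting them:

```bash
#!/usr/bin/env bash
# Hypothetical Tier 1 sketch. The THRESHOLD_* names match config/config.sh;
# DRY_RUN and TRASH_DIR are illustrative settings.
THRESHOLD_TEMP_DAYS="${THRESHOLD_TEMP_DAYS:-3}"
THRESHOLD_MEDIA_DAYS="${THRESHOLD_MEDIA_DAYS:-7}"
THRESHOLD_CRON_LOG_DAYS="${THRESHOLD_CRON_LOG_DAYS:-14}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print what would be removed
TRASH_DIR="${TRASH_DIR:-$HOME/.Trash}"

# clean_dir DIR DAYS: move files matching -mtime +DAYS out of DIR, then prune
# empty subdirectories. Note that -mtime +3 matches files at least 4 full days old.
clean_dir() {
    local dir="$1" days="$2" stamp
    [ -d "$dir" ] || return 0
    stamp=$(date +%Y%m%d-%H%M%S)
    find "$dir" -type f -mtime +"$days" | while IFS= read -r f; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "would remove: $f"
        else
            # Mirror the original path under a timestamped folder to avoid name clashes
            dest="$TRASH_DIR/cleanup-$stamp/$f"
            mkdir -p "$(dirname "$dest")" && mv "$f" "$dest"
        fi
    done
    if [ "$DRY_RUN" != 1 ]; then
        # -mindepth 1 keeps DIR itself; nested empty folders are removed bottom-up
        find "$dir" -mindepth 1 -type d -empty -delete
    fi
}

# run_daily_cleanup ROOT: apply each rule from the table above
run_daily_cleanup() {
    clean_dir "$1/tmp" "$THRESHOLD_TEMP_DAYS"
    clean_dir "$1/media" "$THRESHOLD_MEDIA_DAYS"
    clean_dir "$1/cron/runs" "$THRESHOLD_CRON_LOG_DAYS"
}
```

Running with the default DRY_RUN=1 for a while, and setting DRY_RUN=0 only once the printed candidates look right, gives an observation phase before anything moves.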


Two-Layer Early Exit

Tier 1 has one important design choice: a two-layer early exit.

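```bash
# Layer 1: overall check; if the cleanup targets hold no files at all, stop here
total_candidates=$(find tmp/ media/ cron/runs/ -type f 2>/dev/null | wc -l)
if [ "$total_candidates" -eq 0 ]; then
    echo "Nothing to clean. Exiting."
    exit 0
fi

# Layer 2: check each category against its own age threshold (days)
for rule in "tmp/:3" "media/:7" "cron/runs/:14"; do
    dir="${rule%%:*}" days="${rule##*:}"
    old=$(find "$dir" -type f -mtime +"$days" 2>/dev/null | wc -l)
    if [ "$old" -eq 0 ]; then
        echo "$dir is clean. Skipping."
        continue   # Move on to the next category; do not exit the whole script
    fi
    echo "$dir: $old file(s) past the $days-day threshold"
    # ...safe-delete those files here
done
```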

This makes a clean workspace cost almost no I/O.

Safe Delete

On macOS, all deletions go through the trash command instead of rm, leaving a window to recover from mistakes. If the trash command is unavailable, the script falls back to moving files into ~/.Trash/.
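A minimal sketch of that fallback, assuming whatever trash command is installed accepts file paths as arguments. TRASH_CMD and TRASH_DIR are hypothetical overrides (setting TRASH_CMD to an empty string forces the fallback), and a numeric suffix keeps an earlier item with the same name from being overwritten:

```bash
#!/usr/bin/env bash
# Hypothetical settings: TRASH_CMD="" forces the move-to-directory fallback.
TRASH_CMD="${TRASH_CMD-trash}"
TRASH_DIR="${TRASH_DIR:-$HOME/.Trash}"

# safe_delete PATH: send PATH to the trash instead of removing it with rm.
safe_delete() {
    local target="$1" name dest n=1
    [ -e "$target" ] || return 0
    if [ -n "$TRASH_CMD" ] && command -v "$TRASH_CMD" >/dev/null 2>&1; then
        "$TRASH_CMD" "$target"
        return
    fi
    # Fallback: move into TRASH_DIR, adding .1, .2, ... on name collisions
    mkdir -p "$TRASH_DIR"
    name=$(basename "$target")
    dest="$TRASH_DIR/$name"
    while [ -e "$dest" ]; do
        dest="$TRASH_DIR/$name.$n"
        n=$((n + 1))
    done
    mv "$target" "$dest"
}
```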


Tier 2: Weekly AI-Assisted Scan

At this level, judgment starts to matter. The shell script collects data, and AI makes decisions.

What the Script Does

  • Lists all topic files and last modified dates
  • Calculates each project directory size
  • Finds notes that heavily duplicate other files
  • Lists long-inactive projects
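A sketch of the data-collection half, assuming a hypothetical layout of topics/*.md and projects/<name>/ under the workspace root. It covers file ages and line counts, project sizes, and inactive projects; duplicate detection is left out because a useful similarity check depends on how your notes are written. The output is a short Markdown report that the AI reads instead of the files themselves:

```bash
#!/usr/bin/env bash
# file_mtime FILE: modification time in epoch seconds (GNU stat, else BSD/macOS stat)
file_mtime() { stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"; }

# weekly_report ROOT [INACTIVE_DAYS]: compact Markdown summary for the AI to read.
# Assumes topic files in ROOT/topics/*.md and projects in ROOT/projects/<name>/.
weekly_report() {
    local root="$1" inactive_days="${2:-30}" now f p lines days
    now=$(date +%s)

    echo "## Topic files (lines, days since last edit)"
    for f in "$root"/topics/*.md; do
        [ -f "$f" ] || continue
        lines=$(wc -l < "$f" | tr -d ' ')
        days=$(( (now - $(file_mtime "$f")) / 86400 ))
        echo "- $(basename "$f"): $lines lines, ${days}d"
    done

    echo "## Project sizes (KB)"
    for p in "$root"/projects/*/; do
        [ -d "$p" ] || continue
        echo "- $(basename "$p"): $(du -sk "$p" | cut -f1)"
    done

    echo "## Inactive projects (no file edited in ${inactive_days} days)"
    for p in "$root"/projects/*/; do
        [ -d "$p" ] || continue
        # -mtime -N matches files modified less than N days ago
        if [ -z "$(find "$p" -type f -mtime -"$inactive_days" -print -quit)" ]; then
            echo "- $(basename "$p")"
        fi
    done
}
```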

What AI Does

After receiving the scan report, AI decides:

  • These two notes overlap 80%; merge them? → Merge
  • This project has been untouched for 30 days; paused or ended? → Mark as paused
  • This topic file is over 200 lines; split it? → Split
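The handoff between the two halves can be kept narrow. A sketch under assumptions: build_review_prompt and is_valid_decision are hypothetical names, and the fixed MERGE/SPLIT/PAUSE/KEEP vocabulary is an illustrative convention, not the repo's format. Restricting the AI to a few allowed actions lets a script check its answer before anything acts on it:

```bash
#!/usr/bin/env bash
# build_review_prompt REPORT: print instructions followed by the scan report.
build_review_prompt() {
    cat <<'EOF'
You are reviewing a weekly workspace scan. For each item that needs action,
answer with one line in the form: ACTION <path(s)> : <one-sentence reason>
where ACTION is one of MERGE, SPLIT, PAUSE, KEEP.
Do not propose any other kind of change.

EOF
    cat "$1"
}

# is_valid_decision LINE: succeed only if LINE starts with an allowed ACTION
is_valid_decision() {
    case "$1" in
        "MERGE "*|"SPLIT "*|"PAUSE "*|"KEEP "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

build_review_prompt weekly-report.md > weekly-prompt.md produces the text to hand to whichever agent you use; answer lines that fail is_valid_decision can be dropped instead of executed.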

Why Split It This Way

Shell scripts collect data almost for free, in milliseconds, and a single AI judgment pass over their report is cheap. If the AI does the scanning itself, it has to read dozens of files, every one of them becomes input tokens, and the cost multiplies.

Scripts filter first, so the AI reads only the distilled report. That keeps the weekly AI cost of the whole system very low.


Tier 3: Monthly Deep Audit

Once a month, do a full checkup.

Environment Manifest

Generate an environment snapshot every month:

```markdown
## Environment Manifest: 2026-03

- Projects: N (last month M, difference)
- Memory notes: N (last month M, difference)
- Topic files: N (last month M, difference)
- Scheduled tasks: N (last month M, difference)
- Total disk usage: X GB (last month Y GB, difference)
```

Compare with last month, and you can see where things are expanding or shrinking.
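A sketch of how the manifest numbers could be produced, assuming a hypothetical layout (projects/, memory/ and topics/ under the workspace root) and a key=value snapshot file saved each month; snapshot and compare_snapshots are illustrative names. Keeping the raw numbers in a machine-readable file turns the month-over-month difference into a subtraction:

```bash
#!/usr/bin/env bash
# count_entries DIR: number of direct children of DIR, or 0 if DIR is missing
count_entries() {
    if [ -d "$1" ]; then
        find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# snapshot ROOT: print this month's numbers as key=value lines
snapshot() {
    local root="$1"
    echo "projects=$(count_entries "$root/projects")"
    echo "memory_notes=$(count_entries "$root/memory")"
    echo "topic_files=$(count_entries "$root/topics")"
    # Non-comment, non-blank crontab lines; 0 if there is no crontab
    echo "scheduled_tasks=$(crontab -l 2>/dev/null | grep -cv -e '^[[:space:]]*#' -e '^[[:space:]]*$')"
    echo "disk_kb=$(du -sk "$root" | cut -f1)"
}

# compare_snapshots PREV CUR: one manifest line per key, with the change since PREV
compare_snapshots() {
    local prev="$1" cur="$2" key val old diff
    while IFS='=' read -r key val; do
        old=$(grep "^$key=" "$prev" 2>/dev/null | cut -d= -f2)
        old=${old:-0}
        diff=$((val - old))
        if [ "$diff" -gt 0 ]; then diff="+$diff"; fi
        echo "- $key: $val (last month $old, $diff)"
    done < "$cur"
}
```

For example, snapshot "$WORKSPACE_ROOT" > manifest-2026-03.env followed by compare_snapshots manifest-2026-02.env manifest-2026-03.env prints the body of the manifest above, with disk usage in KB rather than GB.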

Complexity Trap

The maintenance system itself can also bloat. The monthly audit checks:

  • Are cleanup rules multiplying? (If above a threshold, simplify.)
  • Are scripts getting too long? (If above a line count threshold, split them.)
  • Are you maintaining “the maintenance system of the maintenance system”? (Time to step back.)
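The first two checks can be automated; the third still needs a human. A sketch, assuming that "rules" means THRESHOLD_* variables in config/config.sh and that the scripts live in a scripts/ directory with a .sh extension, as in the crontab example; audit_complexity is a hypothetical name, and the default limits of 15 rules and 300 lines are arbitrary placeholders, not values from the repo:

```bash
#!/usr/bin/env bash
# audit_complexity CONFIG SCRIPTS_DIR [MAX_RULES] [MAX_LINES]
# Prints one warning per exceeded limit and nothing when all is within limits.
audit_complexity() {
    local config="$1" scripts_dir="$2" max_rules="${3:-15}" max_lines="${4:-300}"
    local rules f lines
    # Count THRESHOLD_* assignments as "rules"; 0 if the file is missing
    rules=$(grep -c '^THRESHOLD_' "$config" 2>/dev/null || true)
    rules=${rules:-0}
    if [ "$rules" -gt "$max_rules" ]; then
        echo "WARN: $rules threshold rules (limit $max_rules): consider simplifying"
    fi
    for f in "$scripts_dir"/*.sh; do
        [ -f "$f" ] || continue
        lines=$(wc -l < "$f" | tr -d ' ')
        if [ "$lines" -gt "$max_lines" ]; then
            echo "WARN: $(basename "$f") has $lines lines (limit $max_lines): consider splitting"
        fi
    done
}
```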

Feedback Loop: The System Evolves by Itself


The elegant part of the system is its self-improvement mechanism:

  1. Tier 1 finds an anomaly: “For 5 days straight, more than 10 temp files needed cleanup. Maybe the threshold is too long?” → write to optimization-suggestions.md
  2. Tier 2 evaluates the suggestion: AI reads it and decides whether it makes sense
  3. Tier 3 adopts the rule: if AI and human both accept it, update thresholds in config.sh
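Step 1 of that loop can be a few lines in the daily script. A minimal sketch, assuming Tier 1 knows how many temp files it removed on each run; the streak file and record_daily_cleanup are hypothetical names, while the 10-file, 5-day trigger and optimization-suggestions.md come from the example above. The script only appends a suggestion; it never edits config.sh itself:

```bash
#!/usr/bin/env bash
SUGGESTIONS_FILE="${SUGGESTIONS_FILE:-optimization-suggestions.md}"
STREAK_FILE="${STREAK_FILE:-.temp-cleanup-streak}"

# record_daily_cleanup COUNT: call once per Tier 1 run with the number of temp
# files removed. After 5 consecutive runs above 10, append one suggestion for
# Tier 2 to evaluate.
record_daily_cleanup() {
    local count="$1" streak=0
    if [ -f "$STREAK_FILE" ]; then streak=$(cat "$STREAK_FILE"); fi
    if [ "$count" -gt 10 ]; then
        streak=$((streak + 1))
    else
        streak=0
    fi
    echo "$streak" > "$STREAK_FILE"
    if [ "$streak" -eq 5 ]; then
        echo "- $(date +%Y-%m-%d) [tier1] More than 10 temp files cleaned for 5 days straight; review THRESHOLD_TEMP_DAYS." >> "$SUGGESTIONS_FILE"
    fi
}
```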

The system learns from its own operation instead of relying on humans to remember “what we tuned last time.”


Lessons from Pitfalls

Each of these lessons came from hitting a wall before it was turned into script logic:

null-byte bug

Some AI tools occasionally write null bytes (\x00) into files. The file looks normal, but grep treats it as binary and reports only that the file matches instead of printing the matching lines. Fix: add a null-byte scan step during cleanup.
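A minimal sketch of such a scan step that uses only tr and wc, so it should behave the same on macOS and Linux: a file contains null bytes exactly when deleting them with tr changes its byte count. has_null_bytes, strip_null_bytes and scan_null_bytes are hypothetical names, and limiting the scan to *.md files is an assumption; stripping rewrites files in place, so point it only at files that should be plain text:

```bash
#!/usr/bin/env bash
# has_null_bytes FILE: succeed if FILE contains at least one \x00 byte
has_null_bytes() {
    local total kept
    total=$(wc -c < "$1" | tr -d ' ')
    kept=$(LC_ALL=C tr -d '\000' < "$1" | wc -c | tr -d ' ')
    [ "$total" -ne "$kept" ]
}

# strip_null_bytes FILE: remove every \x00 byte, keeping all other content.
# Writing back with cat (not mv) preserves the file's permissions and inode.
strip_null_bytes() {
    local tmp
    tmp=$(mktemp) || return 1
    LC_ALL=C tr -d '\000' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# scan_null_bytes DIR: report and clean affected Markdown files under DIR
scan_null_bytes() {
    find "$1" -type f -name '*.md' | while IFS= read -r f; do
        if has_null_bytes "$f"; then
            echo "null bytes removed: $f"
            strip_null_bytes "$f"
        fi
    done
}
```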

-newermt trap

macOS find does not support -newermt. Fix: use -mtime +N, or read the epoch modification time with stat (stat -f %m on macOS, stat -c %Y with GNU coreutils on Linux) and compare manually. Platform differences are wrapped in helper functions inside config.sh.
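Helpers of that kind might look like the sketch below. The function names are hypothetical, not the repo's. The only platform-specific part is the stat call, which tries the GNU form first and falls back to the BSD/macOS form; everything else is plain epoch arithmetic, which also allows thresholds finer than find's whole-day -mtime units:

```bash
#!/usr/bin/env bash
# file_mtime FILE: last modification time in epoch seconds on Linux or macOS
file_mtime() {
    stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# older_than_hours FILE HOURS: succeed if FILE was last modified more than
# HOURS ago. Returns 2 if FILE does not exist.
older_than_hours() {
    local mtime
    [ -e "$1" ] || return 2
    mtime=$(file_mtime "$1") || return 2
    [ $(( $(date +%s) - mtime )) -gt $(( $2 * 3600 )) ]
}
```

For example, older_than_hours .last-daily-ok 36 flags a stale Tier 1 sentinel.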

Value of Early Exit

At first, hourly health checks took 2-3 seconds because there was no early exit. After adding early exit, a clean workspace takes only a few dozen milliseconds. It sounds small, but across 24 runs a day the difference is noticeable.


Quick Start

Only four steps:

1. Clone + Set Path

```bash
git clone https://github.com/p3nchan/auto-optimization.git
cd auto-optimization
export WORKSPACE_ROOT="$HOME/.my-agent-workspace"
```

2. Adjust Thresholds

Edit config/config.sh:

```bash
THRESHOLD_TEMP_DAYS=3        # Temp file retention days
THRESHOLD_MEDIA_DAYS=7       # Media file retention days
THRESHOLD_CRON_LOG_DAYS=14   # Scheduled-task log retention days
```

3. Schedule

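```
# crontab -e
# cron does not inherit variables exported in your shell; set them here with an absolute path
WORKSPACE_ROOT=/absolute/path/to/.my-agent-workspace
0 * * * *  /path/to/auto-optimization/scripts/hourly-healthcheck.sh
0 3 * * *  /path/to/auto-optimization/scripts/daily-cleanup.sh
0 4 * * 0  /path/to/auto-optimization/scripts/weekly-scan.sh
0 5 1 * *  /path/to/auto-optimization/scripts/monthly-scan.sh
```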

4. Observe

After a few days, check whether optimization-suggestions.md has accumulated suggestions. Tune thresholds based on those suggestions.


Summary

| Situation | Recommendation |
|---|---|
| Just started using AI agents | Add Tier 1 (daily cleanup) first |
| Multiple projects are running | Add Tier 0 (health check) + Tier 1 |
| Scheduled tasks + lots of logs | Use all four tiers |
| Want minimum effort | Add daily-cleanup.sh to cron and ignore the rest for now |

The full scripts, configs, and docs are in the open-source repo:

👉 Auto Optimization on GitHub

MIT licensed. Clone it and modify as needed.


FAQ

Q: Why does an AI agent workspace need automatic cleanup?

Every AI agent session creates temp files, logs, screenshots, and other artifacts. In a workspace with multiple projects and scheduled tasks running every day, file count grows quickly. Without regular cleanup, disk space is wasted, file search slows down, and the AI is more likely to get confused.

Q: Why not let AI clean everything?

Every AI judgment has a cost in API tokens, while most cleanup is deterministic: delete temp files older than N days, delete logs older than N days. Shell scripts handle that for free, quickly, and reliably. Only judgment-heavy work, such as whether a note is still useful or two files should be merged, should go to AI.

Q: What does this system require?

It needs only bash and cron (or any other scheduler), and runs on both macOS and Linux. The AI-assisted Tiers 2 and 3 are optional: Tiers 0 and 1 alone, as pure shell scripts, already improve workspace hygiene considerably.


Penchan’s Take

OpenClaw has been running the full three-tier auto-optimization loop: a daily cron job runs the cleanup scripts, a weekly cron job scans logs for patterns and writes to optimization-suggestions.md, and only once a month do I manually review whether suggestions should become formal rules. The two most common pitfalls in practice: first, thresholds that are too tight keep deleting temp files that are still useful; second, running deletions without a dry-run mode will cost you files in the first week. Start with an observation-only phase, run it for two weeks, check whether the distribution of candidates makes sense, and only then enable deletion.

— Penchan