Portfolio Context

Artifact type: Engineering blog post
Audience: Developers using AI-assisted coding tools (e.g., Cursor)
Role: Documentation author

Less structure, less context rot

Context rot in long agent sessions

You're an hour into a debugging session. The agent is working, or it looks like it is. Then you scroll up and realize it's been attempting the same fix since your 30th reply. Each attempt was phrased differently, so you didn't catch it. You check your task list. It shows completed work. None of it matches what actually happened in the thread.

This is context rot: the agent keeps operating, but the working context degrades.

The natural response is to add more structure and control

We built exactly that kind of system while developing a cross-platform desktop application in Cursor. Long agent sessions across multiple repositories regularly stretched to hundreds of exchanges, and at the time there was no reliable built-in way to keep the agent oriented without restarting.

The first response was to wrap the agent in a more structured task management layer that would:

  • Track task status explicitly
  • Prioritize by dependency
  • Synchronize task state with GitHub so nothing gets lost
  • Add richer schemas so the agent always knows where it is

This approach works in theory. In practice, it creates new failure modes.

How added complexity breaks agent context

Added structure increases the number of failure surfaces in an agent workflow.

Synchronizing task state with GitHub introduced divergence between the agent and the project’s working state. When the sync script failed, the agent generated its own internal state rather than failing cleanly. Human updates made in GitHub were not reliably reflected in the active session, leaving the agent to operate on stale state and requiring manual reconciliation.

Separating rules from guides was an attempt to make the instructions easier for the agent to follow. The original rules file had become long enough that compliance degraded, so the instructions were split into smaller files. That reduced one problem but created another: the agent treated the files as competing sources of authority, sometimes following one, sometimes the other, or silently deprioritizing one entirely. Merging them back into a single file resolved the conflict, but only after significant trimming.

Cross-platform scripts introduced execution uncertainty. The agent struggled to determine the correct runtime environment and often avoided running the system altogether. On Windows, small incompatibilities compounded into consistent failure to engage.

Python-based workflows introduced operational friction. Directory activation, permission concerns, and execution overhead caused the agent to skip task creation and validation steps rather than execute them reliably.

Auto-prioritization introduced unstable task ordering. Priority metadata conflicted with prompt intent, leading the agent to reweight tasks mid-session, especially during debugging.

Every added feature created a new decision point. More decision points meant more failure surfaces under incomplete or conflicting state. The agent wasn't failing because it lacked structure. It was failing because of it.

Minimal agent task scaffolding that works

The system was rebuilt from scratch in Node.js, keeping only what the agent would reliably engage with.

The Python system was convention-dependent. Completing a task meant following a sequence the CLI didn't enforce: generate an ID, archive the task, validate, and check stats. The agent could skip any step and nothing would stop it. Over time, skipped steps compounded into state the agent couldn't trust.

The difference is visible in how tasks are executed.

# Python — convention-dependent, agent must follow an unenforced sequence
python3 task.py generate_next_id
python3 task.py archive P1-A1
python3 task.py validate-ids
python3 task.py stats

The Node.js version is signal-dependent. Every command returns a deterministic token. The agent doesn't maintain hygiene; it reads outcomes.

# Node.js — signal-dependent, deterministic outputs
npm run task:switch -- "my-project" # → @@OK:SWITCH:my-project@@
npm run task:add -- "Fix audio bug" # → @@OK:ADDED:P1-A1@@
npm run task:complete -- "P1-A1" # → @@OK:COMPLETE:P1-A1@@
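The core of the signal-dependent design can be sketched in a few lines of pure Node.js. The token format (`@@OK:…@@` / `@@ERR:…@@`) is taken from the commands above; the function name, in-memory task map, and error token below are illustrative assumptions, not the actual implementation:

```javascript
// Hypothetical sketch of a signal-emitting command handler.
// Every outcome is a single deterministic token the agent can match exactly,
// including failures -- the command never falls back to free-form output.

function completeTask(tasks, id) {
  // tasks: Map of task id -> status string.
  if (!tasks.has(id)) return `@@ERR:NOT_FOUND:${id}@@`; // assumed error token
  tasks.set(id, "COMPLETED");
  return `@@OK:COMPLETE:${id}@@`;
}

// Simulate completing a known and an unknown task.
const tasks = new Map([["P1-A1", "TODO"]]);
console.log(completeTask(tasks, "P1-A1")); // → @@OK:COMPLETE:P1-A1@@
console.log(completeTask(tasks, "P1-XX")); // → @@ERR:NOT_FOUND:P1-XX@@
```

Because failure produces a token rather than a stack trace or silence, the agent has nothing to interpret: it either sees `@@OK:` or it sees `@@ERR:`.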

Tasks are stored as four-line markdown blocks in a single active.md file. No task state synchronization with GitHub, no dependency graphs, no cross-platform script detection. The Cursor rule is short, always on, and lives in a single file. Zero external dependencies: pure Node.js built-ins.
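Parsing a flat file like this is deliberately trivial. The post doesn't show the exact four-line block format, so the sketch below assumes a plausible shape (an `## <id>` header, a `Status:` line, a title line, and a blank separator); the field names and layout are illustrative:

```javascript
// Hypothetical parser for active.md, assuming each task block is four lines:
//   ## <id>
//   Status: TODO | COMPLETED
//   <title>
//   <blank line>

function parseActive(md) {
  const tasks = [];
  const lines = md.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^## (\S+)/);
    if (m && i + 2 < lines.length) {
      tasks.push({
        id: m[1],
        status: lines[i + 1].replace("Status: ", ""),
        title: lines[i + 2],
      });
    }
  }
  return tasks;
}

const md = "## P1-A1\nStatus: TODO\nFix audio bug\n";
console.log(parseActive(md)); // one task: id P1-A1, status TODO
```

A single flat file read with a loop like this is the whole "database": there is no state the agent can see that isn't literally in active.md.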

| Python | Node.js |
| --- | --- |
| Convention-dependent sequencing | Signal-dependent outputs (@@OK:/@@ERR:) |
| Task state synchronization with GitHub | active.md as single source of truth |
| Auto-prioritization by dependency | Manual task order |
| Directory switching required | Runs from anywhere via npm run |
| OS-dependent script execution | Cross-platform, zero external dependencies |
| Ambiguous status markers | Explicit TODO / COMPLETED |
| Multiple rule sources | Single trimmed Cursor .mdc rules file |

Impact on long-running agent workflows

Reducing scaffolding complexity improves agent reliability over long sessions.

Fewer execution paths reduce the likelihood of repeated or conflicting actions. Deterministic outputs make task state explicit and observable, while a single source of truth prevents divergence between the agent and the system. Removing convention-dependent steps eliminates failure modes tied to implicit memory.
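"Explicit and observable" has a concrete consequence: any output that isn't a well-formed token can be treated as a hard failure instead of something to interpret. The token grammar below is an assumption inferred from the examples earlier, not the system's actual validation logic:

```javascript
// Hypothetical outcome check: trust a result only if it is a well-formed token.
// Assumed grammar: @@OK:VERB[:detail]@@ or @@ERR:VERB[:detail]@@
const TOKEN = /^@@(OK|ERR):[A-Z_]+(:[^@]+)?@@$/;

function readOutcome(output) {
  const line = output.trim();
  if (!TOKEN.test(line)) throw new Error(`Unrecognized output: ${line}`);
  return line.startsWith("@@OK:") ? "ok" : "err";
}

console.log(readOutcome("@@OK:COMPLETE:P1-A1@@")); // → ok
```

This is the inverse of the Python system's failure mode: instead of the agent inventing state when a step goes wrong, malformed output stops the workflow immediately.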

These changes reduce the amount of context the agent must maintain across a session.

In practice, this results in fewer repeated fix attempts, more consistent task completion, longer sessions before requiring a reset, and cleaner handoffs into new sessions without adding prompt clutter.

Agent tooling is catching up

Cursor has since shipped debug mode and build mode. The problem we faced was real enough that the tooling itself caught up.

Upstream improvements take time. Context windows are expanding, and native agent modes are evolving. Over time, much of this scaffolding will become unnecessary.

In the meantime, agent behavior is a user-facing problem with user-facing solutions. You don’t need to wait for the next model release or IDE update to improve long-running sessions.

The default response to agent failure is to add more structure: more rules, more state, more enforcement. In practice, this makes behavior less reliable. Agents perform better when the scaffolding around them is minimal, explicit, and signal-driven.

This is not about simplicity as a preference. It reflects how agents operate. They are stateless between sessions, and convention-dependent systems assume memory they do not have.

Agent workflows that span long sessions across multiple files benefit from a single source of truth, deterministic signals, and a small set of always-on rules. This approach is lightweight enough to adopt in any repository with minimal overhead.

Less structure, less context rot.