Interpretable Context Methodology: Folder Structure as Agentic Architecture
Authors: Van Clief, McDermott · Year: 2026 · Venue: arXiv preprint (arXiv:2603.16021, v2; no journal DOI)
Out-of-domain (methodology, not aerospace)
This is a context-engineering / AI-agent-orchestration paper, filed in the context/staging bucket (“methodology papers on structuring context for AI agents”). It is peripheral to the free-flying-manipulator science and relevant instead to how this wiki and its agent harness are run. No tag in the original closed vocabulary fit it; see Open Questions for how that was resolved.
Summary
The paper proposes Interpretable Context Methodology (ICM): orchestrate a sequential, human-reviewed AI-agent workflow using a folder hierarchy, markdown files, and local scripts instead of a coordination framework (CrewAI/LangChain/AutoGen). One orchestrating agent reads stage-scoped context files at each step; the folder structure encodes stage order (numbered folders), context scoping (the hierarchy), state (files on disk), and inter-stage coordination (one folder’s output is the next folder’s input). The claimed payoff is inspectability, editability, and portability: every intermediate artifact is a plain file a human can open, edit, and hand off, so the pipeline is a “glass box” by construction rather than through an added explanation layer.
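The folder-to-pipeline mapping in the summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the paper: it assumes a hypothetical naming convention in which stage folders are named NN_name (e.g. 01_research) and each stage writes to an output/ subfolder that the next stage reads. Stage order comes only from the numeric prefix, and "coordination" is nothing more than pointing each stage at its predecessor's output folder.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Hypothetical convention (not specified by the paper): "<NN>_<name>", e.g. "01_research".
STAGE_RE = re.compile(r"^(\d+)_(.+)$")


def ordered_stages(workspace: Path) -> list[Path]:
    """Return stage folders sorted by numeric prefix: stage order is the folder numbering."""
    stages = []
    for child in workspace.iterdir():
        match = STAGE_RE.match(child.name)
        if match and child.is_dir():
            stages.append((int(match.group(1)), child.name, child))
    return [path for _, _, path in sorted(stages)]


def stage_plan(workspace: Path) -> list[tuple[str, str | None]]:
    """Pair each stage with the folder it reads from: the previous stage's output/ folder."""
    plan: list[tuple[str, str | None]] = []
    previous_output: str | None = None  # the first stage has no upstream stage
    for stage in ordered_stages(workspace):
        plan.append((stage.name, previous_output))
        previous_output = f"{stage.name}/output"  # this stage's output is the next one's input
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["02_script", "01_research", "03_production", "notes"]:
            (root / name).mkdir()
        for stage, reads_from in stage_plan(root):
            print(stage, "<-", reads_from)
        # 01_research <- None
        # 02_script <- 01_research/output
        # 03_production <- 02_script/output
```

Sorting on the parsed integer rather than the raw folder name keeps a stage such as 10_final after 02_script; folders without a numeric prefix (such as notes) are ignored, which is one possible way to leave room for non-stage material in the workspace.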
Key Claims
- Filesystem replaces framework for sequential/reviewable/repeatable workflows. Stage sequencing = folder numbering, context scoping = folder hierarchy, state = files, coordination = output-becomes-input. For this workflow class the authors argue that frameworks add complexity (opacity, fragility, developer dependency) that the problem does not require.
- Five design principles, each borrowed from prior practice: one stage/one job (McIlroy, Parnas); plain text as the interface (Kernighan & Pike); layered context loading (prevention, not compression — motivated by Liu et al.’s “lost in the middle”); every output is an edit surface (Horvitz mixed-initiative, Shneiderman direct manipulation); “configure the factory, not the product” (continuous-delivery repeatability).
- Five-layer context hierarchy. Layer 0 identity, Layer 1 task routing, Layer 2 stage contract (the control point), Layer 3 reference material (stable “factory”: voice/design/conventions), Layer 4 working artifacts (per-run “product”). The Layer 3/4 split is claimed to matter because reference material should be internalized as constraints, while working artifacts should be processed as input. Each stage typically loads only 2,000–8,000 tokens, versus 30k–50k tokens for a monolithic prompt.
- Stage contract = Inputs / Process / Outputs in a CONTEXT.md; the Inputs table makes context selection explicit, editable, and auditable. The same folder hierarchy is claimed to serve double duty: human control surface and the orchestrator’s own spec for delegating sub-tasks to sub-agents.
- Practitioner evidence (informal). Across 33 community members using multi-stage workspaces, 30 report a U-shaped intervention pattern: heavy editing at stage 1 (direction-setting, creative), light in the middle (constrained execution), heavy again at the final stage (alignment/“debugging”). Non-technical users edited CONTEXT.md files to change stage behavior; three with no coding experience built and ran workspaces via the workspace-builder.
- Explicitly not for real-time multi-agent collaboration, high-concurrency multi-user serving, or automated mid-pipeline branching; those still need framework infrastructure.
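The stage contract's role as an explicit context selector, and the layered loading it enables, can be illustrated with a minimal sketch. Everything concrete here is a hypothetical assumption rather than the paper's format: an Inputs table whose first two columns are a layer number and a file path, a dictionary standing in for files on disk, and a 4-characters-per-token estimate in place of a real tokenizer. The point is only that a stage loads what its contract lists, in layer order, and nothing else.

```python
from __future__ import annotations

import re

# Hypothetical Inputs-table row: "| <layer digit> | <path> | <purpose> |".
ROW_RE = re.compile(r"^\|\s*(\d)\s*\|\s*([^|]+?)\s*\|")


def parse_inputs(context_md: str) -> list[tuple[int, str]]:
    """Extract (layer, path) rows from the table under an 'Inputs' heading."""
    rows: list[tuple[int, str]] = []
    in_inputs = False
    for line in context_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading ends the previous section; only "Inputs" turns collection on.
            in_inputs = stripped.lstrip("#").strip().lower() == "inputs"
            continue
        match = ROW_RE.match(stripped)
        if in_inputs and match:  # header and separator rows have no digit, so they are skipped
            rows.append((int(match.group(1)), match.group(2)))
    return rows


def assemble_context(rows: list[tuple[int, str]], files: dict[str, str]) -> str:
    """Load only the listed files, lower layers first; table order is kept within a layer."""
    parts = []
    for layer, path in sorted(rows, key=lambda row: row[0]):  # sorted() is stable
        parts.append(f"<!-- layer {layer}: {path} -->\n{files[path]}")
    return "\n\n".join(parts)


def approx_tokens(text: str) -> int:
    """Crude size check; the 4-characters-per-token ratio is an assumption, not a tokenizer."""
    return len(text) // 4


if __name__ == "__main__":
    contract = """# Stage 02: script
## Inputs
| Layer | File | Purpose |
|---|---|---|
| 4 | 01_research/output/notes.md | material to script from |
| 3 | _reference/voice.md | tone constraints |
## Process
Write the script from the notes.
"""
    files = {
        "_reference/voice.md": "Plain, direct voice.",
        "01_research/output/notes.md": "Key findings from the research stage.",
        "_reference/unrelated.md": "Never loaded: not listed in Inputs.",
    }
    context = assemble_context(parse_inputs(contract), files)
    print(context)
    print("approx tokens:", approx_tokens(context))
```

Keeping table order within a layer is a choice made for this sketch only; as noted under Open Questions, the protocol does not prescribe how files within a layer are ordered or formatted.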
Method
Not an experiment: an architectural pattern plus working implementations and practitioner reports (no controlled study). Implementations were built and run in Claude Code with an Opus-class orchestrator delegating to a Sonnet-class sub-agent via agent-teams; the paper states ICM is model-agnostic (folder/format/naming conventions, no model-specific capability). Reported workspaces: a three-stage script-to-animation pipeline (research → script → production/Remotion), a five-stage course-deck pipeline, and a five-stage workspace-builder whose output is a new workspace. Evidence is drawn from an invite-only 52-member practitioner community through conversation, not instrumented logging.
Threats to validity (the authors’ own): data collection is informal and self-reported; the community is self-selected (selection + enthusiasm bias); most active use is content production (academic/policy deployments early-stage); all testing used a single model family; and no controlled comparison of scoped vs monolithic context has been run — the quality claim rests on the “lost in the middle” literature and practitioner judgment, not measured effect sizes.
Relevance to thesis
Peripheral to the manipulator science, but directly descriptive of the agentic research infrastructure this project already runs: numbered/typed folders, per-scope CLAUDE.md/CONTEXT.md-style contracts, path-scoped rule loading, stage-scoped context, and human review gates are exactly ICM’s Layers 0–4 and its edit-surface principle. Useful as (a) a citable provenance for our own “folder structure as architecture” harness choices, (b) a source for the layered-context-loading rationale (why scope each subagent’s brief), and (c) a caution: its central quality claim is explicitly unmeasured, so treat “scoped context improves output” as a hypothesis, not a result. It does not bear on guidance/planning/control math.
Connections
Topics: context_engineering — created 2026-07-02 after the user approved the tag-vocab extension (context_engineering added; agentic_methodology deliberately not). Sources: none in-corpus (its references are software-engineering and HCI, outside this wiki’s scope).
Key Equations / Quotes
No equations (a methodology paper). Cited by section (the converted markdown carries no page numbers).
“if the prompts and context for each stage of a workflow already exist as files in a well-organized folder hierarchy, you do not need a coordination framework … You need one orchestrating agent that reads the right files at the right moment.” (Introduction)
“Stage sequencing is the folder numbering. Context scoping is the folder hierarchy. State management is the files on disk. Coordination between stages is one folder’s output being another folder’s input.” (Sec. 3, Architecture)
“It was never opaque in the first place, because every artifact is a plain-text file that a human can read.” (Sec. 5, Observability as a Side Effect)
Open Questions
- The core quality claim (scoped context beats a monolithic prompt) is unmeasured — the paper concedes no controlled comparison was run. What would a fair ICM-vs-monolithic benchmark on the same task look like?
- Does the five-layer hierarchy generalize across model families, or is it tuned to the one family tested? (Authors flag cross-model evaluation as future work.)
- As context windows grow, does selective loading still pay off, or do only the human-interaction benefits (observability, edit surfaces, review gates) remain?
- How sensitive is stage-output quality to the ordering/formatting of files within a layer — which the protocol does not prescribe?
- Wiki-local: RESOLVED 2026-07-02. The user approved extending the vocabulary with context_engineering + the topics page; agentic_methodology was not added (one tag per real need).