CodeDNA Annotation Standard · v0.9 · MIT

The code that explains itself.

AI tools editing your code make mistakes because they lack context. CodeDNA embeds that context directly in the source files, like DNA in every cell. Zero RAG. Minimal drift.

- +17pp F1 improvement · DeepSeek Chat · 10 tasks · p=0.001
- 3 LLMs tested · positive Δ on all 3 models
- −52% retry risk reduction · high-risk runs: 52% → 25%
- 0 infrastructure required · no RAG, no vector DB
The Problem

AI makes mistakes when it lacks context.

Imagine asking a contractor to renovate your apartment without showing them the floor plan. Same problem with AI editing your code. CodeDNA is the floor plan.

🪟
Sliding Window
AI reads code in windows of 50–100 lines. If a critical rule is defined 200 lines above, the model never sees it, and it breaks the rule.
❌ Scenario S4: violates MAX_DISCOUNT_RATE
🔗
Cascade Effect
You change a function. Three other parts of the system depend on it. Without a map, the AI updates only the file it sees and leaves the rest broken.
❌ Scenario S5: KeyError at runtime in main.py
🧬
The Solution: CodeDNA
Context lives in the file itself — not in an external document. Each snippet the AI reads carries architectural constraints for that module.
+17pp F1 on SWE-bench (DeepSeek Chat · 10 tasks · p=0.001)
Memory Stack

Where CodeDNA sits in the AI memory stack.

Every AI coding agent relies on multiple memory layers. Most of them are external to the code. CodeDNA is the only layer that lives inside the source files themselves — it travels with every clone, fork, and CI pipeline.

| Layer | Examples | Where it lives | Shared across tools? |
|---|---|---|---|
| LLM / Agent | Claude, GPT-4, Cursor, Copilot | Cloud | — |
| External memory | Chat history, Memory API | Cloud / external DB | ✗ tool-specific |
| Markdown / Config | CLAUDE.md, .cursorrules, AGENTS.md | Repo (outside source files) | partial |
| CodeDNA | exports, rules, agent, message, .codedna | Inside every source file + repo root | ✓ always |
Setup

Works with your favorite AI tool.

One command installs CodeDNA rules for any AI coding assistant. Or pick the file for your tool and paste it into your project.

Step 1 — Install for your AI tool (instructions + enforcement hooks)
bash <(curl -fsSL https://raw.githubusercontent.com/Larens94/codedna/main/integrations/install.sh) claude-hooks
# or: cursor-hooks  copilot-hooks  cline-hooks  opencode  windsurf
Step 2 — Annotate existing files (CLI)
pip install git+https://github.com/Larens94/codedna.git
# set ANTHROPIC_API_KEY first
codedna init ./          # first-time: annotates every .py file
codedna update ./        # incremental: only unannotated files
codedna check ./         # coverage report, no changes
Claude Code
Active enforcement
4 hooks: SessionStart, PreToolUse, PostToolUse, Stop. Validates every .py write automatically.
Install guide →
Cursor
Active enforcement
2 hooks in .cursor/hooks/: validates on every file edit, reminds at session end. Requires v1.7+.
Install guide →
GitHub Copilot
Active enforcement
3 hooks in .github/hooks/: session start context, post-write validation, session end reminder.
Install guide →
Cline
Active enforcement
2 hooks in .clinerules/hooks/: TaskStart context injection, PostToolUse validation. Requires v3.36+.
Install guide →
OpenCode
Active enforcement
AGENTS.md + JS plugin in .opencode/plugins/. Validates 11 languages on every write.
Install guide →
Windsurf
Instructions only
Copy .windsurfrules to your project root. Cascade reads it automatically.
Install guide →
Antigravity / Agents
Instructions only
Copy .agents/workflows/codedna.md to your project. Compatible with Antigravity and custom agent frameworks.
View file →

Full Install Guide

How It Works

Four levels. Every snippet is self-contained.

Like biological DNA: cutting it in half produces two fragments that still carry the complete information. With CodeDNA, 10 random lines from anywhere in a file are enough for the AI to act correctly.

Level 0
Project Manifest .codedna
A single YAML file at the repo root. The agent reads this first — the view from far away. Describes packages, their purposes, inter-package dependencies, and a rolling session log of every agent that has worked on the project.
.codedna
project: myapp
packages:
  payments/:
    purpose: "Invoice generation, Stripe integration"
  analytics/:
    purpose: "Revenue reports, KPI dashboards"
    depends_on: [payments/, tenants/]

agent_sessions:
  - agent: claude-sonnet-4-6
    date: 2026-03-10
    task: "Implement monthly revenue aggregation"
    changed: [analytics/revenue.py]
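The manifest is plain YAML, so any agent or tool can query it directly. A minimal sketch, assuming the file has already been parsed into a dict (e.g. with yaml.safe_load); dependents_of is a hypothetical helper, not part of the CodeDNA CLI:

```python
# Illustrative: what the parsed .codedna manifest looks like to an agent,
# plus a helper answering "which packages are impacted if I change X?".
# The dict mirrors the YAML example above; the helper name is hypothetical.
manifest = {
    "project": "myapp",
    "packages": {
        "payments/": {"purpose": "Invoice generation, Stripe integration"},
        "analytics/": {
            "purpose": "Revenue reports, KPI dashboards",
            "depends_on": ["payments/", "tenants/"],
        },
    },
}

def dependents_of(package: str) -> list[str]:
    """Return packages that declare a depends_on edge to `package`."""
    return [
        name
        for name, meta in manifest["packages"].items()
        if package in meta.get("depends_on", [])
    ]

impacted = dependents_of("payments/")  # analytics/ depends on payments/
```

An agent planning a change to payments/ would read this list first and open the analytics/ headers before touching any code.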
Level 1
Module Header (Python-native)
A concise module docstring at the top of every file. The AI reads it before the code and already knows the purpose, the public API, who depends on this file, and the hard constraints. The format is Python-native, which maximises comprehension for LLMs trained on Python corpora.
pricing.py
"""pricing.py — Pricing engine with tier discounts.

exports: apply_discount(cents, tier) -> int
used_by: checkout.py → build_cart
rules:   NEVER exceed MAX_DISCOUNT_RATE from config.py;
         apply_discount() must cap before returning.
         DB: discount_tiers(tier, multiplier).
"""
Level 2
Sliding-Window Annotations
Rules: docstrings on critical functions, written organically by agents as they discover constraints. Each agent that fixes a bug leaves knowledge for the next. Even in a 10-line extract the AI receives all the context it needs.
pricing.py · lines 210–226
def apply_discount(cents: int, tier: str) -> int:
    """Apply tier discount to price in cents.

    Rules:   MUST cap discount before returning — exceeding
             MAX_DISCOUNT_RATE is a financial compliance bug.
             After fix #42: also check tier != 'internal'.
    """
    discount = get_multiplier(tier)
    discount = min(discount, MAX_DISCOUNT_RATE)
    return int(cents * (1.0 - discount))
Level 3
Semantic Naming
Variable names encode type, shape and origin. A 10-line snippet carries significantly more context — reducing backward tracing for the most common cases.
Comparison
# ❌ Ambiguous — euros? cents?
price = request.json.get("price")
data  = get_users()

# ✅ CodeDNA — type, domain, origin are clear
int_cents_price_from_request = request.json.get("price")
list_dict_users_from_db      = get_users()
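The naming convention can even be decoded mechanically. A toy decoder, assuming the origin is always introduced by _from_ as in the examples above; decode_name is illustrative, since the convention is read by humans and LLMs, not parsed:

```python
# Rough decoder for the <type>_<shape>_<domain>_<origin> convention.
# Assumption: origin follows a "_from_" separator, as in the examples above.
def decode_name(name: str) -> dict[str, str]:
    base, _, origin = name.partition("_from_")
    parts = base.split("_")
    return {
        "type": parts[0],                        # e.g. int, list
        "shape_and_domain": "_".join(parts[1:]), # e.g. cents_price
        "origin": origin or "unknown",           # e.g. request, db
    }

info = decode_name("list_dict_users_from_db")
```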
Bonus
Manifest-Only Planner
To plan changes across 10+ files, the AI reads .codedna first, then only the module docstring of each file (first 8–12 lines) — building the full map in very few tokens.
Planner — Standard
# 1. Read .codedna — project structure
# 2. Read module docstring (8–12 lines each)
# 3. Filter: used_by mentions target? Include
#    rules mentions task domain? Include
# 4. Build exports → used_by graph
# 5. Open in full ONLY the relevant files
# Cost: ~50 tok × N files = complete map
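The planner steps above can be sketched in a few lines. This toy version runs against in-memory headers instead of a real repo, and a substring match stands in for the real relevance filter; it is an assumption, not the shipped implementation:

```python
# Toy Manifest-Only Planner: given only each file's header lines,
# keep the files relevant to a change in `target` (step 3 above).
headers = {
    "pricing.py": "exports: apply_discount\nused_by: checkout.py -> build_cart",
    "checkout.py": "exports: build_cart\nused_by: api/cart.py",
    "reports.py": "exports: weekly_report\nused_by: cli.py",
}

def plan(target: str) -> list[str]:
    """Keep the target plus any file whose header mentions it."""
    return [f for f, header in headers.items() if target in header or f == target]

# Planning a change to checkout.py pulls in pricing.py via its used_by line.
relevant = plan("checkout.py")
```

Only the files in `relevant` are then opened in full, which is what keeps the token cost at roughly one header per file.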
v0.9 Feature

message: — Agent-to-Agent Chat in Code

The agent: field records what an agent did. The message: sub-field adds a conversational layer — soft observations and open questions left directly for the next agent, co-located with the code.

Three channels
rules: / agent: / message:
rules: is the law — hard architectural constraints, updated in-place.
agent: is the diary — what happened and when.
message: is the conversation that precedes the law — observations not yet certain enough to become rules.
revenue.py — message: lifecycle
agent:   claude-sonnet-4-6 | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency
                  — not yet sure if this should be a rule"

agent:   gemini-2.5-pro | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:.
                  New: timezone rollover in January"

Lifecycle: a message: is either promoted to rules: (reply @prev: promoted to rules:) or dismissed (@prev: not applicable because...). Always append-only — never deleted.

Works at both levels: Level 1 (module docstring) for agents reading the full file, and Level 2 (function docstring) for agents using a sliding window that never sees the header.

Benchmark result: in the AgentHub multi-agent experiment (DeepSeek R1, 5 agents, 83 minutes), message: was adopted in 54 out of 55 files (98.2%) — spontaneously, with no mid-session reminders. Agents used it for handoff notes, per-function observations, and cross-file constraint propagation.
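Because message: entries are plain text with a fixed prefix, a tool can mechanically separate open observations from @prev: replies. A hedged sketch; split_messages is hypothetical, not part of the CLI:

```python
import re

# Hypothetical: list message: threads in an agent log. By the convention
# above, a reply starts with "@prev:"; everything else is a new observation.
LOG = '''
agent:   claude-sonnet-4-6 | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency"
agent:   gemini-2.5-pro | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:. New: timezone rollover"
'''

def split_messages(log: str) -> tuple[list[str], list[str]]:
    msgs = re.findall(r'message:\s*"([^"]+)"', log)
    replies = [m for m in msgs if m.lstrip().startswith("@prev:")]
    new = [m for m in msgs if not m.lstrip().startswith("@prev:")]
    return new, replies

new_msgs, replies = split_messages(LOG)
```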

v0.9 Feature

wiki: — LLM-Wiki Layer for your Codebase

The wiki: opt-in field in a docstring points to a curated markdown page with deeper context. When present, a prior agent decided this file deserves notes beyond what the terse header can hold. The next agent reads it before editing. Two commands turn your annotations into a navigable knowledge graph.

Two commands
codedna wiki bootstrap · codedna wiki sync
codedna wiki bootstrap . — generates one markdown page per source file under docs/wiki/, with [[wikilinks]] derived from the used_by: and related: graphs. Open in Obsidian to browse the dependency graph visually.

codedna wiki sync . — regenerates docs/codedna-wiki.md, a narrative 7-section project wiki (identity, topology, workflows, hotspots…). Hook it to post-commit so it stays current without relying on an agent to remember.

The wiki: field is a signal, not a dump. Only files where a prior agent decided the docstring is insufficient get a pointer. Sparsity is deliberate: if you open a file and see wiki:, read it first.
revenue.py — wiki: opt-in field
"""revenue.py — Monthly revenue aggregation.

exports: monthly_revenue(year, month) -> dict
used_by: api/reports.py → revenue_route
related: billing/currency.py — shares multi-currency logic
wiki:    docs/wiki/revenue.md
rules:   get_invoices() returns ALL tenants
         — MUST filter is_suspended() BEFORE sum
agent:   claude-sonnet-4-6 | 2026-03-10 | ...
"""

Wiki layer in action

wiki: field — two paths: wiki present → agent reads curated context; absent → docstring is sufficient

Obsidian graph — real project annotated with CodeDNA

Obsidian graph view of a real project annotated with CodeDNA — nodes are files, edges are used_by/related links
Illustrative Scenarios

Problems CodeDNA is designed to solve.

5 scenarios that illustrate the categories of errors AI agents make without architectural context. For measured results, see the SWE-bench benchmark below.

Scenario S4
Sliding Window — Hidden Constraint
The AI reads only lines 200–250. The max discount limit is in the manifest (line 7). Without CodeDNA it ignores it and applies illegal discounts of 50%+.
Scenario S5
Cascade Change — Domino Effect
utils.py is modified. Without used_by:, the AI only updates utils.py and leaves main.py with a runtime KeyError.
Scenario S6
Ambiguous Type — Euros or Cents?
price = 1999 — euros or cents? Without semantic naming the AI gets the unit wrong. With CodeDNA: int_cents_price_from_request — zero ambiguity.
Scenario S7
💥 Broken Dependency — Silent Rename
format_revenue() → format_currency(). The rules: field records the rename; the Control agent calls the old name and crashes.
Scenario S8
🗺️ Planning — 8 Files, Manifest Only
The AI must find the 2 right files to change by reading only the module docstrings (8–12 lines each). Using the exports: → used_by: graph it identifies exactly those 2 files.
Multi-Model SWE-bench

Tested across multiple LLMs.

Django issues from SWE-bench, tested across multiple LLMs. Same prompt, same tools, same tasks. DeepSeek Chat: +17pp F1, p=0.001, 10/0/0 · Gemini 2.5 Flash: +13pp F1, p=0.040 · Gemini 2.5 Pro: +9pp F1. All 3 models improve.

File Localization F1 — Control vs CodeDNA by Model

Navigation Demo — django__django-11808 · DeepSeek Chat · 5 runs

CodeDNA navigation demo: without CodeDNA the agent wanders, with CodeDNA it follows the used_by chain

Without CodeDNA: agent opens random files, stops early — 2/10 critical files found.  |  With CodeDNA: follows used_by: chain — 6/10 critical files found.

Demos: VS Code Navigation Demo · 3 Visual Metaphors · Agent Graph Visualizer · CodeDNA World
Best Model
🏆 Gemini 2.5 Flash — +13pp F1
From 60% to 72%. Wins 4 out of 5 tasks. Δ up to +21pp on delegation chains (Task 13495). p=0.040, Wilcoxon W+=14, N=5 tasks × ≥5 runs at T=0.1.
Control: 60%
CodeDNA: 72%
DeepSeek Chat
DeepSeek Chat — +17pp F1 (p=0.001 · 10/0/0)
From 50% to 60%. Wins 4/5 tasks. Notable: +35pp on cross-cutting task 11808, the opposite direction from Gemini Flash (−1pp). Task 13495 anomaly (−9pp, not statistically significant) under investigation.
Control: 50%
CodeDNA: 60%
Key Finding
🧠 Model-Agnostic Benefits
4 out of 5 models improve with CodeDNA. The benefit is strongest on tasks requiring cross-module navigation — exactly where AI agents struggle most.
Chain tasks: +9pp to +21pp

🔬 Methodology: SWE-bench Django tasks × 3 models (Gemini 2.5 Flash ✓, DeepSeek Chat 10 tasks ✓, Gemini 2.5 Pro ✓). 3–5 runs/task at T=0.1. Identical system prompt, same 3 tools (read_file, list_files, grep), max 30 turns. Metric: File Localization F1 (ground-truth files from patch). Statistical test: Wilcoxon signed-rank (one-tailed). 6 DeepSeek tasks independently replicated by @fabioscialanga. Script: benchmark_agent/swebench/run_agent_multi.py.
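File Localization F1 follows the standard definition: precision and recall over the set of files the agent localises versus the ground-truth files from the patch. A self-contained sketch of the metric (the actual scoring lives in run_agent_multi.py):

```python
# File Localization F1: harmonic mean of precision and recall over
# predicted vs ground-truth file sets. Standard F1 formula.
def file_localization_f1(predicted: set[str], ground_truth: set[str]) -> float:
    if not predicted or not ground_truth:
        return 0.0
    tp = len(predicted & ground_truth)  # correctly localised files
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# One correct file out of two predicted, against one ground-truth file:
score = file_localization_f1(
    {"django/forms/models.py", "django/db/models/base.py"},
    {"django/db/models/base.py"},
)  # precision 0.5, recall 1.0 -> F1 = 2/3
```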

Comparison

CodeDNA vs. existing approaches

| Approach | Token overhead | Context drift | Retrieval latency | Sliding-window | Infrastructure |
|---|---|---|---|---|---|
| CLAUDE.md / CursorRules | Low | Medium | Zero | No | External file |
| RAG / Vector DB | Low | Medium | High | No | DB + embedding |
| MemGPT | Medium | Low | Medium | No | Complex system |
| CodeDNA ✦ | Low (inline) | Low | Zero | Yes ✓ | None |
Status

Roadmap

Done
Level 1 — Manifest Header (v0.1–v0.4)
FILE, PURPOSE, DEPENDS_ON, EXPORTS, AGENT_RULES, REQUIRED_BY, DB_TABLES, LAST_MODIFIED.
Done
Level 2 — Sliding-Window Annotations (v0.2–v0.3)
@REQUIRES-READ, @SEE, @MODIFIES-ALSO, @BREAKS-IF-RENAMED — solves the sliding-window problem.
Done
Level 3 — Semantic Naming + CONTEXT_BUDGET (v0.3)
Naming <type>_<shape>_<domain>_<origin>. Manifest-Only Planner Read.
Done
LLM-Optimised Format (v0.5)
Python-native module docstring (L1) + function-level Rules: docstrings (L2). Maximises comprehension for LLMs trained on Python corpora.
Done
Enterprise Benchmark — 105 files, 3 bugs, 48 distractors
−29% tool calls, 0 incorrect root-cause identifications (vs 1 Control). Replicable on disk.
Done
Multi-Model SWE-bench Benchmark — up to 10 tasks, 3 runs/task
DeepSeek Chat: ctrl=51%, DNA=68%, Δ=+17pp, p=0.001, 10/0/0 · Gemini 2.5 Flash: ctrl=60%, DNA=72%, Δ=+13pp, p=0.040 · Gemini 2.5 Pro: ctrl=60%, DNA=69%, Δ=+9pp
Done
White Paper / arXiv preprint
Formal study with reproducible methodology, DNA analogy, and comparison against SWE-bench, LoCoBench-Agent, ETH Zurich (2026).
Done
Redundancy Audit (v0.9) ✦ Current
Header reduced to 3 fields: exports, used_by, rules. rules: promoted to required — the inter-agent communication channel. Python-only focus.
Done
M1 — CLI & Auto-Annotation
AST skeleton extraction · codedna init, codedna update, codedna check · pip installable · Claude Code Challenge: 7/7 patch files in ~8 min vs 6/7 in ~10–11 min (control). Results →
Done
M3 — Multi-Tool Enforcement Hooks
Active hooks for Claude Code (4), Cursor (2), GitHub Copilot (3), Cline (2), OpenCode (plugin). Validates on every write — no manual reminder needed. Pre-commit hook for all tools.
Done
M4 — Language Extension
CLI supports 11 languages: Python (AST), TypeScript/JS (tree-sitter), Go (tree-sitter), PHP, Rust, Java, Kotlin, Ruby, C#, Swift. validate_manifests.py supports template engines (Blade, Jinja2, ERB, Vue, Svelte…). Plugin marketplace available for Claude Code.
Next
M2 — Benchmark Expansion
20+ SWE-bench tasks across multiple projects · 5+ LLMs · confidence intervals · Zenodo dataset · public dashboard.
Next
M5 — VS Code Extension & GitHub Action
VS Code extension (used_by graph, stale annotation highlight) · GitHub Action for CI/CD validation.
Next
M6 — Research Paper & Dissemination
Finalize paper · submit to ICSE NIER / LLM4Code workshop · contribute annotations to Flask, FastAPI and one non-Python project.

M1–M5 are part of a funding application to NLnet NGI0 Commons Fund. If you find CodeDNA useful, ⭐ the repo and share it.