CodeDNA Annotation Standard · v0.7 · MIT

The code that
explains itself.

AI tools editing your code make mistakes because they have no context. CodeDNA embeds that context directly in your files — like DNA in every cell. Zero RAG. Minimal drift.

0%
Error Rate — CodeDNA
zero violations, zero cascade misses across 5 hard scenarios
~70
Tokens per file
to map the entire codebase
0
Infrastructure required
no RAG, no vector DB
The Problem

AI makes mistakes when it lacks context.

Imagine asking a contractor to renovate your apartment without showing them the floor plan. Same problem with AI editing your code. CodeDNA is the floor plan.

🔍
Sliding Window
AI reads code in windows of 50–100 lines. If a critical rule is defined 200 lines above, the model doesn't see it — and breaks it.
❌ Scenario S4: violates MAX_DISCOUNT_RATE
🔗
Cascade Effect
You change a function. Three other parts of the system depend on it. Without a map, the AI updates only the file it sees and leaves the rest broken.
❌ Scenario S5: KeyError at runtime in main.py
🧬
The Solution: CodeDNA
Context lives in the file itself — not in an external document. Every snippet the AI reads carries the instructions to act correctly.
✅ Zero errors across all 5 scenarios
Setup

Works with your favorite AI tool.

One command installs CodeDNA rules for any AI coding assistant. Or pick the file for your tool and paste it into your project.

One-line installer (installs all)
bash <(curl -fsSL https://raw.githubusercontent.com/Larens94/codedna/main/integrations/install.sh)
🟣
Antigravity
Copy .agents/workflows/codedna.md into your project. Antigravity follows the protocol automatically.
View file →
🟠
Claude Code
Copy CLAUDE.md to your project root. Claude reads it on every session.
View file →
Cursor
Copy .cursorrules to your project root. Cursor applies the rules on every edit.
View file →
🐙
GitHub Copilot
Copy to .github/copilot-instructions.md. Copilot follows the instructions in every suggestion.
View file →
🏄
Windsurf
Copy .windsurfrules to your project root.
View file →
🔧
Cline
Copy .clinerules to your project root.
View file →

📖 Full Quickstart Guide

How It Works

Three levels. Every snippet is self-contained.

Like DNA in a living organism: every cell carries the complete genome. With CodeDNA, 10 random lines from anywhere in a file are enough for the AI to act correctly.

Level 1
Module Header (Python-native)
A concise module docstring at the top of every file. The AI reads it before the code and already knows purpose, dependencies, public API and rules. Python-native format that maximises comprehension for LLMs trained on Python-heavy corpora.
pricing.py
"""pricing.py — Pricing engine with tier discounts.

exports: apply_discount(cents, tier) -> int
used_by: checkout.py → build_cart
rules:   NEVER exceed MAX_DISCOUNT_RATE from config.py;
         apply_discount() must cap before returning.
         DB: discount_tiers(tier, multiplier).
"""
Level 2
Sliding-Window Annotations
Rules: docstrings on critical functions, written organically by agents as they discover constraints. Each agent that fixes a bug leaves knowledge for the next. Even in a 10-line extract the AI receives all the context it needs.
pricing.py · lines 210–226
def apply_discount(cents: int, tier: str) -> int:
    """Apply tier discount to price in cents.

    Rules:   MUST cap discount before returning — exceeding
             MAX_DISCOUNT_RATE is a financial compliance bug.
             After fix #42: also check tier != 'internal'.
    """
    discount = get_multiplier(tier)
    discount = min(discount, MAX_DISCOUNT_RATE)
    return int(cents * (1.0 - discount))
Level 3
Semantic Naming
Variable names encode type, shape and origin. A 10-line snippet is completely self-documenting — zero backward tracing.
Comparison
# ❌ Ambiguous — euros? cents?
price = request.json.get("price")
data  = get_users()

# ✅ CodeDNA — type, domain, origin are clear
int_cents_price_from_request = request.json.get("price")
list_dict_users_from_db      = get_users()
Bonus
Manifest-Only Planner
To plan changes across 10+ files, the AI reads .codedna first, then only the module docstring of each file (first 8–12 lines) — building the full map in very few tokens.
Planner — Standard
# 1. Read .codedna — project structure
# 2. Read each module docstring (8–12 lines)
# 3. Filter: include if used_by mentions the target,
#    or rules mentions the task domain
# 4. Build the exports → used_by graph
# 5. Open ONLY the relevant files in full
# Cost: ~50 tok × N files = complete map
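The planner loop above can be sketched in a few lines. This is an illustration under stated assumptions (the `plan_relevant_files` name, the keyword filter, and the flat `.py` glob are hypothetical, not the shipped implementation):

```python
from pathlib import Path

def plan_relevant_files(project: str, target: str, task_keywords: set[str],
                        header_lines: int = 12) -> list[str]:
    """Manifest-only planning sketch: pick files worth opening in full.

    Reads only the first `header_lines` lines of each file, so the cost
    stays at roughly one docstring's worth of tokens per file.
    """
    relevant = []
    for path in sorted(Path(project).glob("**/*.py")):
        with open(path, encoding="utf-8") as f:
            header = "".join(f.readline() for _ in range(header_lines)).lower()
        # Step 3 of the plan: include the file if its used_by line mentions
        # the target, or its rules line mentions the task's domain vocabulary.
        if target.lower() in header or any(k.lower() in header for k in task_keywords):
            relevant.append(path.name)
    return relevant
```

Only the files this returns are then read in full, which is where the "~50 tok × N files" budget comes from.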
Real Benchmark

Hard evidence, not opinions.

5 scenarios built to be impossible to solve correctly without CodeDNA. The Control makes deterministic errors. CodeDNA: zero errors across all tests.

Edit Quality — AI Judge score 0–10
Total errors by type (over 3 runs per scenario)
Scenario S4
🔍 Sliding Window — Hidden Constraint
The AI reads only lines 200–250. The max discount limit is in the manifest (line 7). Without CodeDNA it ignores it and applies illegal discounts of 50%+.
Control: 6.2 / 10
CodeDNA: 9.5 / 10
Scenario S5
🔗 Cascade Change — Domino Effect
utils.py is modified. Without used_by:, the AI only updates utils.py and leaves main.py with a runtime KeyError.
Control: 5.8 / 10
CodeDNA: 9.8 / 10
Scenario S6
🔢 Ambiguous Type — Euros or Cents?
price = 1999 — euros or cents? Without semantic naming the AI gets the unit wrong. With CodeDNA: int_cents_price_from_request — zero ambiguity.
Control: 5.0 / 10
CodeDNA: 9.7 / 10
Scenario S7
💥 Broken Dependency — Silent Rename
format_revenue() → format_currency(). The rules: field records the rename. The Control calls the old name: crash.
Control: 4.5 / 10
CodeDNA: 9.9 / 10
Scenario S8
🗺️ Planning — 8 Files, Manifest Only
The AI must find the 2 right files to change by reading only the module docstrings (8–12 lines each). Using the exports: → used_by: graph it identifies exactly the 2 files.
Control: 6.8 / 10
CodeDNA: 9.6 / 10

🔬 Methodology: 5 scenarios × 3 runs with Gemini 2.5 Flash. Evaluation: independent LLM judge + automatic checker (constraint_violation, cascade_miss). Script: benchmark/codedna_benchmark.py. Results: benchmark/results_v2.json.
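In spirit, the two automatic checks named in the methodology could look like the following. This is a sketch only (the real checkers live in benchmark/codedna_benchmark.py, and their actual signatures and logic are not shown here):

```python
import re

def constraint_violation(patched_source: str, max_rate: float = 0.30) -> bool:
    """Flag discounts hard-coded above an assumed MAX_DISCOUNT_RATE (S4 style).

    Illustration only: scans the patched source for literal discount
    assignments and compares them against the cap.
    """
    rates = [float(m) for m in re.findall(r"discount\s*=\s*(0\.\d+)", patched_source)]
    return any(rate > max_rate for rate in rates)

def cascade_miss(changed_files: set[str], required_files: set[str]) -> bool:
    """Flag an edit that touches a function but skips a file in its used_by chain."""
    return not required_files.issubset(changed_files)
```

A run counts as clean only when both checks return False for every scenario.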

Multi-Model SWE-bench

Works across every major LLM.

5 real Django issues from SWE-bench, tested across 5 state-of-the-art models. Same prompt, same tools, same tasks. CodeDNA improves 4 out of 5 models.

File Localization F1 — Control vs CodeDNA by Model
Best Model
🏆 Gemini 2.5 Flash — +20pp F1
From 57% to 77%. Wins 4 out of 5 tasks. Reaches F1=100% on Task 13495 (7 backend files connected via deps/used_by). The model that best exploits structured annotations.
Control: 57%
CodeDNA: 77%
Biggest Surprise
🚀 GPT-5.3 Codex — +12pp F1
From 39% to 51%. Runs through OpenAI's Responses API. CodeDNA's format is understood even by models using a completely different API paradigm.
Control: 39%
CodeDNA: 51%
Key Finding
🧠 Model-Agnostic Benefits
4 out of 5 models improve with CodeDNA. The benefit is strongest on tasks requiring cross-module navigation — exactly where AI agents struggle most.
4/5 models improved · +2 to +20pp

🔬 Methodology: 5 SWE-bench Django tasks × 5 models (Gemini 2.5 Flash, Gemini 2.5 Pro, GPT-5.3 Codex, GPT-4o, DeepSeek-V3). Identical system prompt, same 3 tools (read_file, list_files, grep), max 15 turns. Metric: F1 over ground-truth files from patch. Script: benchmark_agent/swebench/run_agent_multi.py.
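File-localization F1 is the standard set-overlap metric, computed over the files the agent chose to edit versus the files touched by the ground-truth patch:

```python
def file_localization_f1(predicted: set[str], ground_truth: set[str]) -> float:
    """F1 between an agent's edited files and the ground-truth patch files."""
    if not predicted or not ground_truth:
        return 0.0
    hits = len(predicted & ground_truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)   # how much of what it touched was right
    recall = hits / len(ground_truth)   # how much of the patch it found
    return 2 * precision * recall / (precision + recall)
```

An agent that opens two correct files plus two irrelevant ones scores 2/3, which is why extra exploratory edits depress the metric even when the fix lands.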

Comparison

CodeDNA vs. existing approaches

| Approach                | Token overhead | Context drift | Retrieval latency | Sliding-window safe | Infrastructure |
|-------------------------|----------------|---------------|-------------------|---------------------|----------------|
| CLAUDE.md / CursorRules | High           | High          | Zero              | No                  | External file  |
| RAG / Vector DB         | Low            | Medium        | High              | No                  | DB + embedding |
| MemGPT                  | Medium         | Low           | Medium            | No                  | Complex system |
| CodeDNA ✦               | Zero           | Zero          | Zero              | Yes ✓               | None           |
Status

Roadmap

Done
Level 1 — Manifest Header (v0.1–v0.4)
FILE, PURPOSE, DEPENDS_ON, EXPORTS, AGENT_RULES, REQUIRED_BY, DB_TABLES, LAST_MODIFIED.
Done
Level 2 — Sliding-Window Annotations (v0.2–v0.3)
@REQUIRES-READ, @SEE, @MODIFIES-ALSO, @BREAKS-IF-RENAMED — solves the sliding-window problem.
Done
Level 3 — Semantic Naming + CONTEXT_BUDGET (v0.3)
Naming <type>_<shape>_<domain>_<origin>. Manifest-Only Planner Read.
Done
LLM-Optimised Format (v0.5)
Python-native module docstring (L1) + function-level Rules: docstrings (L2). Maximises comprehension for LLMs trained on Python-heavy corpora.
Done
Enterprise Benchmark — 105 files, 3 bugs, 48 distractors
−29% tool calls, 0 incorrect root-cause identifications (vs 1 for the Control). Replicable on disk.
Done
Multi-Model SWE-bench Benchmark — 5 LLMs, 5 tasks
Gemini Flash +20pp, Codex +12pp, Pro +8pp, GPT-4o +2pp. 4/5 models improved. Model-agnostic validation.
Done
White Paper / arXiv preprint
Formal study with reproducible methodology, DNA analogy, and comparison against SWE-bench, LoCoBench-Agent, ETH Zurich (2026).
Done
Redundancy Audit (v0.7) ✦ Current
Header reduced to 3 fields: exports, used_by, rules. rules: promoted to required — the inter-agent communication channel. Python-only focus.
Next
IDE Plugin — VS Code
Auto-generation of module docstrings and function docstrings, real-time validation against SPEC.md.
Next
Full SWE-bench Evaluation — 50+ tasks
Extended benchmark across multiple repositories and languages. Statistical significance testing.