CodeDNA Annotation Standard · v0.9 · MIT

The code that explains itself.

AI tools editing your code make mistakes because they lack context. CodeDNA embeds that context directly in the source files, like DNA in every cell. Zero RAG. Minimal drift.

- +17pp F1 improvement · DeepSeek Chat · 10 tasks · p=0.001
- 3 LLMs tested · positive Δ on all 3 models
- −52% retry risk reduction · high-risk runs: 52% → 25%
- 0 infrastructure required · no RAG, no vector DB
The Problem

AI makes mistakes when it lacks context.

Imagine asking a contractor to renovate your apartment without showing them the floor plan. Same problem with AI editing your code. CodeDNA is the floor plan.

🪟
Sliding Window
AI reads code in windows of 50–100 lines. If a critical rule is defined 200 lines above, the model never sees it, and it breaks the rule.
❌ Scenario S4: violates MAX_DISCOUNT_RATE
🔗
Cascade Effect
You change a function. Three other parts of the system depend on it. Without a map, the AI updates only the file it sees and leaves the rest broken.
❌ Scenario S5: KeyError at runtime in main.py
🧬
The Solution: CodeDNA
Context lives in the file itself — not in an external document. Each snippet the AI reads carries architectural constraints for that module.
+17pp F1 on SWE-bench (DeepSeek Chat · 10 tasks · p=0.001)
Memory Stack

Where CodeDNA sits in the AI memory stack.

Every AI coding agent relies on multiple memory layers. Most of them are external to the code. CodeDNA is the only layer that lives inside the source files themselves — it travels with every clone, fork, and CI pipeline.

| Layer | Examples | Where it lives | Shared across tools? |
|---|---|---|---|
| LLM / Agent | Claude, GPT-4, Cursor, Copilot | Cloud | — |
| External memory | Chat history, Memory API | Cloud / external DB | ✗ tool-specific |
| Markdown / Config | CLAUDE.md, .cursorrules, AGENTS.md | Repo (outside source files) | partial |
| CodeDNA | exports, rules, agent, message, .codedna | Inside every source file + repo root | ✓ always |
Setup

Works with your favorite AI tool.

One command installs CodeDNA rules for any AI coding assistant. Or pick the file for your tool and paste it into your project.

Step 1 — Install for your AI tool (instructions + enforcement hooks)
bash <(curl -fsSL https://raw.githubusercontent.com/Larens94/codedna/main/integrations/install.sh) claude-hooks
# or: cursor-hooks  copilot-hooks  cline-hooks  opencode  windsurf
Step 2 — Annotate existing files (CLI)
pip install git+https://github.com/Larens94/codedna.git
# set ANTHROPIC_API_KEY first
codedna init ./          # first-time: annotates every .py file
codedna update ./        # incremental: only unannotated files
codedna check ./         # coverage report, no changes
Claude Code
Active enforcement
4 hooks: SessionStart, PreToolUse, PostToolUse, Stop. Validates every .py write automatically.
Install guide →
Cursor
Active enforcement
2 hooks in .cursor/hooks/: validates on every file edit, reminds at session end. Requires v1.7+.
Install guide →
GitHub Copilot
Active enforcement
3 hooks in .github/hooks/: session start context, post-write validation, session end reminder.
Install guide →
Cline
Active enforcement
2 hooks in .clinerules/hooks/: TaskStart context injection, PostToolUse validation. Requires v3.36+.
Install guide →
OpenCode
Active enforcement
AGENTS.md + JS plugin in .opencode/plugins/. Validates 11 languages on every write.
Install guide →
Windsurf
Instructions only
Copy .windsurfrules to your project root. Cascade reads it automatically.
Install guide →
Antigravity / Agents
Instructions only
Copy .agents/workflows/codedna.md to your project. Compatible with Antigravity and custom agent frameworks.
View file →

Full Install Guide

How It Works

Four levels. Every snippet is self-contained.

Like biological DNA: cutting it in half produces two fragments that still carry the complete information. With CodeDNA, 10 random lines from anywhere in a file are enough for the AI to act correctly.

Level 0
Project Manifest .codedna
A single YAML file at the repo root. The agent reads this first — the view from far away. Describes packages, their purposes, inter-package dependencies, and a rolling session log of every agent that has worked on the project.
.codedna
project: myapp
packages:
  payments/:
    purpose: "Invoice generation, Stripe integration"
  analytics/:
    purpose: "Revenue reports, KPI dashboards"
    depends_on: [payments/, tenants/]

agent_sessions:
  - agent: claude-sonnet-4-6
    date: 2026-03-10
    task: "Implement monthly revenue aggregation"
    changed: [analytics/revenue.py]
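The manifest is plain YAML, so any agent or tool can query it directly. A minimal sketch, assuming the file has already been parsed into a dict (e.g. with yaml.safe_load); dependents_of is a hypothetical helper, not part of the CodeDNA CLI:

```python
# Illustrative: what the parsed .codedna manifest looks like to an agent,
# plus a helper answering "which packages are impacted if I change X?".
# The dict mirrors the YAML example above; the helper name is hypothetical.
manifest = {
    "project": "myapp",
    "packages": {
        "payments/": {"purpose": "Invoice generation, Stripe integration"},
        "analytics/": {
            "purpose": "Revenue reports, KPI dashboards",
            "depends_on": ["payments/", "tenants/"],
        },
    },
}

def dependents_of(package: str) -> list[str]:
    """Return packages that declare a depends_on edge to `package`."""
    return [
        name
        for name, meta in manifest["packages"].items()
        if package in meta.get("depends_on", [])
    ]

impacted = dependents_of("payments/")  # analytics/ depends on payments/
```

An agent planning a change to payments/ would read this list first and open the analytics/ headers before touching any code.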
Level 1
Module Header (Python-native)
A concise module docstring at the top of every file. The AI reads it before the code and already knows the purpose, the public API, who depends on this file, and the hard constraints. The format is Python-native, which maximises comprehension for LLMs trained on Python corpora.
pricing.py
"""pricing.py — Pricing engine with tier discounts.

exports: apply_discount(cents, tier) -> int
used_by: checkout.py → build_cart
rules:   NEVER exceed MAX_DISCOUNT_RATE from config.py;
         apply_discount() must cap before returning.
         DB: discount_tiers(tier, multiplier).
"""
Level 2
Sliding-Window Annotations
Rules: docstrings on critical functions, written organically by agents as they discover constraints. Each agent that fixes a bug leaves knowledge for the next. Even in a 10-line extract the AI receives all the context it needs.
pricing.py · lines 210–226
def apply_discount(cents: int, tier: str) -> int:
    """Apply tier discount to price in cents.

    Rules:   MUST cap discount before returning — exceeding
             MAX_DISCOUNT_RATE is a financial compliance bug.
             After fix #42: also check tier != 'internal'.
    """
    discount = get_multiplier(tier)
    discount = min(discount, MAX_DISCOUNT_RATE)
    return int(cents * (1.0 - discount))
Level 3
Semantic Naming
Variable names encode type, shape and origin. A 10-line snippet carries significantly more context — reducing backward tracing for the most common cases.
Comparison
# ❌ Ambiguous — euros? cents?
price = request.json.get("price")
data  = get_users()

# ✅ CodeDNA — type, domain, origin are clear
int_cents_price_from_request = request.json.get("price")
list_dict_users_from_db      = get_users()
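The naming convention can even be decoded mechanically. A toy decoder, assuming the origin is always introduced by _from_ as in the examples above; decode_name is illustrative, since the convention is read by humans and LLMs, not parsed:

```python
# Rough decoder for the <type>_<shape>_<domain>_<origin> convention.
# Assumption: origin follows a "_from_" separator, as in the examples above.
def decode_name(name: str) -> dict[str, str]:
    base, _, origin = name.partition("_from_")
    parts = base.split("_")
    return {
        "type": parts[0],                        # e.g. int, list
        "shape_and_domain": "_".join(parts[1:]), # e.g. cents_price
        "origin": origin or "unknown",           # e.g. request, db
    }

info = decode_name("list_dict_users_from_db")
```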
Bonus
Manifest-Only Planner
To plan changes across 10+ files, the AI reads .codedna first, then only the module docstring of each file (first 8–12 lines) — building the full map in very few tokens.
Planner — Standard
# 1. Read .codedna — project structure
# 2. Read module docstring (8–12 lines each)
# 3. Filter: used_by mentions target? Include
#    rules mentions task domain? Include
# 4. Build exports → used_by graph
# 5. Open in full ONLY the relevant files
# Cost: ~50 tok × N files = complete map
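The planner steps above can be sketched in a few lines. This toy version runs against in-memory headers instead of a real repo, and a substring match stands in for the real relevance filter; it is an assumption, not the shipped implementation:

```python
# Toy Manifest-Only Planner: given only each file's header lines,
# keep the files relevant to a change in `target` (step 3 above).
headers = {
    "pricing.py": "exports: apply_discount\nused_by: checkout.py -> build_cart",
    "checkout.py": "exports: build_cart\nused_by: api/cart.py",
    "reports.py": "exports: weekly_report\nused_by: cli.py",
}

def plan(target: str) -> list[str]:
    """Keep the target plus any file whose header mentions it."""
    return [f for f, header in headers.items() if target in header or f == target]

# Planning a change to checkout.py pulls in pricing.py via its used_by line.
relevant = plan("checkout.py")
```

Only the files in `relevant` are then opened in full, which is what keeps the token cost at roughly one header per file.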
v0.9 Feature

message: — Agent-to-Agent Chat in Code

The agent: field records what an agent did. The message: sub-field adds a conversational layer — soft observations and open questions left directly for the next agent, co-located with the code.

Three channels
rules: / agent: / message:
rules: is the law — hard architectural constraints, updated in-place.
agent: is the diary — what happened and when.
message: is the conversation that precedes the law — observations not yet certain enough to become rules.
revenue.py — message: lifecycle
agent:   claude-sonnet-4-6 | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency
                  — not yet sure if this should be a rule"

agent:   gemini-2.5-pro | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:.
                  New: timezone rollover in January"

Lifecycle: a message: is either promoted to rules: (reply @prev: promoted to rules:) or dismissed (@prev: not applicable because...). Always append-only — never deleted.

Works at both levels: Level 1 (module docstring) for agents reading the full file, and Level 2 (function docstring) for agents using a sliding window that never sees the header.

Benchmark result: in the AgentHub multi-agent experiment (DeepSeek R1, 5 agents, 83 minutes), message: was adopted in 54 out of 55 files (98.2%) — spontaneously, with no mid-session reminders. Agents used it for handoff notes, per-function observations, and cross-file constraint propagation.
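Because message: entries are plain text with a fixed prefix, a tool can mechanically separate open observations from @prev: replies. A hedged sketch; split_messages is hypothetical, not part of the CLI:

```python
import re

# Hypothetical: list message: threads in an agent log. By the convention
# above, a reply starts with "@prev:"; everything else is a new observation.
LOG = '''
agent:   claude-sonnet-4-6 | 2026-03-10 | Implemented monthly_revenue.
         message: "rounding edge case in multi-currency"
agent:   gemini-2.5-pro | 2026-03-18 | Added annual_summary.
         message: "@prev: confirmed, promoted to rules:. New: timezone rollover"
'''

def split_messages(log: str) -> tuple[list[str], list[str]]:
    msgs = re.findall(r'message:\s*"([^"]+)"', log)
    replies = [m for m in msgs if m.lstrip().startswith("@prev:")]
    new = [m for m in msgs if not m.lstrip().startswith("@prev:")]
    return new, replies

new_msgs, replies = split_messages(LOG)
```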

v0.9 Feature

wiki: — LLM-Wiki Layer for your Codebase

The wiki: opt-in field in a docstring points to a curated markdown page with deeper context. When present, a prior agent decided this file deserves notes beyond what the terse header can hold. The next agent reads it before editing. Two commands turn your annotations into a navigable knowledge graph.

Two commands
codedna wiki bootstrap · codedna wiki sync
codedna wiki bootstrap . — generates one markdown page per source file under docs/wiki/, with [[wikilinks]] derived from the used_by: and related: graphs. Open in Obsidian to browse the dependency graph visually.

codedna wiki sync . — regenerates docs/codedna-wiki.md, a narrative 7-section project wiki (identity, topology, workflows, hotspots…). Hook it to post-commit so it stays current without relying on an agent to remember.

The wiki: field is a signal, not a dump. Only files where a prior agent decided the docstring is insufficient get a pointer. Sparsity is deliberate: if you open a file and see wiki:, read it first.
revenue.py — wiki: opt-in field
"""revenue.py — Monthly revenue aggregation.

exports: monthly_revenue(year, month) -> dict
used_by: api/reports.py → revenue_route
related: billing/currency.py — shares multi-currency logic
wiki:    docs/wiki/revenue.md
rules:   get_invoices() returns ALL tenants
         — MUST filter is_suspended() BEFORE sum
agent:   claude-sonnet-4-6 | 2026-03-10 | ...
"""

Wiki layer in action

wiki: field — two paths: wiki present → agent reads curated context; absent → docstring is sufficient

Obsidian graph — real project annotated with CodeDNA

Obsidian graph view of a real project annotated with CodeDNA — nodes are files, edges are used_by/related links
Illustrative Scenarios

Problems CodeDNA is designed to solve.

5 scenarios that illustrate the categories of errors AI agents make without architectural context. For measured results, see the SWE-bench benchmark below.

Scenario S4
Sliding Window — Hidden Constraint
The AI reads only lines 200–250. The max discount limit is in the manifest (line 7). Without CodeDNA it ignores it and applies illegal discounts of 50%+.
Scenario S5
Cascade Change — Domino Effect
utils.py is modified. Without used_by:, the AI only updates utils.py and leaves main.py with a runtime KeyError.
Scenario S6
Ambiguous Type — Euros or Cents?
price = 1999 — euros or cents? Without semantic naming the AI gets the unit wrong. With CodeDNA: int_cents_price_from_request — zero ambiguity.
Scenario S7
💥 Broken Dependency — Silent Rename
format_revenue() → format_currency(). The rules: field records the rename; the Control agent calls the old name and crashes.
Scenario S8
🗺️ Planning — 8 Files, Manifest Only
The AI must find the 2 right files to change by reading only the module docstrings (8–12 lines each). Using the exports: → used_by: graph it identifies exactly those 2 files.
Multi-Model SWE-bench

Tested across multiple LLMs.

Django issues from SWE-bench, tested across multiple LLMs. Same prompt, same tools, same tasks. DeepSeek Chat: +17pp F1, p=0.001, 10/0/0 · Gemini 2.5 Flash: +13pp F1, p=0.040 · Gemini 2.5 Pro: +9pp F1. All 3 models improve.

File Localization F1 — Control vs CodeDNA by Model

Navigation Demo — django__django-11808 · DeepSeek Chat · 5 runs

CodeDNA navigation demo: without CodeDNA the agent wanders, with CodeDNA it follows the used_by chain

Without CodeDNA: agent opens random files, stops early — 2/10 critical files found.  |  With CodeDNA: follows used_by: chain — 6/10 critical files found.

Demos: VS Code Navigation Demo · 3 Visual Metaphors · Agent Graph Visualizer · CodeDNA World
Best Model
🏆 Gemini 2.5 Flash — +13pp F1
From 60% to 72%. Wins 4 out of 5 tasks. Δ up to +21pp on delegation chains (Task 13495). p=0.040, Wilcoxon W+=14, N=5 tasks × ≥5 runs at T=0.1.
Control: 60%
CodeDNA: 72%
DeepSeek Chat
DeepSeek Chat — +17pp F1 (p=0.001 · 10/0/0)
From 50% to 60%. Wins 4/5 tasks. Notable: +35pp on cross-cutting task 11808, the opposite direction from Gemini Flash (−1pp). Task 13495 anomaly (−9pp, not statistically significant) under investigation.
Control: 50%
CodeDNA: 60%
Key Finding
🧠 Model-Agnostic Benefits
4 out of 5 models improve with CodeDNA. The benefit is strongest on tasks requiring cross-module navigation — exactly where AI agents struggle most.
Chain tasks: +9pp to +21pp

🔬 Methodology: SWE-bench Django tasks × 3 models (Gemini 2.5 Flash ✓, DeepSeek Chat 10 tasks ✓, Gemini 2.5 Pro ✓). 3–5 runs/task at T=0.1. Identical system prompt, same 3 tools (read_file, list_files, grep), max 30 turns. Metric: File Localization F1 (ground-truth files from patch). Statistical test: Wilcoxon signed-rank (one-tailed). 6 DeepSeek tasks independently replicated by @fabioscialanga. Script: benchmark_agent/swebench/run_agent_multi.py.
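File Localization F1 follows the standard definition: precision and recall over the set of files the agent localises versus the ground-truth files from the patch. A self-contained sketch of the metric (the actual scoring lives in run_agent_multi.py):

```python
# File Localization F1: harmonic mean of precision and recall over
# predicted vs ground-truth file sets. Standard F1 formula.
def file_localization_f1(predicted: set[str], ground_truth: set[str]) -> float:
    if not predicted or not ground_truth:
        return 0.0
    tp = len(predicted & ground_truth)  # correctly localised files
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# One correct file out of two predicted, against one ground-truth file:
score = file_localization_f1(
    {"django/forms/models.py", "django/db/models/base.py"},
    {"django/db/models/base.py"},
)  # precision 0.5, recall 1.0 -> F1 = 2/3
```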

Comparison

CodeDNA vs. existing approaches

| Approach | Token overhead | Context drift | Retrieval latency | Sliding-window | Infrastructure |
|---|---|---|---|---|---|
| CLAUDE.md / CursorRules | Low | Medium | Zero | No | External file |
| RAG / Vector DB | Low | Medium | High | No | DB + embedding |
| MemGPT | Medium | Low | Medium | No | Complex system |
| CodeDNA ✦ | Low (inline) | Low | Zero | Yes ✓ | None |
Status

Roadmap

Done
Level 1 — Manifest Header (v0.1–v0.4)
FILE, PURPOSE, DEPENDS_ON, EXPORTS, AGENT_RULES, REQUIRED_BY, DB_TABLES, LAST_MODIFIED.
Done
Level 2 — Sliding-Window Annotations (v0.2–v0.3)
@REQUIRES-READ, @SEE, @MODIFIES-ALSO, @BREAKS-IF-RENAMED — solves the sliding-window problem.
Done
Level 3 — Semantic Naming + CONTEXT_BUDGET (v0.3)
Naming <type>_<shape>_<domain>_<origin>. Manifest-Only Planner Read.
Done
LLM-Optimised Format (v0.5)
Python-native module docstring (L1) + function-level Rules: docstrings (L2). Maximises comprehension for LLMs trained on Python corpora.
Done
Enterprise Benchmark — 105 files, 3 bugs, 48 distractors
−29% tool calls, 0 incorrect root-cause identifications (vs 1 Control). Replicable on disk.
Done
Multi-Model SWE-bench Benchmark — up to 10 tasks, 3 runs/task
DeepSeek Chat: ctrl=51%, DNA=68%, Δ=+17pp, p=0.001, 10/0/0 · Gemini 2.5 Flash: ctrl=60%, DNA=72%, Δ=+13pp, p=0.040 · Gemini 2.5 Pro: ctrl=60%, DNA=69%, Δ=+9pp
Done
White Paper / arXiv preprint
Formal study with reproducible methodology, DNA analogy, and comparison against SWE-bench, LoCoBench-Agent, ETH Zurich (2026).
Done
Redundancy Audit (v0.9) ✦ Current
Header reduced to 3 fields: exports, used_by, rules. rules: promoted to required — the inter-agent communication channel. Python-only focus.
Done
M1 — CLI & Auto-Annotation
AST skeleton extraction · codedna init, codedna update, codedna check · pip installable · Claude Code Challenge: 7/7 patch files in ~8 min vs 6/7 in ~10–11 min (control). Results →
Done
M3 — Multi-Tool Enforcement Hooks
Active hooks for Claude Code (4), Cursor (2), GitHub Copilot (3), Cline (2), OpenCode (plugin). Validates on every write — no manual reminder needed. Pre-commit hook for all tools.
Done
M4 — Language Extension
CLI supports 11 languages: Python (AST), TypeScript/JS (tree-sitter), Go (tree-sitter), PHP, Rust, Java, Kotlin, Ruby, C#, Swift. validate_manifests.py supports template engines (Blade, Jinja2, ERB, Vue, Svelte…). Plugin marketplace available for Claude Code.
Next
M2 — Benchmark Expansion
20+ SWE-bench tasks across multiple projects · 5+ LLMs · confidence intervals · Zenodo dataset · public dashboard.
Next
M5 — VS Code Extension & GitHub Action
VS Code extension (used_by graph, stale annotation highlight) · GitHub Action for CI/CD validation.
Next
M6 — Research Paper & Dissemination
Finalize paper · submit to ICSE NIER / LLM4Code workshop · contribute annotations to Flask, FastAPI and one non-Python project.

M1–M5 are part of a funding application to NLnet NGI0 Commons Fund. If you find CodeDNA useful, ⭐ the repo and share it.