knowledge-genome-orchestrator/README.md

# Knowledge Genome System

> A distributed, encrypted, multi-domain personal knowledge base.
> No vector database. No embedding pipeline. No external retrieval server.

Built on the [LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f)
by Andrej Karpathy — extended with a multi-domain submodule architecture,
AES-256-CTR encryption via git-crypt, Vaultwarden runtime key injection,
and a human-in-the-loop Git Flow for quality control.

---

## Table of Contents

1. [Core Philosophy](#core-philosophy)
2. [Architecture](#architecture)
3. [System Requirements](#system-requirements)
4. [Prerequisites](#prerequisites)
5. [Configuration](#configuration)
6. [Quick Start](#quick-start)
7. [Makefile Reference](#makefile-reference)
8. [Genome Lifecycle](#genome-lifecycle)
9. [Security Model](#security-model)
10. [Key Management](#key-management)
11. [Agent Sessions](#agent-sessions)
12. [Workflows](#workflows)
13. [Knowledge Quality](#knowledge-quality)
14. [Knowledge Schema](#knowledge-schema)
15. [Collaboration Model](#collaboration-model)
16. [Optional Extensions](#optional-extensions)
17. [Troubleshooting](#troubleshooting)

---

## Core Philosophy

Most RAG systems make the LLM rediscover knowledge from scratch on every query.
A document is indexed; at query time, relevant chunks are retrieved; an answer is generated.
Nothing accumulates. Ask a question requiring synthesis across five documents and the LLM
pieces it together from fragments every single time.

This system is different. Instead of retrieval at query time, the LLM
**incrementally builds and maintains a persistent wiki** that sits between you and the raw
sources. When a new source arrives, the LLM reads it, extracts key information, updates
entity and concept pages, flags contradictions with existing claims, and strengthens the
evolving synthesis. Knowledge is compiled once and kept current.

**The wiki is a compounding artifact.** Cross-references are already there.
Contradictions have been flagged. The synthesis already reflects everything ingested.

This means:
- No vector database.
- No embedding pipeline.
- No external retrieval infrastructure.

The `wiki/index.md` of each genome is the retrieval layer. At moderate scale
(~100 sources, hundreds of pages) this can outperform RAG, because cross-references,
contradictions, and syntheses are already resolved rather than re-derived per query.

The human's job: curate sources, direct analysis, ask good questions, review PRs.
The LLM's job: everything else — summarising, cross-referencing, filing, maintaining consistency.

---

## Architecture

### Repository structure

```text
master-knowledge-genome/              ← Root orchestrator (submodule registry)
├── core-karpathy/                    ← LLM Wiki reference pattern (read-only submodule)
├── genome-dev/                       ← Submodule: web development, Angular, TUI
├── genome-finance/                   ← Submodule: personal finance, investments
├── genome-homelab/                   ← Submodule: Keru infrastructure, network configs
└── AGENTS.md                         ← Global coordination schema (cross-genome rules)
```

Each genome is an independent git repository:

```text
genome-{name}/
├── .gitattributes                    ← Encryption rules — **/private/** wildcard
├── .gitignore
├── .git/hooks/pre-commit             ← Security hook (dynamic git check-attr)
├── AGENTS.md                         ← Per-genome agent contract and workflow rules
│
├── raw/                              ← Immutable sources — LLM reads, never writes
│   ├── articles/                     ← Web clips, saved articles
│   ├── transcripts/                  ← Audio/video transcripts
│   ├── code-packs/                   ← Code snippets and repositories
│   ├── assets/                       ← Images, PDFs, binary files
│   └── private/                      ← AES-256-CTR encrypted — owner only
│
└── wiki/                             ← LLM-owned — agent creates and maintains
    ├── index.md                      ← Primary catalog (read first every session)
    ├── log.md                        ← Append-only operations ledger
    ├── sources/                      ← One page per processed raw source
    ├── entities/                     ← People, tools, organisations, projects
    ├── concepts/                     ← Patterns, theories, architectural decisions
    ├── queries/                      ← Preserved answers and conflict notes
    └── private/                      ← AES-256-CTR encrypted — owner only
```

### Three layers

| Layer | Path | Owner | Rule |
|-------|------|-------|------|
| Raw sources | `raw/` | Human | Immutable. LLM reads only. Never modified. |
| Wiki | `wiki/` | LLM | Agent creates, updates, cross-links, maintains. |
| Schema | `AGENTS.md` | Human + LLM | Co-evolved contract defining structure and workflows. |

### Framework structure

```text
knowledge-genome-setup/               ← This repository (setup tooling)
├── globals.env                       ← Static KEY=VALUE config (Make-includable)
├── registry.sh                       ← Bash-only: GENOMES array + dynamic paths
├── Makefile                          ← Entry point for all operations
├── lib/
│   ├── output.sh                     ← Terminal helpers (colors, log levels)
│   ├── deps.sh                       ← Dependency validation
│   ├── scaffold.sh                   ← Template rendering engine
│   ├── lint.sh                       ← Per-file validation functions
│   └── git-crypt.sh                  ← git-crypt lifecycle (init, export, verify, rotate)
├── providers/
│   ├── forgejo.sh                    ← Forgejo REST API provider
│   └── github.sh                     ← GitHub REST API provider
├── scripts/
│   ├── setup.sh                      ← Main entry point
│   ├── setup-master.sh               ← Master repo initialisation
│   ├── setup-genomes.sh              ← Genome provisioning loop
│   ├── add-genome.sh                 ← Add a single new genome
│   └── lint-genomes.sh               ← Quality control across all genomes
└── templates/
    ├── agents-genome.md              ← Per-genome agent contract template
    ├── agents-master.md              ← Master coordination schema template
    ├── wiki-index.md                 ← Index template (rendered per genome)
    ├── wiki-log.md                   ← Log template (rendered per genome)
    ├── pr-description.md             ← PR review checklist template
    ├── pre-commit.sh                 ← Security hook template
    ├── gitattributes                 ← Git encryption rules template
    └── gitignore                     ← Git ignore template
```

---

## System Requirements

### Linux — full support (primary target)

All scripts are written for GNU bash on Linux. Tested on Ubuntu 22.04+.
All tools (git-crypt, bw, qmd) have native Linux binaries.

### macOS — full support

All scripts are compatible with macOS. Requirements:
- bash 3.2+ (macOS default) — fully supported. All `bash 4+` constructs removed.
- GNU coreutils not required — BSD variants of `date`, `grep`, `sed` all handled.
- `git-crypt`: install via Homebrew — `brew install git-crypt`
- `jq`, `curl`: pre-installed or via Homebrew

If you use Homebrew bash (`brew install bash`), the scripts work identically to Linux.

### Windows — WSL2 only

**Git Bash and native Windows are not supported.**

Reasons:
- `git-crypt` has no native Windows binary.
- Process substitution `<(...)` used for runtime key injection is not available
  in Git Bash or PowerShell.
- Several bash-specific features used throughout (`compgen`, `BASH_SOURCE`, arrays)
  are not part of POSIX `sh` and do not exist in PowerShell or `cmd.exe`.

**WSL2 (Windows Subsystem for Linux)** with Ubuntu gives full compatibility.
All setup and runtime operations work identically to native Linux inside WSL2.

### Hardware recommendations

The system is designed for a homelab architecture:

| Component | Recommended | Role |
|-----------|-------------|------|
| Storage node | Any Linux server with NFS | Hosts Forgejo, stores genome repos |
| AI compute node | GPU server (16GB+ VRAM) | Runs local LLM agent sessions |
| VRAM | 16GB minimum | 14B model at Q5_K_M ≈ 10GB weights; ~6GB for KV cache |
| Local LLM | 14B–32B quantised | Active wiki maintenance sessions |
| Large LLM | 70B (async) | Deep reflection, complex synthesis (scheduled, not interactive) |

> **On VRAM constraints:** with a 16GB card and a 14B model, the KV cache budget
> is ~6GB — approximately 32k tokens of effective context. Every token in `AGENTS.md`,
> the index, and the log tail is a cost. This is why all agent files are token-optimised
> and sessions are kept to one source at a time.

---

## Prerequisites

### Required

| Tool | Purpose |
|------|---------|
| `git` | Version control |
| `git-crypt` | Transparent file encryption |
| `curl` | REST API calls to Forgejo/GitHub |
| `jq` | JSON parsing |

### Optional

| Tool | Purpose |
|------|---------|
| `bw` | Bitwarden CLI — runtime key injection from Vaultwarden (no key on disk) |
| `qmd` | Local BM25 + vector search for Markdown files with MCP server interface |

> **`bw` vs `bws`:** Use `bw` (standard Bitwarden CLI). `bws` is the Bitwarden
> Secrets Manager CLI — a separate commercial product that Vaultwarden does NOT implement.

### Install on Ubuntu/Debian

```bash
sudo apt update && sudo apt install -y git git-crypt curl jq
```

### Install on macOS

```bash
brew install git git-crypt curl jq
```

### Install Bitwarden CLI

```bash
# Linux
npm install -g @bitwarden/cli

# macOS
brew install bitwarden-cli
```

### Verify all tools

```bash
make doctor
```

---

## Configuration

Configuration is split into two files with distinct purposes:

### `globals.env` — static KEY=VALUE

Safe for Make's `include` directive, Docker Compose `env_file`, shell `source`, and any standard env parser.
Contains only simple scalar values — no bash syntax, no arrays.

```bash
# Keep comments on their own lines: Make preserves whitespace before a
# trailing "#", and some env parsers treat inline comments as part of the value.

# Provider selection: forgejo | github
PROVIDER=forgejo

# Forgejo (active when PROVIDER=forgejo)
FORGEJO_URL=https://git.yourserver.com
FORGEJO_USER=yourusername
# 222 is common in homelab Forgejo setups; use 22 for standard SSH
FORGEJO_SSH_PORT=222

# GitHub (active when PROVIDER=github; uncomment to use)
# GITHUB_USER=your-username
# Optional: for org repos, overrides GITHUB_USER
# GITHUB_ORG=your-org

# Vaultwarden
VAULTWARDEN_URL=https://vault.yourserver.com

# Master repository
MASTER_REPO=master-knowledge-genome
GIST_URL=https://gist.github.com/442a6bf555914893e9891c11519de94f.git
```

### `registry.sh` — bash runtime config

Sourced by shell scripts only. Contains the genome registry array and dynamic path
resolution. Never included by Make.

```bash
# Dynamic paths (resolved at source time)
WORK_DIR="${HOME}/knowledge-genome-setup"
KEYS_DIR="${WORK_DIR}/keys"

# Genome registry — format: "name|description"
GENOMES=(
  "genome-dev|Web development, TUI, Angular, software architecture"
  "genome-finance|Personal finance, investments, market analysis"
  "genome-homelab|Infrastructure, network configs, architecture logs"
)
```
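
Each registry entry packs two fields into one string. A minimal sketch of how a script might split them, using only bash 3.2-compatible parameter expansion. The helper names `genome_name` and `genome_desc` are hypothetical and not part of the framework:

```bash
# Illustrative registry in the same "name|description" format as registry.sh.
GENOMES=(
  "genome-dev|Web development, TUI, Angular, software architecture"
  "genome-finance|Personal finance, investments, market analysis"
)

# Hypothetical helpers: split one entry at the first '|'.
genome_name() { printf '%s\n' "${1%%|*}"; }   # everything before the first '|'
genome_desc() { printf '%s\n' "${1#*|}"; }    # everything after the first '|'

for entry in "${GENOMES[@]}"; do
  printf '%-16s %s\n' "$(genome_name "$entry")" "$(genome_desc "$entry")"
done
```

Because the split happens at the first `|`, a description may itself contain a pipe character, but a genome name may not.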

To add a genome to the registry before running setup, append a line to `GENOMES`.
After initial setup, use `make add-genome` instead.

### Tokens

Tokens are never stored in config files. Export them in your shell before running setup:

```bash
export FORGEJO_TOKEN="your_forgejo_token"
# or
export GITHUB_TOKEN="your_github_token"
```

---

## Quick Start

```bash
# 1. Clone the setup framework
git clone <setup-repo-url> knowledge-genome-setup
cd knowledge-genome-setup

# 2. Configure your environment
cp globals.env.example globals.env   # edit with your values
# Edit registry.sh to define your genomes

# 3. Export your provider token
export FORGEJO_TOKEN="your_token_here"

# 4. Verify dependencies
make doctor

# 5. Run full setup
make setup
```

`make setup` executes in order:

1. **Dependency check** — verifies all required tools are installed
2. **Git identity check** — warns if `user.name` / `user.email` are not configured
3. **Master repo** — creates `master-knowledge-genome` on Forgejo, scaffolds with
   `AGENTS.md` and `README.md`, initialises git, adds `core-karpathy` as submodule, pushes
4. **Genome provisioning** — for each genome in `registry.sh`:
   - Creates remote repository on Forgejo
   - Adds it as a submodule in the master repo
   - Initialises git-crypt (**before any files are created**)
   - Scaffolds directory structure and renders all templates
   - Installs pre-commit security hook
   - Commits, pushes genome to remote
   - Exports symmetric key to `keys/<genome>.key`
   - Prints Vaultwarden upload instructions
   - Commits submodule pointer in master repo

After setup completes:
- Upload all files in `keys/` to Vaultwarden (see Key Management)
- Delete key files from disk: `rm keys/*.key`

---

## Makefile Reference

| Target | Description |
|--------|-------------|
| `make setup` | Full system initialisation — master repo + all genomes in `registry.sh` |
| `make add-genome NAME=x DESC="y"` | Scaffold and register a single new genome |
| `make lint` | Run quality checks across all genomes (schema, privacy, decay, page size) |
| `make status` | Show submodule status and first 10 git-crypt encryption states |
| `make lock` | Lock all encrypted repos (master + all genome submodules) |
| `make doctor` | Verify required tools: git, git-crypt, curl, jq; warn if bw missing |
| `make sync` | `git submodule update --init --recursive` + report unpushed commits per genome |
| `make help` | Print all available targets |

### Examples

```bash
# Check system health
make doctor

# Add a new genome after initial setup
make add-genome NAME=genome-research DESC="Academic papers and deep research"

# Run full lint pass (bash deterministic checks)
make lint

# Sync all nodes after pulling on another machine
make sync

# Emergency lock — secures all repos before leaving a session
make lock
```

---

## Genome Lifecycle

### Initial setup

All genomes defined in `registry.sh` are provisioned by `make setup`.

### Adding a genome after initial setup

```bash
make add-genome NAME=genome-newname DESC="Domain description"
```

This command creates the remote repo, adds it as a submodule, initialises git-crypt,
scaffolds the directory structure, installs the pre-commit hook, commits and pushes,
exports the key, and commits the submodule pointer in master.

After adding: upload the new key to Vaultwarden and delete the key file.

### Removing a genome

Manual process:
```bash
# In master repo
git submodule deinit genome-name
git rm genome-name
rm -rf .git/modules/genome-name   # drop the cached submodule repository
git commit -m "chore: remove genome-name submodule"
git push
# Archive or delete the remote repository on Forgejo
```

### Template rendering

When a genome is scaffolded, `render_template` replaces these placeholders in all
template files:

| Placeholder | Source | Example |
|-------------|--------|---------|
| `{{GENOME_NAME}}` | registry.sh | `genome-dev` |
| `{{GENOME_NAME_UPPER}}` | derived | `GENOME-DEV` |
| `{{GENOME_DESC}}` | registry.sh | `Web development...` |
| `{{FORGEJO_URL}}` | globals.env | `https://git.yourserver.com` |
| `{{FORGEJO_USER}}` | globals.env | `yourusername` |
| `{{VAULTWARDEN_URL}}` | globals.env | `https://vault.yourserver.com` |
| `{{MASTER_REPO}}` | globals.env | `master-knowledge-genome` |
| `{{DATE}}` | runtime | `2026-05-11` |
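
The placeholder substitution can be sketched as follows. This is a hypothetical stand-in, not the framework's actual `render_template`: the names `replace_literal` and `render_template_sketch` are assumptions, and only four of the placeholders are handled. It uses literal (non-regex) replacement so that values containing `&`, `/`, or `.` are inserted unchanged:

```bash
# replace_literal NEEDLE REPLACEMENT: literal substitution, stdin to stdout.
# Values travel through the environment so awk does not interpret escapes.
replace_literal() {
  NEEDLE="$1" REPLACEMENT="$2" awk '
    {
      needle = ENVIRON["NEEDLE"]; out = ""; rest = $0
      while ((i = index(rest, needle)) > 0) {
        out = out substr(rest, 1, i - 1) ENVIRON["REPLACEMENT"]
        rest = substr(rest, i + length(needle))
      }
      print out rest
    }'
}

# render_template_sketch NAME DESC: reads a template on stdin, prints the result.
render_template_sketch() {
  local name="$1" desc="$2" upper
  upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
  replace_literal '{{GENOME_NAME_UPPER}}' "$upper" |
    replace_literal '{{GENOME_NAME}}' "$name" |
    replace_literal '{{GENOME_DESC}}' "$desc" |
    replace_literal '{{DATE}}' "$(date +%Y-%m-%d)"
}
```

Usage, assuming the template layout shown earlier: `render_template_sketch genome-dev "Web development" < templates/wiki-index.md`.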

---

## Security Model

### Encryption architecture

Each genome uses a unique symmetric AES-256-CTR key managed by git-crypt.
Two directories in every genome are always encrypted:

| Directory | Contents | On remote |
|-----------|----------|-----------|
| `raw/private/` | Sensitive source material | Opaque binary blob |
| `wiki/private/` | Private synthesis and notes | Opaque binary blob |

All other directories (`raw/articles/`, `wiki/sources/`, etc.) are plaintext.
Collaborators without the key can contribute to public directories normally —
git handles encrypted files transparently.

### `.gitattributes` — dynamic encryption rules

Encryption rules use a glob wildcard that catches any `private/` directory at
any depth in the repository — including directories created at runtime by the LLM:

```gitattributes
# Text rules first
*.md     text eol=lf
*.sh     text eol=lf

# Encryption rules LAST (later rules override per-attribute)
# -text on **/private/** overrides the earlier *.md text eol=lf rule, preventing EOL corruption
**/private/**   filter=git-crypt diff=git-crypt -text
```

> Rule ordering matters: in `.gitattributes`, the last matching rule wins per attribute.
> Encryption rules must come after text rules so `-text` overrides `text eol=lf`
> for encrypted markdown files.
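
To confirm that the ordering works as intended, the effective attributes can be queried with `git check-attr`. A small sketch; `attr_value` is a hypothetical helper name, and the paths are examples that do not need to exist on disk:

```bash
# attr_value ATTR PATH: print the effective value of one attribute for PATH.
# git evaluates the .gitattributes patterns only; the file itself may be absent.
attr_value() {
  git check-attr "$1" -- "$2" | sed 's/.*: //'
}

# Run inside a genome checkout:
#   attr_value filter wiki/private/note.md    # git-crypt
#   attr_value text   wiki/private/note.md    # unset (the -text rule won)
#   attr_value text   wiki/sources/note.md    # set
```

If the second call reports `set` instead of `unset`, the encryption rule sits above the text rules and should be moved to the end of the file.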

### Pre-commit hook — dynamic validation

The security hook installed at `.git/hooks/pre-commit` validates every staged file
dynamically — it reads encryption requirements from `.gitattributes` at runtime
rather than checking hardcoded paths:

```bash
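# For each staged file, check whether .gitattributes requires git-crypt encryption
while IFS= read -r file; do
    filter=$(git check-attr filter -- "$file" | sed 's/.*: //')
    if [[ "$filter" == "git-crypt" ]]; then
        # git-crypt status prints an upper-case "NOT ENCRYPTED" warning for
        # plaintext blobs, so match case-insensitively
        if git-crypt status "$file" | grep -qi "not encrypted"; then
            echo "BLOCKED: $file is staged unencrypted" >&2
            exit 1
        fi
    fi
done < <(git diff --cached --name-only --diff-filter=ACM)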
```

This means: any file matching `**/private/**` in `.gitattributes` is protected,
including future `private/` directories created anywhere in the repo.
The hook never needs updating when the encryption rules change.

### PRIVATE_CONTEXT toggle

The `PRIVATE_CONTEXT` toggle in `AGENTS.md` controls whether the LLM agent
accesses encrypted directories. It must be declared explicitly by the operator
at the start of every session:

```text
PRIVATE_CONTEXT: disabled   ← Default. private/ directories are treated as non-existent.
PRIVATE_CONTEXT: enabled    ← Agent may read/write private/. Requires git-crypt unlock.
```

Rules:
- Never inferred. Never carried over from a previous session.
- `enabled` requires the operator to confirm that `git-crypt unlock` has run on the host.
- Per-genome, per-session: enabling for `genome-finance` does NOT enable for `genome-dev`.
- Cloud LLM models: `PRIVATE_CONTEXT` must always be `disabled`. Private data never leaves the local network.
- All outputs derived from private data are prefixed `[PRIVATE DATA INCLUDED]`.
- Private synthesis goes exclusively to `wiki/private/` — never to public wiki paths.
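
The "never inferred" rule can be enforced on the orchestrator side before a session starts. A minimal sketch, assuming the declaration appears on its own line in the opening prompt; `parse_private_context` is a hypothetical helper, not part of the framework:

```bash
# parse_private_context: reads a session prompt on stdin, prints "enabled" or
# "disabled", and fails when the declaration is missing or appears more than once.
parse_private_context() {
  local decls count
  decls=$(grep -E '^PRIVATE_CONTEXT: (enabled|disabled)[[:space:]]*$' || true)
  count=$(printf '%s' "$decls" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "PRIVATE_CONTEXT must be declared exactly once per session" >&2
    return 1
  fi
  printf '%s\n' "$decls" | sed -E 's/^PRIVATE_CONTEXT: //; s/[[:space:]]*$//'
}
```

Failing on a missing declaration, rather than defaulting to `disabled`, keeps the operator's choice explicit for every session.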

### Runtime key injection — zero disk policy

Encryption keys are never stored as persistent files on the AI server.
They are injected at session start via the Bitwarden CLI (`bw`) against
your self-hosted Vaultwarden instance, using process substitution:

```bash
# Step 1: authenticate
bw config server https://vault.yourserver.com
export BW_SESSION=$(bw unlock --passwordenv BW_MASTER_PASSWORD --raw)

# Step 2: unlock genome (key lives only in a kernel file descriptor — never touches disk)
git-crypt unlock <(
  bw get notes "genome-dev key" --session "$BW_SESSION" | base64 -d
)
```

The key flows: Vaultwarden → `bw get notes` → `base64 -d` → kernel pipe → `git-crypt`.
At no point is the key written to any file on disk.
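
Unlock and lock can be paired so that a genome is never left unlocked after a failed command. A sketch under stated assumptions: `BW_SESSION` is already exported, `bw` and `git-crypt` are installed, and the note name follows the `<genome> key` convention. The names `vault_note_name` and `with_unlocked_genome` are hypothetical; `base64 --decode` is used here because the long option is accepted by both GNU and macOS `base64`:

```bash
# vault_note_name GENOME: Secure Note name for a genome key.
vault_note_name() { printf '%s key\n' "$1"; }

# with_unlocked_genome GENOME_DIR COMMAND [ARGS...]
# Unlocks the genome, runs COMMAND inside it, then locks again even if COMMAND fails.
with_unlocked_genome() {
  local dir="$1"; shift
  local genome status=0
  genome=$(basename "$dir")
  (
    cd "$dir" || exit 1
    git-crypt unlock <(
      bw get notes "$(vault_note_name "$genome")" --session "$BW_SESSION" | base64 --decode
    ) || exit 1
    trap 'git-crypt lock' EXIT   # runs when this subshell exits, success or failure
    "$@"
  ) || status=$?
  return "$status"
}
```

Usage: `with_unlocked_genome ~/master-knowledge-genome/genome-dev git status`. Commit any changes inside the wrapped command, since `git-crypt lock` may refuse to run on a dirty working tree.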

Lock a genome when the session ends:
```bash
git-crypt lock
```

---

## Key Management

> This section is for the operator. These commands are never issued by the LLM agent.

### Vaultwarden Secure Notes

Each genome key is stored as a base64-encoded Secure Note in Vaultwarden:

| Genome | Vaultwarden Note Name |
|--------|----------------------|
| `genome-dev` | `genome-dev key` |
| `genome-finance` | `genome-finance key` |
| `genome-homelab` | `genome-homelab key` |

After `make setup` or `make add-genome`, key files are exported to `keys/`.
Upload procedure:

```bash
# Encode the key
base64 < keys/genome-dev.key

# Paste the output into a Vaultwarden Secure Note named "genome-dev key"
# Then delete the key file
rm keys/genome-dev.key
```
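
Before deleting the key file, it is worth confirming that the stored note decodes back to exactly the same bytes. A minimal sketch; `key_matches_encoded` is a hypothetical helper that reads base64 text on stdin:

```bash
# key_matches_encoded KEY_FILE
# Succeeds only if the base64 text on stdin decodes to the exact bytes of KEY_FILE.
key_matches_encoded() {
  base64 --decode | cmp -s - "$1"
}
```

Usage, assuming `BW_SESSION` is exported: `bw get notes "genome-dev key" --session "$BW_SESSION" | key_matches_encoded keys/genome-dev.key && rm keys/genome-dev.key`. The key file is removed only when the round trip succeeds.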

### Cloning on a new machine

```bash
# Full clone with all submodules
git clone --recurse-submodules \
  https://git.yourserver.com/yourusername/master-knowledge-genome.git

# Unlock a specific genome (with key file — development only)
cd master-knowledge-genome/genome-dev
git-crypt unlock /path/to/genome-dev.key

# Unlock via Vaultwarden (recommended — no key on disk)
export BW_SESSION=$(bw unlock --passwordenv BW_MASTER_PASSWORD --raw)
git-crypt unlock <(bw get notes "genome-dev key" --session "$BW_SESSION" | base64 -d)

# Single-genome clone: collaborator who only needs one genome
git clone https://git.yourserver.com/yourusername/genome-dev.git
```

### Key rotation (emergency)

If a key is lost or compromised:

```bash
# From the knowledge-genome-setup/ directory
source lib/git-crypt.sh
cd ~/knowledge-genome-setup/genome-dev
gcrypt_rotate_key "genome-dev"
```

`gcrypt_rotate_key` performs:
1. Unlocks repo with existing key
2. Removes old key material
3. Generates new symmetric key via `git-crypt init`
4. Re-stages and commits private files (encrypted with new key)
5. Exports new key to `keys/`
6. Prints Vaultwarden update instructions

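> **Limitation:** git history still contains blobs encrypted with the old key.
> Anyone with the old key and git history access can decrypt them. To purge old
> encrypted blobs from history (this removes the `private/` paths from every commit,
> including the current one, so back up the decrypted files and re-commit them afterwards):
> ```bash
> git filter-repo --invert-paths --path raw/private --path wiki/private
> # filter-repo removes the origin remote by default; re-add it before pushing
> git remote add origin <genome-remote-url>
> git push --force origin main
> ```
> This rewrites all commit hashes. Coordinate with any collaborators first.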

After rotation:
- Upload new key to Vaultwarden (replace existing note)
- Delete both `keys/genome-dev.key` and `keys/genome-dev-rotated-*.key` from disk
- Share the new key only with people who should keep access; the old key cannot decrypt anything committed after rotation

---

## Agent Sessions

### Prerequisites for every session

Before starting an LLM agent session on a genome:
1. The host (AI server) runs `git-crypt unlock` for the required genomes
2. The orchestrator prepares context: `tail -n 20 wiki/log.md`
3. Declare `PRIVATE_CONTEXT` state explicitly in the opening prompt

### Session start protocol

The agent executes in this order at the start of every session:

1. Read `wiki/index.md` — primary catalog of all pages and maturity
2. Read the log tail (last 20 lines of `wiki/log.md`, injected by the orchestrator; the agent does NOT open `wiki/log.md` directly)
3. For tasks involving related pages: `qmd search "<query>"` before opening any files
4. Operate on individual files — never scan entire directories

### One source per session

With a 14B model and ~6GB KV cache budget, long sessions degrade.
As the session extends, the context fills with pages already created,
attention dilutes, and later entities receive worse cross-references than earlier ones.

**Hard rule: one source per session.**
If multiple sources are queued in `raw/`, process only the first.
Commit, close the session. The orchestrator (n8n or script) starts a new session
for the next source with a clean KV cache.

For automated pipelines: if 5 files arrive in `raw/`, trigger 5 agent sessions
sequentially — not one session with 5 files.
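
The sequential rule can be sketched as a plain loop. `run_agent_session` below is a placeholder for whatever actually launches your agent (an n8n call, a CLI wrapper); both function names are hypothetical:

```bash
# Placeholder: replace the body with the real command that starts one agent session.
run_agent_session() {
  echo "session start: $1"
}

# process_queue FILE...: one fresh session per source, strictly in order.
# Stops at the first failure so a broken source does not silently skip ahead.
process_queue() {
  local source
  for source in "$@"; do
    run_agent_session "$source" || {
      echo "session failed for $source; queue halted" >&2
      return 1
    }
  done
}
```

Usage: `process_queue raw/articles/*.md`. Each iteration waits for the previous session to finish, so no two sessions share a KV cache or write to the wiki at the same time.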

### n8n automation

For Forgejo webhook → automated ingest:
1. Forgejo sends webhook on push to `raw/`
2. n8n receives webhook, identifies new files
3. n8n starts one agent session per new file (sequential, not parallel)
4. Each session: inject `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
5. Agent ingest workflow runs, opens PR
6. Human reviews and merges PR

---

## Workflows

### Ingest

Triggered by a new file in `raw/` (manual or via webhook).

1. Read source once
2. Create `wiki/sources/<slug>.md` — summary and key points
3. Per entity (person, tool, organisation): create or update `wiki/entities/<name>.md`
4. Per concept (pattern, theory, decision): create or update `wiki/concepts/<name>.md`
5. Check each touched page for contradictions → apply Conflict Resolution if found
6. Append entry to `wiki/index.md` (bottom of relevant section — do not reorder)
7. Append log entry: `INGEST | <slug>`
8. Run scoped lint on pages created or modified in this session; report in PR
9. Commit on `feat/ai-ingest-<slug>`; open PR using `templates/pr-description.md`

For private sources (`PRIVATE_CONTEXT: enabled` required):
- All output goes to `wiki/private/<slug>.md` only
- PR title: `[PRIVATE] ingest: <slug>`
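
The `<slug>` used in page paths and branch names has to be derived consistently from the source file name. This README does not define the slug rules, so the following is only one possible convention (lowercase, extension dropped, non-alphanumerics collapsed to hyphens); all three function names are hypothetical:

```bash
# slugify FILE_NAME: lowercase, strip the extension, collapse other characters to '-'.
slugify() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/\.[a-z0-9]+$//; s/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Derived names for one raw source path.
ingest_branch() { printf 'feat/ai-ingest-%s\n' "$(slugify "$(basename "$1")")"; }
source_page()   { printf 'wiki/sources/%s.md\n' "$(slugify "$(basename "$1")")"; }
```

Deriving both names from the same function keeps the branch, the source page, and the `INGEST | <slug>` log entry aligned.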

### Query

Triggered by an operator question.

1. `qmd search "<query>"` → identify candidate pages
2. Read candidate pages directly (qmd already returns file paths — no intermediate index lookup)
3. Synthesise answer with `[[wikilink]]` citations
4. If answer is non-trivial: save as `wiki/queries/<slug>.md` and append to index
5. Append log entry: `QUERY | <subject>`

For general orientation without a specific query: read `wiki/index.md` directly.

### Lint

The lint workflow is split between deterministic bash checks and semantic LLM judgment.

**Step 1 — operator runs bash linter:**
```bash
make lint
```

The bash linter checks automatically:
- YAML frontmatter validity (all mandatory fields present)
- Domain consistency (domain field matches genome name)
- Type validity (value from allowed list)
- Privacy consistency (`private/` directories have `private: true`)
- Page size (warn at 400 lines, error at 800 lines)
- Knowledge decay (stable > 180 days, draft > 90 days)
- Broken internal wikilinks (warnings only — cross-type links produce expected false positives)

**Step 2 — operator provides bash output to LLM agent:**

The agent applies semantic judgment to findings the bash linter cannot make:
- **Orphan pages** (from bash list): for each orphan, identify 1-3 existing pages
  that should link to it; propose specific additions
- **Implicit concepts** (from bash term frequency list): determine if a candidate
  term warrants a dedicated page; draft stub if yes
- **Duplicate concepts**: `qmd search "<concept>"` for suspected duplicates;
  propose merge if confirmed
- **Maturity promotion**: pages with 2+ sources still marked `draft` → propose `stable`

The agent reports all findings as a structured list. It does not modify files
without operator approval. Appends `LINT | <summary>` log entry.

---

## Knowledge Quality

### PR review workflow

Every agent session that modifies wiki pages opens a PR.
The PR description uses `templates/pr-description.md`:

```markdown
## Summary
One sentence: goal of this session and source processed.

## Pages Created
| Path | Type | Maturity |

## Pages Modified
| Path | Change |

## Contradictions Found
[ ] None  /  [ ] n conflict file(s) created

## Private Data Accessed
[ ] No (PRIVATE_CONTEXT: disabled)  /  [ ] Yes

## Scoped Lint (post-ingest)
[ ] Frontmatter valid  [ ] No broken links  [ ] No issues found
```

This makes human review fast and structured: read the table, scan the diff,
approve or request changes. No exploration required to understand what the agent did.

### Conflict resolution

When new evidence contradicts an existing wiki claim:

1. Keep the existing page unchanged
2. Create `wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md` with:
   - The existing claim and its source
   - The contradicting evidence and its source
   - Agent confidence assessment for each
   - Recommendation: `accept_b` | `keep_a` | `requires_human_review`
3. Add entry to `wiki/index.md` → Conflicts Pending Review section
4. Log entry: `CONFLICT | <concept>`
5. Open PR: `[CONFLICT] <concept> — human review required`

The operator resolves the conflict, updates relevant pages, closes the PR.

### Knowledge decay

Pages have a `last_updated` field in frontmatter. During lint passes:

| Maturity | Threshold | Action |
|----------|-----------|--------|
| `stable` | 180 days | Flag as stale — add `⚠️ STALE` callout |
| `draft` | 90 days | Flag as stale — add `⚠️ STALE` callout |

The agent proposes re-validation but does not change `maturity` without new source evidence.
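
The decay check reduces to a date difference compared against a per-maturity threshold. A sketch that avoids the GNU/BSD `date` differences by computing Julian day numbers in awk; `day_number` and `is_stale` are hypothetical names, and the real linter may compute this differently:

```bash
# day_number YYYY-MM-DD: Julian day number for a Gregorian calendar date.
day_number() {
  printf '%s\n' "$1" | awk -F- '{
    y = $1 + 0; m = $2 + 0; d = $3 + 0
    a = int((14 - m) / 12); y2 = y + 4800 - a; m2 = m + 12 * a - 3
    print d + int((153 * m2 + 2) / 5) + 365 * y2 + int(y2 / 4) - int(y2 / 100) + int(y2 / 400) - 32045
  }'
}

# is_stale MATURITY LAST_UPDATED [TODAY]: succeeds when the page exceeds its threshold.
is_stale() {
  local maturity="$1" updated="$2" today="${3:-$(date +%Y-%m-%d)}" limit age
  case "$maturity" in
    stable) limit=180 ;;
    draft)  limit=90 ;;
    *)      return 1 ;;   # deprecated pages are not checked in this sketch
  esac
  age=$(( $(day_number "$today") - $(day_number "$updated") ))
  [ "$age" -gt "$limit" ]
}
```

For example, a `draft` page last updated on 2026-01-01 is 130 days old on 2026-05-11 and is flagged, while a `stable` page with the same date is not.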

### Cross-genome lint

A manual, monthly operation. Not automated in CI/CD — the token cost and coordination
complexity are not justified at this scale.

1. Operator initiates a master-repo agent session
2. Agent uses `qmd search "<concept>"` across the multi-genome index to find:
   - Concepts defined in 2+ genomes with potentially conflicting definitions
   - Entities referenced cross-genome without canonical cross-genome wikilinks
   - Concepts in genome-X that should link to genome-Y
3. Agent reports findings — does not modify files
4. For each finding: create conflict note in the genome where resolution belongs

---

## Knowledge Schema

### Frontmatter

Every wiki page must start with valid YAML frontmatter:

```yaml
---
title: "Strict String Title"
type: source | entity | concept | query | conflict | private | index | log
domain: genome-name
tags: [lowercase, hyphen-separated]
maturity: draft | stable | deprecated
last_updated: YYYY-MM-DD
private: true | false
---
```

| Field | Rules |
|-------|-------|
| `type` | Must be one of: `source entity concept query conflict private index log` |
| `maturity: draft` | Single source or unvalidated |
| `maturity: stable` | Confirmed by 2+ independent sources |
| `maturity: deprecated` | Superseded — add `> **DEPRECATED:** <reason>` callout at top |
| `private: true` | Required on all pages in `wiki/private/` and `raw/private/` |

Do not use semantic versioning for content. Git history tracks every change.
`maturity` captures epistemic state; `last_updated` tracks recency.
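
A frontmatter presence check can be sketched in a few lines. This assumes all seven fields from the example above are mandatory and only tests that each key exists, not that its value is valid; `frontmatter` and `missing_fields` are hypothetical names:

```bash
# frontmatter FILE: print the lines between the opening and closing '---'.
frontmatter() {
  awk 'NR == 1 { if ($0 != "---") exit; next }
       $0 == "---" { exit }
       { print }' "$1"
}

# missing_fields FILE: print each required key that the frontmatter lacks.
missing_fields() {
  local fm field
  fm=$(frontmatter "$1")
  for field in title type domain tags maturity last_updated private; do
    printf '%s\n' "$fm" | grep -q "^${field}:" || printf '%s\n' "$field"
  done
}
```

Empty output means every key is present. Keys that appear only in the page body are ignored, because reading stops at the closing `---`.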

### Page types and directories

| Type | Directory | Description |
|------|-----------|-------------|
| `source` | `wiki/sources/` | One page per processed raw source |
| `entity` | `wiki/entities/` | People, tools, organisations, projects |
| `concept` | `wiki/concepts/` | Patterns, theories, architectural decisions |
| `query` | `wiki/queries/` | Preserved answers and analyses |
| `conflict` | `wiki/queries/conflict-*.md` | Unresolved contradictions |
| `private` | `wiki/private/` | Private synthesis (PRIVATE_CONTEXT: enabled) |
| `index` | `wiki/index.md` | Primary navigation catalog (singleton) |
| `log` | `wiki/log.md` | Operations ledger (singleton) |

### Page size limits

| Limit | Lines | Action |
|-------|-------|--------|
| Soft cap | 400 | Bash linter warns |
| Hard cap | 800 | Bash linter errors — split the page |

These limits ensure pages fit within the LLM context window without attention degradation
and keep the wiki atomically navigable.
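
The size check itself is a line count against the two caps. A sketch, assuming "at 400" and "at 800" mean strictly more than that many lines; `page_size_status` is a hypothetical name:

```bash
# page_size_status FILE: print ok, warn (over 400 lines), or error (over 800 lines).
page_size_status() {
  local lines
  lines=$(awk 'END { print NR }' "$1")
  if   [ "$lines" -gt 800 ]; then echo error
  elif [ "$lines" -gt 400 ]; then echo warn
  else echo ok
  fi
}
```

Counting with awk rather than `wc -l` avoids the leading whitespace that BSD `wc` adds to its output.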

### Linking conventions

| Type | Format |
|------|--------|
| Internal (same genome) | `[[folder/slug]]` — Obsidian wikilinks only |
| Cross-genome | `[[../genome-target/wiki/folder/slug]]` |
| External | `[text](https://url)` — standard Markdown |

Never use `[text](relative/path)` for internal references. Obsidian wikilinks are
bidirectional and appear in the graph view.

### Log format

Every operation appends one entry to `wiki/log.md`:

```markdown
## [YYYY-MM-DD] TYPE | Subject

- run_id: `<uuid>`
- model: `<model-name>`
- context_read: `[[path/A]]`, `[[path/B]]`
- output_written: `[[path/C]]`
- reasoning: One sentence — what changed and why.
```

Valid TYPEs: `INGEST` `LINT` `QUERY` `CONFLICT` `CONFIG` `SECURITY`

Parse examples:
```bash
grep "^## \[" wiki/log.md | tail -5          # Last 5 entries
grep "^## \[" wiki/log.md | grep "CONFLICT"  # All conflicts
grep "^## \[2026-05" wiki/log.md             # Entries from a specific month
```

The orchestrator always injects only `tail -n 20 wiki/log.md` into agent context.
The LLM never loads the full log.
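
Note that `tail -n 20` counts lines, not entries: with the format above, 20 lines cover roughly three entries. If whole entries are preferred, they can be selected by their `## [` headers. A sketch, with `last_log_entries` as a hypothetical helper name:

```bash
# last_log_entries N FILE: print the last N complete log entries.
# The file is read twice: pass 1 counts entry headers, pass 2 prints the tail.
last_log_entries() {
  local n="$1" file="$2"
  awk -v n="$n" '
    NR == FNR { if (/^## \[/) total++; next }
    /^## \[/  { seen++ }
    seen > 0 && seen > total - n { print }
  ' "$file" "$file"
}
```

Usage: `last_log_entries 3 wiki/log.md`. Any preamble before the first entry header is skipped, and entries are never cut in half.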

---

## Collaboration Model

| Role | Key access | Permitted operations |
|------|-----------|----------------------|
| Owner | Full — key holder | Read/write everywhere |
| Collaborator | None | Push to `raw/articles/`, `raw/transcripts/`, `raw/code-packs/`, `raw/assets/` |
| Local AI agent | Conditional | `private/` only when `PRIVATE_CONTEXT: enabled` |
| Cloud AI model | Never | `PRIVATE_CONTEXT` must be `disabled`; private data stays on local network |

Grant collaborator access: add the user as a Forgejo collaborator with Write permission.
Never share the git-crypt key — collaborators operate exclusively in public directories.

---

## Optional Extensions

### qmd — local Markdown search

[qmd](https://github.com/tobi/qmd) is a local, on-device BM25 + vector search
engine for Markdown files. It has both a CLI (for shell scripts and agent tool calls)
and an MCP server (for native LLM tool use).

Recommended at scale: once a genome exceeds ~150 pages, `qmd search` is significantly
faster and more accurate than navigating `wiki/index.md` manually.

```bash
# Index a genome
qmd index genome-dev/wiki/

# Search
qmd search "graph-based state management"

# Start MCP server (for Claude Code / Codex integration)
qmd serve --port 3333
```

### Obsidian integration

Obsidian is the recommended wiki browser. Open any genome directory as an Obsidian vault.

Recommended setup:
- **Graph view** — visualise page connections; spot orphans and hubs instantly
- **Obsidian Web Clipper** — browser extension to clip articles directly to `raw/articles/`
  as Markdown
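- **Download attachments** — set the attachment folder to `raw/assets/` in Settings → Files and links,
  then bind "Download attachments for current file" to a hotkey in Settings → Hotkeys
  (e.g. Ctrl+Shift+D). After clipping an article, press the hotkey to download its images to `raw/assets/`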
- **Dataview plugin** — query YAML frontmatter across the wiki;
  `TABLE maturity, last_updated WHERE domain = "genome-dev"` generates dynamic tables
- **Marp plugin** — render Markdown as slide decks directly from wiki content

Note: `.obsidian/` is in `.gitignore`. Workspace and plugin settings are local — not synced.

### n8n automation

n8n (running on the storage node) can automate the ingest pipeline:

1. Forgejo webhook fires on push to a genome's `raw/` directory
2. n8n flow identifies new files
3. For each new file: starts one agent session (sequential — never parallel)
4. Each session receives: `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
5. Agent runs ingest workflow and opens PR
6. Human reviews the PR

Key constraint: one source per session, sessions sequential.
Never batch multiple sources into one agent session.
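
The one-source-per-session constraint amounts to a plain sequential loop. The sketch below shows the shape of it outside n8n; `ingest_sequentially` is a hypothetical helper, the runner command is whatever starts an agent session in your setup, and stopping at the first failure is an assumption rather than documented behaviour.

```bash
# ingest_sequentially RUNNER SOURCE...
# Runs RUNNER once per source file, strictly one at a time.
# Stops at the first failed session so later sources are not ingested
# on top of a broken state.
ingest_sequentially() {
  local runner="$1" source
  shift
  for source in "$@"; do
    echo "session start: $source"
    if ! "$runner" "$source"; then
      echo "session failed: $source" >&2
      return 1
    fi
    echo "session done: $source"
  done
}
```

The list of sources could come from the pushed commit range, for example the files reported by `git diff --name-only --diff-filter=A <before> <after> -- raw/`.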

### Intel NPU offloading

If the AI compute node has an Intel NPU (e.g. Core Ultra series):

- Background tasks (embedding updates, index refresh) → Intel NPU via OpenVINO
- Active reasoning sessions (ingest, query, synthesis) → GPU

This keeps GPU memory, including the KV cache of the active session, available for
interactive work and reduces power consumption for background operations.

---

## Troubleshooting

### `git-crypt: command not found`

```bash
# Ubuntu/Debian
sudo apt install git-crypt

# macOS
brew install git-crypt
```

### `make setup` fails with "MISSING: jq"

```bash
make doctor   # identifies all missing tools
sudo apt install git git-crypt curl jq
```

### Pre-commit hook blocks a commit with "PLAINTEXT LEAK DETECTED"

The staged file is in a path matching `**/private/**` but is not encrypted.

Fix options:
1. Verify `.gitattributes` contains `**/private/** filter=git-crypt diff=git-crypt -text`
2. Run `git-crypt init` if git-crypt is not initialised in this repo
3. Run `git-crypt status` to check the encryption state of all files

Never use `git commit --no-verify` to bypass this check.
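
To inspect what is actually staged, it helps to know that git-crypt prefixes every encrypted blob with the 10-byte header `\0GITCRYPT\0`. The sketch below checks staged files under `private/` for that header. It is an illustration of the idea, not the project's hook; both function names are hypothetical, and paths containing unusual characters are not handled.

```bash
# Hex of the 10-byte header git-crypt writes: \0 G I T C R Y P T \0
GITCRYPT_MAGIC="00474954435259505400"

# is_gitcrypt_blob
# Reads a blob on stdin; succeeds if it starts with the git-crypt header.
is_gitcrypt_blob() {
  local header
  header=$(head -c 10 | od -An -tx1 | tr -d ' \n')
  [ "$header" = "$GITCRYPT_MAGIC" ]
}

# find_plaintext_private
# Lists staged paths under a private/ directory whose staged content
# (the raw blob in the index) is not git-crypt encrypted.
find_plaintext_private() {
  local path
  git diff --cached --name-only --diff-filter=ACM | while IFS= read -r path; do
    case "$path" in
      private/*|*/private/*)
        git cat-file blob ":$path" | is_gitcrypt_blob || echo "PLAINTEXT: $path"
        ;;
    esac
  done
}
```

Running `find_plaintext_private` in the repository before committing should print nothing when the `.gitattributes` rule is in effect.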

### `git-crypt status` shows files as "not encrypted" after init

The `.gitattributes` rule must be committed before files in `private/` are staged.
If files were staged before `.gitattributes` was committed:

```bash
git rm -r --cached raw/private/ wiki/private/
git add raw/private/ wiki/private/
git commit -m "fix: re-stage private files for encryption"
```

### Agent returns stale or missing cross-references

Likely causes:
1. Session was too long: the context grew large enough that attention over earlier content degraded. Use one source per session.
2. `wiki/index.md` was not read at session start — agent lacked the page catalog.
3. qmd index is stale — re-index: `qmd index <genome>/wiki/`

### Submodules show as "modified" after `make sync`

This is normal if genome repos have new commits. Update master's pointers:

```bash
cd master-knowledge-genome
git add .
git commit -m "chore: update submodule pointers"
git push
```
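
To see which pointers moved before staging them, the output of `git submodule status` can be filtered: a leading `+` marks a submodule whose checked-out commit differs from the one recorded in the master repo. A minimal sketch; `list_moved_submodules` is a hypothetical helper name.

```bash
# list_moved_submodules
# Reads `git submodule status` output on stdin and prints the path of every
# submodule whose checked-out commit differs from the recorded pointer.
list_moved_submodules() {
  awk '/^\+/ { print $2 }'
}
```

For example, `git submodule status | list_moved_submodules` prints one genome path per line, which can then be staged individually instead of with `git add .`.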

### `bw unlock` fails

Verify you are using `bw` (standard Bitwarden CLI), not `bws` (Secrets Manager CLI).
`bws` does not work with self-hosted Vaultwarden.

```bash
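bw --version     # should print e.g. "2024.x.x"
bw config server https://vault.yourserver.com   # run while logged out (bw logout first if needed)
bw login
bw unlock        # retry once the server URL and login are confirmed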
```