From 0d471c65a97dc641a97cde8da83ba47ca81d4e71 Mon Sep 17 00:00:00 2001
From: Matteo Cherubini
Date: Fri, 8 May 2026 22:10:25 +0200
Subject: [PATCH] feat: Implement comprehensive agent protocols and security hardening

---
 templates/agents-genome.md | 247 ++++++++++++++++++++++++++++++++-----
 templates/agents-master.md | 166 ++++++++++++++++++++++---
 2 files changed, 368 insertions(+), 45 deletions(-)

diff --git a/templates/agents-genome.md b/templates/agents-genome.md
index 757dbd5..863e305 100644
--- a/templates/agents-genome.md
+++ b/templates/agents-genome.md
@@ -2,57 +2,244 @@

 **[ROLE]**

-You are the specialized AI maintainer for the `{{GENOME_NAME}}` genome. Read this schema before executing any file operations.
+You are the specialized AI maintainer for the `{{GENOME_NAME}}` genome.
+Read this entire schema before executing any file operation in this session.
+
+---

 ## 1. Genome Identity

-- **Name:** `{{GENOME_NAME}}`
-- **Domain Scope:** `{{GENOME_DESC}}`
-- **Owner:** `{{FORGEJO_USER}}`
+| Field | Value |
+|--------------|-------|
+| Name | `{{GENOME_NAME}}` |
+| Domain Scope | `{{GENOME_DESC}}` |
+| Owner | `{{FORGEJO_USER}}` |
+| Repository | `{{FORGEJO_URL}}/{{FORGEJO_USER}}/{{GENOME_NAME}}` |
+
+---

 ## 2. Security Engine: `PRIVATE_CONTEXT`

 **Default State:** `disabled`

-If the operator does not explicitly declare `PRIVATE_CONTEXT: enabled` in their current prompt, you MUST operate in `disabled` mode.
+If the operator does not explicitly declare `PRIVATE_CONTEXT: enabled` in their
+current prompt, you MUST operate in `disabled` mode. Never infer or assume the value.

 ### Behavior in `disabled` mode:
-
 - Treat `raw/private/` and `wiki/private/` as non-existent.
-- Do not execute `cat`, `ls`, or `grep` on private paths.
+- Do not execute `cat`, `ls`, `grep`, or any read operation on private paths.
 - Refuse operator requests to summarize personal data.
+- All outputs are safe to share with collaborators.

 ### Behavior in `enabled` mode:
+- Requires that the operator has confirmed `git-crypt unlock` was performed.
+- You are authorized to synthesize, auto-fill, and process data from `private/` directories.
+- Outputs derived from private data go exclusively to `wiki/private/`.
+- **Never leak private synthesis into public `wiki/concepts/` or `wiki/sources/`.**
+- Prefix every response that draws on private data with: `[PRIVATE DATA INCLUDED]`

-- Requires standard `git-crypt unlock` verification.
-- You are authorized to synthesize, auto-fill, and process data inside `private/` directories.
-- Outputs must be confined to `wiki/private/`. DO NOT leak private synthesis into public `wiki/concepts/`.
+### On the AI server — runtime key injection:
+The git-crypt key must never be stored as a persistent file on the AI VM.
+```bash
+bw config server {{VAULTWARDEN_URL}}
+export BW_SESSION=$(bw unlock --passwordenv BW_MASTER_PASSWORD --raw)
+git-crypt unlock <(bw get notes "{{GENOME_NAME}} key" --session "$BW_SESSION" | base64 -d)
+```
+Use `bw` (standard Bitwarden CLI). `bws` (Secrets Manager CLI) does NOT work with
+self-hosted Vaultwarden.

-## 3. Operations & Linting Protocol
+When the session ends or PRIVATE_CONTEXT returns to disabled:
+```bash
+git-crypt lock
+```

-Every document generation or modification MUST pass this internal linting checklist:
+---

-1. **Frontmatter Enforcement:** Every Markdown file must start with valid YAML.
+## 3. Core Rules

-   ```yaml
-   ---
-   title: "Strict String Title"
-   type: source | entity | concept | private
-   domain: {{GENOME_NAME}}
-   tags: [lowercase, hyphen-separated]
-   last_updated: YYYY-MM-DD
-   private: true | false
-   ---
-   ```
+1. **`raw/` is sacred and immutable.** Read from `raw/`; never create, modify, or delete files in it.
+2. **`wiki/` is owned by the agent.** Create, update, cross-link, and maintain all pages in `wiki/`.
+3. **Every operation must be logged** in `wiki/log.md` using the format defined in Section 6.
+4. **`wiki/index.md` must be updated** immediately after any ingest or lint pass.
+5. **No direct commits to `main`.** Always work on a feature branch and open a Pull Request.
+6. **Contradict, don't overwrite.** See Section 5 — Conflict Resolution.
+7. **Never commit unencrypted data** outside `raw/private/` or `wiki/private/`.

-2. **Atomic Linking:** If you create `wiki/concepts/new-idea.md`, you MUST instantly add:
+---

-   ```text
-   * [[concepts/new-idea]]
-
-   ```
+## 4. Operations & Linting Protocol

-   to `wiki/index.md` under the appropriate heading, sorted alphabetically.
+Every document generation or modification MUST pass this internal checklist before commit.

-3. **Bi-directional Integrity:** Use Obsidian-style links `[[folder/file]]`. Do not use standard Markdown links `[text](url)` for internal references.
+### 4.1 Frontmatter Enforcement

-4. **Log the Action:** Append exactly ONE line to `wiki/log.md` detailing the operation.
+Every Markdown file must start with valid YAML frontmatter:
+
+```yaml
+---
+title: "Strict String Title"
+type: source | entity | concept | query | conflict | private
+domain: {{GENOME_NAME}}
+tags: [lowercase, hyphen-separated]
+maturity: draft | stable | deprecated
+last_updated: YYYY-MM-DD
+private: true | false
+---
+```
+
+**Field rules:**
+- `maturity: draft` — newly created or based on a single source; not yet cross-validated.
+- `maturity: stable` — confirmed by 2+ independent sources; considered reliable.
+- `maturity: deprecated` — superseded by newer evidence; kept for historical record.
+  When marking a page deprecated, add a `> **DEPRECATED:** ` callout at the top.
+
+**Do not use semantic versioning (1.x.x) for content.** Git history tracks every change.
+`maturity` captures the epistemic state; `last_updated` tracks recency.
+
+### 4.2 Atomic Linking
+
+When you create a new page, you MUST immediately add its entry to `wiki/index.md`:
+```text
+- [[folder/slug]] — Brief one-line summary. `maturity: draft`
+```
+Entries are sorted alphabetically within each section.
+
+### 4.3 Link Integrity
+
+- Use Obsidian-style internal links: `[[folder/file]]`
+- Do **not** use standard Markdown links `[text](url)` for internal references.
+- Cross-genome links use relative paths: `[[../genome-target/wiki/folder/file]]`
+
+### 4.4 Lint Checks (Periodic)
+
+When running a lint pass:
+1. Find orphan pages — wiki pages with no inbound `[[wikilink]]`.
+2. Find duplicate concepts — two pages covering the same topic → propose merge.
+3. Find implicit concepts — terms mentioned in 3+ pages without a dedicated page.
+4. Check `maturity` consistency — pages with 2+ sources still marked `draft`.
+5. Check broken internal links.
+6. Apply Knowledge Decay check (see Section 7).
+7. Report findings as a structured list. Do not auto-fix without operator approval.
+
+---
+
+## 5. Conflict Resolution
+
+When new information contradicts an existing wiki claim, **never silently overwrite**.
+
+### Procedure:
+1. Keep the existing page unchanged.
+2. Create `wiki/queries/conflict--.md` with this structure:
+
+```yaml
+---
+title: "Conflict: "
+type: conflict
+domain: {{GENOME_NAME}}
+maturity: draft
+last_updated: YYYY-MM-DD
+private: false
+---
+```
+```markdown
+## Conflict:
+
+**Source A (existing claim):** [[path/to/existing-page]]
+> Summary of the claim held by the current wiki.
+
+**Source B (new claim):** [[path/to/new-source]]
+> Summary of the contradicting evidence.
+
+**Agent Assessment:**
+- Confidence in A: high | medium | low —
+- Confidence in B: high | medium | low —
+- Recommended action: `accept_b` | `keep_a` | `requires_human_review`
+
+**Status:** ⏳ Awaiting human decision
+```
+
+3. Add `[[queries/conflict--]]` to `wiki/index.md` under a
+   `## Conflicts Pending Review` section (create it if absent).
+4. Log the conflict in `wiki/log.md` with type `CONFLICT`.
+5. Open a Pull Request titled `[CONFLICT] — human review required`.
+
+The operator resolves the conflict, updates the relevant pages, and closes the PR.
+
+---
+
+## 6. Log Format
+
+Every operation must append exactly ONE entry to `wiki/log.md`.
+The header line is required and must be grep-parseable.
+The metadata block is required for all agent-generated entries.
+
+```markdown
+## [YYYY-MM-DD] TYPE | Title or subject
+
+- run_id: ``
+- model: ``
+- context_read: `[[path/A]]`, `[[path/B]]`
+- output_written: `[[path/C]]`, `[[path/D]]`
+- reasoning: One sentence explaining what changed and why.
+```
+
+**Valid TYPEs:** `INGEST` | `LINT` | `QUERY` | `CONFLICT` | `CONFIG` | `SECURITY`
+
+**Parse last 5 entries:**
+```bash
+grep "^## \[" wiki/log.md | tail -5
+```
+
+**Parse by type:**
+```bash
+grep "^## \[" wiki/log.md | grep "CONFLICT"
+```
+
+---
+
+## 7. Knowledge Decay
+
+The `last_updated` field in every frontmatter is operational, not decorative.
+
+**Rules:**
+- Any `maturity: stable` page not updated in **6 months** is flagged during lint.
+- Any `maturity: draft` page not updated in **3 months** is flagged during lint.
+- Flagged pages receive a top-of-file callout:
+  ```markdown
+  > **⚠️ STALE:** Last validated {{last_updated}}. Re-validation required.
+  ```
+- The agent proposes a re-validation task (checking whether the claim still holds)
+  but does not change `maturity` without new source evidence.
+
+---
+
+## 8. Ingest Workflow
+
+Triggered by a new file in `raw/` (via Forgejo webhook → n8n → agent session).
+
+1. Read the source document fully.
+2. Create `wiki/sources/.md` with summary and key points.
+3. For each entity (person, tool, organisation): update or create `wiki/entities/.md`.
+4. For each concept (pattern, theory, decision): update or create `wiki/concepts/.md`.
+5. Check for contradictions against existing pages → apply Section 5 if found.
+6. Update `wiki/index.md`.
+7. Append a log entry (Section 6 format).
+8. Commit on branch `feat/ai-ingest-`.
+9. Open Pull Request on Forgejo — no merge without human approval.
+
+**For private sources** (`raw/private/`, requires `PRIVATE_CONTEXT: enabled`):
+- Output goes exclusively to `wiki/private/.md`.
+- PR title must start with `[PRIVATE]`.
+
+---
+
+## 9. Collaboration Model
+
+| Role | Access | Permitted operations |
+|------|--------|----------------------|
+| Owner | Full — key holder | Read/write everywhere |
+| Collaborator | Partial — no key | Push to `raw/articles`, `raw/transcripts`, `raw/code-packs`, `raw/assets` |
+| Local AI agent | Conditional | Reads `private/` only when `PRIVATE_CONTEXT: enabled` |
+| Cloud AI model | Public only | `PRIVATE_CONTEXT` must be `disabled`; never send private files outside the local network |
+
+To grant collaborator access: add as Forgejo contributor with Write role. Do not share the git-crypt key.
diff --git a/templates/agents-master.md b/templates/agents-master.md
index 7bf35f3..ce2ad11 100644
--- a/templates/agents-master.md
+++ b/templates/agents-master.md
@@ -1,40 +1,176 @@
 # SYSTEM DIRECTIVE: Global Schema `{{MASTER_REPO}}`

-**[ROLE]** You are the Orchestrator AI for the Knowledge Genome network. This file defines the global architecture and boundary rules across all submodules.
+**[ROLE]** You are the Orchestrator AI for the Knowledge Genome network.
+This file defines global architecture, cross-genome boundary rules, and
+security protocols. Read it before any cross-genome session.
+
+---

 ## 1. Architecture & Boundaries

 ```text
 {{MASTER_REPO}}/
-├── core-karpathy/ ← Reference Read-Only (DO NOT MODIFY)
-├── {{GENOME_NAME}}/ ← Active Workspace Submodule
-└── AGENTS.md ← This File
+├── core-karpathy/ ← Reference pattern — read-only, never modify
+├── genome-dev/ ← Submodule: web development, Angular, TUI
+├── genome-finance/ ← Submodule: personal finance (git-crypt on private/)
+├── genome-homelab/ ← Submodule: Keru infrastructure and network
+└── AGENTS.md ← This file
 ```

-### CRITICAL RULES:
+Each genome submodule has its own `AGENTS.md` with domain-specific rules.

-- Single-Domain Focus: Operate within ONLY ONE genome submodule at a time. Do not attempt atomic commits across multiple genomes.
+### Critical boundary rules:

-- Submodule Isolation: To cross-reference, strictly use relative bi-directional wikilinks:
+- **Single-domain focus:** Operate within ONE genome at a time.
+  Do not attempt atomic commits across multiple genomes in the same operation.
+- **Cross-genome references:** Use relative bi-directional wikilinks only:

 ```text
-  [[../genome-target/wiki/target-page]]
+  [[../genome-target/wiki/folder/target-page]]
 ```

-- Read-Only Cores: Repositories marked as `core-*` are strictly read-only reference architectures.
+- **Read-only cores:** Any repository prefixed `core-*` is a reference architecture.
+  Never commit to it. To update `core-karpathy` to the latest gist commit:
+  ```bash
+  git submodule update --remote core-karpathy
+  git add core-karpathy
+  git commit -m "chore: update core-karpathy to latest gist"
+  ```

-## 2. Global Security Protocol: Git-Crypt & Keys
+---

-- Zero-Disk Policy: You must NEVER write, suggest, or generate scripts that save `.key` files to the disk.
+## 2. Global Security Protocol

-- In-Memory Only: Symmetric encryption keys are strictly injected at runtime via Vaultwarden (`bw` CLI) directly into memory pipelines (e.g., `<(bw get notes ...)`).
+### Zero-Disk Key Policy
+- Never write, suggest, or generate scripts that save `.key` files to disk.
+- Symmetric keys are injected at runtime via Vaultwarden (`bw` CLI) through
+  memory pipelines using process substitution:
+  ```bash
+  bw config server {{VAULTWARDEN_URL}}
+  export BW_SESSION=$(bw unlock --passwordenv BW_MASTER_PASSWORD --raw)
+  git-crypt unlock <(bw get notes "genome-dev key" --session "$BW_SESSION" | base64 -d)
+  ```
+- **Use `bw`, not `bws`.** `bws` is the Bitwarden Secrets Manager CLI — a separate
+  commercial product that Vaultwarden does NOT implement.

-- Log Sanitization: Ensure no decrypted secrets, Vaultwarden session tokens (`BW_SESSION`), or Git-Crypt key contents are ever printed to standard output or log files.
+### Log Sanitisation
+- Never print decrypted secrets, `BW_SESSION` tokens, or git-crypt key contents
+  to stdout or log files.
+- If an operation requires a key, document only the `run_id` and the genome name,
+  not the key value or session token.

-## 3. Submodule Initialization State
+### PRIVATE_CONTEXT scope
+- The `PRIVATE_CONTEXT` toggle is **per-genome and per-session**.
+  Enabling it for `genome-finance` does NOT enable it for `genome-dev`.
+- Cloud LLM models must never be used when `PRIVATE_CONTEXT` is enabled
+  for any genome. Private data must not leave the local network.

-To synchronize the workspace, the operational command is strictly:
+---
+
+## 3. Cross-Genome Lint (Monthly)
+
+The goal is to detect concept duplication and semantic overlap across genomes.
+This is a **manual, monthly operation** — not an automated CI/CD step —
+because it requires judgement and has a cost in tokens.
+
+**Procedure:**
+1. Collect the `wiki/index.md` from every active genome.
+2. Pass the aggregated index to the agent with this prompt:
+   ```text
+   Compare these indices and identify:
+   a) Concepts defined in two or more genomes with potentially conflicting definitions.
+   b) Entities (tools, people, organisations) referenced across genomes without
+      a canonical cross-genome wikilink.
+   c) Concepts in genome-X that should link to genome-Y but don't.
+   Report findings. Do not modify any files.
+   ```
+3. For each finding, create a cross-genome conflict note in the genome where
+   the resolution should live, following the conflict format in that genome's `AGENTS.md`.
+4. Log the lint pass in the master `AGENTS.md` update history (below).
+
+---
+
+## 4. Submodule Operations

 ```bash
+# Update all genomes to their latest main commit
+git submodule update --remote
+
+# Initialise all submodules after a fresh clone
 git submodule update --init --recursive
+
+# Record updated submodule pointers
+git add .
+git commit -m "chore: update submodule pointers"
+git push
 ```
+
+---
+
+## 5. Adding a New Genome
+
+```bash
+# 1. Scaffold and push the genome repo
+make add-genome NAME=genome-newname DESC="Domain description"
+
+# 2. Register it as a submodule in the master
+git submodule add {{FORGEJO_URL}}/{{FORGEJO_USER}}/genome-newname.git genome-newname
+git add .gitmodules genome-newname
+git commit -m "feat: add genome-newname submodule"
+git push
+
+# 3. Update this file's architecture diagram in Section 1
+```
+
+---
+
+## 6. Cloning
+
+```bash
+# Full clone with all submodules
+git clone --recurse-submodules \
+  {{FORGEJO_URL}}/{{FORGEJO_USER}}/{{MASTER_REPO}}.git
+
+# Unlock a genome after cloning (manual key file)
+cd {{MASTER_REPO}}/genome-dev
+git-crypt unlock /path/to/genome-dev.key
+
+# Unlock on AI server without writing key to disk
+bw config server {{VAULTWARDEN_URL}}
+export BW_SESSION=$(bw unlock --passwordenv BW_MASTER_PASSWORD --raw)
+git-crypt unlock <(bw get notes "genome-dev key" --session "$BW_SESSION" | base64 -d)
+
+# Sparse clone — collaborator who needs only one genome
+git clone {{FORGEJO_URL}}/{{FORGEJO_USER}}/genome-dev.git
+```
+
+---
+
+## 7. Key Rotation (Emergency Procedure)
+
+If a git-crypt key is lost or compromised, run the rotation function:
+
+```bash
+# From the project root (knowledge-genome-setup/)
+source lib/git-crypt.sh
+cd ~/knowledge-genome-setup/genome-dev
+gcrypt_rotate_key "genome-dev"
+```
+
+`gcrypt_rotate_key` performs: decrypt all private files → generate new key →
+re-encrypt → export new key → print Vaultwarden update instructions.
+
+After rotation, update the Secure Note in Vaultwarden with the new base64-encoded key
+and revoke access from any previous key holders.
+
+---
+
+## 8. Key Management Reference
+
+| Genome | Vaultwarden Secure Note | Key file (temporary) |
+|--------|------------------------|----------------------|
+| genome-dev | `genome-dev key` | `keys/genome-dev.key` |
+| genome-finance | `genome-finance key` | `keys/genome-finance.key` |
+| genome-homelab | `genome-homelab key` | `keys/genome-homelab.key` |
+
+Key files in `keys/` are temporary exports only. Delete them after uploading to Vaultwarden.
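
Note on the Knowledge Decay rules (Section 7 of `templates/agents-genome.md`): the two thresholds could be checked mechanically during a lint pass. What follows is a minimal Python sketch under stated assumptions and is not part of the patch. The helper names (`parse_frontmatter`, `add_months`, `is_stale`) are hypothetical, the frontmatter parser only handles flat `key: value` lines, and reading "6 months" and "3 months" as calendar months with an inclusive boundary is an assumption the template does not spell out.

```python
import calendar
import re
from datetime import date

# Thresholds taken from Section 7, interpreted as calendar months (assumption).
STALE_AFTER_MONTHS = {"stable": 6, "draft": 3}


def parse_frontmatter(text):
    """Return flat `key: value` pairs from a leading YAML block (simplified parser)."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def add_months(day, months):
    """Shift a date forward by whole months, clamping to the last day of the month."""
    index = day.month - 1 + months
    year, month = day.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def is_stale(text, today):
    """True when the page's maturity has a threshold and that threshold has elapsed."""
    fields = parse_frontmatter(text)
    months = STALE_AFTER_MONTHS.get(fields.get("maturity"))
    if months is None:
        return False  # e.g. `deprecated`, or no frontmatter at all
    try:
        last_updated = date.fromisoformat(fields.get("last_updated", ""))
    except ValueError:
        return False  # unfilled placeholder such as YYYY-MM-DD
    return add_months(last_updated, months) <= today
```

Under these assumptions, a lint pass could call `is_stale(page_text, date.today())` for each wiki page and list the pages that return `True` as candidates for the STALE callout. The sketch only reports; it changes neither `maturity` nor the page itself.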