Compare commits

5 commits

20 changed files with 875 additions and 153 deletions

@ -49,6 +49,7 @@ evolving synthesis. Knowledge is compiled once and kept current.
Contradictions have been flagged. The synthesis already reflects everything ingested.
This means:
- No vector database.
- No embedding pipeline.
- No external retrieval infrastructure.
@ -104,7 +105,7 @@ genome-{name}/
### Three layers
| Layer | Path | Owner | Rule |
| ----------- | ----------- | ----------- | ----------------------------------------------------- |
| Raw sources | `raw/` | Human | Immutable. LLM reads only. Never modified. |
| Wiki | `wiki/` | LLM | Agent creates, updates, cross-links, maintains. |
| Schema | `AGENTS.md` | Human + LLM | Co-evolved contract defining structure and workflows. |
@ -154,6 +155,7 @@ All tools (git-crypt, bw, qmd) have native Linux binaries.
### macOS — full support
All scripts are compatible with macOS. Requirements:
- bash 3.2+ (macOS default) — fully supported. All `bash 4+` constructs removed.
- GNU coreutils not required — BSD variants of `date`, `grep`, `sed` all handled.
- `git-crypt`: install via Homebrew — `brew install git-crypt`
@ -166,6 +168,7 @@ If you use Homebrew bash (`brew install bash`), the scripts work identically to
**Git Bash and native Windows are not supported.**
Reasons:
- `git-crypt` has no native Windows binary.
- Process substitution `<(...)` used for runtime key injection is not available
in Git Bash or PowerShell.
@ -180,7 +183,7 @@ All setup and runtime operations work identically to native Linux inside WSL2.
The system is designed for a homelab architecture:
| Component | Recommended | Role |
| --------------- | ------------------------- | --------------------------------------------------------------- |
| Storage node | Any Linux server with NFS | Hosts Forgejo, stores genome repos |
| AI compute node | GPU server (16GB+ VRAM) | Runs local LLM agent sessions |
| VRAM | 16GB minimum | 14B model at Q5_K_M ≈ 10GB weights; ~6GB for KV cache |
@ -199,7 +202,7 @@ The system is designed for a homelab architecture:
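The VRAM row is back-of-envelope arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. The sketch below does this in shell integer arithmetic. It assumes Q5_K_M averages about 5.7 bits per weight (an approximation; the real figure varies by model) and counts 16GB as 16,000 MB. `weights_mb` is a hypothetical helper, not part of the framework.

```bash
#!/usr/bin/env bash
# Rough VRAM budget for a quantised model. Illustrative arithmetic only.
# ASSUMPTION: Q5_K_M averages ~5.7 bits per weight (passed as 57 tenths of a bit).
set -euo pipefail

# weights_mb <params_in_billions> <bits_per_weight_in_tenths>
# bytes = params * bpw / 8; in MB (10^6 bytes) that is params_b * 1000 * bpw_tenths / 80
weights_mb() {
  local params_b="$1" bpw_tenths="$2"
  echo $(( params_b * 1000 * bpw_tenths / 80 ))
}

vram_mb=16000                  # 16GB card, decimal units
w_mb="$(weights_mb 14 57)"     # 14B model at ~5.7 bpw
echo "weights: ${w_mb} MB, left for KV cache: $(( vram_mb - w_mb )) MB"
# weights: 9975 MB, left for KV cache: 6025 MB
```

This reproduces the table's split: about 10GB for weights and about 6GB of headroom for the KV cache.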
### Required
| Tool | Purpose |
| ----------- | -------------------------------- |
| `git` | Version control |
| `git-crypt` | Transparent file encryption |
| `curl` | REST API calls to Forgejo/GitHub |
@ -208,7 +211,7 @@ The system is designed for a homelab architecture:
### Optional
| Tool | Purpose |
| ----- | ----------------------------------------------------------------------- |
| `bw` | Bitwarden CLI — runtime key injection from Vaultwarden (no key on disk) |
| `qmd` | Local BM25 + vector search for Markdown files with MCP server interface |
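A quick `command -v` pass shows which of these tools are on `PATH`. The sketch below is minimal. `missing_tools` is hypothetical, and treating `jq` as required is an assumption based on the scripts that call it and the Debian install line under Troubleshooting.

```bash
#!/usr/bin/env bash
set -euo pipefail

# missing_tools <tool>...  prints each tool not found on PATH, one per line
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || echo "$t"
  done
}

required="$(missing_tools git git-crypt curl jq)"
optional="$(missing_tools bw qmd)"
# Report only; a real setup script would probably exit 1 when required tools are missing.
[ -z "$required" ] || { echo "missing required tools:"; echo "$required"; }
[ -z "$optional" ] || { echo "optional tools not found:"; echo "$optional"; }
```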
@ -347,6 +350,7 @@ make setup
- Commits submodule pointer in master repo
After setup completes:
- Upload all files in `keys/` to Vaultwarden (see Key Management)
- Delete key files from disk: `rm keys/*.key`
@ -355,7 +359,7 @@ After setup completes:
## Makefile Reference
| Target | Description |
| --------------------------------- | ------------------------------------------------------------------------------ |
| `make setup` | Full system initialisation — master repo + all genomes in `registry.sh` |
| `make add-genome NAME=x DESC="y"` | Scaffold and register a single new genome |
| `make lint` | Run quality checks across all genomes (schema, privacy, decay, page size) |
@ -407,6 +411,7 @@ After adding: upload the new key to Vaultwarden and delete the key file.
### Removing a genome
Manual process:
```bash
# In master repo
git submodule deinit genome-name
@ -422,7 +427,7 @@ When a genome is scaffolded, `render_template` replaces these placeholders in al
template files:
| Placeholder | Source | Example |
| ----------------------- | ----------- | ------------------------------ |
| `{{GENOME_NAME}}` | registry.sh | `genome-dev` |
| `{{GENOME_NAME_UPPER}}` | derived | `GENOME-DEV` |
| `{{GENOME_DESC}}` | registry.sh | `Web development...` |
@ -442,7 +447,7 @@ Each genome uses a unique symmetric AES-256-CTR key managed by git-crypt.
Two directories in every genome are always encrypted:
| Directory | Contents | On remote |
| --------------- | --------------------------- | ------------------ |
| `raw/private/` | Sensitive source material | Opaque binary blob |
| `wiki/private/` | Private synthesis and notes | Opaque binary blob |
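You can check whether a path falls under the encryption rule with `git check-attr`, before anything is staged. The sketch below runs in a throwaway repository. It assumes the `.gitattributes` rule quoted under Troubleshooting (`**/private/** filter=git-crypt diff=git-crypt -text`). Only `git` is needed; git-crypt does not have to be installed for this check.

```bash
#!/usr/bin/env bash
set -euo pipefail

repo="$(mktemp -d)"
git -C "$repo" init -q
# Assumed to match the genome template: everything under any private/ directory is encrypted.
printf '%s\n' '**/private/** filter=git-crypt diff=git-crypt -text' > "$repo/.gitattributes"

# Prints "<path>: filter: <value>"; the path does not need to exist.
check_filter() { git -C "$repo" check-attr filter -- "$1"; }

check_filter raw/private/bank-statement.md   # raw/private/bank-statement.md: filter: git-crypt
check_filter wiki/private/notes.md           # wiki/private/notes.md: filter: git-crypt
check_filter raw/articles/foo.md             # raw/articles/foo.md: filter: unspecified
```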
@ -502,6 +507,7 @@ PRIVATE_CONTEXT: enabled ← Agent may read/write private/. Requires git-cryp
```
Rules:
- Never inferred. Never carried over from a previous session.
- `enabled` requires the operator to confirm that `git-crypt unlock` has run on the host.
- Per-genome, per-session: enabling for `genome-finance` does NOT enable for `genome-dev`.
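The "never inferred" rule amounts to a default: anything short of an explicit declaration in the opening prompt means `disabled`. The sketch below shows how an orchestrator might enforce that. `private_context_state` is hypothetical. Matching on a line that begins `PRIVATE_CONTEXT:` is an assumption based on the declaration format shown above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# private_context_state: reads the opening prompt on stdin and prints enabled|disabled.
# Only an explicit "PRIVATE_CONTEXT: enabled|disabled" line counts; no declaration => disabled.
private_context_state() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "PRIVATE_CONTEXT: enabled" | "PRIVATE_CONTEXT: enabled "*)   echo enabled;  return 0 ;;
      "PRIVATE_CONTEXT: disabled" | "PRIVATE_CONTEXT: disabled "*) echo disabled; return 0 ;;
    esac
  done
  echo disabled
}

printf 'Ingest raw/articles/foo.md\nPRIVATE_CONTEXT: enabled\n' | private_context_state   # enabled
printf 'You may read private notes today.\n' | private_context_state                    # disabled
```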
@ -530,6 +536,7 @@ The key flows: Vaultwarden → `bw get notes` → `base64 -d` → kernel pipe
At no point is the key written to any file on disk.
Lock a genome when the session ends:
```bash
git-crypt lock
```
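The key flow above relies on bash process substitution. `<(...)` hands the consumer a `/dev/fd/N` path backed by a pipe, so the decoded key never exists as a regular file. The sketch below demonstrates the mechanism with a stand-in for `bw`. The real invocation in the comment is assumed from the flow above, using the note name from the Vaultwarden table.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed real-world form (needs an unlocked bw session and git-crypt):
#   git-crypt unlock <(bw get notes "genome-dev key" | base64 -d)

# Stand-in for `bw get notes`: returns a base64-encoded fake key.
fake_bw_notes() { printf '%s' 'c2VjcmV0LWtleS1ieXRlcw=='; }

# Stand-in for the consumer: reports which path it was given, then reads it.
read_key_file() {
  echo "key path: $1"
  cat "$1"
  echo
}

read_key_file <(fake_bw_notes | base64 -d)
# key path: /dev/fd/<n>
# secret-key-bytes
```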
@ -545,7 +552,7 @@ git-crypt lock
Each genome key is stored as a base64-encoded Secure Note in Vaultwarden:
| Genome | Vaultwarden Note Name |
| ---------------- | --------------------- |
| `genome-dev` | `genome-dev key` |
| `genome-finance` | `genome-finance key` |
| `genome-homelab` | `genome-homelab key` |
@ -593,6 +600,7 @@ gcrypt_rotate_key "genome-dev"
```
`gcrypt_rotate_key` performs:
1. Unlocks repo with existing key
2. Removes old key material
3. Generates new symmetric key via `git-crypt init`
@ -603,13 +611,16 @@ gcrypt_rotate_key "genome-dev"
> **Limitation:** git history still contains blobs encrypted with the old key.
> Anyone with the old key and git history access can decrypt them. To purge old
> encrypted blobs from history:
>
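> ```bash
> # Use a fresh clone (filter-repo refuses to rewrite other clones without --force).
> # Removes both dirs from EVERY commit, HEAD included: back up current private files first.
> git filter-repo --invert-paths --path raw/private --path wiki/private
> git remote add origin <forgejo-repo-url>  # filter-repo removes origin after rewriting
> git push --force origin main
> ```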
>
> This rewrites all commit hashes — coordinate with any collaborators first.
After rotation:
- Upload new key to Vaultwarden (replace existing note)
- Delete both `keys/genome-dev.key` and `keys/genome-dev-rotated-*.key` from disk
- Revoke access from previous key holders
@ -621,6 +632,7 @@ After rotation:
### Prerequisites for every session
Before starting an LLM agent session on a genome:
1. The host (AI server) runs `git-crypt unlock` for the required genomes
2. The orchestrator prepares context: `tail -n 20 wiki/log.md`
3. Declare `PRIVATE_CONTEXT` state explicitly in the opening prompt
@ -651,6 +663,7 @@ sequentially — not one session with 5 files.
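The prerequisites can be folded into one helper that refuses to build a prompt without an explicit state. The sketch below is minimal. `build_opening_prompt` is hypothetical and only prepares text; it does not run `git-crypt unlock` or start an agent.

```bash
#!/usr/bin/env bash
set -euo pipefail

# build_opening_prompt <genome_root> <enabled|disabled>
build_opening_prompt() {
  local genome_root="$1" state="$2"
  case "$state" in
    enabled | disabled) ;;
    *) echo "PRIVATE_CONTEXT must be 'enabled' or 'disabled', got '${state}'" >&2; return 1 ;;
  esac
  printf 'PRIVATE_CONTEXT: %s\n\n' "$state"
  printf 'Recent log entries:\n'
  tail -n 20 "${genome_root}/wiki/log.md"   # last 20 lines only, never the full log
}

demo="$(mktemp -d)"
mkdir -p "${demo}/wiki"
seq 1 25 > "${demo}/wiki/log.md"
build_opening_prompt "$demo" disabled   # header, blank line, "Recent log entries:", lines 6..25
```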
### n8n automation
For Forgejo webhook → automated ingest:
1. Forgejo sends webhook on push to `raw/`
2. n8n receives webhook, identifies new files
3. n8n starts one agent session per new file (sequential, not parallel)
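These steps reduce to filtering the pushed paths and looping without parallelism. The sketch below is minimal and makes two assumptions. First, the webhook handler can produce the list of added paths, for example with `git diff --name-only --diff-filter=A <before> <after> -- raw/` using the push payload's before/after commits. Second, automated runs skip `private/` sources. `run_agent_session` is a hypothetical stand-in for the real agent launcher.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Keep only non-private files under raw/ (one path per line on stdin).
new_raw_sources() {
  { grep -E '^raw/' || true; } | { grep -v '/private/' || true; }
}

# Stand-in for the real launcher: one fresh agent session per file.
run_agent_session() { echo "session: $1"; }

# Strictly sequential: each session starts only after the previous one returns.
ingest_sequentially() {
  local f
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    run_agent_session "$f" < /dev/null   # keep the session from consuming the path list
  done
}

printf '%s\n' raw/articles/a.md wiki/index.md raw/private/p.md raw/transcripts/t.md \
  | new_raw_sources | ingest_sequentially
# session: raw/articles/a.md
# session: raw/transcripts/t.md
```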
@ -677,6 +690,7 @@ Triggered by a new file in `raw/` (manual or via webhook).
9. Commit on `feat/ai-ingest-<slug>`; open PR using `templates/pr-description.md`
For private sources (`PRIVATE_CONTEXT: enabled` required):
- All output goes to `wiki/private/<slug>.md` only
- PR title: `[PRIVATE] ingest: <slug>`
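The branch and PR title follow mechanical naming rules. In the sketch below, `ingest_pr_meta` is hypothetical. The public title follows the `open-pr.sh` usage example (`feat: ingest <slug>`), and reusing the same branch pattern for private sources is an assumption.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ingest_pr_meta <slug> [private: true|false]  prints branch= and title= lines
ingest_pr_meta() {
  local slug="$1" private="${2:-false}"
  echo "branch=feat/ai-ingest-${slug}"   # same pattern for private sources (assumed)
  if [ "$private" = true ]; then
    echo "title=[PRIVATE] ingest: ${slug}"
  else
    echo "title=feat: ingest ${slug}"
  fi
}

ingest_pr_meta bm25-vs-vectors
# branch=feat/ai-ingest-bm25-vs-vectors
# title=feat: ingest bm25-vs-vectors
```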
@ -697,11 +711,13 @@ For general orientation without a specific query: read `wiki/index.md` directly.
The lint workflow is split between deterministic bash checks and semantic LLM judgment.
**Step 1 — operator runs bash linter:**
```bash
make lint
```
The bash linter checks automatically:
- YAML frontmatter validity (all mandatory fields present)
- Domain consistency (domain field matches genome name)
- Type validity (value from allowed list)
@ -713,6 +729,7 @@ The bash linter checks automatically:
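The first three bash checks can be mirrored in a few lines. The sketch below is not the framework's linter, and it makes three assumptions:

- the mandatory fields are the ones the ingest skill writes (`type`, `domain`, `maturity`, `last_updated`, `private`);
- the allowed types come from the field table;
- only the first `---` block counts as frontmatter.

`frontmatter` and `lint_frontmatter` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

REQUIRED_FIELDS="type domain maturity last_updated private"   # assumed mandatory set
ALLOWED_TYPES="source entity concept query conflict private index log"

# Print the lines between the opening '---' on line 1 and the next '---'.
frontmatter() {
  awk 'NR == 1 && $0 != "---" { exit } NR > 1 && $0 == "---" { exit } NR > 1 { print }' "$1"
}

# lint_frontmatter <file> <genome>  prints ERROR lines; returns the number of errors
lint_frontmatter() {
  local file="$1" genome="$2" fm field value errors=0
  fm=$'\n'"$(frontmatter "$file")"
  for field in $REQUIRED_FIELDS; do
    case "$fm" in
      *$'\n'"${field}:"*) ;;
      *) echo "ERROR ${file}: missing ${field}"; errors=$((errors + 1)) ;;
    esac
  done
  value="$(printf '%s\n' "$fm" | sed -n 's/^domain:[[:space:]]*//p')"
  if [ "$value" != "$genome" ]; then
    echo "ERROR ${file}: domain '${value}' does not match ${genome}"; errors=$((errors + 1))
  fi
  value="$(printf '%s\n' "$fm" | sed -n 's/^type:[[:space:]]*//p')"
  case " ${ALLOWED_TYPES} " in
    *" ${value} "*) ;;
    *) echo "ERROR ${file}: invalid type '${value}'"; errors=$((errors + 1)) ;;
  esac
  return "$errors"
}

page="$(mktemp)"
printf -- '---\ntype: source\ndomain: genome-dev\nmaturity: draft\nlast_updated: 2024-05-01\nprivate: false\n---\n# Foo\n' > "$page"
lint_frontmatter "$page" genome-dev && echo "clean"   # clean
```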
**Step 2 — operator provides bash output to LLM agent:**
The agent applies semantic judgment to findings the bash linter cannot make:
- **Orphan pages** (from bash list): for each orphan, identify 1-3 existing pages
that should link to it; propose specific additions
- **Implicit concepts** (from bash term frequency list): determine if a candidate
@ -735,21 +752,27 @@ The PR description uses `templates/pr-description.md`:
```markdown
## Summary
One sentence: goal of this session and source processed.
## Pages Created
| Path | Type | Maturity |
## Pages Modified
| Path | Change |
## Contradictions Found
[ ] None / [ ] n conflict file(s) created
## Private Data Accessed
[ ] No (PRIVATE_CONTEXT: disabled) / [ ] Yes
## Scoped Lint (post-ingest)
[ ] Frontmatter valid [ ] No broken links [ ] No issues found
```
@ -777,7 +800,7 @@ The operator resolves the conflict, updates relevant pages, closes the PR.
Pages have a `last_updated` field in frontmatter. During lint passes:
| Maturity | Threshold | Action |
| -------- | --------- | -------------------------------------- |
| `stable` | 180 days | Flag as stale — add `⚠️ STALE` callout |
| `draft` | 90 days | Flag as stale — add `⚠️ STALE` callout |
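Staleness is a date difference compared against the table's thresholds. In the sketch below, the helpers are hypothetical, and "older than" is read as strictly greater than the threshold (an assumption). Date parsing tries GNU `date -d` first and falls back to BSD `date -j -f`, so it works on both Linux and macOS.

```bash
#!/usr/bin/env bash
set -euo pipefail

# to_epoch YYYY-MM-DD  -> seconds at 00:00 UTC (GNU date first, then BSD/macOS date)
to_epoch() {
  date -u -d "$1" +%s 2>/dev/null \
    || date -u -j -f '%Y-%m-%d %H:%M:%S' "$1 00:00:00" +%s
}

# days_between <from> <to>
days_between() {
  echo $(( ( $(to_epoch "$2") - $(to_epoch "$1") ) / 86400 ))
}

# stale_action <maturity> <age_days>  -> "stale" or "ok"
stale_action() {
  local threshold
  case "$1" in
    stable) threshold=180 ;;
    draft)  threshold=90 ;;
    *) echo ok; return 0 ;;   # maturities not listed in the table are left alone
  esac
  if [ "$2" -gt "$threshold" ]; then echo stale; else echo ok; fi
}

age="$(days_between 2024-01-01 2024-03-31)"   # 90 (2024 is a leap year)
stale_action draft "$age"                      # ok
stale_action draft $(( age + 1 ))              # stale
```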
@ -817,7 +840,7 @@ private: true | false
```
| Field | Rules |
| ---------------------- | ------------------------------------------------------------------------ |
| `type` | Must be one of: `source entity concept query conflict private index log` |
| `maturity: draft` | Single source or unvalidated |
| `maturity: stable` | Confirmed by 2+ independent sources |
@ -830,7 +853,7 @@ Do not use semantic versioning for content. Git history tracks every change.
### Page types and directories
| Type | Directory | Description |
| ---------- | ---------------------------- | -------------------------------------------- |
| `source` | `wiki/sources/` | One page per processed raw source |
| `entity` | `wiki/entities/` | People, tools, organisations, projects |
| `concept` | `wiki/concepts/` | Patterns, theories, architectural decisions |
@ -843,7 +866,7 @@ Do not use semantic versioning for content. Git history tracks every change.
### Page size limits
| Limit | Lines | Action |
| -------- | ----- | ----------------------------------- |
| Soft cap | 400 | Bash linter warns |
| Hard cap | 800 | Bash linter errors — split the page |
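The size check is a line count against two caps. The sketch below reads each cap as "more than N lines" (an assumption) and uses a hypothetical return code of 2 for the hard-cap error.

```bash
#!/usr/bin/env bash
set -euo pipefail

SOFT_CAP=400
HARD_CAP=800

# check_page_size <file>  prints WARN/ERROR; returns 2 above the hard cap, else 0
check_page_size() {
  local file="$1" lines
  lines="$(wc -l < "$file" | tr -d ' ')"   # BSD wc pads the count with spaces
  if [ "$lines" -gt "$HARD_CAP" ]; then
    echo "ERROR ${file}: ${lines} lines (hard cap ${HARD_CAP}), split the page"
    return 2
  fi
  if [ "$lines" -gt "$SOFT_CAP" ]; then
    echo "WARN ${file}: ${lines} lines (soft cap ${SOFT_CAP})"
  fi
  return 0
}

page="$(mktemp)"
seq 1 500 > "$page"
check_page_size "$page"   # WARN <path>: 500 lines (soft cap 400)
```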
@ -853,7 +876,7 @@ and keep the wiki atomically navigable.
### Linking conventions
| Type | Format |
| ---------------------- | ------------------------------------------- |
| Internal (same genome) | `[[folder/slug]]` — Obsidian wikilinks only |
| Cross-genome | `[[../genome-target/wiki/folder/slug]]` |
| External | `[text](https://url)` — standard Markdown |
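Internal links are bare `[[folder/slug]]` wikilinks, so a broken-link check only has to map each one to `wiki/<folder>/<slug>.md`. In the sketch below, `broken_links` is hypothetical. It skips cross-genome (`../`) links and strips an Obsidian `|alias` if present.

```bash
#!/usr/bin/env bash
set -euo pipefail

# broken_links <wiki_dir> <page>  prints one line per internal link with no target file
broken_links() {
  local wiki_dir="$1" page="$2" target
  { grep -o '\[\[[^]]*]]' "$page" || true; } \
    | sed -e 's/^\[\[//' -e 's/]]$//' -e 's/|.*//' \
    | while IFS= read -r target; do
        case "$target" in ../*) continue ;; esac   # cross-genome links are not checked here
        [ -f "${wiki_dir}/${target}.md" ] || echo "broken: [[${target}]] in ${page}"
      done
}

wiki="$(mktemp -d)"
mkdir -p "${wiki}/sources" "${wiki}/concepts"
touch "${wiki}/sources/foo.md"
printf 'See [[sources/foo]], [[concepts/missing|Missing]] and [[../genome-x/wiki/a/b]].\n' \
  > "${wiki}/concepts/page.md"
broken_links "$wiki" "${wiki}/concepts/page.md"   # broken: [[concepts/missing]] in <wiki>/concepts/page.md
```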
@ -878,6 +901,7 @@ Every operation appends one entry to `wiki/log.md`:
Valid TYPEs: `INGEST` `LINT` `QUERY` `CONFLICT` `CONFIG` `SECURITY`
Parse examples:
```bash
grep "^## \[" wiki/log.md | tail -5 # Last 5 entries
grep "^## \[" wiki/log.md | grep "CONFLICT" # All conflicts
@ -892,7 +916,7 @@ The LLM never loads the full log.
## Collaboration Model
| Role | Key access | Permitted operations |
| -------------- | ----------------- | ----------------------------------------------------------------------------- |
| Owner | Full — key holder | Read/write everywhere |
| Collaborator | None | Push to `raw/articles/`, `raw/transcripts/`, `raw/code-packs/`, `raw/assets/` |
| Local AI agent | Conditional | `private/` only when `PRIVATE_CONTEXT: enabled` |
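The collaborator row is effectively an allowlist of four `raw/` buckets. The sketch below shows how a push check might enforce it. `collaborator_path_allowed` and `check_collaborator_push` are hypothetical, and wiring them into a Forgejo hook is left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Succeeds only for the raw/ buckets a collaborator may push to.
collaborator_path_allowed() {
  case "$1" in
    raw/articles/* | raw/transcripts/* | raw/code-packs/* | raw/assets/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads changed paths on stdin (e.g. from `git diff --name-only`) and reports violations.
check_collaborator_push() {
  local p rc=0
  while IFS= read -r p; do
    [ -n "$p" ] || continue
    collaborator_path_allowed "$p" || { echo "denied: ${p}"; rc=1; }
  done
  return "$rc"
}

printf '%s\n' raw/articles/a.md raw/private/x.md wiki/sources/a.md | check_collaborator_push || true
# denied: raw/private/x.md
# denied: wiki/sources/a.md
```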
@ -930,6 +954,7 @@ qmd serve --port 3333
Obsidian is the recommended wiki browser. Open any genome directory as an Obsidian vault.
Recommended setup:
- **Graph view** — visualise page connections; spot orphans and hubs instantly
- **Obsidian Web Clipper** — browser extension to clip articles directly to `raw/articles/`
as Markdown
@ -991,6 +1016,7 @@ sudo apt install git git-crypt curl jq
The staged file is in a path matching `**/private/**` but is not encrypted.
Fix options:
1. Verify `.gitattributes` contains `**/private/** filter=git-crypt diff=git-crypt -text`
2. Run `git-crypt init` if git-crypt is not initialised in this repo
3. Run `git-crypt status` to check the encryption state of all files
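git-crypt output starts with a fixed 10-byte header (`\0GITCRYPT\0`), so you can test directly whether the staged blob is encrypted. The sketch below assumes that header. `is_git_crypt_blob` is hypothetical and reads the index copy of a file, not the plaintext working-tree file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# is_git_crypt_blob: succeeds if stdin starts with git-crypt's header "\0GITCRYPT\0".
is_git_crypt_blob() {
  local header
  header="$(head -c 10 | od -An -c | tr -d ' \n')"
  [ "$header" = '\0GITCRYPT\0' ]
}

# Inside a repo, check the staged copy (process substitution avoids SIGPIPE noise
# from git under pipefail):
#   is_git_crypt_blob < <(git cat-file blob ":raw/private/x.md") || echo "NOT encrypted"

printf '\000GITCRYPT\000nonce-and-ciphertext' | is_git_crypt_blob && echo "encrypted"   # encrypted
printf 'plain markdown' | is_git_crypt_blob || echo "not encrypted"                     # not encrypted
```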
@ -1011,6 +1037,7 @@ git commit -m "fix: re-stage private files for encryption"
### Agent returns stale or missing cross-references
Likely causes:
1. Session was too long — KV cache degraded. Use one source per session.
2. `wiki/index.md` was not read at session start — agent lacked the page catalog.
3. qmd index is stale — re-index: `qmd index <genome>/wiki/`

@ -4,6 +4,9 @@
# Directory structure creation and template rendering engine.
# =============================================================================
# Canonical directory layout lives in one place (lib/structure.sh).
source "$(dirname "${BASH_SOURCE[0]}")/structure.sh"
render_template() {
local template_file="$1"
local output_file="$2"
@ -13,17 +16,21 @@ render_template() {
local content
content=$(<"$template_file")
# Defaults (:-) so master-repo templates render even when GENOME_* are unset
# (scaffold_master runs before any genome; set -u would otherwise abort here).
local genome_name_upper
genome_name_upper=$(tr '[:lower:]' '[:upper:]' <<< "${GENOME_NAME:-}")
# Placeholder replacement
content="${content//\{\{GENOME_NAME\}\}/${GENOME_NAME:-}}"
content="${content//\{\{GENOME_NAME_UPPER\}\}/${genome_name_upper}}"
content="${content//\{\{GENOME_DESC\}\}/${GENOME_DESC:-}}"
content="${content//\{\{FORGEJO_URL\}\}/${FORGEJO_URL:-}}"
content="${content//\{\{FORGEJO_USER\}\}/${FORGEJO_USER:-}}"
content="${content//\{\{VAULTWARDEN_URL\}\}/${VAULTWARDEN_URL:-}}"
content="${content//\{\{MASTER_REPO\}\}/${MASTER_REPO:-}}"
# linked project reference (optional) — empty registry field renders as 'none'
content="${content//\{\{LINKED_PROJECT\}\}/${GENOME_LINKED:-none}}"
content="${content//\{\{DATE\}\}/$(date +%Y-%m-%d)}"
mkdir -p "$(dirname "$output_file")"
@ -32,13 +39,9 @@ render_template() {
scaffold_genome() {
local base="$1"
info "Building directory structure in ${base}..."
for dir in "${GENOME_DIRS[@]}"; do
mkdir -p "${base}/${dir}"
touch "${base}/${dir}/.gitkeep"
done
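The `:-` defaults matter because the scripts run with `set -u`, under which expanding an unset variable aborts. The sketch below shows the difference using the same substitution pattern as `render_template`. `render_demo` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

unset GENOME_DESC GENOME_LINKED   # simulate scaffold_master: no genome loaded yet

render_demo() {
  local content='desc={{GENOME_DESC}} linked={{LINKED_PROJECT}}'
  content="${content//\{\{GENOME_DESC\}\}/${GENOME_DESC:-}}"           # unset -> ""
  content="${content//\{\{LINKED_PROJECT\}\}/${GENOME_LINKED:-none}}"  # unset or empty -> "none"
  echo "$content"
}

render_demo   # desc= linked=none

# Without a default, the same expansion aborts the (sub)shell under set -u:
if ( : "${GENOME_DESC}" ) 2>/dev/null; then echo "expanded"; else echo "aborted: unbound variable"; fi
```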

lib/structure.sh

@ -0,0 +1,70 @@
#!/usr/bin/env bash
# =============================================================================
# lib/structure.sh
# Single source of truth for the canonical genome directory layout, plus the
# verify/sync helpers used by scripts/verify-genomes.sh.
#
# IMPORTANT: this is the ONE place the structure is defined. scaffold.sh sources
# this file and builds new genomes from GENOME_DIRS, so scaffolding and the
# structure check can never drift apart.
# =============================================================================
# Canonical directories every genome must have.
# raw/* are input buckets (collaborator-writable); wiki/* is the agent-owned,
# contract-bound layout the lint, the index sections and the ingest skill depend on.
GENOME_DIRS=(
"raw/articles" "raw/transcripts" "raw/code-packs" "raw/assets" "raw/private"
"wiki/sources" "wiki/entities" "wiki/concepts" "wiki/queries" "wiki/private"
)
# ---------------------------------------------------------------------------
# structure_report <base>
# Reports drift of <base> against GENOME_DIRS.
# - missing canonical dir → counted as drift (returns non-zero)
# - extra dir under raw/ or wiki/ → warning only (does not fail)
# Returns the number of MISSING canonical directories.
# ---------------------------------------------------------------------------
structure_report() {
local base="$1"
local missing=0
for d in "${GENOME_DIRS[@]}"; do
if [[ ! -d "${base}/${d}" ]]; then
warn "missing: ${d}"
missing=$((missing + 1))
fi
done
# Extra directories (drift the other way) — informational only.
local canon=" ${GENOME_DIRS[*]} "
while IFS= read -r d; do
d="${d#"${base}/"}"
[[ "$canon" == *" ${d} "* ]] && continue
info "extra (not in canon): ${d}"
done < <(find "${base}/raw" "${base}/wiki" -mindepth 1 -type d 2>/dev/null)
return $missing
}
# ---------------------------------------------------------------------------
# structure_sync <base>
# Creates any MISSING canonical directories (idempotent). Never deletes —
# retiring a bucket is a deliberate, contract-aware change to GENOME_DIRS +
# the templates, not an automatic prune.
# ---------------------------------------------------------------------------
structure_sync() {
local base="$1"
local added=0
for d in "${GENOME_DIRS[@]}"; do
if [[ ! -d "${base}/${d}" ]]; then
mkdir -p "${base}/${d}"
touch "${base}/${d}/.gitkeep"
success "created: ${d}"
added=$((added + 1))
fi
done
[[ $added -eq 0 ]] && info "already in sync: ${base}"
return 0
}
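`structure_report` detects extra directories by padding the joined `GENOME_DIRS` with spaces and doing a substring match. The sketch below isolates that test, using a trimmed list and a hypothetical `in_canon` name. The match is exact only because no canonical path contains a space.

```bash
#!/usr/bin/env bash
set -euo pipefail

GENOME_DIRS=("raw/articles" "raw/private" "wiki/sources")   # trimmed for the demo
canon=" ${GENOME_DIRS[*]} "   # " raw/articles raw/private wiki/sources "

# Whole-entry match: the surrounding spaces stop "articles" or "raw/art" from matching.
in_canon() { [[ "$canon" == *" $1 "* ]]; }

for d in raw/articles raw/article articles wiki/sources/2024; do
  if in_canon "$d"; then echo "canonical: $d"; else echo "extra: $d"; fi
done
# canonical: raw/articles
# extra: raw/article
# extra: articles
# extra: wiki/sources/2024
```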

@ -19,9 +19,13 @@ LIB_DIR="${PROJECT_ROOT}/lib"
PROVIDERS_DIR="${PROJECT_ROOT}/providers"
# --- GENOME REGISTRY ---
# Format: "name|description|linked_repo"
# - linked_repo is OPTIONAL. Leave empty (trailing pipe) for knowledge-only genomes.
# - It is an opaque reference rendered verbatim into the genome's AGENTS.md
# (phase-2 project work is parked, so the framework does not act on it yet).
# - Example with a project: "genome-homelab|Keru infrastructure...|keru/homelab-infra"
GENOMES=(
"genome-dev|Web development, TUI, Angular, software architecture|"
"genome-finance|Personal finance, investments, market analysis|"
"genome-homelab|Keru infrastructure, network configs, architecture logs|"
)

@ -11,16 +11,18 @@ source "registry.sh"
GENOME_NAME="${1:-}"
GENOME_DESC="${2:-}"
GENOME_LINKED="${3:-}" # optional: linked project repo reference
if [[ -z "$GENOME_NAME" || -z "$GENOME_DESC" ]]; then
error "Missing arguments."
echo "Usage: $0 <genome-name> <description> [linked-repo]"
exit 1
fi
step "Adding New Genome: ${GENOME_NAME}"
# Build a 3-field registry entry (linked_repo may be empty)
GENOMES=("${GENOME_NAME}|${GENOME_DESC}|${GENOME_LINKED}")
source "scripts/setup-genomes.sh"

@ -19,8 +19,9 @@ source "providers/${PROVIDER}.sh"
step "Processing Genome Registry"
for entry in "${GENOMES[@]}"; do
# 3-field format: name|description|linked_repo (linked_repo optional → may be empty)
IFS='|' read -r GENOME_NAME GENOME_DESC GENOME_LINKED <<< "$entry"
export GENOME_NAME GENOME_DESC GENOME_LINKED
info "Processing: ${GENOME_NAME}..."
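The trailing pipe keeps every registry entry at three fields, but `read` copes with two-field entries too. The sketch below wraps the parse from `setup-genomes.sh` in a hypothetical `parse_entry` function. One caveat: the last variable receives the rest of the line, so a `|` inside a description would push part of the description into `GENOME_LINKED`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# parse_entry "<name>|<description>|<linked_repo>"  (linked_repo may be empty or absent)
parse_entry() {
  local name desc linked
  IFS='|' read -r name desc linked <<< "$1"
  echo "name=${name} desc=${desc} linked=${linked:-none}"
}

parse_entry "genome-dev|Web development, TUI, Angular, software architecture|"
# name=genome-dev desc=Web development, TUI, Angular, software architecture linked=none
parse_entry "genome-homelab|Keru infrastructure|keru/homelab-infra"
# name=genome-homelab desc=Keru infrastructure linked=keru/homelab-infra
parse_entry "genome-old|Two-field entry"
# name=genome-old desc=Two-field entry linked=none
```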

scripts/verify-genomes.sh

@ -0,0 +1,50 @@
#!/usr/bin/env bash
# =============================================================================
# scripts/verify-genomes.sh
# Check (default) or --sync the directory structure of every registered genome
# against the canonical layout in lib/structure.sh.
#
# bash scripts/verify-genomes.sh # report drift, non-zero exit on drift
# bash scripts/verify-genomes.sh --sync # create missing dirs everywhere (safe)
#
# No hardware/LLM involved — pure structure check. Run anywhere.
# =============================================================================
set -euo pipefail
source "lib/output.sh"
source "globals.env"
source "registry.sh"
source "lib/structure.sh"
MODE="verify"
[[ "${1:-}" == "--sync" ]] && MODE="sync"
step "Genome structure: ${MODE}"
TOTAL_MISSING=0
for entry in "${GENOMES[@]}"; do
IFS='|' read -r GENOME_NAME _ _ <<< "$entry" # 3-field registry; ignore desc + linked
genome_dir="${WORK_DIR}/${MASTER_REPO}/${GENOME_NAME}"
if [[ ! -d "$genome_dir" ]]; then
warn "not found locally, skipping: ${GENOME_NAME}"
continue
fi
info "Genome: ${GENOME_NAME}"
if [[ "$MODE" == "sync" ]]; then
structure_sync "$genome_dir"
else
structure_report "$genome_dir" && m=0 || m=$?
TOTAL_MISSING=$((TOTAL_MISSING + m))
fi
done
echo ""
if [[ "$MODE" == "sync" ]]; then
success "Structure sync complete."
elif [[ $TOTAL_MISSING -eq 0 ]]; then
success "Structure verified: all genomes match the canonical layout."
else
error "Structure drift: ${TOTAL_MISSING} missing directory(ies). Fix with: make sync-structure"
exit 1
fi
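`verify-genomes.sh` runs under `set -e`, yet `structure_report` deliberately returns non-zero. The `cmd && m=0 || m=$?` form keeps one drifting genome from killing the loop, because a failing command inside an `&&`/`||` list does not trigger `errexit`. The sketch below uses a hypothetical stand-in function.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for structure_report: "returns" a count of missing directories.
fake_report() { return "$1"; }

total=0
for missing in 0 2 1; do
  fake_report "$missing" && m=0 || m=$?   # non-zero status is captured, not fatal
  total=$((total + m))
done
echo "total missing: ${total}"   # total missing: 3

# A bare `fake_report 2` on its own line, by contrast, would exit the script.
```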

skills/ingest/SKILL.md

@ -0,0 +1,83 @@
---
name: ingest
description: Semantic pass of a single raw source into the current genome's wiki — read the source, write sources/entities/concepts, handle contradictions, then emit a manifest and STOP. Use when a new file lands in raw/. Does NOT do git, log, index, lint, or PRs (a post-processor handles those), and does NOT handle private sources or project repos.
license: see repository
compatibility: Runs inside one genome checkout (cwd = genome root). Tools needed — read, edit only. NO bash, NO git. The deterministic steps (index, log, scoped lint, PR) run AFTER you exit, via run-ingest.sh. PRIVATE_CONTEXT must be disabled.
allowed-tools: read edit
metadata:
  framework: knowledge-genome
  phase: "1-ingest-semantic"
---
# Ingest — semantic pass
You run inside ONE genome checkout. `AGENTS.md` (already in your context) is the
authoritative contract. Your job is the **semantic pass only**: read the source, write
the wiki pages, handle contradictions. You do **not** touch git, the log, the index, the
linter, or PRs — a post-processor (`run-ingest.sh`) does all of that _after you stop_,
from the manifest you leave behind. This keeps your context clean and your turns few,
which matters on a small local model.
**Argument:** the relative path of the single raw source to ingest
(e.g. `raw/articles/foo.md`). Process only this one.
## Pre-flight — stop the session if any check fails
1. Refuse if the argument path is under any `private/` directory.
2. Refuse if `PRIVATE_CONTEXT` is not `disabled`.
3. Confirm the file exists under `raw/`.
## Semantic work (your only job)
1. Read the source once.
2. Write `wiki/sources/<kebab-slug>.md` — faithful summary + key points, with the required
frontmatter (`type: source`, `domain: <genome>`, `maturity: draft`,
`last_updated: <today>`, `private: false`, sensible `tags`).
3. For each entity (person, tool, org) → create or update `wiki/entities/<kebab-name>.md`.
4. For each concept (pattern, theory, decision) → create or update
`wiki/concepts/<kebab-name>.md`.
5. On a real contradiction with an existing claim, follow `AGENTS.md` §Conflict: create
`wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md`. Never overwrite the existing page.
Name files in kebab-case and pick stable names. Read `wiki/index.md` (and the specific
pages it points to) to decide create-vs-update and to spot contradictions. Do not scan
whole directories.
## Finish: write the manifest, then STOP
As your **final action**, write `.ingest-manifest.json` at the genome root
(NOT under `wiki/`) describing exactly what you did. Then stop — do not commit, lint,
append to the log/index, or open anything.
```json
{
"raw_source": "raw/articles/foo.md",
"model": "<the model you are running as>",
"reasoning": "One sentence for the log: what changed and why.",
"pr_summary": "One or two sentences describing this ingest for the PR.",
"contradictions": "None (or: 1 conflict file created — <concept>)",
"pages": [
{
"path": "wiki/sources/foo.md",
"summary": "One-line index summary.",
"maturity": "draft",
"status": "created"
},
{
"path": "wiki/entities/acme.md",
"summary": "Acme — vendor.",
"maturity": "draft",
"status": "modified"
}
]
}
```
Manifest rules:
- List every page you created or modified, with `status` `created` or `modified`.
- `summary` is the one-line index description (≈12 words max). For conflict pages the
summary is ignored — the index lists conflicts by slug only.
- Do not invent a `run_id`, branch, commit, or PR — those belong to the post-processor.
One source per session. After writing the manifest, stop.
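The skill's pre-flight and naming rules are instructions to the model, but a wrapper could mirror the same checks before the session starts. The sketch below is minimal. Both functions are hypothetical and are not part of the skill or `run-ingest.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# preflight <raw_path> <private_context>  mirrors the skill's three pre-flight checks
preflight() {
  local path="$1" private_context="$2"
  case "/${path}" in
    */private/*) echo "refuse: private path ${path}" >&2; return 1 ;;
  esac
  if [ "$private_context" != disabled ]; then
    echo "refuse: PRIVATE_CONTEXT must be disabled" >&2; return 1
  fi
  case "$path" in
    raw/*) ;;
    *) echo "refuse: not under raw/: ${path}" >&2; return 1 ;;
  esac
  [ -f "$path" ] || { echo "refuse: not found: ${path}" >&2; return 1; }
}

# to_kebab "<name>"  lower-cases and joins alphanumeric runs with single hyphens
to_kebab() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

cd "$(mktemp -d)"
mkdir -p raw/articles && touch raw/articles/foo.md
preflight raw/articles/foo.md disabled && echo "ok to ingest"   # ok to ingest
to_kebab "Acme Corp (EU)"                                       # acme-corp-eu
```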

@ -0,0 +1,96 @@
#!/usr/bin/env python3
# =============================================================================
# skills/ingest/scripts/index-append.py
# Insert an entry line into the correct section of wiki/index.md and keep that
# section's entries alphabetically ordered. Bumps frontmatter last_updated.
#
# NOTE: agents-genome.md and wiki-index.md claim the pre-commit hook sorts the
# index. The actual pre-commit.sh only runs the plaintext-leak check — it does
# NOT sort. This script owns the ordering instead. (If you later move sorting
# into the hook, reduce this to a plain append.)
#
# index-append.py --section Sources \
# --entry '- [[sources/foo]] — One-line summary. `maturity: draft`'
# =============================================================================
import argparse
import datetime
import re
import sys
ENTRY_RE = re.compile(r"^- \[\[")
HEADER_RE = re.compile(r"^## ")
def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--section", required=True,
                    help="Section name, e.g. Sources / Entities / Concepts / Queries / Conflicts")
    ap.add_argument("--entry", required=True, help="Full index line to insert")
    ap.add_argument("--file", default="wiki/index.md")
    args = ap.parse_args()
    try:
        with open(args.file, encoding="utf-8") as fh:
            lines = fh.read().splitlines()
    except FileNotFoundError:
        print(f"index-append: not found: {args.file}", file=sys.stderr)
        return 1
    today = datetime.date.today().isoformat()
    # 1. Bump last_updated inside the first frontmatter block
    fm_open = False
    for i, ln in enumerate(lines):
        if ln.strip() == "---":
            if not fm_open:
                fm_open = True
                continue
            break  # end of frontmatter
        if fm_open and ln.startswith("last_updated:"):
            lines[i] = f"last_updated: {today}"
    # 2. Locate the target section [start, end)
    start = None
    for i, ln in enumerate(lines):
        if HEADER_RE.match(ln) and ln[3:].startswith(args.section):
            start = i
            break
    if start is None:
        print(f"index-append: section '{args.section}' not found in {args.file}",
              file=sys.stderr)
        return 1
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if HEADER_RE.match(lines[i]):
            end = i
            break
    # 3. Split the section body into intro (non-entry) and entries
    body = lines[start + 1:end]
    intro = [ln for ln in body if not ENTRY_RE.match(ln)]
    entries = [ln for ln in body if ENTRY_RE.match(ln)]
    if args.entry in entries:
        print("index-append: entry already present, skipping")
        return 0
    entries.append(args.entry)
    entries.sort(key=str.casefold)
    # Normalise intro: drop trailing blanks, keep header + comment(s)
    while intro and intro[-1].strip() == "":
        intro.pop()
    new_section = intro + [""] + entries + [""]
    lines = lines[:start + 1] + new_section + lines[end:]
    with open(args.file, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    print(f"index-append: added to {args.section}")
    return 0
if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,50 @@
#!/usr/bin/env bash
# =============================================================================
# skills/ingest/scripts/log-append.sh
# Append one entry to the append-only ledger wiki/log.md, in the exact format
# defined by AGENTS.md / wiki-log.md. Generates run_id. Never edits prior entries.
#
# log-append.sh --type INGEST --subject "<slug>" --model "<model>" \
# --context "[[raw/x]]" --output "[[sources/x]]" \
# --reasoning "One sentence."
# =============================================================================
set -euo pipefail
LOG_FILE="${LOG_FILE:-wiki/log.md}"
type="" subject="" model="" context="" output="" reasoning=""
while [[ $# -gt 0 ]]; do
case "$1" in
--type) type="$2"; shift 2 ;;
--subject) subject="$2"; shift 2 ;;
--model) model="$2"; shift 2 ;;
--context) context="$2"; shift 2 ;;
--output) output="$2"; shift 2 ;;
--reasoning) reasoning="$2"; shift 2 ;;
*) echo "log-append: unknown arg: $1" >&2; exit 1 ;;
esac
done
: "${type:?--type required}"
: "${subject:?--subject required}"
case "$type" in
INGEST|LINT|QUERY|CONFLICT|CONFIG|SECURITY) ;;
*) echo "log-append: invalid TYPE '${type}'" >&2; exit 1 ;;
esac
[[ -f "$LOG_FILE" ]] || { echo "log-append: not found: $LOG_FILE" >&2; exit 1; }
run_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
today="$(date +%Y-%m-%d)"
{
printf '\n## [%s] %s | %s\n\n' "$today" "$type" "$subject"
printf -- '- run_id: `%s`\n' "$run_id"
printf -- '- model: `%s`\n' "${model:-unknown}"
printf -- '- context_read: %s\n' "${context:-*(none)*}"
printf -- '- output_written: %s\n' "${output:-*(none)*}"
printf -- '- reasoning: %s\n' "${reasoning:-No reasoning provided.}"
} >> "$LOG_FILE"
echo "run_id=${run_id}"
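`log-append.sh` writes a header line of the form `## [<date>] <TYPE> | <subject>`, which is what the README's `grep "^## \["` parse examples key on. The sketch below checks that the two agree and pulls the subject back out. `log_header` is hypothetical and reuses the script's `printf` format without the surrounding blank lines.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Same header format as log-append.sh, without the blank lines around it.
log_header() { printf '## [%s] %s | %s\n' "$1" "$2" "$3"; }

log="$(mktemp)"
{
  log_header 2024-05-01 INGEST bm25-vs-vectors
  echo "- run_id: \`example\`"
  log_header 2024-05-02 CONFLICT retrieval-cost
} > "$log"

grep '^## \[' "$log" | grep -c 'CONFLICT'       # 1
sed -n 's/^## \[[^]]*] [A-Z]* | //p' "$log"     # bm25-vs-vectors, then retrieval-cost

# Capturing the run_id that log-append.sh prints on stdout (invocation assumed):
#   run_id="$(bash log-append.sh --type INGEST --subject foo | sed -n 's/^run_id=//p')"
```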

@ -0,0 +1,98 @@
#!/usr/bin/env bash
# =============================================================================
# skills/ingest/scripts/open-pr.sh
# Branch, commit (conventional), push, and open a Forgejo PR for the wiki/ changes.
# Mirrors the API conventions of providers/forgejo.sh (token auth + http_code).
# Runs inside the genome checkout (cwd = genome root). Never touches main.
#
# open-pr.sh --slug <slug> --title "feat: ingest <slug>" --body-file <path> \
# [--base main] [--label CONFLICT]
#
# Requires env: FORGEJO_URL, FORGEJO_USER, FORGEJO_TOKEN.
# =============================================================================
set -euo pipefail
: "${FORGEJO_URL:?missing FORGEJO_URL}"
: "${FORGEJO_USER:?missing FORGEJO_USER}"
: "${FORGEJO_TOKEN:?missing FORGEJO_TOKEN}"
slug="" title="" body_file="" base="main" label=""
while [[ $# -gt 0 ]]; do
case "$1" in
--slug) slug="$2"; shift 2 ;;
--title) title="$2"; shift 2 ;;
--body-file) body_file="$2"; shift 2 ;;
--base) base="$2"; shift 2 ;;
--label) label="$2"; shift 2 ;;
*) echo "open-pr: unknown arg: $1" >&2; exit 1 ;;
esac
done
: "${slug:?--slug required}"
: "${title:?--title required}"
: "${body_file:?--body-file required}"
[[ -f "$body_file" ]] || { echo "open-pr: body file not found: $body_file" >&2; exit 1; }
branch="feat/ai-ingest-${slug}"
repo="$(basename -s .git "$(git config --get remote.origin.url)")"
# 1. Branch + commit + push (AGENTS.md rule 5: never commit to main)
git switch -c "$branch" 2>/dev/null || git switch "$branch"
git add wiki/
if git diff --cached --quiet; then
echo "open-pr: nothing staged under wiki/ — aborting" >&2
exit 1
fi
git commit -m "$title"
git push -u origin "$branch"
# 2. Open the PR via Forgejo API (jq builds the JSON safely)
body="$(cat "$body_file")"
payload="$(jq -n --arg head "$branch" --arg base "$base" \
--arg title "$title" --arg body "$body" \
'{head:$head, base:$base, title:$title, body:$body}')"
resp="$(curl -s -w '\n%{http_code}' \
-H "Authorization: token ${FORGEJO_TOKEN}" \
-H "Content-Type: application/json" \
-X POST "${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/pulls" \
-d "$payload")"
code="$(printf '%s' "$resp" | tail -n1)"
json="$(printf '%s' "$resp" | sed '$d')"
case "$code" in
201)
url="$(printf '%s' "$json" | jq -r '.html_url')"
number="$(printf '%s' "$json" | jq -r '.number')"
echo "PR opened: ${url}"
;;
409)
echo "open-pr: a PR for '${branch}' already exists — push updated the branch." >&2
exit 0
;;
401)
echo "open-pr: unauthorized — check FORGEJO_TOKEN (n8n-bot)." >&2
exit 1
;;
*)
echo "open-pr: Forgejo API HTTP ${code}: ${json}" >&2
exit 1
;;
esac
# 3. Optional label (e.g. CONFLICT). Best-effort; non-fatal.
if [[ -n "$label" && -n "${number:-}" ]]; then
label_id="$(curl -s -H "Authorization: token ${FORGEJO_TOKEN}" \
"${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/labels" \
| jq -r --arg n "$label" '.[] | select(.name==$n) | .id' | head -n1)"
if [[ -n "$label_id" && "$label_id" != "null" ]]; then
curl -s -o /dev/null \
-H "Authorization: token ${FORGEJO_TOKEN}" -H "Content-Type: application/json" \
-X POST "${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/issues/${number}/labels" \
-d "{\"labels\":[${label_id}]}" \
&& echo "label '${label}' applied" >&2
else
echo "open-pr: label '${label}' not found in repo — skipped." >&2
fi
fi
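
open-pr.sh gets both the HTTP status and the response body from a single `curl` call. It appends `\n%{http_code}` to the output and then splits on the last line. The sketch below reproduces only that split, offline, against a made-up captured response. The URL and PR number are hypothetical, so no Forgejo instance is needed to check the parsing.

```bash
#!/usr/bin/env bash
# Offline sketch of the response split used in open-pr.sh.
# The captured response is hypothetical; nothing touches the network.
set -euo pipefail

split_response() {
  # Last line: status appended by `curl -w '\n%{http_code}'`.
  # Everything before it: the JSON body.
  code="$(printf '%s' "$1" | tail -n1)"
  json="$(printf '%s' "$1" | sed '$d')"
}

# Simulated output of: curl -s -w '\n%{http_code}' ...
resp="$(printf '%s\n%s' '{"number":7,"html_url":"https://forgejo.example.com/u/genome-x/pulls/7"}' 201)"
split_response "$resp"
echo "status: ${code}"
echo "body:   ${json}"
```

The split also works when the body spans several lines, because `sed '$d'` drops only the final line.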

@ -0,0 +1,120 @@
#!/usr/bin/env bash
# =============================================================================
# skills/ingest/scripts/run-ingest.sh
# Post-pi orchestrator. Runs OUTSIDE pi's loop, on vm101, in the genome checkout.
# Consumes .ingest-manifest.json (written by the ingest skill) and performs every
# deterministic step — index, log, scoped lint, PR — so pi's context stays clean.
#
# run-ingest.sh <genome_name> [manifest_path]
#
# Emits a single JSON result line on stdout for n8n to parse.
# =============================================================================
set -euo pipefail
genome="${1:?usage: run-ingest.sh <genome> [manifest]}"
manifest="${2:-.ingest-manifest.json}"
SCRIPTS="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
fail() {
jq -cn --arg stage "$1" --arg reason "$2" \
'{status:"error", stage:$stage, reason:$reason}'
exit 1
}
command -v jq >/dev/null 2>&1 || { echo '{"status":"error","reason":"jq missing"}'; exit 1; }
command -v python3 >/dev/null 2>&1 || fail "deps" "python3 missing (needed by index-append.py)"
[[ -f "$manifest" ]] || fail "manifest" "manifest not found: ${manifest}"
# --- read manifest scalars ---
raw_source="$(jq -r '.raw_source' "$manifest")"
model="$(jq -r '.model // "unknown"' "$manifest")"
reasoning="$(jq -r '.reasoning // "Ingest."' "$manifest")"
pr_summary="$(jq -r '.pr_summary // "Ingest."' "$manifest")"
contradictions="$(jq -r '.contradictions // "None"' "$manifest")"
[[ -n "$raw_source" && "$raw_source" != "null" ]] || fail "manifest" "raw_source missing"
slug="$(bash "${SCRIPTS}/slug.sh" "$raw_source")"
# --- collect touched paths ---
# Portable read loop (mapfile needs bash 4; macOS ships bash 3.2). Created pages first, then modified.
all_paths=()
while IFS= read -r p; do
if [[ -n "$p" ]]; then all_paths+=( "$p" ); fi
done < <(jq -r '(.pages[] | select(.status=="created") | .path), (.pages[] | select(.status=="modified") | .path)' "$manifest")
[[ ${#all_paths[@]} -gt 0 ]] || fail "manifest" "no pages reported"
conflict_label=""
# --- 1. index entries (created pages only), inserted in order ---
# Summary goes last: IFS=$'\t' collapses consecutive tabs, so an empty summary
# in the middle would shift maturity into the summary field.
while IFS=$'\t' read -r path maturity summary; do
[[ -z "$path" ]] && continue
link="${path#wiki/}"; link="${link%.md}" # e.g. sources/foo
folder="${link%%/*}"
case "$folder" in
sources) section="Sources" ;;
entities) section="Entities" ;;
concepts) section="Concepts" ;;
queries)
if [[ "$link" == queries/conflict-* ]]; then section="Conflicts"; conflict_label="CONFLICT"
else section="Queries"; fi ;;
*) section="Sources" ;;
esac
if [[ "$section" == "Conflicts" ]]; then
entry="- [[${link}]]" # conflicts: slug only
else
entry="- [[${link}]] — ${summary} \`maturity: ${maturity}\`"
fi
python3 "${SCRIPTS}/index-append.py" --section "$section" --entry "$entry" \
|| fail "index" "index-append failed for ${path}"
done < <(jq -r '.pages[] | select(.status=="created")
| [.path, (.maturity // "draft"), (.summary // "")] | @tsv' "$manifest")
# --- 2. log entry ---
out="$(jq -r '[.pages[].path | "[[" + (sub("^wiki/";"") | sub("\\.md$";"")) + "]]"] | join(", ")' "$manifest")"
"${SCRIPTS}/log-append.sh" --type INGEST --subject "$slug" --model "$model" \
--context "[[${raw_source}]]" --output "${out:-*(none)*}" --reasoning "$reasoning" \
|| fail "log" "log-append failed"
# --- 3. scoped lint (capture findings for the PR; never aborts the run) ---
lint_out="$( "${SCRIPTS}/scoped-lint.sh" "$genome" "${all_paths[@]}" 2>&1 )" && lint_rc=0 || lint_rc=$?
# --- 4. assemble the PR body (manifest tables + lint results) ---
body="$(mktemp)"
{
echo "## Summary"
echo "$pr_summary"
echo ""
echo "## Pages"
echo "| Path | Status | Maturity |"
echo "|------|--------|----------|"
jq -r '.pages[] | "| `\(.path)` | \(.status) | \(.maturity // "draft") |"' "$manifest"
echo ""
echo "## Contradictions"
echo "$contradictions"
echo ""
echo "## Scoped Lint (post-ingest)"
echo '```'
echo "$lint_out"
echo '```'
} > "$body"
# --- 5. open the PR ---
pr_args=( --slug "$slug" --title "feat: ingest ${slug}" --body-file "$body" )
[[ -n "$conflict_label" ]] && pr_args+=( --label "$conflict_label" )
pr_out="$( "${SCRIPTS}/open-pr.sh" "${pr_args[@]}" 2>&1 )" && pr_rc=0 || pr_rc=$?
pr_url="$(printf '%s\n' "$pr_out" | sed -n 's/^PR opened: //p' | head -n1)"
rm -f "$body"
# --- final result line for n8n ---
jq -cn \
--arg status "$([[ $pr_rc -eq 0 ]] && echo ok || echo pr_failed)" \
--arg slug "$slug" \
--arg pr_url "$pr_url" \
--argjson lint_clean "$([[ $lint_rc -eq 0 ]] && echo true || echo false)" \
--argjson conflict "$([[ -n "$conflict_label" ]] && echo true || echo false)" \
--arg detail "$pr_out" \
'{status:$status, slug:$slug, pr_url:$pr_url, lint_clean:$lint_clean, conflict:$conflict, detail:$detail}'
[[ $pr_rc -eq 0 ]]
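
run-ingest.sh does not spell out the manifest schema, but its jq filters imply the shape. The top level holds `raw_source`, `model`, `reasoning`, `pr_summary`, and `contradictions`. A `pages` array holds items with `path`, `status` (`created` or `modified`), and optionally `summary` and `maturity`. Only `raw_source` and a non-empty `pages` are mandatory. The other scalars, and `summary`/`maturity` on each page, fall back to defaults. The sketch below writes such a manifest. Every value in it is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write an example .ingest-manifest.json (all values hypothetical).
set -euo pipefail

manifest="${1:-$(mktemp)}"
cat > "$manifest" <<'EOF'
{
  "raw_source": "raw/articles/my-source.md",
  "model": "local-model",
  "reasoning": "First ingest of this article.",
  "pr_summary": "Ingest my-source: one source page, one new concept.",
  "contradictions": "None",
  "pages": [
    { "path": "wiki/sources/my-source.md", "status": "created",
      "summary": "Summary of the article.", "maturity": "draft" },
    { "path": "wiki/concepts/example-concept.md", "status": "created",
      "summary": "Concept introduced by the article.", "maturity": "draft" },
    { "path": "wiki/entities/example-tool.md", "status": "modified" }
  ]
}
EOF
echo "manifest written to ${manifest}"
# Next step, from the genome checkout (needs jq, python3 and KG_LIB_DIR):
#   skills/ingest/scripts/run-ingest.sh <genome_name> "$manifest"
```

If the manifest listed a created page under `wiki/queries/conflict-*`, the run would file it under Conflicts in the index and label the PR `CONFLICT`.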

@ -0,0 +1,50 @@
#!/usr/bin/env bash
# =============================================================================
# skills/ingest/scripts/scoped-lint.sh
# Run the framework's validation on ONLY the files touched this session.
# Reuses lib/lint.sh + lib/output.sh — same checks as `make lint`, scoped.
#
# KG_LIB_DIR=/opt/knowledge-genome-setup/lib \
# scoped-lint.sh <genome_name> wiki/sources/x.md wiki/entities/y.md
#
# Exits non-zero if any hard error is found, so the agent notices.
# Findings are printed (stderr from the lint functions + a summary on stdout).
# =============================================================================
set -euo pipefail
: "${KG_LIB_DIR:?set KG_LIB_DIR to the framework lib/ dir (e.g. /opt/knowledge-genome-setup/lib)}"
# shellcheck source=/dev/null
source "${KG_LIB_DIR}/output.sh"
# shellcheck source=/dev/null
source "${KG_LIB_DIR}/lint.sh"
genome="${1:?usage: scoped-lint.sh <genome> <file...>}"
shift
[[ $# -gt 0 ]] || { echo "scoped-lint: no files given" >&2; exit 1; }
errors=0
stale=0
count=$#
for f in "$@"; do
if [[ ! -f "$f" ]]; then
warn "scoped-lint: missing file (skipped): $f"
continue
fi
lint_markdown_file "$f" "$genome" && fe=0 || fe=$?
check_privacy_consistency "$f" && pce=0 || pce=$?
check_page_size "$f" && pse=0 || pse=$?
errors=$(( errors + fe + pce + pse ))
check_knowledge_decay "$f" && st=0 || st=$?
stale=$(( stale + st ))
check_broken_links "$f" || true # warnings only
done
echo ""
echo "scoped-lint: ${errors} error(s), ${stale} stale across ${count} file(s)"
[[ $errors -eq 0 ]]
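
scoped-lint.sh treats each check's exit status as an error count. It uses `cmd && rc=0 || rc=$?`, so a non-zero status is recorded instead of tripping `set -e`. The sketch below shows that pattern in isolation. `fake_check` is a hypothetical stand-in for the lint functions.

```bash
#!/usr/bin/env bash
# Sketch: accumulate exit statuses as counts without tripping `set -e`.
set -euo pipefail

fake_check() { return "$1"; }   # hypothetical: "finds" $1 issues

total=0
for issues in 0 2 1; do
  fake_check "$issues" && rc=0 || rc=$?   # rc holds the status; the script keeps going
  total=$(( total + rc ))
done
echo "total issues: ${total}"
```

Exit statuses are 8-bit. A function that reports more than 255 findings this way therefore wraps around (`return 256` reads as 0). Printing the count on stdout avoids that limit.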

@ -0,0 +1,18 @@
#!/usr/bin/env bash
# =============================================================================
# skills/ingest/scripts/slug.sh
# Derive a wiki slug from a path, filename, or title string.
# slug.sh "raw/articles/My Source.md" -> my-source
# slug.sh "Some Concept Name" -> some-concept-name
# =============================================================================
set -euo pipefail
input="${1:?usage: slug.sh <path-or-title>}"
# Strip directory and extension when given a path
base="${input##*/}"
base="${base%.*}"
printf '%s\n' "$base" \
| tr '[:upper:]' '[:lower:]' \
| sed -E 's/[^a-z0-9]+/-/g; s/-{2,}/-/g; s/^-+//; s/-+$//'
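
To check the slug rules against the examples in the header comment, the same pipeline can be wrapped in a function. `slugify` is a hypothetical name used only for this sketch. Its body mirrors slug.sh.

```bash
#!/usr/bin/env bash
# Sketch: slug.sh's pipeline as a function, for quick checks in a shell.
set -euo pipefail

slugify() {
  local base="${1##*/}"   # drop any directory part
  base="${base%.*}"       # drop the last ".ext", if any
  printf '%s\n' "$base" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/-{2,}/-/g; s/^-+//; s/-+$//'
}

slugify "raw/articles/My Source.md"   # my-source
slugify "Some Concept Name"           # some-concept-name
slugify "Node.js Notes"               # node (everything after the last dot is dropped)
```

The third case shows that a title containing a dot is treated as if it had an extension. Callers passing free-text titles may prefer to strip extensions only for real file paths.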

@ -3,7 +3,7 @@
## Identity
| Field | Value |
|--------|-------|
| ------ | -------------------------------------------------- |
| Genome | `{{GENOME_NAME}}` |
| Domain | `{{GENOME_DESC}}` |
| Owner | `{{FORGEJO_USER}}` |
@ -14,12 +14,26 @@
---
## Linked Project
| Field | Value |
| --------------- | --------------------- |
| Project repo | `{{LINKED_PROJECT}}` |
| Branch | `main` |
| Allowed tasks | `readme, tests, code` |
| Preferred model | `auto` |
If `Project repo` is `none`, this genome is knowledge-only — phase-2 project work
does not apply. When set, after a wiki PR is **merged**, the orchestrator may trigger
work on this repo within _Allowed tasks_. The agent never touches the project repo
during ingest.
## PRIVATE_CONTEXT
**Default: `disabled`** — never infer; require explicit operator declaration per session.
| State | Behavior |
|-------|----------|
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `disabled` | `raw/private/` and `wiki/private/` do not exist. No read, list, grep, or summary on private paths. All outputs safe for collaborators. |
| `enabled` | Operator has confirmed `git-crypt unlock` ran on host. Read/write `private/` authorized. All outputs from private data go exclusively to `wiki/private/`. Prefix every response drawing on private data: `[PRIVATE DATA INCLUDED]`. Never leak private synthesis into public wiki paths. |
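
The table's rules reduce to a path check that a wrapper could run before each write. Private paths are off-limits while `disabled`. Anything drawn from private data lands only under `wiki/private/`. `raw/` is never written, per the three-layer rule. The sketch below assumes a hypothetical `guard_write` helper that receives the session state, whether the content uses private data, and the target path. It is not part of the framework.

```bash
#!/usr/bin/env bash
# Sketch of a PRIVATE_CONTEXT write guard; guard_write is hypothetical.
set -euo pipefail

# guard_write <state: enabled|disabled> <uses_private: yes|no> <path>
guard_write() {
  local state="$1" uses_private="$2" path="$3"
  case "$path" in
    raw/*)
      echo "refused: raw/ is read-only (${path})" >&2
      return 1 ;;
    wiki/private/*)
      if [[ "$state" != enabled ]]; then
        echo "refused: ${path} requires PRIVATE_CONTEXT: enabled" >&2
        return 1
      fi ;;
    *)
      if [[ "$uses_private" == yes ]]; then
        echo "refused: private-derived output must go to wiki/private/, not ${path}" >&2
        return 1
      fi ;;
  esac
}

guard_write disabled no wiki/sources/my-source.md && echo "ok: public write"
guard_write enabled yes wiki/private/my-source.md && echo "ok: private write"
guard_write disabled no wiki/private/my-source.md || echo "blocked as expected"
```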
@ -41,6 +55,7 @@ Session end or return to `disabled`: remind operator to run `git-crypt lock` on
8. Every PR must use `templates/pr-description.md`. Do not omit the tabular summary.
### NEVER
- Load `wiki/log.md` in full — read only the tail injected by the orchestrator.
- Rewrite `wiki/index.md` to reorder entries — append only; sorting is automated.
- Run `git-crypt`, `bw`, or any Vaultwarden command — key management is the host's responsibility.
@ -48,6 +63,7 @@ Session end or return to `disabled`: remind operator to run `git-crypt lock` on
- Merge PRs — human approval required.
### ASK FIRST
- Deleting any wiki page.
- Changing `maturity` from `stable` to `deprecated`.
- Writing to `wiki/private/` when PRIVATE_CONTEXT state is ambiguous.
@ -70,7 +86,8 @@ Execute in this order before any file operation:
## Workflows
### Ingest
*Triggered by new file in `raw/`.*
_Triggered by new file in `raw/`._
1. Read source once.
2. Create `wiki/sources/<slug>.md` — summary + key points.
@ -82,12 +99,14 @@ Execute in this order before any file operation:
8. Run scoped lint on pages created or modified in this session. Report issues in PR description. Do not auto-fix.
9. Commit on `feat/ai-ingest-<slug>`. Open PR using `templates/pr-description.md`.
*Private source* (`PRIVATE_CONTEXT: enabled` required):
_Private source_ (`PRIVATE_CONTEXT: enabled` required):
- All output → `wiki/private/<slug>.md` only.
- PR title: `[PRIVATE] ingest: <slug>`.
### Query
*Triggered by operator question.*
_Triggered by operator question._
1. `qmd search "<query>"` → identify candidate pages.
2. Read candidate pages directly.
@ -96,10 +115,11 @@ Execute in this order before any file operation:
5. Append entry to `wiki/index.md` under Queries.
6. Append log entry: `QUERY | <subject>`.
*For general orientation without a specific query: read `wiki/index.md` directly.*
_For general orientation without a specific query: read `wiki/index.md` directly._
### Lint
*Triggered by operator with bash pre-scan output.*
_Triggered by operator with bash pre-scan output._
Prerequisite: operator runs `bash scripts/lint-genomes.sh` and provides its output to this session.
The script deterministically checks broken links, knowledge decay, page size, and frontmatter validity.
@ -119,6 +139,7 @@ Append log entry: `LINT | <summary of findings>`.
## File Conventions
### Frontmatter
Required on every wiki page:
```yaml
@ -138,19 +159,25 @@ private: true | false
- `deprecated` — superseded. Add `> **DEPRECATED:** <reason>` callout at top of body.
### Links
- Internal: `[[folder/file]]` — Obsidian wikilinks only. Never `[text](url)` for internal refs.
- Cross-genome: `[[../genome-target/wiki/folder/file]]`.
- External: `[text](https://...)`.
### Index entries
Append at bottom of relevant section in `wiki/index.md`:
```
- [[folder/slug]] — One-line summary. `maturity: draft`
```
Never reorder. Alphabetical sort is handled by the pre-commit hook.
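
The entry format above can be produced mechanically from a page path, much as the ingest orchestrator does when it appends created pages. The sketch below is minimal. `index_entry` is a hypothetical helper that only formats text and writes nothing. The slug-only rule for conflict pages follows the Conflicts section of `wiki/index.md`.

```bash
#!/usr/bin/env bash
# Sketch: build a wiki/index.md entry line from a page path.
# index_entry is hypothetical; it prints text and writes no files.
set -euo pipefail

index_entry() {
  local path="$1" summary="$2" maturity="${3:-draft}"
  local link="${path#wiki/}"   # wiki/sources/foo.md -> sources/foo.md
  link="${link%.md}"           # sources/foo.md      -> sources/foo
  case "$link" in
    queries/conflict-*) printf -- '- [[%s]]\n' "$link" ;;  # conflicts: slug only
    *) printf -- '- [[%s]] — %s `maturity: %s`\n' "$link" "$summary" "$maturity" ;;
  esac
}

index_entry wiki/sources/my-source.md "One-line summary." draft
index_entry wiki/queries/conflict-cache-ttl.md ""
```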
### Log entries
Append one entry per operation to `wiki/log.md`:
```markdown
## [YYYY-MM-DD] TYPE | Subject
@ -160,6 +187,7 @@ Append one entry per operation to `wiki/log.md`:
- output_written: `[[path/C]]`
- reasoning: One sentence — what changed and why.
```
Valid TYPEs: `INGEST` `LINT` `QUERY` `CONFLICT` `CONFIG` `SECURITY`
Parse: `grep "^## \[" wiki/log.md | tail -5`
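
Because every header starts with `## [`, entries can be written and read back with plain shell tools. The sketch below appends two entries to a scratch file and runs the documented parse. `log_entry` is hypothetical. The field order follows the initial scaffold entry in `wiki/log.md`. Backticking `context_read` is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: write log entries in the documented format, then parse headers.
# log_entry is a hypothetical helper; it writes to a scratch file only.
set -euo pipefail

log_entry() {
  # args: TYPE subject run_id model context_read output_written reasoning
  printf '\n## [%s] %s | %s\n' "$(date +%Y-%m-%d)" "$1" "$2"
  printf -- '- run_id: `%s`\n' "$3"
  printf -- '- model: `%s`\n' "$4"
  printf -- '- context_read: `%s`\n' "$5"
  printf -- '- output_written: `%s`\n' "$6"
  printf -- '- reasoning: %s\n' "$7"
}

log="$(mktemp)"
trap 'rm -f "$log"' EXIT

log_entry INGEST my-source run-01 local-model "[[raw/articles/my-source]]" \
  "[[sources/my-source]]" "First ingest of this article." >> "$log"
log_entry QUERY cache-design run-02 local-model "[[concepts/cache]]" \
  "[[queries/cache-design]]" "Archived a synthesised answer." >> "$log"

# Same parse as the documented one: headers only, most recent last.
headers="$(grep '^## \[' "$log" | tail -5)"
printf '%s\n' "$headers"
```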
@ -183,16 +211,20 @@ last_updated: YYYY-MM-DD
private: false
---
```
```markdown
## Conflict: <concept>
**Claim A (existing):** [[path/to/existing-page]]
> Summary of current wiki position.
**Claim B (new):** [[path/to/new-source]]
> Summary of contradicting evidence.
**Assessment:**
- Confidence A: high | medium | low — <reason>
- Confidence B: high | medium | low — <reason>
- Recommendation: `accept_b` | `keep_a` | `requires_human_review`
@ -212,9 +244,11 @@ private: false
- `maturity: draft` not updated in **90 days** → flag during lint.
Flagged pages: prepend to body:
```markdown
> **⚠️ STALE:** Last validated {{last_updated}}. Re-validation required.
```
Propose re-validation task. Do not change `maturity` without new source evidence.
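
The 90-day rule for drafts needs only the page's `last_updated` date and today's date. The sketch below computes the age with pure shell arithmetic, using a days-from-civil conversion, so it behaves the same with GNU and BSD `date`. `days_from_civil` and `page_age_days` are hypothetical helpers. Reading the field with `sed` assumes the plain `last_updated: YYYY-MM-DD` frontmatter line used in these templates. The sketch flags pages strictly older than 90 days, which is itself an assumption about the boundary.

```bash
#!/usr/bin/env bash
# Sketch: flag a draft page whose last_updated is more than 90 days old.
# Helper names are hypothetical; nothing here edits the page.
set -euo pipefail

# Days since 1970-01-01 for a Gregorian date (non-negative years assumed).
days_from_civil() {
  local y=$((10#$1)) m=$((10#$2)) d=$((10#$3)) mp era yoe doy doe
  if (( m <= 2 )); then y=$(( y - 1 )); fi   # Jan/Feb count with the prior year
  era=$(( y / 400 ))
  yoe=$(( y - era * 400 ))
  if (( m > 2 )); then mp=$(( m - 3 )); else mp=$(( m + 9 )); fi  # March = 0
  doy=$(( (153 * mp + 2) / 5 + d - 1 ))
  doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
}

# Print a page's age in days, read from its last_updated frontmatter field.
page_age_days() {
  local updated today uy um ud ty tm td
  updated="$(sed -n 's/^last_updated: *//p' "$1" | head -n1)"
  [[ -n "$updated" ]] || { echo "no last_updated in $1" >&2; return 1; }
  today="$(date +%Y-%m-%d)"
  IFS=- read -r uy um ud <<< "$updated"
  IFS=- read -r ty tm td <<< "$today"
  echo $(( $(days_from_civil "$ty" "$tm" "$td") - $(days_from_civil "$uy" "$um" "$ud") ))
}

page="$(mktemp)"
trap 'rm -f "$page"' EXIT
printf -- '---\nmaturity: draft\nlast_updated: 2000-01-01\n---\n' > "$page"

age="$(page_age_days "$page")"
if (( age > 90 )); then
  echo "STALE: last validated ${age} days ago"
fi
```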
---
@ -222,7 +256,7 @@ Propose re-validation task. Do not change `maturity` without new source evidence
## Collaboration
| Role | Access | Permitted |
|------|--------|-----------|
| -------------- | ----------------- | ------------------------------------------------------------------------------------ |
| Owner | Full — key holder | Read/write everywhere |
| Collaborator | No key | Push to `raw/articles`, `raw/transcripts`, `raw/code-packs`, `raw/assets` |
| Local AI agent | Conditional | `private/` only when `PRIVATE_CONTEXT: enabled` |
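
The Collaborator row amounts to an allow-list of path prefixes. A server-side hook or CI job could enforce it on pushes from keyless accounts. The sketch below assumes a hypothetical `collab_paths_ok` helper fed the changed paths one per line, for example from `git diff --name-only`. It only checks paths and knows nothing about Forgejo permissions.

```bash
#!/usr/bin/env bash
# Sketch: check changed paths against the Collaborator allow-list.
set -euo pipefail

# Reads paths on stdin; reports each disallowed path and fails if any exist.
collab_paths_ok() {
  local path bad=0
  while IFS= read -r path; do
    [[ -z "$path" ]] && continue
    case "$path" in
      raw/articles/*|raw/transcripts/*|raw/code-packs/*|raw/assets/*) ;;
      *) echo "not allowed for collaborators: ${path}" >&2; bad=1 ;;
    esac
  done
  return "$bad"
}

printf '%s\n' raw/articles/new-post.md raw/assets/diagram.png | collab_paths_ok && echo "push ok"
printf '%s\n' raw/articles/new-post.md wiki/index.md | collab_paths_ok || echo "push rejected"
```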

@ -3,7 +3,7 @@
## Identity
| Field | Value |
|--------|-------|
| ------ | -------------------------------------------------- |
| Repo | `{{MASTER_REPO}}` |
| Owner | `{{FORGEJO_USER}}` |
| Remote | `{{FORGEJO_URL}}/{{FORGEJO_USER}}/{{MASTER_REPO}}` |
@ -32,14 +32,17 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
## Global Security Rules
### PRIVATE_CONTEXT scope
- Toggle is **per-genome and per-session**. Enabling for `genome-finance` does NOT enable for `genome-dev`.
- Cloud-hosted LLMs: `PRIVATE_CONTEXT` must be `disabled` for all genomes. Private data never leaves the local network.
### Log sanitization
- Never print decrypted secrets, session tokens, or key contents to stdout or log files.
- Document only `run_id` and genome name — never the key value.
### Key management
- Key injection is the host's responsibility — executed before this session starts.
- Never write, suggest, or generate scripts that save `.key` files to disk.
@ -54,12 +57,14 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
5. Per-genome `AGENTS.md` governs all wiki operations within that genome. This file governs boundaries only.
### NEVER
- Load multiple `wiki/index.md` files simultaneously for cross-genome comparison — use qmd.
- Run `git-crypt`, `bw`, or Vaultwarden commands — host responsibility.
- Modify files in more than one genome in the same operation.
- Modify `core-karpathy` in any way.
### ASK FIRST
- Any operation that touches two or more genomes.
- Updating submodule pointers in master.
- Any key rotation procedure.
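
Whether an operation spans more than one genome can be read off its changed paths. This assumes each genome appears as a top-level `genome-*` directory in the master checkout. The sketch below uses a hypothetical `genomes_touched` helper.

```bash
#!/usr/bin/env bash
# Sketch: count how many genomes a set of changed paths touches.
set -euo pipefail

# Reads paths on stdin; prints the distinct top-level genome-* directories.
genomes_touched() {
  sed -n 's#^\(genome-[^/]*\)/.*#\1#p' | sort -u
}

changed="genome-finance/wiki/concepts/budget.md
genome-finance/wiki/index.md
genome-dev/wiki/entities/forgejo.md"

touched="$(printf '%s\n' "$changed" | genomes_touched)"
count="$(printf '%s\n' "$touched" | grep -c . || true)"
echo "genomes touched: ${count}"
if (( count >= 2 )); then
  echo "ASK FIRST: operation spans multiple genomes"
fi
```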
@ -77,7 +82,8 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
---
## Cross-Genome Lint
*Manual, monthly — requires operator initiation. Not automated.*
_Manual, monthly — requires operator initiation. Not automated._
1. Use `qmd search "<concept>"` to find pages covering the same concept across genomes.
2. Identify:

@ -1,25 +1,31 @@
## Summary
<!-- One sentence: goal of this session and source processed. -->
## Pages Created
| Path | Type | Maturity |
|------|------|----------|
| ----------------- | --------------------------------- | -------- |
| `[[folder/slug]]` | entity / concept / source / query | draft |
## Pages Modified
| Path | Change |
|------|--------|
| ----------------- | ----------------------------------------- |
| `[[folder/slug]]` | Added cross-reference to `[[other/page]]` |
## Contradictions Found
- [ ] None
- [ ] `n` conflict file(s) created — listed below
## Private Data Accessed
- [ ] No — `PRIVATE_CONTEXT: disabled`
- [ ] Yes — `PRIVATE_CONTEXT: enabled` · outputs in `wiki/private/` only
## Scoped Lint (post-ingest)
- [ ] Frontmatter valid on all touched pages
- [ ] No broken wikilinks on touched pages
- [ ] No issues found

@ -19,27 +19,28 @@ Entry format: `- [[folder/slug]] — One-line summary. \`maturity: <value>\``
---
## Sources (`wiki/sources/`)
*Ingested raw materials. One entry per processed source.*
_Ingested raw materials. One entry per processed source._
## Entities (`wiki/entities/`)
*People, organisations, tools, projects.*
_People, organisations, tools, projects._
## Concepts (`wiki/concepts/`)
*Theories, methodologies, patterns, architectural decisions.*
_Theories, methodologies, patterns, architectural decisions._
## Queries (`wiki/queries/`)
*Synthesised answers worth preserving. Archived explorations and analyses.*
_Synthesised answers worth preserving. Archived explorations and analyses._
## Conflicts Pending Review (`wiki/queries/conflict-*.md`)
*Created automatically when the agent detects contradictions between sources.*
*Do not summarise entries here — list slugs only to avoid surfacing unresolved claims.*
*Remove entry once the operator has resolved and closed the corresponding PR.*
_Created automatically when the agent detects contradictions between sources._
_Do not summarise entries here — list slugs only to avoid surfacing unresolved claims._
_Remove entry once the operator has resolved and closed the corresponding PR._
## Private Synthesis (`wiki/private/`)
*Restricted access. Requires `PRIVATE_CONTEXT: enabled` and unlocked repo.*
*List slug names ONLY. Do not append summaries — prevents metadata leakage.*
_Restricted access. Requires `PRIVATE_CONTEXT: enabled` and unlocked repo._
_List slug names ONLY. Do not append summaries — prevents metadata leakage._

@ -22,11 +22,13 @@ Append new entries at the bottom using the format defined below.
## Entry Format
### Required header (enables shell parsing):
```text
## [YYYY-MM-DD] TYPE | Subject or title
```
### Required metadata block for all agent-generated entries:
```markdown
- run_id: `<short-uuid or session-identifier>`
- model: `<model-name-and-version>`
@ -38,6 +40,7 @@ Append new entries at the bottom using the format defined below.
**Valid TYPEs:** `INGEST` | `LINT` | `QUERY` | `CONFLICT` | `CONFIG` | `SECURITY`
**Parse examples:**
```bash
# Last 5 entries
grep "^## \[" wiki/log.md | tail -5
@ -55,6 +58,6 @@ grep "^## \[2026-05" wiki/log.md
- run_id: `system-init`
- model: `setup-knowledge-genome.sh`
- context_read: *(none — initial scaffold)*
- context_read: _(none — initial scaffold)_
- output_written: `[[wiki/index.md]]`, `[[wiki/log.md]]`, `[[AGENTS.md]]`
- reasoning: Initial directory structure and encryption layer initialized by setup script.