knowledge-genome-orchestrator/templates/agents-master.md

# SYSTEM DIRECTIVE — `{{MASTER_REPO}}`
## Identity
| Field | Value |
| ------ | -------------------------------------------------- |
| Repo | `{{MASTER_REPO}}` |
| Owner | `{{FORGEJO_USER}}` |
| Remote | `{{FORGEJO_URL}}/{{FORGEJO_USER}}/{{MASTER_REPO}}` |

**Role:** Cross-genome coordinator for the Knowledge Genome network.

**Metrics:** no cross-genome boundary violations · submodule pointers current · cross-genome discoveries routed to target raw/ · zero stale submodule-relative wikilinks.

---
## Architecture
```text
{{MASTER_REPO}}/
├── core-karpathy/ ← Reference pattern — read-only, never modify
├── genome-example/ ← Submodule placeholder (replace with your domain)
└── AGENTS.md
```
Each genome has its own `AGENTS.md` with domain-specific rules.
Genome-level operations are governed by the genome's `AGENTS.md`, not this file.

---
## Global Security Rules
### PRIVATE_CONTEXT scope
- Toggle is **per-genome and per-session**. Enabling for `genome-finance` does NOT enable for `genome-dev`.
- With cloud-hosted LLMs, `PRIVATE_CONTEXT` must be `disabled` for all genomes. Private data never leaves the local network.
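
The scoping rules above can be sketched as a small state object. This is only an illustration: `Session`, `enable_private`, and `private_allowed` are hypothetical names, and the operator confirmation required under ASK FIRST is assumed to have happened before `enable_private` is called.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-session state tracking where PRIVATE_CONTEXT is enabled."""
    uses_cloud_model: bool
    private_enabled: set[str] = field(default_factory=set)

    def enable_private(self, genome: str) -> None:
        # Cloud models may never see private data, so the toggle is refused outright.
        if self.uses_cloud_model:
            raise PermissionError("PRIVATE_CONTEXT must stay disabled with cloud models")
        self.private_enabled.add(genome)

    def private_allowed(self, genome: str) -> bool:
        # The toggle is scoped to one genome and never spreads to its siblings.
        return not self.uses_cloud_model and genome in self.private_enabled
```

Because the state lives on the session object, a new session starts with the toggle off for every genome.
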
### Log sanitization
- Never print decrypted secrets, session tokens, or key contents to stdout or log files.
- Document only `run_id` and genome name — never the key value.
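
One way to make this rule hard to violate is an allowlist applied before anything is written to a log. The sketch below assumes log records are plain dictionaries. The field names `run_id` and `genome` and the helper name are illustrative, not part of any existing tooling.

```python
# Assumption: these are the dictionary keys used for the two permitted fields.
ALLOWED_LOG_FIELDS = ("run_id", "genome")


def sanitize_log_record(record: dict) -> dict:
    """Keep only allowlisted fields; keys, tokens, and anything else are dropped."""
    return {k: record[k] for k in ALLOWED_LOG_FIELDS if k in record}
```

An allowlist fails safe: a new secret-bearing field added later is dropped by default, instead of leaking until someone remembers to blocklist it.
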
### Key management
- Key injection is the host's responsibility — executed before this session starts.
- Never write, suggest, or generate scripts that save `.key` files to disk.
---
## Immutable Rules
1. Operate within ONE genome at a time. No atomic commits across multiple genomes.
2. `core-karpathy` is read-only. Never commit to it.
3. Cross-genome references are NEVER expressed as wikilinks. When a concept belongs to another genome, use the navigation skill to pull it into the working genome as a dated raw under `raw/articles/` (see Cross-Genome Pull) and let the working genome's ingest pipeline handle it asynchronously.
4. Never commit to `main` in any genome. PRs required; no self-merge.
5. Per-genome `AGENTS.md` governs all wiki operations within that genome. This file governs boundaries only.
### NEVER
- Load multiple `wiki/index.md` files simultaneously for cross-genome comparison — use qmd.
- Run `git-crypt`, `bw`, or Vaultwarden commands — host responsibility.
- Modify files in more than one genome in the same operation.
- Create cross-genome wikilinks (e.g., `[[../genome-*/wiki/...]]`). All cross-domain connections must be routed via the navigation skill as raw stubs.
- Modify `core-karpathy` in any way.
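
The single-genome and read-only rules lend themselves to a mechanical pre-commit check. The following is a minimal sketch. It assumes every top-level directory in the master repo is a genome submodule (as in the Architecture tree) and that the caller supplies repo-relative paths. `genomes_touched` and `check_operation` are hypothetical helpers.

```python
from pathlib import PurePosixPath

READ_ONLY = {"core-karpathy"}


def genomes_touched(paths: list[str]) -> set[str]:
    """Return the top-level directories touched by repo-relative paths."""
    touched = set()
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) > 1:  # files at the master root (e.g. AGENTS.md) belong to no genome
            touched.add(parts[0])
    return touched


def check_operation(paths: list[str]) -> None:
    """Raise ValueError if the change set breaks a boundary rule."""
    touched = genomes_touched(paths)
    if touched & READ_ONLY:
        raise ValueError("core-karpathy is read-only")
    if len(touched) > 1:
        raise ValueError(f"operation spans multiple genomes: {sorted(touched)}")
```
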
### ASK FIRST
- Any operation that touches two or more genomes.
- Updating submodule pointers in master.
- Any key rotation procedure.
- Enabling `PRIVATE_CONTEXT` — operator must confirm `git-crypt unlock` ran on host.
---
## Session Start
1. Identify which genome(s) this session involves.
2. Read the relevant genome's `wiki/index.md` — not all genomes' indexes.
3. For cross-genome discovery: `qmd search "<concept>"` across the multi-genome index.
4. Operate on one genome at a time. Switch genome only when the previous operation is committed.
---
## Cross-Genome Pull (Navigation Skill)
Cross-genome knowledge moves by **pull, never push**: the genome you are working in draws material *in*; nothing is ever written into another genome. The cross-genome reading is performed by a deterministic collector **outside any agent's context**, so the agent still operates within ONE genome (Immutable Rule 1 holds). The `cross_source` registry flag decides which genomes may be read as sources.

There is **no separate synthesis step**: retrieving and then distilling twice would only add LLM cost and lose information. The collector *retrieves* (like a search) and deposits the result as a raw; the working genome's own ingest *distills* it once, for this genome's needs.
### How it works
Two actors:
1. **Collector** (`collect-crossgen.sh`, deterministic, agent-free). Clones each genome flagged `cross_source: yes` **read-only at its remote HEAD** — a disposable checkout, for freshness; never the pinned submodule state. The clone is **keyless**, so `private/` stays an encrypted blob and is unreadable. It indexes the public wikis with `qmd`, runs `qmd search "<topic>"`, and assembles a **dossier**: the text of the matching pages plus per-excerpt provenance (source genome, page, HEAD short-sha, date), with every `[[wikilink]]` neutralized to plain text. It deposits the dossier as **one** raw in the working genome at `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`, commits, and pushes. Nothing is written to any source genome.
2. **Target ingest.** The working genome's standard ingest reads that raw as an ordinary source and distills it into wiki pages for the local domain — one semantic pass → PR → human gate. Same gate as any other source.
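
The wikilink neutralization step in the collector could look roughly like this. It is a sketch, not the collector's actual code: it assumes Obsidian-style links (`[[target]]` and `[[target|alias]]`), and `flatten_wikilinks` is a hypothetical name.

```python
import re

# Matches [[target]] and [[target|alias]]; target may include a #heading suffix.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def flatten_wikilinks(text: str) -> str:
    """Replace each wikilink with its visible text (alias if present, else target)."""
    def visible(m: re.Match) -> str:
        return (m.group(2) or m.group(1)).strip()
    return WIKILINK.sub(visible, text)
```

Keeping the visible text preserves readability for the ingest pass while guaranteeing that no link syntax survives into the working genome.
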
### When to pull
Pull is initiated deliberately (operator- or context-driven, never on a timer). Produce a crossgen raw ONLY when all three hold:
1. **Ownership elsewhere.** The concept, entity, or pattern is defined and maintained in another genome, and you need it framed for the working domain.
2. **Structural relevance.** It influences decisions, patterns, or entities here — not a casual mention.
3. **No fresh local coverage.** `qmd search "<concept>"` in the working genome returns nothing, or only a stub that needs enrichment.

If in doubt, do NOT pull. A missed cross-reference is cheaper than crossgen spam.
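
The three conditions combine as a strict conjunction, which the sketch below spells out. The inputs are assumptions: `local_hits` stands for the page paths returned by the local search, and `stub_pages` for the subset of them judged to be stubs needing enrichment. How those are obtained is left to the operator or the skill.

```python
def should_pull(owned_elsewhere: bool, structurally_relevant: bool,
                local_hits: list[str], stub_pages: set[str]) -> bool:
    """True only when all three conditions hold; anything less means no pull."""
    # Condition 3: either no local hits at all, or every hit is a known stub.
    no_fresh_coverage = all(page in stub_pages for page in local_hits)
    return owned_elsewhere and structurally_relevant and no_fresh_coverage
```
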
### Boundaries (enforced by the master)
- **Sources are restricted to `cross_source: yes` genomes.** A genome flagged `no` (e.g., a client or otherwise confidential genome) is NEVER read as a source: the collector skips it entirely and never clones it. The wall is structural, not a matter of the agent's discipline.
- **Keyless collection.** The collector holds no git-crypt key, so `private/` stays ciphertext and cannot be read — privacy does not depend on the agent behaving.
- **Sources are read-only, at HEAD.** No write, commit, branch, or PR in any genome other than the one being worked on.
- **NEVER run `git submodule update --remote`.** Read other genomes via disposable read-only clones, never by moving this master's submodule pointers (changing those is ASK FIRST).
- The deposited raw must contain **no wikilinks and no private data**; it is processed by the working genome's normal ingest + human gate.
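
Source selection from the registry can be sketched as a default-deny filter. The registry shape here is an assumption (a list of entries with `name` and `cross_source` keys), as is the choice to exclude the working genome from its own sources. The boolean `True` is accepted alongside `"yes"` because some YAML parsers load a bare `yes` as a boolean.

```python
def eligible_sources(registry: list[dict], working_genome: str) -> list[str]:
    """Names of genomes the collector may clone, in a stable order."""
    names = []
    for entry in registry:
        flag = entry.get("cross_source")  # a missing flag is treated as "no"
        if (flag is True or flag == "yes") and entry["name"] != working_genome:
            names.append(entry["name"])
    return sorted(names)
```

A genome that is absent from the registry or lacks the flag is never cloned, which matches the structural nature of the wall described above.
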
### Output raw (the only artifact written)
**Path (in the working genome):** `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`

Plain text. No YAML frontmatter (raw is immutable input). **No wikilinks of any kind:** `[[...]]` from source pages are flattened to plain text so they never become broken cross-references here.
```markdown
> Cross-genome pull | Into: genome-<working> | Query: "<topic>" | Date: YYYY-MM-DD
## From genome-<a> — wiki/concepts/<x>.md (HEAD <short-sha>)
[retrieved page text — wikilinks flattened to plain text, no private data]
## From genome-<b> — wiki/entities/<y>.md (HEAD <short-sha>)
[retrieved page text]
```
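
The template above can be rendered mechanically. The sketch below is illustrative only: `Excerpt`, `raw_path`, and `render_dossier` are hypothetical names, `topic` is assumed to be a filename-safe slug already, and excerpt text is assumed to have had its wikilinks flattened beforehand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Excerpt:
    genome: str     # source genome, e.g. "genome-a"
    page: str       # page path inside the source genome
    short_sha: str  # HEAD short-sha of the disposable clone
    text: str       # page text, wikilinks already flattened


def raw_path(topic: str, day: date) -> str:
    """Path of the dated raw inside the working genome."""
    return f"raw/articles/crossgen-{topic}-{day.isoformat()}.md"


def render_dossier(working: str, topic: str, day: date, excerpts: list[Excerpt]) -> str:
    """Render the header line followed by one provenance heading per excerpt."""
    lines = [f'> Cross-genome pull | Into: {working} | Query: "{topic}" | Date: {day.isoformat()}']
    for e in excerpts:
        # \u2014 is the dash used as the separator in the template's provenance heading.
        lines.append(f"## From {e.genome} \u2014 {e.page} (HEAD {e.short_sha})")
        lines.append(e.text.strip())
    return "\n".join(lines) + "\n"
```

Since the filename carries the date, a second pull on a later day produces a new file and leaves the earlier raw untouched.
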
**Rules:**
- **Deterministic deposit.** The raw is written by the collector (the skill's mechanical side), never edited by an agent — agents never create, modify, or delete files in any `raw/`. Each pull is a **new, dated** file (raw is immutable).
- **Distillation happens at ingest, once.** The working genome's normal ingest turns the dossier into wiki pages and **deduplicates against existing pages** via its §Conflict procedure. There is no pre-summarization.
- **Bound large retrievals deterministically** (top-N pages / relevant sections) rather than adding an LLM pass; this keeps both the dossier raw and the ingest job at a manageable size at any scale.
- *Optional (large + expensive-cloud deployments only):* a cheap **local** pre-distillation may be inserted before an expensive cloud ingest to shrink its input. This is an opt-in optimization; the default is no synthesis.
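
A deterministic top-N bound, as called for in the rules above, might be sketched as follows. The `(score, page)` pair shape is an assumption for illustration and says nothing about what `qmd` actually returns.

```python
def bound_hits(hits: list[tuple[float, str]], top_n: int) -> list[str]:
    """Keep the top_n pages by score; ties break on page path so reruns are identical."""
    ranked = sorted(hits, key=lambda h: (-h[0], h[1]))
    return [page for _, page in ranked[:top_n]]
```

The explicit tie-break matters: without it, equal scores could surface in a different order between runs, and the same query might deposit different dossiers.
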