feat(cross-genome): Implement pull-based navigation skill and policy

Matteo Cherubini 2026-06-09 17:02:58 +02:00
parent abbf7362d9
commit 2c38d04d7f
2 changed files with 65 additions and 19 deletions

@@ -1020,14 +1020,9 @@ and keep the wiki atomically navigable.
### Linking conventions
| Type | Format |
| ---------------------- | ------------------------------------------- |
| Internal (same genome) | `[[folder/slug]]` — Obsidian wikilinks only |
| Cross-genome | `[[../genome-target/wiki/folder/slug]]` |
| External | `[text](https://url)` — standard Markdown |
Never use `[text](relative/path)` for internal references. Obsidian wikilinks are
bidirectional and appear in the graph view.
- **Intra-genome:** `[[folder/file]]` — Obsidian wikilinks only.
- **Cross-genome:** NOT supported via wikilink; submodule pointers make relative paths brittle. When a concept belongs to another genome, pull it in with the navigation skill: it deposits a crossgen raw into *this* genome's `raw/articles/` directory so the local ingest pipeline can process it. Nothing is ever written into the other genome.
- **External:** `[text](https://...)` — standard Markdown.
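The cross-genome rule is mechanically checkable: any wikilink whose target climbs out of the genome is a violation. Below is a minimal, hypothetical lint helper (`find_cross_genome_links` is not existing tooling), assuming that a target starting with `../`, as in the retired `[[../genome-target/wiki/folder/slug]]` form, is the only way a wikilink can leave the genome.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def find_cross_genome_links(text: str) -> list[tuple[int, str]]:
    """Return (line_number, target) for every wikilink that leaves the genome.

    Assumption: a target beginning with "../" points outside the current genome,
    which the linking conventions forbid.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WIKILINK.finditer(line):
            target = match.group(1).strip()
            if target.startswith("../"):
                hits.append((lineno, target))
    return hits
```

Running it over every page under `wiki/` and treating a non-empty result as a boundary violation would give a direct count for the "zero stale submodule-relative wikilinks" metric.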
### Log format

@@ -9,7 +9,7 @@
| Remote | `{{FORGEJO_URL}}/{{FORGEJO_USER}}/{{MASTER_REPO}}` |
**Role:** Cross-genome coordinator for the Knowledge Genome network.
**Metrics:** no cross-genome boundary violations · submodule pointers current · cross-genome wikilinks valid · no private data outside local network.
**Metrics:** no cross-genome boundary violations · submodule pointers current · cross-genome discoveries routed to target raw/ · zero stale submodule-relative wikilinks.
---
@@ -50,7 +50,7 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
1. Operate within ONE genome at a time. No atomic commits across multiple genomes.
2. `core-karpathy` is read-only. Never commit to it.
3. Cross-genome references use relative wikilinks only: `[[../genome-target/wiki/folder/page]]`.
3. Cross-genome references are NEVER expressed as wikilinks. When a concept belongs to another genome, pull it into the working genome with the navigation skill, which deposits a crossgen raw into the working genome's `raw/articles/`; that genome's own ingest pipeline then handles it asynchronously.
4. Never commit to `main` in any genome. PRs required; no self-merge.
5. Per-genome `AGENTS.md` governs all wiki operations within that genome. This file governs boundaries only.
@@ -59,6 +59,7 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
- Load multiple `wiki/index.md` files simultaneously for cross-genome comparison — use qmd.
- Run `git-crypt`, `bw`, or Vaultwarden commands — host responsibility.
- Modify files in more than one genome in the same operation.
- Create cross-genome wikilinks (e.g., `[[../genome-*/wiki/...]]`). All cross-domain connections must be pulled into the working genome via the navigation skill, as crossgen raws.
- Modify `core-karpathy` in any way.
### ASK FIRST
@@ -79,14 +80,64 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.
---
## Cross-Genome Lint
## Cross-Genome Pull (Navigation Skill)
_Manual, monthly — requires operator initiation. Not automated._
Cross-genome knowledge moves by **pull, never push**: the genome you are working in draws material *in*; nothing is ever written into another genome. The cross-genome reading is performed by a deterministic collector **outside any agent's context**, so the agent still operates within ONE genome (Immutable Rule 1 holds). The `cross_source` registry flag decides which genomes may be read as sources.
1. Use `qmd search "<concept>"` to find pages covering the same concept across genomes.
2. Identify:
- Concepts defined in 2+ genomes with potentially conflicting definitions.
- Entities referenced across genomes without a canonical cross-genome wikilink.
- Concepts in genome-X that should link to genome-Y but don't.
3. Report findings. Do not modify any files.
4. For each finding: create a conflict note in the genome where resolution belongs, following that genome's §Conflict procedure.
### How it works
Three actors, mirroring the ingest two-phase split:
1. **Collector** (`collect-crossgen.sh`, deterministic, agent-free). Clones each genome flagged `cross_source: yes` **read-only at its remote HEAD** — a disposable checkout, for freshness; never the pinned submodule state. Reads each `wiki/index.md` plus the relevant pages and assembles a **dossier of excerpts with provenance** (source genome, page, date/commit). Writes nothing to any source genome.
2. **Synthesis** (agent, navigation skill, `read`/`edit` only). Reads **only the dossier** — a single artifact inside the working genome's context — then the skill deposits **one** abstract, non-private raw into the working genome at `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`, and STOPS.
3. **Target ingest.** The working genome's own standard pipeline processes that raw → PR → human gate. Same gate as any other source.
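The collector's contract (read only `cross_source: yes` genomes, attach provenance to every excerpt, write nothing back) can be modelled as a pure function over already-cloned content. This is a sketch under stated assumptions, not the actual `collect-crossgen.sh`: the disposable clones are represented in memory, relevance is a naive case-insensitive substring match, and `SourceGenome`, `Excerpt`, and `build_dossier` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Excerpt:
    genome: str   # source genome name
    page: str     # path inside that genome, e.g. "wiki/concepts/x.md"
    commit: str   # remote HEAD the disposable clone was taken at
    text: str     # the excerpted paragraph


@dataclass
class SourceGenome:
    name: str
    cross_source: bool  # registry flag: may this genome be read as a source?
    head: str           # short SHA of the read-only clone
    pages: dict[str, str] = field(default_factory=dict)  # path -> content


def build_dossier(genomes: list[SourceGenome], concept: str) -> list[Excerpt]:
    """Collect one provenance-tagged excerpt per page mentioning `concept`.

    Assumptions (illustrative only): relevance is a case-insensitive substring
    match and the excerpt is the first matching paragraph. Nothing is written
    back to any genome; the returned list is the dossier.
    """
    needle = concept.lower()
    dossier = []
    for genome in genomes:
        if not genome.cross_source:
            continue  # cross_source: no -> never read, skipped before any access
        for page, content in sorted(genome.pages.items()):
            for paragraph in content.split("\n\n"):
                if needle in paragraph.lower():
                    dossier.append(Excerpt(genome.name, page, genome.head, paragraph.strip()))
                    break  # one excerpt per page keeps the dossier small
    return dossier
```

Keeping the collector a pure function of its inputs is what makes it deterministic: the same clones and the same concept always yield the same dossier, in the same order.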
### When to pull
Pull is initiated deliberately (operator- or context-driven, never on a timer). Produce a crossgen raw ONLY when all three hold:
1. **Ownership elsewhere.** The concept, entity, or pattern is defined and maintained in another genome, and you need it framed for the working domain.
2. **Structural relevance.** It influences decisions, patterns, or entities here — not a casual mention.
3. **No fresh local coverage.** `qmd search "<concept>"` in the working genome returns nothing, or only a stub that needs enrichment.
If in doubt, do NOT pull. A missed cross-reference is cheaper than crossgen spam.
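The three conditions form a strict conjunction with a conservative default. A minimal sketch; `Coverage`, `should_pull`, and the `in_doubt` flag are hypothetical, and mapping a `qmd search` result onto the three coverage buckets is assumed to happen before this check.

```python
from enum import Enum


class Coverage(Enum):
    NONE = "none"    # qmd search in the working genome returned nothing
    STUB = "stub"    # only a stub that needs enrichment
    FRESH = "fresh"  # fresh local coverage already exists


def should_pull(owned_elsewhere: bool, structurally_relevant: bool,
                local: Coverage, in_doubt: bool = False) -> bool:
    """Return True only when all three conditions hold and nothing is in doubt."""
    if in_doubt:
        # A missed cross-reference is cheaper than crossgen spam.
        return False
    return (
        owned_elsewhere
        and structurally_relevant
        and local in (Coverage.NONE, Coverage.STUB)
    )
```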
### Boundaries (enforced by the master)
- **Sources are restricted to `cross_source: yes` genomes.** A genome flagged `no` (e.g., a client or confidential genome) is NEVER read as a source — the collector skips it physically. The flag, enforced by the collector, decides what may flow; enforcement does not rely on the agent's discipline.
- **Sources are read-only, at HEAD.** No write, commit, branch, or PR in any genome other than the one being worked on.
- **NEVER `git submodule update --remote`.** Read other genomes via disposable read-only clones — never by moving this master's submodule pointers (that is ASK FIRST).
- **NEVER read `*/private/*`.** The skill runs with `PRIVATE_CONTEXT: disabled`, and `private/` is an encrypted blob; even on an unlocked host, private paths are off-limits.
- Confidential / client genomes are normally isolated from cross-genome pulls entirely (operator policy). Whatever genome a pull runs into, the output raw must be abstract and non-private.
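The read-side boundaries can be expressed as a single guard the collector consults before opening any file. A minimal sketch, assuming the registry is available as a mapping from genome name to its `cross_source` flag and that paths are relative to the disposable clone; `may_read` is hypothetical. Unknown genomes default to `no`, so enforcement fails closed.

```python
from pathlib import PurePosixPath


def may_read(registry: dict[str, bool], genome: str, relpath: str) -> bool:
    """Return True only if the collector may read `relpath` inside `genome`."""
    # Fail closed: a genome missing from the registry counts as cross_source: no.
    if not registry.get(genome, False):
        return False
    path = PurePosixPath(relpath)
    # Reject absolute paths and ".." so a read cannot escape the clone.
    if path.is_absolute() or ".." in path.parts:
        return False
    # Reject a "private" component at any depth (covers private/* and */private/*).
    if "private" in path.parts:
        return False
    return True
```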
### Output raw (the only artifact written)
**Path (in the working genome):** `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`
Plain text. No YAML frontmatter (raw is immutable input). **No wikilinks of any kind** — never a `[[../genome-*/...]]` path.
```markdown
> Cross-genome pull | Into: genome-<working> | Sources: genome-<a> (wiki/concepts/x.md), genome-<b> (wiki/entities/y.md) | HEAD: <short-sha…> | Date: YYYY-MM-DD
# <Topic> (synthesized from other genomes)
## What the source genomes say
[Abstract, faithful synthesis of the relevant material. Plain text, no private data, no wikilinks.]
## Relevance to this genome
[Why it matters in the working domain; textual references to existing local entities, if any.]
## Suggested local action
[Semantic hint for this genome's ingest: e.g., create/update wiki/concepts/<concept>.md, map local relationships.]
```
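The fixed parts of the raw (its dated path, the provenance line, and the no-frontmatter and no-wikilink rules) are mechanical enough to generate and check in code. A sketch under assumptions: the slug rule for `<topic>` (lowercase, runs of non-alphanumerics collapsed to `-`) is invented here because the policy does not specify one, the `HEAD` field is passed through as a preformatted string because its multi-source format is elided in the template, and `crossgen_path`, `provenance_line`, and `validate_raw` are hypothetical names.

```python
import re
from datetime import date


def crossgen_path(topic: str, day: date) -> str:
    """Path of the crossgen raw, relative to the working genome's root."""
    # Hypothetical slug rule: lowercase, runs of non-alphanumerics -> "-".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"raw/articles/crossgen-{slug}-{day.isoformat()}.md"


def provenance_line(working: str, sources: list[tuple[str, str]],
                    head: str, day: date) -> str:
    """Build the header line; `sources` holds (genome, page) pairs."""
    listed = ", ".join(f"{genome} ({page})" for genome, page in sources)
    return (f"> Cross-genome pull | Into: {working} | Sources: {listed} "
            f"| HEAD: {head} | Date: {day.isoformat()}")


def validate_raw(text: str) -> list[str]:
    """Return rule violations; an empty list means the raw is acceptable."""
    problems = []
    if text.startswith("---"):
        problems.append("YAML frontmatter is not allowed in a raw")
    if "[[" in text:
        problems.append("wikilinks are not allowed in a crossgen raw")
    return problems
```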
---
- Each pull writes a **new, dated** crossgen file — never overwrite or edit an existing raw (raw is immutable). Deduplication happens later, at the **wiki** level: the working genome's normal ingest reconciles against existing pages via its §Conflict procedure.
- The raw is processed by the working genome's standard ingest as an ordinary `raw/articles/` source — no special path.
- The collector and the raw deposit are the **deterministic** side of the skill; the agent only synthesizes content. Agents never create, modify, or delete files in any `raw/` directly.
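Because raws are immutable, the deterministic deposit should refuse to overwrite rather than silently replace. A minimal sketch, assuming the deposit is a plain function given the working genome's checkout root; `deposit_raw` is hypothetical. Opening with mode `"x"` requests exclusive creation, so an existing file makes `open()` raise `FileExistsError` instead of truncating it.

```python
from pathlib import Path


def deposit_raw(genome_root: Path, relpath: str, text: str) -> Path:
    """Create the crossgen raw once; never overwrite or edit an existing raw."""
    target = genome_root / relpath
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" is exclusive creation: FileExistsError if the file already exists.
    with open(target, "x", encoding="utf-8") as fh:
        fh.write(text)
    return target
```

A same-day, same-topic collision therefore surfaces as an error for the operator to resolve, not as an edit to an existing raw.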
---
That closes the remaining audit items for `agents-master.md`. The file is now fully pull-oriented and consistent with the dossier.