docs(crossgen): Streamline knowledge pull, remove agent synthesis step

parent 351e4a13af
commit 22239f4bb5
1 changed file with 18 additions and 21 deletions

@@ -84,13 +84,14 @@ Genome-level operations are governed by the genome's `AGENTS.md`, not this file.

 Cross-genome knowledge moves by **pull, never push**: the genome you are working in draws material *in*; nothing is ever written into another genome. The cross-genome reading is performed by a deterministic collector **outside any agent's context**, so the agent still operates within ONE genome (Immutable Rule 1 holds). The `cross_source` registry flag decides which genomes may be read as sources.

+There is **no separate synthesis step**: retrieving and then distilling twice would only add LLM cost and lose information. The collector *retrieves* (like a search) and deposits the result as a raw; the working genome's own ingest *distills* it once, for this genome's needs.
+
 ### How it works

-Three actors, mirroring the ingest two-phase split:
+Two actors:

-1. **Collector** (`collect-crossgen.sh`, deterministic, agent-free). Clones each genome flagged `cross_source: yes` **read-only at its remote HEAD** — a disposable checkout, for freshness; never the pinned submodule state. Reads each `wiki/index.md` plus the relevant pages and assembles a **dossier of excerpts with provenance** (source genome, page, date/commit). Writes nothing to any source genome.
-2. **Synthesis** (agent, navigation skill, `read`/`edit` only). Reads **only the dossier** — a single artifact inside the working genome's context — then the skill deposits **one** abstract, non-private raw into the working genome at `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`, and STOPS.
-3. **Target ingest.** The working genome's own standard pipeline processes that raw → PR → human gate. Same gate as any other source.
+1. **Collector** (`collect-crossgen.sh`, deterministic, agent-free). Clones each genome flagged `cross_source: yes` **read-only at its remote HEAD** — a disposable checkout, for freshness; never the pinned submodule state. The clone is **keyless**, so `private/` stays an encrypted blob and is unreadable. It indexes the public wikis with `qmd`, runs `qmd search "<topic>"`, and assembles a **dossier**: the text of the matching pages plus per-excerpt provenance (source genome, page, HEAD short-sha, date), with every `[[wikilink]]` neutralized to plain text. It deposits the dossier as **one** raw in the working genome at `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`, commits, and pushes. Nothing is written to any source genome.
+2. **Target ingest.** The working genome's standard ingest reads that raw as an ordinary source and distills it into wiki pages for the local domain — one semantic pass → PR → human gate. Same gate as any other source.

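The two deterministic decisions in the collector step (which genomes may be read, and where the raw lands) can be sketched in a few lines. This is only an illustration, not the contents of `collect-crossgen.sh`: the registry is assumed here to be a plain mapping from genome name to its `cross_source` value as a string, and the helper names `select_sources` and `raw_path` are hypothetical.

```python
from datetime import date


def select_sources(registry: dict[str, str]) -> list[str]:
    """Return the genomes flagged `cross_source: yes` (assumed registry shape).

    Sorting keeps the order reproducible from run to run.
    """
    return sorted(name for name, flag in registry.items() if flag == "yes")


def raw_path(topic: str, day: date) -> str:
    """Build the deposit path raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md."""
    return f"raw/articles/crossgen-{topic}-{day.isoformat()}.md"
```

A genome flagged `no` never appears in the returned list, so nothing downstream can clone or search it.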
 ### When to pull

@@ -104,34 +105,30 @@ If in doubt, do NOT pull. A missed cross-reference is cheaper than crossgen spam

 ### Boundaries (enforced by the master)

-- **Sources are restricted to `cross_source: yes` genomes.** A genome flagged `no` (e.g., a client / confidential file) is NEVER read as a source — the collector skips it physically. The wall decides what may flow; it does not rely on the agent's discipline.
+- **Sources are restricted to `cross_source: yes` genomes.** A genome flagged `no` (e.g., a client / confidential file) is NEVER read as a source — the collector skips it physically. The wall is structural, not a matter of the agent's discipline.
+- **Keyless collection.** The collector holds no git-crypt key, so `private/` stays ciphertext and cannot be read — privacy does not depend on the agent behaving.
 - **Sources are read-only, at HEAD.** No write, commit, branch, or PR in any genome other than the one being worked on.
 - **NEVER `git submodule update --remote`.** Read other genomes via disposable read-only clones — never by moving this master's submodule pointers (that is ASK FIRST).
-- **NEVER read `*/private/*`.** The skill runs `PRIVATE_CONTEXT: disabled` and `private/` is an encrypted blob; even on an unlocked host, private paths are off-limits.
-- Confidential / client genomes are normally isolated from cross-genome pulls entirely (operator policy). Whatever genome a pull runs into, the output raw must be abstract and non-private.
+- The deposited raw must contain **no wikilinks and no private data**; it is processed by the working genome's normal ingest + human gate.

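The "no wikilinks" boundary is mechanical enough to sketch. The exact flattening rule is not stated in the diff, so the version below is an assumption: a link of the form `[[target|label]]` becomes its label, and a bare `[[target]]` becomes its target text. The names `flatten_wikilinks` and `has_wikilinks` are hypothetical.

```python
import re

# Matches [[target]] or [[target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def flatten_wikilinks(text: str) -> str:
    """Replace each wikilink with plain text: the label if present, else the target."""
    return WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)


def has_wikilinks(text: str) -> bool:
    """Report whether any wikilink survives, e.g. as a pre-deposit check."""
    return WIKILINK.search(text) is not None
```

Running `has_wikilinks` on the flattened dossier before the deposit would turn the rule into a hard failure rather than a convention. Private data cannot be detected this way; that boundary rests on the keyless clone.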
 ### Output raw (the only artifact written)

 **Path (in the working genome):** `raw/articles/crossgen-<topic>-<YYYY-MM-DD>.md`
-Plain text. No YAML frontmatter (raw is immutable input). **No wikilinks of any kind** — never a `[[../genome-*/...]]` path.
+Plain text. No YAML frontmatter (raw is immutable input). **No wikilinks of any kind** — `[[...]]` from source pages are flattened to plain text so they never become broken cross-references here.

 ```markdown
-> Cross-genome pull | Into: genome-<working> | Sources: genome-<a> (wiki/concepts/x.md), genome-<b> (wiki/entities/y.md) | HEAD: <short-sha…> | Date: YYYY-MM-DD
+> Cross-genome pull | Into: genome-<working> | Query: "<topic>" | Date: YYYY-MM-DD

-# <Topic> (synthesized from other genomes)
+## From genome-<a> — wiki/concepts/<x>.md (HEAD <short-sha>)
+[retrieved page text — wikilinks flattened to plain text, no private data]

-## What the source genomes say
-[Abstract, faithful synthesis of the relevant material. Plain text, no private data, no wikilinks.]
-
-## Relevance to this genome
-[Why it matters in the working domain; textual references to existing local entities, if any.]
-
-## Suggested local action
-[Semantic hint for this genome's ingest: e.g., create/update wiki/concepts/<concept>.md, map local relationships.]
+## From genome-<b> — wiki/entities/<y>.md (HEAD <short-sha>)
+[retrieved page text]
 ```

 **Rules:**

-- Each pull writes a **new, dated** crossgen file — never overwrite or edit an existing raw (raw is immutable). Deduplication happens later, at the **wiki** level: the working genome's normal ingest reconciles against existing pages via its §Conflict procedure.
-- The raw is processed by the working genome's standard ingest as an ordinary `raw/articles/` source — no special path.
-- The collector and the raw deposit are the **deterministic** side of the skill; the agent only synthesizes content. Agents never create, modify, or delete files in any `raw/` directly.
+- **Deterministic deposit.** The raw is written by the collector (the skill's mechanical side), never edited by an agent — agents never create, modify, or delete files in any `raw/`. Each pull is a **new, dated** file (raw is immutable).
+- **Distillation happens at ingest, once.** The working genome's normal ingest turns the dossier into wiki pages and **deduplicates against existing pages** via its §Conflict procedure. There is no pre-summarization.
+- **Bound large retrievals deterministically** (top-N pages / relevant sections) rather than adding an LLM pass — keeps the dossier-raw and the ingest job reasonable at any scale.
+- *Optional (large + expensive-cloud deployments only):* a cheap **local** pre-distillation may be inserted before an expensive cloud ingest to shrink its input. This is an opt-in optimization; the default is no synthesis.
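Two of the rules above lend themselves to a short sketch: bounding a retrieval to the top N pages without an LLM pass, and refusing to overwrite an existing raw. The shape of the search hits (page path plus a numeric score) is an assumption, since the output format of `qmd search` is not shown in the diff, and `top_n` and `deposit` are hypothetical helper names.

```python
from pathlib import Path


def top_n(hits: list[tuple[str, float]], n: int) -> list[str]:
    """Keep the n best-scoring pages; ties break on page path so reruns agree."""
    ranked = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
    return [page for page, _ in ranked[:n]]


def deposit(path: Path, dossier: str) -> None:
    """Write a new raw file; mode "x" raises FileExistsError instead of overwriting."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("x", encoding="utf-8") as fh:
        fh.write(dossier)
```

Exclusive-create mode makes raw immutability a property of the write itself: a second pull on the same topic and date fails loudly rather than silently replacing the earlier file.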