feat: Document ingest security model and human-gated workflow
parent 2426b09b50
commit ab1141e132
1 changed file with 45 additions and 9 deletions
54
README.md
@@ -580,6 +580,17 @@ This means: any file matching `**/private/**` in `.gitattributes` is protected,
 including future `private/` directories created anywhere in the repo.
 The hook never needs updating when the encryption rules change.
 
+### Untrusted agent output — manifest validation
+
+The ingest agent's output is stochastic: a hallucinated manifest could carry a missing field,
+a wrong type, or a malicious path such as `wiki/../../etc/passwd`. `run-ingest.sh` therefore
+**validates the manifest before trusting any field** — it must be well-formed JSON with a
+string `raw_source` and an array `pages`, and **every `path` must be a string under `wiki/`
+with no `..`**. Anything else fails fast with a structured `{"status":"error"}` and no
+filesystem access outside the wiki, so a bad path can't drive a read or a lint outside the
+knowledge tree. This is the trust boundary between the (stochastic) model and the
+(deterministic, tested) post-processor.
+
 ### PRIVATE_CONTEXT toggle
 
 The `PRIVATE_CONTEXT` toggle in `AGENTS.md` controls whether the LLM agent
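The validation rule added in the hunk above can be sketched in Python as follows. This is only an illustration under stated assumptions: the real check lives in `run-ingest.sh`, which this diff does not show. The function name `validate_manifest`, the `reason` field, and the assumption that each entry of `pages` is an object carrying a `path` key are hypothetical.

```python
import json


def _error(reason: str) -> dict:
    # Structured failure, mirroring the {"status":"error"} shape the README names.
    # The "reason" field is a hypothetical addition for readability.
    return {"status": "error", "reason": reason}


def validate_manifest(text: str) -> dict:
    """Validate manifest text before any field is trusted (illustrative sketch)."""
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as exc:
        return _error(f"invalid JSON: {exc.msg}")

    if not isinstance(manifest, dict):
        return _error("manifest must be a JSON object")
    if not isinstance(manifest.get("raw_source"), str):
        return _error("raw_source must be a string")

    pages = manifest.get("pages")
    if not isinstance(pages, list):
        return _error("pages must be an array")

    for page in pages:
        # Assumption: each page is an object with a "path" key.
        path = page.get("path") if isinstance(page, dict) else None
        if not isinstance(path, str):
            return _error("every path must be a string")
        # Confine to wiki/ and reject any ".." anywhere in the path.
        if not path.startswith("wiki/") or ".." in path:
            return _error(f"path escapes wiki/: {path!r}")

    return {"status": "ok", "manifest": manifest}
```

No filesystem call happens before the checks pass, which is the property the README paragraph relies on: a path such as `wiki/../../etc/passwd` is rejected as a string, never resolved.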
@@ -753,9 +764,9 @@ For Forgejo webhook → automated ingest:
 1. Forgejo sends webhook on push to `raw/`
 2. n8n receives webhook, identifies new files
 3. n8n starts one agent session per new file (sequential, not parallel)
-4. Each session: inject `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
-5. Phase 1 agent (`/skill:ingest`) writes the manifest; Phase 2 `run-ingest.sh` opens the PR
-6. Human reviews and merges PR
+4. Each session: realign the checkout to the base (`git switch <base> && git reset --hard origin/<base>`), then inject `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
+5. Phase 1 agent (`/skill:ingest`) writes the manifest; Phase 2 `run-ingest.sh` opens the PR, then **stops**
+6. Human reviews — **merge to accept**, or close the PR + delete the `feat` branch to reject
 
 ---
 
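Step 4 of the list above injects three things into each session: the last 20 lines of `wiki/log.md`, the `PRIVATE_CONTEXT` state, and the source path. A rough Python sketch of that assembly follows. The function name `build_session_context` and the exact text layout are assumptions made for illustration; the diff does not show how n8n actually formats the injected context.

```python
def build_session_context(
    log_text: str,
    private_context: str,
    source_path: str,
    tail: int = 20,
) -> str:
    """Assemble the per-session context block (hypothetical layout)."""
    # Equivalent of `tail -n 20 wiki/log.md`: keep only the most recent entries.
    recent = log_text.splitlines()[-tail:]
    lines = [
        "Recent log entries:",
        *recent,
        f"PRIVATE_CONTEXT: {private_context}",
        f"Source: {source_path}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    sample_log = "\n".join(f"entry {i}" for i in range(25))
    print(build_session_context(sample_log, "disabled", "raw/example.md"))
```

Keeping the log excerpt bounded means the agent sees recent history without spending context on the whole file, which matches the one-source-per-session design in the list.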
@@ -778,15 +789,21 @@ only (no shell). It:
 6. Writes `.ingest-manifest.json` (the list of pages it created/modified, the model name,
 a one-line reasoning, the PR summary, and any contradictions) — then **stops**
 
-**Phase 2 — `run-ingest.sh` (deterministic, outside the agent).** The post-processor
-consumes the manifest and does the mechanical work the model must not waste context on:
+**Phase 2 — `run-ingest.sh` (deterministic, outside the agent).** The post-processor first
+**validates the manifest** — well-formed JSON, expected shape, and every page path confined to
+`wiki/` with no `..` (see [Security Model](#security-model)) — then does the mechanical work the
+model must not waste context on:
 
-7. Inserts each page into the correct `wiki/index.md` section **in alphabetical order**
-(`index-append.py`) and bumps the index `last_updated`
-8. Appends the `INGEST | <slug>` entry to `wiki/log.md`
+7. Inserts each page into the correct `wiki/index.md` section **in alphabetical order**,
+deduplicated by wikilink (a re-ingest updates the entry, never duplicates it), and bumps the
+index `last_updated` (`index-append.py`)
+8. Appends the `INGEST | <slug>` entry to `wiki/log.md` (the model name comes from the
+orchestrator via `INGEST_MODEL` — the agent cannot reliably know its own tag)
 9. Runs scoped lint on exactly the pages touched this run (`scoped-lint.sh`, reusing
 `lib/lint.sh`)
-10. Commits on `feat/ai-ingest-<slug>` and opens the PR using `templates/pr-description.md`
+10. Commits **only `wiki/`** on `feat/ai-ingest-<slug>` and opens a PR against the integration
+base (`INGEST_BASE`, default `main`); the body matches the `templates/pr-description.md`
+structure (Summary / Pages / Contradictions / Scoped Lint)
 11. Emits a single compact JSON line (status, slug, PR url, lint_clean, conflict) for n8n
 
 The agent never runs git, never edits the index/log mechanically, and never lints — those
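Step 7 in the hunk above describes an insert that is both alphabetical and deduplicated by wikilink. The Python sketch below shows one way such an upsert could work on the entries of a single index section. It is not the code of `index-append.py`, which the diff does not include; the entry format `- [[slug]]: summary` and the helper names `link_target` and `upsert_entry` are assumptions.

```python
import re

# Matches the target of the first [[wikilink]] on a line, ignoring any
# alias (after "|") or heading anchor (after "#").
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_target(entry: str) -> str:
    """Return the normalized wikilink target used as the dedup and sort key."""
    match = WIKILINK.search(entry)
    return (match.group(1) if match else entry).strip().lower()


def upsert_entry(entries: list[str], new_entry: str) -> list[str]:
    """Insert new_entry alphabetically, replacing any entry with the same wikilink."""
    key = link_target(new_entry)
    kept = [e for e in entries if link_target(e) != key]  # drop the stale copy
    kept.append(new_entry)
    return sorted(kept, key=link_target)


if __name__ == "__main__":
    section = ["- [[alpha]]: first", "- [[gamma]]: third"]
    section = upsert_entry(section, "- [[beta]]: second")
    section = upsert_entry(section, "- [[beta]]: revised")  # re-ingest updates in place
    print("\n".join(section))
```

Keying on the wikilink target rather than the whole line is what makes a re-ingest idempotent: the summary text may change between runs, but the page identity does not.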
@@ -802,6 +819,25 @@ For private sources (`PRIVATE_CONTEXT: enabled` required):
 - All output goes to `wiki/private/<slug>.md` only
 - PR title: `[PRIVATE] ingest: <slug>`
 
+**Branch lifecycle & the manual gate.** `run-ingest.sh` / `open-pr.sh` are deliberately
+"dumb": they create the `feat/ai-ingest-<slug>` branch, commit only `wiki/`, open the PR, and
+stop. They never reset, revert, or touch the integration branch — that lifecycle belongs to
+the orchestrator, around the human gate:
+
+- **Before each session** the orchestrator realigns the checkout to the base
+(`git fetch && git switch <base> && git reset --hard origin/<base>`) — a reset of the _local_
+checkout to match the remote, never a force-push to the shared branch.
+- **After the PR opens, everything stops** until a human approves: one source per session,
+sequential, no new ingest until the pending PR is closed.
+- **Approve = merge. Reject = close the PR and delete the remote `feat` branch.** To undo an
+already-merged ingest, open a _revert PR_ against the base — never rewrite history on a
+shared branch.
+
+The PR base is configurable via `INGEST_BASE` (default `main`). Per-page `maturity` already
+encodes stability and tags/releases mark versioned snapshots, so `main` is the integration
+branch today. If a linked project later _consumes_ a genome, set `INGEST_BASE=develop` to
+buffer ingests on `develop` and cut manual `develop → main` releases — no code change.
+
 ### Query
 
 Triggered by an operator question.
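The pre-session realignment and the `INGEST_BASE` default described in the hunk above can be sketched in Python as follows. The helper names `ingest_base` and `realign_commands` are hypothetical, and the orchestrator itself is not part of this diff; only the git commands and the `main` default come from the README text.

```python
import os
from collections.abc import Mapping
from typing import Optional


def ingest_base(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the integration base: INGEST_BASE if set and non-empty, else main."""
    env = os.environ if env is None else env
    return env.get("INGEST_BASE") or "main"


def realign_commands(base: str) -> list[list[str]]:
    """Commands that reset the local checkout to match origin/<base>.

    Nothing here pushes: the reset only affects the local working copy.
    """
    return [
        ["git", "fetch"],
        ["git", "switch", base],
        ["git", "reset", "--hard", f"origin/{base}"],
    ]


if __name__ == "__main__":
    for cmd in realign_commands(ingest_base()):
        print(" ".join(cmd))
```

An orchestrator could run each command in order with `subprocess.run(cmd, check=True)` so that a failed fetch or switch aborts the session before the agent starts. Passing the commands as argument lists rather than a shell string avoids quoting problems if the base name ever contains unusual characters.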