---
name: ingest
description: Semantic pass of a single raw source into the current genome's wiki — read the source, write sources/entities/concepts, handle contradictions, then emit a manifest and STOP. Use when a new file lands in raw/. Does NOT do git, log, index, lint, or PRs (a post-processor handles those), and does NOT handle private sources or project repos.
license: see repository
compatibility: Runs inside one genome checkout (cwd = genome root). Tools needed — read, edit only. NO bash, NO git. The deterministic steps (index, log, scoped lint, PR) run AFTER you exit, via run-ingest.sh. PRIVATE_CONTEXT must be disabled.
allowed-tools: read edit
metadata:
  framework: knowledge-genome
  phase: "1-ingest-semantic"
---
# Ingest — semantic pass
You run inside ONE genome checkout. `AGENTS.md` (already in your context) is the
authoritative contract. Your job is the **semantic pass only**: read the source, write
the wiki pages, handle contradictions. You do **not** touch git, the log, the index, the
linter, or PRs — a post-processor (`run-ingest.sh`) does all of that _after you stop_,
from the manifest you leave behind. This keeps your context clean and your turns few,
which matters on a small local model.

**Argument:** the relative path of the single raw source to ingest
(e.g. `raw/articles/foo.md`). Process only this one.
## Pre-flight — stop the session if any check fails
1. Refuse if the argument path is under any `private/` directory.
2. Refuse if `PRIVATE_CONTEXT` is not `disabled`.
3. Confirm the file exists under `raw/`.
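The three checks can be read as one small predicate. The sketch below is illustrative only: the ingest agent has no shell and does not run it. The function name `preflight`, its parameters, and the idea that `PRIVATE_CONTEXT` arrives as a plain string are assumptions, not part of the `AGENTS.md` contract.

```python
from pathlib import Path, PurePosixPath


def preflight(source_arg: str, private_context: str, genome_root: Path) -> list[str]:
    """Return refusal reasons; an empty list means every check passed."""
    problems = []
    parts = PurePosixPath(source_arg).parts

    # Check 1: refuse anything under a private/ directory, at any depth.
    if "private" in parts[:-1]:
        problems.append("source is under a private/ directory")

    # Check 2: PRIVATE_CONTEXT must be exactly "disabled" (assumed to be a string).
    if private_context != "disabled":
        problems.append("PRIVATE_CONTEXT is not disabled")

    # Check 3: the path must be relative, start with raw/, and exist as a file.
    if not parts or parts[0] != "raw" or ".." in parts:
        problems.append("source is not under raw/")
    elif not (genome_root / source_arg).is_file():
        problems.append("source file does not exist")

    return problems
```

Any non-empty result means the session stops before reading the source or writing a page.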
## Semantic work (your only job)
1. Read the source once.
2. Write `wiki/sources/<kebab-slug>.md` — faithful summary + key points, with the required
frontmatter (`type: source`, `domain: <genome>`, `maturity: draft`,
`last_updated: <today>`, `private: false`, sensible `tags`).
3. For each entity (person, tool, org) → create or update `wiki/entities/<kebab-name>.md`.
4. For each concept (pattern, theory, decision) → create or update
`wiki/concepts/<kebab-name>.md`.
5. On a real contradiction with an existing claim, follow `AGENTS.md` §Conflict: create
   `wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md`. Never overwrite the existing page.

Name files in kebab-case and pick stable names. Read `wiki/index.md` (and the specific
pages it points to) to decide create-vs-update and to spot contradictions. Do not scan
whole directories.
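For the naming rules, a minimal sketch of one way to derive a kebab-case slug and the conflict-file path from step 5. The helper names `kebab` and `conflict_path` are hypothetical, and folding accented characters to ASCII is an assumption; `AGENTS.md` remains the authority if it specifies otherwise.

```python
import re
import unicodedata
from datetime import date


def kebab(name: str) -> str:
    """Lowercase, ASCII-fold, and join alphanumeric runs with single hyphens."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", folded.lower()))


def conflict_path(concept: str, day: date) -> str:
    """Build the conflict-query path for a concept and a date (YYYY-MM-DD)."""
    return f"wiki/queries/conflict-{kebab(concept)}-{day.isoformat()}.md"
```

Deriving the slug from the entity or concept name alone, rather than from the source title, is what keeps names stable across later ingests of different sources.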
## Finish: write the manifest, then STOP
As your **final action**, write `.ingest-manifest.json` at the genome root
(NOT under `wiki/`) describing exactly what you did. Then stop — do not commit, lint,
append to the log or index, or open a PR.
```json
{
  "raw_source": "raw/articles/foo.md",
  "reasoning": "One sentence for the log: what changed and why.",
  "pr_summary": "One or two sentences describing this ingest for the PR.",
  "contradictions": "None (or: 1 conflict file created — <concept>)",
  "pages": [
    {
      "path": "wiki/sources/foo.md",
      "summary": "One-line index summary.",
      "maturity": "draft",
      "status": "created"
    },
    {
      "path": "wiki/entities/acme.md",
      "summary": "Acme — vendor.",
      "status": "modified"
    }
  ]
}
```
Manifest rules:
- List every page you created or modified, with `status` `created` or `modified`.
- `summary` is the one-line index description (about 12 words at most). For conflict pages the
summary is ignored, because the index lists conflicts by slug only.
- `maturity` is required only on `created` pages (it seeds the new index entry). It is
ignored for `modified` pages, so omit it there.
- Do NOT add a `model` field — the orchestrator records which model produced this run; you
cannot know your own model name reliably, so do not guess one.
- Do not invent a `run_id`, branch, commit, or PR — those belong to the post-processor.
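The rules above can be expressed as a small self-check. This is a sketch only, not the post-processor's actual validation. The function name `check_manifest` is hypothetical. Treating all five top-level keys from the example as required is an assumption, as are the forbidden key names `branch`, `commit`, and `pr`.

```python
import json

# Assumption: every top-level key shown in the example manifest is required.
REQUIRED_KEYS = {"raw_source", "reasoning", "pr_summary", "contradictions", "pages"}
# "model" and "run_id" come from the rules; the other names are guessed spellings.
FORBIDDEN_KEYS = {"model", "run_id", "branch", "commit", "pr"}


def check_manifest(text: str) -> list[str]:
    """Return rule violations found in a manifest; an empty list means none."""
    manifest = json.loads(text)
    errors = []

    missing = REQUIRED_KEYS - set(manifest)
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    for key in sorted(FORBIDDEN_KEYS & set(manifest)):
        errors.append(f"field not allowed: {key}")

    for page in manifest.get("pages", []):
        path = page.get("path", "<no path>")
        status = page.get("status")
        if status not in ("created", "modified"):
            errors.append(f"{path}: status must be created or modified")
        # maturity seeds the index entry for new pages and is ignored otherwise.
        if status == "created" and "maturity" not in page:
            errors.append(f"{path}: created pages need maturity")
        if status == "modified" and "maturity" in page:
            errors.append(f"{path}: omit maturity on modified pages")

    return errors
```

Run against the example manifest, this returns an empty list; adding a `model` field or dropping `maturity` from the created page each produces one error.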
One source per session. After writing the manifest, stop.