---
name: ingest
description: Semantic pass of a single raw source into the current genome's wiki. The model ONLY extracts structured semantic content (summary, entities, concepts, contradictions) and returns one JSON object; it does not write files, produce frontmatter, slugs, git, index, log or PRs. A deterministic conform script (ingest-semantic.py) turns that JSON into properly-structured wiki pages + a manifest; run-ingest.sh then does index/log/lint/PR.
license: see repository
compatibility: Driven by scripts/ingest-semantic.py (one schema-constrained call to a local model via Ollama /api/chat). NO agent tools are used (no read, no edit, no bash). The model never touches the filesystem. PRIVATE_CONTEXT must be disabled.
metadata:
  framework: knowledge-genome
  phase: "1-ingest-semantic"
  mode: structured-json # lightweight agent + deterministic conform
---

# Ingest: semantic pass (structured-JSON)

This is the **light** semantic pass. The model's only job is to read one source and return a single JSON object describing what the source contains. It does **not** write files, choose paths, produce frontmatter, pick slugs, or touch git / index / log / PRs.

All structure is owned by `scripts/ingest-semantic.py`, which conforms the model's JSON into wiki pages with enforced kebab-case paths and frontmatter, and writes `.ingest-manifest.json` in the exact schema `run-ingest.sh` consumes. This keeps the agent minimal and makes the output impossible to mis-shape, regardless of how small or quirky the local model is.

Pipeline:

    cd
    scripts/ingest-semantic.py raw/articles/.md   # phase 1 (this)
    scripts/run-ingest.sh                         # phase 2 (deterministic)

## Pre-flight (enforced by ingest-semantic.py, not by the model)

1. Refuse if the source path is under any `private/` directory.
2. Refuse if `PRIVATE_CONTEXT` is not `disabled`.
3. Confirm the file exists under `raw/` and is non-empty.
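
The three pre-flight checks can be sketched in Python as follows. This is a minimal illustration, not the actual code of `ingest-semantic.py`: the `preflight` function, the `PreflightError` class and the `genome_root` parameter are hypothetical names, and reading "under `raw/`" as "first path component relative to the genome root" is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


class PreflightError(Exception):
    """Raised when a source must not be ingested (hypothetical name)."""


def preflight(source: Path, genome_root: Path,
              env: Optional[Mapping[str, str]] = None) -> Path:
    """Run the three pre-flight checks and return the resolved source path."""
    env = os.environ if env is None else env
    resolved = source.resolve()
    try:
        # Work with the path relative to the genome so that unrelated
        # parent directories cannot trigger the private/ check.
        rel = resolved.relative_to(genome_root.resolve())
    except ValueError:
        raise PreflightError(f"{source} is outside the genome")

    # 1. Refuse anything under a private/ directory.
    if "private" in rel.parts[:-1]:
        raise PreflightError(f"{rel} is under a private/ directory")

    # 2. Refuse unless PRIVATE_CONTEXT is exactly "disabled".
    if env.get("PRIVATE_CONTEXT") != "disabled":
        raise PreflightError("PRIVATE_CONTEXT must be 'disabled'")

    # 3. The file must live under raw/ and be non-empty.
    if not rel.parts or rel.parts[0] != "raw":
        raise PreflightError(f"{rel} is not under raw/")
    if not resolved.is_file() or resolved.stat().st_size == 0:
        raise PreflightError(f"{rel} is missing or empty")

    return resolved
```

Passing `env` explicitly keeps the check testable without mutating the process environment; when it is omitted, the sketch falls back to `os.environ`.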
## What the model returns (the only contract)

A single JSON object, decoding-constrained to this shape via Ollama's `format`:

```json
{
  "source_title": "Human title of the source",
  "source_summary": "Faithful, self-contained prose summary of the source.",
  "key_points": ["Concrete fact or claim worth indexing", "..."],
  "entities": [
    { "name": "Acme", "kind": "org", "description": "Vendor referenced by the source." }
  ],
  "concepts": [
    { "name": "JWT RS256", "description": "Asymmetric token signing scheme the source uses." }
  ],
  "contradictions": [
    { "concept": "auth", "description": "Source claims X, contradicting the existing claim Y." }
  ],
  "reasoning": "One sentence for the log: what this source adds.",
  "pr_summary": "One or two sentences describing this ingest for the PR."
}
```

Field rules (guidance for the model; the script enforces _structure_):

- `source_summary` is faithful and in the source's own language. No markdown headings inside any description field. No padding.
- `entities` = every person, tool, org or product the source names. `kind` ∈ `person|tool|org|product`. `description` = one or two factual sentences.
- `concepts` = every pattern, theory, decision or named idea the source explains.
- `contradictions` = only a claim that directly contradicts a widely-known fact or contradicts the source itself; otherwise an empty list.
- Names are the natural name of the thing. The script normalises them to kebab-case and guarantees a single stable page per entity/concept.

## What the conform script guarantees (so the model cannot break it)

- **Paths:** `wiki/sources/.md`, `wiki/entities/.md`, `wiki/concepts/.md`, `wiki/queries/conflict--.md`.
- **Slugs:** minimal kebab-case (lowercase, digits, hyphens; no spaces / underscores / capitals).
- **Frontmatter:** `type`, `domain: `, `maturity: draft`, `last_updated: `, `private: false`, `tags`.
- **Create-vs-update:** existing entity/concept pages are **appended to** (a section attributed to the new source), never overwritten. The source page is the canonical summary of that exact source and is (re)written.
- **Manifest:** `.ingest-manifest.json` with `raw_source`, `reasoning`, `pr_summary`, `contradictions` (string), and `pages[]` (`path`, `summary`, `status`, plus `maturity` on created pages), exactly what `run-ingest.sh` validates.

The model name is recorded by the orchestrator (`INGEST_MODEL`); the model does not self-report it. No `run_id`, branch, commit or PR is invented here; those belong to phase 2.

> Interactive use of `pi` (TUI) is unaffected and still available for manual
> exploration. The **automated** ingest path no longer relies on `pi` or on
> native tool-calling: it is the single schema-constrained call above.
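
The slug and create-vs-update guarantees listed under the conform script can be illustrated with a short Python sketch. It is not the actual `ingest-semantic.py` code: `slugify`, `upsert_page`, the `"created"` / `"updated"` return values and the `## From …` heading for appended sections are all hypothetical choices made for illustration. One further assumption is that non-ASCII characters are simply treated as separators.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Normalise a natural name to kebab-case (lowercase, digits, hyphens)."""
    # Every run of characters outside [a-z0-9] collapses into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a slug from {name!r}")
    return slug


def upsert_page(wiki_root: Path, section: str, name: str,
                body: str, source_title: str) -> str:
    """Create an entity/concept page, or append to it if it already exists.

    `section` is "entities" or "concepts". Returns "created" or "updated"
    (hypothetical status values).
    """
    path = wiki_root / section / f"{slugify(name)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    if path.exists():
        # Existing pages are never overwritten: add a section that is
        # attributed to the new source (heading format is an assumption).
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"\n## From {source_title}\n\n{body}\n")
        return "updated"
    path.write_text(f"# {name}\n\n{body}\n", encoding="utf-8")
    return "created"
```

Because the slug is derived only from the name, `"JWT RS256"` and `"jwt_rs256"` both resolve to `jwt-rs256.md`, which is what makes a single stable page per entity or concept possible regardless of how the model spells the name.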