Merge branch 'release/1.3.0' into main
commit 4b99b0acd2
4 changed files with 361 additions and 74 deletions
2
Makefile
@@ -1,5 +1,5 @@
 # =============================================================================
-# Knowledge Genome - Makefile v. 1.2.5
+# Knowledge Genome - Makefile v. 1.3.0
 # Orchestrates the setup and management of the knowledge base.
 # =============================================================================
 
@@ -1,93 +1,92 @@
 ---
 name: ingest
-description: Semantic pass of a single raw source into the current genome's wiki — read the source, write sources/entities/concepts, handle contradictions, then emit a manifest and STOP. Use when a new file lands in raw/. Does NOT do git, log, index, lint, or PRs (a post-processor handles those), and does NOT handle private sources or project repos.
+description: Semantic pass of a single raw source into the current genome's wiki. The model ONLY extracts structured semantic content (summary, entities, concepts, contradictions) and returns one JSON object — it does not write files, produce frontmatter, slugs, git, index, log or PRs. A deterministic conform script (ingest-semantic.py) turns that JSON into properly-structured wiki pages + a manifest; run-ingest.sh then does index/log/lint/PR.
 license: see repository
-compatibility: Runs inside one genome checkout (cwd = genome root). Tools needed — read, edit only. NO bash, NO git. The deterministic steps (index, log, scoped lint, PR) run AFTER you exit, via run-ingest.sh. PRIVATE_CONTEXT must be disabled.
-allowed-tools: read edit
+compatibility: Driven by scripts/ingest-semantic.py (one schema-constrained call to a local model via Ollama /api/chat). NO agent tools are used — no read, no edit, no bash. The model never touches the filesystem. PRIVATE_CONTEXT must be disabled.
 metadata:
   framework: knowledge-genome
   phase: "1-ingest-semantic"
+  mode: structured-json # lightweight agent + deterministic conform
 ---
 
-# Ingest — semantic pass
+# Ingest — semantic pass (structured-JSON)
 
-You run inside ONE genome checkout. `AGENTS.md` (already in your context) is the
-authoritative contract. Your job is the **semantic pass only**: read the source, write
-the wiki pages, handle contradictions. You do **not** touch git, the log, the index, the
-linter, or PRs — a post-processor (`run-ingest.sh`) does all of that _after you stop_,
-from the manifest you leave behind. This keeps your context clean and your turns few,
-which matters on a small local model.
+This is the **light** semantic pass. The model's only job is to read one source
+and return a single JSON object describing what the source contains. It does
+**not** write files, choose paths, produce frontmatter, pick slugs, or touch
+git / index / log / PRs. All structure is owned by `scripts/ingest-semantic.py`,
+which conforms the model's JSON into wiki pages with enforced kebab-case paths
+and frontmatter, and writes `.ingest-manifest.json` in the exact schema
+`run-ingest.sh` consumes. This keeps the agent minimal and makes the output
+impossible to mis-shape, regardless of how small or quirky the local model is.
 
-**Argument:** the relative path of the single raw source to ingest
-(e.g. `raw/articles/foo.md`). Process only this one.
+Pipeline:
 
-## Pre-flight — stop the session if any check fails
+    cd <genome checkout>
+    scripts/ingest-semantic.py <genome> raw/articles/<file>.md # phase 1 (this)
+    scripts/run-ingest.sh <genome> # phase 2 (deterministic)
 
-1. Refuse if the argument path is under any `private/` directory.
+## Pre-flight (enforced by ingest-semantic.py, not by the model)
+
+1. Refuse if the source path is under any `private/` directory.
 2. Refuse if `PRIVATE_CONTEXT` is not `disabled`.
-3. Confirm the file exists under `raw/`.
+3. Confirm the file exists under `raw/` and is non-empty.
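
The path-level part of these pre-flight checks can be sketched as a pure function. This is an illustrative variant, not the script's actual code: `preflight_reason` is a hypothetical name, and it matches `private` as a whole path component (an assumption about intent), whereas the script in this commit uses a substring test. The existence and non-empty checks need the filesystem and are left out.

```python
from pathlib import PurePosixPath


def preflight_reason(raw_rel, private_context="disabled"):
    """Return a refusal reason, or None when the path-level checks pass."""
    # PurePosixPath drops "." components, so "./raw/x.md" and "raw/x.md" agree.
    parts = PurePosixPath(raw_rel).parts
    if "private" in parts:  # a component named exactly "private"
        return "refusing private source"
    if private_context != "disabled":
        return "PRIVATE_CONTEXT must be disabled"
    if not parts or parts[0] != "raw":
        return "source must live under raw/"
    return None


if __name__ == "__main__":
    for candidate in ("raw/articles/foo.md", "raw/private/foo.md", "notes/foo.md"):
        print(candidate, "->", preflight_reason(candidate))
```

Matching on components means a directory such as `raw/notprivate/` is not refused, which a substring test on `private/` would do.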
 
-## Semantic work (your only job)
+## What the model returns (the only contract)
 
-1. Read the source once.
-2. Write `wiki/sources/<kebab-slug>.md` — faithful summary + key points, with the required
-   frontmatter (`type: source`, `domain: <genome>`, `maturity: draft`,
-   `last_updated: <today>`, `private: false`, sensible `tags`).
-3. For each entity (person, tool, org) → create or update `wiki/entities/<kebab-name>.md`.
-4. For each concept (pattern, theory, decision) → create or update
-   `wiki/concepts/<kebab-name>.md`.
-5. On a real contradiction with an existing claim, follow `AGENTS.md` §Conflict: create
-   `wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md`. Never overwrite the existing page.
-
-**Naming — you are the sole author of these names; nothing renames your files.** Use
-minimal kebab-case: lowercase letters, digits and hyphens only — no spaces, no underscores,
-no capitals. Pick stable names so the same entity is never created twice (always `acme`,
-never also `acme-corp`). The path you write a file to MUST be byte-for-byte the path you
-list in the manifest.
-
-**Deciding create-vs-update and spotting contradictions — mind the context budget.** Use
-`wiki/index.md` to locate existing pages, then read **only** the handful that _this source
-actually names_ — the entities and concepts in the source's title and opening paragraphs —
-not everything the index lists. When in doubt, read fewer: a missed cross-link is far
-cheaper than a saturated context. Never scan whole directories.
-
-## Finish: write the manifest, then STOP
-
-As your **final action**, write `.ingest-manifest.json` at the genome root
-(NOT under `wiki/`) describing exactly what you did. Then stop — do not commit, lint,
-append to the log/index, or open anything.
+A single JSON object, decoding-constrained to this shape via Ollama's `format`:
 
 ```json
 {
-  "raw_source": "raw/articles/foo.md",
-  "reasoning": "One sentence for the log: what changed and why.",
-  "pr_summary": "One or two sentences describing this ingest for the PR.",
-  "contradictions": "None (or: 1 conflict file created — <concept>)",
-  "pages": [
-    {
-      "path": "wiki/sources/foo.md",
-      "summary": "One-line index summary.",
-      "maturity": "draft",
-      "status": "created"
-    },
-    {
-      "path": "wiki/entities/acme.md",
-      "summary": "Acme — vendor.",
-      "status": "modified"
-    }
-  ]
+  "source_title": "Human title of the source",
+  "source_summary": "Faithful, self-contained prose summary of the source.",
+  "key_points": ["Concrete fact or claim worth indexing", "..."],
+  "entities": [
+    { "name": "Acme", "kind": "org", "description": "Vendor referenced by the source." }
+  ],
+  "concepts": [
+    { "name": "JWT RS256", "description": "Asymmetric token signing scheme the source uses." }
+  ],
+  "contradictions": [
+    { "concept": "auth", "description": "Source claims X, contradicting the existing claim Y." }
+  ],
+  "reasoning": "One sentence for the log: what this source adds.",
+  "pr_summary": "One or two sentences describing this ingest for the PR."
 }
 ```
 
-Manifest rules:
+Field rules (guidance for the model; the script enforces _structure_):
 
-- List every page you created or modified, with `status` `created` or `modified`.
-- `summary` is the one-line index description (≈12 words max). For conflict pages the
-  summary is ignored — the index lists conflicts by slug only.
-- `maturity` is required only on `created` pages (it seeds the new index entry). It is
-  ignored for `modified` pages, so omit it there.
-- Do NOT add a `model` field — the orchestrator records which model produced this run; you
-  cannot know your own model name reliably, so do not guess one.
-- Do not invent a `run_id`, branch, commit, or PR — those belong to the post-processor.
+- `source_summary` is faithful and in the source's own language. No markdown
+  headings inside any description field. No padding.
+- `entities` = every person, tool, org or product the source names. `kind` ∈
+  `person|tool|org|product`. `description` = one or two factual sentences.
+- `concepts` = every pattern, theory, decision or named idea the source explains.
+- `contradictions` = only a claim that directly contradicts a widely-known fact
+  or contradicts the source itself; otherwise an empty list.
+- Names are the natural name of the thing. The script normalises them to
+  kebab-case and guarantees a single stable page per entity/concept.
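
The field rules above are guidance for the model, so a caller may want to check them after the fact. The sketch below is a hypothetical helper (`field_rule_violations` is not part of the commit) that flags the two rules which are mechanically checkable: the `kind` enum and the ban on markdown headings inside summary and description fields. Faithfulness and padding cannot be checked this way.

```python
import re

ALLOWED_KINDS = {"person", "tool", "org", "product"}
# An ATX-style heading: up to three spaces, one to six '#', then whitespace.
HEADING = re.compile(r"(?m)^ {0,3}#{1,6}\s")


def field_rule_violations(sem):
    """Return human-readable rule violations found in the model's JSON object."""
    problems = []
    if HEADING.search(sem.get("source_summary") or ""):
        problems.append("source_summary contains a markdown heading")
    for group in ("entities", "concepts", "contradictions"):
        for i, item in enumerate(sem.get(group) or []):
            if HEADING.search(item.get("description") or ""):
                problems.append(f"{group}[{i}].description contains a markdown heading")
    for i, ent in enumerate(sem.get("entities") or []):
        kind = ent.get("kind")
        if kind is not None and kind not in ALLOWED_KINDS:
            problems.append(f"entities[{i}].kind is not one of person|tool|org|product")
    return problems


if __name__ == "__main__":
    sample = {
        "source_summary": "Plain prose.",
        "entities": [{"name": "Acme", "kind": "vendor", "description": "A vendor."}],
        "concepts": [{"name": "JWT RS256", "description": "## Signing\nDetails."}],
    }
    for problem in field_rule_violations(sample):
        print(problem)
```

A missing `kind` is tolerated here because the schema in this commit does not list it as required; that choice is an assumption of the sketch.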
 
-One source per session. After writing the manifest, stop.
+## What the conform script guarantees (so the model cannot break it)
+
+- **Paths:** `wiki/sources/<slug>.md`, `wiki/entities/<slug>.md`,
+  `wiki/concepts/<slug>.md`, `wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md`.
+- **Slugs:** minimal kebab-case (lowercase, digits, hyphens; no spaces /
+  underscores / capitals).
+- **Frontmatter:** `type`, `domain: <genome>`, `maturity: draft`,
+  `last_updated: <today>`, `private: false`, `tags`.
+- **Create-vs-update:** existing entity/concept pages are **appended to** (a
+  section attributed to the new source), never overwritten. The source page is
+  the canonical summary of that exact source and is (re)written.
+- **Manifest:** `.ingest-manifest.json` with `raw_source`, `reasoning`,
+  `pr_summary`, `contradictions` (string), and `pages[]` (`path`, `summary`,
+  `status`, plus `maturity` on created pages) — exactly what `run-ingest.sh`
+  validates.
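
To make the path, slug and manifest guarantees above concrete, here is a small sketch. `slugify` applies the same normalisation as the script in this commit; `page_entry` is a hypothetical helper written for illustration that builds one `pages[]` item with the fields listed above.

```python
import re


def slugify(name):
    """Kebab-case slug: lowercase ASCII letters, digits and hyphens only."""
    s = re.sub(r"[^a-z0-9]+", "-", (name or "").strip().lower())
    return s.strip("-") or "untitled"


def page_entry(kind_dir, name, summary, exists):
    """Build one manifest page entry; `maturity` is set only on created pages."""
    entry = {
        "path": f"wiki/{kind_dir}/{slugify(name)}.md",
        "summary": summary,
        "status": "modified" if exists else "created",
    }
    if not exists:
        entry["maturity"] = "draft"
    return entry


if __name__ == "__main__":
    print(slugify("JWT RS256"))
    print(page_entry("entities", "Acme", "Vendor referenced by the source.", exists=False))
```

Note that the slug depends only on the name the model returns: `Acme` and `Acme Corp` normalise to different slugs, so the single-page guarantee holds per name rather than per real-world entity.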
+
+The model name is recorded by the orchestrator (`INGEST_MODEL`); the model does
+not self-report it. No `run_id`, branch, commit or PR is invented here — those
+belong to phase 2.
+
+> Interactive use of `pi` (TUI) is unaffected and still available for manual
+> exploration. The **automated** ingest path no longer relies on `pi` or on
+> native tool-calling: it is the single schema-constrained call above.
277
skills/ingest/scripts/ingest-semantic.py
Normal file
@@ -0,0 +1,277 @@
+#!/usr/bin/env python3
+# =============================================================================
+# skills/ingest/scripts/ingest-semantic.py
+# Phase 1 (semantic) of the Knowledge Genome ingest — the LIGHT version.
+#
+# The model does ONLY semantic extraction and returns ONE schema-constrained JSON
+# object (no tools, no file writing, no git, no frontmatter, no slugs). This script
+# then CONFORMS that output deterministically into wiki pages with enforced
+# frontmatter + kebab-case paths, and writes a .ingest-manifest.json in EXACTLY the
+# schema run-ingest.sh expects. run-ingest.sh (phase 2) then does index / log /
+# scoped-lint / PR, unchanged.
+#
+#   cd <genome checkout>
+#   ingest-semantic.py <genome> raw/articles/<file>.md # phase 1 (this)
+#   run-ingest.sh <genome> # phase 2 (deterministic)
+#
+# Why this shape: local tool-calling via pi/ollama proved fragile, and a small
+# model does not reliably honour folders / naming / frontmatter / manifest schema
+# when it writes files itself. Here the model cannot break the contract because it
+# never touches the filesystem — the script owns all structure. Stdlib only.
+#
+# Emits a single JSON status line on stdout (for n8n / logs).
+# =============================================================================
+import json, os, re, sys, datetime, urllib.request, urllib.error
+
+# --- config (override via env; these live in ~/.config/knowledge-genome.env) ---
+OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434/api/chat")
+MODEL = os.environ.get("INGEST_MODEL", "qwen2.5:14b")
+NUM_CTX = int(os.environ.get("INGEST_NUM_CTX", "16384"))
+TIMEOUT = int(os.environ.get("INGEST_TIMEOUT", "600"))
+TODAY = datetime.date.today().isoformat()
+
+
+def die(stage, reason):
+    print(json.dumps({"status": "error", "stage": stage, "reason": reason}))
+    sys.exit(1)
+
+
+# --- args + pre-flight (mirror the old skill's guards, enforced in code) ---
+if len(sys.argv) < 3:
+    die("args", "usage: ingest-semantic.py <genome> <raw/rel/path.md>")
+genome = sys.argv[1]
+raw_rel = re.sub(r"^(\./)+", "", sys.argv[2])  # drop a literal "./" prefix only; lstrip("./") would also eat "../" and leading dots
+
+if "private/" in raw_rel or raw_rel.startswith("private"):
+    die("preflight", "refusing private source: " + raw_rel)
+if os.environ.get("PRIVATE_CONTEXT", "disabled") != "disabled":
+    die("preflight", "PRIVATE_CONTEXT must be disabled")
+if not raw_rel.startswith("raw/"):
+    die("preflight", "source must live under raw/: " + raw_rel)
+if not os.path.isfile(raw_rel):
+    die("preflight", "source not found in cwd: " + raw_rel)
+
+with open(raw_rel, "r", encoding="utf-8") as fh:
+    source_text = fh.read()
+if not source_text.strip():
+    die("preflight", "source is empty: " + raw_rel)
+
+
+# --- the semantic contract (authoritative copy; SKILL.md documents it) ---
+SYSTEM_PROMPT = """You perform the SEMANTIC PASS of a single source into a knowledge wiki.
+Read the source and return ONLY structured data describing what it contains.
+You do not write files, you do not produce frontmatter, and you do not invent
+paths, slugs, branches, commits or PRs — a deterministic script does all of that.
+
+Rules:
+- source_summary: a faithful, self-contained summary of the source, in the
+  source's own language. Plain prose, no markdown headings.
+- key_points: the handful of concrete facts/claims worth indexing.
+- entities: every person, tool, organisation or product the source names.
+  kind is one of person|tool|org|product. description is one or two factual
+  sentences. No markdown headings inside the description.
+- concepts: every pattern, theory, decision or named idea the source explains.
+  description is one or two factual sentences.
+- contradictions: ONLY when the source makes a claim that directly contradicts a
+  widely-known fact or contradicts itself. Otherwise return an empty list.
+- Names must be the natural name of the thing; the script will normalise them.
+Do not pad. Be faithful to the source."""
+
+# --- JSON schema -> constrained decoding (Ollama structured outputs) ---
+SCHEMA = {
+    "type": "object",
+    "properties": {
+        "source_title": {"type": "string"},
+        "source_summary": {"type": "string"},
+        "key_points": {"type": "array", "items": {"type": "string"}},
+        "entities": {"type": "array", "items": {
+            "type": "object",
+            "properties": {
+                "name": {"type": "string"},
+                "kind": {"type": "string",
+                         "enum": ["person", "tool", "org", "product"]},
+                "description": {"type": "string"},
+            },
+            "required": ["name", "description"],
+        }},
+        "concepts": {"type": "array", "items": {
+            "type": "object",
+            "properties": {
+                "name": {"type": "string"},
+                "description": {"type": "string"},
+            },
+            "required": ["name", "description"],
+        }},
+        "contradictions": {"type": "array", "items": {
+            "type": "object",
+            "properties": {
+                "concept": {"type": "string"},
+                "description": {"type": "string"},
+            },
+            "required": ["concept", "description"],
+        }},
+        "reasoning": {"type": "string"},
+        "pr_summary": {"type": "string"},
+    },
+    "required": ["source_title", "source_summary", "entities", "concepts"],
+}
+
+
+def call_model():
+    payload = {
+        "model": MODEL,
+        "messages": [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content":
+                "Source path: " + raw_rel + "\n\n--- SOURCE START ---\n"
+                + source_text + "\n--- SOURCE END ---\n\nReturn the JSON now."},
+        ],
+        "format": SCHEMA,  # schema-constrained generation
+        "stream": False,
+        # deterministic extraction; repetition penalties OFF for structured output
+        "options": {"temperature": 0.2, "repeat_penalty": 1.0, "num_ctx": NUM_CTX},
+    }
+    data = json.dumps(payload).encode("utf-8")
+    req = urllib.request.Request(
+        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"})
+    try:
+        with urllib.request.urlopen(req, timeout=TIMEOUT) as r:
+            resp = json.loads(r.read().decode("utf-8"))
+    except urllib.error.URLError as e:
+        die("model", "ollama request failed: " + str(e))
+    content = ((resp.get("message") or {}).get("content") or "").strip()
+    # schema-constrained, but stay defensive if a model wraps it in a fence
+    if content.startswith("```"):
+        content = content.strip("`")
+        brace = content.find("{")
+        if brace >= 0:
+            content = content[brace:]
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError as e:
+        die("model", "model did not return valid JSON: " + str(e))
+
+
+# --- conform helpers (the script OWNS all structure) ---
+def slugify(s):
+    s = re.sub(r"[^a-z0-9]+", "-", (s or "").strip().lower())
+    return re.sub(r"-+", "-", s).strip("-") or "untitled"
+
+
+def twords(s, n=12):
+    s = " ".join((s or "").split())
+    w = s.split(" ")
+    return s if len(w) <= n else " ".join(w[:n]) + "…"
+
+
+def frontmatter(ptype, tags):
+    taglist = "[" + ", ".join(sorted(set(t for t in tags if t))) + "]"
+    return ("---\n"
+            f"type: {ptype}\n"
+            f"domain: {genome}\n"
+            "maturity: draft\n"
+            f"last_updated: {TODAY}\n"
+            "private: false\n"
+            f"tags: {taglist}\n"
+            "---\n")
+
+
+def write_new(path, ptype, title, body, tags):
+    os.makedirs(os.path.dirname(path), exist_ok=True)
+    with open(path, "w", encoding="utf-8") as f:
+        f.write(frontmatter(ptype, tags))
+        f.write(f"\n# {title}\n\n{body}\n")
+
+
+def append_section(path, source_slug, body):
+    # never overwrite an existing page: accumulate, attributed to the new source
+    with open(path, "a", encoding="utf-8") as f:
+        f.write(f"\n\n## From [[sources/{source_slug}]]\n\n{body}\n")
+    try:  # best-effort bump of last_updated in the existing frontmatter
+        with open(path, "r", encoding="utf-8") as f:
+            txt = f.read()
+        txt = re.sub(r"(?m)^last_updated:.*$", "last_updated: " + TODAY, txt, count=1)
+        with open(path, "w", encoding="utf-8") as f:
+            f.write(txt)
+    except Exception:
+        pass
+
+
+# --- run the semantic pass ---
+sem = call_model()
+source_slug = slugify(os.path.splitext(os.path.basename(raw_rel))[0])
+pages = []
+
+# 1. source page — canonical summary of THIS source (re)written
+src_path = f"wiki/sources/{source_slug}.md"
+src_status = "modified" if os.path.exists(src_path) else "created"
+kp_lines = "\n".join("- " + p for p in (sem.get("key_points") or []) if p.strip())
+src_body = (sem.get("source_summary") or "").strip()
+if kp_lines:
+    src_body += "\n\n## Key points\n\n" + kp_lines
+src_body += f"\n\n## Source\n\n- [[{raw_rel}]]\n"
+src_tags = ([slugify(e.get("name", "")) for e in sem.get("entities", [])]
+            + [slugify(c.get("name", "")) for c in sem.get("concepts", [])])[:8]
+os.makedirs("wiki/sources", exist_ok=True)
+with open(src_path, "w", encoding="utf-8") as f:
+    f.write(frontmatter("source", src_tags))
+    f.write(f"\n# {sem.get('source_title') or source_slug}\n\n{src_body}\n")
+pages.append({"path": src_path,
+              "summary": twords(sem.get("source_title") or source_slug),
+              "maturity": "draft", "status": src_status})
+
+
+def handle(kind_dir, ptype, items):
+    for it in items or []:
+        name = (it.get("name") or "").strip()
+        if not name:
+            continue
+        slug = slugify(name)
+        path = f"wiki/{kind_dir}/{slug}.md"
+        desc = (it.get("description") or "").strip()
+        if os.path.exists(path):
+            append_section(path, source_slug, desc)
+            pages.append({"path": path, "summary": twords(desc), "status": "modified"})
+        else:
+            body = desc + f"\n\n## Sources\n\n- [[sources/{source_slug}]]\n"
+            write_new(path, ptype, name, body, [genome, ptype])
+            pages.append({"path": path, "summary": twords(desc),
+                          "maturity": "draft", "status": "created"})
+
+
+# 2. entities, 3. concepts
+handle("entities", "entity", sem.get("entities", []))
+handle("concepts", "concept", sem.get("concepts", []))
+
+# 4. contradictions -> conflict pages (run-ingest routes wiki/queries/conflict-*)
+conflicts = sem.get("contradictions") or []
+conf_slugs = []
+for c in conflicts:
+    cslug = slugify(c.get("concept", "unknown"))
+    conf_slugs.append(cslug)
+    path = f"wiki/queries/conflict-{cslug}-{TODAY}.md"
+    write_new(path, "query", f"Conflict: {c.get('concept', '')}",
+              (c.get("description") or "").strip()
+              + f"\n\n## Source\n\n- [[sources/{source_slug}]]\n",
+              [genome, "conflict"])
+    pages.append({"path": path, "summary": "", "maturity": "draft",
+                  "status": "created"})
+
+contradictions_str = ("None" if not conflicts
+                      else f"{len(conflicts)} conflict file(s) created — "
+                      + ", ".join(conf_slugs))
+
+# --- write the manifest in EXACTLY run-ingest.sh's schema ---
+manifest = {
+    "raw_source": raw_rel,
+    "reasoning": sem.get("reasoning") or ("Ingest of " + raw_rel),
+    "pr_summary": sem.get("pr_summary") or ("Semantic ingest of " + raw_rel),
+    "contradictions": contradictions_str,
+    "pages": pages,
+}
+with open(".ingest-manifest.json", "w", encoding="utf-8") as f:
+    json.dump(manifest, f, indent=2, ensure_ascii=False)
+
+print(json.dumps({"status": "ok", "stage": "semantic",
+                  "pages": len(pages), "model": MODEL,
+                  "manifest": ".ingest-manifest.json"}))
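
Because the script reports its outcome as one JSON status line on stdout, a caller such as an n8n step or a wrapper script only needs to parse that line. The following is a minimal sketch of such a consumer; `parse_status` and the `driver` stage label are hypothetical names used for illustration and do not appear in the commit.

```python
import json


def parse_status(stdout_text):
    """Parse the last non-empty stdout line as the JSON status object."""
    lines = [ln for ln in stdout_text.splitlines() if ln.strip()]
    if not lines:
        return {"status": "error", "stage": "driver", "reason": "no output"}
    try:
        status = json.loads(lines[-1])
    except json.JSONDecodeError as exc:
        return {"status": "error", "stage": "driver", "reason": "bad status line: " + str(exc)}
    if not isinstance(status, dict):
        return {"status": "error", "stage": "driver", "reason": "status line is not an object"}
    return status


if __name__ == "__main__":
    ok = '{"status": "ok", "stage": "semantic", "pages": 3, "model": "qwen2.5:14b", "manifest": ".ingest-manifest.json"}'
    print(parse_status(ok)["status"])
    print(parse_status("Traceback (most recent call last):")["reason"])
```

In practice the text would come from something like `subprocess.run([...], capture_output=True, text=True).stdout`, with phase 2 started only when `status` is `ok`.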
@@ -1,13 +1,17 @@
 #!/usr/bin/env bash
 # =============================================================================
 # skills/ingest/scripts/run-ingest.sh
-# Post-pi orchestrator. Runs OUTSIDE pi's loop, on vm101, in the genome checkout.
-# Consumes .ingest-manifest.json (written by the ingest skill) and performs every
-# deterministic step — index, log, scoped lint, PR — so pi's context stays clean.
+# Post-semantic orchestrator. Runs OUTSIDE the model, on vm101, in the genome
+# checkout. Consumes .ingest-manifest.json (written by ingest-semantic.py) and
+# performs every deterministic step — index, log, scoped lint, PR.
 #
 #   run-ingest.sh <genome_name> [manifest_path]
 #
 # Emits a single JSON result line on stdout for n8n to parse.
+#
+# every page listed in the manifest must exist on disk before we trust the run.
+# Everything else is unchanged: the manifest the semantic phase now produces is
+# already in this script's expected schema.
 # =============================================================================
 set -euo pipefail
 
@@ -57,6 +61,13 @@ mapfile -t modified_paths < <(jq -r '.pages[] | select(.status=="modified") | .p
 all_paths=( "${created_paths[@]}" "${modified_paths[@]}" )
 [[ ${#all_paths[@]} -gt 0 ]] || fail "manifest" "no pages reported"
 
+# --- the semantic phase (ingest-semantic.py) writes the files; verify
+# every manifest page actually exists on disk before trusting the run. Catches any
+# drift between what the manifest claims and what was really written. ---
+for _p in "${all_paths[@]}"; do
+  [[ -f "$_p" ]] || fail "pages" "manifest lists a file not present on disk: ${_p}"
+done
+
 conflict_label=""
 
 # NOTE: No rollback. The steps below modify the working tree in order (index → log → commit).
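
The existence check added in this hunk can be mirrored in Python, for example to test a manifest before handing it to `run-ingest.sh`. This is a sketch for illustration only: the shell script itself uses `jq` and a bash loop, and `missing_pages` is a hypothetical name.

```python
import json
import os


def missing_pages(manifest_path=".ingest-manifest.json", root="."):
    """Return manifest page paths (created first, then modified) absent on disk."""
    with open(manifest_path, encoding="utf-8") as fh:
        manifest = json.load(fh)
    pages = manifest.get("pages") or []
    created = [p["path"] for p in pages if p.get("status") == "created"]
    modified = [p["path"] for p in pages if p.get("status") == "modified"]
    all_paths = created + modified
    if not all_paths:
        raise ValueError("no pages reported")
    return [p for p in all_paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    try:
        print(missing_pages())
    except (OSError, ValueError) as exc:
        print("cannot verify manifest:", exc)
```

An empty list means every listed page exists, which corresponds to the shell loop finishing without calling `fail`.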