diff --git a/deploy/vm101/README.md b/deploy/vm101/README.md
new file mode 100644
index 0000000..b7491e6
--- /dev/null
+++ b/deploy/vm101/README.md
@@ -0,0 +1,60 @@
+# deploy/vm101
+
+System artifacts deployed to **vm101** (the GPU ingest node). The repo is the
+source of truth; the live copies live in `/usr/local/bin/`. Edit here, then
+`sudo ./install.sh` on vm101 to push changes.
+
+## Contents
+
+- `n8n-pi-wrap` — forced-command wrapper that fronts every n8n→vm101 SSH call.
+- `install.sh` — installs the wrapper(s) into `/usr/local/bin` (idempotent).
+
+## n8n-pi-wrap
+
+The only entry point for the `n8n-runner` identity onto vm101. n8n never gets a
+shell here: whatever it sends arrives as `SSH_ORIGINAL_COMMAND`, and a `case`
+whitelist decides what runs. Anything outside the whitelist is denied and logged.
+
+Allowed commands:
+
+| Command | What it does |
+|---|---|
+| `pi run` | one-shot prompt via stdin (proof-of-life / health) |
+| `pi ingest ` | the real two-phase ingest (below) |
+| `ollama list` / `ollama ps` | model introspection |
+
+### The two-phase ingest
+
+`pi ingest` runs the clean-start + two phases, then stops:
+
+1. **Clean start** — `git fetch && switch && reset --hard origin/`.
+   Destroys only vm101's *scratch* checkout (never a shared branch, never a
+   force-push) — this determinism is by design.
+2. **Semantic** — `skills/ingest/scripts/ingest-semantic.py `
+   drives `pi` to WRITE `wiki/*` pages + `.ingest-manifest.json`.
+   NOTE: this is the script, NOT `pi -p "/skill:ingest ..."` (that form makes the
+   model reply in chat and write nothing — the classic "manifest not found" trap).
+3. **Mechanical** — `skills/ingest/scripts/run-ingest.sh ` validates the
+   manifest, then index/log/scoped-lint/commit on `feat/ai-ingest-` and opens
+   a PR onto ``. Emits one JSON line `{status,slug,pr_url,...}`.
+
+The PR then waits for the human gate. One raw per session, sequential.
+
+### Input hardening
+
+Both inputs come from `SSH_ORIGINAL_COMMAND`, so both are validated:
+
+- `genome` — kebab lowercase `^[a-z0-9-]+$`.
+- `raw_path` — must be under `raw/`, no `..` traversal, restricted charset
+  `[A-Za-z0-9._/-]`, and the file must exist. Rejected paths return a JSON error.
+
+Config (`INGEST_BASE`, `GENOMES_ROOT`, `INGEST_MODEL`, Forgejo token) is sourced
+from `~/.config/knowledge-genome.env` (0600, owner-only).
+
+## Install / update
+
+```bash
+# on vm101
+cd ~/knowledge-genome-orchestrator/deploy/vm101
+sudo ./install.sh
+```
diff --git a/deploy/vm101/install.sh b/deploy/vm101/install.sh
new file mode 100755
index 0000000..7970a21
--- /dev/null
+++ b/deploy/vm101/install.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+# deploy/vm101/install.sh — install vm101 wrappers from repo -> /usr/local/bin (idempotent).
+# Run ON vm101 with sudo: sudo ./install.sh
+set -euo pipefail
+here="$(cd "$(dirname "$0")" && pwd)"
+install -m 0755 "${here}/n8n-pi-wrap" /usr/local/bin/n8n-pi-wrap
+echo "installed: /usr/local/bin/n8n-pi-wrap"
+bash -n /usr/local/bin/n8n-pi-wrap && echo "syntax: ok"
diff --git a/deploy/vm101/n8n-pi-wrap b/deploy/vm101/n8n-pi-wrap
new file mode 100755
index 0000000..b0db80f
--- /dev/null
+++ b/deploy/vm101/n8n-pi-wrap
@@ -0,0 +1,71 @@
+#!/bin/bash
+set -eu
+cmd="${SSH_ORIGINAL_COMMAND:-}"
+case "$cmd" in
+  "pi run")
+    logger -t n8n-pi-wrap "ok: pi run (prompt via stdin)"
+    prompt=$(cat)
+    exec /usr/local/bin/pi --no-tools --mode json -p "$prompt" ` (two tokens).
+    rest="${cmd#pi ingest }"
+    genome="${rest%% *}"
+    raw_path="${rest#* }"
+    # reject: missing second token, or any extra token (a space left in raw_path)
+    if [ "$genome" = "$rest" ] || [ -z "$raw_path" ] || [ "$raw_path" != "${raw_path#* }" ]; then
+      echo '{"status":"error","reason":"usage: pi ingest "}'; exit 1
+    fi
+    # genome slug: kebab lowercase only
+    case "$genome" in ""|*[!a-z0-9-]*) echo '{"status":"error","reason":"invalid genome name"}'; exit 1;; esac
+    # raw_path whitelist: MUST live under raw/, no traversal, restricted charset.
+    #  - must start with "raw/" - no ".." segment - no absolute path / leading slash
+    #  - allowed chars: [A-Za-z0-9._/-] (kebab slugs + subdirs like raw/articles/foo.md)
+    case "$raw_path" in
+      raw/*) : ;;
+      *) echo '{"status":"error","reason":"raw_path must be under raw/"}'; exit 1;;
+    esac
+    case "$raw_path" in
+      *..*|*//*) echo '{"status":"error","reason":"raw_path traversal"}'; exit 1;;
+    esac
+    case "$raw_path" in
+      *[!A-Za-z0-9._/-]*) echo '{"status":"error","reason":"raw_path illegal chars"}'; exit 1;;
+    esac
+
+    logger -t n8n-pi-wrap "ok: pi ingest ${genome} ${raw_path}"
+    set -a; . "${HOME}/.config/knowledge-genome.env"; set +a
+    cd "${GENOMES_ROOT}/${genome}" || { echo '{"status":"error","reason":"unknown genome"}'; exit 1; }
+
+    # The raw file must actually exist under the genome's raw/ dir.
+    [ -f "$raw_path" ] || { echo '{"status":"error","reason":"raw file not found"}'; exit 1; }
+
+    # Clean start on the configured base (develop), pinned to the remote. Destroys only
+    # vm101's scratch checkout (never a shared branch, never a force-push) — this is by design.
+    git fetch -q origin \
+      && git switch -q "${INGEST_BASE:-main}" 2>/dev/null \
+      && git reset -q --hard "origin/${INGEST_BASE:-main}"
+
+    # SEMANTIC step: dedicated script drives pi to WRITE wiki pages + manifest.
+    # (NOT `pi -p "/skill:ingest ..."`, which makes the model reply in chat and write nothing.)
+    log="$(mktemp -t pi-ingest.XXXXXX.log)"
+    "${HOME}/.pi/agent/skills/ingest/scripts/ingest-semantic.py" "${genome}" "${raw_path}" \
+      >"$log" 2>&1 \
+      || { echo "{\"status\":\"error\",\"stage\":\"semantic\",\"reason\":\"ingest-semantic failed\",\"log\":\"${log}\"}"; exit 1; }
+
+    # MECHANICAL step: validate manifest -> index/log/scoped-lint/commit/PR -> 1 JSON line
+    exec "${HOME}/.pi/agent/skills/ingest/scripts/run-ingest.sh" "${genome}"
+    ;;
+  "ollama list")
+    logger -t n8n-pi-wrap "ok: ollama list"
+    exec /usr/local/bin/ollama list
+    ;;
+  "ollama ps")
+    logger -t n8n-pi-wrap "ok: ollama ps"
+    exec /usr/local/bin/ollama ps
+    ;;
+  *)
+    logger -t n8n-pi-wrap "denied: ${cmd:-}"
+    echo "unauthorized command" >&2
+    exit 1
+    ;;
+esac
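
Review note: the token split and the three `case` filters in the `pi ingest` arm can be exercised in isolation, without SSH, the env file, or a genome checkout. The sketch below mirrors those checks in a standalone function. The name `check_ingest_command` and its plain-text `ok ...` / `error: ...` output are hypothetical and not part of this patch (the wrapper itself emits JSON and exits). The existence check (`[ -f "$raw_path" ]`) is left out because it depends on `GENOMES_ROOT`.

```shell
#!/bin/bash
# Hypothetical test harness: reproduces the wrapper's parsing and whitelist
# checks as a function that returns a status instead of exiting.
check_ingest_command() {
  local cmd="$1" rest genome raw_path
  rest="${cmd#pi ingest }"      # drop the fixed prefix
  genome="${rest%% *}"          # first token
  raw_path="${rest#* }"         # everything after the first space

  # Same rejection as the wrapper: no second token, or a space left in raw_path.
  if [ "$genome" = "$rest" ] || [ -z "$raw_path" ] || [ "$raw_path" != "${raw_path#* }" ]; then
    echo "error: usage"; return 1
  fi
  # Genome slug: lowercase letters, digits and hyphens only.
  case "$genome" in
    ""|*[!a-z0-9-]*) echo "error: invalid genome name"; return 1 ;;
  esac
  # raw_path: must sit under raw/, no ".." or "//", restricted charset.
  case "$raw_path" in
    raw/*) : ;;
    *) echo "error: raw_path must be under raw/"; return 1 ;;
  esac
  case "$raw_path" in
    *..*|*//*) echo "error: raw_path traversal"; return 1 ;;
  esac
  case "$raw_path" in
    *[!A-Za-z0-9._/-]*) echo "error: raw_path illegal chars"; return 1 ;;
  esac
  echo "ok ${genome} ${raw_path}"
}

check_ingest_command "pi ingest my-genome raw/articles/foo.md"
# prints: ok my-genome raw/articles/foo.md
check_ingest_command "pi ingest my-genome raw/../etc/passwd" || true
# prints: error: raw_path traversal
```

One consequence of the `*..*` pattern worth knowing: it rejects any path containing two consecutive dots, so a legitimate file such as `raw/notes..md` is refused as traversal. That errs on the safe side, but raw files need to be named accordingly.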