docs: Clarify ingest pipeline roles and automation

Matteo Cherubini 2026-06-19 05:47:21 +02:00
parent 593819451e
commit 5491e807e0

@ -175,6 +175,11 @@ knowledge-genome-orchestrator/ ← This repository (setup tooling)
> The `skills/ingest/` directory is version-controlled here but **deployed** to the AI
> node (vm101) under `~/.pi/agent/skills/ingest`. The agent (`pi`) does only semantic work
> and writes a manifest; `run-ingest.sh` does the mechanical steps. See [Workflows → Ingest](#ingest).
>
> `ingest-semantic.py` makes one schema-constrained call to the local model and returns JSON. `run-ingest.sh` handles the index, log, lint, and PR steps.
> Flow: semantic JSON extraction → deterministic wiki conformance + manifest.
>
> After `make setup`, deploy the skill with `cp skills/ingest/* ~/.pi/agent/skills/ingest/`. Updates are pulled onto the laptop with `git pull` and pushed to vm101 over SSH in the n8n flow.
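
To illustrate the split between semantic and deterministic work, here is a minimal sketch of how a schema-constrained reply might be checked before any mechanical step runs. The field names (`entities`, `concepts`, `summary`) and the function `parse_semantic_output` are hypothetical; the real schema used by `ingest-semantic.py` is not shown in this section.

```python
import json

# Hypothetical schema: these field names are illustrative only,
# not taken from the repository.
REQUIRED_FIELDS = {"entities": list, "concepts": list, "summary": str}


def parse_semantic_output(raw: str) -> dict:
    """Parse the model's reply and reject anything that does not match the schema."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(
                f"field {field!r} missing or not {expected_type.__name__}"
            )
    return data
```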
---
@ -1062,7 +1067,7 @@ grep "^## \[" wiki/log.md | grep "CONFLICT" # All conflicts
grep "^## \[2026-05" wiki/log.md # Entries from a specific month
```
The orchestrator always injects only `tail -n 20 wiki/log.md` into agent context.
`ingest-semantic.py` receives the source text plus the existing entity and concept names (taken from the index) as prompt context.
The LLM never loads the full log.
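
The bounded-context rule above can be sketched as follows. This is an illustration under assumptions, not the orchestrator's actual code: `tail_lines` mirrors `tail -n 20`, and `build_prompt_context` is a hypothetical layout for the names-plus-source prompt.

```python
from collections import deque
from typing import Iterable


def tail_lines(lines: Iterable[str], n: int = 20) -> list[str]:
    """Return the last n lines, like `tail -n 20`, keeping at most n in memory."""
    return list(deque(lines, maxlen=n))


def build_prompt_context(source_text: str, known_names: list[str]) -> str:
    """Hypothetical prompt layout: existing names first, then the source text."""
    names = "\n".join(f"- {name}" for name in sorted(known_names))
    return f"Existing entities and concepts:\n{names}\n\nSource:\n{source_text}"
```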
---
@ -1122,6 +1127,8 @@ Note: `.obsidian/` is in `.gitignore`. Workspace and plugin settings are local
### n8n automation
Call chain: n8n → SSH → `ingest-semantic.py <genome> <raw>` → `run-ingest.sh <genome>`.
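
As a rough sketch of that call chain, the two remote invocations could be built like this. It assumes both scripts sit in the deployed skill directory and are executable, and that the AI node is reachable as an SSH host; `ingest_commands` is a hypothetical helper, not part of the repository. Each returned list could be passed to `subprocess.run(cmd, check=True)` in order, so a failed semantic step stops the mechanical one.

```python
import shlex

# Deployment path named in this README; that both scripts live here is an assumption.
SKILL_DIR = "~/.pi/agent/skills/ingest"


def ingest_commands(host: str, genome: str, raw: str) -> list[list[str]]:
    """Build the two SSH invocations in order: semantic step, then mechanical step."""
    # Arguments are quoted for the remote shell; the leading ~ is left
    # unquoted so the remote shell expands it.
    semantic = (
        f"{SKILL_DIR}/ingest-semantic.py {shlex.quote(genome)} {shlex.quote(raw)}"
    )
    mechanical = f"{SKILL_DIR}/run-ingest.sh {shlex.quote(genome)}"
    return [["ssh", host, semantic], ["ssh", host, mechanical]]
```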
n8n (running on the storage node) can automate the ingest pipeline:
1. Forgejo webhook fires on push to a genome's `raw/` directory