docs: Update README
parent 42c1302035
commit 13d34b4906
1 changed file with 170 additions and 58 deletions
228
README.md
@@ -19,16 +19,17 @@ and a human-in-the-loop Git Flow for quality control.
5. [Configuration](#configuration)
6. [Quick Start](#quick-start)
7. [Makefile Reference](#makefile-reference)
8. [Genome Lifecycle](#genome-lifecycle)
9. [Security Model](#security-model)
10. [Key Management](#key-management)
11. [Agent Sessions](#agent-sessions)
12. [Workflows](#workflows)
13. [Knowledge Quality](#knowledge-quality)
14. [Knowledge Schema](#knowledge-schema)
15. [Collaboration Model](#collaboration-model)
16. [Optional Extensions](#optional-extensions)
17. [Troubleshooting](#troubleshooting)
8. [Testing](#testing)
9. [Genome Lifecycle](#genome-lifecycle)
10. [Security Model](#security-model)
11. [Key Management](#key-management)
12. [Agent Sessions](#agent-sessions)
13. [Workflows](#workflows)
14. [Knowledge Quality](#knowledge-quality)
15. [Knowledge Schema](#knowledge-schema)
16. [Collaboration Model](#collaboration-model)
17. [Optional Extensions](#optional-extensions)
18. [Troubleshooting](#troubleshooting)

---

@@ -110,10 +111,18 @@ genome-{name}/
| Wiki | `wiki/` | LLM | Agent creates, updates, cross-links, maintains. |
| Schema | `AGENTS.md` | Human + LLM | Co-evolved contract defining structure and workflows. |

### Linked projects (optional)

A genome can optionally declare a **linked project repository** — a separate repo where
the knowledge in that genome is meant to be applied (e.g. `genome-dev` linked to an app
repo). The link is recorded as a third field in the registry and rendered into the
genome's `AGENTS.md` (`## Linked Project`). A genome with no link is _knowledge-only_ and
behaves exactly as before. See [Configuration](#configuration).

### Framework structure

```text
knowledge-genome-setup/ ← This repository (setup tooling)
knowledge-genome-orchestrator/ ← This repository (setup tooling)
├── globals.env ← Static KEY=VALUE config (Make-includable)
├── registry.sh ← Bash-only: GENOMES array + dynamic paths
├── Makefile ← Entry point for all operations
@@ -121,6 +130,7 @@ knowledge-genome-setup/ ← This repository (setup tooling)
│   ├── output.sh ← Terminal helpers (colors, log levels)
│   ├── deps.sh ← Dependency validation
│   ├── scaffold.sh ← Template rendering engine
│   ├── structure.sh ← Canonical genome layout (single source of truth)
│   ├── lint.sh ← Per-file validation functions
│   └── git-crypt.sh ← git-crypt lifecycle (init, export, verify, rotate)
├── providers/
@@ -131,18 +141,41 @@ knowledge-genome-setup/ ← This repository (setup tooling)
│   ├── setup-master.sh ← Master repo initialisation
│   ├── setup-genomes.sh ← Genome provisioning loop
│   ├── add-genome.sh ← Add a single new genome
│   └── lint-genomes.sh ← Quality control across all genomes
└── templates/
    ├── agents-genome.md ← Per-genome agent contract template
    ├── agents-master.md ← Master coordination schema template
    ├── wiki-index.md ← Index template (rendered per genome)
    ├── wiki-log.md ← Log template (rendered per genome)
    ├── pr-description.md ← PR review checklist template
    ├── pre-commit.sh ← Security hook template
    ├── gitattributes ← Git encryption rules template
    └── gitignore ← Git ignore template
│   ├── lint-genomes.sh ← Quality control across all genomes
│   └── verify-genomes.sh ← Structure verify / --sync across all genomes
├── templates/
│   ├── agents-genome.md ← Per-genome agent contract template
│   ├── agents-master.md ← Master coordination schema template
│   ├── readme-master.md ← Master repo README template
│   ├── wiki-index.md ← Index template (rendered per genome)
│   ├── wiki-log.md ← Log template (rendered per genome)
│   ├── pr-description.md ← PR review checklist template
│   ├── pre-commit.sh ← Security hook template
│   ├── gitattributes ← Git encryption rules template
│   └── gitignore ← Git ignore template
├── skills/
│   └── ingest/ ← pi skill: deployed to the AI node (vm101)
│       ├── SKILL.md ← Semantic-only contract (read/edit, emits manifest)
│       ├── references/ ← On-demand reference docs for the agent
│       └── scripts/ ← Deterministic post-processor (runs outside the agent)
│           ├── run-ingest.sh ← Orchestrator: consumes the manifest, emits one JSON line
│           ├── slug.sh ← Slug normalisation
│           ├── index-append.py ← Sorted insert into wiki/index.md + last_updated bump
│           ├── log-append.sh ← Append a wiki/log.md entry
│           ├── scoped-lint.sh ← Lint only the pages touched this run (reuses lib/lint.sh)
│           └── open-pr.sh ← Branch / commit / push / open PR (DRY_RUN seam for tests)
└── tests/ ← bats suite — deterministic, no LLM/GPU (see Testing)
    ├── helpers.bash
    ├── scripts.bats
    ├── lint.bats
    ├── structure.bats
    └── run-ingest.bats
```

> The `skills/ingest/` directory is version-controlled here but **deployed** to the AI
> node (vm101) under `~/.pi/agent/skills/ingest`. The agent (`pi`) does only semantic work
> and writes a manifest; `run-ingest.sh` does the mechanical steps. See [Workflows → Ingest](#ingest).

---

## System Requirements
@@ -156,7 +189,9 @@ All tools (git-crypt, bw, qmd) have native Linux binaries.

All scripts are compatible with macOS. Requirements:

- bash 3.2+ (macOS default) — fully supported. All `bash 4+` constructs removed.
- bash 3.2+ (macOS default) — supported for the **setup scripts** (`make` targets, scaffolding).
  The `ingest` skill uses bash 4+ constructs (`mapfile`), but it is deployed and run on the
  Linux AI node, not on the macOS setup machine — so this is not a constraint in practice.
- GNU coreutils not required — BSD variants of `date`, `grep`, `sed` all handled.
- `git-crypt`: install via Homebrew — `brew install git-crypt`
- `jq`, `curl`: pre-installed or via Homebrew
@@ -195,6 +230,11 @@ The system is designed for a homelab architecture:
> the index, and the log tail is a cost. This is why all agent files are token-optimised
> and sessions are kept to one source at a time.

> **Reference deployment:** the table above is a target profile, not a hard requirement.
> The current setup runs a single 16GB GPU (RTX 5060 Ti) with a ~9B model for interactive
> ingest, and offloads heavy/async synthesis to a cloud model. Smaller models work — they
> just make the "one source per session" discipline and the token budget matter more.

---

## Prerequisites
@@ -285,14 +325,17 @@ resolution. Never included by Make.

```bash
# Dynamic paths (resolved at source time)
WORK_DIR="${HOME}/knowledge-genome-setup"
WORK_DIR="${HOME}/knowledge-genome-orchestrator"
KEYS_DIR="${WORK_DIR}/keys"

# Genome registry — format: "name|description"
# Genome registry — format: "name|description|linked_repo"
# The third field is OPTIONAL:
#   - leave it empty → knowledge-only genome (no linked project)
#   - owner/repo → genome is linked to that project repository (rendered into AGENTS.md)
GENOMES=(
  "genome-dev|Web development, TUI, Angular, software architecture"
  "genome-finance|Personal finance, investments, market analysis"
  "genome-homelab|Infrastructure, network configs, architecture logs"
  "genome-dev|Web development, TUI, Angular, software architecture|myorg/my-app"
  "genome-finance|Personal finance, investments, market analysis|"
  "genome-homelab|Infrastructure, network configs, architecture logs|"
)
```
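
The three-field registry format shown above can be split with a plain `read`. This is a minimal sketch, not the code in `registry.sh`: `parse_genome_entry` is a hypothetical helper, and printing `none` for an empty third field is an assumption that mirrors the documented `{{LINKED_PROJECT}}` placeholder value.

```bash
#!/usr/bin/env bash
# Sketch: split "name|description|linked_repo" registry entries.
# parse_genome_entry is a hypothetical helper, not part of registry.sh.

GENOMES=(
  "genome-dev|Web development, TUI, Angular, software architecture|myorg/my-app"
  "genome-finance|Personal finance, investments, market analysis|"
)

parse_genome_entry() {
  local entry="$1" name desc linked
  # IFS='|' splits on the pipe; a missing or empty third field leaves $linked empty.
  IFS='|' read -r name desc linked <<< "$entry"
  # Assumption: an empty link is reported as "none" (knowledge-only genome).
  printf '%s\t%s\t%s\n' "$name" "$desc" "${linked:-none}"
}

for entry in "${GENOMES[@]}"; do
  parse_genome_entry "$entry"
done
```

Entries written in the older two-field form (`name|description`) go through the same path, because `read` simply leaves the third variable empty.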

@@ -315,8 +358,8 @@ export GITHUB_TOKEN="your_github_token"

```bash
# 1. Clone the setup framework
git clone <setup-repo-url> knowledge-genome-setup
cd knowledge-genome-setup
git clone <setup-repo-url> knowledge-genome-orchestrator
cd knowledge-genome-orchestrator

# 2. Configure your environment
cp globals.env.example globals.env # edit with your values
@@ -358,16 +401,19 @@ After setup completes:

## Makefile Reference

| Target | Description |
| --------------------------------- | ------------------------------------------------------------------------------ |
| `make setup` | Full system initialisation — master repo + all genomes in `registry.sh` |
| `make add-genome NAME=x DESC="y"` | Scaffold and register a single new genome |
| `make lint` | Run quality checks across all genomes (schema, privacy, decay, page size) |
| `make status` | Show submodule status and first 10 git-crypt encryption states |
| `make lock` | Lock all encrypted repos (master + all genome submodules) |
| `make doctor` | Verify required tools: git, git-crypt, curl, jq; warn if bw missing |
| `make sync` | `git submodule update --init --recursive` + report unpushed commits per genome |
| `make help` | Print all available targets |
| Target | Description |
| ----------------------------------------------------- | ------------------------------------------------------------------------------------- |
| `make setup` | Full system initialisation — master repo + all genomes in `registry.sh` |
| `make add-genome NAME=x DESC="y" [LINKED=owner/repo]` | Scaffold and register a single new genome (optional linked project) |
| `make lint` | Run quality checks across all genomes (schema, privacy, decay, page size) |
| `make verify-structure` | Report directory drift of each genome vs the canonical layout (`lib/structure.sh`) |
| `make sync-structure` | Create any missing canonical directories across all genomes (safe, idempotent) |
| `make test` | Run the bats test suite (deterministic; no LLM/GPU/network) — see [Testing](#testing) |
| `make status` | Show submodule status and per-genome git-crypt encryption state |
| `make lock` | Lock all encrypted repos (master + all genome submodules) |
| `make doctor` | Verify required tools: git, git-crypt, curl, jq; warn if bw missing |
| `make sync` | `git submodule update --init --recursive` + report unpushed commits per genome |
| `make help` | Print all available targets |
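
The difference between `make verify-structure` (report only) and `make sync-structure` (create what is missing) can be sketched as one function with two modes. This is an illustration under assumptions, not the contents of `lib/structure.sh`: `check_structure` is a hypothetical name, and `CANONICAL_DIRS` is a guess assembled from directories mentioned elsewhere in this README.

```bash
#!/usr/bin/env bash
# Sketch of a verify / sync pass over one genome directory.
# CANONICAL_DIRS is an assumed list; the authoritative layout lives in lib/structure.sh.
CANONICAL_DIRS="raw raw/articles raw/assets wiki wiki/sources wiki/entities wiki/concepts wiki/queries"

# check_structure <genome-dir> [--sync]
# Report mode prints one "missing:" line per absent directory and returns 1 on drift.
# Sync mode creates the absent directories and returns 0.
check_structure() {
  local genome_dir="$1" mode="${2:-}" dir drift=0
  for dir in $CANONICAL_DIRS; do
    [ -d "${genome_dir}/${dir}" ] && continue
    if [ "$mode" = "--sync" ]; then
      mkdir -p "${genome_dir}/${dir}"   # idempotent: safe to run repeatedly
      echo "created: ${dir}"
    else
      echo "missing: ${dir}"
      drift=1
    fi
  done
  return "$drift"
}
```

Running `check_structure path/to/genome` reports drift without touching the tree, and `check_structure path/to/genome --sync` only ever adds directories, which is what makes the sync mode safe to repeat.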

### Examples

@@ -378,6 +424,12 @@ make doctor
# Add a new genome after initial setup
make add-genome NAME=genome-research DESC="Academic papers and deep research"

# Add a genome linked to a project repository
make add-genome NAME=genome-dev DESC="Web development" LINKED=myorg/my-app

# Check every genome against the canonical directory layout
make verify-structure

# Run full lint pass (bash deterministic checks)
make lint

@@ -390,6 +442,38 @@ make lock

---

## Testing

The mechanical layer (slug, index, log, lint, structure, the ingest orchestrator) is
covered by a [bats](https://github.com/bats-core/bats-core) suite. The tests are
**deterministic and have zero dependency on the LLM, the GPU, or the network** — they
simulate the agent's output with fixtures and exercise the scripts directly, so they run
anywhere git + bash live (laptop, CI, a git hook). They are **not** meant to run on the AI
node or via n8n.

```bash
sudo apt install bats  # once
make test              # or: bats tests/
```

| File | Covers |
| ----------------- | ------------------------------------------------------------------------------ |
| `scripts.bats` | `slug.sh`, `log-append.sh`, `index-append.py` (insert, sort, bump, idempotent) |
| `lint.bats` | `lib/lint.sh` validators + `scoped-lint.sh` |
| `structure.bats` | `lib/structure.sh` report / sync |
| `run-ingest.bats` | `run-ingest.sh` end-to-end (DRY_RUN, local bare remote) — needs `jq` |

Each test builds its own throwaway genome with a local bare remote, configured to ignore
the operator's global git settings (signing, global hooks) so the suite is hermetic. The
`run-ingest` tests auto-`skip` if `jq` is absent. If you change the canonical layout in
`lib/structure.sh`, update `FIXTURE_DIRS` in `tests/helpers.bash` to match.
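
One way to build such a hermetic throwaway repository is sketched below. It is an assumption about the approach, not a copy of `tests/helpers.bash`: `make_fixture_repo` is a hypothetical name, and `GIT_CONFIG_GLOBAL` requires git 2.32 or newer.

```bash
#!/usr/bin/env bash
# Sketch: a throwaway repo with a local bare remote that ignores operator git settings.
# make_fixture_repo is a hypothetical helper; the real setup lives in tests/helpers.bash.
make_fixture_repo() {
  local root="$1"
  # Skip the operator's global and system git configuration for every later git call.
  export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1
  git init --quiet --bare "${root}/remote.git"
  git init --quiet "${root}/genome"
  (
    cd "${root}/genome" || exit 1
    git config user.name "fixture"
    git config user.email "fixture@example.invalid"
    git config commit.gpgsign false      # no signing prompts in tests
    git config core.hooksPath /dev/null  # no hooks can fire
    git remote add origin "${root}/remote.git"
    git commit --quiet --allow-empty -m "init"
    git push --quiet origin HEAD         # publish the current branch to the bare remote
  )
}
```

In a bats file this would typically be called from `setup()` with a fresh `mktemp -d` directory, and the directory removed again in `teardown()`.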

> Why this matters: the only non-deterministic part of the system is the model. Pinning
> the mechanical layer with tests means that when an ingest misbehaves, you know it's the
> model or the prompt — not the plumbing.

---

## Genome Lifecycle

### Initial setup
@@ -431,6 +515,7 @@ template files:
| `{{GENOME_NAME}}` | registry.sh | `genome-dev` |
| `{{GENOME_NAME_UPPER}}` | derived | `GENOME-DEV` |
| `{{GENOME_DESC}}` | registry.sh | `Web development...` |
| `{{LINKED_PROJECT}}` | registry.sh | `myorg/my-app` (or `none`) |
| `{{FORGEJO_URL}}` | globals.env | `https://git.yourserver.com` |
| `{{FORGEJO_USER}}` | globals.env | `yourusername` |
| `{{VAULTWARDEN_URL}}` | globals.env | `https://vault.yourserver.com` |
@@ -593,9 +678,9 @@ git clone https://git.yourserver.com/yourusername/genome-dev.git
If a key is lost or compromised:

```bash
# From the knowledge-genome-setup/ directory
# From the knowledge-genome-orchestrator/ directory
source lib/git-crypt.sh
cd ~/knowledge-genome-setup/genome-dev
cd ~/knowledge-genome-orchestrator/genome-dev
gcrypt_rotate_key "genome-dev"
```

@@ -643,7 +728,8 @@ The agent executes in this order at the start of every session:

1. Read `wiki/index.md` — primary catalog of all pages and maturity
2. Read last 20 log entries (injected by orchestrator — does NOT open `wiki/log.md` directly)
3. For tasks involving related pages: `qmd search "<query>"` before opening any files
3. For tasks involving related pages: if the optional `qmd` extension is installed,
   `qmd search "<query>"` before opening files; otherwise navigate from `wiki/index.md`
4. Operate on individual files — never scan entire directories

### One source per session
@@ -668,7 +754,7 @@ For Forgejo webhook → automated ingest:
2. n8n receives webhook, identifies new files
3. n8n starts one agent session per new file (sequential, not parallel)
4. Each session: inject `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
5. Agent ingest workflow runs, opens PR
5. Phase 1 agent (`/skill:ingest`) writes the manifest; Phase 2 `run-ingest.sh` opens the PR
6. Human reviews and merges PR

---
@@ -677,17 +763,39 @@ For Forgejo webhook → automated ingest:

### Ingest

Triggered by a new file in `raw/` (manual or via webhook).
Triggered by a new file in `raw/` (manual or via webhook). Ingest is split into two
phases so that the small local model spends its limited context only on judgement, and
all the deterministic bookkeeping happens outside the model's loop.

1. Read source once
2. Create `wiki/sources/<slug>.md` — summary and key points
3. Per entity (person, tool, organisation): create or update `wiki/entities/<name>.md`
4. Per concept (pattern, theory, decision): create or update `wiki/concepts/<name>.md`
5. Check each touched page for contradictions → apply Conflict Resolution if found
6. Append entry to `wiki/index.md` (bottom of relevant section — do not reorder)
7. Append log entry: `INGEST | <slug>`
8. Run scoped lint on pages created or modified in this session; report in PR
9. Commit on `feat/ai-ingest-<slug>`; open PR using `templates/pr-description.md`
**Phase 1 — agent (semantic only).** The `ingest` skill gives the agent read/edit tools
only (no shell). It:

1. Reads the source once
2. Creates `wiki/sources/<slug>.md` — summary and key points
3. Per entity (person, tool, organisation): creates or updates `wiki/entities/<name>.md`
4. Per concept (pattern, theory, decision): creates or updates `wiki/concepts/<name>.md`
5. Checks each touched page for contradictions → applies Conflict Resolution if found
6. Writes `.ingest-manifest.json` (the list of pages it created/modified, the model name,
   a one-line reasoning, the PR summary, and any contradictions) — then **stops**

**Phase 2 — `run-ingest.sh` (deterministic, outside the agent).** The post-processor
consumes the manifest and does the mechanical work the model must not waste context on:

7. Inserts each page into the correct `wiki/index.md` section **in alphabetical order**
   (`index-append.py`) and bumps the index `last_updated`
8. Appends the `INGEST | <slug>` entry to `wiki/log.md`
9. Runs scoped lint on exactly the pages touched this run (`scoped-lint.sh`, reusing
   `lib/lint.sh`)
10. Commits on `feat/ai-ingest-<slug>` and opens the PR using `templates/pr-description.md`
11. Emits a single compact JSON line (status, slug, PR url, lint_clean, conflict) for n8n
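
Step 11 can be illustrated with a short sketch. The field list comes from the step itself, but the exact key names (`pr_url` in particular), the value types, and the helper names `json_escape` and `emit_result` are assumptions; the real `run-ingest.sh` may build the line differently. The PR URL in the call is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: emit one compact JSON line for n8n to parse.
# Key names and helper names are assumed, not taken from run-ingest.sh.

json_escape() {
  # Escape backslashes and double quotes; sufficient for slugs and URLs.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# emit_result <status> <slug> <pr_url> <lint_clean:true|false> <conflict:true|false>
emit_result() {
  local status="$1" slug="$2" pr_url="$3" lint_clean="$4" conflict="$5"
  printf '{"status":"%s","slug":"%s","pr_url":"%s","lint_clean":%s,"conflict":%s}\n' \
    "$(json_escape "$status")" "$(json_escape "$slug")" "$(json_escape "$pr_url")" \
    "$lint_clean" "$conflict"
}

# Placeholder values for illustration only.
emit_result ok my-article "https://git.yourserver.com/yourusername/genome-dev/pulls/1" true false
```

Keeping the result to a single line means n8n can read the last line of stdout and parse it without having to filter log output.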

The agent never runs git, never edits the index/log mechanically, and never lints — those
are deterministic and tested (see [Testing](#testing)). Invocation on the AI node:

```bash
pi --mode json -p "/skill:ingest raw/articles/<file>.md"  # phase 1 → writes manifest
run-ingest.sh <genome>                                    # phase 2 → index/log/lint/PR
```
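
Both the source page path (`wiki/sources/<slug>.md`) and the branch name (`feat/ai-ingest-<slug>`) depend on a normalised slug. The rules implemented by `slug.sh` are not shown in this README, so the following is only a plausible sketch under assumed rules: strip the directory and a trailing `.md`, lowercase, collapse every run of non-alphanumerics into one hyphen. `slugify` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch of slug normalisation under assumed rules (not the actual slug.sh).
slugify() {
  local base="${1##*/}"   # drop any leading directory
  base="${base%.md}"      # drop a trailing .md extension
  # Lowercase, turn runs of non-alphanumerics into one hyphen, trim edge hyphens.
  printf '%s\n' "$base" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

slugify "raw/articles/My Article: Part 2.md"
```

A deterministic slug is what lets the page path, the log entry, and the branch name agree without the model having to repeat the same string three times.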

For private sources (`PRIVATE_CONTEXT: enabled` required):

@@ -698,7 +806,8 @@ For private sources (`PRIVATE_CONTEXT: enabled` required):

Triggered by an operator question.

1. `qmd search "<query>"` → identify candidate pages
1. `qmd search "<query>"` (if the optional qmd extension is installed) → identify
   candidate pages; otherwise start from `wiki/index.md`
2. Read candidate pages directly (qmd already returns file paths — no intermediate index lookup)
3. Synthesise answer with `[[wikilink]]` citations
4. If answer is non-trivial: save as `wiki/queries/<slug>.md` and append to index
@@ -974,7 +1083,8 @@ n8n (running on the storage node) can automate the ingest pipeline:
2. n8n flow identifies new files
3. For each new file: starts one agent session (sequential — never parallel)
4. Each session receives: `tail -n 20 wiki/log.md` + `PRIVATE_CONTEXT` state + source path
5. Agent runs ingest workflow and opens PR
5. Phase 1 — agent runs `/skill:ingest` (semantic → writes manifest); Phase 2 —
   `run-ingest.sh` does index/log/lint and opens the PR, returning one JSON line to n8n
6. Human reviews the PR

Key constraint: one source per session, sessions sequential.
@@ -984,11 +1094,13 @@ Never batch multiple sources into one agent session.

If the AI compute node has an Intel NPU (e.g. Core Ultra series):

- Background tasks (embedding updates, index refresh) → Intel NPU via OpenVINO
- Background/auxiliary tasks (OCR of `raw/assets/`, async summarisation, or qmd
  re-indexing **if** the optional qmd extension is in use) → Intel NPU via OpenVINO
- Active reasoning sessions (ingest, query, synthesis) → GPU

This keeps the GPU's KV cache free for interactive work and reduces power consumption
for background operations.
Note: the core system has no embedding pipeline (see [Core Philosophy](#core-philosophy)),
so there is nothing to embed here — the NPU is only for auxiliary work. Offloading that work
keeps the GPU's KV cache free for interactive sessions and lowers power draw for background jobs.

---
