Compare commits

6 commits

7 changed files with 95 additions and 34 deletions

File 1 of 7

```diff
@@ -190,8 +190,9 @@ All tools (git-crypt, bw, qmd) have native Linux binaries.
 All scripts are compatible with macOS. Requirements:
 - bash 3.2+ (macOS default) — supported for the **setup scripts** (`make` targets, scaffolding).
-The `ingest` skill uses bash 4+ constructs (`mapfile`), but it is deployed and run on the
-Linux AI node, not on the macOS setup machine — so this is not a constraint in practice.
+Two things need bash 4+: the `ingest` skill (`mapfile`), which runs on the Linux AI node (not a
+constraint on the macOS setup machine); and `gcrypt_rotate_key` (`compgen -G`), which **does**
+run on the laptop. For key rotation on macOS, use Homebrew bash (`brew install bash`).
 - GNU coreutils not required — BSD variants of `date`, `grep`, `sed` all handled.
 - `git-crypt`: install via Homebrew — `brew install git-crypt`
 - `jq`, `curl`: pre-installed or via Homebrew
```
````diff
@@ -695,6 +696,9 @@ cd ~/knowledge-genome-orchestrator/genome-dev
 gcrypt_rotate_key "genome-dev"
 ```
+> **macOS:** `gcrypt_rotate_key` uses `compgen -G` (bash 4+). The stock macOS bash 3.2 is not
+> enough — run rotation under Homebrew bash (`brew install bash`).
 `gcrypt_rotate_key` performs:
 1. Unlocks repo with existing key
````
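The macOS note depends on the operator remembering to switch shells. Here is a minimal fail-fast sketch, assuming it would be called at the start of `gcrypt_rotate_key`. `require_bash4` is a hypothetical helper and is not part of this commit. It uses only `BASH_VERSINFO` and `(( ))`, both of which bash 3.2 understands, so under the stock macOS shell it stops before any rotation step runs.

```bash
#!/usr/bin/env bash
# Hypothetical guard (not in this commit): refuse to run under bash < 4.
# Everything used here exists in bash 3.2, so the guard itself never trips.
require_bash4() {
  if (( BASH_VERSINFO[0] < 4 )); then
    printf 'error: %s needs bash 4+ (running %s). Try: brew install bash\n' \
      "${1:-this script}" "$BASH_VERSION" >&2
    return 1
  fi
  return 0
}

# Usage, before any git-crypt state is touched:
# require_bash4 gcrypt_rotate_key || return 1
```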
```diff
@@ -951,18 +955,25 @@ Pages have a `last_updated` field in frontmatter. During lint passes:
 The agent proposes re-validation but does not change `maturity` without new source evidence.
-### Cross-genome lint
+### Cross-genome references
-A manual, monthly operation. Not automated in CI/CD — the token cost and coordination
-complexity are not justified at this scale.
-1. Operator initiates a master-repo agent session
-2. Agent uses `qmd search "<concept>"` across the multi-genome index to find:
-- Concepts defined in 2+ genomes with potentially conflicting definitions
-- Entities referenced cross-genome without canonical cross-genome wikilinks
-- Concepts in genome-X that should link to genome-Y
-3. Agent reports findings — does not modify files
-4. For each finding: create conflict note in the genome where resolution belongs
+Cross-domain knowledge moves by **pull, never push**: the genome you are working in draws
+material _in_; nothing is ever written into another genome. There are **no cross-genome
+wikilinks** — submodule pointers make relative paths brittle.
+When the working genome needs a concept that lives elsewhere, the **navigation skill** handles
+it in the same two-phase shape as ingest:
+1. A deterministic collector clones the relevant genomes **read-only at HEAD** (fresh — never the
+pinned submodule state) and assembles a dossier of excerpts with provenance.
+2. A semantic pass reads only that dossier; the skill then deposits **one** abstract, non-private
+raw into the working genome at `raw/articles/crossgen-<topic>-<date>.md`.
+3. That raw goes through the working genome's normal ingest → PR → human gate, like any source.
+Which genomes may be read as **sources** is gated by a per-genome `cross_source: yes|no` flag: a
+confidential genome (e.g. a client file) is marked `no` and is never read as a source — the wall
+is structural, not a matter of the agent's discipline. The master `AGENTS.md` holds the full
+boundary contract.
 ---
```
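The `cross_source` gate can be enforced by the collector itself rather than left to the agent. Below is a sketch of a default-deny filter that rests on stated assumptions. The flag is read from a hypothetical `genome.conf` file in each genome directory; this diff does not say where the flag is actually stored, and the master `AGENTS.md` is the authority on that. `readable_source_genomes` is an illustrative name. A genome with no flag, or with any value other than `yes`, is never returned.

```bash
#!/usr/bin/env bash
# Sketch: print only the genome directories that may be read as sources.
# ASSUMPTION: each genome has a hypothetical genome.conf with "cross_source: yes|no".
readable_source_genomes() {
  local dir conf flag
  for dir in "$@"; do
    conf="${dir}/genome.conf"
    # Default deny: a missing file or missing flag means "no".
    [[ -f "$conf" ]] || continue
    flag=$(sed -n 's/^cross_source:[[:space:]]*//p' "$conf" | head -n 1)
    [[ "$flag" == "yes" ]] && printf '%s\n' "$dir"
  done
  return 0
}

# Usage: readable_source_genomes genome-dev genome-client genome-research
```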
```diff
@@ -1021,7 +1032,7 @@ and keep the wiki atomically navigable.
 ### Linking conventions
 - **Intra-genome:** `[[folder/file]]` — Obsidian wikilinks only.
-- **Cross-genome:** NOT supported via wikilink. Submodule pointers make relative paths brittle. When a concept belongs to another genome, use the navigation skill to emit a raw stub into that genome's `raw/articles/` directory so its local ingest pipeline can process it.
+- **Cross-genome:** NOT supported via wikilink — submodule pointers make relative paths brittle. When the working genome needs a concept that lives elsewhere, the navigation skill **pulls it in** as one abstract raw under _this_ genome's `raw/articles/`, which then goes through normal ingest. See [Cross-genome references](#cross-genome-references).
 - **External:** `[text](https://...)` — standard Markdown.
 ### Log format
```

File 2 of 7

```diff
@@ -21,18 +21,29 @@ gcrypt_export_key() {
 gcrypt_verify() {
 local genome_name="$1"
-local key_path="${KEYS_DIR}/${genome_name}.key"
-info "Verifying git-crypt status for ${genome_name}..."
-git-crypt lock
-if file "raw/private/.gitkeep" 2>/dev/null | grep -q "data"; then
-success "Encryption verified: private/ directory is protected."
-else
-warn "Encryption check inconclusive. Run 'git-crypt status' manually."
+info "Verifying git-crypt configuration for ${genome_name}..."
+# `git-crypt status` reports the CONFIGURED status (from `.gitattributes`), not the
+# lock/unlock status of the working tree. Encrypted lines have their labels right-aligned
+# (with leading whitespace), so you CANNOT anchor on `^encrypted`.
+# We filter by private/ and distinguish “encrypted” from “not encrypted” without
+# relying on exact spacing.
+local status_out encrypted_count not_encrypted_count
+status_out=$(git-crypt status 2>/dev/null || true)
+encrypted_count=$(printf '%s\n' "$status_out" | grep 'private/' | grep -cE '^[[:space:]]*encrypted:' || true)
+not_encrypted_count=$(printf '%s\n' "$status_out" | grep 'private/' | grep -cE '^not encrypted:' || true)
+if [[ "$encrypted_count" -gt 0 ]]; then
+success "Encryption configured: ${encrypted_count} private file(s) under git-crypt."
+if [[ "$not_encrypted_count" -gt 0 ]]; then
+warn "${not_encrypted_count} file(s) under private/ are NOT covered by the git-crypt filter — check .gitattributes (leak risk)."
+fi
+elif [[ "$not_encrypted_count" -gt 0 ]]; then
+warn "private/ files exist but none are covered by the git-crypt filter — check the .gitattributes filter (leak risk)."
+else
+info "No private/ files present yet — nothing to verify."
 fi
-[[ -f "$key_path" ]] && git-crypt unlock "$key_path"
 }
 # ---------------------------------------------------------------------------
```
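The new `gcrypt_verify` anchors its two patterns differently. To see why, here is the same counting logic run against a string instead of live `git-crypt status` output. The sample text is illustrative, not captured output. It follows the layout the comment in the hunk describes, where the `encrypted:` label is padded with leading spaces so that it lines up with `not encrypted:`. `count_private` is a hypothetical wrapper.

```bash
#!/usr/bin/env bash
# Same two patterns as gcrypt_verify, applied to a string instead of live output.
count_private() {
  # $1: status text; $2: "encrypted" or "not-encrypted"
  local pattern='^not encrypted:'
  [[ "$2" == "encrypted" ]] && pattern='^[[:space:]]*encrypted:'
  # grep -c prints 0 and exits 1 when nothing matches; || true keeps the "0".
  printf '%s\n' "$1" | grep 'private/' | grep -cE "$pattern" || true
}

# Illustrative sample only: "encrypted:" is padded to line up with "not encrypted:".
sample='not encrypted: .gitattributes
    encrypted: raw/private/client-notes.md
    encrypted: wiki/private/synthesis.md
not encrypted: raw/private/oops.md
not encrypted: wiki/concepts/public.md'

count_private "$sample" encrypted       # → 2
count_private "$sample" not-encrypted   # → 1
```

An unanchored-whitespace-free `^encrypted:` would count 0 on this sample, because every encrypted line starts with spaces. `^not encrypted:` can stay anchored because that label is never padded.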
```diff
@@ -107,6 +118,8 @@ gcrypt_rotate_key() {
 # 5. Re-stage private files so they are committed encrypted with the new key
 local staged=0
+# compgen -G requires bash 4+ for reliable glob expansion. macOS stock
+# bash is 3.2; use Homebrew bash (already recommended in README) for rotation.
 if compgen -G "raw/private/*" > /dev/null 2>&1; then
 git add raw/private/
 staged=1
```
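If rotation should also work under the stock macOS shell, a plain glob loop can replace the `compgen -G` test. The loop behaves the same in bash 3.2 and 4+. This is a sketch: `has_entries` is a hypothetical helper, and it takes a directory rather than a pattern. Like the `raw/private/*` glob, the unquoted `*` does not match dotfiles, so a directory that holds only `.gitkeep` counts as empty.

```bash
#!/usr/bin/env bash
# has_entries DIR: succeed if DIR contains at least one non-hidden entry.
# Uses no compgen and no shopt, so bash 3.2 and 4+ behave the same.
has_entries() {
  local f
  for f in "$1"/*; do
    # With no match the pattern stays literal, so confirm the path exists
    # (-L also accepts a dangling symlink).
    [[ -e "$f" || -L "$f" ]] && return 0
  done
  return 1
}

# Possible drop-in for step 5 of gcrypt_rotate_key:
# if has_entries raw/private; then
#   git add raw/private/
#   staged=1
# fi
```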

File 3 of 7

```diff
@@ -23,7 +23,7 @@ lint_markdown_file() {
 # 1. Check frontmatter delimiters
 if [[ $(head -n 1 "$file") != "---" ]]; then
-warn "Missing frontmatter start (---) in: $file"
+error "Missing frontmatter start (---) in: $file"
 errors=$((errors + 1))
 fi
```
```diff
@@ -31,14 +31,14 @@ lint_markdown_file() {
 local mandatory_fields=("title:" "type:" "domain:" "maturity:" "last_updated:")
 for field in "${mandatory_fields[@]}"; do
 if ! grep -q "^${field}" "$file"; then
-warn "Missing mandatory field '${field}' in: $file"
+error "Missing mandatory field '${field}' in: $file"
 errors=$((errors + 1))
 fi
 done
 # 3. Check domain matches genome name
 if grep -q "^domain:" "$file" && ! grep -q "^domain: ${genome_name}" "$file"; then
-warn "Domain mismatch in $file (expected '${genome_name}')"
+error "Domain mismatch in $file (expected '${genome_name}')"
 errors=$((errors + 1))
 fi
```
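The checks in this hunk grep the whole file. As a result, a `title:` line in the body satisfies the frontmatter requirement, and `^domain: ${genome_name}` also accepts `domain: genome-dev-old` for `genome-dev`. Below is a sketch of a tighter variant that looks only inside the leading `---` block and compares `domain` exactly. `frontmatter` and `count_frontmatter_errors` are hypothetical names, and the count goes to stdout rather than through a return code.

```bash
#!/usr/bin/env bash
# frontmatter FILE: print the lines between the opening and closing "---";
# print nothing when line 1 is not "---".
frontmatter() {
  awk 'NR == 1 { if ($0 != "---") exit; next } $0 == "---" { exit } { print }' "$1"
}

# count_frontmatter_errors FILE GENOME: print the number of problems found.
count_frontmatter_errors() {
  local file="$1" genome_name="$2" fm field errors=0
  fm=$(frontmatter "$file")
  for field in title: type: domain: maturity: last_updated:; do
    printf '%s\n' "$fm" | grep -q "^${field}" || errors=$((errors + 1))
  done
  # Literal whole-line match: "domain: genome-dev-old" is a mismatch for genome-dev.
  if printf '%s\n' "$fm" | grep -q '^domain:' &&
     ! printf '%s\n' "$fm" | grep -qxF "domain: ${genome_name}"; then
    errors=$((errors + 1))
  fi
  echo "$errors"
}
```

Consider a page whose `last_updated:` sits below the closing `---` and whose domain is `genome-dev-old`. This variant reports 2 problems for that page, while the whole-file greps pass it.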
```diff
@@ -70,8 +70,8 @@ check_valid_type() {
 done
 if [[ $valid -eq 0 ]]; then
-warn "Invalid type value '${type_value}' in: $file"
-warn " Valid types: ${VALID_TYPES[*]}"
+error "Invalid type value '${type_value}' in: $file"
+error " Valid types: ${VALID_TYPES[*]}"
 return 1
 fi
```
```diff
@@ -144,8 +144,8 @@ check_knowledge_decay() {
 esac
 if [[ $days_old -gt $threshold ]]; then
-warn "STALE: $file"
-warn " maturity: ${maturity} | last_updated: ${last_updated} | ${days_old} days ago (threshold: ${threshold})"
+error "STALE: $file"
+error " maturity: ${maturity} | last_updated: ${last_updated} | ${days_old} days ago (threshold: ${threshold})"
 return 1
 fi
```
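This diff does not show how `check_knowledge_decay` computes `days_old`. The README promises that BSD `date` is handled, so here is one portable way to get whole days since a `YYYY-MM-DD` date. The sketch assumes either GNU coreutils `date` (detected via `--version`) or BSD/macOS `date`. `days_since` is a hypothetical name.

```bash
#!/usr/bin/env bash
# days_since YYYY-MM-DD: print whole days between that date and now.
days_since() {
  local d="$1" ts now
  if date --version >/dev/null 2>&1; then
    ts=$(date -d "$d" +%s) || return 1                 # GNU coreutils date
  else
    ts=$(date -j -f '%Y-%m-%d' "$d" +%s) || return 1   # BSD / macOS date
  fi
  now=$(date +%s)
  echo $(( (now - ts) / 86400 ))
}

# Example against an illustrative threshold:
# days_old=$(days_since "$last_updated")
# (( days_old > 90 )) && echo "STALE"
```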
```diff
@@ -190,12 +190,21 @@ check_broken_links() {
 local links
 links=$(grep -oE '\[\[[^\]]+' "$file" 2>/dev/null | sed 's/^\[\[//' | cut -d'|' -f1)
-for link in $links; do
+# Cross-genome links (../other-genome/…) are not resolvable from a single
+# genome checkout and are skipped — they would always fall
+# through the two-level lookup and produce non-actionable warnings.
+while IFS= read -r link; do
+[[ -z "$link" ]] && continue
+if [[ "$link" == ../* ]]; then
+continue
+fi
 local target="$link"
 [[ "$target" != *.md ]] && target="${target}.md"
 if [[ ! -f "${base_dir}/${target}" && ! -f "${base_dir}/../${target}" ]]; then
 warn "Potential broken link: [[$link]] in $file"
 fi
-done
+done <<< "$links"
 }
```
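Below is a standalone sketch of the extraction step with the new `../` skip. It assumes links are written `[[target]]` or `[[target|alias]]`.

One regex detail matters here. In a POSIX bracket expression a backslash is an ordinary character, in GNU and BSD grep alike. So `[^\]]` reads as "one character other than `\`, followed by one or more `]`", not "a run of characters other than `]`". The sketch spells the latter `[^]]`, with the `]` placed first.

`list_local_wikilinks` is a hypothetical name. Reading the links line by line also keeps targets that contain spaces intact, which the old `for link in $links` loop did not.

```bash
#!/usr/bin/env bash
# list_local_wikilinks FILE: print each wikilink target on its own line,
# dropping "|alias" and skipping cross-genome "../" targets.
list_local_wikilinks() {
  local file="$1"
  # [^]]+ = one or more characters other than "]" (a "]" right after "^" is literal).
  grep -oE '\[\[[^]]+' "$file" 2>/dev/null |
    sed 's/^\[\[//' |
    cut -d'|' -f1 |
    while IFS= read -r link; do
      [[ -z "$link" || "$link" == ../* ]] && continue
      printf '%s\n' "$link"
    done
}
```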

File 4 of 7

```diff
@@ -9,6 +9,14 @@
 # structure check can never drift apart.
 # =============================================================================
+# NOTE — Return-code smell
+# Several functions in this file (and in lint.sh) use the return code as a
+# numeric counter (e.g. return $missing). This is a known smell: exit codes
+# wrap at 256 and conflate "count of problems" with "exit status". At the
+# current scale (<10 problems per run) the wrap-around risk is zero, so we
+# accept it pragmatically. If counts ever grow, switch to stdout counters
+# or dedicated global variables.
 # Canonical directories every genome must have.
 # raw/* are input buckets (collaborator-writable); wiki/* is the agent-owned,
 # contract-bound layout the lint, the index sections and the ingest skill depend on.
```
```diff
@@ -43,6 +51,7 @@ structure_report() {
 info "extra (not in canon): ${d}"
 done < <(find "${base}/raw" "${base}/wiki" -mindepth 1 -type d 2>/dev/null)
+# NOTE: return $missing is a smell — see header. Kept for compatibility.
 return $missing
 }
```
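The header's wrap-around point is easy to demonstrate. The block below shows it first, then sketches the stdout-counter alternative the header recommends. `returns_n` and `count_missing_dirs` are hypothetical, and `structure_report`'s actual arguments are not visible in this diff.

```bash
#!/usr/bin/env bash
# bash keeps only the low 8 bits of a return value.
returns_n() { return "$1"; }
returns_n 256; echo "return 256 -> $?"   # → return 256 -> 0
returns_n 300; echo "return 300 -> $?"   # → return 300 -> 44

# Stdout counter: the count travels as text, the return code stays a status.
count_missing_dirs() {
  local base="$1" d missing=0
  shift
  for d in "$@"; do
    [[ -d "${base}/${d}" ]] || missing=$((missing + 1))
  done
  printf '%d\n' "$missing"
}

# Usage: missing=$(count_missing_dirs "$g" raw/articles wiki/concepts)
```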

File 5 of 7

```diff
@@ -24,6 +24,15 @@ step "Adding New Genome: ${GENOME_NAME}"
 # Build a 3-field registry entry (linked_repo may be empty)
 GENOMES=("${GENOME_NAME}|${GENOME_DESC}|${GENOME_LINKED}")
+# NOTE — Maintenance smell
+# We source setup-genomes.sh as a library/orchestrator hybrid. This works because:
+# - registry.sh is guarded against double-source (idempotent guard)
+# - setup-genomes.sh checks WORK_DIR before re-sourcing registry.sh
+# - GENOMES is built locally just before the source, so it is not clobbered
+# However, sourcing an orchestration script as a library makes the control flow
+# harder to trace. If this grows, refactor into a shared function (e.g. setup_one_genome)
+# called by both add-genome.sh and setup-genomes.sh.
 source "scripts/setup-genomes.sh"
 success "Genome '${GENOME_NAME}' added and linked successfully!"
```
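The note depends on `registry.sh` being safe to source twice. The block below shows a common shape for such a guard, as a sketch. `_REGISTRY_SH_LOADED` and `LOAD_COUNT` are hypothetical names, and the real guard in `registry.sh` is not part of this diff.

```bash
#!/usr/bin/env bash
# Write a throwaway "library" that protects itself against double sourcing.
lib=$(mktemp)
cat > "$lib" <<'EOF'
[[ -n "${_REGISTRY_SH_LOADED:-}" ]] && return 0   # already sourced: do nothing
_REGISTRY_SH_LOADED=1
LOAD_COUNT=$(( ${LOAD_COUNT:-0} + 1 ))            # stands in for real side effects
EOF

source "$lib"
source "$lib"                         # second source is a no-op
echo "loaded ${LOAD_COUNT} time(s)"   # → loaded 1 time(s)
rm -f "$lib"
```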

File 6 of 7

```diff
@@ -59,10 +59,14 @@ all_paths=( "${created_paths[@]}" "${modified_paths[@]}" )
 conflict_label=""
-# NOTE: no rollback. Steps below mutate the working tree in order (index → log → commit).
-# All are idempotent on re-run EXCEPT log-append (append-only). If a step fails midway,
-# nothing is committed (open-pr is the only committer) — the operator re-runs, or inspects
-# wiki/ if log-append already wrote a line. The manifest is removed only on full success.
+# NOTE: No rollback. The steps below modify the working tree in order (index → log → commit).
+# All steps are idempotent on re-run EXCEPT log-append (append-only). If a step fails midway,
+# nothing is committed (open-pr is the only committer) — the operator re-runs, or checks
+# wiki/ if log-append has already written a line. The manifest is removed only upon full success.
+# log-append is not idempotent: a re-run after a post-log failure produces
+# duplicate lines. This is accepted by design (append-only ledger, no rollback). If this
+# becomes a nuisance tomorrow, add a dedup check on run_id in log-append.sh
+# (grep for run_id before appending). Manual recovery: grep for run_id in wiki/log.md.
 # --- 1. index entries (created pages only), inserted in order ---
 while IFS=$'\t' read -r path summary maturity; do
```
```diff
@@ -76,6 +80,7 @@ while IFS=$'\t' read -r path summary maturity; do
 queries)
 if [[ "$link" == queries/conflict-* ]]; then section="Conflicts"; conflict_label="CONFLICT"
 else section="Queries"; fi ;;
+# private/ is not routed here — ingest is public-only. Add when private ingest is built.
 *) section="Sources" ;;
 esac
```
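Here is a sketch of the run_id dedup that the note proposes for `log-append.sh`. It rests on two assumptions this diff does not confirm: every log line contains its run_id verbatim, and run_ids are unique tokens. `append_log_once` is a hypothetical name. `grep -F` treats the run_id as a literal string, and `-w` keeps `run-1` from matching inside `run-12`.

```bash
#!/usr/bin/env bash
# append_log_once LOG RUN_ID LINE: append LINE unless RUN_ID is already logged.
append_log_once() {
  local log_file="$1" run_id="$2" line="$3"
  # -F: literal run_id; -w: whole-word match; --: run_id may start with "-".
  if [[ -f "$log_file" ]] && grep -qwF -- "$run_id" "$log_file"; then
    return 0            # written by an earlier, partially failed run
  fi
  printf '%s\n' "$line" >> "$log_file"
}
```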

File 7 of 7

```diff
@@ -54,6 +54,11 @@ private: false
 ## Conflicts Pending Review (`wiki/queries/conflict-*.md`)
 *slugs only.*
+## Private Synthesis (`wiki/private/`)
+*Restricted access. Requires PRIVATE_CONTEXT: enabled and unlocked repo.*
+*List slug names ONLY. Do not append summaries — prevents metadata leakage.*
 EOF
 cat > "${g}/wiki/log.md" <<'EOF'
```
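The template asks for slug names only under Private Synthesis. Below is a sketch of a generator that honours that rule by never opening the files. It assumes private pages are `wiki/private/*.md` and that a slug is the file name without `.md`. `list_private_slugs` is hypothetical.

```bash
#!/usr/bin/env bash
# list_private_slugs DIR: print "- <slug>" per *.md file, reading names only.
list_private_slugs() {
  local dir="$1" f
  for f in "$dir"/*.md; do
    [[ -e "$f" ]] || continue          # no matches: the pattern stays literal
    f=${f##*/}                         # strip the directory
    printf -- '- %s\n' "${f%.md}"      # strip .md; the file body is never read
  done
}

# Usage: list_private_slugs "${g}/wiki/private"
```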