feat: implement ingest skill workflow with post-processing
This commit is contained in:
parent
ee4f5beacf
commit
3005366cfd
8 changed files with 515 additions and 0 deletions
83
skills/ingest/SKILL.md
Normal file
83
skills/ingest/SKILL.md
Normal file
|
|
@ -0,0 +1,83 @@
|
|||
---
|
||||
name: ingest
|
||||
description: Semantic pass of a single raw source into the current genome's wiki — read the source, write sources/entities/concepts, handle contradictions, then emit a manifest and STOP. Use when a new file lands in raw/. Does NOT do git, log, index, lint, or PRs (a post-processor handles those), and does NOT handle private sources or project repos.
|
||||
license: see repository
|
||||
compatibility: Runs inside one genome checkout (cwd = genome root). Tools needed — read, edit only. NO bash, NO git. The deterministic steps (index, log, scoped lint, PR) run AFTER you exit, via run-ingest.sh. PRIVATE_CONTEXT must be disabled.
|
||||
allowed-tools: read edit
|
||||
metadata:
|
||||
framework: knowledge-genome
|
||||
phase: "1-ingest-semantic"
|
||||
---
|
||||
|
||||
# Ingest — semantic pass
|
||||
|
||||
You run inside ONE genome checkout. `AGENTS.md` (already in your context) is the
|
||||
authoritative contract. Your job is the **semantic pass only**: read the source, write
|
||||
the wiki pages, handle contradictions. You do **not** touch git, the log, the index, the
|
||||
linter, or PRs — a post-processor (`run-ingest.sh`) does all of that _after you stop_,
|
||||
from the manifest you leave behind. This keeps your context clean and your turns few,
|
||||
which matters on a small local model.
|
||||
|
||||
**Argument:** the relative path of the single raw source to ingest
|
||||
(e.g. `raw/articles/foo.md`). Process only this one.
|
||||
|
||||
## Pre-flight — stop the session if any check fails
|
||||
|
||||
1. Refuse if the argument path is under any `private/` directory.
|
||||
2. Refuse if `PRIVATE_CONTEXT` is not `disabled`.
|
||||
3. Confirm the file exists under `raw/`.
|
||||
|
||||
## Semantic work (your only job)
|
||||
|
||||
1. Read the source once.
|
||||
2. Write `wiki/sources/<kebab-slug>.md` — faithful summary + key points, with the required
|
||||
frontmatter (`type: source`, `domain: <genome>`, `maturity: draft`,
|
||||
`last_updated: <today>`, `private: false`, sensible `tags`).
|
||||
3. For each entity (person, tool, org) → create or update `wiki/entities/<kebab-name>.md`.
|
||||
4. For each concept (pattern, theory, decision) → create or update
|
||||
`wiki/concepts/<kebab-name>.md`.
|
||||
5. On a real contradiction with an existing claim, follow `AGENTS.md` §Conflict: create
|
||||
`wiki/queries/conflict-<concept>-<YYYY-MM-DD>.md`. Never overwrite the existing page.
|
||||
|
||||
Name files in kebab-case and pick stable names. Read `wiki/index.md` (and the specific
|
||||
pages it points to) to decide create-vs-update and to spot contradictions. Do not scan
|
||||
whole directories.
|
||||
|
||||
## Finish: write the manifest, then STOP
|
||||
|
||||
As your **final action**, write `.ingest-manifest.json` at the genome root
|
||||
(NOT under `wiki/`) describing exactly what you did. Then stop — do not commit, lint,
|
||||
append to the log/index, or open anything.
|
||||
|
||||
```json
|
||||
{
|
||||
"raw_source": "raw/articles/foo.md",
|
||||
"model": "<the model you are running as>",
|
||||
"reasoning": "One sentence for the log: what changed and why.",
|
||||
"pr_summary": "One or two sentences describing this ingest for the PR.",
|
||||
"contradictions": "None (or: 1 conflict file created — <concept>)",
|
||||
"pages": [
|
||||
{
|
||||
"path": "wiki/sources/foo.md",
|
||||
"summary": "One-line index summary.",
|
||||
"maturity": "draft",
|
||||
"status": "created"
|
||||
},
|
||||
{
|
||||
"path": "wiki/entities/acme.md",
|
||||
"summary": "Acme — vendor.",
|
||||
"maturity": "draft",
|
||||
"status": "modified"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Manifest rules:
|
||||
|
||||
- List every page you created or modified, with `status` `created` or `modified`.
|
||||
- `summary` is the one-line index description (≈12 words max). For conflict pages the
|
||||
summary is ignored — the index lists conflicts by slug only.
|
||||
- Do not invent a `run_id`, branch, commit, or PR — those belong to the post-processor.
|
||||
|
||||
One source per session. After writing the manifest, stop.
|
||||
0
skills/ingest/references/frontmatter.md
Normal file
0
skills/ingest/references/frontmatter.md
Normal file
96
skills/ingest/scripts/index-append.py
Normal file
96
skills/ingest/scripts/index-append.py
Normal file
|
|
@ -0,0 +1,96 @@
|
|||
#!/usr/bin/env python3
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/index-append.py
|
||||
# Insert an entry line into the correct section of wiki/index.md and keep that
|
||||
# section's entries alphabetically ordered. Bumps frontmatter last_updated.
|
||||
#
|
||||
# NOTE: agents-genome.md and wiki-index.md claim the pre-commit hook sorts the
|
||||
# index. The actual pre-commit.sh only runs the plaintext-leak check — it does
|
||||
# NOT sort. This script owns the ordering instead. (If you later move sorting
|
||||
# into the hook, reduce this to a plain append.)
|
||||
#
|
||||
# index-append.py --section Sources \
|
||||
# --entry '- [[sources/foo]] — One-line summary. `maturity: draft`'
|
||||
# =============================================================================
|
||||
import argparse
|
||||
import datetime
|
||||
import re
|
||||
import sys
|
||||
|
||||
ENTRY_RE = re.compile(r"^- \[\[")
|
||||
HEADER_RE = re.compile(r"^## ")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
ap = argparse.ArgumentParser()
|
||||
ap.add_argument("--section", required=True,
|
||||
help="Section name, e.g. Sources / Entities / Concepts / Queries / Conflicts")
|
||||
ap.add_argument("--entry", required=True, help="Full index line to insert")
|
||||
ap.add_argument("--file", default="wiki/index.md")
|
||||
args = ap.parse_args()
|
||||
|
||||
try:
|
||||
with open(args.file, encoding="utf-8") as fh:
|
||||
lines = fh.read().splitlines()
|
||||
except FileNotFoundError:
|
||||
print(f"index-append: not found: {args.file}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
today = datetime.date.today().isoformat()
|
||||
|
||||
# 1. Bump last_updated inside the first frontmatter block
|
||||
fm_open = False
|
||||
for i, ln in enumerate(lines):
|
||||
if ln.strip() == "---":
|
||||
if not fm_open:
|
||||
fm_open = True
|
||||
continue
|
||||
break # end of frontmatter
|
||||
if fm_open and ln.startswith("last_updated:"):
|
||||
lines[i] = f"last_updated: {today}"
|
||||
|
||||
# 2. Locate the target section [start, end)
|
||||
start = None
|
||||
for i, ln in enumerate(lines):
|
||||
if HEADER_RE.match(ln) and ln[3:].startswith(args.section):
|
||||
start = i
|
||||
break
|
||||
if start is None:
|
||||
print(f"index-append: section '{args.section}' not found in {args.file}",
|
||||
file=sys.stderr)
|
||||
return 1
|
||||
|
||||
end = len(lines)
|
||||
for i in range(start + 1, len(lines)):
|
||||
if HEADER_RE.match(lines[i]):
|
||||
end = i
|
||||
break
|
||||
|
||||
# 3. Split the section body into intro (non-entry) and entries
|
||||
body = lines[start + 1:end]
|
||||
intro = [ln for ln in body if not ENTRY_RE.match(ln)]
|
||||
entries = [ln for ln in body if ENTRY_RE.match(ln)]
|
||||
|
||||
if args.entry in entries:
|
||||
print(f"index-append: entry already present, skipping")
|
||||
return 0
|
||||
|
||||
entries.append(args.entry)
|
||||
entries.sort(key=str.casefold)
|
||||
|
||||
# Normalise intro: drop trailing blanks, keep header + comment(s)
|
||||
while intro and intro[-1].strip() == "":
|
||||
intro.pop()
|
||||
|
||||
new_section = intro + [""] + entries + [""]
|
||||
lines = lines[:start + 1] + new_section + lines[end:]
|
||||
|
||||
with open(args.file, "w", encoding="utf-8") as fh:
|
||||
fh.write("\n".join(lines) + "\n")
|
||||
|
||||
print(f"index-append: added to {args.section}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
50
skills/ingest/scripts/log-append.sh
Normal file
50
skills/ingest/scripts/log-append.sh
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/log-append.sh
|
||||
# Append one entry to the append-only ledger wiki/log.md, in the exact format
|
||||
# defined by AGENTS.md / wiki-log.md. Generates run_id. Never edits prior entries.
|
||||
#
|
||||
# log-append.sh --type INGEST --subject "<slug>" --model "<model>" \
|
||||
# --context "[[raw/x]]" --output "[[sources/x]]" \
|
||||
# --reasoning "One sentence."
|
||||
# =============================================================================
|
||||
set -euo pipefail
|
||||
|
||||
LOG_FILE="${LOG_FILE:-wiki/log.md}"
|
||||
|
||||
type="" subject="" model="" context="" output="" reasoning=""
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--type) type="$2"; shift 2 ;;
|
||||
--subject) subject="$2"; shift 2 ;;
|
||||
--model) model="$2"; shift 2 ;;
|
||||
--context) context="$2"; shift 2 ;;
|
||||
--output) output="$2"; shift 2 ;;
|
||||
--reasoning) reasoning="$2"; shift 2 ;;
|
||||
*) echo "log-append: unknown arg: $1" >&2; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
: "${type:?--type required}"
|
||||
: "${subject:?--subject required}"
|
||||
|
||||
case "$type" in
|
||||
INGEST|LINT|QUERY|CONFLICT|CONFIG|SECURITY) ;;
|
||||
*) echo "log-append: invalid TYPE '${type}'" >&2; exit 1 ;;
|
||||
esac
|
||||
|
||||
[[ -f "$LOG_FILE" ]] || { echo "log-append: not found: $LOG_FILE" >&2; exit 1; }
|
||||
|
||||
run_id="$(uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid)"
|
||||
today="$(date +%Y-%m-%d)"
|
||||
|
||||
{
|
||||
printf '\n## [%s] %s | %s\n\n' "$today" "$type" "$subject"
|
||||
printf -- '- run_id: `%s`\n' "$run_id"
|
||||
printf -- '- model: `%s`\n' "${model:-unknown}"
|
||||
printf -- '- context_read: %s\n' "${context:-*(none)*}"
|
||||
printf -- '- output_written: %s\n' "${output:-*(none)*}"
|
||||
printf -- '- reasoning: %s\n' "${reasoning:-No reasoning provided.}"
|
||||
} >> "$LOG_FILE"
|
||||
|
||||
echo "run_id=${run_id}"
|
||||
98
skills/ingest/scripts/open-pr.sh
Normal file
98
skills/ingest/scripts/open-pr.sh
Normal file
|
|
@ -0,0 +1,98 @@
|
|||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/open-pr.sh
|
||||
# Branch, commit (conventional), push, and open a Forgejo PR for the wiki/ changes.
|
||||
# Mirrors the API conventions of providers/forgejo.sh (token auth + http_code).
|
||||
# Runs inside the genome checkout (cwd = genome root). Never touches main.
|
||||
#
|
||||
# open-pr.sh --slug <slug> --title "feat: ingest <slug>" --body-file <path> \
|
||||
# [--base main] [--label CONFLICT]
|
||||
#
|
||||
# Requires env: FORGEJO_URL, FORGEJO_USER, FORGEJO_TOKEN.
|
||||
# =============================================================================
|
||||
set -euo pipefail
|
||||
|
||||
: "${FORGEJO_URL:?missing FORGEJO_URL}"
|
||||
: "${FORGEJO_USER:?missing FORGEJO_USER}"
|
||||
: "${FORGEJO_TOKEN:?missing FORGEJO_TOKEN}"
|
||||
|
||||
slug="" title="" body_file="" base="main" label=""
|
||||
while [[ $# -gt 0 ]]; do
|
||||
case "$1" in
|
||||
--slug) slug="$2"; shift 2 ;;
|
||||
--title) title="$2"; shift 2 ;;
|
||||
--body-file) body_file="$2"; shift 2 ;;
|
||||
--base) base="$2"; shift 2 ;;
|
||||
--label) label="$2"; shift 2 ;;
|
||||
*) echo "open-pr: unknown arg: $1" >&2; exit 1 ;;
|
||||
esac
|
||||
done
|
||||
|
||||
: "${slug:?--slug required}"
|
||||
: "${title:?--title required}"
|
||||
: "${body_file:?--body-file required}"
|
||||
[[ -f "$body_file" ]] || { echo "open-pr: body file not found: $body_file" >&2; exit 1; }
|
||||
|
||||
branch="feat/ai-ingest-${slug}"
|
||||
repo="$(basename -s .git "$(git config --get remote.origin.url)")"
|
||||
|
||||
# 1. Branch + commit + push (AGENTS.md rule 5: never commit to main)
|
||||
git switch -c "$branch" 2>/dev/null || git switch "$branch"
|
||||
git add wiki/
|
||||
if git diff --cached --quiet; then
|
||||
echo "open-pr: nothing staged under wiki/ — aborting" >&2
|
||||
exit 1
|
||||
fi
|
||||
git commit -m "$title"
|
||||
git push -u origin "$branch"
|
||||
|
||||
# 2. Open the PR via Forgejo API (jq builds the JSON safely)
|
||||
body="$(cat "$body_file")"
|
||||
payload="$(jq -n --arg head "$branch" --arg base "$base" \
|
||||
--arg title "$title" --arg body "$body" \
|
||||
'{head:$head, base:$base, title:$title, body:$body}')"
|
||||
|
||||
resp="$(curl -s -w '\n%{http_code}' \
|
||||
-H "Authorization: token ${FORGEJO_TOKEN}" \
|
||||
-H "Content-Type: application/json" \
|
||||
-X POST "${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/pulls" \
|
||||
-d "$payload")"
|
||||
|
||||
code="$(printf '%s' "$resp" | tail -n1)"
|
||||
json="$(printf '%s' "$resp" | sed '$d')"
|
||||
|
||||
case "$code" in
|
||||
201)
|
||||
url="$(printf '%s' "$json" | jq -r '.html_url')"
|
||||
number="$(printf '%s' "$json" | jq -r '.number')"
|
||||
echo "PR opened: ${url}"
|
||||
;;
|
||||
409)
|
||||
echo "open-pr: a PR for '${branch}' already exists — push updated the branch." >&2
|
||||
exit 0
|
||||
;;
|
||||
401)
|
||||
echo "open-pr: unauthorized — check FORGEJO_TOKEN (n8n-bot)." >&2
|
||||
exit 1
|
||||
;;
|
||||
*)
|
||||
echo "open-pr: Forgejo API HTTP ${code}: ${json}" >&2
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
|
||||
# 3. Optional label (e.g. CONFLICT). Best-effort; non-fatal.
|
||||
if [[ -n "$label" && -n "${number:-}" ]]; then
|
||||
label_id="$(curl -s -H "Authorization: token ${FORGEJO_TOKEN}" \
|
||||
"${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/labels" \
|
||||
| jq -r --arg n "$label" '.[] | select(.name==$n) | .id' | head -n1)"
|
||||
if [[ -n "$label_id" && "$label_id" != "null" ]]; then
|
||||
curl -s -o /dev/null \
|
||||
-H "Authorization: token ${FORGEJO_TOKEN}" -H "Content-Type: application/json" \
|
||||
-X POST "${FORGEJO_URL}/api/v1/repos/${FORGEJO_USER}/${repo}/issues/${number}/labels" \
|
||||
-d "{\"labels\":[${label_id}]}" \
|
||||
&& echo "label '${label}' applied" >&2
|
||||
else
|
||||
echo "open-pr: label '${label}' not found in repo — skipped." >&2
|
||||
fi
|
||||
fi
|
||||
120
skills/ingest/scripts/run-ingest.sh
Normal file
120
skills/ingest/scripts/run-ingest.sh
Normal file
|
|
@ -0,0 +1,120 @@
|
|||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/run-ingest.sh
|
||||
# Post-pi orchestrator. Runs OUTSIDE pi's loop, on vm101, in the genome checkout.
|
||||
# Consumes .ingest-manifest.json (written by the ingest skill) and performs every
|
||||
# deterministic step — index, log, scoped lint, PR — so pi's context stays clean.
|
||||
#
|
||||
# run-ingest.sh <genome_name> [manifest_path]
|
||||
#
|
||||
# Emits a single JSON result line on stdout for n8n to parse.
|
||||
# =============================================================================
|
||||
set -euo pipefail
|
||||
|
||||
genome="${1:?usage: run-ingest.sh <genome> [manifest]}"
|
||||
manifest="${2:-.ingest-manifest.json}"
|
||||
SCRIPTS="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
|
||||
fail() {
|
||||
jq -n --arg stage "$1" --arg reason "$2" \
|
||||
'{status:"error", stage:$stage, reason:$reason}'
|
||||
exit 1
|
||||
}
|
||||
|
||||
command -v jq >/dev/null 2>&1 || { echo '{"status":"error","reason":"jq missing"}'; exit 1; }
|
||||
command -v python3 >/dev/null 2>&1 || fail "deps" "python3 missing (needed by index-append.py)"
|
||||
[[ -f "$manifest" ]] || fail "manifest" "manifest not found: ${manifest}"
|
||||
|
||||
# --- read manifest scalars ---
|
||||
raw_source="$(jq -r '.raw_source' "$manifest")"
|
||||
model="$(jq -r '.model // "unknown"' "$manifest")"
|
||||
reasoning="$(jq -r '.reasoning // "Ingest."' "$manifest")"
|
||||
pr_summary="$(jq -r '.pr_summary // "Ingest."' "$manifest")"
|
||||
contradictions="$(jq -r '.contradictions // "None"' "$manifest")"
|
||||
|
||||
[[ -n "$raw_source" && "$raw_source" != "null" ]] || fail "manifest" "raw_source missing"
|
||||
|
||||
slug="$(bash "${SCRIPTS}/slug.sh" "$raw_source")"
|
||||
|
||||
# --- collect touched paths ---
|
||||
mapfile -t created_paths < <(jq -r '.pages[] | select(.status=="created") | .path' "$manifest")
|
||||
mapfile -t modified_paths < <(jq -r '.pages[] | select(.status=="modified") | .path' "$manifest")
|
||||
all_paths=( "${created_paths[@]}" "${modified_paths[@]}" )
|
||||
[[ ${#all_paths[@]} -gt 0 ]] || fail "manifest" "no pages reported"
|
||||
|
||||
conflict_label=""
|
||||
|
||||
# --- 1. index entries (created pages only), inserted in order ---
|
||||
while IFS=$'\t' read -r path summary maturity; do
|
||||
[[ -z "$path" ]] && continue
|
||||
link="${path#wiki/}"; link="${link%.md}" # e.g. sources/foo
|
||||
folder="${link%%/*}"
|
||||
case "$folder" in
|
||||
sources) section="Sources" ;;
|
||||
entities) section="Entities" ;;
|
||||
concepts) section="Concepts" ;;
|
||||
queries)
|
||||
if [[ "$link" == queries/conflict-* ]]; then section="Conflicts"; conflict_label="CONFLICT"
|
||||
else section="Queries"; fi ;;
|
||||
*) section="Sources" ;;
|
||||
esac
|
||||
|
||||
if [[ "$section" == "Conflicts" ]]; then
|
||||
entry="- [[${link}]]" # conflicts: slug only
|
||||
else
|
||||
entry="- [[${link}]] — ${summary} \`maturity: ${maturity}\`"
|
||||
fi
|
||||
|
||||
python3 "${SCRIPTS}/index-append.py" --section "$section" --entry "$entry" \
|
||||
|| fail "index" "index-append failed for ${path}"
|
||||
done < <(jq -r '.pages[] | select(.status=="created")
|
||||
| [.path, (.summary // ""), (.maturity // "draft")] | @tsv' "$manifest")
|
||||
|
||||
# --- 2. log entry ---
|
||||
out="$(jq -r '[.pages[].path | "[[" + (sub("^wiki/";"") | sub("\\.md$";"")) + "]]"] | join(", ")' "$manifest")"
|
||||
"${SCRIPTS}/log-append.sh" --type INGEST --subject "$slug" --model "$model" \
|
||||
--context "[[${raw_source}]]" --output "${out:-*(none)*}" --reasoning "$reasoning" \
|
||||
|| fail "log" "log-append failed"
|
||||
|
||||
# --- 3. scoped lint (capture findings for the PR; never aborts the run) ---
|
||||
lint_out="$( "${SCRIPTS}/scoped-lint.sh" "$genome" "${all_paths[@]}" 2>&1 )" && lint_rc=0 || lint_rc=$?
|
||||
|
||||
# --- 4. assemble the PR body (manifest tables + lint results) ---
|
||||
body="$(mktemp)"
|
||||
{
|
||||
echo "## Summary"
|
||||
echo "$pr_summary"
|
||||
echo ""
|
||||
echo "## Pages"
|
||||
echo "| Path | Status | Maturity |"
|
||||
echo "|------|--------|----------|"
|
||||
jq -r '.pages[] | "| `\(.path)` | \(.status) | \(.maturity // "draft") |"' "$manifest"
|
||||
echo ""
|
||||
echo "## Contradictions"
|
||||
echo "$contradictions"
|
||||
echo ""
|
||||
echo "## Scoped Lint (post-ingest)"
|
||||
echo '```'
|
||||
echo "$lint_out"
|
||||
echo '```'
|
||||
} > "$body"
|
||||
|
||||
# --- 5. open the PR ---
|
||||
pr_args=( --slug "$slug" --title "feat: ingest ${slug}" --body-file "$body" )
|
||||
[[ -n "$conflict_label" ]] && pr_args+=( --label "$conflict_label" )
|
||||
pr_out="$( "${SCRIPTS}/open-pr.sh" "${pr_args[@]}" 2>&1 )" && pr_rc=0 || pr_rc=$?
|
||||
pr_url="$(printf '%s\n' "$pr_out" | sed -n 's/^PR opened: //p' | head -n1)"
|
||||
|
||||
rm -f "$body"
|
||||
|
||||
# --- final result line for n8n ---
|
||||
jq -n \
|
||||
--arg status "$([[ $pr_rc -eq 0 ]] && echo ok || echo pr_failed)" \
|
||||
--arg slug "$slug" \
|
||||
--arg pr_url "$pr_url" \
|
||||
--argjson lint_clean "$([[ $lint_rc -eq 0 ]] && echo true || echo false)" \
|
||||
--argjson conflict "$([[ -n "$conflict_label" ]] && echo true || echo false)" \
|
||||
--arg detail "$pr_out" \
|
||||
'{status:$status, slug:$slug, pr_url:$pr_url, lint_clean:$lint_clean, conflict:$conflict, detail:$detail}'
|
||||
|
||||
[[ $pr_rc -eq 0 ]]
|
||||
50
skills/ingest/scripts/scoped-lint.sh
Normal file
50
skills/ingest/scripts/scoped-lint.sh
Normal file
|
|
@ -0,0 +1,50 @@
|
|||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/scoped-lint.sh
|
||||
# Run the framework's validation on ONLY the files touched this session.
|
||||
# Reuses lib/lint.sh + lib/output.sh — same checks as `make lint`, scoped.
|
||||
#
|
||||
# KG_LIB_DIR=/opt/knowledge-genome-setup/lib \
|
||||
# scoped-lint.sh <genome_name> wiki/sources/x.md wiki/entities/y.md
|
||||
#
|
||||
# Exits non-zero if any hard error is found, so the agent notices.
|
||||
# Findings are printed (stderr from the lint functions + a summary on stdout).
|
||||
# =============================================================================
|
||||
set -euo pipefail
|
||||
|
||||
: "${KG_LIB_DIR:?set KG_LIB_DIR to the framework lib/ dir (e.g. /opt/knowledge-genome-setup/lib)}"
|
||||
|
||||
# shellcheck source=/dev/null
|
||||
source "${KG_LIB_DIR}/output.sh"
|
||||
# shellcheck source=/dev/null
|
||||
source "${KG_LIB_DIR}/lint.sh"
|
||||
|
||||
genome="${1:?usage: scoped-lint.sh <genome> <file...>}"
|
||||
shift
|
||||
[[ $# -gt 0 ]] || { echo "scoped-lint: no files given" >&2; exit 1; }
|
||||
|
||||
errors=0
|
||||
stale=0
|
||||
count=$#
|
||||
|
||||
for f in "$@"; do
|
||||
if [[ ! -f "$f" ]]; then
|
||||
warn "scoped-lint: missing file (skipped): $f"
|
||||
continue
|
||||
fi
|
||||
|
||||
lint_markdown_file "$f" "$genome" && fe=0 || fe=$?
|
||||
check_privacy_consistency "$f" && pce=0 || pce=$?
|
||||
check_page_size "$f" && pse=0 || pse=$?
|
||||
errors=$(( errors + fe + pce + pse ))
|
||||
|
||||
check_knowledge_decay "$f" && st=0 || st=$?
|
||||
stale=$(( stale + st ))
|
||||
|
||||
check_broken_links "$f" || true # warnings only
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "scoped-lint: ${errors} error(s), ${stale} stale across ${count} file(s)"
|
||||
|
||||
[[ $errors -eq 0 ]]
|
||||
18
skills/ingest/scripts/slug.sh
Normal file
18
skills/ingest/scripts/slug.sh
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
#!/usr/bin/env bash
|
||||
# =============================================================================
|
||||
# skills/ingest/scripts/slug.sh
|
||||
# Derive a wiki slug from a path, filename, or title string.
|
||||
# slug.sh "raw/articles/My Source.md" -> my-source
|
||||
# slug.sh "Some Concept Name" -> some-concept-name
|
||||
# =============================================================================
|
||||
set -euo pipefail
|
||||
|
||||
input="${1:?usage: slug.sh <path-or-title>}"
|
||||
|
||||
# Strip directory and extension when given a path
|
||||
base="${input##*/}"
|
||||
base="${base%.*}"
|
||||
|
||||
printf '%s\n' "$base" \
|
||||
| tr '[:upper:]' '[:lower:]' \
|
||||
| sed -E 's/[^a-z0-9]+/-/g; s/-{2,}/-/g; s/^-+//; s/-+$//'
|
||||
Loading…
Add table
Reference in a new issue