refactor: Update script header documentation and remove outdated comments
parent 91e443ad16
commit b808f0fc8f
1 changed file with 7 additions and 8 deletions
@@ -1,24 +1,23 @@
 #!/usr/bin/env python3
 # =============================================================================
 # skills/ingest/scripts/ingest-semantic.py
-# Phase 1 (semantic) of the Knowledge Genome ingest — the LIGHT version.
+# Phase 1 (semantic) of the Knowledge Genome ingest — light agent + deterministic conform.
+#
+# - FIXED: Add 'title:' field to frontmatter (lint was complaining about missing title)
+# - NEW: Inject existing index (entity/concept names) into prompt to prevent duplicates
+# - NEW: Richer prompt asking for 2-4 sentences per description (not 1-2), with concrete details
+# - Enhanced schema to handle longer descriptions naturally
 #
 # The model does ONLY semantic extraction and returns ONE schema-constrained JSON
 # object (no tools, no file writing, no git, no frontmatter, no slugs). This script
 # then CONFORMS that output deterministically into wiki pages with enforced
 # frontmatter + kebab-case paths, and writes a .ingest-manifest.json in EXACTLY the
-# schema run-ingest.sh expects. run-ingest.sh (phase 2) then does index / log /
-# scoped-lint / PR, unchanged.
+# schema run-ingest.sh expects.
 #
 # cd <genome checkout>
 # ingest-semantic.py <genome> raw/articles/<file>.md # phase 1 (this)
 # run-ingest.sh <genome> # phase 2 (deterministic)
 #
-# Why this shape: local tool-calling via pi/ollama proved fragile, and a small
-# model does not reliably honour folders / naming / frontmatter / manifest schema
-# when it writes files itself. Here the model cannot break the contract because it
-# never touches the filesystem — the script owns all structure. Stdlib only.
-#
 # Emits a single JSON status line on stdout (for n8n / logs).
 # =============================================================================
 import json, os, re, sys, datetime, urllib.request, urllib.error
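The header comment describes a deterministic "conform" step: the model returns plain JSON, and the script alone decides paths and frontmatter. The diff does not show that code, so the following is only a minimal stdlib-only sketch of the idea. The function names (kebab_case, conform_page), the item keys (name, description), the wiki/<kind>/ layout, and the type: frontmatter key are all assumptions made for illustration. Only the title: field and the kebab-case paths come from the comment itself.

```python
import json
import re
import unicodedata


def kebab_case(name: str) -> str:
    """Turn a free-form name into a lowercase, hyphen-separated slug."""
    # Decompose accented characters and drop the combining marks,
    # so "Café" becomes "cafe" instead of losing the letter entirely.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    # Hypothetical fallback so a path is never empty.
    return slug or "untitled"


def conform_page(item: dict, kind: str) -> tuple[str, str]:
    """Return (relative_path, page_text) for one extracted item.

    `item` is assumed to look like {"name": ..., "description": ...}.
    The directory layout and the `type` key are hypothetical.
    """
    # Normalise internal whitespace in the title.
    title = " ".join(item["name"].split())
    body = item.get("description", "").strip()
    path = f"wiki/{kind}/{kebab_case(title)}.md"
    # json.dumps yields a double-quoted string, which is also a valid
    # YAML scalar, so colons in the title cannot break the frontmatter.
    frontmatter = "\n".join(
        ["---", f"title: {json.dumps(title)}", f"type: {kind}", "---"]
    )
    return path, f"{frontmatter}\n\n{body}\n"
```

Because the model never supplies a path or a frontmatter block in this sketch, a malformed name can at worst produce an odd title, not a file in the wrong place.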