feat: Enhance AI model prompt and schema for richer content and de-duplication
parent cdab1e089e
commit 91e443ad16
1 changed file with 28 additions and 7 deletions
@@ -142,17 +142,29 @@ paths, slugs, branches, commits or PRs — a deterministic script does all of th
 
 Rules:
 - source_summary: a faithful, self-contained summary of the source, in the
-  source's own language. Plain prose, no markdown headings.
-- key_points: the handful of concrete facts/claims worth indexing.
+  source's own language. Plain prose, NO markdown headings. 2-4 sentences,
+  with concrete details. Preserve the essence and nuance of the source.
+- key_points: 3-5 concrete facts or claims worth indexing; no padding.
 - entities: every person, tool, organisation or product the source names.
-  kind is one of person|tool|org|product. description is one or two factual
-  sentences. No markdown headings inside the description.
+  kind is one of person|tool|org|product. description is 2-3 factual sentences
+  with specifics. No markdown headings inside the description.
 - concepts: every pattern, theory, decision or named idea the source explains.
-  description is one or two factual sentences.
+  description is 2-3 factual sentences with concrete examples or context.
 - contradictions: ONLY when the source makes a claim that directly contradicts a
   widely-known fact or contradicts itself. Otherwise return an empty list.
 - Names must be the natural name of the thing; the script will normalise them.
-Do not pad. Be faithful to the source."""
+
+If the source references an entity or concept already in the wiki (see the list below),
+use the EXACT name already present; do not invent a variant. This prevents duplicates.
+
+Existing entities in this genome:
+{existing_entities}
+
+Existing concepts in this genome:
+{existing_concepts}
+
+Be faithful to the source. Be specific. Do not pad or improvise."""
+
 
 # --- JSON schema -> constrained decoding (Ollama structured outputs) ---
 SCHEMA = {
@@ -195,10 +207,19 @@ SCHEMA = {
 
 
 def call_model():
+    # format existing names as a human-readable list
+    existing_ents = ", ".join(sorted(existing_entities)) or "(none yet)"
+    existing_conc = ", ".join(sorted(existing_concepts)) or "(none yet)"
+
+    prompt = SYSTEM_PROMPT.format(
+        existing_entities=existing_ents,
+        existing_concepts=existing_conc,
+    )
+
     payload = {
         "model": MODEL,
         "messages": [
-            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "system", "content": prompt},
             {"role": "user", "content":
                 "Source path: " + raw_rel + "\n\n--- SOURCE START ---\n"
                 + source_text + "\n--- SOURCE END ---\n\nReturn the JSON now."},
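The prompt-rendering step this commit adds to call_model() can be sketched in isolation as follows. This is a minimal sketch, not the repository's code: the helper name render_prompt, the shortened TEMPLATE and the sample names are hypothetical, and only the join/sort/fallback logic and the two placeholder names come from the diff. One thing it makes visible is that str.format treats every brace in the template as a field, so any literal braces in the prompt text would have to be doubled.

```python
def render_prompt(template, existing_entities, existing_concepts):
    """Fill the two placeholders with sorted, comma-separated names.

    An empty collection joins to "", which is falsy, so the `or`
    fallback substitutes "(none yet)" as in the commit.
    """
    ents = ", ".join(sorted(existing_entities)) or "(none yet)"
    conc = ", ".join(sorted(existing_concepts)) or "(none yet)"
    return template.format(existing_entities=ents, existing_concepts=conc)


# Hypothetical, shortened stand-in for SYSTEM_PROMPT.
TEMPLATE = (
    "Existing entities in this genome:\n{existing_entities}\n\n"
    "Existing concepts in this genome:\n{existing_concepts}\n"
)

# Sample names are made up for illustration only.
rendered = render_prompt(TEMPLATE, {"Ollama", "Ada Lovelace"}, set())
print(rendered)

# Literal braces must be doubled to survive str.format.
escaped = 'Example: {{"name": "x"}} / {existing_entities}'.format(
    existing_entities="Ollama"
)
print(escaped)
```

Sorting before joining keeps the rendered prompt stable between runs regardless of set ordering, which makes changes to the prompt easier to spot when comparing model inputs.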