From 11b1245e98887fc29538198cd4a9afb8bd49b94c Mon Sep 17 00:00:00 2001
From: Matteo Cherubini
Date: Fri, 8 May 2026 21:09:42 +0200
Subject: [PATCH] docs: Add comprehensive README for Knowledge Genome System

---
 README.md | 201 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 200 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 24c131f..eedcf3f 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,201 @@
-# knowledge-genome-orchestrator
+# Knowledge Genome System
+> A distributed, modular, and secure personal knowledge base architecture.
+
+The **Knowledge Genome System** is a framework designed to manage personal knowledge using a "Master-Genome" architecture. It follows the LLM-Wiki patterns (Karpathy-style) while adding a robust security layer for sensitive data and automated quality control.
+
+---
+
+# Architecture
+
+This project is structured as a **Master Orchestrator** that manages multiple independent **Genomes** via Git Submodules.
+
+## Core Components
+
+### Master Repository
+
+Contains:
+
+* Orchestration scripts
+* Global configuration (`config.env`)
+* Security templates
+
+### Genomes
+
+Individual specialized repositories (e.g. `genome-dev`, `genome-finance`) that act as standalone units of knowledge.
+
+### Security Layers
+
+#### Physical Security
+
+`git-crypt` encrypts `private/` directories at rest.
+
+#### Logical Security
+
+YAML frontmatter (`private: true`) prevents AI agents from leaking sensitive data during public sessions.
+
+#### Validation Layer
+
+A custom linting engine ensures metadata consistency.
+
+---
+
+# Quick Start
+
+## Prerequisites
+
+Required dependencies:
+
+* `git`
+* `git-crypt`
+* `curl`
+* `jq`
+
+Optional:
+
+* `bw` (Bitwarden CLI) — used for runtime key injection
+
+---
+
+## Initialization
+
+```bash
+# 1. Clone the master repository
+git clone && cd master-knowledge-genome
+
+# 2. Run the full setup
+# (checks dependencies, creates master scaffold,
+# initializes genomes)
+make setup
+```
+
+# Management Commands
+
+The system is controlled through a centralized Makefile.
+
+| Command           | Description                                                    |
+| ----------------- | -------------------------------------------------------------- |
+| `make setup`      | Full system initialization (Master + Registry Genomes).        |
+| `make add-genome` | Scaffolds and registers a new genome (requires NAME and DESC). |
+| `make lint`       | Runs the validation suite across all genomes.                  |
+| `make status`     | Checks Git status and encryption state for all submodules.     |
+
+# Validation & Linting (`make lint`)
+
+The built-in linter ensures that the knowledge base remains machine-readable and secure.
+
+It automatically validates:
+
+## Frontmatter Integrity
+
+Every `.md` file must contain valid YAML headers.
+
+## Domain Consistency
+
+Ensures that a file's domain metadata matches its parent genome.
+
+## Privacy Leak Detection
+
+Critical validation step.
+
+Verifies that any file located in a `/private/` directory contains the flag:
+
+```yaml
+private: true
+```
+
+This prevents accidental exposure during AI sessions.
+
+## Broken Wiki-Links
+
+Detects dead `[[internal-links]]`.
+
+# Security Model
+
+## Hybrid Privacy Architecture
+
+Each genome is divided into two layers.
+
+### Public Layer
+
+Directories:
+
+```text
+raw/public/
+wiki/public/
+```
+
+Characteristics:
+
+* Plaintext
+* Shareable with collaborators
+
+### Private Layer
+
+Directories:
+
+```text
+raw/private/
+wiki/private/
+```
+
+Characteristics:
+
+* Encrypted using AES-256 via `git-crypt`
+
+## Runtime Key Injection
+
+To keep the AI environment secure, encryption keys are never stored on the VM disk.
+
+Instead, the system uses Bitwarden (`bw`) / Vaultwarden for runtime injection.
+
+### Example
+
+```bash
+# Unlock a genome using a key stored in Vaultwarden
+git-crypt unlock <(
+  bw get notes "genome-dev key" \
+    --session "$BW_SESSION" | base64 -d
+)
+```
+
+# Genome Schema
+
+All wiki documents follow a strict schema to support AI ingestion.
+
+## YAML Frontmatter Schema
+
+```yaml
+---
+title: "Document Title"
+type: entity | concept | source | log
+domain: genome-name
+private: true/false
+last_updated: YYYY-MM-DD
+---
+```
+
+# Agent Interaction
+
+When starting a session with an AI agent, always declare the privacy context.
+
+## Public Context
+
+```text
+PRIVATE_CONTEXT: disabled
+```
+
+Behavior:
+
+* The agent ignores all private folders.
+
+## Private Context
+
+```text
+PRIVATE_CONTEXT: enabled
+```
+
+Behavior:
+
+* The agent processes encrypted data.
+* Requires the repository to be unlocked.