docs: Add comprehensive README for Knowledge Genome System

Matteo Cherubini 2026-05-08 21:09:42 +02:00
parent 4c6f8259af
commit 11b1245e98

# knowledge-genome-orchestrator
# Knowledge Genome System
> A distributed, modular, and secure personal knowledge base architecture.

The **Knowledge Genome System** is a framework for managing personal knowledge with a "Master-Genome" architecture. It follows Karpathy-style LLM-Wiki patterns and adds a robust security layer for sensitive data, plus automated quality control.

---
# Architecture
This project is structured as a **Master Orchestrator** that manages multiple independent **Genomes** via Git Submodules.
## Core Components
### Master Repository
Contains:
* Orchestration scripts
* Global configuration (`config.env`)
* Security templates
### Genomes
Independent, domain-specific repositories (e.g. `genome-dev`, `genome-finance`). Each one is registered in the master as a Git submodule and works as a standalone unit of knowledge.
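
Because each genome is a Git submodule, the registered genomes can be read straight from the master's `.gitmodules` file. The sketch below assumes that a genome's submodule path is also its name (e.g. `genome-dev`). `list_genomes` is a hypothetical helper, not one of the repository's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of every submodule (genome)
# registered in a .gitmodules file (default: ./.gitmodules).
list_genomes() {
  local gitmodules="${1:-.gitmodules}"
  # Each matching line looks like "submodule.<name>.path <path>".
  git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' \
    | awk '{ print $2 }'
}
```

Run from the master repository's root, `list_genomes` prints one genome path per line.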
### Security Layers
#### Physical Security
`git-crypt` encrypts `private/` directories at rest.
#### Logical Security
A YAML frontmatter flag (`private: true`) marks sensitive files so that AI agents skip them during public sessions and cannot leak their contents.
#### Validation Layer
A custom linting engine (`make lint`) ensures metadata consistency across all genomes.

---
# Quick Start
## Prerequisites
Required dependencies:
* `git`
* `git-crypt`
* `curl`
* `jq`

Optional:

* `bw` (Bitwarden CLI): used for runtime key injection
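
`make setup` checks these dependencies before doing anything else. The Makefile's actual check is not shown here; a minimal Bash sketch of such a check, using the hypothetical helpers `check_deps` and `check_prerequisites`, could look like this:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every tool in "$@" that is not on PATH.
# Returns 0 if all are present, 1 otherwise.
check_deps() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing dependency: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical wrapper: required tools are fatal, bw only triggers a warning.
check_prerequisites() {
  check_deps git git-crypt curl jq || return 1
  check_deps bw || echo "warning: bw not found, runtime key injection unavailable" >&2
  return 0
}
```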
---
## Initialization
```bash
# 1. Clone the master repository
git clone <master-repo-url> master-knowledge-genome && cd master-knowledge-genome
# 2. Run the full setup
# (checks dependencies, creates master scaffold,
# initializes genomes)
make setup
```
# Management Commands
The system is controlled through a centralized Makefile.
| Command | Description |
| ----------------- | -------------------------------------------------------------- |
| `make setup` | Full system initialization (Master + Registry Genomes). |
| `make add-genome` | Scaffolds and registers a new genome (requires the `NAME` and `DESC` variables, e.g. `make add-genome NAME=genome-finance DESC="Finance notes"`). |
| `make lint` | Runs the validation suite across all genomes. |
| `make status` | Checks Git status and encryption state for all submodules. |
# Validation & Linting (`make lint`)
The built-in linter ensures that the knowledge base remains machine-readable and secure.
It automatically validates:
## Frontmatter Integrity
Every `.md` file must start with a valid YAML frontmatter block.
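
The linter's implementation is not shown in this README. The following sketch approximates this check structurally, assuming a file passes when line 1 is `---`, a later `---` closes the block, and every key from the Genome Schema section is present. It does not fully parse YAML, and `check_frontmatter` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical structural check: frontmatter present, terminated, and
# containing every schema key. Prints the first problem found to stderr.
check_frontmatter() {
  local file="$1" key
  if [ "$(head -n 1 "$file")" != "---" ] || ! tail -n +2 "$file" | grep -qx -- '---'; then
    echo "$file: missing or unterminated frontmatter" >&2
    return 1
  fi
  for key in title type domain private last_updated; do
    # Scan only the frontmatter lines; succeed if one starts with "key:".
    if ! awk -v key="$key" '
        NR == 1 { next }
        $0 == "---" { exit }
        index($0, key ":") == 1 { found = 1; exit }
        END { exit !found }' "$file"; then
      echo "$file: missing '$key'" >&2
      return 1
    fi
  done
}
```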
## Domain Consistency
Ensures that the `domain` field in a file's frontmatter matches the name of its parent genome.
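
A sketch of the domain check, assuming the genome name is the name of the genome's top-level directory (e.g. `genome-dev`) and that frontmatter values are simple `key: value` lines. `fm_get` and `check_domain` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from FILE's frontmatter.
# Handles simple "key: value" lines only (no nesting, no multi-line values).
fm_get() {
  awk -v key="$2" '
    NR == 1 { if ($0 != "---") exit; next }  # frontmatter must open on line 1
    $0 == "---" { exit }                      # closing delimiter: stop
    index($0, key ":") == 1 {
      sub(/^[^:]*:[ \t]*/, "")                # drop the key and leading blanks
      sub(/[ \t]+#.*$/, "")                   # drop a trailing YAML comment
      sub(/[ \t]+$/, "")                      # drop trailing blanks
      gsub(/^"|"$/, "")                       # drop surrounding double quotes
      print
      exit
    }' "$1"
}

# Hypothetical check: fail if the domain field does not name GENOME.
check_domain() {
  local file="$1" genome="$2" domain
  domain="$(fm_get "$file" domain)"
  if [ "$domain" != "$genome" ]; then
    echo "$file: domain '$domain' does not match genome '$genome'" >&2
    return 1
  fi
}
```

Inside a genome, the expected name could be derived with `basename "$(git rev-parse --show-toplevel)"`.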
## Privacy Leak Detection
This is the most critical check. It verifies that every file located in a `private/` directory contains the flag:
```yaml
private: true
```
This prevents accidental exposure during AI sessions.
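
A sketch of the leak check, assuming the same simple `key: value` frontmatter as above. `fm_get` and `check_private_flags` are hypothetical helpers. Because private files stay encrypted while a genome is locked, a check like this can only read the flag after `git-crypt unlock`; on a locked genome every private file would be reported.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from FILE's frontmatter
# (simple "key: value" lines only).
fm_get() {
  awk -v key="$2" '
    NR == 1 { if ($0 != "---") exit; next }
    $0 == "---" { exit }
    index($0, key ":") == 1 {
      sub(/^[^:]*:[ \t]*/, ""); sub(/[ \t]+#.*$/, ""); sub(/[ \t]+$/, "")
      gsub(/^"|"$/, "")
      print
      exit
    }' "$1"
}

# Hypothetical check: every Markdown file under a private/ directory must
# carry "private: true". Reports all offenders and returns 1 if any exist.
check_private_flags() {
  local root="$1" file status=0
  while IFS= read -r -d '' file; do
    if [ "$(fm_get "$file" private)" != "true" ]; then
      echo "LEAK RISK: $file is in private/ but lacks 'private: true'" >&2
      status=1
    fi
  done < <(find "$root" -type f -path '*/private/*' -name '*.md' -print0)
  return "$status"
}
```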
## Broken Wiki-Links
Detects `[[internal-links]]` whose target page does not exist.
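
A sketch of the dead-link check. It assumes that `[[target]]` resolves to a file named `target.md` anywhere in the genome, and that `[[target|alias]]` and `[[target#heading]]` point to the same file. That resolution rule and `check_wiki_links` are assumptions, not the linter's documented behavior.

```bash
#!/usr/bin/env bash
# Hypothetical check: report [[links]] with no matching <target>.md file.
# Assumed rule: [[target]], [[target|alias]] and [[target#heading]] all
# resolve to a file named target.md anywhere under ROOT.
check_wiki_links() {
  local root="$1" file target status=0
  while IFS= read -r -d '' file; do
    while IFS= read -r target; do
      if [ -z "$(find "$root" -type f -name "$target.md" -print -quit)" ]; then
        echo "$file: broken link [[$target]]" >&2
        status=1
      fi
    done < <(grep -o '\[\[[^]]*]]' "$file" | sed -e 's/^\[\[//' -e 's/]]$//' -e 's/[|#].*//')
  done < <(find "$root" -type f -name '*.md' -print0)
  return "$status"
}
```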
# Security Model
## Hybrid Privacy Architecture
Each genome is divided into two layers.
### Public Layer
Directories:
```text
raw/public/
wiki/public/
```
Characteristics:
* Plaintext
* Shareable with collaborators
### Private Layer
Directories:
```text
raw/private/
wiki/private/
```
Characteristics:
* Encrypted using AES-256 via `git-crypt`
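
With `git-crypt`, the paths to encrypt are declared in `.gitattributes`. Assuming the genomes use git-crypt's standard `filter=git-crypt diff=git-crypt` attributes, a hypothetical helper that marks both private layers could look like this (run `git-crypt init` in the genome first):

```bash
#!/usr/bin/env bash
# Hypothetical helper: append git-crypt rules for both private layers to
# GENOME_DIR/.gitattributes, skipping rules that are already present.
write_private_gitattributes() {
  local genome_dir="$1" pattern line
  for pattern in 'raw/private/**' 'wiki/private/**'; do
    line="$pattern filter=git-crypt diff=git-crypt"
    if ! grep -qxF -- "$line" "$genome_dir/.gitattributes" 2>/dev/null; then
      printf '%s\n' "$line" >> "$genome_dir/.gitattributes"
    fi
  done
}
```

The rules need to be committed before any private file is added, because files already committed without the filter are not encrypted retroactively. `git-crypt status` lists which files are encrypted.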
## Runtime Key Injection
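To keep the AI environment secure, encryption keys are never stored as key files on the disk of the VM that runs the AI agent. Instead, the system fetches them at runtime from Bitwarden (`bw`) / Vaultwarden.

Note that `git-crypt unlock` copies the key into the repository's `.git/git-crypt/` directory for as long as the genome stays unlocked; run `git-crypt lock` at the end of a session to remove it.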
### Example
```bash
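# Unlock a genome using a key stored in Vaultwarden.
# Run inside the genome's working tree (e.g. genome-dev/) with an active
# Bitwarden session in $BW_SESSION (e.g. export BW_SESSION="$(bw unlock --raw)").
git-crypt unlock <(
  bw get notes "genome-dev key" \
    --session "$BW_SESSION" | base64 -d
)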
```
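
The example above expects the note to hold the base64-encoded binary key. One way to produce that value, sketched here with the hypothetical helpers `encode_key_for_vault` and `export_genome_key`, is to export the key with `git-crypt export-key` (presumably on a trusted workstation rather than the agent VM), encode it as a single line, and paste the result into the Bitwarden item's notes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: encode a binary key file as one base64 line
# (line wrapping removed so it pastes cleanly into a notes field).
encode_key_for_vault() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: export the current genome's git-crypt key and print
# it base64-encoded. Run inside the genome on a trusted machine.
export_genome_key() {
  local tmp status=0
  tmp="$(mktemp)" || return 1
  if git-crypt export-key "$tmp"; then
    encode_key_for_vault "$tmp"
    echo
  else
    status=1
  fi
  rm -f "$tmp"  # do not leave the plaintext key on disk
  return "$status"
}
```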
# Genome Schema
All wiki documents follow a strict schema to support AI ingestion.
## YAML Frontmatter Schema
```yaml
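---
title: "Document Title"
type: concept             # one of: entity | concept | source | log
domain: genome-name       # name of the parent genome
private: false            # true or false; must be true under private/
last_updated: YYYY-MM-DD  # ISO 8601 date
---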
```
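
A sketch of a value-level check against this schema, assuming simple `key: value` frontmatter lines and allowing trailing `# comments`. `fm_get` and `check_schema_values` are hypothetical helpers; the date check validates the `YYYY-MM-DD` shape only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from FILE's frontmatter
# (simple "key: value" lines only; trailing "# comments" are dropped).
fm_get() {
  awk -v key="$2" '
    NR == 1 { if ($0 != "---") exit; next }
    $0 == "---" { exit }
    index($0, key ":") == 1 {
      sub(/^[^:]*:[ \t]*/, ""); sub(/[ \t]+#.*$/, ""); sub(/[ \t]+$/, "")
      gsub(/^"|"$/, "")
      print
      exit
    }' "$1"
}

# Hypothetical check: validate type, private and last_updated values.
# Reports every problem and returns 1 if any was found.
check_schema_values() {
  local file="$1" status=0 doc_type private updated
  doc_type="$(fm_get "$file" type)"
  private="$(fm_get "$file" private)"
  updated="$(fm_get "$file" last_updated)"
  case "$doc_type" in
    entity | concept | source | log) ;;
    *) echo "$file: invalid type '$doc_type'" >&2; status=1 ;;
  esac
  case "$private" in
    true | false) ;;
    *) echo "$file: private must be true or false, got '$private'" >&2; status=1 ;;
  esac
  # Shape check only: does not verify that the date actually exists.
  if ! [[ "$updated" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "$file: last_updated must be YYYY-MM-DD, got '$updated'" >&2
    status=1
  fi
  return "$status"
}
```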
# Agent Interaction
When starting a session with an AI agent, always declare the privacy context.
## Public Context
```text
PRIVATE_CONTEXT: disabled
```
Behavior:
* The agent ignores all private folders and any file flagged `private: true`.
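
How an agent enforces this depends on its tooling. As a sketch, a hypothetical `list_agent_files` helper could build the file list handed to the agent, dropping everything under a `private/` directory and any file flagged `private: true`. As a fail-closed design choice, any context value other than `enabled` is treated as public.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE's frontmatter contains "private: true".
is_flagged_private() {
  awk 'NR == 1 { if ($0 != "---") exit; next }
       $0 == "---" { exit }
       /^private:[ \t]*true[ \t]*(#.*)?$/ { found = 1; exit }
       END { exit !found }' "$1"
}

# Hypothetical helper: print the Markdown files an agent may read under ROOT.
# CONTEXT is the PRIVATE_CONTEXT value; anything but "enabled" means public.
list_agent_files() {
  local root="$1" context="$2" file
  while IFS= read -r -d '' file; do
    if [ "$context" != enabled ]; then
      case "$file" in */private/*) continue ;; esac
      if is_flagged_private "$file"; then continue; fi
    fi
    printf '%s\n' "$file"
  done < <(find "$root" -type f -name '*.md' -print0)
}
```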
## Private Context
```text
PRIVATE_CONTEXT: enabled
```
Behavior:
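* The agent may also read files in `private/` directories and files flagged `private: true`.
* Requires the genome to be unlocked (`git-crypt unlock`); while it is locked, private files appear in the working tree as encrypted binary data and cannot be read.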
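
Before enabling the private context, a session wrapper could refuse to start while any private file is still encrypted. This sketch relies on git-crypt prefixing encrypted files with the 10-byte header `\0GITCRYPT\0`. `is_encrypted` and `require_unlocked` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if FILE starts with git-crypt's 10-byte
# header "\0GITCRYPT\0", i.e. it is still encrypted in the working tree.
is_encrypted() {
  head -c 10 "$1" | cmp -s - <(printf '\0GITCRYPT\0')
}

# Hypothetical guard: fail if any file under a private/ directory of ROOT
# is still encrypted.
require_unlocked() {
  local root="$1" file
  while IFS= read -r -d '' file; do
    if is_encrypted "$file"; then
      echo "still encrypted: $file (run git-crypt unlock first)" >&2
      return 1
    fi
  done < <(find "$root" -type f -path '*/private/*' -print0)
  return 0
}
```

For example, `require_unlocked genome-dev && echo "PRIVATE_CONTEXT: enabled"` only declares the private context once the genome is readable.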