Matteo Cherubini 11b1245e98 docs: Add comprehensive README for Knowledge Genome System

2026-05-08 21:10:12 +02:00



Knowledge Genome System

A distributed, modular, and secure personal knowledge base architecture.

The Knowledge Genome System is a framework designed to manage personal knowledge using a "Master-Genome" architecture. It follows the LLM-Wiki patterns (Karpathy-style) while adding a robust security layer for sensitive data and automated quality control.

Architecture

This project is structured as a Master Orchestrator that manages multiple independent Genomes via Git Submodules.

Core Components

Master Repository

Contains:

Orchestration scripts
Global configuration (config.env)
Security templates

Genomes

Individual specialized repositories (e.g. genome-dev, genome-finance) that act as standalone units of knowledge.
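Each genome is an ordinary Git submodule, so the master can discover its genomes from `.gitmodules` alone. The sketch below shows one way to do that. It assumes a hypothetical `list_genomes` helper and a hypothetical `genomes/` directory, and neither is documented for this project. The `git config --file ... --get-regexp` call is standard Git.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of every genome registered as a
# submodule, in the order it appears in .gitmodules.
list_genomes() {
  local gitmodules="${1:-.gitmodules}"
  # Matching keys look like "submodule.<name>.path <path>"; keep the path.
  git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' |
    awk '{ print $2 }'
}
```

A genome would be registered in the first place with `git submodule add <genome-repo-url> genomes/genome-dev`, which writes the `.gitmodules` entries read above.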

Security Layers

Physical Security

git-crypt encrypts private/ directories at rest.
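git-crypt selects the files to encrypt through `.gitattributes` patterns that carry `filter=git-crypt diff=git-crypt`. Below is a minimal sketch of a rule that covers every `private/` directory in a genome. It assumes a hypothetical `write_crypt_attributes` helper, and the project's actual attributes file may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mark every private/ directory (at any depth) for
# git-crypt. Matching files are stored encrypted in the repository and
# appear in plaintext in the working tree after `git-crypt unlock`.
write_crypt_attributes() {
  local repo="${1:-.}"
  local rule='**/private/** filter=git-crypt diff=git-crypt'
  touch "$repo/.gitattributes"
  # Append the rule only once, so the helper is safe to re-run.
  grep -qxF "$rule" "$repo/.gitattributes" ||
    printf '%s\n' "$rule" >> "$repo/.gitattributes"
}
```

Add this rule in a repository where `git-crypt init` has been run, and do it before committing any private file. git-crypt does not retroactively encrypt content that is already in history. `git-crypt status -e` lists the files it treats as encrypted.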

Logical Security

A YAML frontmatter flag (private: true) marks sensitive notes so that AI agents can exclude them during public sessions, preventing sensitive data from leaking.
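A frontmatter flag only helps if the tooling that feeds notes to an agent honors it. The following is a minimal sketch of such a filter. It assumes hypothetical `frontmatter`, `is_private`, and `public_notes` helpers, none of which are part of the documented scripts.

```shell
#!/usr/bin/env bash
# Print the frontmatter block: the lines between the first two '---' lines.
# Prints nothing if the file does not start with '---'.
frontmatter() {
  awk 'NR == 1 && $0 != "---" { exit }
       NR > 1  && $0 == "---" { exit }
       NR > 1                 { print }' "$1"
}

# Succeeds (exit 0) when the frontmatter declares "private: true".
is_private() {
  frontmatter "$1" | grep -Eq '^private:[[:space:]]*true[[:space:]]*$'
}

# List Markdown files under a directory that may be handed to an agent.
public_notes() {
  find "$1" -type f -name '*.md' | sort | while IFS= read -r f; do
    is_private "$f" || printf '%s\n' "$f"
  done
}
```

This filter is fail-open: a note with no frontmatter counts as public. That is why the linter's privacy leak detection on `private/` directories matters.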

Validation Layer

A custom linting engine ensures metadata consistency.

Quick Start

Prerequisites

Required dependencies:

git
git-crypt
curl
jq

Optional:

bw (Bitwarden CLI): used for runtime key injection
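The dependency check that `make setup` runs could look roughly like the sketch below. It assumes a hypothetical `check_deps` function. A missing required tool fails the check, while a missing `bw` only produces a warning.

```shell
#!/usr/bin/env bash
# Hypothetical dependency check: returns 1 if any required tool is missing.
check_deps() {
  local missing=0 tool
  for tool in git git-crypt curl jq; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing required dependency: %s\n' "$tool" >&2
      missing=1
    fi
  done
  # bw is optional: its absence is reported but does not fail the check.
  if ! command -v bw >/dev/null 2>&1; then
    printf 'optional dependency not found: bw (runtime key injection unavailable)\n' >&2
  fi
  return "$missing"
}
```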

Initialization

```bash
# 1. Clone the master repository
git clone <master-repo-url> && cd master-knowledge-genome

# 2. Run the full setup (checks dependencies, creates the master
#    scaffold, and initializes the genomes)
make setup
```

Management Commands

The system is controlled through a centralized Makefile.

| Command | Description |
| --- | --- |
| `make setup` | Full system initialization (Master + Registry Genomes). |
| `make add-genome` | Scaffolds and registers a new genome (requires `NAME` and `DESC`). |
| `make lint` | Runs the validation suite across all genomes. |
| `make status` | Checks Git status and encryption state for all submodules. |
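`make add-genome` presumably receives `NAME` and `DESC` as Make variables, for example `make add-genome NAME=genome-health DESC="Health notes"`, where the genome name is hypothetical. The sketch below covers the scaffolding step only. It assumes a hypothetical `scaffold_genome` function and the `raw/` and `wiki/` public/private layout described under Security Model. Registering the result as a submodule is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical scaffolding step behind `make add-genome`.
# Creates <root>/<name>/{raw,wiki}/{public,private} plus a README.md.
scaffold_genome() {
  local name="$1" desc="$2" root="${3:-.}" dir layer
  if [ -z "$name" ] || [ -z "$desc" ]; then
    printf 'usage: scaffold_genome NAME DESC [ROOT]\n' >&2
    return 2
  fi
  dir="$root/$name"
  for layer in raw/public raw/private wiki/public wiki/private; do
    mkdir -p "$dir/$layer"
    # Git does not track empty directories, so keep a placeholder file.
    touch "$dir/$layer/.gitkeep"
  done
  printf '# %s\n\n%s\n' "$name" "$desc" > "$dir/README.md"
}
```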

Validation & Linting (`make lint`)

The built-in linter ensures that the knowledge base remains machine-readable and secure.

It automatically validates:

Frontmatter Integrity

Every .md file must begin with a valid YAML frontmatter block.
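Below is a minimal sketch of a structural frontmatter check. It assumes hypothetical `check_frontmatter` and `lint_frontmatter` functions. The check looks only at the `---` delimiters and the shape of each line. A real validator would hand the block to a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical structural check: the file must open with '---', close the
# block with a second '---', and contain only blank, comment, "key:",
# "- item", or indented lines in between. This is NOT a full YAML parser.
check_frontmatter() {
  awk '
    NR == 1            { if ($0 != "---") exit 1; next }
    $0 == "---"        { closed = 1; exit }
    /^[ \t]*$/         { next }   # blank line
    /^[ \t]*#/         { next }   # YAML comment
    /^[A-Za-z0-9_-]+:/ { next }   # top-level key
    /^- /              { next }   # block sequence item
    /^[ \t]/           { next }   # indented (nested) content
                       { exit 1 } # anything else is rejected
    END { exit (closed ? 0 : 1) }
  ' "$1"
}

# Check every Markdown file under a directory; print offenders, return 1 if any.
lint_frontmatter() {
  local status=0 f
  while IFS= read -r f; do
    if ! check_frontmatter "$f"; then
      printf 'invalid frontmatter: %s\n' "$f"
      status=1
    fi
  done < <(find "$1" -type f -name '*.md' | sort)
  return "$status"
}
```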

Domain Consistency

Ensures that a file's domain metadata matches its parent genome.
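Below is a minimal sketch of the domain check. It assumes the metadata lives in a `domain:` frontmatter key and that genome `genome-<domain>` expects `domain: <domain>`. The key name, the naming convention, and the `check_domains` function are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical domain check: every note's "domain:" value must match the
# genome directory name with its "genome-" prefix removed.
check_domains() {
  local genome_dir="$1" expected status=0 f actual
  expected="$(basename "$genome_dir")"
  expected="${expected#genome-}"   # e.g. genome-dev -> dev
  while IFS= read -r f; do
    # Read "domain:" only from the frontmatter block at the top of the file.
    actual="$(awk 'NR == 1 && $0 != "---" { exit }
                   NR > 1  && $0 == "---" { exit }
                   NR > 1  && /^domain:/ {
                     sub(/^domain:[ \t]*/, ""); sub(/[ \t]+$/, ""); print; exit
                   }' "$f")"
    if [ "$actual" != "$expected" ]; then
      printf '%s: domain "%s", expected "%s"\n' "$f" "$actual" "$expected"
      status=1
    fi
  done < <(find "$genome_dir" -type f -name '*.md' | sort)
  return "$status"
}
```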

Privacy Leak Detection

This is the most critical validation step. It verifies that any file located in a /private/ directory contains the flag:

private: true

This prevents accidental exposure during AI sessions.
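Below is a minimal sketch of the privacy leak scan. It assumes a hypothetical `find_privacy_leaks` function that reads only the frontmatter block at the top of each file.

```shell
#!/usr/bin/env bash
# Hypothetical privacy scan: every Markdown file below a private/ directory
# must declare "private: true" in its frontmatter (the block between the
# first two '---' lines). Offenders are printed; returns 1 if any exist.
find_privacy_leaks() {
  local root="$1" status=0 f
  # Search relative to $root, so a "private" component in $root's own path
  # is not mistaken for a private/ directory inside the genome.
  while IFS= read -r f; do
    if ! awk 'NR == 1 && $0 != "---" { exit }
              NR > 1  && $0 == "---" { exit }
              NR > 1  && /^private:[ \t]*true[ \t]*$/ { found = 1; exit }
              END { exit (found ? 0 : 1) }' "$root/$f"; then
      printf 'privacy leak: %s lacks "private: true"\n' "$f"
      status=1
    fi
  done < <(cd "$root" && find . -type f -path '*/private/*' -name '*.md' | sed 's|^\./||' | sort)
  return "$status"
}
```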

Broken Wiki-Links

Detects dead [[internal-links]].
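Below is a minimal sketch of dead-link detection. It assumes a hypothetical `find_broken_links` function and a hypothetical resolution rule: `[[target]]` resolves to a file named `target.md` anywhere in the genome. Links that contain a path, such as `[[folder/note]]`, are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical dead-link check: a link [[target]] (optionally [[target|label]]
# or [[target#heading]]) counts as valid when some file named "target.md"
# exists anywhere below the genome root.
find_broken_links() {
  local root="$1" status=0 file link target
  while IFS= read -r file; do
    while IFS= read -r link; do
      target=${link#"[["}        # strip leading [[
      target=${target%"]]"}      # strip trailing ]]
      target=${target%%"|"*}     # strip |label
      target=${target%%"#"*}     # strip #heading
      if [ -z "$(find "$root" -type f -name "$target.md" -print -quit)" ]; then
        printf '%s: broken link [[%s]]\n' "$file" "$target"
        status=1
      fi
    done < <(grep -o '\[\[[^]]*]]' "$file")
  done < <(find "$root" -type f -name '*.md' | sort)
  return "$status"
}
```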

Security Model

Hybrid Privacy Architecture

Each genome is divided into two layers.

Public Layer

Directories:

raw/public/
wiki/public/

Characteristics:

Plaintext
Shareable with collaborators

Private Layer