# Knowledge Genome System

> A distributed, modular, and secure personal knowledge base architecture.

The **Knowledge Genome System** is a framework designed to manage personal knowledge using a "Master-Genome" architecture. It follows the LLM-Wiki patterns (Karpathy-style) while adding a robust security layer for sensitive data and automated quality control.

---

# Architecture

This project is structured as a **Master Orchestrator** that manages multiple independent **Genomes** via Git submodules.

## Core Components

### Master Repository

Contains:

* Orchestration scripts
* Global configuration (`config.env`)
* Security templates

### Genomes

Individual specialized repositories (e.g. `genome-dev`, `genome-finance`) that act as standalone units of knowledge.

### Security Layers

#### Physical Security

`git-crypt` encrypts `private/` directories at rest.

#### Logical Security

YAML frontmatter (`private: true`) prevents AI agents from leaking sensitive data during public sessions.

#### Validation Layer

A custom linting engine ensures metadata consistency.

---

# Quick Start

## Prerequisites

Required dependencies:

* `git`
* `git-crypt`
* `curl`
* `jq`

Optional:

* `bw` (Bitwarden CLI), used for runtime key injection

---

## Initialization

```bash
# 1. Clone the master repository
git clone <repository-url> && cd master-knowledge-genome

# 2. Run the full setup
# (checks dependencies, creates master scaffold,
# initializes genomes)
make setup
```

# Management Commands

The system is controlled through a centralized Makefile.

| Command           | Description                                                        |
| ----------------- | ------------------------------------------------------------------ |
| `make setup`      | Full system initialization (Master + Registry Genomes).            |
| `make add-genome` | Scaffolds and registers a new genome (requires `NAME` and `DESC`). |
| `make lint`       | Runs the validation suite across all genomes.                      |
| `make status`     | Checks Git status and encryption state for all submodules.         |

# Validation & Linting (`make lint`)

The built-in linter ensures that the knowledge base remains machine-readable and secure. It automatically validates:

## Frontmatter Integrity

Every `.md` file must contain valid YAML headers.

## Domain Consistency

Ensures that a file's domain metadata matches its parent genome.

## Privacy Leak Detection

This is the critical validation step. It verifies that every file located in a `private/` directory contains the flag:

```yaml
private: true
```

This prevents accidental exposure during AI sessions.

## Broken Wiki-Links

Detects dead `[[internal-links]]`.

# Security Model

## Hybrid Privacy Architecture

Each genome is divided into two layers.

### Public Layer

Directories:

```text
raw/public/
wiki/public/
```

Characteristics:

* Plaintext
* Shareable with collaborators

### Private Layer

Directories:

```text
raw/private/
wiki/private/
```

Characteristics:

* Encrypted using AES-256 via `git-crypt`

## Runtime Key Injection

To keep the AI environment secure, encryption keys are never stored on the VM disk. Instead, the system uses Bitwarden (`bw`) / Vaultwarden for runtime injection.

### Example

```bash
# Unlock a genome using a key stored in Vaultwarden.
# The key is decoded in memory and passed via process substitution,
# so it is never written to disk.
git-crypt unlock <(
  bw get notes "genome-dev key" \
    --session "$BW_SESSION" | base64 -d
)
```

# Genome Schema

All wiki documents follow a strict schema to support AI ingestion.

## YAML Frontmatter Schema

```yaml
---
title: "Document Title"
type: entity | concept | source | log
domain: genome-name
private: true | false
last_updated: YYYY-MM-DD
---
```

# Agent Interaction

When starting a session with an AI agent, always declare the privacy context.

## Public Context

```text
PRIVATE_CONTEXT: disabled
```

Behavior:

* The agent ignores all private folders.

## Private Context

```text
PRIVATE_CONTEXT: enabled
```

Behavior:

* The agent processes encrypted data.
* Requires the repository to be unlocked.
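
The two privacy contexts above could be enforced before any file reaches the agent. The following is a minimal shell sketch under stated assumptions, not the project's actual implementation: `is_private_file` and `list_agent_files` are hypothetical helper names, frontmatter is assumed to open with `---` on the first line, and private content is assumed to live under a `private/` directory or carry `private: true`. With the context `disabled`, both the directory rule and the frontmatter flag are checked, so a flagged file outside `private/` is still withheld.

```bash
#!/usr/bin/env sh
# Sketch only: both function names are hypothetical, not part of the project.

# Succeeds (exit 0) when the YAML frontmatter of file $1 contains `private: true`.
# Assumes the frontmatter opens with `---` on line 1 and closes with `---`.
is_private_file() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }      # no frontmatter: stop scanning
    $0 == "---" { exit }                          # end of frontmatter reached
    /^private:[ \t]*true[ \t]*$/ { found = 1; exit }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Prints the Markdown files under directory $1 that an agent may read.
# $2 is the declared PRIVATE_CONTEXT value: "enabled" or "disabled" (default).
list_agent_files() {
  find "$1" -type f -name '*.md' | sort | while IFS= read -r file; do
    if [ "${2:-disabled}" != "enabled" ]; then
      # Public session: skip private folders and privately flagged files.
      case "$file" in
        */private/*) continue ;;
      esac
      if is_private_file "$file"; then
        continue
      fi
    fi
    printf '%s\n' "$file"
  done
}
```

For example, `list_agent_files genome-dev disabled` would print only unflagged documents outside `private/`, while `list_agent_files genome-dev enabled` lists every Markdown file and therefore only makes sense once the repository has been unlocked.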