Knowledge Genome System
A distributed, modular, and secure personal knowledge base architecture.
The Knowledge Genome System is a framework for managing personal knowledge using a "Master-Genome" architecture. It follows the LLM-Wiki patterns (Karpathy-style) and adds a security layer for sensitive data plus automated quality control.
Architecture
This project is structured as a Master Orchestrator that manages multiple independent Genomes via Git Submodules.
Core Components
Master Repository
Contains:
- Orchestration scripts
- Global configuration (config.env)
- Security templates
Genomes
Individual specialized repositories (e.g. genome-dev, genome-finance) that act as standalone units of knowledge.
Security Layers
Physical Security
git-crypt encrypts private/ directories at rest.
Logical Security
The YAML frontmatter flag private: true marks documents that AI agents must not expose during public sessions.
Validation Layer
A custom linting engine ensures metadata consistency.
Quick Start
Prerequisites
Required dependencies:
- git
- git-crypt
- curl
- jq
Optional:
- bw (Bitwarden CLI): used for runtime key injection
Initialization
# 1. Clone the master repository
git clone <master-repo-url> && cd master-knowledge-genome
# 2. Run the full setup
# (checks dependencies, creates master scaffold,
# initializes genomes)
make setup
Management Commands
The system is controlled through a centralized Makefile.
| Command | Description |
|---|---|
| make setup | Full system initialization (Master + Registry Genomes). |
| make add-genome | Scaffolds and registers a new genome (requires NAME and DESC). |
| make lint | Runs the validation suite across all genomes. |
| make status | Checks Git status and encryption state for all submodules. |
Validation & Linting (make lint)
The built-in linter ensures that the knowledge base remains machine-readable and secure.
It automatically validates:
Frontmatter Integrity
Every .md file must contain valid YAML headers.
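A minimal sketch of how such a check could look, assuming the linter is a shell script. The function name has_frontmatter is hypothetical, and the check only confirms that the two --- delimiters are present; it does not parse the YAML between them.

```shell
# Hypothetical helper, not part of the project's linter.
# Succeeds when the file opens with a '---' line and a later '---' line
# closes the block.
has_frontmatter() {
  local file="$1"
  # Line 1 must be exactly '---'.
  [ "$(head -n 1 "$file")" = "---" ] || return 1
  # Some later line must be exactly '---'.
  tail -n +2 "$file" | grep -qx -- '---'
}
```

A caller could loop over every .md file in a genome and report each file for which has_frontmatter returns non-zero.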
Domain Consistency
Ensures that a file's domain metadata matches its parent genome.
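One way this comparison might be done is sketched below. It assumes the caller already knows the genome name (for example, from the submodule directory name) and that domain is a simple single-line key. Both function names are hypothetical.

```shell
# Hypothetical helpers, not part of the project's linter.
# Print the value of a single-line key from the leading frontmatter block.
frontmatter_value() {
  local key="$1" file="$2"
  awk -v key="$key" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
    in_fm && $0 == "---"   { exit }              # closing delimiter
    in_fm && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)
      gsub(/^[ \t]+|[ \t]+$/, "", value)         # trim whitespace
      gsub(/^"|"$/, "", value)                   # drop surrounding quotes
      print value
      exit
    }
  ' "$file"
}

# Succeed when the file's domain equals the expected genome name.
domain_matches() {
  local file="$1" genome="$2"
  [ "$(frontmatter_value domain "$file")" = "$genome" ]
}
```

Lines after the closing --- are never inspected, so a stray "domain:" line in the document body does not affect the result.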
Privacy Leak Detection
Critical validation step.
Verifies that any file located in a /private/ directory contains the flag:
private: true
This prevents accidental exposure during AI sessions.
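The rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual linter: the function names are hypothetical, only the frontmatter block is inspected, and file names containing newlines are not handled.

```shell
# Hypothetical helpers, not part of the project's linter.
# Succeed only when the frontmatter block contains 'private: true'.
has_private_flag() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }        # no frontmatter at all
    $0 == "---" { exit }                           # end of frontmatter
    /^private:[ \t]*true[ \t]*$/ { found = 1; exit }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Print every .md file below a private/ directory that lacks the flag;
# return non-zero when at least one such file exists.
check_private_flags() {
  local root="$1" missing
  missing="$(find "$root" -type f -name '*.md' -path '*/private/*' |
    while IFS= read -r file; do
      has_private_flag "$file" || printf '%s\n' "$file"
    done)"
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
}
```

Returning a non-zero status lets a Makefile target fail the run, so an unflagged private file is caught before an AI session starts.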
Broken Wiki-Links
Detects dead [[internal-links]].
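A rough sketch of dead-link detection follows. The resolution rule is an assumption: [[target]] is treated as live when a file named target.md exists anywhere under the given root, and anything after a | is treated as a display alias. The function names are hypothetical.

```shell
# Hypothetical helpers, not part of the project's linter.
# Print each distinct [[target]] in the file, without brackets or alias.
list_wiki_links() {
  # grep exits 1 when a file has no links; that is not an error here.
  { grep -o '\[\[[^]]*]]' "$1" || true; } |
    sed -e 's/^\[\[//' -e 's/]]$//' -e 's/|.*$//' |
    sort -u
}

# Print targets with no matching .md file under root; non-zero if any.
find_dead_links() {
  local file="$1" root="$2" dead
  dead="$(list_wiki_links "$file" | while IFS= read -r target; do
    # Assumption: [[target]] resolves to target.md anywhere under root.
    [ -n "$(find "$root" -type f -name "$target.md")" ] || printf '%s\n' "$target"
  done)"
  if [ -n "$dead" ]; then
    printf '%s\n' "$dead"
    return 1
  fi
}
```

Because find -name treats its argument as a glob pattern, targets containing characters such as * or ? would need escaping in a real implementation.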
Security Model
Hybrid Privacy Architecture
Each genome is divided into two layers.
Public Layer
Directories:
raw/public/
wiki/public/
Characteristics:
- Plaintext
- Shareable with collaborators
Private Layer
Directories:
raw/private/
wiki/private/
Characteristics:
- Encrypted using AES-256 via git-crypt
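git-crypt selects files to encrypt through filter attributes in .gitattributes. The sketch below shows how a scaffold step might write rules for the two private directories. The function name write_gitattributes is hypothetical, and the project's actual security templates may differ.

```shell
# Hypothetical scaffold step, not taken from the project's scripts.
# Write git-crypt filter rules for the private layer of a genome.
write_gitattributes() {
  local genome_dir="$1"
  cat > "$genome_dir/.gitattributes" <<'EOF'
raw/private/** filter=git-crypt diff=git-crypt
wiki/private/** filter=git-crypt diff=git-crypt
EOF
}
```

With these rules committed, files under raw/private/ and wiki/private/ are encrypted when staged, while the public directories stay in plaintext.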
Runtime Key Injection
To keep the AI environment secure, encryption keys are never stored on the VM disk.
Instead, the system uses Bitwarden (bw) / Vaultwarden for runtime injection.
Example
# Unlock a genome using a key stored in Vaultwarden
git-crypt unlock <(
bw get notes "genome-dev key" \
--session "$BW_SESSION" | base64 -d
)
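The same idea can be wrapped in a small guard so that an unlock is never attempted without a Bitwarden session. This is a sketch only: unlock_genome is a hypothetical name, and the pipe into /dev/stdin is an assumed equivalent of the process substitution above for shells that lack it, on systems that expose /dev/stdin.

```shell
# Hypothetical wrapper, not part of the project's scripts.
# Unlock one genome with a key held in the vault; the key only ever
# travels through a pipe and is never written to disk.
unlock_genome() {
  local genome_dir="$1" item="$2"
  if [ -z "${BW_SESSION:-}" ]; then
    echo "BW_SESSION is not set; unlock the vault first" >&2
    return 1
  fi
  # Run in a subshell so the cd does not leak into the caller.
  (
    cd "$genome_dir" || exit 1
    bw get notes "$item" --session "$BW_SESSION" |
      base64 -d |
      git-crypt unlock /dev/stdin
  )
}
```

A session could then call unlock_genome with the genome directory and the vault item name, and run git-crypt lock inside the genome once the work is finished.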
Genome Schema
All wiki documents follow a strict schema to support AI ingestion.
YAML Frontmatter Schema
---
title: "Document Title"
type: entity | concept | source | log
domain: genome-name
private: true | false
last_updated: YYYY-MM-DD
---
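A schema check along these lines could be written as a single awk pass. The sketch assumes all five fields are required and that each sits on one line; the function name validate_schema and the error messages are hypothetical. It does not verify that the frontmatter block is closed.

```shell
# Hypothetical helper, not part of the project's linter.
# Print one line per schema violation; return non-zero if any were found.
validate_schema() {
  awk '
    NR == 1 {
      if ($0 != "---") { nofm = 1; exit }
      next
    }
    $0 == "---" { exit }                      # end of frontmatter
    {
      pos = index($0, ":")                    # split at the first colon
      if (pos == 0) next
      key = substr($0, 1, pos - 1)
      val = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      seen[key] = 1
      if (key == "type" && val !~ /^(entity|concept|source|log)$/) {
        print "invalid type: " val; bad = 1
      }
      if (key == "private" && val !~ /^(true|false)$/) {
        print "invalid private: " val; bad = 1
      }
      if (key == "last_updated" && val !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/) {
        print "invalid last_updated: " val; bad = 1
      }
    }
    END {
      if (nofm) { print "missing frontmatter"; exit 1 }
      n = split("title type domain private last_updated", required, " ")
      for (i = 1; i <= n; i++) {
        if (!(required[i] in seen)) { print "missing field: " required[i]; bad = 1 }
      }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

The date test is purely a shape check (four digits, two digits, two digits); it does not reject impossible dates such as month 13.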
Agent Interaction
When starting a session with an AI agent, always declare the privacy context.
Public Context
PRIVATE_CONTEXT: disabled
Behavior:
- The agent ignores all private folders.
Private Context
PRIVATE_CONTEXT: enabled
Behavior:
- The agent processes the private folders in addition to the public ones.
- Requires the repository to be unlocked.
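How a wrapper script might turn the declared context into a file list is sketched below. It assumes PRIVATE_CONTEXT is also available as an environment variable and that private content lives in directories named private; the function name list_agent_files is hypothetical. An unset variable is treated as disabled, so the default is the safer one.

```shell
# Hypothetical helper, not part of the project's scripts.
# List the .md files an agent may read under the current privacy context.
list_agent_files() {
  local root="$1"
  if [ "${PRIVATE_CONTEXT:-disabled}" = "enabled" ]; then
    find "$root" -type f -name '*.md' | sort
  else
    # Prune every directory named "private" so its contents are never listed.
    find "$root" -type d -name private -prune -o -type f -name '*.md' -print | sort
  fi
}
```

This filters by directory only. A stricter variant could additionally drop any file whose frontmatter sets private: true, which would cover documents misplaced in a public folder.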