System Architecture

10-Layer Memory Architecture

ekkOS uses a hierarchical memory system inspired by human cognitive architecture. Each layer serves a specific purpose in the memory lifecycle.

Architecture Overview

The 10-layer system mirrors how human memory works: information flows from short-term working memory through various processing stages to become permanent knowledge. Each layer has different retention policies and serves different retrieval needs.

The lifecycle runs in five phases: Capture → Process → Store → Retrieve → Apply.
┌─────────────────────────────────────────────────────────────────┐
│                         INGESTION                                │
│  Chat → Working(L1) → Episodic(L2) → Semantic(L3) → Pattern(L4) │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                         RETRIEVAL                                │
│  Query → Vector Search → Rank by Relevance → Return Context     │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                         APPLICATION                              │
│  Context → AI Prompt → Response → Outcome Tracking → Learn      │
└─────────────────────────────────────────────────────────────────┘

Memory Layers

Layer 1: Working Memory

Recent chat messages with a 24-hour sliding window

Stores raw conversation data for immediate context. Automatically expires after 24 hours. This is the "scratchpad" where new information first lands.

Retention: 24 hours
Data: Chat messages
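
The 24-hour sliding window can be sketched as a simple time-based filter. This is a minimal sketch, assuming an in-memory entry shape; the type and function names are illustrative, not ekkOS's actual API:

```typescript
// Sketch: 24-hour sliding-window expiry for Working Memory (Layer 1).
// WorkingMemoryEntry and pruneWorkingMemory are hypothetical names.

interface WorkingMemoryEntry {
  content: string;
  capturedAt: number; // Unix epoch milliseconds
}

const WINDOW_MS = 24 * 60 * 60 * 1000;

// Keep only entries captured within the last 24 hours.
function pruneWorkingMemory(
  entries: WorkingMemoryEntry[],
  now: number = Date.now()
): WorkingMemoryEntry[] {
  return entries.filter((e) => now - e.capturedAt < WINDOW_MS);
}

// Example: one fresh entry, one expired entry.
const now = Date.now();
const kept = pruneWorkingMemory(
  [
    { content: "fix the auth bug", capturedAt: now - 60_000 },      // 1 minute old
    { content: "old discussion", capturedAt: now - 25 * 3_600_000 }, // 25 hours old
  ],
  now
);
console.log(kept.length); // 1
```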
Layer 2: Episodic Memory

Conversation episodes and significant events

Extracts meaningful episodes from working memory. Each episode captures a complete problem-solution cycle or significant interaction.

Retention: 30 days
Data: Episodes
Layer 3: Semantic Memory

Compressed knowledge and factual concepts

Distills episodes into semantic knowledge. Stores facts, concepts, and relationships as vector embeddings for semantic search.

Retention: Permanent
Data: Knowledge entries
Layer 4: Pattern Memory

Reusable problem-solution strategies

Stores proven patterns with success metrics. Each pattern has a confidence score that evolves based on application outcomes.

Retention: Permanent
Data: Patterns
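
One way an outcome-driven confidence score can work is a smoothed success rate. This is a sketch under that assumption; the field names and the Laplace-smoothing choice are illustrative, not ekkOS's actual scoring:

```typescript
// Sketch: a confidence score that evolves with application outcomes.
// Laplace smoothing makes an untried pattern start at 0.5 and
// converge toward its observed success rate as outcomes accumulate.

interface Pattern {
  name: string;
  successes: number;
  failures: number;
}

// confidence = (successes + 1) / (successes + failures + 2)
function confidence(p: Pattern): number {
  return (p.successes + 1) / (p.successes + p.failures + 2);
}

function recordOutcome(p: Pattern, succeeded: boolean): Pattern {
  return succeeded
    ? { ...p, successes: p.successes + 1 }
    : { ...p, failures: p.failures + 1 };
}

let pattern: Pattern = { name: "retry-with-backoff", successes: 0, failures: 0 };
console.log(confidence(pattern)); // 0.5 (no evidence yet)
pattern = recordOutcome(pattern, true);
pattern = recordOutcome(pattern, true);
pattern = recordOutcome(pattern, false);
console.log(confidence(pattern)); // 0.6 (2 successes, 1 failure)
```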
Layer 5: Procedural Memory

Step-by-step workflows and processes

Captures multi-step procedures that work. Useful for complex tasks that require specific sequences of actions.

Retention: Permanent
Data: Workflows
Layer 6: Collective Memory

Cross-agent shared knowledge

Aggregates learning from all AI agents connected to your account. Enables knowledge transfer between Claude Code, Cursor, and other tools.

Retention: 7 days (rolling)
Data: Reflex events
Layer 7: Meta Memory

System self-awareness and introspection

Tracks the memory system's own behavior. Monitors pattern effectiveness, identifies drift, and triggers consolidation.

Retention: Permanent
Data: System records
Layer 8: Codebase Memory

Code embeddings for semantic search

Indexes your codebase for semantic search. Enables natural language queries like "find the authentication logic" to locate relevant code.

Retention: Permanent
Data: Code embeddings
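
At its core, this kind of search ranks stored code embeddings by similarity to the query embedding. In ekkOS the search runs in pgvector with HNSW indexes; the brute-force scan below only illustrates the ranking, and all names are assumptions:

```typescript
// Sketch: semantic code search as a cosine-similarity ranking over
// stored chunk embeddings. CodeChunk and searchCodebase are
// hypothetical names, not ekkOS's actual API.

interface CodeChunk {
  path: string;
  embedding: number[]; // e.g. 1536-dim from text-embedding-3-small
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the top-k chunks most similar to the query embedding.
function searchCodebase(query: number[], chunks: CodeChunk[], k = 3): CodeChunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
    )
    .slice(0, k);
}

// Tiny 3-dim vectors standing in for 1536-dim embeddings.
const hits = searchCodebase(
  [1, 0, 0], // embedding of "find the authentication logic"
  [
    { path: "src/auth/login.ts", embedding: [0.9, 0.1, 0] },
    { path: "src/ui/button.ts", embedding: [0, 1, 0] },
  ],
  1
);
console.log(hits[0].path); // src/auth/login.ts
```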
Layer 9: Directives

MUST/NEVER/PREFER/AVOID rules

Stores explicit behavioral rules with priority levels (300-1000). High-priority directives override conflicting lower-level knowledge.

Retention: Permanent
Data: Rules
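
Priority-based override can be sketched as "highest priority wins" arbitration. The 300-1000 range comes from the docs above; the rule shape and function names are assumptions for illustration:

```typescript
// Sketch: arbitration between conflicting directives (Layer 9).
// When two directives address the same subject, the one with the
// higher priority wins. Directive and resolve are hypothetical names.

type Mode = "MUST" | "NEVER" | "PREFER" | "AVOID";

interface Directive {
  mode: Mode;
  subject: string;
  priority: number; // 300 (advisory) … 1000 (absolute)
}

function resolve(directives: Directive[]): Map<string, Directive> {
  const winners = new Map<string, Directive>();
  for (const d of directives) {
    const current = winners.get(d.subject);
    if (!current || d.priority > current.priority) winners.set(d.subject, d);
  }
  return winners;
}

const winners = resolve([
  { mode: "PREFER", subject: "logging", priority: 400 }, // "PREFER verbose logs"
  { mode: "NEVER", subject: "logging", priority: 900 },  // "NEVER log secrets"
]);
console.log(winners.get("logging")?.mode); // NEVER
```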
Layer 10: Conflict Resolution

Decision arbitration and resolution logs

Records how conflicting information was resolved. Maintains an audit trail for debugging and understanding AI decisions.

Retention: Permanent
Data: Resolution logs

Data Flow

Ingestion Pipeline

  1. Capture — Raw conversation messages are captured and stored in Working Memory (Layer 1)
  2. Episode Extraction — Every 5 minutes, the episodic ingestion worker processes working memory to identify complete episodes
  3. Semantic Compression — Episodes are compressed into semantic knowledge entries with vector embeddings
  4. Pattern Discovery — The system identifies reusable patterns and stores them in Pattern Memory
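
Step 2 can be sketched as grouping working-memory messages into episodes by conversational gap. The 15-minute gap heuristic and all names here are assumptions, not the worker's actual logic:

```typescript
// Sketch: episode extraction from working memory. Messages separated
// by more than GAP_MS are treated as the start of a new episode.

interface Message {
  text: string;
  at: number; // Unix epoch milliseconds
}

const GAP_MS = 15 * 60 * 1000; // assumed 15-minute gap threshold

function extractEpisodes(messages: Message[]): Message[][] {
  const episodes: Message[][] = [];
  for (const m of [...messages].sort((a, b) => a.at - b.at)) {
    const last = episodes[episodes.length - 1];
    if (last && m.at - last[last.length - 1].at <= GAP_MS) {
      last.push(m); // continues the current episode
    } else {
      episodes.push([m]); // gap too large: start a new episode
    }
  }
  return episodes;
}

const t0 = 0;
const episodes = extractEpisodes([
  { text: "bug report", at: t0 },
  { text: "proposed fix", at: t0 + 2 * 60_000 },  // 2 min later: same episode
  { text: "new topic", at: t0 + 60 * 60_000 },    // 1 h later: new episode
]);
console.log(episodes.length); // 2
```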

Retrieval Pipeline

  1. Query Processing — User query is converted to a vector embedding
  2. Multi-Layer Search — Parallel search across relevant memory layers using HNSW indexes
  3. Result Ranking — Results ranked by relevance, recency, and confidence scores
  4. Context Assembly — Top results assembled into coherent context for the AI prompt
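
Step 3's combination of relevance, recency, and confidence can be sketched as a weighted score. The weights and the 30-day recency half-life are assumptions for illustration, not ekkOS's actual ranking formula:

```typescript
// Sketch: ranking retrieval candidates by a weighted blend of
// relevance, recency, and confidence. Candidate is a hypothetical shape.

interface Candidate {
  id: string;
  relevance: number;  // similarity to the query, in [0, 1]
  ageDays: number;    // days since the memory was stored
  confidence: number; // layer-specific confidence, in [0, 1]
}

// Recency decays exponentially with an assumed 30-day half-life.
function recency(ageDays: number): number {
  return Math.pow(0.5, ageDays / 30);
}

function score(c: Candidate): number {
  return 0.6 * c.relevance + 0.2 * recency(c.ageDays) + 0.2 * c.confidence;
}

function rank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => score(b) - score(a));
}

const ranked = rank([
  { id: "stale-but-close", relevance: 0.9, ageDays: 300, confidence: 0.5 },
  { id: "fresh-and-close", relevance: 0.85, ageDays: 1, confidence: 0.9 },
]);
console.log(ranked[0].id); // fresh-and-close
```

A slightly less relevant but fresh, high-confidence memory can outrank a stale one, which is the point of blending the three signals rather than sorting on similarity alone.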

Performance Characteristics

  • Average retrieval latency: 18ms (P99: <50ms)
  • Vectors per index: 1M+ (HNSW with ef=64)
  • Embedding dimensions: 1536 (OpenAI text-embedding-3-small)

Technology Stack

Storage

  • PostgreSQL + pgvector — Vector storage and search
  • Supabase — Managed PostgreSQL with RLS
  • HNSW Indexes — Fast approximate nearest neighbor search

Processing

  • OpenAI Embeddings — text-embedding-3-small
  • Claude/GPT — Pattern extraction and summarization
  • PM2 Workers — Background ingestion pipelines
