Veritheia Implementation

1. Overview

This document specifies the technical implementation of Veritheia. The system operates as a local-first epistemic infrastructure with four primary components: PostgreSQL with pgvector for knowledge storage, ASP.NET Core for process orchestration, adapter-based LLM integration for assessments, and Blazor Server for user interfaces. All components enforce user data sovereignty and prevent automated insight generation.

2. Technology Stack

2.1 Knowledge Database

PostgreSQL 16 with the pgvector extension provides unified storage for documents, metadata, and embeddings. The pgvector extension enables efficient similarity search over 1536-dimensional embedding vectors while maintaining ACID guarantees for relational data. Deployment is containerized, with .NET Aspire orchestrating the containers during development so that local environments stay consistent with containerized production deployments.

2.2 Process Engine

ASP.NET Core 8.0 implements the Process Engine as a RESTful API service. The architecture employs Domain-Driven Design with aggregate boundaries around User, Journey, and Document entities. Data access uses the Repository pattern with Entity Framework Core 8.0, while CQRS separates read and write operations for scalability.

2.3 Presentation Tier

Blazor Server provides the web interface, enabling real-time updates through SignalR connections. This architecture choice removes most of the need for custom JavaScript while maintaining a responsive user experience. Component design follows a strict separation between user-authored content display and system-provided structure.

2.4 Cognitive System Integration

The ICognitiveAdapter interface abstracts LLM implementation details and supports multiple backends: LlamaCppAdapter for local inference, SemanticKernelAdapter for Microsoft Semantic Kernel, and OpenAIAdapter for cloud-based models. Each adapter implements only assessment operations and uses prompt-engineering constraints to prevent insight generation.
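A minimal sketch of that adapter boundary follows, written in Python for brevity rather than the C# the stack implies. The method name assess, the Assessment record, the prompt wording, and the KeywordStubAdapter test double are all hypothetical. The point it illustrates is that callers only ever receive a bounded score and a rationale, while each backend supplies nothing but the raw completion call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical assessment record: a bounded score plus a rationale."""
    score: float  # 0.0 = criterion not met, 1.0 = fully met
    rationale: str


# Hypothetical assessment-only prompt: the model measures the text against the
# user's criterion and is told not to summarize or generate new ideas.
ASSESSMENT_PROMPT = (
    "Assess the document ONLY against the criterion below. "
    "Do not summarize, synthesize, or propose new ideas.\n"
    "Criterion: {criterion}\n"
    "Document: {text}\n"
    "Reply with a score from 0 to 1 and a one-sentence rationale."
)


class CognitiveAdapter(ABC):
    """Python analogue of ICognitiveAdapter: backends differ only in _complete."""

    def assess(self, text: str, criterion: str) -> Assessment:
        prompt = ASSESSMENT_PROMPT.format(criterion=criterion, text=text)
        score, rationale = self._complete(prompt)
        # Clamp so a misbehaving backend cannot return an out-of-range score.
        return Assessment(max(0.0, min(1.0, score)), rationale)

    @abstractmethod
    def _complete(self, prompt: str) -> tuple[float, str]:
        """Send the prompt to a backend (local llama.cpp, Semantic Kernel, OpenAI)."""


class KeywordStubAdapter(CognitiveAdapter):
    """Offline stand-in for tests: scores by how many criterion words appear."""

    def _complete(self, prompt: str) -> tuple[float, str]:
        criterion = prompt.split("Criterion: ")[1].split("\n")[0].lower()
        text = prompt.split("Document: ")[1].split("\n")[0].lower()
        words = criterion.split()
        hits = sum(1 for word in words if word in text)
        return hits / max(len(words), 1), f"{hits} of {len(words)} criterion terms present"
```

Keeping the prompt in the shared base class means every backend inherits the same assessment-only constraint instead of re-implementing it.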

Data Architecture

Entity Model

The data layer (veritheia.Data) defines these core entities:

Document: Represents raw corpus materials (PDFs, text files)
ProcessedContent: Stores embeddings and extracted text chunks
KnowledgeScope: Defines virtual boundaries for knowledge organization
ProcessDefinition: Metadata describing available processes
ProcessExecution: Tracks process runs and their state
ProcessResult: Stores process outputs with extensible JSON schema
User: Core identity with associated persona and knowledge base
Journey: User’s engagement instance with a process
Journal: Narrative record of intellectual development (Research, Method, Decision, Reflection)
JournalEntry: Individual narrative entries within journals

Primary Key Strategy

All entities use ULID (Universally Unique Lexicographically Sortable Identifier) as primary keys:

Format: 26-character string representation
Benefits:
- Lexicographically sortable (time-ordered)
- Globally unique without coordination
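- Better index performance than random (version 4) UUIDs, because time-ordered keys are inserted near the end of B-tree indexes instead of at random positions
- Readable and URL-safe: Crockford Base32 encoding, case-insensitive, with the ambiguous letters I, L, O, and U excluded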
Implementation: Custom EF Core value converter for ULID ↔ string conversion
Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
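The ULID layout can be illustrated with a short Python sketch. Per the ULID specification, the 128 bits are a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford Base32 characters. Because the timestamp comes first and the alphabet is in ASCII order, string order follows creation time. The function names are hypothetical, and a .NET implementation would more likely use a ULID library behind the EF Core value converter than hand-rolled encoding.

```python
import os
import time

# Crockford Base32 alphabet used by the ULID spec (omits I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Pack a 48-bit ms timestamp and 80 random bits into a 26-char ULID."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    # 26 characters x 5 bits = 130 bits; the top 2 bits are always zero.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Generate a ULID for the current time with OS-provided randomness."""
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))


def ulid_timestamp_ms(ulid: str) -> int:
    """Recover the creation time: the first 10 characters encode the timestamp."""
    value = 0
    for ch in ulid[:10].upper():
        value = (value << 5) | CROCKFORD.index(ch)
    return value


print(encode_ulid(1469918176385, bytes(10)))  # 01ARYZ6S410000000000000000
```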

Database Design Patterns

Repository Pattern: Generic IRepository<T> with concrete implementations
Unit of Work: Transaction management across repositories
Specification Pattern: Complex queries via ISpecification<T>
Value Converters: ULID, UTC DateTime, JSONB for PostgreSQL
Soft Deletes: Logical deletion with DeletedAt timestamp
Auditing: CreatedAt, UpdatedAt, CreatedBy, UpdatedBy on all entities
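The Repository, Specification, and Soft Delete patterns combine roughly as in this in-memory Python sketch. Class names mirror the list above but are hypothetical, and Unit of Work and auditing are omitted for brevity. In EF Core, the same soft-delete filtering is commonly applied with a global query filter (HasQueryFilter) so that no individual query can forget it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Specification(Generic[T]):
    """Composable query predicate, analogous to ISpecification<T>."""

    def __init__(self, predicate: Callable[[T], bool]) -> None:
        self._predicate = predicate

    def is_satisfied_by(self, item: T) -> bool:
        return self._predicate(item)

    def and_(self, other: Specification[T]) -> Specification[T]:
        return Specification(lambda item: self.is_satisfied_by(item) and other.is_satisfied_by(item))


@dataclass
class Entity:
    id: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    deleted_at: Optional[datetime] = None  # set on soft delete; the row is kept


@dataclass
class Document(Entity):
    title: str = ""
    journey_id: str = ""


E = TypeVar("E", bound=Entity)


class InMemoryRepository(Generic[E]):
    """Generic repository that hides soft-deleted rows from every read."""

    def __init__(self) -> None:
        self._rows: dict[str, E] = {}

    def add(self, item: E) -> None:
        self._rows[item.id] = item

    def get(self, item_id: str) -> Optional[E]:
        item = self._rows.get(item_id)
        return item if item is not None and item.deleted_at is None else None

    def find(self, spec: Specification[E]) -> list[E]:
        return [r for r in self._rows.values() if r.deleted_at is None and spec.is_satisfied_by(r)]

    def delete(self, item_id: str) -> None:
        # Logical deletion: stamp DeletedAt instead of removing the row.
        self._rows[item_id].deleted_at = datetime.now(timezone.utc)


def in_journey(journey_id: str) -> Specification[Document]:
    """Hypothetical reusable specification: documents belonging to one journey."""
    return Specification(lambda d: d.journey_id == journey_id)
```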

Vector Storage Strategy

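Embedding Dimension: 1536 (the output size of, for example, OpenAI's text-embedding-3-small and text-embedding-ada-002)
Index Type: IVFFlat with cosine distance (vector_cosine_ops, queried with the <=> operator) for similarity search
Query Optimization: IVFFlat performs approximate nearest-neighbor search, trading some recall for speed; recall can be tuned per query with the ivfflat.probes setting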
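To show what "approximate" means here, the sketch below mimics IVFFlat's search strategy in plain Python: vectors are bucketed by nearest centroid, and a query scans only the probes closest buckets, so a true neighbor in an unscanned bucket can be missed. Cosine distance is 1 minus cosine similarity, the quantity pgvector's <=> operator returns. Centroids are passed in here rather than learned with k-means as pgvector does at index build time, and the class and method names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class IvfFlatSketch:
    """Toy inverted-file index: one list of (key, vector) per centroid."""

    def __init__(self, centroids: list[Vector]) -> None:
        # pgvector learns centroids with k-means at build time; here they are given.
        self.centroids = centroids
        self.lists: list[list[tuple[str, Vector]]] = [[] for _ in centroids]

    def _closest_lists(self, v: Vector, n: int) -> list[int]:
        ranked = sorted(range(len(self.centroids)),
                        key=lambda i: cosine_distance(v, self.centroids[i]))
        return ranked[:n]

    def add(self, key: str, v: Vector) -> None:
        self.lists[self._closest_lists(v, 1)[0]].append((key, v))

    def search(self, query: Vector, k: int, probes: int = 1) -> list[str]:
        # Only the `probes` nearest lists are scanned, so results are approximate.
        candidates = [item for i in self._closest_lists(query, probes) for item in self.lists[i]]
        candidates.sort(key=lambda item: cosine_distance(query, item[1]))
        return [key for key, _ in candidates[:k]]
```

With probes=1 a query near one centroid never sees vectors filed under another; raising probes widens the scan at the cost of more distance computations, which is the trade-off ivfflat.probes controls.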

Database Migrations

The system uses Entity Framework Core migrations with this workflow:

Define entity changes in veritheia.Data
Generate migrations from veritheia.ApiService context
Apply migrations during startup or deployment
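The apply step can be sketched as an idempotent runner: record each applied migration in a history table and apply only the missing ones, in id order, each in its own transaction. EF Core does this itself through Database.Migrate() and its __EFMigrationsHistory table; the Python below uses SQLite purely to illustrate the idea, and the table name, migration ids, and SQL are hypothetical.

```python
import sqlite3

# Hypothetical migrations: (id, SQL). Timestamp-prefixed ids sort chronologically,
# mirroring EF Core's migration naming.
MIGRATIONS = [
    ("20240101000000_Initial", "CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT)"),
    ("20240201000000_AddDeletedAt", "ALTER TABLE documents ADD COLUMN deleted_at TEXT"),
]


def apply_pending(conn: sqlite3.Connection, migrations: list[tuple[str, str]]) -> list[str]:
    """Apply unapplied migrations in id order; safe to call on every startup.

    Expects a connection opened with isolation_level=None so that the explicit
    BEGIN/COMMIT below control the transaction boundaries.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS migrations_history (migration_id TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT migration_id FROM migrations_history")}
    newly_applied = []
    for migration_id, sql in sorted(migrations):
        if migration_id in applied:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO migrations_history (migration_id) VALUES (?)", (migration_id,))
            conn.execute("COMMIT")
        except Exception:
            # A failed migration leaves neither its schema change nor its history row.
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(migration_id)
    return newly_applied
```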

Service Architecture

Dependency Injection Structure

The application uses ASP.NET Core’s built-in dependency injection with these service lifetimes:

Scoped Services: Database contexts, repositories, process instances
Singleton Services: Configuration, cognitive adapters, caching
Transient Services: Validators, mappers, utilities
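The practical difference between the three lifetimes is sketched below with a toy Python container (not the ASP.NET Core API; all names are hypothetical). In ASP.NET Core a scope typically corresponds to one HTTP request, or to one circuit in Blazor Server, which is why a database context registered as scoped is shared within a request but never across requests.

```python
from enum import Enum
from typing import Any, Callable


class Lifetime(Enum):
    SINGLETON = "singleton"   # one instance for the whole application
    SCOPED = "scoped"         # one instance per scope (e.g. per HTTP request)
    TRANSIENT = "transient"   # a new instance on every resolve


class Container:
    def __init__(self) -> None:
        self._registrations: dict[str, tuple[Lifetime, Callable[[], Any]]] = {}
        self._singletons: dict[str, Any] = {}

    def register(self, name: str, lifetime: Lifetime, factory: Callable[[], Any]) -> None:
        self._registrations[name] = (lifetime, factory)

    def create_scope(self) -> "Scope":
        return Scope(self)


class Scope:
    def __init__(self, root: Container) -> None:
        self._root = root
        self._scoped: dict[str, Any] = {}

    def resolve(self, name: str) -> Any:
        lifetime, factory = self._root._registrations[name]
        if lifetime is Lifetime.SINGLETON:
            # Cached on the root container, so every scope shares it.
            if name not in self._root._singletons:
                self._root._singletons[name] = factory()
            return self._root._singletons[name]
        if lifetime is Lifetime.SCOPED:
            # Cached on this scope only.
            if name not in self._scoped:
                self._scoped[name] = factory()
            return self._scoped[name]
        return factory()
```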

Platform Services

The platform provides guaranteed services that all processes can depend on:

Document Encounter Service: Records HOW and WHY a document entered a journey
Conceptual Embedding Service: Creates embeddings that reflect the author’s conceptual framework
Personal Metadata Service: Extracts metadata relevant to the author’s inquiry
Semantic Chunking Service: Splits documents based on the author’s conceptual boundaries
Journey Repository: Filters all data access through the journey context

Process Registration

Processes are registered through a convention-based pattern that ensures proper dependency injection and discovery. Each process is registered both as itself and as an IAnalyticalProcess implementation, so it can be resolved directly by its concrete type or discovered among all registered processes.
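One way to implement that convention is sketched below in Python: discover every concrete subclass of the process base type, then register it twice, once under its own type and once under the interface. The class and function names are hypothetical stand-ins for IAnalyticalProcess and the real registration code. Unlike ASP.NET Core, where the interface registration would typically forward to the same scoped instance, this toy container creates a fresh instance on each resolve.

```python
import inspect
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable


class AnalyticalProcess(ABC):
    """Python analogue of IAnalyticalProcess."""

    @abstractmethod
    def execute(self, inputs: dict) -> dict: ...


class SystematicScreeningProcess(AnalyticalProcess):
    def execute(self, inputs: dict) -> dict:
        return {"process": "screening", **inputs}


class GuidedCompositionProcess(AnalyticalProcess):
    def execute(self, inputs: dict) -> dict:
        return {"process": "composition", **inputs}


class ServiceCollection:
    """Minimal stand-in for a DI container that allows several registrations per type."""

    def __init__(self) -> None:
        self._factories: dict[type, list[Callable[[], Any]]] = defaultdict(list)

    def add(self, service_type: type, factory: Callable[[], Any]) -> None:
        self._factories[service_type].append(factory)

    def resolve(self, service_type: type) -> Any:
        # As in Microsoft.Extensions.DependencyInjection, the last registration wins.
        return self._factories[service_type][-1]()

    def resolve_all(self, service_type: type) -> list[Any]:
        return [factory() for factory in self._factories[service_type]]


def add_processes(services: ServiceCollection, base: type = AnalyticalProcess) -> None:
    """Convention: every direct, non-abstract subclass of `base` is a process."""
    for cls in base.__subclasses__():
        if inspect.isabstract(cls):
            continue
        services.add(cls, cls)   # resolvable by its concrete type
        services.add(base, cls)  # discoverable as an AnalyticalProcess
```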

Process Architecture

Process Execution Flow

Input Collection: Dynamic forms generated from process definition
Context Creation: User journey and inputs packaged into ProcessContext
Process Execution: Business logic runs with access to platform services
Result Storage: Outputs saved with full provenance and versioning
Result Rendering: Process-specific UI components display results
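The first four steps can be sketched as a single Python function; rendering (step 5) happens in Blazor and is omitted. The field names, the SHA-256 hash of the inputs used as provenance, and the required_inputs tuple standing in for the form definition are assumptions for illustration, not the actual ProcessResult schema.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class ProcessDefinition:
    process_id: str
    version: str
    required_inputs: tuple[str, ...]  # drives the dynamic input form


@dataclass(frozen=True)
class ProcessContext:
    journey_id: str
    scope_id: str
    inputs: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ProcessResult:
    process_id: str
    process_version: str
    journey_id: str
    input_hash: str          # provenance: which inputs produced this output
    output: dict[str, Any]
    created_at: datetime


def collect_inputs(definition: ProcessDefinition, raw: dict[str, Any]) -> dict[str, Any]:
    missing = [name for name in definition.required_inputs if not raw.get(name)]
    if missing:
        raise ValueError(f"missing or empty inputs: {missing}")
    return {name: raw[name] for name in definition.required_inputs}


def run_process(definition: ProcessDefinition,
                execute: Callable[[ProcessContext], dict[str, Any]],
                journey_id: str, scope_id: str, raw_inputs: dict[str, Any]) -> ProcessResult:
    inputs = collect_inputs(definition, raw_inputs)                       # 1. input collection
    context = ProcessContext(journey_id, scope_id, inputs,
                             {"started_at": datetime.now(timezone.utc).isoformat()})  # 2. context
    output = execute(context)                                             # 3. execution
    digest = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode("utf-8")).hexdigest()
    return ProcessResult(definition.process_id, definition.version, journey_id,
                         digest, output, datetime.now(timezone.utc))      # 4. record to store
```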

Reference Process Patterns

Systematic Screening Process (Analytical Pattern)

Implements dual assessment: relevance and contribution
Uses cognitive system in two distinct modes (librarian vs peer reviewer)
Produces filterable results with detailed rationales
Demonstrates journey-specific analysis
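A minimal sketch of the dual assessment follows, assuming the librarian mode yields the relevance score and the peer-reviewer mode yields the contribution score, each with a rationale. The two assessors are passed in as plain functions (in practice they would call the cognitive adapter with different prompts), and every document is assessed in both modes so the user can filter on either threshold afterwards. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# An assessor returns (score in 0..1, rationale).
Assessor = Callable[[str], tuple[float, str]]


@dataclass(frozen=True)
class ScreeningResult:
    document_id: str
    relevance: float              # librarian mode: does it address the research questions?
    relevance_rationale: str
    contribution: float           # peer-reviewer mode: does it advance them?
    contribution_rationale: str


def screen(documents: dict[str, str], assess_relevance: Assessor,
           assess_contribution: Assessor) -> list[ScreeningResult]:
    """Assess every document in both modes, keeping both rationales."""
    results = []
    for doc_id, text in documents.items():
        relevance, relevance_why = assess_relevance(text)
        contribution, contribution_why = assess_contribution(text)
        results.append(ScreeningResult(doc_id, relevance, relevance_why,
                                       contribution, contribution_why))
    return results


def filter_results(results: list[ScreeningResult], min_relevance: float = 0.0,
                   min_contribution: float = 0.0) -> list[ScreeningResult]:
    """User-controlled filtering; nothing is discarded during screening itself."""
    return [r for r in results
            if r.relevance >= min_relevance and r.contribution >= min_contribution]
```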

Journal Integration:

Research Journal: Records findings about relevant papers
Decision Journal: Documents inclusion/exclusion rationales
Method Journal: Captures evolving search strategies
Reflection Journal: Notes emerging patterns and insights

Guided Composition Process (Compositional Pattern)

Generates constrained content based on source materials
Creates evaluation rubrics aligned with objectives
Implements real-time assessment with feedback
Shows teacher/student role differentiation

Journal Integration:

Method Journal: Teaching approaches and constraint design
Decision Journal: Rubric adjustments and grading overrides
Reflection Journal: Student progress observations
Research Journal: Pedagogical insights from assignments

Process Context

Every process execution receives a context that includes:

Current knowledge scope
User journey with assembled journal context
Process-specific inputs
Execution metadata
Platform service references

Context Assembly:

Journal Selection: Relevant journals for current task
Entry Extraction: Recent significant entries
Narrative Compression: Maintaining coherence within token limits
Persona Integration: User’s conceptual vocabulary and patterns

This context ensures outputs remain personally relevant and meaningful within the specific inquiry.
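The assembly steps can be approximated as below. This Python sketch makes several assumptions: a hypothetical 0 to 3 significance rating on entries, a word count standing in for a real tokenizer, and greedy selection of whole entries (newest first) standing in for narrative compression, which in practice might summarize rather than drop entries. Persona integration is not shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class JournalEntry:
    journal_type: str      # "Research", "Method", "Decision", or "Reflection"
    created_at: datetime
    significance: int      # hypothetical rating: 0 (routine) to 3 (pivotal)
    text: str


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(entries: list[JournalEntry], relevant_journals: set[str],
                     token_budget: int, min_significance: int = 1) -> str:
    # Journal selection and entry extraction: relevant journals, significant entries.
    candidates = [e for e in entries
                  if e.journal_type in relevant_journals and e.significance >= min_significance]
    # Prefer recent entries when the budget is tight.
    candidates.sort(key=lambda e: e.created_at, reverse=True)
    # "Compression": keep whole entries that still fit; skip ones that would overflow.
    selected, used = [], 0
    for entry in candidates:
        cost = approx_tokens(entry.text)
        if used + cost <= token_budget:
            selected.append(entry)
            used += cost
    # Restore chronological order so the narrative reads forward.
    selected.sort(key=lambda e: e.created_at)
    return "\n".join(f"[{e.journal_type}] {e.text}" for e in selected)
```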

Extension Architecture

Extension Points

The system provides several extension points for adding new capabilities:

Process Extensions: New analytical workflows via IAnalyticalProcess
Data Model Extensions: Domain-specific entities related to process executions
UI Component Extensions: Custom Blazor components for process interfaces
Result Renderer Extensions: Specialized visualization for process outputs

Extension Integration

Extensions integrate through:

Service registration in dependency injection container
Entity Framework migrations for data model changes
Blazor component registration for UI elements
Process registry for discovery and metadata

For detailed extension development, see EXTENSION-GUIDE.md.

Development Environment

Local Development Setup

Prerequisites: .NET 8 SDK, Docker Desktop, PostgreSQL client tools
Configuration: Local settings in appsettings.Development.json
Startup: Run via .NET Aspire for orchestrated services
Access Points:
- Web UI: https://localhost:5001
- API: https://localhost:5000
- Aspire Dashboard: https://localhost:15000

Testing Approach

Unit Tests: Domain logic and service methods
Integration Tests: API endpoints with test containers
E2E Tests: UI workflows with Playwright
Performance Tests: Vector search and embedding generation

Debugging Tools

Aspire Dashboard for distributed tracing
Structured logging with Serilog
PostgreSQL query analysis
Browser developer tools for Blazor

Security Patterns

Authentication & Authorization

ASP.NET Core Identity for user management
JSON Web Tokens (JWT) for API authentication
Process-based authorization (users see only their executions)
Scope-based data access control
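Process-based authorization can be enforced at the data-access layer so that no code path can read another user's executions. This Python sketch is illustrative: the store and its methods are hypothetical, and reporting someone else's execution as "not found" rather than "forbidden" is one possible design choice, not something the specification mandates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessExecution:
    id: str
    user_id: str
    journey_id: str


class ExecutionStore:
    """Every read takes the caller's user id; there is no unfiltered accessor."""

    def __init__(self, executions: list[ProcessExecution]) -> None:
        self._by_id = {e.id: e for e in executions}

    def list_for(self, user_id: str) -> list[ProcessExecution]:
        return [e for e in self._by_id.values() if e.user_id == user_id]

    def get_for(self, user_id: str, execution_id: str) -> ProcessExecution:
        execution = self._by_id.get(execution_id)
        # Another user's execution is reported exactly like a missing one,
        # so callers cannot probe which ids exist.
        if execution is None or execution.user_id != user_id:
            raise LookupError(f"execution {execution_id} not found")
        return execution
```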

Data Protection

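Encryption at rest for PostgreSQL data (community PostgreSQL has no built-in transparent data encryption, so this relies on volume or disk encryption, or column-level encryption with pgcrypto)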
TLS for all network communication
No sensitive data in logs
User journey isolation

Performance Optimization

Caching Strategy

Redis for frequently accessed metadata
In-memory cache for static data
Output caching for read-heavy endpoints
Embedding cache to avoid recomputation
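An embedding cache can be keyed by a hash of the exact input text together with the model identifier, as sketched below in Python. Including the model id is an assumption worth keeping: vectors produced by different embedding models are not comparable, so a model change must miss the cache. The class and its parameters are hypothetical, and the in-process dictionary would be replaced by a persistent store.

```python
import hashlib
from typing import Callable, Sequence


class EmbeddingCache:
    """Memoizes embeddings by (model id, exact text) so identical chunks are embedded once."""

    def __init__(self, embed: Callable[[str], Sequence[float]], model_id: str) -> None:
        self._embed = embed
        self._model_id = model_id
        self._store: dict[str, Sequence[float]] = {}
        self.misses = 0  # number of calls that reached the embedding model

    def _key(self, text: str) -> str:
        # The model id is part of the key: vectors from different models are not comparable.
        return hashlib.sha256(f"{self._model_id}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, text: str) -> Sequence[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]
```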

Scaling Patterns

Horizontal scaling for API instances
Read replicas for database queries
Background workers for embedding generation
CDN for static assets

Deployment Considerations

Container Strategy

Multi-stage Docker builds for optimization
.NET Aspire for local orchestration
Kubernetes manifests for production
Health checks for all services

Configuration Management

Environment-specific settings
Secret management via the hosting platform's secret store
Feature flags for gradual rollout
Telemetry configuration

Monitoring & Observability

Application Insights or OpenTelemetry
Structured logging with correlation
Performance counters
Custom metrics for process execution

API Design Principles

RESTful Conventions

Resource-based URLs
HTTP methods used according to their semantics (GET, POST, PUT, PATCH, DELETE)
Consistent response formats
HATEOAS where appropriate

Response Patterns

All API responses follow a consistent structure with success indicators, data payloads, and error information. Pagination is implemented for list endpoints.
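A minimal sketch of such an envelope and of page-based pagination, in Python for illustration. The field names (success, data, error, pagination, pageSize, totalItems, totalPages) and the 1-based page numbering are assumptions, not the documented wire format.

```python
import math
from typing import Any, Optional


def ok(data: Any, pagination: Optional[dict] = None) -> dict:
    body = {"success": True, "data": data, "error": None}
    if pagination is not None:
        body["pagination"] = pagination
    return body


def fail(code: str, message: str) -> dict:
    return {"success": False, "data": None, "error": {"code": code, "message": message}}


def paginate(items: list, page: int, page_size: int) -> dict:
    """Return one 1-based page of `items` wrapped in the standard envelope."""
    if page < 1 or page_size < 1:
        return fail("invalid_pagination", "page and pageSize must be at least 1")
    start = (page - 1) * page_size
    return ok(items[start:start + page_size], {
        "page": page,
        "pageSize": page_size,
        "totalItems": len(items),
        "totalPages": math.ceil(len(items) / page_size),
    })
```

Returning the totals alongside each page lets a client render page controls without a separate count request.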

Versioning Strategy

URL-based versioning (v1, v2)
Backward compatibility commitment
Deprecation notices in headers
Migration guides for breaking changes

Design Patterns

All implementations MUST follow the required patterns documented in DESIGN-PATTERNS.md.

Key patterns include:

Domain-Driven Design with aggregate boundaries
Repository and Specification patterns
Result pattern for operation outcomes
Process Context for execution state
Adapter pattern for Cognitive System
Unit of Work for transaction management
CQRS for command/query separation
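The Result pattern listed above can be illustrated with a small Python type in which each step either returns a value or carries an error forward, so a chain of operations stops at the first failure without throwing. This is a sketch under those assumptions; the names ok, fail, bind, parse_int, and require_positive are hypothetical, and the C# version in DESIGN-PATTERNS.md may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class Result(Generic[T]):
    """Operation outcome: a value or an error message, instead of a raised exception."""
    value: Optional[T] = None
    error: Optional[str] = None

    @property
    def is_success(self) -> bool:
        return self.error is None

    @staticmethod
    def ok(value: T) -> Result[T]:
        return Result(value=value)

    @staticmethod
    def fail(error: str) -> Result[T]:
        return Result(error=error)

    def bind(self, step: Callable[[T], Result[U]]) -> Result[U]:
        """Run the next step only if this one succeeded; otherwise pass the error on."""
        return step(self.value) if self.is_success else Result(error=self.error)


# Example steps (hypothetical).
def parse_int(s: str) -> Result[int]:
    try:
        return Result.ok(int(s))
    except ValueError:
        return Result.fail(f"not an integer: {s!r}")


def require_positive(n: int) -> Result[int]:
    return Result.ok(n) if n > 0 else Result.fail("must be positive")
```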

See DESIGN-PATTERNS.md for complete implementation details and code examples.