Veritheia Documentation

An environment for inquiry - complete documentation

View the Project on GitHub ZipThought/veritheia

Veritheia MVP Specification

1. Overview

This document specifies the Minimum Viable Product (MVP) features for Veritheia. Part A defines features for immediate implementation. Part B outlines post-MVP capabilities to inform architectural decisions. The MVP targets educational institutions and research groups requiring epistemic infrastructure for knowledge work.

Part A: MVP Features

I. Knowledge Database

The Knowledge Database provides persistent storage for documents and metadata. It does not generate content.

1.1 Ingestion & Storage

| ID | Feature | Description |
|---|---|---|
| 1.1.1 | Source Artifact Support | Ingest and store PDF and plain text (.txt) files |
| 1.1.2 | Artifact Storage | Manage user-defined local file system directory as immutable Raw Corpus |
| 1.1.3 | Database Schema | PostgreSQL schema for all processed data and metadata |

1.2 Processed Representation

| ID | Feature | Description |
|---|---|---|
| 1.2.1 | Metadata Extraction | User-initiated extraction of standard metadata (Title, Authors) from PDFs |
| 1.2.2 | Full-Text Indexing | Generate and store full-text search index for all content |
| 1.2.3 | Vector Embedding Storage | Store embeddings using PostgreSQL pgvector extension |
| 1.2.4 | Data Provenance | Version all processed data with model/version metadata |

1.3 Knowledge Layer API

| ID | Feature | Description |
|---|---|---|
| 1.3.1 | Artifact & Metadata API | CRUD endpoints for artifacts and metadata |
| 1.3.2 | Full-Text Search API | Keyword-based search endpoint |
| 1.3.3 | Semantic Search API | Vector similarity search endpoint (cosine distance) |
| 1.3.4 | Scoped Query API | Optional scope parameter for all search/retrieval endpoints |
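
The semantic search and scoped query features above can be illustrated with a small in-memory sketch. This is not the Veritheia implementation: the `Chunk` type, its field names, and the `semantic_search` function are hypothetical, and a real deployment would presumably delegate the distance computation to pgvector, which provides a cosine distance operator, rather than compute it in application code.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    artifact_id: str         # hypothetical identifier of the source artifact
    scope_id: str            # hypothetical identifier of the artifact's scope
    embedding: list[float]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def semantic_search(
    query: list[float],
    chunks: list[Chunk],
    scope_id: Optional[str] = None,
    limit: int = 5,
) -> list[Chunk]:
    """Rank chunks by cosine distance to the query, nearest first.

    When scope_id is given, only chunks in that scope are considered,
    mirroring the optional scope parameter of feature 1.3.4.
    """
    candidates = [c for c in chunks if scope_id is None or c.scope_id == scope_id]
    ranked = sorted(candidates, key=lambda c: cosine_distance(query, c.embedding))
    return ranked[:limit]
```

Keeping the scope filter optional means the same call path serves both scoped and corpus-wide queries.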

1.4 Knowledge Scoping

| ID | Feature | Description |
|---|---|---|
| 1.4.1 | Scope Management API | Create, update, delete, and list knowledge scopes |
| 1.4.2 | Scope Types | Project, Topic, Subject, and Custom scope types |
| 1.4.3 | Scope Hierarchy | Nested scopes with parent-child relationships |
| 1.4.4 | Artifact-Scope Association | Assign artifacts to scopes with bulk operation support |
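
One way to model the nested scopes described above is a flat list of records that each point to a parent. The sketch below is illustrative only: the `Scope` fields and the `descendants` helper are hypothetical names. It also assumes that a query scoped to a parent should include its nested scopes, which the specification does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    id: str
    name: str
    scope_type: str                   # "Project", "Topic", "Subject", or "Custom"
    parent_id: Optional[str] = None   # None marks a top-level scope


def descendants(scopes: list[Scope], root_id: str) -> set[str]:
    """Return root_id together with the ids of every scope nested beneath it."""
    children: dict[str, list[str]] = {}
    for scope in scopes:
        if scope.parent_id is not None:
            children.setdefault(scope.parent_id, []).append(scope.id)

    found: set[str] = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in found:          # guards against accidental cycles
            continue
        found.add(current)
        stack.extend(children.get(current, []))
    return found
```

A scoped search could then filter on membership in the returned set instead of on a single scope id.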

II. Process Engine

The orchestrating layer that executes all logic and workflows through a unified process architecture.

2.1 Process Architecture

| ID | Feature | Description |
|---|---|---|
| 2.1.1 | Process Interface | Common interface for all processes enabling uniform execution and monitoring |
| 2.1.2 | Process Registry | Service for process discovery and metadata retrieval |
| 2.1.3 | Process Execution | Runtime engine that executes processes with consistent error handling and result management |
| 2.1.4 | User-Triggered Execution | Process execution initiated by explicit user action |
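
The relationship between the process interface, the registry, and the execution runtime might look roughly like the following. All names here (`Process`, `ProcessResult`, `ProcessRegistry`, `WordCount`) are hypothetical and chosen for illustration; the specification does not define method signatures. The sketch assumes that "consistent error handling" means every run returns the same result envelope whether it succeeds or fails.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class Process(Protocol):
    """Common shape every process exposes (hypothetical)."""
    process_id: str
    description: str

    def execute(self, inputs: dict[str, Any]) -> Any: ...


@dataclass
class ProcessResult:
    process_id: str
    succeeded: bool
    output: Any = None
    error: Optional[str] = None


class ProcessRegistry:
    def __init__(self) -> None:
        self._processes: dict[str, Process] = {}

    def register(self, process: Process) -> None:
        self._processes[process.process_id] = process

    def describe(self) -> dict[str, str]:
        """Discovery: map each registered process id to its description."""
        return {pid: p.description for pid, p in self._processes.items()}

    def run(self, process_id: str, inputs: dict[str, Any]) -> ProcessResult:
        """Execute on explicit request and wrap the outcome uniformly."""
        process = self._processes.get(process_id)
        if process is None:
            return ProcessResult(process_id, False, error="unknown process")
        try:
            return ProcessResult(process_id, True, output=process.execute(inputs))
        except Exception as exc:      # any failure becomes a failed result
            return ProcessResult(process_id, False, error=str(exc))


class WordCount:
    """Toy process used only to exercise the registry."""
    process_id = "word-count"
    description = "Counts words in the supplied text"

    def execute(self, inputs: dict[str, Any]) -> int:
        return len(inputs["text"].split())
```

Nothing in the registry runs on its own; `run` is called only in response to a user action, in line with feature 2.1.4.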

2.2 Platform Services

| ID | Feature | Description |
|---|---|---|
| 2.2.1 | Document Ingestion | User-initiated pipeline for file processing: PDF/text extraction, chunking, embedding generation, indexing |
| 2.2.2 | Text Extraction Service | Extract and clean text from PDFs and text files |
| 2.2.3 | Embedding Generation | Generate vector embeddings for text chunks using configured Cognitive System |
| 2.2.4 | Metadata Extraction | Extract title, authors, and other metadata from documents |
| 2.2.5 | Document Chunking | Split documents into semantic chunks for processing |
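
The specification does not say how semantic chunks are determined. As a simple stand-in, the sketch below treats blank-line-separated paragraphs as the unit of meaning and packs them into chunks up to a size limit. The function name `chunk_paragraphs` and the `max_chars` parameter are hypothetical, and paragraph packing is an assumption rather than the project's chunking method.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A paragraph is never split: one that exceeds max_chars on its own
    becomes a single oversized chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate       # still fits, or first paragraph of a chunk
        else:
            chunks.append(current)    # close the full chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be passed to embedding generation (feature 2.2.3) and indexed.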

2.3 Reference Processes

2.3.1 Systematic Screening Process

| ID | Feature | Description |
|---|---|---|
| 2.3.1.1 | Author’s Research Questions | User composes their research questions, which become part of their journey’s conceptual framework |
| 2.3.1.2 | Personal Definitions | User defines key terms from their perspective, shaping how the system interprets documents |
| 2.3.1.3 | AI Relevance Assessment | AI acts as a librarian, assessing relevance to the user’s research questions as a binary judgment (true/false) plus a score (0-1), with a rationale |
| 2.3.1.4 | AI Contribution Assessment | AI acts as a peer reviewer, assessing whether the document directly answers the user’s research questions (a higher bar than relevance) |
| 2.3.1.5 | Dual Rationale Presentation | AI provides distinct rationales: librarian perspective for relevance, peer reviewer perspective for contribution |
| 2.3.1.6 | Interactive Results Table | Sortable/filterable table showing title, authors, relevance score/rationale, contribution score/rationale |
| 2.3.1.7 | User-Driven Corpus Triage | User interprets AI assessments to identify core papers (high contribution), contextual papers (relevant only), and papers to set aside |
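
A single row of the results table could be represented as a record like the one below. The `Assessment` class, its field names, and `sort_for_review` are hypothetical. The sketch assumes contribution is scored on the same 0-1 range as relevance, which the specification implies but does not state. It only validates and sorts: consistent with user-driven triage, it does not classify papers on the user's behalf.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    title: str
    authors: list[str]
    relevant: bool                  # librarian's binary judgment
    relevance_score: float          # 0-1
    relevance_rationale: str
    contribution_score: float       # 0-1 (assumed range)
    contribution_rationale: str

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-1 range.
        for name in ("relevance_score", "contribution_score"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def sort_for_review(rows: list[Assessment], by: str = "contribution_score") -> list[Assessment]:
    """Return rows ordered by the chosen score, highest first."""
    return sorted(rows, key=lambda row: getattr(row, by), reverse=True)
```

The user reads both rationales in the sorted table and decides which papers are core, contextual, or set aside.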

2.3.2 Constrained Composition Process

| ID | Feature | Description |
|---|---|---|
| 2.3.2.1 | Source Material Selection | Choose from uploaded corpus materials and specify sections/chapters |
| 2.3.2.2 | Task Type Selection | Predefined pedagogical task types (e.g., descriptive writing, analysis, reflection) |
| 2.3.2.3 | Learning Objective Input | Clear statement of what students should demonstrate |
| 2.3.2.4 | Content Constraints | Boundaries for generated prompts (topic limits, tone, age-appropriateness) |
| 2.3.2.5 | Answer Constraints | Rules for valid responses (word count, required elements, vocabulary level) |
| 2.3.2.6 | Prompt Generation | AI-generated writing prompt based on source material and constraints |
| 2.3.2.7 | Rubric Generation | Point-based grading rubric aligned with learning objectives |
| 2.3.2.8 | Assignment Management | Save, edit, and distribute assignments to students |
| 2.3.2.9 | Student Submission | Text input interface for student responses |
| 2.3.2.10 | AI Formative Assessment | Real-time evaluation against teacher-approved rubric, measuring performance against criteria |
| 2.3.2.11 | Teacher Sovereignty | Teachers review all AI assessments, can override grades, and maintain complete pedagogical control |
| 2.3.2.12 | Formation Analytics | Dashboard reveals patterns in student understanding, informing teacher’s next instructional moves |
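
The interplay between a point-based rubric, AI formative assessment, and teacher override can be sketched as follows. `CriterionScore` and `total` are hypothetical names, and the rule that a teacher's score always replaces the AI's score is an assumption about how "Teacher Sovereignty" would be realized.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriterionScore:
    criterion: str
    max_points: int
    ai_points: int
    teacher_points: Optional[int] = None   # set only when the teacher overrides

    @property
    def final_points(self) -> int:
        """The teacher's score wins whenever one has been recorded."""
        if self.teacher_points is not None:
            return self.teacher_points
        return self.ai_points


def total(scores: list[CriterionScore]) -> tuple[int, int]:
    """Return (points awarded, points possible) across all criteria."""
    awarded = sum(score.final_points for score in scores)
    possible = sum(score.max_points for score in scores)
    return awarded, possible
```

Storing the AI score and the override separately keeps both visible for review instead of overwriting one with the other.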

2.4 Process Categories

Additional processes can extend these patterns:

| Category | Description | Pattern |
|---|---|---|
| Methodological Processes | Structure inquiry through established methodologies | Analytical |
| Developmental Processes | Present progressive challenges for skill development | Compositional |
| Analytical Processes | Structure systematic examination for pattern discovery | Analytical |
| Compositional Processes | Structure creation exercises for expressive development | Compositional |
| Reflective Processes | Structure contemplation exercises for deeper understanding | Mixed |

See EXTENSION-GUIDE.md for detailed implementation patterns.

III. Presentation (Desktop Web Client)

The primary user interface for the MVP, delivered as a self-contained desktop application.

3.1 Library Management

| ID | Feature | Description |
|---|---|---|
| 3.1.1 | Artifact Upload | UI for uploading PDF and .txt files |
| 3.1.2 | Library View | Browsable list/table with metadata, scope filters, and scope indicators |
| 3.1.3 | Artifact Detail View | Full metadata display with provenance info and scope management |
| 3.1.4 | Artifact Deletion | Remove artifact and all associated processed data |

3.2 Process Execution Interface

| ID | Feature | Description |
|---|---|---|
| 3.2.1 | Process Selection | List available analytical processes with descriptions |
| 3.2.2 | Dynamic Input Forms | Render process-specific input forms based on process definition |
| 3.2.3 | Execution Monitoring | Display process progress and status during execution |
| 3.2.4 | Result Display | Process-specific result rendering (tables, visualizations, reports) |

3.3 Search & Discovery

| ID | Feature | Description |
|---|---|---|
| 3.3.1 | Search Interface | Unified search bar for keyword/semantic search with scope selector |
| 3.3.2 | Document Viewer | Integrated PDF/text viewer for artifact content |
| 3.3.3 | Artifact Navigation | Browse between search results and artifact details |

3.4 Scope Management

| ID | Feature | Description |
|---|---|---|
| 3.4.1 | Scope Manager | Tree view UI for scope hierarchy with CRUD operations and statistics |
| 3.4.2 | Bulk Assignment | Multi-select artifacts for bulk scope assignment |
| 3.4.3 | Scope Navigation | “Enter” a scope to constrain all operations to it |

3.5 User Contexts

| ID | Feature | Description |
|---|---|---|
| 3.5.1 | Process-Based Context | User interface adapts based on active process (researcher view vs educator view) |
| 3.5.2 | Assignment Access | Students see only assigned tasks, educators see creation and review interfaces |
| 3.5.3 | Result Visibility | Process determines what results users can access (own work vs class overview) |

IV. User & Journey Model

The system for managing users and their intellectual journeys.

4.1 User Management

| ID | Feature | Description |
|---|---|---|
| 4.1.1 | User Registration | Basic user account creation with identity verification |
| 4.1.2 | Authentication | Secure login with session management |
| 4.1.3 | User Profile | Minimal profile for identification within journeys |
| 4.1.4 | Process Access | Configure which processes users can access |

4.2 Journey Management

| ID | Feature | Description |
|---|---|---|
| 4.2.1 | Journey Creation | Initiate new journey with selected process |
| 4.2.2 | Journey State | Track progress within process workflow |
| 4.2.3 | Journey Context | Maintain process-specific working memory |
| 4.2.4 | Journey List | View and resume active journeys |

4.3 Journal System with Edge-Linking and Long-Memory

| ID | Feature | Description |
|---|---|---|
| 4.3.1 | Structured Journaling | Framework-based journaling with templates for different intellectual activities |
| 4.3.2 | Journal Types | Research, Method, Decision, and Reflection journals |
| 4.3.3 | Entry Recording | Structured narrative entries at key process points |
| 4.3.4 | Edge-Linking | Connect ideas across journal entries and documents through semantic relationships |
| 4.3.5 | Long-Memory Timelines | Track intellectual development over extended periods with temporal navigation |
| 4.3.6 | Recursive Synthesis | Iteratively deepen understanding through revisiting and refining previous entries |
| 4.3.7 | Context Assembly | Extract relevant entries for process context |

4.4 Persona Development

| ID | Feature | Description |
|---|---|---|
| 4.4.1 | Vocabulary Tracking | Build user’s conceptual vocabulary from journal entries |
| 4.4.2 | Pattern Recognition | Identify user’s inquiry patterns across journeys |
| 4.4.3 | Context Personalization | Adapt process interactions to user’s style |

V. Cognitive System

The pluggable, LLM-based reasoning component.

5.1 Core Components

| ID | Feature | Description |
|---|---|---|
| 5.1.1 | Adapter Interface | ICognitiveAdapter with CreateEmbeddingsAsync and GenerateTextAsync |
| 5.1.2 | Local Mode | Default implementation using local inference (e.g., Ollama) |
| 5.1.3 | External API Mode | Optional implementation for cloud APIs (e.g., OpenAI) |
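
The pluggable adapter idea can be sketched in Python. The specification names only the interface (ICognitiveAdapter) and its two methods; the parameter and return types below are assumptions, and `CognitiveAdapter`, `EchoAdapter`, and `embed_chunks` are hypothetical Python stand-ins rather than the project's actual types. `EchoAdapter` performs no inference and exists only so the sketch runs without a model.

```python
import asyncio
from typing import Protocol


class CognitiveAdapter(Protocol):
    """Python analogue of the adapter interface; signatures are assumed."""

    async def create_embeddings(self, text: str) -> list[float]: ...

    async def generate_text(self, prompt: str) -> str: ...


class EchoAdapter:
    """Deterministic stand-in for a local or external implementation."""

    async def create_embeddings(self, text: str) -> list[float]:
        # Not a real embedding: just the text length and its space count.
        return [float(len(text)), float(text.count(" "))]

    async def generate_text(self, prompt: str) -> str:
        return f"echo: {prompt}"


async def embed_chunks(adapter: CognitiveAdapter, chunks: list[str]) -> list[list[float]]:
    """Embed each chunk through whichever adapter is configured."""
    return [await adapter.create_embeddings(chunk) for chunk in chunks]


if __name__ == "__main__":
    print(asyncio.run(embed_chunks(EchoAdapter(), ["a b", "xyz"])))
```

Because `embed_chunks` depends only on the protocol, a local or external implementation could be swapped in without changing the calling code.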

5.2 Context Management

| ID | Feature | Description |
|---|---|---|
| 5.2.1 | Context Window Handling | Manage content within available context size |
| 5.2.2 | Context Priority | Essential elements prioritized for smaller windows |
| 5.2.3 | Narrative Coherence | Maintain story flow in compressed contexts |
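
One possible reading of these three features is a budgeted selection: pick the most essential items first, then emit the survivors in their original order so the narrative still reads in sequence. The sketch below makes several assumptions: `ContextItem`, `estimate_tokens`, and `assemble_context` are hypothetical names, a lower priority number means more essential, and token counts are approximated by splitting on whitespace rather than by a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    label: str
    text: str
    priority: int    # lower number = more essential (assumption)


def estimate_tokens(text: str) -> int:
    """Crude whitespace word count standing in for a model tokenizer."""
    return len(text.split())


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Select items within the token budget, most essential first.

    Selected items are returned in their original order, not in
    priority order, to preserve the flow of the source narrative.
    """
    by_priority = sorted(enumerate(items), key=lambda pair: pair[1].priority)
    kept_positions: list[int] = []
    used = 0
    for position, item in by_priority:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            kept_positions.append(position)
            used += cost
    return [items[position] for position in sorted(kept_positions)]
```

With a smaller window, lower-priority material drops out while the remaining pieces keep their relative order.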

VI. Deployment & Administration

Functionality required to deploy and manage the infrastructure locally.

6.1 Installation

| ID | Feature | Description |
|---|---|---|
| 6.1.1 | Desktop Installer | Single package for Windows/macOS/Linux with all components |

6.2 Configuration

| ID | Feature | Description |
|---|---|---|
| 6.2.1 | Settings UI | Dedicated settings panel in desktop app |
| 6.2.2 | Database Config | Set file system path for Raw Corpus |
| 6.2.3 | Cognitive Config | Toggle Local/External mode, model selection, API credentials |
| 6.2.4 | Scope Config | Default assignment rules, mandatory scope option |

Part B: Post-MVP Roadmap (Conceptual Overview)

These features guide architectural decisions but are not part of the initial release.

I. Collaborative Journeys

Allow multiple users to participate in shared intellectual endeavors.

Classroom Journeys

Research Group Journeys

II. Journey Templates

Pre-structured journeys for common use cases.

Curriculum Templates

Methodology Templates

III. Journal Sharing

Transform private journals into community resources.

Shareable Journal Types

Journal Libraries

IV. Advanced Analytics

Deeper insights through extended context and pattern recognition.

Cross-Journey Analysis

Formation Metrics

V. Institutional Features

Enterprise capabilities for educational institutions.

Multi-Tenant Architecture

Compliance & Governance

VI. Advanced Process Types

Sophisticated analytical capabilities.

Meta-Analytical Processes

Longitudinal Processes

VII. Integration Ecosystem

Connect with external systems and workflows.

Learning Management Systems

Research Infrastructure

VIII. Mobile & Cloud

Extend beyond desktop deployment.

Mobile Companions

Cloud Deployment

The architecture ensures these future capabilities can be added without fundamental restructuring.