Veritheia MVP Specification

1. Overview

This document specifies the Minimum Viable Product (MVP) features for Veritheia. Part A defines features for immediate implementation. Part B outlines post-MVP capabilities to inform architectural decisions. The MVP targets educational institutions and research groups requiring epistemic infrastructure for knowledge work.

Part A: MVP Features

I. Knowledge Database

The Knowledge Database provides persistent storage for documents and their metadata. It stores and indexes content but does not generate it.

1.1 Ingestion & Storage

ID	Feature	Description
1.1.1	Source Artifact Support	Ingest and store PDF and plain text (.txt) files
1.1.2	Artifact Storage	Manage a user-defined local file system directory as the immutable Raw Corpus
1.1.3	Database Schema	PostgreSQL schema for all processed data and metadata

1.2 Processed Representation

ID	Feature	Description
1.2.1	Metadata Extraction	User-initiated extraction of standard metadata (Title, Authors) from PDFs
1.2.2	Full-Text Indexing	Generate and store full-text search index for all content
1.2.3	Vector Embedding Storage	Store embeddings using PostgreSQL pgvector extension
1.2.4	Data Provenance	Version all processed data with model/version metadata
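
As an illustration of feature 1.2.4, the following is a minimal sketch of a provenance record attached to each piece of processed data. The field names, the `kind` values, and the `is_stale` helper are assumptions made for this example and are not defined by this specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProcessedRecord:
    """Provenance for one piece of processed data (hypothetical shape)."""
    artifact_id: str
    kind: str            # e.g. "embedding", "metadata", "fulltext" (assumed values)
    model_name: str      # model or tool that produced the data
    model_version: str
    created_at: datetime


def is_stale(record: ProcessedRecord, current_model: str, current_version: str) -> bool:
    """True when the record was produced by a different model or version."""
    return (record.model_name, record.model_version) != (current_model, current_version)


record = ProcessedRecord(
    artifact_id="artifact-1",
    kind="embedding",
    model_name="embed-small",   # hypothetical model name
    model_version="1",
    created_at=datetime.now(timezone.utc),
)
```

Recording the model and version alongside each record is what would let the system identify data that should be regenerated after a model change.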

1.3 Knowledge Layer API

ID	Feature	Description
1.3.1	Artifact & Metadata API	CRUD endpoints for artifacts and metadata
1.3.2	Full-Text Search API	Keyword-based search endpoint
1.3.3	Semantic Search API	Vector similarity search endpoint (cosine distance)
1.3.4	Scoped Query API	Optional scope parameter for all search/retrieval endpoints
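
To make features 1.3.3 and 1.3.4 concrete, here is a small in-memory sketch of cosine-distance ranking with an optional scope filter. It is not the endpoint implementation: in the MVP the comparison would presumably run inside PostgreSQL, where pgvector exposes cosine distance through its `<=>` operator. The dictionary keys and function names below are assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def semantic_search(query, chunks, scope_ids=None, limit=5):
    """Rank chunks by distance to the query; scope_ids=None means unscoped."""
    candidates = [
        c for c in chunks
        if scope_ids is None or c["scope_id"] in scope_ids
    ]
    ranked = sorted(candidates, key=lambda c: cosine_distance(query, c["embedding"]))
    return ranked[:limit]


# Hypothetical sample data: two scopes, two-dimensional embeddings.
chunks = [
    {"id": "a", "scope_id": "s1", "embedding": [1.0, 0.0]},
    {"id": "b", "scope_id": "s2", "embedding": [0.0, 1.0]},
    {"id": "c", "scope_id": "s1", "embedding": [1.0, 1.0]},
]
```

Calling `semantic_search([1.0, 0.0], chunks)` searches the whole corpus, while passing `scope_ids={"s2"}` restricts the same query to one scope.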

1.4 Knowledge Scoping

ID	Feature	Description
1.4.1	Scope Management API	Create, update, delete, and list knowledge scopes
1.4.2	Scope Types	Project, Topic, Subject, and Custom scope types
1.4.3	Scope Hierarchy	Nested scopes with parent-child relationships
1.4.4	Artifact-Scope Association	Assign artifacts to scopes with bulk operation support
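
The scope hierarchy in 1.4.3 can be sketched as a parent-pointer tree. The example below assumes that a query scoped to a parent also covers its nested scopes; the specification does not state this, so treat it as one possible reading. The `Scope` fields and the `descendants` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    id: str
    name: str
    scope_type: str                  # "Project", "Topic", "Subject", or "Custom"
    parent_id: Optional[str] = None  # None marks a top-level scope


def descendants(scopes, root_id):
    """Return the ids of root_id and every scope nested beneath it."""
    children = {}
    for scope in scopes:
        children.setdefault(scope.parent_id, []).append(scope.id)
    found, stack = set(), [root_id]
    while stack:
        current = stack.pop()
        if current in found:
            continue  # guards against accidental cycles in the data
        found.add(current)
        stack.extend(children.get(current, []))
    return found


scopes = [
    Scope("p", "Thesis", "Project"),
    Scope("t", "Methods", "Topic", parent_id="p"),
    Scope("s", "Sampling", "Subject", parent_id="t"),
    Scope("c", "Reading list", "Custom"),
]
```

The resulting id set could then be passed as the scope filter of a search or retrieval call.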

II. Process Engine

The orchestrating layer that executes all logic and workflows through a unified process architecture.

2.1 Process Architecture

ID	Feature	Description
2.1.1	Process Interface	Common interface for all processes enabling uniform execution and monitoring
2.1.2	Process Registry	Service for process discovery and metadata retrieval
2.1.3	Process Execution	Runtime engine that executes processes with consistent error handling and result management
2.1.4	User-Triggered Execution	Process execution initiated by explicit user action
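
The process architecture above could be sketched as follows: a common interface, a registry for discovery, and an execution path that returns a uniform result whether the process succeeds or fails. All class, method, and field names here are assumptions for illustration, and `WordCount` is a toy process that exists only to exercise the registry.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol


class Process(Protocol):
    """Common interface every process is assumed to satisfy."""
    name: str
    description: str

    def run(self, inputs: Dict[str, Any]) -> Any: ...


@dataclass
class ProcessResult:
    process_name: str
    succeeded: bool
    output: Any = None
    error: Optional[str] = None


class ProcessRegistry:
    def __init__(self) -> None:
        self._processes: Dict[str, Process] = {}

    def register(self, process: Process) -> None:
        self._processes[process.name] = process

    def describe(self) -> Dict[str, str]:
        """Discovery: map each registered process name to its description."""
        return {name: p.description for name, p in self._processes.items()}

    def execute(self, name: str, inputs: Dict[str, Any]) -> ProcessResult:
        """Run a process on explicit request, wrapping any failure in a result."""
        process = self._processes.get(name)
        if process is None:
            return ProcessResult(name, False, error=f"unknown process: {name}")
        try:
            return ProcessResult(name, True, output=process.run(inputs))
        except Exception as exc:  # uniform error handling for every process
            return ProcessResult(name, False, error=str(exc))


class WordCount:
    name = "word-count"
    description = "Counts whitespace-separated words in the 'text' input."

    def run(self, inputs: Dict[str, Any]) -> int:
        return len(inputs["text"].split())
```

Because `execute` is only ever called directly, the sketch also reflects 2.1.4: nothing runs unless a user action triggers it.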

2.2 Platform Services

ID	Feature	Description
2.2.1	Document Ingestion	User-initiated pipeline for file processing: PDF/text extraction, chunking, embedding generation, indexing
2.2.2	Text Extraction Service	Extract and clean text from PDFs and text files
2.2.3	Embedding Generation	Generate vector embeddings for text chunks using configured Cognitive System
2.2.4	Metadata Extraction	Extract title, authors, and other metadata from documents
2.2.5	Document Chunking	Split documents into semantic chunks for processing
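
For the chunking step in 2.2.5, the sketch below groups paragraphs into chunks under a size limit. Paragraph boundaries are used here as a simple stand-in for semantic boundaries; the actual chunking strategy is not specified in this document, and the `max_chars` parameter is an assumption.

```python
from __future__ import annotations


def chunk_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be passed to embedding generation (2.2.3) and indexed.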

2.3 Reference Processes

2.3.1 Systematic Screening Process

ID	Feature	Description
2.3.1.1	Author’s Research Questions	User composes their research questions, which become part of their journey’s conceptual framework
2.3.1.2	Personal Definitions	User defines key terms from their perspective, shaping how the system interprets documents
2.3.1.3	AI Relevance Assessment	AI acts as librarian, assessing each document's relevance to the user’s RQs with a binary (true/false) judgment, a score from 0 to 1, and a rationale
2.3.1.4	AI Contribution Assessment	AI acts as peer reviewer, assessing whether a document directly answers the user’s RQs (a higher bar than relevance)
2.3.1.5	Dual Rationale Presentation	AI provides distinct rationales: librarian perspective for relevance, peer reviewer for contribution
2.3.1.6	Interactive Results Table	Sortable/filterable table showing title, authors, relevance score/rationale, contribution score/rationale
2.3.1.7	User-Driven Corpus Triage	User interprets AI assessments to identify: core papers (high contribution), contextual papers (relevant only), papers to set aside
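
One possible shape for a row of the results table in 2.3.1.6 is sketched below. It assumes the contribution assessment carries the same binary judgment, score, and rationale as the relevance assessment, which the feature list implies but does not state. The helpers only sort and filter; deciding which papers are core, contextual, or set aside remains with the user, as 2.3.1.7 requires. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Assessment:
    is_met: bool     # binary judgment
    score: float     # 0 to 1
    rationale: str

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be between 0 and 1")


@dataclass(frozen=True)
class ScreeningRow:
    title: str
    authors: Tuple[str, ...]
    relevance: Assessment      # librarian perspective
    contribution: Assessment   # peer reviewer perspective


def sort_rows(rows, by="contribution", descending=True):
    """Sort rows by the score of the named assessment."""
    return sorted(rows, key=lambda r: getattr(r, by).score, reverse=descending)


def filter_rows(rows, min_relevance=0.0, min_contribution=0.0):
    """Keep rows whose scores meet both user-chosen thresholds."""
    return [
        r for r in rows
        if r.relevance.score >= min_relevance
        and r.contribution.score >= min_contribution
    ]
```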

2.3.2 Constrained Composition Process

ID	Feature	Description
2.3.2.1	Source Material Selection	Choose from uploaded corpus materials and specify sections/chapters
2.3.2.2	Task Type Selection	Predefined pedagogical task types (e.g., descriptive writing, analysis, reflection)
2.3.2.3	Learning Objective Input	Clear statement of what students should demonstrate
2.3.2.4	Content Constraints	Boundaries for generated prompts (topic limits, tone, age-appropriateness)
2.3.2.5	Answer Constraints	Rules for valid responses (word count, required elements, vocabulary level)
2.3.2.6	Prompt Generation	AI-generated writing prompt based on source material and constraints
2.3.2.7	Rubric Generation	Point-based grading rubric aligned with learning objectives
2.3.2.8	Assignment Management	Save, edit, and distribute assignments to students
2.3.2.9	Student Submission	Text input interface for student responses
2.3.2.10	AI Formative Assessment	Real-time evaluation against teacher-approved rubric, measuring performance against criteria
2.3.2.11	Teacher Sovereignty	Teachers review all AI assessments, can override grades, and maintain complete pedagogical control
2.3.2.12	Formation Analytics	Dashboard reveals patterns in student understanding, informing teacher’s next instructional moves
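
The relationship between AI formative assessment (2.3.2.10) and teacher override (2.3.2.11) can be illustrated with a point-based rubric in which a teacher-entered score, when present, always takes precedence. This is a sketch under assumed names; the specification does not define the rubric data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Criterion:
    name: str
    max_points: int


@dataclass
class CriterionScore:
    criterion: str
    ai_points: int
    teacher_points: Optional[int] = None  # set only when the teacher overrides

    @property
    def final_points(self) -> int:
        """The teacher's score wins whenever one has been entered."""
        return self.ai_points if self.teacher_points is None else self.teacher_points


def total_score(rubric, scores):
    """Return (points awarded, points possible), validating each score."""
    maxima = {c.name: c.max_points for c in rubric}
    awarded = 0
    for score in scores:
        if not 0 <= score.final_points <= maxima[score.criterion]:
            raise ValueError(f"score out of range for {score.criterion}")
        awarded += score.final_points
    return awarded, sum(maxima.values())
```

For example, with a 4-point "Thesis" criterion scored 3 by the AI and a 6-point "Evidence" criterion scored 4 by the AI but overridden to 5 by the teacher, `total_score` returns `(8, 10)`.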

2.4 Process Categories

Additional processes can extend these patterns:

Category	Description	Pattern
Methodological Processes	Structure inquiry through established methodologies	Analytical
Developmental Processes	Present progressive challenges for skill development	Compositional
Analytical Processes	Structure systematic examination for pattern discovery	Analytical
Compositional Processes	Structure creation exercises for expressive development	Compositional
Reflective Processes	Structure contemplation exercises for deeper understanding	Mixed

See EXTENSION-GUIDE.md for detailed implementation patterns.

III. Presentation (Desktop Web Client)

The primary user interface for the MVP, delivered as a self-contained desktop application.

3.1 Library Management

ID	Feature	Description
3.1.1	Artifact Upload	UI for uploading PDF and .txt files
3.1.2	Library View	Browsable list/table with metadata, scope filters, and scope indicators
3.1.3	Artifact Detail View	Full metadata display with provenance info and scope management
3.1.4	Artifact Deletion	Remove artifact and all associated processed data

3.2 Process Execution Interface

ID	Feature	Description
3.2.1	Process Selection	List available processes with descriptions
3.2.2	Dynamic Input Forms	Render process-specific input forms based on process definition
3.2.3	Execution Monitoring	Display process progress and status during execution
3.2.4	Result Display	Process-specific result rendering (tables, visualizations, reports)

3.3 Search & Discovery

ID	Feature	Description
3.3.1	Search Interface	Unified search bar for keyword/semantic search with scope selector
3.3.2	Document Viewer	Integrated PDF/text viewer for artifact content
3.3.3	Artifact Navigation	Browse between search results and artifact details

3.4 Scope Management

ID	Feature	Description
3.4.1	Scope Manager	Tree view UI for scope hierarchy with CRUD operations and statistics
3.4.2	Bulk Assignment	Multi-select artifacts for bulk scope assignment
3.4.3	Scope Navigation	“Enter” scope to constrain all operations

3.5 User Contexts

ID	Feature	Description
3.5.1	Process-Based Context	User interface adapts based on active process (researcher view vs educator view)
3.5.2	Assignment Access	Students see only assigned tasks, educators see creation and review interfaces
3.5.3	Result Visibility	Process determines what results users can access (own work vs class overview)

IV. User & Journey Model

The system for managing users and their intellectual journeys.

4.1 User Management

ID	Feature	Description
4.1.1	User Registration	Basic user account creation with identity verification
4.1.2	Authentication	Secure login with session management
4.1.3	User Profile	Minimal profile for identification within journeys
4.1.4	Process Access	Configure which processes users can access

4.2 Journey Management

ID	Feature	Description
4.2.1	Journey Creation	Initiate new journey with selected process
4.2.2	Journey State	Track progress within process workflow
4.2.3	Journey Context	Maintain process-specific working memory
4.2.4	Journey List	View and resume active journeys

4.3 Journal System with Edge-Linking and Long-Memory

ID	Feature	Description
4.3.1	Structured Journaling	Framework-based journaling with templates for different intellectual activities
4.3.2	Journal Types	Research, Method, Decision, and Reflection journals
4.3.3	Entry Recording	Structured narrative entries at key process points
4.3.4	Edge-Linking	Connect ideas across journal entries and documents through semantic relationships
4.3.5	Long-Memory Timelines	Track intellectual development over extended periods with temporal navigation
4.3.6	Recursive Synthesis	Iteratively deepen understanding through revisiting and refining previous entries
4.3.7	Context Assembly	Extract relevant entries for process context
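
Edge-linking (4.3.4) and context assembly (4.3.7) could be modeled as a small labeled graph over journal entries and documents. The sketch below is one hypothetical representation; the relation labels, identifiers, and method names are assumptions and are not part of this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str     # id of a journal entry or document
    target: str
    relation: str   # e.g. "refines", "cites", "contradicts" (assumed labels)


class JournalGraph:
    def __init__(self):
        self._edges = []

    def link(self, source, target, relation):
        self._edges.append(Edge(source, target, relation))

    def neighbors(self, node_id):
        """Return (other_id, relation) pairs for edges touching node_id."""
        linked = []
        for edge in self._edges:
            if edge.source == node_id:
                linked.append((edge.target, edge.relation))
            elif edge.target == node_id:
                linked.append((edge.source, edge.relation))
        return linked

    def entries_for_context(self, node_id, relations=None):
        """Collect ids linked to node_id, optionally limited to some relations."""
        return [
            other for other, relation in self.neighbors(node_id)
            if relations is None or relation in relations
        ]
```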

4.4 Persona Development

ID	Feature	Description
4.4.1	Vocabulary Tracking	Build user’s conceptual vocabulary from journal entries
4.4.2	Pattern Recognition	Identify user’s inquiry patterns across journeys
4.4.3	Context Personalization	Adapt process interactions to user’s style

V. Cognitive System

The pluggable, LLM-based reasoning component.

5.1 Core Components

ID	Feature	Description
5.1.1	Adapter Interface	ICognitiveAdapter with CreateEmbeddingsAsync and GenerateTextAsync
5.1.2	Local Mode	Default implementation using local inference (e.g., Ollama)
5.1.3	External API Mode	Optional implementation for cloud APIs (e.g., OpenAI)
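
The pluggable design in 5.1 can be illustrated with a structural interface and a mode lookup. The specification names `ICognitiveAdapter`, `CreateEmbeddingsAsync`, and `GenerateTextAsync` but gives no signatures, so the Python rendering below, including its snake_case method names, parameter types, and the `EchoAdapter` stand-in, is an assumption for illustration only. No real inference backend is called.

```python
import asyncio
from typing import Dict, List, Protocol


class CognitiveAdapter(Protocol):
    """Python rendering of the ICognitiveAdapter idea (signatures assumed)."""

    async def create_embeddings(self, texts: List[str]) -> List[List[float]]: ...

    async def generate_text(self, prompt: str) -> str: ...


class EchoAdapter:
    """Deterministic stand-in used here in place of a local or cloud model."""

    async def create_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Toy two-dimensional "embedding": text length and number of spaces.
        return [[float(len(t)), float(t.count(" "))] for t in texts]

    async def generate_text(self, prompt: str) -> str:
        return f"echo: {prompt}"


def select_adapter(mode: str, adapters: Dict[str, CognitiveAdapter]) -> CognitiveAdapter:
    """Pick the adapter for the configured mode, e.g. "local" or "external"."""
    try:
        return adapters[mode]
    except KeyError:
        raise ValueError(f"no adapter configured for mode: {mode}") from None


adapter = select_adapter("local", {"local": EchoAdapter()})
embeddings = asyncio.run(adapter.create_embeddings(["a b", "c"]))
```

Because callers depend only on the interface, switching between Local Mode and External API Mode would be a configuration change and not a code change.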

5.2 Context Management

ID	Feature	Description
5.2.1	Context Window Handling	Manage content within available context size
5.2.2	Context Priority	Essential elements prioritized for smaller windows
5.2.3	Narrative Coherence	Maintain story flow in compressed contexts
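
A minimal sketch of how 5.2.1 through 5.2.3 might interact: elements are admitted in priority order until the budget is spent, then restored to their original order so the assembled context still reads in sequence. The word-count token estimate, the priority convention (lower is more essential), and all names are assumptions; a real implementation would use the tokenizer of the configured model.

```python
from dataclasses import dataclass


@dataclass
class ContextElement:
    label: str
    text: str
    priority: int  # lower value means more essential (assumed convention)


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on whitespace-separated words."""
    return max(1, len(text.split()))


def assemble_context(elements, budget: int):
    """Select elements by priority within the budget, keeping original order."""
    chosen = []
    used = 0
    for element in sorted(elements, key=lambda e: e.priority):
        cost = estimate_tokens(element.text)
        if used + cost <= budget:
            chosen.append(element)
            used += cost
    # Restore the original sequence so the narrative flow is preserved.
    position = {id(e): i for i, e in enumerate(elements)}
    chosen.sort(key=lambda e: position[id(e)])
    return chosen


elements = [
    ContextElement("journal", "one two three four", priority=2),
    ContextElement("questions", "a b", priority=0),
    ContextElement("definitions", "c d e", priority=1),
]
```

With a budget of 5 estimated tokens, only the research questions and definitions fit; with a budget of 9, all three elements are included in their original order.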

VI. Deployment & Administration

Functionality required to deploy and manage the infrastructure locally.

6.1 Installation

ID	Feature	Description
6.1.1	Desktop Installer	Single package for Windows/macOS/Linux with all components

6.2 Configuration

ID	Feature	Description
6.2.1	Settings UI	Dedicated settings panel in desktop app
6.2.2	Database Config	Set file system path for Raw Corpus
6.2.3	Cognitive Config	Toggle Local/External mode, model selection, API credentials
6.2.4	Scope Config	Default assignment rules, mandatory scope option

Part B: Post-MVP Roadmap (Conceptual Overview)

These features guide architectural decisions but are not part of the initial release.

I. Collaborative Journeys

Allow multiple users to participate in shared intellectual endeavors.

Classroom Journeys

Teacher-initiated journeys with student participants
Shared journals with individual contributions
Collective knowledge building
Real-time collaboration features

Research Group Journeys

Principal investigator with research team
Distributed contribution tracking
Synthesis across perspectives
Milestone coordination

II. Journey Templates

Pre-structured journeys for common use cases.

Curriculum Templates

Standard course structures
Progressive skill development paths
Assessment frameworks
Reusable pedagogical patterns

Methodology Templates

Established research protocols
Best practice workflows
Quality assurance patterns
Disciplinary standards

III. Journal Sharing

Transform private journals into community resources.

Shareable Journal Types

Method Journals for technique sharing
Reflection Journals for wisdom transfer
Decision Journals for rationale transparency
Pattern Journals for discovered insights

Journal Libraries

Institutional repositories
Disciplinary collections
Peer-reviewed journals
Community contributions

IV. Advanced Analytics

Deeper insights through extended context and pattern recognition.

Cross-Journey Analysis

Patterns across multiple journeys
Long-term formation tracking
Institutional learning analytics
Research trend identification

Formation Metrics

Intellectual development indicators
Capability progression
Conceptual depth measures
Engagement quality analysis

V. Institutional Features

Enterprise capabilities for educational institutions.

Multi-Tenant Architecture

Department isolation
Resource sharing controls
Centralized administration
Usage analytics

Compliance & Governance

Data retention policies
Privacy controls
Audit trails
Export capabilities

VI. Advanced Process Types

Sophisticated analytical capabilities.

Meta-Analytical Processes

Cross-study synthesis
Pattern meta-analysis
Theoretical integration
Knowledge gap identification

Longitudinal Processes

Time-series analysis
Development tracking
Historical comparison
Trend projection

VII. Integration Ecosystem

Connect with external systems and workflows.

Learning Management Systems

Grade passthrough
Assignment integration
Roster synchronization
Progress reporting

Research Infrastructure

Citation managers
Data repositories
Publication systems
Collaboration platforms

VIII. Mobile & Cloud

Extend beyond desktop deployment.

Mobile Companions

Journey review
Quick captures
Notification handling
Offline sync

Cloud Deployment

Institutional hosting
Elastic scaling
Global accessibility
Enhanced context windows

The MVP architecture is intended to accommodate these future capabilities without fundamental restructuring.