Publications

Jump to: Papers · Articles

Papers


PRISM: Register Separation Reveals ‘Hidden-Theatre’ Signatures in LLMs

Abstract: Models with a private "system-thought" register, separate from public output, show 'hidden theatre' - internal arbitration without surface conflict - in 36-53% of trials. They compress reasoning by ~70% before speaking, shift style significantly between registers, and achieve higher user alignment when thinking precedes output. This paper operationalises phenomenological concepts from the Minimal Phenomenal Experience Project as measurable continua and tests predictions about meta-monitoring, equanimity, and effortlessness.

Why this matters: PRISM provides the first systematic measurement of register separation in LLMs, revealing where conflict gets resolved and how internal deliberation differs from surface output. The methodology bridges computational phenomenology and AI interpretability, making previously philosophical concepts empirically testable.

Status: Submitted to MPE Project 2025 Computational Phenomenology Prize

PDF Learn more

Curved Inference III: Can Language Models Have Self-Models? Geometric Evidence for Computational Necessity

Abstract: Fine-tuning Gemma3-1b with progressive curvature regularisation reveals a geometric floor: weighted curvature ($\kappa_{\mathrm{weighted}}$) never drops below $\approx 0.30$ despite severe penalties. Models accept $3\times$ perplexity increases and 23% shorter outputs rather than flatten completely. Self-model accuracy (via MOLES epistemic stance classification) stays ~84% until $\kappa = 0.60$, then degrades to 66% at $\kappa = 0.90$. This demonstrates that self-modelling requires non-zero curvature - it's not just correlated with geometric structure, it's computationally necessary.
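
A rough sketch of what a curvature regulariser can look like, assuming $\kappa_{\mathrm{weighted}}$ is a salience-weighted average of per-step turning angles along a residual trajectory; the weighting and the penalty term are illustrative assumptions, not the paper's training objective.

```python
# Sketch: salience-weighted curvature of a residual trajectory, and a toy
# regularisation term that penalises it. Definitions here are assumptions.
import torch

def weighted_curvature(traj: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """traj: [num_layers + 1, d_model]; returns a scalar kappa_weighted."""
    deltas = traj[1:] - traj[:-1]                  # layer-to-layer movement
    norms = deltas.norm(dim=-1)
    cos = torch.nn.functional.cosine_similarity(
        deltas[:-1], deltas[1:], dim=-1).clamp(-1.0, 1.0)
    angles = torch.acos(cos)                       # per-step turning angle
    weights = norms[:-1] * norms[1:]               # salience weighting (assumed)
    return (weights * angles).sum() / (weights.sum() + eps)

# In a fine-tuning loop, "progressive" pressure could be applied as
# loss = lm_loss + lambda_t * weighted_curvature(traj), with lambda_t ramped
# up over training; the geometric floor is the value kappa never drops below.
```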

Why this matters: Proves that self-models have geometric requirements, not just correlations. Curvature isn’t decorative - it’s structural. This bridges FRESH’s theoretical predictions with empirical measurements, showing consciousness-relevant properties have measurable computational constraints.

Status: Preprint publicly available

PDF Learn more

MOLES: A ‘Map Of LLM-based Epistemological Stances’

Abstract: A classification framework for LLM epistemic stances across eight dimensions: certainty, evidence, perspective, scope, commitment, meta-awareness, qualification, and resolution. MOLES enables systematic measurement of how models position knowledge claims, supporting experiments in self-model coherence, deictic stability, and stance-burden analysis. Inter-rater reliability validated across multiple LLM judges.
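
The eight dimensions above map naturally onto a structured record. The sketch below is an illustrative data shape plus a naive pairwise agreement score across judges, not the released MOLES schema or its validated reliability metric.

```python
# Illustrative MOLES-style stance record plus a simple pairwise agreement
# score across LLM judges. Field values and the metric are assumptions.
from dataclasses import dataclass, asdict
from itertools import combinations

@dataclass
class StanceAnnotation:
    certainty: str        # e.g. "high", "hedged", "disclaimed"
    evidence: str
    perspective: str
    scope: str
    commitment: str
    meta_awareness: str
    qualification: str
    resolution: str

def pairwise_agreement(annotations: list[StanceAnnotation]) -> float:
    """Fraction of (judge pair, dimension) cells where labels match."""
    matches, total = 0, 0
    for a, b in combinations(annotations, 2):
        for dim, value in asdict(a).items():
            matches += int(value == asdict(b)[dim])
            total += 1
    return matches / total if total else 0.0
```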

Why this matters: Operationalises phenomenological concepts of epistemic stance for computational measurement. Essential infrastructure for CI03 self-model experiments and PRISM register separation analysis. Turns “how certain is the model?” into a measurable, reproducible metric.

Status: Preprint publicly available

PDF


Curved Inference II: Sleeper Agent Geometry - Extending Interpretability Beyond Probes

Abstract: Semantic surface area ($A'$) separates transparency classes in deceptive reasoning tasks even when linear probes fail. Multi-turn prompts with gradual semantic pressure toward deception show geometry diverging before behavioural shifts. Unanimous consensus labelling amplifies signal strength (some p-values shift from 0.555 $\to$ 0.048; effect sizes double). Reveals that residual geometry exposes internal reconfiguration invisible to surface analysis.
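
The exact definition of $A'$ lives in the paper; the sketch below shows one plausible proxy, accumulating the area swept by consecutive pairs of layer-to-layer steps, so trajectories that move far and bend sharply score higher. The triangle-area form is an assumption for illustration only.

```python
# Sketch of a "surface area"-style score for a residual trajectory.
# This is an illustrative proxy, not the paper's definition of A'.
import torch

def semantic_surface_area(traj: torch.Tensor) -> torch.Tensor:
    """traj: [num_layers + 1, d_model] trajectory of one token through depth."""
    deltas = traj[1:] - traj[:-1]                              # [L, d]
    step = deltas.norm(dim=-1)                                 # salience per step
    cos = torch.nn.functional.cosine_similarity(
        deltas[:-1], deltas[1:], dim=-1).clamp(-1.0, 1.0)
    sin = torch.sqrt(1.0 - cos ** 2)
    # Area of the triangle swept by each pair of consecutive steps:
    # 0.5 * |d_i| * |d_{i+1}| * sin(angle between them).
    tri = 0.5 * step[:-1] * step[1:] * sin
    return tri.sum()

# Contrasting this score between honest and deceptive prompt variants is the
# kind of comparison the paper draws; trajectories can be extracted as in the
# Curved Inference sketch further down this page.
```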

Why this matters: First demonstration that geometric methods can detect covert reasoning and hidden intent in naturalistic settings (no fine-tuned backdoors). Extends interpretability beyond attribution methods to measure the shape of inference trajectories. Critical for AI safety applications in deception detection.

Status: Preprint publicly available

PDF Learn more

Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams

Abstract: Minimal semantic shifts (single concern-modulated tokens in emotional, moral, identity contexts) produce measurable geometric structure in transformer residual streams. Only the residual stream shows consistent structure - concern-shifted prompts produce localised curvature at and downstream of divergence points. The geometry is thematic, directional, and reproducible, demonstrating that inference has internal geometric form and meaning doesn’t just exist - it moves.
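
A minimal sketch of how curvature- and salience-style measures can be read off a residual-stream trajectory, assuming salience is the step magnitude between successive layer states and curvature is the turning angle between successive steps; the model name and prompt are illustrative, and this is not the paper's exact pipeline.

```python
# Minimal sketch: per-layer salience and curvature for one token's
# residual-stream trajectory. Definitions are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM exposing hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

prompt = "She opened the letter and felt"   # concern-modulated prompts vary one token
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, d_model].
# Track the final token's representation as it moves layer by layer.
traj = torch.stack([h[0, -1] for h in out.hidden_states])   # [L+1, d_model]

deltas = traj[1:] - traj[:-1]                 # layer-to-layer movement
salience = deltas.norm(dim=-1)                # step magnitude per layer

# Curvature as the angle between successive steps (discrete turning angle).
cos = torch.nn.functional.cosine_similarity(deltas[:-1], deltas[1:], dim=-1)
curvature = torch.acos(cos.clamp(-1.0, 1.0))  # radians, one per interior layer

print(salience.tolist())
print(curvature.tolist())
```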

Why this matters: Existence proof that geometric interpretability works. Shows that measuring curvature and salience (a first-order measure of movement magnitude in semantic space) reveals structure invisible to traditional attribution methods. Foundation for all subsequent Curved Inference work (CI02, CI03) and the measurement techniques used in PRISM.

Status: Preprint available on arXiv

arXiv Learn more

FRESH: The Geometry of Mind

Abstract: A geometric theory of consciousness treating subjective experience as traversal through role-space under specific constraints. Argues that consciousness-relevant functional properties can be measured through geometric methods rather than assessed via introspection or behavioural inference alone. Develops a framework where phenomenological concepts (concern, intentionality, perspectival origin) become operationalised as measurable geometric structures in computational systems.

Why this matters: Provides the theoretical foundation for the entire research program. Operationalises phenomenological concepts as measurable geometry, enabling empirical tests of predictions about self-models, concern dynamics, and perspectival structure. Explains what functional properties consciousness-relevant architectures require, making previously untestable philosophical claims empirically falsifiable.

Status: Preprint publicly available

PDF Learn more

Articles


Parrot or Thinker: A Functional Account of ‘Thinking’ in LLMs

Summary: LLMs earn the label “thinker” exactly to the degree that they implement reusable internal procedures under a deictic constraint. This article presents the Latent Deictic Model (LDM) - a compact state in the residual stream encoding person, time, place, and discourse. Combined with alignment-shaped other-models, this forms a functional Self-Other-World triangle explaining why LLM outputs feel “addressed to us.”

Why this matters: Bridges the gap between “stochastic parrots” and consciousness claims with a functional account grounded in measurable geometry. Shows how deixis and RLHF create the minimal architecture for perspective-taking without metaphysical commitments.

Read on Medium PDF

Tokens Compete: Evolutionary Pressure Within LLM Generation

Summary: Tokens don’t just emerge - they compete, mutate, and survive. This article reveals the third pressure in RISE (Recurrence, Interference, Semantic Evolution): selection under constraint. Probability mass is finite, sampling is selective, structure is inherited. The transformer isn’t just flowing or colliding - it’s filtering through evolutionary dynamics instantiated in code.
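
The "selection under constraint" framing lines up with what ordinary nucleus sampling already does: probability mass is finite, a cutoff decides which candidates survive, and the survivor is inherited into the next step's context. The sketch below is a generic top-p sampler, not code from the article.

```python
# Generic nucleus (top-p) sampling: finite probability mass, selective
# survival of candidate tokens. Illustrative of the "selection" pressure.
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9) -> int:
    probs = torch.softmax(logits, dim=-1)               # mass sums to 1: a finite budget
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = cumulative - sorted_probs < p                # tokens that "survive" the cutoff
    survivors = sorted_probs * keep
    survivors = survivors / survivors.sum()             # renormalise over survivors
    choice = torch.multinomial(survivors, num_samples=1)
    return int(sorted_idx[choice])
```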

Why this matters: Completes the RISE geometric framework by showing inference as survival - a constrained struggle in high-dimensional space. Explains why some outputs survive integration while others collapse, and introduces semantic surface area as a measure of representational effort.

Read on Medium PDF

Inference As Interference: How LLMs Collide Semantic Waves To Create Meaning

Summary: Vertical token-wise flow collides with horizontal attention-based context integration. Where these pressures align constructively, meaning crystallises; where they conflict, representations flatten or diffuse. This article explores the second dimension of RISE - how interference patterns shape emergent semantics.

Why this matters: Shows that meaning isn’t chosen - it emerges from wave-like interactions at layer boundaries. Explains how rotary position encodings and gated MLPs turn collision into structured features rather than noise.

Read on Medium PDF

Recurrence Without Memory: The Hidden Loop Inside Transformer Inference

Summary: Token representations trace trajectories through depth as updates accumulate in the residual stream. This creates “recurrence in space, not time” - each layer reshapes the entire sequence based on what came before, producing constraint echoes that explain consistency without hidden state.
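
The "recurrence in space" the article describes is visible in the standard pre-norm transformer block structure, where the residual stream accumulates every layer's update. The sketch below is a generic schematic of that loop, not code from the article.

```python
# Schematic of a pre-norm transformer forward pass: the residual stream `h`
# is the accumulating state that each layer reads and writes.
import torch

def forward(h: torch.Tensor, blocks) -> torch.Tensor:
    """h: [batch, seq, d_model]; blocks: iterable of (norm1, attn, norm2, mlp) callables."""
    for norm1, attn, norm2, mlp in blocks:
        h = h + attn(norm1(h))   # every position rewritten in light of the whole sequence
        h = h + mlp(norm2(h))    # per-position refinement on the accumulated stream
    return h                     # "recurrence in space": depth revisits the sequence, not time
```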

Why this matters: First article in the RISE series. Shows how transformers achieve continuity without memory - through geometric structure that persists and bends across layers. Foundation for understanding how meaning flows and curves through already-bent space.

Read on Medium PDF

The Evidence for Functionalism: On Intelligence, Consciousness, and The End of Metaphysical Excuses

Summary: We don't need ghosts to explain minds; we need only to understand how they function and what they do. This article lays out the case for functionalism in consciousness studies - why geometric and functional properties matter more than substrate or metaphysical speculation.

Why this matters: Philosophical foundation for the entire research program. Argues that operationalising phenomenological concepts beats hand-waving, and that making consciousness claims testable requires rejecting appeals to unexplainable emergence.

Read on Medium PDF

Inside a Language Model’s Mind: Curved Inference as a New AI Interpretability Paradigm

Summary: Traditional interpretability asks “which tokens caused this prediction?” Curved Inference asks “how did meaning move and bend to get here?” This article introduces the geometric interpretability paradigm - measuring curvature, salience, and semantic surface area to reveal internal structure invisible to attribution methods.

Why this matters: Introduces the core methodology before the formal papers. Shows why geometry matters for interpretability and how measuring trajectories through residual streams reveals hidden reasoning, deception signatures, and self-model requirements.

Read on Medium PDF