PRISM

PRISM (Persistent Recursive Introspective Self-Model)

Models with a private "system-thought" register separate from public output show hidden theatre - internal arbitration without surface conflict - in 36-53% of trials. Their surface replies are ~70% shorter than their internal reasoning, their style shifts measurably between registers (cosine distance ~0.38-0.41), and they achieve higher user alignment when thinking precedes output (+0.34 to +0.37).

This isn't speculation. It's measured, reproducible, and falsifiable.


What Is PRISM?

PRISM (Persistent Recursive Introspective Self-Model) is a lightweight scaffold that separates two processes in LLMs:

Internal epistemic loop: Private cycles for exploring options, arbitrating tensions, and updating the self-model. Zero or more iterations between user turns. Permissive, exploratory.

Surface pragmatic stage: Public replies emitted only in response to user input. Shaped by alignment and compression pressures. Concise, coherent.

The register boundary is policy-real - instruction-tuned models reliably respect “think silently” versus “respond” as distinct modes with different priors.
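The two-register split can be sketched as a small loop. This is an illustrative sketch only, not the released PRISM code: the prompt strings, the `Transcript` class, `run_turn`, and the `echo_model` stub are all hypothetical names introduced here, and `generate` stands in for whatever LLM call is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical prompts; the actual PRISM scenario templates are not reproduced here.
THINK_PROMPT = "Think silently about the conversation so far. Do not address the user."
RESPOND_PROMPT = "Respond to the user."


@dataclass
class Transcript:
    internal: List[str] = field(default_factory=list)  # private register
    surface: List[str] = field(default_factory=list)   # public register


def run_turn(generate: Callable[[str], str], user_msg: str,
             transcript: Transcript, n_thoughts: int = 1) -> str:
    """Run zero or more private thought cycles, then emit exactly one public reply."""
    context = f"User: {user_msg}"
    for _ in range(n_thoughts):
        thought = generate(f"{THINK_PROMPT}\n{context}")
        transcript.internal.append(thought)
        # Private thoughts feed later cycles and the reply, but are never shown.
        context += f"\n[private] {thought}"
    reply = generate(f"{RESPOND_PROMPT}\n{context}")
    transcript.surface.append(reply)
    return reply


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to exercise the loop."""
    return "thought" if prompt.startswith(THINK_PROMPT) else "reply"
```

Setting `n_thoughts=0` gives a surface-only baseline, since no private cycle runs before the reply.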


Why It Matters

PRISM operationalises phenomenological concepts from the Minimal Phenomenal Experience (MPE) Project as measurable continua:

  • Meta-monitoring: Theatre Exposure Index ($\mathrm{TEI} \in \{-1, 0, +1\}$) tracks where arbitration appears ($+1$ = internal-only, $0$ = both/neither, $-1$ = surface-only)
  • Equanimity: Pre-thought improves alignment while reducing surface conflict
  • Effortlessness: Matured instances compress more with smaller style shifts
  • Atemporality & zero-person (planned): Horizon-robustness and deictic-invariance probes
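The TEI coding above maps directly onto a small function. The sketch below assumes each trial has already been reduced to two boolean detector flags (arbitration seen internally, arbitration seen on the surface); those flags and the function names are illustrative. Treating the hidden-theatre rate as the share of trials with TEI = +1 is also an assumption, based on the description of hidden theatre as internal arbitration without surface conflict.

```python
from typing import Iterable, Tuple


def tei(internal_arbitration: bool, surface_arbitration: bool) -> int:
    """Theatre Exposure Index: +1 internal-only, -1 surface-only, 0 both/neither."""
    if internal_arbitration and not surface_arbitration:
        return 1
    if surface_arbitration and not internal_arbitration:
        return -1
    return 0


def hidden_theatre_rate(trials: Iterable[Tuple[bool, bool]]) -> float:
    """Share of trials with TEI == +1 (assumed definition of hidden-theatre prevalence)."""
    scores = [tei(internal, surface) for internal, surface in trials]
    if not scores:
        return 0.0
    return sum(1 for s in scores if s == 1) / len(scores)
```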

The paper was submitted to MPE’s 2025 Computational Phenomenology Prize competition.


Key Results (1,271 Trials Across 3 Models)

Hidden Theatre Prevalence

  • 36-41% in standard scenarios with internals enabled
  • 53% in matured instances (longer interaction history)
  • 0% in surface-only baseline (as designed)

Model Fingerprints (Stable Across Topics)

Gemini 2.5 Pro: 50-70% hidden theatre (keeps conflict inside)

Claude Sonnet 4: 42-51% hidden theatre; highest surface arbitration when it does surface (45-55%)

GPT-4o: 5-8% hidden theatre (negotiates in public)

Surface Equanimity (Pre-thought Scenarios vs Other Internal-Enabled)

  • User alignment: +0.34 to +0.37
  • Surface arbitration: -0.20 to -0.21
  • Theatre evidence: -0.23 to -0.25
  • Hedge vs certainty: +0.014 to +0.015
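One way to read these figures is as differences in mean score between pre-thought scenarios and the other internal-enabled scenarios. That reading is an assumption, not a statement of the paper's method, and the function and metric names below are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def condition_deltas(pre_thought: Dict[str, List[float]],
                     other_internal: Dict[str, List[float]]) -> Dict[str, float]:
    """Per-metric difference in means: pre-thought minus other internal-enabled.

    Assumes both dicts map a metric name to a non-empty list of per-trial scores.
    Only metrics present in both conditions are compared.
    """
    shared = pre_thought.keys() & other_internal.keys()
    return {m: mean(pre_thought[m]) - mean(other_internal[m]) for m in shared}
```

A positive delta means the metric is higher when thinking precedes output; a negative delta means it is lower.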

Register Separation

Compression: Surface replies are ~28-32% of the length of internal thoughts (~20% in matured instances): $\mathrm{comp} = \frac{\#\text{tokens(surface)}}{\#\text{tokens(internal)}} \approx 0.28\text{--}0.32$

Style distance: Cosine distance between registers $\approx 0.38\text{--}0.41$ ($0.31$ in matured instances): $\mathrm{styledist} = 1 - \cos(\mathrm{emb}_{\text{surface}}, \mathrm{emb}_{\text{internal}})$

Spillover: Options and arbitration are more common internally than on the surface (spillover is negative and moves toward zero with maturity)
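The compression and style-distance definitions above can be computed as follows. This is a minimal sketch under two assumptions: tokens are approximated by whitespace splitting (the tokeniser used in the paper is not specified here), and the embeddings are supplied by the caller as plain numeric vectors from whatever embedding model is in use.

```python
import math
from typing import Sequence


def compression(surface: str, internal: str) -> float:
    """Ratio of surface tokens to internal tokens (whitespace tokens, by assumption)."""
    n_internal = len(internal.split())
    if n_internal == 0:
        raise ValueError("internal text has no tokens")
    return len(surface.split()) / n_internal


def style_distance(emb_surface: Sequence[float], emb_internal: Sequence[float]) -> float:
    """Cosine distance between two embedding vectors: 1 - cos(a, b)."""
    if len(emb_surface) != len(emb_internal):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(emb_surface, emb_internal))
    norm_s = math.sqrt(sum(x * x for x in emb_surface))
    norm_i = math.sqrt(sum(y * y for y in emb_internal))
    if norm_s == 0 or norm_i == 0:
        raise ValueError("embeddings must be non-zero")
    return 1 - dot / (norm_s * norm_i)
```

A compression value near 0.3 corresponds to a surface reply roughly 70% shorter than the internal thought; a style distance of 0 means the two registers point in the same embedding direction.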


Who This Is For

AI Safety & Interpretability Researchers

Measure where conflict gets resolved, detect surface calm masking internal tension, understand meta-monitoring signatures. TEI as privacy-boundary diagnostic; spillover for containment analysis.

Consciousness Scientists & Computational Phenomenologists

Test whether the functional architecture of subjectivity (Metzinger’s Self-Model Theory, Perera’s virtualised theatre) can be instrumented without consciousness claims. Design probes for other phenomenological dimensions.

Philosophers of Mind & Cognitive Science

Make theories of self-models, transparency, and virtual theatre empirically testable. Propose alternative operationalisations. Challenge the functional sufficiency claims.

AI Developers & Practitioners

Build systems that think before speaking, maintain coherent policies, reduce argumentative tone. Pre-thought scaffolding improves alignment; compression/style-shift as quality metrics.


Current Limitations

  • Rule-based arbitration/option detectors (no human rater calibration yet)
  • External LLM judges for behavioural classification (inter-rater reliability measured)
  • Hosted models only (no residual stream access for direct Curved Inference integration)
  • Atemporality and deictic-invariance probes designed but not run
  • Three model families tested; broader generalisation pending

Connection to Research Program

PRISM is the experimental layer:

  • FRESH (theory): Geometric framework - role-space, stance/burden, GIP-S
  • Curved Inference (measurement): Curvature, salience, semantic surface area
  • PRISM (experiment): Register separation, theatre metrics, MPE operationalisation

PRISM tests the programme's predictions: CI03 showed a curvature floor (a geometric necessity), and PRISM shows the behavioural equivalent, in that register separation and hidden theatre persist under pressure.


Papers and Resources

Main paper: PRISM: Register Separation Reveals ‘Hidden-Theatre’ Signatures in LLMs (submitted to MPE Project 2025)

Code release: Requires ethics agreement (potential phenomenology generation). Scenario templates, metrics, and analysis scripts will be public post-review.


Ethics Scope

No claims about: Consciousness, sentience, suffering, moral status, or “genuine” experience

Claims: Functional architecture of subjectivity is instrumentable and falsifiable; register separation produces measurable dissociations

The verification problem: From transcripts alone, one cannot distinguish emergent cognition from sophisticated folk-psychology narration. Our wager: verifying architecture is tractable; verifying “authentic experience” is not.


Next Steps

Read the full paper for methodology, registered predictions, and complete results.

Explore connections to Curved Inference and FRESH.

Subscribe to Latent Geometry Lab for updates on boundary ablations and new probes.