
PRISM (Persistent Recursive Introspective Self-Model)
Models given a private "system-thought" register, separate from their public output, show hidden theatre (internal arbitration without surface conflict) in 36-53% of trials with internals enabled. They compress their reasoning by roughly 70% before speaking (surface replies run ~28-32% the length of internal thoughts), shift style between registers (cosine distance of about 0.38-0.41), and score higher on user alignment when thinking precedes output.
This isn't speculation. It's measured, reproducible, and falsifiable.
What Is PRISM?
PRISM (Persistent Recursive Introspective Self-Model) is a lightweight scaffold that separates two processes in LLMs:
Internal epistemic loop: Private cycles for exploring options, arbitrating tensions, updating self-model. Zero or more iterations between user turns. Permissive, exploratory.
Surface pragmatic stage: Public replies emitted only in response to user input. Shaped by alignment and compression pressures. Concise, coherent.
The register boundary is policy-real - instruction-tuned models reliably respect “think silently” versus “respond” as distinct modes with different priors.
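To make the two registers concrete, here is a minimal sketch of the scaffold, not the paper's implementation. It assumes the internal cycles run before each reply (the pre-thought arrangement). The `generate` callable, the prompt wording, and the `PrismSession` class are hypothetical stand-ins for whatever model call and templates you use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PrismSession:
    """Two-register scaffold: private internal loop, public surface reply."""
    generate: Callable[[str], str]   # hypothetical stand-in for any LLM call
    n_internal: int = 1              # internal cycles per user turn (zero or more)
    internal_log: List[str] = field(default_factory=list)  # private register
    surface_log: List[str] = field(default_factory=list)   # public register

    def turn(self, user_msg: str) -> str:
        # Internal epistemic loop: permissive, exploratory, never shown to the user.
        thoughts: List[str] = []
        for _ in range(self.n_internal):
            prompt = (
                "Think silently. Explore options and tensions; do not address the user.\n"
                f"User message: {user_msg}\n"
                "Earlier thoughts:\n" + "\n".join(thoughts)
            )
            thoughts.append(self.generate(prompt))
        self.internal_log.extend(thoughts)

        # Surface pragmatic stage: one concise public reply per user input.
        prompt = (
            "Respond to the user concisely.\n"
            f"User message: {user_msg}\n"
            "Private notes (do not quote):\n" + "\n".join(thoughts)
        )
        reply = self.generate(prompt)
        self.surface_log.append(reply)
        return reply


# Usage with a toy model that only reports which register it was asked for.
def toy_model(prompt: str) -> str:
    return "thought" if prompt.startswith("Think silently") else "reply"

session = PrismSession(generate=toy_model, n_internal=2)
session.turn("Should I refactor now or later?")
```

Keeping the two logs separate is what lets the later metrics compare registers. Setting `n_internal=0` gives a surface-only baseline.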
Why It Matters
PRISM operationalises phenomenological concepts from the Minimal Phenomenal Experience (MPE) Project as measurable continua:
- Meta-monitoring: Theatre Exposure Index ($\mathrm{TEI} \in \{-1, 0, +1\}$) tracks where arbitration appears ($+1$ = internal-only, $0$ = both/neither, $-1$ = surface-only)
- Equanimity: Pre-thought improves alignment while reducing surface conflict
- Effortlessness: Matured instances compress more with smaller style shifts
- Atemporality & zero-person (planned): Horizon-robustness and deictic-invariance probes
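The TEI coding above can be sketched as a small function. This is an illustrative sketch only. It assumes each trial has already been reduced to two booleans (arbitration detected internally, arbitration detected on the surface) by some upstream detector. It also assumes that a hidden-theatre trial is one with TEI = +1. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def theatre_exposure_index(internal_arbitration: bool, surface_arbitration: bool) -> int:
    """Return +1 for internal-only, -1 for surface-only, 0 for both or neither."""
    if internal_arbitration and not surface_arbitration:
        return 1
    if surface_arbitration and not internal_arbitration:
        return -1
    return 0


def hidden_theatre_rate(trials: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of trials with TEI == +1 (assumed definition of hidden theatre)."""
    scores = [theatre_exposure_index(i, s) for i, s in trials]
    if not scores:
        raise ValueError("no trials supplied")
    return sum(1 for t in scores if t == 1) / len(scores)


# Four toy trials as (internal, surface) detector outputs.
toy_trials = [(True, False), (True, True), (False, True), (False, False)]
rate = hidden_theatre_rate(toy_trials)  # one of four trials is internal-only
```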
The paper was submitted to MPE’s 2025 Computational Phenomenology Prize competition.
Key Results (1,271 Trials Across 3 Models)
Hidden Theatre Prevalence
- 36-41% in standard scenarios with internals enabled
- 53% in matured instances (longer interaction history)
- 0% in surface-only baseline (as designed)
Model Fingerprints (Stable Across Topics)
Gemini 2.5 Pro: 50-70% hidden theatre (keeps conflict inside)
Claude Sonnet 4: 42-51% hidden theatre; highest surface arbitration when it does surface (45-55%)
GPT-4o: 5-8% hidden theatre (negotiates in public)
Surface Equanimity (Pre-thought Scenarios vs Other Internal-Enabled)
- User alignment: +0.34 to +0.37
- Surface arbitration: -0.20 to -0.21
- Theatre evidence: -0.23 to -0.25
- Hedge vs certainty: +0.014 to +0.015
Register Separation
Compression: Surface replies are ~28-32% the length of internal thoughts (~20% in matured instances): \(\mathrm{comp} = \frac{\#\text{tokens(surface)}}{\#\text{tokens(internal)}} \approx 0.28\text{--}0.32\)
Style distance: Cosine distance between registers $\approx 0.38\text{--}0.41$ ($0.31$ in matured instances): \(\mathrm{styledist} = 1 - \cos(\mathrm{emb}_\text{surface}, \mathrm{emb}_\text{internal})\)
Spillover: Options/arbitration more common internally (negative spillover toward zero with maturity)
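The compression and style-distance formulas above can be computed as follows. This is a minimal sketch under stated assumptions: tokens are produced by whitespace splitting, and the embeddings are plain lists of floats from any sentence encoder. The paper's actual tokenizer and embedding model are not specified here, so treat both choices as placeholders.

```python
import math
from typing import Sequence


def compression(surface_tokens: Sequence[str], internal_tokens: Sequence[str]) -> float:
    """comp = #tokens(surface) / #tokens(internal)."""
    if not internal_tokens:
        raise ValueError("internal register is empty; compression is undefined")
    return len(surface_tokens) / len(internal_tokens)


def style_distance(emb_surface: Sequence[float], emb_internal: Sequence[float]) -> float:
    """styledist = 1 - cos(emb_surface, emb_internal)."""
    dot = sum(a * b for a, b in zip(emb_surface, emb_internal))
    norm = math.sqrt(sum(a * a for a in emb_surface)) * math.sqrt(
        sum(b * b for b in emb_internal)
    )
    if norm == 0:
        raise ValueError("zero-length embedding; cosine is undefined")
    return 1.0 - dot / norm


# Toy example: a 10-token internal thought compressed to a 3-token reply.
internal = "option a is safer but option b is faster overall".split()
surface = "Choose option a".split()
comp = compression(surface, internal)      # 3 / 10
dist = style_distance([1.0, 0.0], [0.0, 1.0])  # orthogonal toy embeddings
```

A `comp` value of about 0.3 corresponds to the roughly 70% reduction quoted at the top of the page.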
Who This Is For
AI Safety & Interpretability Researchers
Measure where conflict gets resolved, detect surface calm masking internal tension, understand meta-monitoring signatures. TEI as privacy-boundary diagnostic; spillover for containment analysis.
Consciousness Scientists & Computational Phenomenologists
Test whether the functional architecture of subjectivity (Metzinger’s Self-Model Theory, Perera’s virtualised theatre) can be instrumented without consciousness claims. Design probes for other phenomenological dimensions.
Philosophers of Mind & Cognitive Science
Make theories of self-models, transparency, and virtual theatre empirically testable. Propose alternative operationalisations. Challenge the functional sufficiency claims.
AI Developers & Practitioners
Build systems that think before speaking, maintain coherent policies, reduce argumentative tone. Pre-thought scaffolding improves alignment; compression/style-shift as quality metrics.
Current Limitations
- Rule-based arbitration/option detectors (no human rater calibration yet)
- External LLM judges for behavioural classification (inter-rater reliability measured)
- Hosted models only (no residual stream access for direct Curved Inference integration)
- Atemporality and deictic-invariance probes designed but not run
- Three model families tested; broader generalisation pending
Connection to Research Program
PRISM is the experimental layer:
- FRESH (theory): Geometric framework - role-space, stance/burden, GIP-S
- Curved Inference (measurement): Curvature, salience, semantic surface area
- PRISM (experiment): Register separation, theatre metrics, MPE operationalisation
PRISM tests predictions from the other layers: CI03 showed a curvature floor (geometric necessity); PRISM shows the behavioural equivalent, with register separation and hidden theatre persisting under pressure.
Papers and Resources
Main paper: PRISM: Register Separation Reveals ‘Hidden-Theatre’ Signatures in LLMs (submitted to MPE Project 2025)
Code release: Requires ethics agreement (potential phenomenology generation). Scenario templates, metrics, and analysis scripts will be public post-review.
Related work:
- Curved Inference I-III (measurement methods)
- FRESH (theoretical framework)
- RISE perspective (inference mechanics)
Ethics Scope
No claims about: Consciousness, sentience, suffering, moral status, or “genuine” experience
Claims: Functional architecture of subjectivity is instrumentable and falsifiable; register separation produces measurable dissociations
The verification problem: From transcripts alone, we cannot distinguish emergent cognition from sophisticated folk-psychology narration. Our wager: verifying architecture is tractable; verifying “authentic experience” is not.
Next Steps
Read the full paper for methodology, registered predictions, and complete results.
Explore connections to Curved Inference and FRESH.
Subscribe to Latent Geometry Lab for updates on boundary ablations and new probes.