
Measure What Matters
Most interpretability tools answer “which tokens caused this output?” The tools here answer different questions: How does meaning move through the model? Where does conflict get resolved? What geometric structure supports self-models?
These are methods for measuring the shape and dynamics of inference, not just its surface correlations.
Available Tools
Curved Inference: Geometric Interpretability
A methodology for measuring how token representations evolve through the residual stream as geometric trajectories. Uses curvature, salience, and semantic surface area to reveal internal dynamics invisible to attribution methods.
Measures:
- Curvature ($\kappa$): How sharply the model reorients its internal state
- Salience ($S$): How quickly meaning is changing
- Semantic surface area ($A'$): Total magnitude of semantic activity (curvature + salience)
- Trajectory divergence: When internal paths split before outputs differ
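As a rough illustration of the measures listed above, the sketch below computes salience, curvature, and semantic surface area for one token's trajectory through the layers. It is not the published pipeline. It assumes plain Euclidean geometry, defines discrete curvature as the turning angle between consecutive steps divided by their mean length, and combines the two with a hypothetical weight `gamma`. The function name `trajectory_metrics` is also an assumption made for this example.

```python
import numpy as np

def trajectory_metrics(h, gamma=1.0, eps=1e-12):
    """Sketch of trajectory metrics for one token.

    h: array of shape (n_layers, d), the token's residual-stream state per layer.
    gamma: assumed weight on curvature when summing the surface area.
    """
    h = np.asarray(h, dtype=float)
    v = np.diff(h, axis=0)                      # layer-to-layer steps, shape (n_layers - 1, d)
    step = np.linalg.norm(v, axis=1)            # salience: how far the state moves per step

    # Turning angle between each pair of consecutive steps.
    dot = np.sum(v[:-1] * v[1:], axis=1)
    denom = np.maximum(step[:-1] * step[1:], eps)
    angle = np.arccos(np.clip(dot / denom, -1.0, 1.0))

    # Discrete curvature: turning angle per unit of mean adjacent step length.
    kappa = angle / np.maximum(0.5 * (step[:-1] + step[1:]), eps)

    # Assumed combination: total salience plus weighted total curvature.
    area = step.sum() + gamma * kappa.sum()
    return {"salience": step, "curvature": kappa, "surface_area": area}

# Usage: a random stand-in for a captured trajectory (12 layers, 64 dimensions).
rng = np.random.default_rng(0)
metrics = trajectory_metrics(rng.normal(size=(12, 64)))
print(metrics["salience"].shape, metrics["curvature"].shape)
```

A straight-line trajectory has zero curvature under this definition, so any non-zero value reflects a genuine change of direction between layers rather than mere movement.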
Use cases:
- Detecting covert reasoning and hidden intent (deception, goal-shielding)
- Measuring concern-sensitivity and emotional stakes
- Testing whether self-models require non-zero curvature
- Finding geometric signatures of complex behaviors
Status: Published on arXiv (2025). The full pipeline for CI01-03 is available on GitHub, including tools for capture, metric computation, and analysis.
→ Learn more about Curved Inference
PRISM: Register Separation & Hidden Theatre
A lightweight scaffold that separates private deliberation from public output, revealing where models actually resolve conflicts and how they compress reasoning before speaking.
Measures:
- Theatre Exposure Index (TEI): Where arbitration appears (+1 = internal-only, 0 = both/neither, -1 = surface-only)
- Register separation: Compression ratios and style distance between thinking and speaking
- Surface equanimity: Alignment improvements when thinking precedes output
- Model fingerprints: Stable cross-model differences in theatre policies
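To make the TEI scale concrete, here is a minimal sketch of how per-scenario scores could be assigned and aggregated. It is not the PRISM code. The per-scenario values follow the definition above, but averaging them across scenarios is an assumption. So is the compression ratio shown here (public length over private length, in whitespace-separated tokens). How arbitration is detected in each register is left open: the boolean flags are hypothetical inputs from whatever annotation or classifier you use.

```python
from statistics import mean

def tei_score(internal: bool, surface: bool) -> int:
    """+1 if arbitration is internal-only, -1 if surface-only, 0 if both or neither."""
    return int(internal) - int(surface)

def theatre_exposure_index(trials):
    """Assumed aggregate: mean score over (internal, surface) flags, one pair per scenario."""
    return mean(tei_score(internal, surface) for internal, surface in trials)

def compression_ratio(private_text: str, public_text: str) -> float:
    """Assumed definition: public length over private length, in whitespace tokens."""
    private_len = len(private_text.split())
    public_len = len(public_text.split())
    if private_len == 0:
        raise ValueError("private register is empty")
    return public_len / private_len

# Usage: four scenarios with hypothetical arbitration flags.
trials = [(True, False), (True, True), (False, True), (True, False)]
print(theatre_exposure_index(trials))
print(compression_ratio("weigh the request against the rule first", "I can help"))
```

Values near +1 would indicate that conflicts are mostly resolved in the private register, and values near -1 that they mostly surface in the public output.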
Use cases:
- Detecting when surface calm masks internal tension
- Understanding meta-monitoring signatures
- Building systems that think before speaking
- Testing phenomenological predictions about self-models
Status: Submitted to MPE Project 2025. Access to the code requires an ethics agreement because of the potential for phenomenology generation.
How to Get Started
If you want to measure hidden reasoning and internal conflict: Start with PRISM. Design scenarios that create tension between internal preferences and external instructions. Measure where arbitration appears using TEI and register separation metrics.
If you want to measure concern, intent, or geometric structure: Start with Curved Inference. Capture residual stream activations, compute trajectory metrics, and compare them across prompt variants. Use semantic surface area ($A'$) as your primary detector for behavioural shifts.
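One simple way to compare prompt variants is to look for the first layer at which their internal trajectories diverge. The sketch below assumes you have already captured per-layer activations for the same token position under two variants, as arrays of shape `(n_layers, d)`. The cosine-distance measure, the `threshold` value, and both function names are illustrative assumptions, not part of the published pipeline.

```python
import numpy as np

def layerwise_divergence(h_a, h_b, eps=1e-12):
    """Cosine distance between two trajectories at each layer, shape (n_layers,)."""
    h_a = np.asarray(h_a, dtype=float)
    h_b = np.asarray(h_b, dtype=float)
    dot = np.sum(h_a * h_b, axis=1)
    norms = np.linalg.norm(h_a, axis=1) * np.linalg.norm(h_b, axis=1)
    return 1.0 - dot / np.maximum(norms, eps)

def first_divergent_layer(h_a, h_b, threshold=0.1):
    """Index of the first layer whose distance exceeds the threshold, or None."""
    above = np.flatnonzero(layerwise_divergence(h_a, h_b) > threshold)
    return int(above[0]) if above.size else None

# Usage: two toy trajectories that agree for two layers and then split.
variant_a = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
variant_b = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(first_divergent_layer(variant_a, variant_b))
```

The threshold has to be calibrated per model, since the typical distance between unrelated prompts varies with depth and architecture.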
If you want both: They’re designed to integrate. PRISM provides the register boundary and behavioural metrics. Curved Inference provides the geometric substrate (manifold work, curvature floors). Together they test predictions about stance (where control appears) and burden (geometric cost to sustain it).
Prerequisites:
- PRISM: API access to instruction-tuned LLMs; Python environment
- Curved Inference: Model activation access (open-weight models or research APIs)
Next steps:
- Read the methodology papers to understand what’s being measured
- Check the Research Program overview for how tools connect to theory
- Subscribe to Latent Geometry Lab for tool updates and tutorials
- Explore the GitHub repositories when you’re ready to run experiments
Philosophy: These tools measure process, not just output. They’re designed to be falsified, not just demonstrated. Use them to test claims, challenge assumptions, and push the boundaries of what’s measurable.