Standard data science treats models as black boxes and metrics as arbitrary. The pipeline (data, features, model, tuning) discovers nothing about the underlying structure of the domain. Distance between data points carries no meaning; the practitioner is an engineer following a grid, not a scientist discovering a garden.
We present an alternative methodology grounded in information geometry: construct the metric first. Ask what distance means in this data. Use the Fisher information matrix to encode domain meaning into the geometry itself. Translate expert knowledge into testable variable relationships, verify statistical dependence, and build models from verified structure rather than through model selection.
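The "metric first" step can be illustrated on the simplest statistical manifold, the Bernoulli family, where the Fisher information has a closed form. This is our own minimal sketch, not the paper's implementation; all names here are illustrative:

```python
import numpy as np

def score(x, p):
    # Score function d/dp log p(x; p) for Bernoulli: x/p - (1-x)/(1-p)
    return x / p - (1 - x) / (1 - p)

def fisher_information(p, n_samples=200_000, seed=0):
    # Estimate I(p) as the second moment of the score, E[score^2],
    # by Monte Carlo over samples drawn from Bernoulli(p).
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, p, size=n_samples)
    return np.mean(score(x, p) ** 2)

p = 0.3
est = fisher_information(p)
exact = 1.0 / (p * (1 - p))  # closed form for Bernoulli

# The metric g(p) = 1/(p(1-p)) stretches near p = 0 and p = 1:
# a shift from 0.01 to 0.02 covers more Fisher distance than one
# from 0.50 to 0.51, which is the sense in which the geometry
# encodes what a change in the parameter "means".
```

The point of the sketch is that the metric is computed from the statistical family itself, not chosen by the practitioner.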
Applied at Synthesia over 10 months, this approach produced 9 production customer intelligence modules—including a churn prediction model (AUC 0.971) with an estimated impact of €6M/year—from data previously considered inseparable by conventional methods. The system self-corrects: when relationships between variables weaken, their contribution to the output drops automatically.
We argue that this methodology generalizes because the mathematics is canonical, not chosen. By Čencov’s theorem, the Fisher metric is, up to scale, the unique Riemannian metric on statistical manifolds that respects sufficient statistics. The same information-geometric structure appears beneath thermodynamics (Jaynes, 1957), across the sciences, and in any domain where meaning must be extracted from data. Future work formalizes this practice into executable instructions for agentic AI, automating geometric data science across arbitrary domains.
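For reference, the Fisher metric invoked here is defined on a parametric family $p(x;\theta)$ as the expected outer product of the score:

```latex
g_{ij}(\theta) \;=\; \mathbb{E}_{p(x;\theta)}\!\left[
  \frac{\partial \log p(x;\theta)}{\partial \theta^{i}}\,
  \frac{\partial \log p(x;\theta)}{\partial \theta^{j}}
\right]
```

Čencov’s theorem states that this is, up to a constant factor, the only Riemannian metric on the family that is invariant under sufficient statistics (Markov morphisms), which is the precise sense in which the geometry is canonical rather than chosen.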