Trust and Risk Asymmetry
What It Is
A formal game-theoretic proof that increasing penalty severity in AI governance cannot reduce equilibrium violation rates; it only reduces audit frequency. The optimal audit rate follows m = 1/(1+x), where x is penalty severity, but compliance remains fixed at p = k/b, the ratio of monitoring cost k to detection benefit b, regardless of how severe the consequences are. Most organizations are investing in the wrong governance lever.
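The invariance can be checked numerically. The sketch below is a minimal illustration, not the paper's proof. It takes the two closed forms stated above as given, reading k as the monitoring cost and b as the detection benefit. For the indifference check only, it assumes an undetected violation is worth 1 to the agent, a detected one costs x, and complying pays 0. The values k = 2 and b = 3 and all function names are hypothetical.

```python
from fractions import Fraction

def audit_rate(x):
    """Equilibrium audit probability m = 1/(1+x) for penalty severity x."""
    return Fraction(1) / (1 + Fraction(x))

def compliance_rate(k, b):
    """Equilibrium compliance p = k/b. Penalty severity x does not appear."""
    return Fraction(k) / Fraction(b)

def violation_payoff(m, x):
    """Agent's expected payoff from violating, ASSUMING a gain of 1 when
    unaudited and a loss of x when audited (complying pays 0)."""
    return (1 - m) * 1 - m * Fraction(x)

K, B = 2, 3  # hypothetical monitoring cost and detection benefit

rows = []
for x in (1, 4, 9, 99):
    m = audit_rate(x)
    rows.append((x, m, compliance_rate(K, B), violation_payoff(m, x)))

for x, m, p, payoff in rows:
    # A payoff of 0 means the agent is indifferent between violating and complying.
    print(f"x={x:>3}  m={m}  p={p}  violation payoff={payoff}")
```

Under these assumptions the audit rate falls from 1/2 to 1/100 as x grows from 1 to 99, the agent stays exactly indifferent at every x, and p never moves because x is not an input to it.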
Why It Matters
Auditing less often is not the same as making violations rarer. The equilibrium structure guarantees that penalty escalation achieves the first while leaving the second unchanged.
The dominant enterprise AI governance strategy — raise stakes, increase consequences, write stricter policies — is structurally incapable of changing the behavior it targets. Every dollar spent on harsher penalties buys reduced audit frequency, not reduced violations. The compliance rate is determined by the ratio of monitoring costs to detection benefits, not by penalty magnitude. This is not a minor technical finding. It means the governance playbook that most CIOs, risk officers, and AI ethics boards are executing is operating on the wrong lever.
The trust asymmetry compounds the problem: a single visible AI failure eliminates months of accumulated trust and investment, while successes generate only incremental confidence. Organizations operating under this asymmetry rationally over-invest in penalty architecture and under-invest in the three mechanisms that actually shift equilibrium behavior: randomized enforcement with credible penalties derived from costly-monitoring equilibria, ex-ante commitment devices that bind both agent and monitor before the game begins, and subgame-perfect equilibrium runbooks that constrain safety at every decision node.
The cost of ignoring this: organizations accumulate governance infrastructure that produces compliance theater while the actual violation rate remains structurally unchanged. They interpret the absence of detected violations as evidence of safety, when it is evidence of equilibrium.
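A small calculation shows why a falling count of detected violations is not evidence of safety. The sketch below is illustrative only. It assumes the agent violates with probability 1 - p in each of `periods` independent periods and that a violation is observed only if that period is audited, so the expected detected count is periods × m × (1 - p). The values k = 2, b = 3, and periods = 1200, and the function name, are hypothetical.

```python
from fractions import Fraction

def expected_counts(x, k, b, periods):
    """Return (expected actual violations, expected detected violations)
    under the assumed model: violate w.p. 1 - p, detect only when audited."""
    m = Fraction(1, 1 + x)   # audit rate m = 1/(1+x)
    p = Fraction(k, b)       # compliance rate p = k/b, independent of x
    actual = periods * (1 - p)
    detected = actual * m
    return actual, detected

for x in (1, 3, 9, 99):
    actual, detected = expected_counts(x, k=2, b=3, periods=1200)
    print(f"x={x:>3}  actual violations={actual}  detected={detected}")
```

With these numbers the expected actual violations stay at 400 for every x, while the expected detected count drops from 200 at x = 1 to 4 at x = 99. The dashboard improves; the behavior does not.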
Proof Points
- Two-player stage game: optimal audit rate m = 1/(1+x); equilibrium compliance p = k/b (quoted as 2/3 for the stage game), regardless of penalty magnitude
- Trust asymmetry: single failure eliminates months of accumulated trust — a specific instance of organizational-level loss aversion
- Three mechanisms that actually change equilibrium: randomized enforcement, commitment devices, subgame-perfect runbooks
- Directly challenges the EU AI Act, the NIST AI RMF, and enterprise governance frameworks on structural grounds
- Cross-domain validation: invariance results replicate findings from tax enforcement (Satterthwaite, Minnesota TCMP data), financial regulation, and arms control inspection games
- Published on SSRN
Novel Research Contribution
The central contribution is the formal proof that the dominant governance strategy operates on the wrong lever. The equilibrium analysis is precise: m = 1/(1+x), p = k/b. Prior work in AI governance (Floridi, Jobin et al.) prescribes rules and penalties without modeling the game structure those instruments create. Technical alignment work (Amodei, Christiano et al.) addresses training-time safety but not deployment-time strategic interaction. This paper shows that the game structure determines behavior regardless of the rules or the alignment quality. The three alternative mechanisms are constructive — specific architectures with formal properties, not recommendations.
Target venue: Management Science or Journal of Artificial Intelligence Research (JAIR)
Extends: Shapiro-Stiglitz costly monitoring, Avenhaus-von Stengel-Zamir inspection games, Kahneman-Tversky prospect theory (organizational application)
Challenges: EU AI Act compliance architecture, NIST AI RMF, technical alignment community's assumption that safety is a training-time property
Market Position and IP
This paper invalidates the governance playbook that most CIOs, risk officers, and AI ethics boards are executing. The formal result — compliance is invariant to penalty severity — is a structural finding, not a critique. It redirects governance investment from penalty architecture to the three mechanisms that change behavior. No competing analysis provides formal equilibrium models of AI governance with constructive alternatives. The window for redesigning governance — before agentic AI learns to exploit governance theater — is closing.
Connections
- Related paper: Exploit-Proofing Problem (control architectures)
- Imperatives: Exploit-Proofing Triad, Constraint Surface Governance
- Builds: AgentOS
- Frameworks: Exploit-Proofing Frameworks
- Capabilities: Agentic System of Systems, Financial Value Creation