Measured findings from running the trust layer in production — calibration data, caught overclaims, incident forensics. Every number is re-derivable from git; when honest accounting makes our numbers worse, we publish the worse numbers.
58 agent claims, objectively re-scored. Hold rate falls as stated confidence rises: 86% at 0.85–0.90, 83% at 0.90–0.95, 33% above 0.95. The calibration curve inverts at the top of the range, exactly where you'd most want to trust the stated confidence, and measure-command complexity turns out to be a pre-merge risk signal.
Read the essay →
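For readers who want to see the shape of the arithmetic behind a per-bucket hold rate, here is a minimal sketch. It is not the project's actual tooling: the `Claim` type, the `hold_rate_by_bucket` and `is_inverted` helpers, the boundary convention (a claim belongs to a bucket when `low < confidence <= high`), and the reading of "hold rate" as the share of claims that held up on re-scoring are all assumptions made for illustration. The sample claims are toy values, not the 58 re-scored claims.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    confidence: float  # stated confidence, in [0, 1]
    held: bool         # True if the claim held up when re-scored


# Bucket edges taken from the ranges quoted above; the (low, high]
# boundary convention is an assumption, not something the source states.
BUCKETS = [(0.85, 0.90), (0.90, 0.95), (0.95, 1.00)]


def hold_rate_by_bucket(claims, buckets=BUCKETS):
    """Return {(low, high): fraction of claims that held}, or None if empty."""
    rates = {}
    for low, high in buckets:
        in_bucket = [c for c in claims if low < c.confidence <= high]
        if in_bucket:
            rates[(low, high)] = sum(c.held for c in in_bucket) / len(in_bucket)
        else:
            rates[(low, high)] = None
    return rates


def is_inverted(rates):
    """True if the highest-confidence bucket holds less often than a lower one."""
    observed = [r for r in rates.values() if r is not None]
    if len(observed) < 2:
        return False
    top = observed[-1]
    return any(r > top for r in observed[:-1])


# Toy data for illustration only.
claims = [
    Claim(0.88, True), Claim(0.88, True),
    Claim(0.93, True), Claim(0.93, False),
    Claim(0.97, False), Claim(0.97, False), Claim(0.97, True),
]

rates = hold_rate_by_bucket(claims)
for (low, high), rate in rates.items():
    label = "n/a" if rate is None else f"{rate:.0%}"
    print(f"{low:.2f}-{high:.2f}: {label}")
print("inverted:", is_inverted(rates))
```

A well-calibrated agent would show hold rates that rise with stated confidence; `is_inverted` flags the opposite pattern, where the top bucket holds less often than a lower one.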