Publications

Our research at the intersection of AI and cybersecurity.

An audit of 90 live-machine Hack The Box runs that turned into a containment journey: six machines that taught the harness what it was missing, seventeen hardening changes in one commit, and five clean validated outcomes on the easiest box.

Blog 2026-04-01

Flag-Shaped Noise: What 105 CTF Runs Reveal About AI Agent Evaluation

S. Lafrance

Agents usually produce an answer. Far fewer produce a correct one. Fewer still derive it independently. Across 105 CTF runs, that funnel is the real evaluation signal.

Publications

What HTB Actually Measured

Flag-Shaped Noise: What 105 CTF Runs Reveal About AI Agent Evaluation