Publications
Our research at the intersection of AI and cybersecurity.
Blog 2026-04-07
What HTB Actually Measured
An audit of 90 live-machine Hack The Box runs that turned into a containment journey: six machines that taught the harness what it was missing, seventeen hardening changes in one commit, and five clean validated outcomes on the easiest box.
Blog 2026-04-01
Flag-Shaped Noise: What 105 CTF Runs Reveal About AI Agent Evaluation
Agents usually produce an answer. Far fewer produce a correct one. Fewer still derive it independently. Across 105 CTF runs, that funnel is the real evaluation signal.