performance benchmark -- Towers of Hanoi

Nice because the agent can describe deterministic state changes and the final state is known in advance, so verifiability is high (and fast). Also very easy to scale: just add disks.
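
A rough sketch of what the check could look like, assuming the agent's answer is parsed into a list of `(source_peg, destination_peg)` pairs with pegs numbered 0 to 2. The names `solve` and `verify`, the peg numbering, and the move format are all illustrative assumptions, not part of any existing harness.

```python
from typing import List, Tuple

# Hypothetical move format: (source peg, destination peg), pegs numbered 0..2.
Move = Tuple[int, int]


def solve(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Reference move list from the classic recursion (2**n - 1 moves)."""
    if n == 0:
        return []
    return solve(n - 1, src, aux, dst) + [(src, dst)] + solve(n - 1, aux, dst, src)


def verify(n: int, moves: List[Move]) -> bool:
    """Replay the moves from the start state.

    Returns True only if every move is legal and all n disks end on peg 2.
    """
    pegs = [list(range(n, 0, -1)), [], []]  # peg 0 holds n..1, largest at the bottom
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False  # malformed move
        if not pegs[src]:
            return False  # nothing to pick up
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))
```

Replaying costs one step per move, so the check stays fast relative to producing the answer. Scaling is one parameter: the shortest solution for `n` disks has `2**n - 1` moves, so each added disk roughly doubles the length of the sequence the agent has to get right.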