Evals are more closely tied to experimentation than traditional tests are, or at least closer to TDD than to a regression test suite. For example, the evals persona (domain experts working in an integrated prompting environment) can tweak evals directly, and evals can be co-evolved with prompts. Because evals need this kind of tweaking, it is important that they can be triggered manually within the app context.