⇩ Markdown

considering dark software factories→dark software factory -- key media→blog post - software factories and the agentic moment - 2026-02→validation naming is hard→why give evals a different name than just tests?→dim - comparison exactness

dim - comparison exactness

link not tracked

link not tracked

Backlinks

why give evals a different name than just tests?

Unlike most traditional tests, you need to decide the level of exactness that you care about. Often there is more than one acceptable answer. Or worse, the answers are not eval type -- verifiable, so you need to get LLM as a judge involved.
...
dim - comparison exactness

see in context