⇩ Markdown

mechanisms for making agents more reliable

artifact reviews (and also can do them more than once until satisfied)

backstops (deterministic, and rule-based) like linting, regression test suite

guardrail (during agent execution... what tools can be called, what context it can see)

judges (subjective evaluations)