eval models can be less powerful than application models