Codex works on a fully isolated version of that app—including its logs and metrics, which get torn down once that task is complete. Agents can query logs with LogQL and metrics with PromQL. With this context available, prompts like “ensure service startup completes in under 800ms” or “no span in these four critical user journeys exceeds two seconds” become tractable.
...
The part I missed was that the agent has specific tools for querying the logs and metrics, which lets it do more with them than just reading the raw output.
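To make that concrete for myself, here is a minimal sketch of what a "startup under 800ms" check could look like as a metrics query instead of a log read. It assumes a Prometheus-compatible server and uses the standard instant-query endpoint (`/api/v1/query`). The server address, the metric name `service_startup_duration_seconds`, and all of the helper names are hypothetical. They are mine, not anything the post describes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Prometheus-compatible server reachable at this address.
PROM_URL = "http://localhost:9090"

# Assumption: a hypothetical gauge recording startup time in seconds.
STARTUP_QUERY = "max by (service) (service_startup_duration_seconds)"


def instant_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def max_sample(response):
    """Return the largest sample value in an instant-vector response, or None if empty."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...}, "value": [timestamp, "value-as-string"]}.
    values = [float(sample["value"][1]) for sample in data["result"]]
    return max(values) if values else None


def within_budget(response, budget_seconds):
    """True only if at least one sample exists and every sample is under the budget."""
    worst = max_sample(response)
    return worst is not None and worst < budget_seconds


def check_startup(budget_seconds=0.8):
    """Run the query against the live server and compare it to the budget."""
    url = instant_query_url(PROM_URL, STARTUP_QUERY)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return within_budget(json.load(resp), budget_seconds)
```

The point of the sketch is that the prompt's threshold becomes a boolean the agent can re-check after every change, instead of something it has to infer from scrolling through log output. A LogQL check against Loki would presumably have the same shape, with a different endpoint and query.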
...
Their emphasis on logs and metrics makes me think about blog post - the three pillars of AI observability - 2025-11 and blog post - AI will make formal verification go mainstream. It also makes me think about one of the specialized tools I wrote for meadow, which is essentially a Terminal Ui - TUI that ensures that everything gets cleanly committed as I use the app. Like company - microsoft