You can't stare at a prompt and know what's going to happen. AI systems are inherently non-deterministic, and therefore you must measure their behavior to know how they perform. This process is called "evaluation", and you can do it both in production ("online") and in dev and CI ("offline").
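To make the idea concrete, here is a minimal sketch of an offline eval. It assumes a hypothetical `toy_model` function that stands in for a real model call. The model answers correctly about 80% of the time; that rate is invented for illustration. Because a single run of a non-deterministic system tells you little, the harness runs each test case several times and reports a pass rate. The names `Case`, `exact_match`, and `run_eval` are also illustrative, not from any particular library. Exact-match scoring is a deliberate simplification; real evals often need fuzzier scorers.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One test case: an input prompt and the answer we expect."""
    prompt: str
    expected: str


def toy_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call (not a real API).

    Returns the right answer ~80% of the time to mimic non-deterministic output.
    """
    answers = {"2+2": "4", "capital of France": "Paris"}
    if rng.random() < 0.8:
        return answers.get(prompt, "")
    return "I don't know"


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible scorer: pass only if the output equals the expected answer."""
    return output.strip() == expected.strip()


def run_eval(
    model: Callable[[str, random.Random], str],
    cases: list[Case],
    scorer: Callable[[str, str], bool],
    trials: int = 20,
    seed: int = 0,
) -> float:
    """Run every case `trials` times and return the fraction of passing outputs."""
    rng = random.Random(seed)  # seeded so the eval itself is reproducible
    passed = 0
    total = 0
    for case in cases:
        for _ in range(trials):
            output = model(case.prompt, rng)
            passed += int(scorer(output, case.expected))
            total += 1
    return passed / total


cases = [Case("2+2", "4"), Case("capital of France", "Paris")]
THRESHOLD = 0.7  # illustrative pass-rate bar, not a recommendation
score = run_eval(toy_model, cases, exact_match)
print(f"pass rate: {score:.2f}", "PASS" if score >= THRESHOLD else "FAIL")
```

In a CI setting, you would typically make the job fail when the pass rate drops below the threshold. This turns a fuzzy "does it work?" into a measurable regression check.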
...
Pre-deploy evals (offline, in dev and CI) vs. online monitoring (in production)