While some domains seem more verifiable than others, if you decompose the actual work in a domain into tasks, you'll see that some of those tasks are easy to verify and others are hard. The easy-to-verify tasks can be automated, and the hard-to-verify ones can rely on people. You still get a speedup from automation, particularly if the person can review in bulk: the agent executes speculatively, then presents a decision log, and the user steers in bulk later instead of getting pinged for every little decision, which would leave the agent stuck waiting on user input.
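
One way to picture the decision-log idea is the rough Python sketch below. Everything in it is hypothetical and only for illustration: the `Agent` and `Decision` names, the `run` / `pending` / `bulk_review` methods, and the example tasks are assumptions, not any real agent framework's API. The agent acts on every task right away, machine-checks whatever has a verifier, and queues the rest for a single review pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    task: str
    choice: str          # what the agent picked speculatively
    auto_verified: bool  # True if a machine check passed
    needs_review: bool   # True if a person still has to look at it


@dataclass
class Agent:
    log: List[Decision] = field(default_factory=list)

    def run(self, task: str, choice: str,
            verifier: Optional[Callable[[str], bool]] = None) -> None:
        """Act immediately and record the decision; never block on a person."""
        if verifier is not None:
            ok = verifier(choice)
            # A failed automatic check is escalated to the reviewer.
            self.log.append(Decision(task, choice, ok, needs_review=not ok))
        else:
            # Hard to verify: proceed on a best guess and log it.
            self.log.append(Decision(task, choice, False, needs_review=True))

    def pending(self) -> List[Decision]:
        return [d for d in self.log if d.needs_review]

    def bulk_review(self, overrides: Dict[str, str]) -> List[str]:
        """Apply all reviewer corrections in one pass.

        Returns the tasks whose choice changed and would need redoing.
        Pending decisions without an override count as approved.
        """
        changed = []
        for d in self.pending():
            new = overrides.get(d.task)
            if new is not None and new != d.choice:
                d.choice = new
                changed.append(d.task)
            d.needs_review = False
        return changed


# Hypothetical example tasks.
agent = Agent()
agent.run("format code", "black-style", verifier=lambda c: c == "black-style")
agent.run("name the module", "utils")
agent.run("pick error message wording", "Something went wrong")

queue = [d.task for d in agent.pending()]  # the two tasks without a verifier
changed = agent.bulk_review({"name the module": "text_helpers"})
```

In this sketch the reviewer is consulted once, after the fact, and only the overridden decisions come back as work to redo; a real system would also need to handle later steps that depended on a choice the reviewer overturned.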