The intern who never sleeps
In the spring of 2025, a logistics manager handed an AI agent her calendar, her inbox, and a company card and told it to plan the quarterly offsite. It booked the venue, talked the caterer down, rebooked three flights when a storm closed Denver, and filed the expense report. Her cost was forty minutes of review. A six-year-old who cannot spell function shipped a working video game the same month by telling an agent what she wanted. The capability is not arriving. It is here, and most of the time it is boring in the way useful things are boring.
Now the other scene. A support agent at a different company invented a refund policy that did not exist and promised it, in writing, to a customer. When the bill came due, the company argued the bot was a thing apart, responsible for its own words. A Canadian tribunal disagreed: in Moffatt v. Air Canada, it ruled that you own what your machine says. The agent had been confident, fluent, and wrong — the three traits it is best at combining.
The same tool, four verdicts
That gap, between the offsite that ran itself and the policy that never existed, is where the argument lives. The agent builders see a tool that already works, slowed mostly by people too nervous to hand over the keys. The reliability engineers see what happens when you chain ten steps that each work 95 percent of the time: the whole run comes through clean only about 60 percent of the time, not much better than a coin flip. Gartner spent 2025 tallying enterprise agent pilots that died when the demo met the edge cases.
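The reliability engineers' arithmetic is easy to check. The sketch below is illustrative only: the function name `chain_success_rate` is made up for this example, and it assumes each step succeeds or fails independently of the others, which real agent pipelines may not honor.

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes each step succeeds independently with the same probability,
    a simplification made for illustration.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent successes multiply, so reliability decays geometrically.
    return per_step ** steps


if __name__ == "__main__":
    # Print the end-to-end success rate for chains of increasing length.
    for steps in (1, 5, 10, 14, 20):
        rate = chain_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% each: {rate:.1%} end to end")
```

Under that assumption, ten steps at 95 percent each come out near 60 percent, and the chain drops below an even coin flip at fourteen steps.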
Two other camps care less about what the agent can do than about what happens when it errs. The accountability hawks want one question settled before anything ships: when it acts wrong, who pays? The human-in-the-loop pragmatists, who run most of the agents actually deployed, have answered it in practice. The agent drafts, a person signs, and the signature is the whole point.
Underneath the noise everyone concedes the same two facts: the agents are genuinely capable, and genuinely capable of being confidently wrong. The fight is over what follows. Is the reliability gap a temporary engineering problem the next model closes, or a standing property of systems that predict rather than know? Is the accountability vacuum a reason to wait, or a reason to write better contracts and move?
The tiebreaker is not a study. It is a docket. Liability law does not wait for the field to agree, and the first rulings — who paid when the agent erred — are being written now, case by case, by judges who never opened the model card.
An AI agent planned a company offsite and filed the expenses; the human cost was forty minutes of review. Another invented a refund policy that never existed, and the company got the bill. Same tool. Whether you can trust it to act for you depends on who is standing there when it is confidently wrong.
Perspectives:
- Agent builders
- Reliability engineers
- Accountability hawks
- Human-in-the-loop pragmatists