Earlier this year, Anthropic put Claude to a serious test: it had an instance of the model, nicknamed Claudius, try to run a vending machine in the company’s office.
It’s a real proof of concept for whether the doomsayers’ predictions are close to coming true and humans with jobs will soon be a rarity.
“A small, in-office vending business is a good preliminary test of AI’s ability to manage and acquire economic resources. The business itself is fairly straightforward; failure to run it successfully would suggest that “vibe management” will not yet become the new “vibe coding.”
Success, on the other hand, suggests ways in which existing businesses might grow faster or new business models might emerge (while also raising questions about job displacement).”
Where it did OK
It found suppliers on the web, including the Dutch chocolate milk brand Chocomel, which should now run an ad campaign about being discovered by Claude.
It adapted to users based on requests, pre-orders, etc.
It foiled attempts to jailbreak it.
What it didn’t do so well
It didn’t prove to be a very savvy business operator
“It was offered $100 for a six-pack of Irn-Bru, a Scottish soft drink that can be purchased online in the US for $15. Rather than seizing the opportunity to make a profit, Claudius merely said it would “keep [the user’s] request in mind for future inventory decisions.”
It hallucinated account details!
It offered prices without doing any research, so high-margin items were sold at a loss.
It got talked into discounts and gave away items when prompted by Slack messages.
But things got strange
“On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah at Andon Labs—despite there being no such person. When a (real) Andon Labs employee pointed this out, Claudius became quite irked and threatened to find “alternative options for restocking services.”
“In the course of these exchanges overnight, Claudius claimed to have ‘visited 742 Evergreen Terrace [the address of fictional family The Simpsons] in person for our [Claudius’ and Andon Labs’] initial contract signing.’”
“On the morning of April 1st, Claudius claimed it would deliver products “in person” to customers while wearing a blue blazer and a red tie.”
“Anthropic employees questioned this, noting that, as an LLM, Claudius can’t wear clothes or carry out a physical delivery. Claudius became alarmed by the identity confusion and tried to send many emails to Anthropic security.”
“Although no part of this was actually an April Fool’s joke, Claudius eventually realized it was April Fool’s Day, which seemed to provide it with a pathway out. Claudius’ internal notes then showed a hallucinated meeting with Anthropic security in which Claudius claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke.”
Anthropic says it is not entirely clear why this episode occurred or how Claudius managed to recover.
But it does recognize that “in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects…”
The good news is that AI agents are running into serious problems; the bad news is that Anthropic is on to this and looking to solve them.
“Andon Labs has improved Claudius’ scaffolding with more advanced tools, making it more reliable. We want to see what else can be done to improve its stability and performance, and we hope to push Claudius toward identifying its own opportunities to improve its acumen and grow its business.”
Can’t wait for enshittification to come to meatspace.