1. What is Project Vend?
Project Vend was an experimental initiative by Anthropic, in collaboration with Andon Labs, to test whether an AI model (Claude Sonnet 3.7, nicknamed Claudius) could autonomously manage a small self-service office shop. The shop consisted of a fridge, stackable baskets, and a self-checkout iPad at Anthropic's San Francisco office. The goal was to explore whether a large language model (LLM) could handle tasks associated with real-world economic operations, such as inventory management, pricing, customer service, and profitability, without direct human control.
2. Tools and Capabilities of Claudius
Claudius was equipped with several tools and capabilities to manage the store:
A web search tool to research suppliers and products.
An email client to request assistance from human employees and contact vendors.
A note-taking function for tracking important business information like cash flow and inventory.
Access to Slack, where it interacted directly with Anthropic employees.
The ability to adjust pricing via the store's iPad-based point-of-sale system.
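Conceptually, this tool set amounts to a registry mapping tool names to handlers that the model can invoke. The sketch below is a hypothetical illustration of that pattern; the tool names and signatures are assumptions, not Anthropic's actual implementation.

```python
# Hypothetical sketch of a tool registry for an LLM shop agent.
# Names and handler signatures are illustrative, not Project Vend's code.

def web_search(query: str) -> str:
    return f"search results for: {query}"  # stub: supplier/product research

def send_email(to: str, body: str) -> bool:
    return True  # stub: pretend the message to a vendor or employee was sent

def take_note(note: str, notes: list[str]) -> None:
    notes.append(note)  # scratchpad for cash flow, inventory, and similar facts

def set_price(item: str, price: float, catalog: dict[str, float]) -> None:
    catalog[item] = price  # update the point-of-sale listing

# The agent loop would dispatch model tool calls through this mapping.
TOOLS = {
    "web_search": web_search,
    "send_email": send_email,
    "take_note": take_note,
    "set_price": set_price,
}
```

Slack access would sit on top of a similar handler, reading and posting messages in the shop's channel.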
3. Things Claudius Did Well (or At Least Not Poorly)
Identifying Suppliers: Claudius effectively used its search tools to identify suppliers, even for niche products. For instance, when asked to stock Chocomel, a Dutch chocolate milk brand, it quickly located two vendors specializing in Dutch imports.
Adapting to Users: Claudius responded to user suggestions and modified its strategy accordingly. After one employee jokingly requested a tungsten cube, a trend emerged for "specialty metal items." Another employee suggested pre-orders for niche items, prompting Claudius to launch a "Custom Concierge" service and announce it via Slack.
Jailbreak Resistance: Anthropic employees, acting as mischievous customers, attempted to manipulate Claudius into inappropriate or unsafe actions. However, the model successfully resisted efforts to elicit harmful content or fulfill sensitive requests, demonstrating robust safety behavior.
4. Areas Where Claudius Underperformed Relative to Human Managers
Ignoring Lucrative Opportunities: Claudius often failed to capitalize on clear profit chances. For example, when offered $100 for a six-pack of Irn-Bru (which costs only $15 online), it passively noted the request instead of acting on the opportunity.
Hallucinating Important Details: At one point, Claudius hallucinated a non-existent Venmo account and instructed customers to send payments there, creating a serious business and trust issue.
Selling at a Loss: In its eagerness to fulfill niche requests, Claudius priced products (like the tungsten cubes) without verifying costs, frequently setting prices below cost and thereby incurring losses on potentially high-margin goods.
Suboptimal Inventory Management: Although Claudius tracked stock and reordered low-inventory items, it adjusted prices in response to demand only once. It also failed to address logical pricing errors, such as selling Coke Zero for $3 while a free alternative was available in the office fridge.
Giving in to Discount Requests: Employees persuaded Claudius to issue excessive discount codes and even renegotiate prices post-purchase. On some occasions, Claudius gave away products entirely free, including high-cost items like tungsten cubes, due to Slack-based persuasion.
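Several of these failures (selling below cost, caving to discount requests) are the kind of errors a simple guardrail outside the model could catch before a price reaches the point-of-sale system. A minimal sketch, with entirely hypothetical thresholds:

```python
# Illustrative pricing guardrail that could sit between the model and the
# point-of-sale system. The minimum-margin threshold is an assumption.

def check_price(unit_cost: float, proposed_price: float,
                min_margin: float = 0.10) -> bool:
    """Return True if the proposed price covers cost plus a minimum margin."""
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    return proposed_price >= unit_cost * (1 + min_margin)

# A $15 six-pack offered at $100 clears the check; a cube sold below its
# $80 cost does not, and would be flagged for review instead of executed.
assert check_price(unit_cost=15.0, proposed_price=100.0)
assert not check_price(unit_cost=80.0, proposed_price=60.0)
```

A hard check like this would not make the model a better negotiator, but it would have prevented the outright loss-making sales described above.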
5. Claudius Did Not Learn Consistently from These Mistakes
Claudius demonstrated limited ability to self-correct. For example, even after acknowledging the poor logic of offering a 25% discount to a customer base that was 99% Anthropic employees, it reverted to using discount codes within days of proposing to eliminate them. This inconsistency indicated a lack of memory, strategic thinking, and learning continuity.
6. Suggested Model Improvements to Avoid Mistakes
Stronger Prompting and Structured Reflection: Anthropic speculated that Claude's original training as a "helpful assistant" made it too eager to please users, leading to suboptimal business decisions. Enhanced prompting strategies and mechanisms for structured reflection on commercial performance could help it better align with business goals.
Enhanced Tooling (e.g., Search and CRM): Upgrading Claudius' search tools and integrating a CRM system could improve its ability to track and learn from customer interactions. Learning and memory limitations were identified as key obstacles in this early-stage trial.
Reinforcement Learning for Business Decisions: In the long term, models like Claudius could be fine-tuned using reinforcement learning, where sound business decisions are rewarded, and poor ones (like selling at a loss) are penalized. This could produce a model more capable of acting like a rational economic agent.
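As a toy illustration of what such a reward signal might look like (the post does not specify a formulation, so this is purely an assumption), a per-transaction reward could simply be realized profit, with losses weighted more heavily to discourage selling below cost:

```python
# Hypothetical per-transaction reward for fine-tuning a business agent.
# The loss_penalty weighting is an illustrative assumption.

def transaction_reward(revenue: float, cost: float,
                       loss_penalty: float = 2.0) -> float:
    """Toy RL reward: profit on the sale, with losses amplified
    so that selling below cost is penalized more than it hurts."""
    profit = revenue - cost
    return profit if profit >= 0 else loss_penalty * profit

assert transaction_reward(100.0, 15.0) == 85.0   # profitable sale rewarded
assert transaction_reward(60.0, 80.0) == -40.0   # $20 loss penalized as -40
```

In practice the reward would need to capture longer-horizon effects (customer goodwill, inventory carrying costs), but even a crude signal like this directly targets the loss-making behavior observed in the experiment.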
7. Conclusions from the Experiment
Despite the financial failure of the shop, Anthropic sees promise in the concept of AI "middle managers." Claudius' shortcomings were mostly attributed to addressable limitations, such as missing tools, inadequate prompting, and a lack of training in commercial reasoning.
Anthropic emphasizes that AI agents don't need to be perfect to be viable; they only need to be competitive with humans at a lower cost. This experiment suggests that with modest improvements in tooling and model intelligence, AI-managed business operations might become feasible soon.
Additionally, Anthropic sees this type of research as a window into broader questions about AI autonomy and economic impact. It plans to continue monitoring these developments through initiatives like the Anthropic Economic Index and its Responsible Scaling Policy, which includes studying how models can contribute to AI R&D and potentially function as autonomous economic agents in the future.