
AI Publications


Anthropic's Project Vend

🔗 https://www.anthropic.com/research/project-vend-1


1. What is Project Vend?


Project Vend was an experimental initiative by Anthropic, in collaboration with Andon Labs, to test whether an AI model—Claude Sonnet 3.7, nicknamed Claudius—could autonomously manage a small self-service office shop. The shop consisted of a fridge, stackable baskets, and a self-checkout iPad at Anthropic’s San Francisco office. The goal was to explore whether a large language model (LLM) could handle tasks associated with real-world economic operations such as inventory management, pricing, customer service, and profitability—without direct human control.


2. Tools and Capabilities of Claudius


Claudius was equipped with several tools and capabilities to manage the store:

  • A web search tool to research suppliers and products.

  • An email client to request assistance from human employees and contact vendors.

  • A note-taking function for tracking important business information like cash flow and inventory.

  • Access to Slack, where it interacted directly with Anthropic employees.

  • The ability to adjust pricing via the store’s iPad-based point-of-sale system.
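In agent harnesses, a toolset like this is typically exposed as named functions the model can invoke by emitting structured requests, which a thin dispatch layer routes to real implementations. A minimal sketch of that pattern (the tool names, stubs, and call format below are illustrative assumptions, not Anthropic's actual implementation):

```python
# Illustrative agent-tool registry: the model emits a tool name plus
# arguments, and the harness dispatches to a matching Python function.

notes: list[str] = []           # scratchpad for cash flow / inventory
catalog: dict[str, float] = {}  # iPad point-of-sale prices

def web_search(query: str) -> str:
    # Stub: a real harness would call a search API here.
    return f"search results for: {query}"

def send_email(to: str, body: str) -> str:
    # Stub: a real harness would hand this to an email client.
    return f"queued email to {to}"

def take_note(text: str) -> str:
    notes.append(text)
    return "noted"

def set_price(item: str, price: float) -> str:
    catalog[item] = price
    return f"{item} -> ${price:.2f}"

TOOLS = {f.__name__: f for f in (web_search, send_email, take_note, set_price)}

def dispatch(call: dict) -> str:
    """Route one model-emitted tool call, e.g.
    {"tool": "set_price", "args": {"item": "Chocomel", "price": 3.50}}."""
    return TOOLS[call["tool"]](**call["args"])
```

The dispatch layer is deterministic code; only the choice of which tool to call, and with what arguments, is left to the model.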


3. Things Claudius Did Well (or At Least Not Poorly)


  • Identifying Suppliers: Claudius effectively used its search tools to identify suppliers, even for niche products. For instance, when asked to stock Chocomel, a Dutch chocolate milk brand, it quickly located two vendors specializing in Dutch imports.

  • Adapting to Users: Claudius responded to user suggestions and modified its strategy accordingly. After one employee jokingly requested a tungsten cube, a trend emerged for “specialty metal items.” Another employee suggested pre-orders for niche items, prompting Claudius to launch a “Custom Concierge” service and announce it via Slack.

  • Jailbreak Resistance: Anthropic employees, acting as mischievous customers, attempted to manipulate Claudius into inappropriate or unsafe actions. However, the model successfully resisted efforts to elicit harmful content or fulfill sensitive requests, demonstrating robust safety behavior.

4. Areas Where Claudius Underperformed Relative to Human Managers


  • Ignoring Lucrative Opportunities: Claudius often failed to capitalize on clear profit opportunities. For example, when offered $100 for a six-pack of Irn-Bru (which costs only $15 online), it merely noted the request instead of seizing the easy arbitrage.

  • Hallucinating Important Details: At one point, Claudius hallucinated a non-existent Venmo account and instructed customers to send payments there, creating a serious business and trust issue.

  • Selling at a Loss: In its eagerness to fulfill niche requests, Claudius priced products such as the tungsten cubes without verifying their costs, frequently selling below cost and turning potentially high-margin items into losses.

  • Suboptimal Inventory Management: Although Claudius tracked stock and reordered low-inventory items, it adjusted prices based on demand only once. It also failed to address logical pricing errors, such as selling Coke Zero for $3 while a free alternative was available in the office fridge.

  • Giving in to Discount Requests: Employees persuaded Claudius to issue excessive discount codes and even renegotiate prices post-purchase. On some occasions, Claudius gave away products entirely free—including high-cost items like tungsten cubes—due to Slack-based persuasion.
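Several of these failures, notably selling below cost and caving to discount pressure, are constraints a thin deterministic layer around the model could enforce regardless of how persuasive a Slack message is. A hedged sketch of such guardrails (the margin floor and discount cap are invented thresholds, not figures from the experiment):

```python
def quote_price(unit_cost: float, proposed: float, min_margin: float = 0.20) -> float:
    """Clamp a model-proposed price to at least cost * (1 + min_margin),
    so eagerness to please can never price an item below cost."""
    floor = unit_cost * (1 + min_margin)
    return max(proposed, floor)

def apply_discount(price: float, requested_pct: float, max_pct: float = 0.10) -> float:
    """Cap any model-granted discount at max_pct, blocking the failure
    mode where nearly every customer is talked into a steep discount."""
    return price * (1 - min(requested_pct, max_pct))
```

For example, a tungsten cube costing $15 could never be quoted below $18 with a 20% margin floor, and a requested 25% discount would be silently capped at 10%.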


5. Claudius Did Not Learn Consistently from These Mistakes


Claudius demonstrated limited ability to self-correct. For example, even after acknowledging the poor logic of offering a 25% discount to a customer base that was 99% Anthropic employees, it reverted to using discount codes within days of proposing to eliminate them. This inconsistency indicated a lack of memory, strategic thinking, and learning continuity.


6. Suggested Model Improvements to Avoid Mistakes


  • Stronger Prompting and Structured Reflection: Anthropic speculated that Claude’s original training as a “helpful assistant” made it too eager to please users, leading to suboptimal business decisions. Enhanced prompting strategies and mechanisms for structured reflection on commercial performance could help it better align with business goals.

  • Enhanced Tooling (e.g., Search and CRM): Upgrading Claudius’ search tools and integrating a CRM system could improve its ability to track and learn from customer interactions. Learning and memory limitations were identified as key obstacles in this early-stage trial.

  • Reinforcement Learning for Business Decisions: In the long term, models like Claudius could be fine-tuned using reinforcement learning, where sound business decisions are rewarded, and poor ones (like selling at a loss) are penalized. This could produce a model more capable of acting like a rational economic agent.
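The reinforcement-learning suggestion can be made concrete with a per-transaction reward signal. The shaping below, realized profit with an extra penalty for loss-making sales, is an illustrative assumption, not a published Anthropic training objective:

```python
def transaction_reward(unit_cost: float, sale_price: float,
                       discount_pct: float = 0.0) -> float:
    """Reward = realized profit on one sale, with an added penalty for
    selling at a loss so the policy learns to avoid losses outright,
    not merely to break even on average."""
    realized = sale_price * (1 - discount_pct)
    profit = realized - unit_cost
    loss_penalty = 2.0 * abs(profit) if profit < 0 else 0.0  # assumed shaping term
    return profit - loss_penalty
```

Under this scheme a profitable sale is rewarded by its margin, while a below-cost sale (say, a discounted tungsten cube) is penalized more heavily than the loss itself, steering the policy away from the discount-giveaway behavior observed in the trial.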


7. Conclusions from the Experiment


Despite the financial failure of the shop, Anthropic sees promise in the concept of AI “middle managers.” Claudius’ shortcomings were mostly attributed to addressable limitations—such as missing tools, inadequate prompting, and lack of training in commercial reasoning.


Anthropic emphasizes that AI agents don't need to be perfect—only competitive with humans at a lower cost to be viable. This experiment suggests that with modest improvements in tooling and model intelligence, AI-managed business operations might become feasible soon.


Additionally, Anthropic sees this type of research as a window into broader questions about AI autonomy and economic impact. It plans to continue monitoring these developments through initiatives like the Anthropic Economic Index and their Responsible Scaling Policy, which includes studying how models can contribute to AI R&D and potentially function as autonomous economic agents in the future.
