🔗 https://www.oneusefulthing.org/p/real-ai-agents-and-real-work
By Ethan Mollick
1. Introduction
Mollick opens by pointing out that AI has quietly crossed a threshold: it can now complete tasks with real economic value. He cites a recent OpenAI evaluation in which expert practitioners in fields such as law, finance, and retail created tasks that would typically take a human 4–7 hours to finish. In blind judging, AI came very close to matching expert work, falling short mainly on formatting, instruction following, and polish. The implication: AI is not far from being able to perform meaningful work, at least at the task level.
However, Mollick cautions that doing tasks is not the same as capturing a job. Jobs are bundles of tasks, some of them deeply social, contextual, or requiring judgment over long time spans. Even if AI takes over many task types, human roles may shift rather than disappear. The art is in identifying which segments of work are automatable and which remain inherently human.
2. A Very Valuable Task
To illustrate, Mollick describes an experiment he ran: giving Claude (a recent frontier model from Anthropic) a complex economics paper and its dataset, and asking it to replicate the paper’s findings. Without step-by-step guidance, Claude translated the replication code, worked through the statistics, and generated results that passed human spot checks. What once demanded hours of domain expertise can now be delegated to AI, opening a path to automating labor-intensive academic tasks like replication and verification.
He notes that replication work—often under-resourced and tedious—has long been a weak point in many scientific disciplines. If AI can take it on reliably, it could transform how research is validated, audited, and even extended. While the results are not perfect, they signal that domains we thought were too niche or complex for AI may now be within reach.
3. Agents at the Heart of It All
Mollick dives into what enables this step change: AI agents. Unlike simple prompt-based systems that require constant human steering, agents can plan, chain steps, and use tools (search, code, external APIs). Because an agent must succeed at every step of a chain for the whole task to succeed, reliability compounds: small improvements in per-step error rates yield exponential gains in what agents can reliably do.
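To make the compounding concrete, here is a minimal sketch in plain Python (the step counts and per-step reliabilities are illustrative numbers, not figures from the post):

```python
# Probability that an n-step agent task succeeds end to end,
# assuming each step succeeds independently with probability p.
def chain_success(p: float, n_steps: int) -> float:
    return p ** n_steps

# A small bump in per-step reliability produces a large jump
# in whole-task reliability as the chain gets longer.
for p in (0.90, 0.95, 0.99):
    for n in (5, 10, 20):
        print(f"per-step {p:.2f}, {n:2d} steps -> {chain_success(p, n):6.1%}")
```

On these toy numbers, a 20-step task goes from roughly 12% end-to-end success at 90% per-step reliability to about 82% at 99%: that is the exponential payoff Mollick is pointing at.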
He also points to a benchmark that tracks the length of task an AI can complete autonomously with at least a 50% success rate; measured from GPT-3 onward, that horizon has kept growing. Agents are no longer fringe. They can tackle multi-step pipelines with minimal oversight, serving as the backbone of real, productive AI systems.
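To ground what "plan, chain steps, and use tools" looks like mechanically, here is a minimal plan-act-observe loop. This is a sketch under stated assumptions, not any specific framework's API: fake_model is a scripted stand-in for a real LLM call, and the tools are stubs.

```python
# Minimal plan-act-observe agent loop; a sketch, not a specific framework.
TOOLS = {
    "search": lambda query: f"(stub) top results for {query!r}",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only
}

def fake_model(history):
    """Stand-in for an LLM call. A real agent would send `history`
    to a model API and parse out a tool request or a final answer."""
    if not any("calculate" in h for h in history):
        return ("calculate", "40 * 1.07")   # the "model" decides to use a tool
    return ("final", "Projected revenue: " + history[-1].split("-> ")[-1])

def run_agent(task, max_steps=10):
    history = [f"TASK: {task}"]
    for _ in range(max_steps):              # step budget so a confused agent halts
        action, arg = fake_model(history)
        if action == "final":               # model says it is done
            return arg
        observation = TOOLS[action](arg)    # act, then observe
        history.append(f"{action}({arg!r}) -> {observation}")
    return "step budget exhausted"

print(run_agent("Project revenue growth at 7%"))
```

The stubs aside, the design point survives: the model chooses actions, the loop executes them and feeds observations back, and the step cap keeps the agent from running forever.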
4. How to Use AI to Do Economically Valuable Things
Here Mollick warns against naive automation: letting agents churn out work with no regard for purpose risks flooding workplaces with useless or redundant output (e.g., dozens of variant PowerPoints). Instead, he proposes a hybrid workflow (sketched below): humans delegate tasks with clear instructions, review the outputs, and correct or re-prompt as needed; if the AI fails, they revert to doing the work manually. This mix can boost speed (estimates: 40% faster, 60% cheaper) while maintaining human oversight.
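A minimal sketch of that delegate-review-fallback loop, assuming hypothetical generate_draft and review helpers (neither comes from the post):

```python
# Hybrid workflow: delegate to the agent, review, re-prompt on failure,
# and fall back to manual work after too many rejected attempts.
def generate_draft(task: str, feedback: str | None = None) -> str:
    """Hypothetical agent call; returns a draft for the task."""
    suffix = f" (revised per: {feedback})" if feedback else ""
    return f"draft for {task!r}{suffix}"

def review(draft: str) -> str | None:
    """Human review stand-in: return None to accept, or feedback to retry."""
    return None  # accept everything in this toy example

def do_manually(task: str) -> str:
    return f"human-produced result for {task!r}"

def hybrid(task: str, max_attempts: int = 3) -> str:
    feedback = None
    for _ in range(max_attempts):
        draft = generate_draft(task, feedback)   # delegate with instructions
        feedback = review(draft)                 # human stays in the loop
        if feedback is None:
            return draft                         # accepted: ship it
    return do_manually(task)                     # AI kept failing: human takes over

print(hybrid("Q3 board deck outline"))
```

The retry cap encodes Mollick's fallback rule: the human corrects or re-prompts a bounded number of times, then takes the task back rather than letting the agent flood the workflow with bad drafts.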
He stresses that the future of work with AI depends not just on capability but on judgment: deciding which tasks to automate, when to supervise, and how to integrate agents in a way that amplifies human purpose rather than drowning it in low-value output.