The Context Window Problem: Scaling Agents Beyond Token Limits
https://www.factory.ai/context-window-problem?utm_source=substack&utm_medium=email
by Varin Nair
1. Critical Context for Effective Agents
Large language models (LLMs) are limited to context windows of roughly 1 million tokens. Enterprise monorepos and the surrounding knowledge ecosystem, in contrast, span many millions of tokens across code, documentation, logs, and conversations: a massive gulf that hampers agentic workflows at scale.
Effective agents require multiple layers of context: not just code, but also task descriptions, developer persona, system architecture, historical decisions, and team conventions. Without these, LLMs produce misaligned outputs, misunderstand requirements, or violate team norms.
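To make those layers concrete, here is a minimal sketch of them as a data structure; the class and field names are illustrative assumptions, not something from the article:

```python
from dataclasses import dataclass

# Hypothetical container for the context layers named above;
# field names are illustrative, not an API from the article.
@dataclass
class AgentContext:
    code: str                  # the files the agent will actually touch
    task_description: str      # what is being asked, and why
    developer_persona: str     # who the agent acts for (role, preferences)
    system_architecture: str   # how services and modules fit together
    historical_decisions: str  # prior choices and their rationale
    team_conventions: str      # style, review, and documentation norms
```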
2. Why Existing Approaches Fail
Naive vector retrieval: Chunking files into vectors and retrieving the most similar ones flattens code's rich structure, breaks dependency chains, and interrupts multi-hop reasoning, leading agents astray with irrelevant fragments (see the sketch after this list).
Scaling context windows: Even frontier models with 1-2M token windows fall short of encompassing full codebases. A bigger context doesn't guarantee better outcomes: attention tends to fade mid-prompt (a phenomenon known as "context rot"), and costs skyrocket.
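To make the first failure mode concrete, here is a minimal sketch of naive chunk-and-embed retrieval. The fixed-size line chunking, toy bag-of-words embedding (standing in for a real embedding model), and cosine ranking are illustrative assumptions, not any particular product's implementation:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" so the sketch runs without an API.
    # A real system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(source: str, lines_per_chunk: int = 20) -> list[str]:
    # Fixed-size chunking: boundaries fall wherever the line count says,
    # often splitting a function from its imports or its callers.
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def naive_retrieve(query: str, files: dict[str, str], k: int = 5) -> list[str]:
    # Rank isolated chunks by similarity to the query. Nothing here knows
    # about imports, call graphs, or which chunks belong together, so a
    # multi-hop question ("who calls this, and what config does it read?")
    # surfaces fragments instead of a coherent dependency chain.
    q = embed(query)
    chunks = [c for src in files.values() for c in chunk(src)]
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Because the chunk boundaries are blind to syntax and each fragment is scored in isolation, a function routinely gets separated from the very context a multi-hop question needs.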
3. Factory's Context Stack
Factory addresses these constraints with a multi-layered context scaffolding that distills "all the company knows" into "exactly what the AI needs now":
Repository Overviews: Summarizes structure, build setup, and key files, delivered upfront to give the LLM a lightweight architectural map.
Semantic Search: Uses code-tuned embeddings to rank relevant files and folders, kicking off reasoning with a precise subset.
Targeted File System Commands: Fetches specific file sections, diffs, or outputs on demand, staying within token budgets while remaining highly focused.
Enterprise Context Integrations: Incorporates observability data (like Sentry logs), design docs, architecture guides, and tribal knowledge from Notion or Google Docs, providing depth beyond code.
Hierarchical Memory:
User Memory tracks individual preferences and past project work (e.g., OS, dev tools, style preferences).
Org Memory encodes team-wide norms like coding standards, onboarding templates, and documentation styles.
Together, these layers let Factory's "Droids" (AI agents) work with precise, relevant context: no more noise, no more guesswork.
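As a rough sketch of how such a stack might compose a budgeted prompt; all class, method, and field names here are hypothetical assumptions, not Factory's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical composition of the layers described above.
@dataclass
class ContextStack:
    repo_overview: str                               # lightweight architectural map
    user_memory: dict = field(default_factory=dict)  # individual preferences
    org_memory: dict = field(default_factory=dict)   # team-wide norms
    token_budget: int = 8000                         # cap on assembled context

    def assemble(self, task: str, search_hits: list[str],
                 file_sections: list[str], enterprise_docs: list[str]) -> str:
        # Layers in rough priority order: the map and the task first,
        # then task-specific retrieval, then memory.
        layers = [
            "# Repository overview\n" + self.repo_overview,
            "# Task\n" + task,
            "# Relevant files (semantic search)\n" + "\n".join(search_hits),
            "# Targeted file sections\n" + "\n".join(file_sections),
            "# Enterprise context\n" + "\n".join(enterprise_docs),
            "# Org conventions\n" + str(self.org_memory),
            "# User preferences\n" + str(self.user_memory),
        ]
        # Crude token accounting (~4 characters per token); once the budget
        # is spent, lower-priority layers are dropped rather than truncated.
        assembled, used = [], 0
        for layer in layers:
            cost = len(layer) // 4
            if used + cost > self.token_budget:
                break
            assembled.append(layer)
            used += cost
        return "\n\n".join(assembled)

# Example: assembling context for a single task.
stack = ContextStack(
    repo_overview="services/: API servers. libs/: shared utilities.",
    user_memory={"editor": "vim", "formatter": "black"},
    org_memory={"reviews": "two approvals", "docs": "update CHANGELOG"},
)
prompt = stack.assemble(
    task="Fix the pagination bug in the orders endpoint",
    search_hits=["services/orders/api.py", "libs/pagination.py"],
    file_sections=["libs/pagination.py:12-40 (def paginate ...)"],
    enterprise_docs=["Sentry: IndexError in paginate() when page=0"],
)
```

Spending a fixed budget down a priority-ordered list is one simple way to realize the "distill everything the company knows into exactly what the model needs now" idea; a production system would presumably weigh layers per task rather than use a fixed order.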
4. Key Metrics Improved
Adopting the Context Stack brings measurable impact:
Reduced onboarding and dev-cycle time, because context is automatically curated throughout sessions.
Higher code acceptance rates, since context inputs reflect team standards, minimizing review friction.
Improved user satisfaction and retention, as memory personalization grows richer with each session, making the tool feel ever smarter.
5. Future Directions
Looking ahead, advancements will include:
Larger LLM context windows and enhanced reasoning capabilities.
Smarter agents capable of long-horizon planning and multi-step task execution.
Yet challenges remain: agents still get distracted by irrelevant context, require the right tools to act, and need durable external memory for state continuity. Multi-agent orchestration and robust tooling will be essential. Factory's Context Stack serves as a blueprint for building reliable, scalable, and cost-effective agentic systems in real-world engineering environments.