AI Software Task Completion Horizon Doubles Every 7 Months, Reaching 110 Minutes
Sonic Intelligence
AI's ability to complete long software tasks is doubling every seven months.
Explain Like I'm Five
"Imagine how long it takes a person to do a coding job. Scientists found a way to measure how long of a job a computer brain (AI) can do by itself, with a 50% chance of getting it right. They found that this "job length" the AI can handle is getting twice as long every 7 months! So, a job that took a human a month to do, an AI might be able to do by 2029, but it's like a new helper who sometimes makes mistakes and isn't as good as someone who knows everything about the project."
Deep Intelligence Analysis
The research evaluated 12 frontier AI models across 170 tasks from benchmarks like HCAST, RE-Bench, and SWAA, calibrated against over 800 human baselines. This rigorous methodology provides a robust foundation for the observed trend. However, the analysis also highlights critical limitations. A significant reliability gap exists, with the 80% success-rate time horizon being 4-6 times shorter than the 50% horizon, indicating that current AI models are not consistently reliable for longer tasks. Furthermore, AI performance struggles in "messy environments" lacking clear feedback loops and aligns more closely with low-context contractors than with expert maintainers. This implies that while AI excels in well-structured, greenfield projects, its efficacy in deep maintenance or tasks requiring institutional knowledge is considerably reduced.
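The time-horizon metric described above comes from fitting a success-probability curve against task length and reading off where it crosses a target threshold. The sketch below is illustrative only, using a simple logistic model with made-up parameters `a` and `b` (not METR's fitted values); it shows why an 80% horizon is mechanically shorter than the 50% one, and how a 4-6x gap can fall out of a single fitted slope.

```python
import math

def horizon(p_target, a, b):
    """Task length (minutes) at which modeled success probability equals p_target,
    under the toy model P(success) = sigmoid(a - b * log2(length))."""
    logit = math.log(p_target / (1 - p_target))  # log-odds of the target rate
    return 2 ** ((a - logit) / b)

# Hypothetical fitted parameters: a = intercept, b = drop in log-odds per
# doubling of task length. These are illustrative, not from the paper.
a, b = 4.0, 0.6
h50 = horizon(0.5, a, b)
h80 = horizon(0.8, a, b)
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min, "
      f"ratio: {h50 / h80:.1f}x")
```

With these example parameters the 50% horizon lands near 102 minutes while the 80% horizon is roughly 5x shorter, in line with the 4-6x gap the research reports. The ratio depends only on the slope `b`: a shallower success curve widens the reliability gap.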
The forward-looking implications are transformative for the software industry, potentially ushering in an era of unprecedented automation in development. However, the findings underscore that human expertise will remain indispensable, shifting from routine coding to higher-level roles focused on architectural design, quality assurance, and managing complex, ambiguous systems. The challenge lies in developing AI systems that can bridge the reliability gap, operate effectively in unstructured environments, and acquire domain-specific knowledge. The continued exponential growth in AI's capabilities will necessitate a re-evaluation of educational pathways and workforce strategies to prepare for a future where AI is a co-developer, not merely a tool.
Visual Intelligence
```mermaid
flowchart LR
    A["METR Research"] --> B["50% Task Horizon Metric"]
    B --> C["AI Model Evaluation"]
    C --> D["Task Benchmarks"]
    D --> E["Human Baselines"]
    C --> F["GPT-2 (2s)"]
    C --> G["o3 Model (110 min)"]
    G --> H["Extrapolation (1 month by 2029)"]
```
Impact Assessment
This research provides a quantifiable metric for AI's rapidly increasing capability in software engineering, predicting autonomous completion of significant projects within years. However, it highlights critical reliability and context-awareness gaps that temper expectations for immediate, expert-level performance.
Key Details
- METR introduced the "50%-task-completion time horizon" metric for AI progress in software engineering.
- This time horizon has been doubling every 7 months since 2019.
- GPT-2 could handle 2-second human-equivalent tasks; the o3 model reached 110 minutes.
- Extrapolation suggests AI could reach a one-month (167 working hours) task horizon by mid-2029 (central estimate).
- The 80% success-rate time horizon is 4-6x shorter than the 50% horizon, indicating a reliability gap.
- AI performance on pull requests aligns with low-context contractors, not expert maintainers.
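The doubling trend in the bullets above can be turned into a back-of-the-envelope projection. This is a minimal sketch assuming a perfectly steady 7-month doubling, not METR's actual fitting procedure, which estimates the trend and its confidence intervals from the full model series.

```python
import math

DOUBLING_MONTHS = 7  # reported doubling period of the 50% time horizon

def months_to_reach(current_minutes, target_minutes):
    """Months until the horizon grows from current to target, at a steady doubling rate."""
    return DOUBLING_MONTHS * math.log2(target_minutes / current_minutes)

# From o3's 110-minute horizon to one working month (167 hours):
months = months_to_reach(110, 167 * 60)
print(f"{months:.0f} months (~{months / 12:.1f} years)")  # prints "46 months (~3.8 years)"
```

About 46 months from the 110-minute mark is what puts the one-month horizon near mid-2029; each extra doubling adds a fixed 7 months, so the date is quite sensitive to whether the trend holds or accelerates.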
Optimistic Outlook
The exponential growth in AI's software task completion horizon suggests a future where AI agents can autonomously develop complex applications, migrate legacy systems, and implement features end-to-end. This could dramatically boost developer productivity, accelerate innovation, and free human engineers for higher-level architectural and creative work.
Pessimistic Outlook
The significant reliability gap (50% vs. 80% success rates) and AI's struggle with messy, poorly documented environments mean human oversight will remain critical. Furthermore, AI's current performance aligns with contractor-level work, not domain experts, indicating that tasks requiring deep institutional knowledge will remain challenging for AI, potentially leading to costly errors or inefficient solutions in complex systems.