Dual-LLM Policy Boosts Automated Program Repair Success by Up to 39 Percentage Points

Source: arXiv Research
Original Authors: Cambronero, José; Tufano, Michele; Shi, Sherry; Wei, Renyao; Uy, Grant; Cheng, Runxiang; Liu, Chin-J
Intelligence Analysis by Gemini

Signal Summary

Two complementary LLM-based policies, bug abstention and patch validation, raise agentic program repair success rates by up to 39 percentage points.

Explain Like I'm Five

"Imagine you have a super-smart robot that tries to fix mistakes in computer code. Sometimes it makes silly suggestions. This new idea uses two smart robots: one to say, 'Don't even try this mistake, it's too hard,' and another to say, 'This fix isn't good enough.' Together, they make the main robot much better at fixing real problems, saving human helpers a lot of time."

Deep Intelligence Analysis

Agentic Automated Program Repair (APR) systems are increasingly deployed to tackle complex, repository-level bugs in industrial settings. A significant hurdle to wider adoption is the generation of 'noisy' patches: fixes that human reviewers ultimately deem unacceptable. This noise wastes developer time and erodes trust in automated code changes. The research introduces a 'Dual-LLM Policy' dubbed 'Abstain and Validate' to mitigate this issue.

The core of this approach lies in two complementary LLM-based policies. The first, 'bug abstention,' is designed to proactively exclude bugs that the agentic APR system is unlikely to fix successfully. This prevents the system from expending resources on intractable problems and generating irrelevant patches. The second policy, 'patch validation,' acts as a quality filter, rejecting candidate patches that are unlikely to be a good or correct fix for the given bug. This ensures that only high-quality, relevant solutions are presented for human review.
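The two gates described above can be sketched as a small filtering pipeline. This is a minimal illustration, not the authors' implementation: the `Bug` and `Patch` types, the score-returning policy callables, and the 0.5 thresholds are all assumptions made for the example, and in a real system each policy would be backed by an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Bug:
    bug_id: str
    report: str


@dataclass
class Patch:
    bug_id: str
    diff: str


def abstain_and_validate(
    bugs: Iterable[Bug],
    generate_patches: Callable[[Bug], List[Patch]],  # stands in for the APR agent
    fixability: Callable[[Bug], float],              # hypothetical abstention score in [0, 1]
    patch_quality: Callable[[Bug, Patch], float],    # hypothetical validation score in [0, 1]
    abstain_below: float = 0.5,                      # assumed threshold, not from the source
    reject_below: float = 0.5,                       # assumed threshold, not from the source
) -> List[Patch]:
    """Return only the patches that pass both policies."""
    surfaced: List[Patch] = []
    for bug in bugs:
        # Policy 1: bug abstention. Skip bugs the agent is unlikely to fix,
        # so no patches are generated for them at all.
        if fixability(bug) < abstain_below:
            continue
        for patch in generate_patches(bug):
            # Policy 2: patch validation. Drop candidates unlikely to be
            # an acceptable fix before they reach a human reviewer.
            if patch_quality(bug, patch) >= reject_below:
                surfaced.append(patch)
    return surfaced
```

Running abstention before generation reflects the motivation given in the text: the agent spends no effort on bugs it is unlikely to fix, and validation then filters whatever it does produce.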

The policies were evaluated on three sets of bugs sourced from Google's codebase, with candidate patches generated by an internal agentic APR system. On a set of 174 human-reported bugs, removing the bugs rejected by bug abstention raised success rates by up to 13 percentage points, and removing the patches rejected by patch validation raised them by up to 15 percentage points. When both policies were applied in combination, success rates rose by up to 39 percentage points. Patch validation alone also improved average single-sample success rates for specific bug types, such as null pointer exceptions and sanitizer-reported bugs with machine-generated reports. The two-policy framework offers a practical route to reliable, industrial-scale deployment of agentic APR systems, with potential gains in developer productivity and software quality.
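The gains above are stated in percentage points, meaning the difference between two success rates rather than a relative change. A minimal sketch of that arithmetic follows, assuming success rate is measured over the items that remain after filtering; the records are made-up toy data, not results from the study.

```python
from typing import List, Tuple


def success_rate(outcomes: List[bool]) -> float:
    """Percentage of items whose patch was judged acceptable."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)


def percentage_point_gain(records: List[Tuple[bool, bool]]) -> float:
    """records holds (kept_by_policy, succeeded) pairs.

    Returns the filtered success rate minus the unfiltered one,
    expressed in percentage points.
    """
    baseline = success_rate([ok for _, ok in records])
    filtered = success_rate([ok for kept, ok in records if kept])
    return filtered - baseline


# Toy data (hypothetical): 10 bugs, 3 successes; the policy keeps 5 bugs,
# including all 3 successes.
toy = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5
print(percentage_point_gain(toy))  # 60% filtered minus 30% baseline = 30.0
```

In this toy case the rate doubles in relative terms (a 100% increase) while the gain is 30 percentage points, which is why the two units should not be used interchangeably.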

EU AI Act Art. 50 Compliant: This analysis is based solely on the provided source material, ensuring transparency and preventing the generation of unverified information.

Impact Assessment

This research directly addresses the 'noise' problem in automated program repair, where many generated patches are unacceptable. By significantly improving the quality and acceptance rate of AI-generated fixes, it saves valuable developer time, builds trust in AI tools, and accelerates software development cycles.

Key Details

A 'Dual-LLM Policy' named 'Abstain and Validate' is introduced for agentic automated program repair (APR).
The policy comprises bug abstention (excluding bugs unlikely to be fixed) and patch validation (rejecting poor patches).
Evaluated on three sets of bugs from Google's codebase, with candidate patches generated by an internal agentic APR system.
Achieved increases of up to 13 percentage points with bug abstention and up to 15 percentage points with patch validation on 174 human-reported bugs.
Combined policies resulted in up to a 39 percentage point increase in success rates.
Patch validation also improved average single-sample success rates for null pointer exceptions and sanitizer-reported bugs.

Optimistic Outlook

This dual-LLM approach offers a practical pathway for industrial-scale deployment of agentic APR systems, leading to more efficient and reliable software development. It promises to reduce the manual burden of bug fixing, allowing human developers to focus on more complex, creative tasks.

Pessimistic Outlook

Although success rates improve, the system still requires human review, indicating that full autonomy in critical code changes remains a challenge. Potential biases within the LLMs could also inadvertently influence which bugs or patches are accepted or rejected, so careful monitoring is required.

