Dual-LLM Policy Boosts Automated Program Repair Success by Up to 39 Percentage Points


Source: ArXiv Research · Original Authors: Cambronero, José; Tufano, Michele; Shi, Sherry; Wei, Renyao; Uy, Grant; Cheng, Runxiang; Liu, Chin-J · 2 min read · Intelligence Analysis by Gemini

Signal Summary

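Two complementary LLM-based policies, bug abstention and patch validation, raise agentic program repair success rates by up to 39 percentage points on a set of 174 human-reported bugs.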

Explain Like I'm Five

"Imagine you have a super-smart robot that tries to fix mistakes in computer code. Sometimes it makes silly suggestions. This new idea uses two smart robots: one to say, 'Don't even try this mistake, it's too hard,' and another to say, 'This fix isn't good enough.' Together, they make the main robot much better at fixing real problems, saving human helpers a lot of time."


Deep Intelligence Analysis

Agentic Automated Program Repair (APR) systems are increasingly deployed to tackle complex, repository-level bugs in industrial settings. A significant hurdle to wider adoption is 'noisy' patches: fixes that human reviewers ultimately reject. This noise wastes developer time and erodes trust in automated code changes. The research introduces a dual-LLM policy, 'Abstain and Validate', to reduce it.

The approach rests on two complementary LLM-based policies. The first, 'bug abstention,' excludes bugs that the agentic APR system is unlikely to fix successfully, so the system does not spend resources on intractable problems or produce patches that will be discarded. The second, 'patch validation,' acts as a quality filter: it rejects candidate patches that are unlikely to be a good or correct fix for the given bug, so that fewer low-quality patches reach human review.
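The flow of the two policies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Bug` and `Patch` types, the score-in-[0, 1] policy signatures, the 0.5 thresholds, and the toy stand-in functions are all hypothetical. In a real system the two scoring functions would be LLM calls and the agent would be the APR system itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Bug:
    bug_id: str
    report: str


@dataclass
class Patch:
    bug_id: str
    diff: str


def abstain_and_validate(
    bugs: Iterable[Bug],
    agent: Callable[[Bug], Optional[Patch]],
    fixability: Callable[[Bug], float],
    patch_quality: Callable[[Bug, Patch], float],
    abstain_below: float = 0.5,  # hypothetical threshold
    reject_below: float = 0.5,   # hypothetical threshold
) -> List[Patch]:
    """Return only the patches that pass both policies."""
    shown: List[Patch] = []
    for bug in bugs:
        # Policy 1 (bug abstention): skip bugs the agent is unlikely to fix.
        if fixability(bug) < abstain_below:
            continue
        patch = agent(bug)
        if patch is None:
            continue
        # Policy 2 (patch validation): drop patches unlikely to be acceptable.
        if patch_quality(bug, patch) < reject_below:
            continue
        shown.append(patch)
    return shown


# Toy stand-ins so the sketch runs end to end (all invented for illustration).
def toy_fixability(bug: Bug) -> float:
    return 0.2 if "vague" in bug.report else 0.9


def toy_agent(bug: Bug) -> Optional[Patch]:
    return Patch(bug.bug_id, diff=f"--- placeholder fix for {bug.bug_id}")


def toy_patch_quality(bug: Bug, patch: Patch) -> float:
    return 0.1 if bug.bug_id == "b3" else 0.8


bugs = [
    Bug("b1", "NullPointerException in parser"),
    Bug("b2", "vague: app feels slow sometimes"),
    Bug("b3", "sanitizer report: heap-use-after-free"),
]
shown = abstain_and_validate(bugs, toy_agent, toy_fixability, toy_patch_quality)
print([p.bug_id for p in shown])
```

In this toy run the agent is never invoked for `b2` (abstained) and the patch for `b3` is dropped by validation, so only `b1` reaches a reviewer. Running abstention before the agent is what saves repair effort; validation only saves reviewer effort.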

The policies were evaluated on three sets of bugs from Google's codebase, with candidate patches generated by an internal agentic APR system. On a set of 174 human-reported bugs, removing the bugs rejected by abstention and the patches rejected by validation raised success rates by up to 13 and 15 percentage points, respectively. Applying both policies together raised success rates by up to 39 percentage points. Patch validation alone also improved average single-sample success rates for null pointer exceptions and for sanitizer-reported bugs with machine-generated reports. The two-policy framework is presented as a practical route to reliable, industrial-scale deployment of agentic APR systems.
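Because the gains are reported in percentage points rather than percent, a small worked example may help. The sketch below assumes success rate means the share of retained items that were accepted, which is one plausible reading of the summary; the function names and the ten-item data set are invented for illustration and are not results from the paper.

```python
from typing import Sequence, Tuple


def success_rate(accepted: Sequence[bool]) -> float:
    """Share of items that were accepted, as a percentage."""
    if not accepted:
        raise ValueError("success rate is undefined for an empty set")
    return 100.0 * sum(accepted) / len(accepted)


def filtered_gain(
    accepted: Sequence[bool], kept: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (baseline rate, rate after filtering, gain in percentage points)."""
    baseline = success_rate(accepted)
    retained = [a for a, k in zip(accepted, kept) if k]
    filtered = success_rate(retained)
    return baseline, filtered, filtered - baseline


# Invented toy data: 10 candidate fixes, 3 of which reviewers would accept.
accepted = [True, False, False, True, False, False, True, False, False, False]
# Invented filter decisions: the policies keep 5 items, including all 3 good ones.
kept = [True, False, True, True, False, False, True, True, False, False]

baseline, filtered, gain_pp = filtered_gain(accepted, kept)
relative_gain = 100.0 * gain_pp / baseline

print(f"baseline: {baseline:.1f}%")
print(f"after filtering: {filtered:.1f}%")
print(f"gain: {gain_pp:.1f} percentage points ({relative_gain:.0f}% relative)")
```

In this toy case the rate moves from 30% to 60%, a gain of 30 percentage points but a 100% relative increase, which is why the two units should not be used interchangeably. Filtering also shrinks the denominator, so a higher rate does not by itself mean that more bugs were fixed.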

EU AI Act Art. 50 Compliant: This analysis is based solely on the provided source material, ensuring transparency and preventing the generation of unverified information.

Impact Assessment

This research addresses the 'noise' problem in automated program repair, where many generated patches are unacceptable to reviewers. By filtering out bugs and patches that are unlikely to succeed, it raises the share of AI-generated fixes that reviewers accept, which can save developer time and build trust in AI tools.

Key Details

  • A 'Dual-LLM Policy' named 'Abstain and Validate' is introduced for agentic automated program repair (APR).
  • The policy comprises bug abstention (excluding bugs unlikely to be fixed) and patch validation (rejecting poor patches).
  • Evaluated on three sets of bugs from Google's codebase, with patches generated by an internal agentic APR system.
  • On 174 human-reported bugs, bug abstention raised success rates by up to 13 percentage points and patch validation by up to 15 percentage points.
  • Combined policies resulted in up to a 39 percentage point increase in success rates.
  • Patch validation also improved average single-sample success rates for null pointer exceptions and sanitizer-reported bugs with machine-generated reports.

Optimistic Outlook

This dual-LLM approach offers a practical pathway for industrial-scale deployment of agentic APR systems, leading to more efficient and reliable software development. It promises to reduce the manual burden of bug fixing, allowing human developers to focus on more complex, creative tasks.

Pessimistic Outlook

Even with higher success rates, the system still requires human review, indicating that full autonomy in critical code changes remains a challenge. Biases within the LLMs could also influence which bugs or patches are accepted or rejected, so the policies' decisions would need careful monitoring.
