Assigning Dependabot To An Agent: Patch Faster, Review Harder
By wGrow Project Team
Dependabot opens a pull request. You assign an AI agent. Three seconds later, the patch exists. The bottleneck is no longer writing code — it’s code review, and that process almost certainly hasn’t kept pace with how fast the code is arriving.
The Bottleneck Has Shifted To Code Review
Vulnerability patching used to eat hours: read the advisory, trace the call sites, understand the version delta, write the fix, test it against your data paths. Agents collapse that timeline to near-zero for the generation step. What they don’t collapse is the risk of merging something that quietly alters system behaviour under the guise of a security fix.
Engineering leads managing multiple SME repositories are particularly exposed. The volume of dependency alerts across a typical portfolio isn’t manageable by hand — agents are the most scalable practical answer. But the instinct to treat a green build as a merge signal is exactly the instinct that will put you on-call at 2 AM.
A blind merge on a minor version bump is not routine maintenance. It is a deployment. Treat it accordingly.
The Minor Version Bump That Broke WaterDoctor
WaterDoctor is a deep-tech water sensor platform — we run the backend. Dependabot flagged CVE-2020-13091 against our pinned pandas 1.0.3. The CVE targets the library’s pickle-loading path — untrusted deserialisation that becomes a real exposure once a system ingests data streams it doesn’t fully control.
We instructed an agent to resolve the alert. It bumped pandas from 1.0.3 to 1.1.0 and modified the data loading wrappers to eliminate the insecure pickle path. The patch merged cleanly. The security alert disappeared from the dashboard. The build was green.
Production broke twelve hours later.
Our sensor parsing logic depends on a specific frame ordering for trailing telemetry packets, which it reads from mixed-type index arrays. The indexing assumption that held in 1.0.3 no longer held in 1.1.0, and the parser silently dropped those trailing frames: no exception, no error log, just incomplete data.
Security was achieved. System integrity was not. The agent had no context on the downstream physical sensor requirements, and we hadn’t provided any. That’s on us, not the agent.
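A silent drop like this is cheap to turn into a loud failure. The sketch below is not the WaterDoctor parser; `Frame`, `parse_with_frame_check` and the `seq` field are hypothetical names, and it assumes every frame carries a sequence number that survives parsing. It wraps any parser and raises if a frame that went in does not come out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


class DroppedFramesError(RuntimeError):
    """Raised when a parser returns fewer frames than it was given."""


@dataclass(frozen=True)
class Frame:
    # Hypothetical telemetry frame: a sequence number plus a raw payload.
    seq: int
    payload: bytes


def parse_with_frame_check(
    frames: Iterable[Frame],
    parser: Callable[[List[Frame]], List[dict]],
) -> List[dict]:
    """Run `parser` and fail loudly if any input frame is missing from the output."""
    frame_list = list(frames)
    parsed = parser(frame_list)
    expected = {frame.seq for frame in frame_list}
    seen = {row["seq"] for row in parsed}  # assumes each parsed row keeps its "seq"
    missing = sorted(expected - seen)
    if missing:
        raise DroppedFramesError(f"parser dropped frames: {missing}")
    return parsed
```

A check of this shape, run in CI against recorded packets that include trailing frames, is the kind of regression coverage a green build alone does not give you.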
Mapping Blast Radius In A Legacy Public-Sector Portal

The contrasting case is one where we mapped the impact before touching the code. A legacy portal we maintain for a public-sector client had a deep transitive dependency flagged for CVE-2019-10744, a prototype pollution flaw in lodash versions prior to 4.17.12. Prototype pollution lets an attacker inject properties into JavaScript’s Object.prototype, with effects ranging from data corruption to privilege escalation depending on how the application handles that object downstream.
A straight npm install of the patched lodash release wasn’t viable. The portal’s Node.js ecosystem had tight coupling around older lodash utility calls, several of which used methods that changed between 4.14.x and 4.17.x. Bumping the version without sanitisation would have traded one class of vulnerability for runtime errors in code paths the test suite didn’t adequately cover.
So we changed the prompt architecture.
Instead of asking the agent to write the patch and open a PR, we told it to: restate the CVE mechanism in its own words, enumerate every file and function invoking the affected lodash methods, trace execution paths from those call sites to any user-controlled input, and produce a ranked remediation list ordered by exposure.
The agent returned a complete blast radius report in under four minutes. It identified eleven call sites, three of which touched input sanitisation logic with no test coverage at all. That mapping saved roughly two days of manual tracing. The fix itself took another half day. The ratio matters: the understanding was worth two days of manual effort, the fix four hours.
That is the correct order of operations.
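The enumeration step can also be cross-checked without an agent. The following is a minimal sketch, not the tooling we used: `find_call_sites` and `CallSite` are hypothetical names, it assumes lodash is imported under the alias `_`, and it only looks at first-party `.js` files. The method list is an input; the advisory for CVE-2019-10744 names `defaultsDeep`, and anything else you pass in is your own judgement.

```python
import re
from pathlib import Path
from typing import Iterable, List, NamedTuple


class CallSite(NamedTuple):
    path: str
    line_no: int
    method: str
    text: str


def find_call_sites(root: Path, methods: Iterable[str], alias: str = "_") -> List[CallSite]:
    """List every line under `root` that calls `<alias>.<method>(`."""
    names = "|".join(re.escape(m) for m in methods)
    # The lookbehind stops identifiers such as `foo_` from matching the alias.
    pattern = re.compile(rf"(?<![\w$]){re.escape(alias)}\.({names})\s*\(")
    hits: List[CallSite] = []
    for path in sorted(root.rglob("*.js")):
        if "node_modules" in path.parts:
            continue  # assumption: only first-party code is in scope here
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append(CallSite(str(path), line_no, match.group(1), line.strip()))
    return hits
```

A regex scan misses aliased imports and dynamic property access, so treat its output as a floor to compare against the agent’s report, not as a replacement for it.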
Structuring The Agentic Remediation Prompt
| Prompt line | Note |
| --- | --- |
| You are a vulnerability remediation agent. | |
| 1. Resolve the CVE flagged by Dependabot. | ← ① |
| 2. Document the core mechanism of the exploit. | |
| 3. List all files and exact call sites touched. | ← ② |
| 4. Identify missing regression tests for call sites. | ← ③ |
| Do NOT just output the code patch. | |
- ① Standard patch generation
- ② Mandatory cross-codebase tracing
- ③ Highlight coverage blind spots for human review
The failure mode is prompting an agent to write the patch and open the PR in the same breath. Code generation is a commodity now. Contextual mapping is the actual requirement.
A remediation prompt for a security alert should mandate four outputs before any code is written:
1. Advisory summary. The agent must restate the CVE mechanism in plain terms — not copy the advisory verbatim, but prove it understands the exploit class. Prototype pollution, memory corruption, and arbitrary code execution are not interchangeable. The PR description should make that clear.
2. Affected call sites. Every file and function touching the vulnerable dependency, with line references. If the agent can’t enumerate these, it doesn’t have enough context to write a safe patch. Don’t proceed.
3. Execution path mapping. Which call sites handle user-controlled input? Which are internal utilities? Blast radius isn’t uniform. An agent that treats all call sites as equivalent risk will over-patch or under-patch — and both failures are expensive.
4. Missing test coverage. The agent should flag functions it touched that have no existing test. These are the places a silent behaviour change will hide. If the agent can’t identify them, your CI pipeline is unlikely to catch the regression either.
Only after those four sections exist in the PR description should you look at the code diff.
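That gate can be enforced mechanically before a human opens the diff. The sketch below assumes the agent writes each output under a heading with exactly these titles; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, and the heading text should be matched to whatever your prompt actually requests.

```python
from typing import List, Sequence

# Hypothetical section headings; align them with the prompt you give the agent.
REQUIRED_SECTIONS = (
    "Advisory summary",
    "Affected call sites",
    "Execution path mapping",
    "Missing test coverage",
)


def _normalise(line: str) -> str:
    """Strip markdown heading/bold markers and a trailing colon, then lowercase."""
    return line.strip().lstrip("#* ").rstrip(":* ").lower()


def missing_sections(pr_description: str, required: Sequence[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return required headings that are absent or have no body text under them."""
    lines = pr_description.splitlines()
    wanted = {name.lower() for name in required}
    missing: List[str] = []
    for heading in required:
        idx = next((i for i, ln in enumerate(lines) if _normalise(ln) == heading.lower()), None)
        if idx is None:
            missing.append(heading)
            continue
        has_body = False
        for ln in lines[idx + 1:]:
            if _normalise(ln) in wanted:
                break  # reached the next required section
            if ln.strip():
                has_body = True
                break
        if not has_body:
            missing.append(heading)
    return missing
```

This only proves the sections exist and are non-empty. Whether the advisory summary is right is still a reviewer’s call.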
A Pragmatic Review Checklist For SME Repositories

For teams managing more than a handful of repositories simultaneously, a consistent review protocol matters more than any individual patch decision. From what I’ve seen across multi-repo environments, the failures cluster around the same three or four habits.
Never merge on a passing build alone. A green CI pipeline means the tests you wrote pass. It says nothing about the tests you didn’t write. For security patches, passing tests are necessary, not sufficient.
Verify the version delta explicitly. Pull the changelog between old and new releases. Look for deprecated methods, altered return types, changed default behaviours. A one-line bump in package.json can represent hundreds of lines of upstream change.
Mandate regression tests for flagged call sites. If the blast radius report identifies twelve call sites, the PR should include regression coverage for at least the high-exposure ones before merge. If the agent didn’t write them, require it to. If the codebase can’t support adding those tests quickly, the patch process has just surfaced a separate problem worth addressing.
Check for silent logic shifts. A security fix that changes a data type or removes error handling to eliminate a vulnerable code path introduces its own category of risk. Review the diff for changes that aren’t strictly additive. An agent that removes a try/catch block to eliminate a vulnerable deserialisation path may have eliminated the only error boundary protecting a downstream process.
Cross-check the patch against the advisory. Does the fix address the actual reported mechanism? CVE-2020-13091 is about pickle deserialisation. A patch that disables pickle loading is correct. A patch that only adds an input length check is not. Agents can produce plausible-looking fixes that address a surface symptom rather than the exploit class.
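The version-delta check from the list above is easy to triage automatically. This is a rough sketch under stated assumptions: `classify_bump` and `needs_changelog_review` are hypothetical helpers, versions are plain `MAJOR.MINOR.PATCH` strings with no pre-release tags, and the rule that anything above a patch-level bump gets a manual changelog read is an assumed policy, not a guarantee that patch releases are safe.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.strip().lstrip("v").split("."))
    return major, minor, patch


def classify_bump(old: str, new: str) -> str:
    """Label the change between two versions as major, minor, patch, none or downgrade."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v < old_v:
        return "downgrade"
    if new_v == old_v:
        return "none"
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


def needs_changelog_review(old: str, new: str) -> bool:
    """Assumed policy: anything other than a patch-level bump gets a manual changelog read."""
    return classify_bump(old, new) not in ("patch", "none")
```

Under this policy the pandas bump from 1.0.3 to 1.1.0 classifies as a minor change and would have been routed to a manual changelog read instead of an automatic merge.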
Operational Validation Trumps Speed
Security patching in the agentic era is a test of operational discipline, not technical capacity. The agent can write the fix. What it cannot do is weigh the patch against your physical sensor timing requirements, your public-sector SLA, or the undocumented assumptions your data pipeline inherited from a contractor who left in 2021.
The WaterDoctor incident wasn’t an AI failure. It was a prompt failure compounded by a review failure. We asked for a patch; we should have asked for a blast radius. We got a green build; we should have required regression coverage for the telemetry parser.
Stop treating security alerts as isolated bugs. An agent-generated security PR is a change to a running system — it deserves a deployment review, a test plan, and a rollback path. The agent provides speed. Engineering discipline provides safety. Neither substitutes for the other.
The bottleneck has moved. Make sure your review process has moved with it.