
AI Hackers

AI Systems Are Now Hunting Software Vulnerabilities—And Winning.

Three months ago, Google Security SVP Heather Adkins and cryptographer Bruce Schneier warned that artificial intelligence would unleash an “AI vulnerability cataclysm.” They said autonomous code-analysis systems would find and weaponize flaws so quickly that human defenders would be overwhelmed. The claim seemed hyperbolic. Yet October and November of 2025 have proven them prescient: AI systems are now dominating bug bounty leaderboards, generating zero-day exploits in minutes, and even rewriting malware in real time to evade detection.

On an October morning, a commercial security agent called XBOW—a fully autonomous penetration tester—shot to the top of HackerOne’s U.S. leaderboard, outcompeting thousands of human hackers. Over the previous 90 days it had filed roughly 1,060 vulnerability reports, including remote-code-execution, SQL injection and server-side request forgery bugs. More than 50 were deemed critical. “If you’re competing for bug bounties, you’re not just competing against other humans anymore,” one veteran researcher told me. “You’re competing against machines that work 24/7, don’t get tired and are getting better every week.”

XBOW is just a harbinger. Between August and November 2025, at least nine developments upended how software is secured. OpenAI, Google DeepMind and DARPA-backed teams all unveiled sophisticated agents that can scan vast codebases, find vulnerabilities and propose or even automatically apply patches. State-backed hackers have begun using large language models to design malware that modifies itself in mid-execution, while researchers at SpecterOps published a blueprint for an AI-mediated “gated loader” that decides whether a payload should run based on a covert risk assessment. And an AI-enabled exploit generator demonstrated in August showed that newly published CVEs can be weaponized in 10 to 15 minutes—collapsing the time defenders once had to patch systems.

The pattern is clear: AI systems aren’t just assisting security professionals; they’re replacing them in many tasks, creating new capabilities that didn’t exist before and forcing a fundamental rethinking of how software security works. As bug-bounty programs embrace automation and threat actors deploy AI at every stage of the kill chain, the offense-defense balance is being recalibrated at machine speed.

A New Offensive Arsenal

The offensive side of cybersecurity has seen the most dramatic AI advances, but it helps to be clear about who is building what: foundational model companies like OpenAI and Anthropic are building the brains, while agent platforms like XBOW are building the bodies. Keeping that split in mind makes it easier to evaluate the different approaches emerging in AI-powered security.

OpenAI’s Aardvark, released on Oct. 30, is described by the company as a “security researcher” in software form. Rather than using static analysis or fuzzers, Aardvark uses large-language-model reasoning to build an internal representation of each codebase it analyzes. It continuously monitors commits, traces call graphs and identifies risky patterns. When it finds a potential vulnerability, Aardvark creates a test harness, executes the code in a sandbox and uses Codex to propose a patch. In internal benchmarks across open-source repositories, Aardvark reportedly detected 92 percent of known and synthetic vulnerabilities and discovered 10 new CVE-class flaws. It has already been offered pro bono to select open-source projects. But beneath the impressive numbers, Aardvark functions more like AI-enhanced static application security testing (SAST) than a true autonomous researcher—powerful, but incremental.
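
Stripped of the model itself, the loop is recognizable: watch commits, ask the model what looks dangerous, and trust nothing until it reproduces in a sandbox. Here is a minimal sketch of that pattern in Python, assuming a local git checkout and a hypothetical ask_llm() helper standing in for whatever backend is used; it illustrates the shape of the workflow, not Aardvark’s actual implementation.

```python
# Sketch of an Aardvark-style review loop: watch new commits, ask a model to
# flag risky changes, then confirm each candidate in a sandboxed test run.
# ask_llm() and the repo path are hypothetical stand-ins for illustration.
import json
import os
import subprocess

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to whatever model backend is in use."""
    raise NotImplementedError("wire this up to your LLM provider of choice")

def recent_commits(repo: str, n: int = 5) -> list[str]:
    out = subprocess.run(["git", "-C", repo, "log", f"-{n}", "--format=%H"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def commit_diff(repo: str, sha: str) -> str:
    out = subprocess.run(["git", "-C", repo, "show", "--unified=3", sha],
                         capture_output=True, text=True, check=True)
    return out.stdout

def review_commit(repo: str, sha: str) -> list:
    """Ask the model for suspected vulnerabilities introduced by one commit."""
    prompt = (
        "You are reviewing a code change for security flaws. Return a JSON "
        "list of findings with the fields 'file', 'issue' and "
        "'proof_of_concept_test' (a short Python snippet that fails if the "
        "flaw is present).\n\n" + commit_diff(repo, sha)[:20000]
    )
    return json.loads(ask_llm(prompt))

def confirm_in_sandbox(repo: str, finding: dict) -> bool:
    """Run the model-written proof-of-concept inside an isolated container so
    a wrong or hostile test cannot touch the host. Only confirmed findings
    would graduate to a patch proposal."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--network=none",
         "-v", f"{os.path.abspath(repo)}:/src:ro", "python:3.12-slim",
         "python", "-c", finding["proof_of_concept_test"]],
        capture_output=True, text=True, timeout=120,
    )
    # By convention here, the PoC exits non-zero when the flaw reproduces.
    return result.returncode != 0

if __name__ == "__main__":
    repo = "./target-repo"  # hypothetical local checkout to monitor
    for sha in recent_commits(repo):
        for finding in review_commit(repo, sha):
            if confirm_in_sandbox(repo, finding):
                print(f"[confirmed] {sha[:8]} {finding['file']}: {finding['issue']}")
```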

Google DeepMind’s CodeMender, unveiled on Oct. 6, takes the concept further by combining discovery and automated repair. It applies advanced program analysis, fuzzing and formal methods to find bugs and uses multi-agent LLMs to generate and validate patches. Over the past six months, CodeMender upstreamed 72 security fixes to open-source projects, including codebases as large as 4.5 million lines of code. In one notable case, it added -fbounds-safety annotations to parts of the WebP image library, proactively hardening it against the kind of buffer overflow that had been exploited in a zero-click iOS attack. All patches are still reviewed by human experts, but the cadence is accelerating.
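
The hard part of automated repair is less generating a candidate patch than deciding whether to keep it. The sketch below shows one plausible validation gate, under the assumption that the project has a build, a test suite and a crashing reproducer on hand; the commands and file names are invented for illustration and are not CodeMender’s actual pipeline.

```python
# Sketch of a patch-validation gate: a candidate fix is kept only if it
# applies cleanly, the build and test suite still pass, and the original
# crashing input no longer crashes. All paths and commands are assumptions.
import subprocess

def run(cmd: list[str], cwd: str, timeout: int = 600) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout)

def validate_patch(repo: str, patch_file: str, crash_input: str) -> bool:
    # 1. The patch must apply without conflicts.
    if run(["git", "apply", "--check", patch_file], cwd=repo).returncode != 0:
        return False
    run(["git", "apply", patch_file], cwd=repo)
    try:
        # 2. The project must still build and its regression tests must pass,
        #    so the fix does not trade a security bug for a functional one.
        if run(["make", "-j8"], cwd=repo).returncode != 0:
            return False
        if run(["make", "test"], cwd=repo).returncode != 0:
            return False
        # 3. The original crashing input must no longer crash the harness
        #    (./fuzz_target is a stand-in for whatever found the bug).
        return run(["./fuzz_target", crash_input], cwd=repo).returncode == 0
    finally:
        # Leave the working tree clean either way; accepted patches are
        # re-applied by whoever opens the upstream pull request.
        run(["git", "checkout", "--", "."], cwd=repo)

if __name__ == "__main__":
    ok = validate_patch("./libexample", "candidate.patch", "crash-001.bin")
    print("patch accepted for human review" if ok else "patch rejected")
```

The gate is deliberately conservative: any failure at any step discards the patch, which is one reason human review of the survivors, not patch generation, remains the bottleneck.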

Anthropic, meanwhile, is taking a fundamentally different path—one that involves building specialized training environments for red teaming. The company has devoted an entire team to training a foundational red team model. This approach represents a bet that the future of security AI lies not in bolting agents onto existing models, but in training models from the ground up to think like attackers.

The DARPA AI Cyber Challenge (AIxCC), which wrapped up at DEF CON in August, showcased how far autonomous systems have come. Competing teams’ tools scanned 54 million lines of code, discovered 77 percent of synthetic vulnerabilities and generated working patches for 61 percent—with an average time to patch of 45 minutes. In the final four-hour round alone, the systems found 54 new synthetic vulnerabilities and patched 43 of them, and also uncovered 18 real-world bugs, producing patches for 11. DARPA announced that the winning systems will be open-sourced, democratizing these capabilities.

A flurry of attack-centric innovations soon followed. In August, researchers demonstrated an AI pipeline that can weaponize newly disclosed CVEs in under 15 minutes using automated patch diffing and exploit generation; the system costs about $1 per exploit and can scale to hundreds of vulnerabilities per day. A September opinion piece by Gadi Evron, Heather Adkins and Bruce Schneier noted that over the summer, autonomous AI hacking graduated from proof of concept to operational capability. XBOW vaulted to the top of HackerOne, DARPA’s challenge teams found dozens of new bugs, and Ukraine’s CERT uncovered malware using LLMs for reconnaissance and data theft, while another threat actor was caught using Anthropic’s Claude to automate cyberattacks. “AI agents now rival elite hackers,” the authors wrote, warning that the tools drastically reduce the cost and skill needed to exploit systems and could tip the balance towards the attackers.

Yet XBOW’s success reveals an important nuance about agent-based security tools. Unlike OpenAI’s Aardvark or Anthropic’s foundational approach, XBOW is an agent platform that uses these foundational models as backends. The vulnerabilities it finds tend to be surface-level—relatively easy targets like SQL injection, XSS and SSRF—not the deep architectural flaws that require sophisticated reasoning. XBOW’s real innovation wasn’t its vulnerability discovery capability; it was using LLMs to automatically write professional vulnerability reports and leveraging HackerOne’s leaderboard as a go-to-market strategy. By showing up on public rankings, XBOW demonstrated that AI could compete with human hackers at scale, even if the underlying vulnerabilities weren’t particularly complex.
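
That report-writing step sounds mundane, but turning raw scanner output into a submission a triager will accept is much of the labor in bounty work, and it is exactly the kind of structured writing LLMs do well. A rough sketch of the idea, again with a hypothetical ask_llm() helper and an invented finding format:

```python
# Sketch of LLM-assisted report drafting: a structured finding goes in, a
# triage-ready write-up comes out. The field names and ask_llm() are invented
# for illustration; a real pipeline would also attach logs and HTTP traces.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider of choice")

REPORT_TEMPLATE = """You are drafting a bug bounty report for program staff.
Write sections titled Summary, Steps to Reproduce, Impact, and Suggested Fix.
Be precise, cite the exact request that demonstrates the issue, and do not
speculate beyond the evidence provided.

Finding:
- type: {vuln_type}
- endpoint: {endpoint}
- evidence: {evidence}
- suggested severity: {severity}
"""

def draft_report(finding: dict) -> str:
    return ask_llm(REPORT_TEMPLATE.format(**finding))

if __name__ == "__main__":
    finding = {  # hypothetical example finding
        "vuln_type": "server-side request forgery",
        "endpoint": "POST /api/v1/webhooks (url parameter)",
        "evidence": "callback to internal metadata service returned HTTP 200",
        "severity": "high",
    }
    print(draft_report(finding))
```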

Defense Gets More Automated—But Threats Evolve Faster

Even as defenders deploy AI, adversaries are innovating. The Google Threat Intelligence Group (GTIG) AI Threat Tracker, published on Nov. 5, is the most comprehensive look to date at adversarial use of AI in the wild. For the first time, GTIG identified “just-in-time AI” malware that calls large language models at runtime to dynamically rewrite and obfuscate itself. One family, PROMPTFLUX, is a VBScript dropper that interacts with Gemini to generate new code segments on demand, making each infection unique. PROMPTSTEAL is a Python data miner that uses Qwen2.5-Coder to build Windows commands for data theft, while PROMPTLOCK demonstrates how ransomware can employ an LLM to craft cross-platform Lua scripts. Another tool, QUIETVAULT, a JavaScript credential stealer, uses an AI prompt to hunt for authentication tokens and other secrets. All of these examples show that attackers are moving beyond the 2024 paradigm of AI as a planning aide; in 2025, malware is beginning to self-modify mid-execution.

GTIG’s report also highlights the misuse of AI by state-sponsored actors. Chinese hackers posed as capture-the-flag participants to bypass guardrails and obtain exploitation guidance; Iranian group MUDDYCOAST masqueraded as university students to build custom malware and command-and-control servers, inadvertently exposing their infrastructure. These actors used Gemini to generate reconnaissance scripts, ransomware routines and exfiltration code, demonstrating that widely available models are enabling less-sophisticated hackers to perform advanced operations.

Meanwhile, SpecterOps researcher John Wotton introduced the concept of an AI-gated loader, a covert program that collects host telemetry—process lists, network activity, user presence—and sends it to an LLM, which decides whether the environment is a honeypot or a real victim. Only if the model approves does the loader decrypt and execute its payload; otherwise it quietly exits. The design, dubbed HALO, uses a fail-closed mechanism to avoid exposing a payload in a monitored environment. As LLM API costs fall, such evasive techniques become more practical.

Consolidation and Friction

These technological leaps are reshaping the business of cybersecurity. On Nov. 4, Bugcrowd announced that it will acquire Mayhem Security from my good friend David Brumley; his team, previously known as ForAllSecure, won the 2016 DARPA Cyber Grand Challenge. Mayhem’s technology automatically discovers and exploits bugs and uses reinforcement learning to prioritize high-impact vulnerabilities; it also builds dynamic software bills of materials and “chaos maps” of live systems. Bugcrowd plans to integrate Mayhem’s AI automation with its human hacker community, offering continuous penetration testing that merges machine scale with crowd-sourced expertise. “We’ve built a system that thinks like an attacker,” Brumley said, adding that joining Bugcrowd brings that AI to a global hacker network. The acquisition signals that bug bounty platforms will not remain purely human endeavors; automation is becoming a product feature.

The Mayhem acquisition also underscores the diverging strategies in the AI security space. While agent platforms like XBOW focus on automation at scale, foundational model teams are making massive capital investments in training infrastructure. Anthropic’s multi-billion-dollar commitment to building specialized red-teaming environments dwarfs the iterative approach seen elsewhere, and that scale has created real competitive pressure: word of Anthropic’s spending has stoked fear of missing out among startups and incumbents alike, accelerating consolidation moves like the Bugcrowd-Mayhem deal.

Yet adoption is uneven. Some folks I spoke with are testing Aardvark and CodeMender for internal red-teaming and patch generation but won’t deploy them in production without extensive governance. They worry about false positives, destabilizing critical systems and questions of liability if an AI-generated patch breaks something. The friction isn’t technological; it’s organizational—legal, compliance and risk management must all sign off.

The contrast between OpenAI’s and Anthropic’s approaches is striking. OpenAI’s Aardvark, while impressive in benchmarks, functions primarily as enhanced SAST—using AI to improve traditional static analysis rather than fundamentally rethinking how security research is done. Anthropic, by contrast, is betting that true autonomous security research requires training foundational models specifically for offensive security, complete with vast training environments that simulate real-world attack scenarios. This isn’t just a difference in tactics; it’s a philosophical divide about whether security AI should augment existing tools or replace them entirely.

Attackers face no such constraints. They can run self-modifying malware and LLM-powered exploit generators without worrying about compliance. GTIG’s report notes that the underground marketplace for illicit AI tooling is maturing, and the existence of PROMPTFLUX and PROMPTSTEAL suggests some threat actors are already paying to call LLM APIs from operational malware. This asymmetry raises an unsettling question: will AI adoption simply move faster on the offensive side?

What Comes Next

Experts outline three scenarios. The Slow Burn assumes high friction on both sides leads to gradual, manageable adoption, giving regulators and organizations time to adapt. An Asymmetric Surge envisions attackers overcoming friction faster than defenders, driving a spike in breaches and forcing a reactive policy response. And the Cascade scenario posits simultaneous large-scale deployment by both offense and defense, producing the “vulnerability cataclysm” Adkins and Schneier warned about—just delayed by organizational inertia.

What we know: the technology exists. Autonomous agents can find and patch vulnerabilities faster than most humans and can generate exploits in minutes. Malware is starting to adapt itself mid-execution. Bug bounty platforms are integrating AI at their core. And nation-state actors are experimenting with open-source models to augment operations. The question isn’t whether AI will transform cybersecurity—that’s already happening—but whether defenders or attackers will adopt the technology faster, and whether policy makers will help shape the outcome.

Time is short. Patch windows have shrunk from weeks to minutes. Signature-based detection is increasingly unreliable against self-modifying malware. And AI systems like XBOW, Aardvark and CodeMender are running 24 hours a day on infrastructure that scales far beyond what any human team can match.

Author: Tim Booher
