Posts tagged: Autonomy

AI Agents for Autonomy Engineers

This week I attended a DARPA workshop on the future and dimensions of agentic AI and also caught up with colleagues building flight-critical autonomy. Both communities use the word “autonomy,” but they mean different things. This post distinguishes physical autonomy (embodied systems operating in physics) from agentic AI (LLM-centered software systems operating in digital environments), and maps the design loops and tooling behind each. Full disclosure: I have real hands-on experience building physically autonomous systems and am just learning how AI agents work, so I’m wordier in that section and hungry for feedback from actual AI engineers.

The DoW CTO recently consolidated critical technology areas, folding “autonomy” into “AI.” That may make sense at the budget level, but it blurs an important engineering distinction: autonomous physical systems are certified, safety-bounded, closed-loop control systems operating in the real world; agentic AI systems are closed-loop, tool-using software agents operating in digital workflows. For agentic digital systems, performance is the engineer’s goal. Physical systems are constrained by and designed for safety.

Both are feedback systems. The difference is what the loop closes over: physics (sensors → actuators) versus software (APIs → tool outputs). That single difference drives everything else: safety regimes, test strategies, and what failure looks like.

Physical autonomy (embodied AI)

Physical autonomy (often called “physical AI”) is intelligence embedded in robots, drones, vehicles, and other machines with a body. These systems don’t just predict; they act—and the consequences are kinetic. That’s why high-performing autonomy is not enough: the system must be safe under uncertainty.

Conceptually, autonomy is independence of action. Philosophically that’s old; engineering makes it concrete. In physical systems, “getting it wrong” can mean a crash, injury, or property damage—so the loop is designed to be bounded, testable, and auditable.

The physical loop (perceive → plan/control → act → learn)

Perceive. Sensors (camera, LiDAR, radar, GNSS/IMU, microphones) turn the world into signals. In practice, teams build low-latency perception pipelines around ROS 2, often with GPU acceleration (e.g., NVIDIA Isaac ROS / NITROS) and video analytics stacks (e.g., NVIDIA DeepStream).
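To make the perceive stage concrete, here is a minimal sketch of a ROS 2 node in Python. The topic names and the placeholder "detector" are my own illustrative assumptions, not any particular product's pipeline:

```python
# Minimal ROS 2 perception node (rclpy): subscribe to a camera topic and
# publish a detection count. Topic names and the downstream detector are
# placeholders for illustration only.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import Int32


class PerceptionNode(Node):
    def __init__(self):
        super().__init__("perception_node")
        # QoS depth of 1: always process the freshest frame, drop stale ones.
        self.sub = self.create_subscription(Image, "/camera/image_raw", self.on_image, 1)
        self.pub = self.create_publisher(Int32, "/perception/num_detections", 1)

    def on_image(self, msg: Image) -> None:
        # In a real pipeline a GPU-accelerated detector (e.g., an Isaac ROS
        # node) would run here; this stub just reports the frame size.
        detections = Int32()
        detections.data = 0  # placeholder result
        self.get_logger().debug(f"frame {msg.width}x{msg.height}")
        self.pub.publish(detections)


def main():
    rclpy.init()
    rclpy.spin(PerceptionNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()
```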

Plan + control. Near safety-critical edges, autonomy still looks modular: perception → state estimation → planning → control. Classical tools remain dominant because they’re inspectable and constraint-aware (e.g., Nav2 for navigation; MPC toolchains like acados + CasADi when explicit constraints matter). Where LLM/VLA models help most today is at the higher level (interpreting goals, proposing constraints, generating motion primitives) while lower-level controllers enforce safety.
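As a rough illustration of the receding-horizon idea (not any particular flight stack), here is a toy MPC-style loop for a 1-D double integrator; a real system would use a constraint-aware solver like acados + CasADi and much richer dynamics:

```python
# Toy receding-horizon (MPC-style) controller for a 1-D double integrator.
# Illustrative only: production planners use constraint-aware toolchains
# (e.g., Nav2, acados + CasADi); this just shows the structure of the loop.
import numpy as np
from scipy.optimize import minimize

DT, HORIZON = 0.1, 10          # step size [s], planning horizon [steps]
U_MAX = 2.0                    # actuator limit [m/s^2]

def rollout(x0, u_seq):
    """Simulate position/velocity under a control sequence."""
    pos, vel = x0
    traj = []
    for u in u_seq:
        vel += u * DT
        pos += vel * DT
        traj.append((pos, vel))
    return np.array(traj)

def plan(x0, goal):
    """Solve for a control sequence minimizing tracking error + effort."""
    def cost(u_seq):
        traj = rollout(x0, u_seq)
        return np.sum((traj[:, 0] - goal) ** 2) + 0.01 * np.sum(u_seq ** 2)
    res = minimize(cost, np.zeros(HORIZON),
                   bounds=[(-U_MAX, U_MAX)] * HORIZON, method="SLSQP")
    return res.x

# Receding-horizon loop: re-plan every step, apply only the first input.
state, goal = np.array([0.0, 0.0]), 5.0
for _ in range(50):
    u = plan(state, goal)[0]
    state = rollout(state, [u])[-1]
print(f"final position = {state[0]:.2f} m (target {goal} m)")
```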

Act. Commands become motion through actuators and real-time control. Common building blocks include ros2_control for robot hardware interfaces and PX4 for UAV inner loops. When learned policies are deployed, they’re increasingly wrapped with safety enforcement (e.g., control barrier function “shielding” such as CBF/QP safety filters).
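Here is the shape of such a safety filter, reduced to a single barrier constraint in one dimension so the projection has a closed form; an actual CBF/QP shield would solve a small quadratic program over the full dynamics:

```python
# Minimal control-barrier-function (CBF) "shield" for a 1-D single integrator
# x_dot = u that must stay below a position limit X_MAX. With one affine
# constraint the QP has a closed-form solution; real systems solve a small QP.
X_MAX, ALPHA, DT = 1.0, 2.0, 0.01   # barrier location, class-K gain, step

def safety_filter(x, u_nom):
    """Project the nominal command onto the CBF-safe set.
    h(x) = X_MAX - x >= 0;  safety condition h_dot + ALPHA*h >= 0  ->  u <= ALPHA*h."""
    h = X_MAX - x
    u_limit = ALPHA * h
    return min(u_nom, u_limit)

x = 0.0
for step in range(500):
    u_nom = 1.0                     # learned/upper-level policy says "go forward"
    u = safety_filter(x, u_nom)     # shield clips the command near the barrier
    x += u * DT
print(f"final x = {x:.3f} (never exceeds {X_MAX})")
```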

Learn. Physical autonomy improves through data: logs, simulation, and fleet feedback loops. Teams pair real logs with simulation (e.g., Isaac Lab / Isaac Sim) and rely on robust telemetry and replay tooling (e.g., Foxglove) to debug and improve perception and planning. The hard part is that real-world learning is constrained by cost, safety, and hardware availability.

Agentic AI (digital autonomy)

Agentic AI refers to LLM-centered systems that plan and execute multi-step tasks by calling tools, observing results, and re-planning—closing the loop over software workflows rather than physical dynamics.

External definitions help ground the term. The GAO describes AI agents as systems that can operate autonomously to accomplish complex tasks and adjust plans when actions aren’t clearly defined. NVIDIA similarly emphasizes iterative planning and reasoning to solve multi-step problems. In practice, the “agent” is the whole system: model + tools + memory + guardrails + evaluation.

The agent loop (perceive → reason → act → learn)

Perceive. Agents “sense” through connectors: files, web pages, databases, and APIs. Many production stacks use RAG (retrieval-augmented generation) so the agent can look up relevant documents before answering. That typically means embeddings + a vector database (e.g., pgvector/Postgres, Pinecone, Weaviate, Milvus, Qdrant) managed through libraries like LlamaIndex or LangChain.
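Stripped of the infrastructure, the retrieval step is nearest-neighbor search over embeddings. The sketch below fakes the embedding model with a hashed bag-of-words (a stand-in, not how any of the libraries above work) just to show the shape of the lookup:

```python
# Minimal RAG-style retrieval sketch: embed documents, embed the query, and
# return the top-k most similar chunks to put into the prompt. embed() is a
# placeholder for a real embedding model; production stacks swap in a vector
# DB (pgvector, Pinecone, Weaviate, ...).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: hashed bag-of-words. Replace with a real model."""
    vec = np.zeros(256)
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

DOCS = [
    "PX4 handles the UAV inner control loops.",
    "Nav2 provides navigation for ROS 2 robots.",
    "LangGraph structures agent workflows as graphs.",
]
DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = DOC_VECS @ embed(query)          # cosine similarity (unit vectors)
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

print(retrieve("how do agents structure workflows?"))
```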

Reason. This is the “autonomy” part in digital form: the system turns an ambiguous goal into a sequence of checkable steps, chooses a tool for each step, and updates the plan as results arrive. The practical trick is to move planning out of improvisational chat and into something you can debug and replay.

In production, that usually means a few concrete patterns:

  • A planner/executor split: one component proposes a plan, another executes it. Typically the planner emits a structured plan (e.g., JSON steps with success criteria) and a constrained executor runs one step at a time, enforces policy/permissions, and can reject a bad plan and trigger re-planning.
  • An explicit state object (often JSON) that tracks what’s known, what’s pending, and what changed.
  • A workflow/graph that defines allowed transitions, including branches, retries, and “ask a human” escalations.

Frameworks like LangGraph, Semantic Kernel, AutoGen, and Google ADK are popular largely because they make this structure first-class.

Mechanically, there’s no special parsing logic: the plan is parsed as normal JSON and validated against a schema (and often repaired by re-prompting if it fails). The executor is then a deterministic interpreter (a state machine / graph runner) that maps each step type to an allowed handler/tool, applies guards (required fields, permissions/approvals), and only then performs side effects.
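A minimal sketch of that planner/executor mechanics, with hypothetical step types and handlers rather than any particular framework's API:

```python
# Sketch of the planner/executor split: the planner (an LLM in practice, faked
# here) emits a JSON plan; the executor validates it against a schema-like
# check, then runs one whitelisted handler per step. Handler names and fields
# are illustrative assumptions.
import json

ALLOWED_STEPS = {
    "search_tickets": lambda args: f"found 2 tickets matching {args['query']!r}",
    "ask_human":      lambda args: f"escalated: {args['question']}",
}
REQUIRED_FIELDS = {"id", "type", "args", "success_criteria"}

def validate(plan: list) -> list:
    """Return a list of problems; an empty list means the plan is executable."""
    problems = []
    for step in plan:
        if not REQUIRED_FIELDS <= step.keys():
            problems.append(f"step {step.get('id')}: missing fields")
        elif step["type"] not in ALLOWED_STEPS:
            problems.append(f"step {step['id']}: unknown tool {step['type']!r}")
    return problems

def execute(plan_json: str) -> dict:
    """Deterministic interpreter: parse, validate, then run steps in order."""
    plan = json.loads(plan_json)
    problems = validate(plan)
    if problems:
        return {"status": "rejected", "problems": problems}   # trigger re-planning
    state = {"status": "ok", "results": {}}
    for step in plan:
        state["results"][step["id"]] = ALLOWED_STEPS[step["type"]](step["args"])
    return state

plan = json.dumps([
    {"id": "s1", "type": "search_tickets",
     "args": {"query": "login failures"}, "success_criteria": ">=1 result"},
])
print(execute(plan))
```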

Tooling-wise, the parts that matter most are: structured outputs (JSON schema / typed arguments), validators (e.g., schema checks and business rules), and failure policies (timeouts, backoff, idempotent retries, and fallbacks to a smaller/bigger model). That’s how “reasoning” becomes reproducible behavior instead of a clever one-off response.
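A small sketch of a failure policy wrapped around one call, with a stubbed model call and made-up model names; real stacks attach the same logic to every tool and model invocation:

```python
# Sketch of a failure policy: timeout-style exceptions trigger exponential
# backoff, and repeated failure falls back to an alternate model. call_model()
# and the model names are illustrative stand-ins.
import random
import time

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real LLM/tool call; fails randomly to exercise the policy."""
    if random.random() < 0.5:
        raise TimeoutError(f"{model} timed out")
    return f"{model}: response to {prompt!r}"

def call_with_policy(prompt: str, primary="big-model", fallback="small-model",
                     retries=3, base_delay=0.2) -> str:
    for attempt in range(retries):
        try:
            return call_model(primary, prompt)
        except TimeoutError:
            time.sleep(base_delay * (2 ** attempt))    # exponential backoff
    return call_model(fallback, prompt)                 # last resort: fallback model

print(call_with_policy("summarize the incident"))
```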

Act. This is where an agent stops being “a good answer” and becomes “a system that does work.” Concretely, acting usually means calling an API (create a ticket, update a CRM record), querying a database, executing code, or triggering a workflow. The enabling tech is boring-but-critical: tool definitions with strict inputs/outputs (often JSON Schema or OpenAPI-derived), adapters/connectors to your systems, and a runtime that can execute calls, capture outputs, and feed them back into the loop. Standards like MCP are emerging to describe tools/connectors consistently across ecosystems.

The hard parts show up immediately: (1) tool selection (choosing the right function among many), (2) argument filling (mapping messy intent into typed fields without inventing values), and (3) side effects (a wrong call can email the wrong person, change the wrong record, or spend money). You also have to assume hostile inputs: prompt injection, tool-output “instructions,” and data exfiltration attempts.

In practice, tool selection is usually a routing problem: you maintain a tool catalog (names, descriptions, schemas, permissions, cost/latency), retrieve a small candidate set (rules, embeddings, or a dedicated “router” model), then force the model to choose from that set (function calling / constrained output) and verify the choice (allowed tool? required approvals? inputs complete?) before executing.

Production stacks handle this with layered defenses: validation and business-rule checks before execution, idempotency keys and backoff for retries, timeouts and circuit breakers for flaky dependencies, least-privilege auth (scoped tokens, service accounts), sandboxing/allow-lists for sensitive actions, explicit approvals for high-impact actions, and end-to-end observability (traces/logs so you can see what tool was called, with what arguments, and what happened).
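A toy version of that routing-and-verification flow, with an invented two-tool catalog and keyword scoring standing in for embeddings or a router model:

```python
# Sketch of tool selection as routing: keep a tool catalog, narrow to a small
# candidate set with cheap keyword scoring (embeddings or a router model in
# production), then verify permissions and required arguments before anything
# executes. Catalog entries are illustrative assumptions.
CATALOG = {
    "create_ticket": {"keywords": {"ticket", "issue", "bug"},
                      "required_args": {"title"}, "needs_approval": False},
    "refund_payment": {"keywords": {"refund", "payment", "charge"},
                       "required_args": {"order_id", "amount"}, "needs_approval": True},
}

def candidate_tools(request: str, k: int = 3) -> list:
    words = set(request.lower().split())
    scored = sorted(CATALOG, key=lambda t: -len(CATALOG[t]["keywords"] & words))
    return scored[:k]

def verify(tool: str, args: dict, approved: bool) -> list:
    spec, problems = CATALOG[tool], []
    missing = spec["required_args"] - args.keys()
    if missing:
        problems.append(f"missing args: {sorted(missing)}")
    if spec["needs_approval"] and not approved:
        problems.append("human approval required before execution")
    return problems

request = "please refund the duplicate charge on order 1234"
tool = candidate_tools(request)[0]                      # router picks best candidate
print(tool, verify(tool, {"order_id": "1234"}, approved=False))
```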

Learn. Agent systems generate traces: prompts/messages, retrieved context, tool calls + arguments, tool outputs, decisions, and outcomes—stitched together with a trace ID. In production, this typically looks like structured logs + distributed tracing (often via OpenTelemetry), with redaction/PII controls and secure storage so you can share traces with humans without leaking secrets. The payoff is that traces become an eval dataset: you can sample failures, replay runs, and measure behavior (did it pick the right tool, ask for approval, respect policies, and terminate) using tooling such as OpenAI Evals, then iterate on prompts, routers, and (sometimes) fine-tuning.
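In miniature, trace capture is just consistent structured records keyed by a trace ID. This sketch appends plain dicts to an in-memory list; a production system would emit the same spans through OpenTelemetry, with redaction before storage:

```python
# Sketch of agent trace capture: every model call, tool call, and decision is
# recorded as a span tied to one trace ID, so failed runs can be replayed and
# turned into eval cases. Field names here are illustrative.
import time
import uuid

TRACE_STORE = []

def record_span(trace_id: str, kind: str, name: str, payload: dict) -> None:
    TRACE_STORE.append({
        "trace_id": trace_id,
        "ts": time.time(),
        "kind": kind,          # "llm", "tool", "decision", ...
        "name": name,
        "payload": payload,    # redact PII/secrets before persisting for real
    })

trace_id = str(uuid.uuid4())
record_span(trace_id, "llm", "plan", {"prompt_tokens": 812, "plan_steps": 3})
record_span(trace_id, "tool", "create_ticket", {"args": {"title": "login failures"}})
record_span(trace_id, "decision", "terminate", {"reason": "success criteria met"})

# Later, the same store doubles as an eval dataset: sample traces and score them.
run = [s for s in TRACE_STORE if s["trace_id"] == trace_id]
print(f"{len(run)} spans; picked right tool:", any(s["name"] == "create_ticket" for s in run))
```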

Implementation patterns (one quick map)

If you’re building agents today, you’ll usually see one of two patterns: (1) graph/workflow orchestration (explicit steps and state; e.g., LangGraph) or (2) multi-agent role orchestration (specialized agents with handoffs; e.g., CrewAI, AutoGen, Semantic Kernel). Google’s ADK, OpenAI’s Agents SDK, and similar toolkits package these patterns with connectors, observability, and evaluation hooks.

Comparing autonomy: physics vs. software loops

Aspect | Physical autonomy | Agentic AI
What it operates on | Physics: sensors and actuators in messy environments | Software: APIs, documents, databases, and services
Loop constraints | Real-time latency, dynamics, and safety margins | Tool latency, reliability, and permission boundaries
Primary failure modes | Unsafe motion, collisions, degraded sensing, hardware faults | Wrong tool/arguments, prompt injection, bad data, silent side effects
How you test | Simulation + field testing + certification-style evidence | Evals + traces + sandboxing + staged rollout
Oversight | Often human-in-the-loop for safety-critical operations | On-the-loop monitoring with guardrails + approvals for risky actions

  • The loop is the same shape, but the domain changes everything: physics forces safety-bounded control; software forces permissions and security.
  • “Autonomy” isn’t a vibe: in both worlds it’s engineered feedback, measurable behavior, and an evidence trail.
  • Both converge on systems engineering: interfaces, observability, evaluation, and failure handling matter as much as model quality.

Where they merge (the payoff)

The most interesting systems blend the two: digital agents plan, coordinate, and monitor; embodied systems execute in the real world under strict safety constraints. You can already see this pattern in:

  • Robotics operations: agents triage logs, run diagnostics, and propose recovery actions while controllers enforce safety.
  • Warehouses and factories: agents schedule work and allocate resources; robots handle motion and manipulation.
  • Multi-robot coordination: agent-like planners coordinate tasks; low-level autonomy keeps platforms stable and safe.

Conclusion

Physical autonomy and agentic AI are two different kinds of independence: one closes the loop over physics, the other over software. Treating them as the same “AI” category hides the real engineering work: safety cases and control in the physical world; security, permissions, and eval-driven reliability in the digital world. The future is hybrid systems—but the way you build trust in them still depends on what the loop closes over.


AI and Aerodynamics at Insect Scale: MIT’s Bumblebee‑Sized Robot

MIT’s soft robotics lab did an amazing job right at the intersection of physical systems, cyber, and software. Small robots and insect-related autonomy have fascinated me for a while. The autonomy of an insect brain is simple enough that AI is just starting to touch its boundaries. The aero and mechanics of insects are still mind-bendingly ahead of our best planes (power, maneuver, agility, range).

MIT’s latest insect-scale robot fuses new soft “muscle” actuators, a redesigned four-wing airframe, and an AI-trained controller to achieve truly bug-like flight. Multi-layer dielectric elastomer actuators provide high power at much lower voltage, while long, hair-thin wing hinges and improved transmissions cut mechanical stress so the robot can hover for about 1,000 seconds and execute flips and sharp turns faster than previous microrobots. On top of that hardware, the team uses model-predictive control as an expert “teacher” and distills it into a lightweight neural policy, enabling bumblebee-class speed and agility in a package that weighs less than a paperclip and opens a realistic path to autonomous swarms for pollination and search-and-rescue.

Tiny flying robots have long promised to help with tasks that are too dangerous or delicate for larger drones. Building such machines is extraordinarily difficult: the physics of flapping‑wing flight at insect scale and the constraints of small motors and batteries limit endurance and manoeuvrability. Over the past four years, engineers at the Massachusetts Institute of Technology (MIT) have made three major breakthroughs that together push insect‑scale flight from lab curiosity toward practical autonomy.

I built an animation using three.js to show just how cool this flight path can be.

The insect’s flight path is modeled using a discrete-time Langevin equation, which simulates Brownian motion with aerodynamic damping. At each time step \( t \), the velocity vector \( \mathbf{v} \) is updated by applying a stochastic acceleration \( \mathbf{a}_{\text{rand}} \) and a damping factor \( \gamma \) (representing air resistance):
$$ \mathbf{v}_{t+1} = \gamma \, \mathbf{v}_t + \mathbf{a}_{\text{rand}} $$
where \( \gamma = 0.95 \) and the components of \( \mathbf{a}_{\text{rand}} \) are drawn from a uniform distribution \( \mathcal{U}(-0.05, 0.05) \). The position \( \mathbf{p} \) is then integrated numerically:
$$ \mathbf{p}_{t+1} = \mathbf{p}_t + \mathbf{v}_{t+1} $$
This results in a “random walk” trajectory that is smoothed by the inertial momentum of the simulated robot, mimicking the erratic yet continuous flight dynamics of a small insect.
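For readers who prefer code to notation, here is the same update rule as a short Python sketch (the actual animation runs in three.js):

```python
# The same damped random-walk update the animation uses, written in Python for
# clarity (the visualization itself is three.js). gamma damps velocity like
# air resistance; the random kick is the stochastic acceleration.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95                      # damping factor (air resistance)
v = np.zeros(3)                   # velocity
p = np.zeros(3)                   # position
path = [p.copy()]

for _ in range(1000):
    a_rand = rng.uniform(-0.05, 0.05, size=3)   # stochastic acceleration
    v = gamma * v + a_rand                      # v_{t+1} = gamma * v_t + a_rand
    p = p + v                                   # p_{t+1} = p_t + v_{t+1}
    path.append(p.copy())

path = np.array(path)
print("trajectory extent:", path.min(axis=0).round(2), path.max(axis=0).round(2))
```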


Flying robots smaller than a few grams operate in a very different aerodynamic regime from traditional drones. Their wings experience low Reynolds numbers where lift comes from unsteady vortex shedding rather than smooth airflow; getting useful thrust requires wings to flap hundreds of times per second with large stroke angles. At such high frequencies, actuators often buckle and joints fatigue. The robots have extremely tight power and mass budgets, making it hard to carry batteries or onboard processors. Early insect‑like microrobots could just barely hover for a few seconds and needed bulky external power supplies.

A microrobot flips 10 times in 11 seconds.
Credit: Courtesy of the Soft and Micro Robotics Laboratory

MIT’s Soft and Micro Robotics Laboratory, led by Kevin Chen, set out to tackle these problems with a combination of novel soft actuators, mechanically resilient airframes and advanced control algorithms. The resulting platform evolved in three stages: improved artificial muscles (2021), a four‑wing cross‑shaped airframe with durable hinges and transmissions (January 2025) and a learning‑based controller that matches insect agility (December 2025).

The first breakthrough addressed the “muscle” problem. Conventional rigid actuators were too heavy and inefficient for gram‑scale robots. Chen’s group developed multilayer dielectric elastomer actuators—soft “muscles” made from ultrathin elastomer films sandwiched between carbon‑nanotube electrodes and rolled into cylinders. In 2021 they unveiled a fabrication method that eliminates microscopic air bubbles in the elastomer by vacuuming each layer after spin‑coating and baking it immediately. This allowed them to stack 20 alternating layers, each about 10 µm thick, without defects.

Key results from this work included:

  • Lower voltage and more payload: The new actuators operate at 75% lower voltage and carry 80% more payload than earlier soft actuators. By increasing surface area with more layers, they require less than 500V to actuate yet can lift nearly three times their own weight.
  • Higher power density and durability: Removing defects increases power output by more than 300% and extends lifespan. The 20‑layer actuators survived more than 2 million cycles while still flying smoothly.
  • Record hovering: Robots powered by these actuators achieved a 20‑second hovering flight, the longest yet recorded for a sub‑gram robot. With a lift‑to‑weight ratio of about 3.7:1, they could carry additional electronics.

The low‑voltage soft muscles solved a critical bottleneck: the robots could now carry lightweight power electronics and eventually microprocessors instead of being tethered to an external power supply. This breakthrough laid the hardware foundation for future autonomy.

Aerodynamic and structural redesign: four wings, durable hinges and long endurance

The second breakthrough came in early 2025 with a redesign of the robot’s wings and transmission. Previous generations assembled four two‑wing modules into a rectangle, creating eight wings whose wakes interfered with each other and limited control authority. The new design arranges four single‑wing units in a cross. Each wing flaps outward, reducing aerodynamic interference and freeing up central volume for batteries and sensors.

To exploit the power of the improved actuators, Chen’s team built more complex transmissions that connect each actuator to its wing. These transmissions prevent the artificial muscles from buckling at high flapping frequencies, reducing mechanical strain and allowing higher torque. They also developed a 2 cm‑long wing hinge with a diameter of just 200 µm, fabricated via a multistep laser‑cutting process. The long hinge reduces torsional stress during flapping and increases durability; even slight misalignment during fabrication could affect the wing’s motion.

Performance improvements from this structural redesign were dramatic:

  • Extended flight time: The robot can hover for more than 1,000 seconds (~17 minutes) without degradation of precision—100 times longer than earlier insect‑scale robots.
  • Increased speed and agility: It reaches an average speed of 35 cm/s and performs body rolls and double flips. It can precisely follow complex paths, including spelling “MIT” in mid‑air.
  • Greater control torque: The new transmissions generate about three times more torque than prior designs, enabling sophisticated and accurate path‑following flights.

These mechanical innovations show how aerodynamic design, materials engineering and manufacturing precision are intertwined. By reducing strain on the wings and actuators, the platform gains not only endurance but also the stability needed for advanced control.

Autonomy, AI and Systems Engineering

The final breakthrough is in control theory as it intersects autonomy. Insect‑scale flight demands rapid decision‑making: the wings beat hundreds of times per second, and disturbances like gusts can quickly destabilize the robot. Traditional hand‑tuned controllers cannot handle aggressive maneuvers or unexpected perturbations. MIT’s team collaborated with Jonathan How’s Aerospace Controls Laboratory to develop a two‑step AI‑based control scheme.

Step 1: Model‑predictive control (MPC). The researchers built a high‑fidelity dynamic model of the robot’s mechanics and aerodynamics. An MPC uses this model to plan an optimal sequence of control inputs that follow a desired trajectory while respecting force and torque limits. The planner can design difficult maneuvers such as repeated flips and aggressive turns but is too computationally intensive to run on the robot in real time.

Step 2: Imitation‑learned policy. To bridge the gap between high‑end planning and onboard execution, the team generated training data by having the MPC perform many trajectories and perturbations. They then trained a deep neural network policy via imitation learning to map the robot’s state directly to control commands. This policy effectively compresses the MPC’s intelligence into a lightweight model that can run fast enough for real‑time control.
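The distillation step is easier to see in code. This sketch uses a stand-in PD "expert" instead of the team's MPC and an arbitrary small network, but the structure (label states with the expensive controller, then behavior-clone a cheap policy) is the same:

```python
# Sketch of distillation by imitation learning: an expensive "expert" planner
# labels states with control commands, and a small neural policy is trained to
# imitate it. The expert here is a toy PD law, not MIT's MPC; shapes and
# hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def expert_controller(state: torch.Tensor) -> torch.Tensor:
    """Placeholder for the offline MPC: maps state -> control command."""
    pos_err, vel = state[:, :2], state[:, 2:]
    return -2.0 * pos_err - 0.5 * vel            # simple PD "expert"

# 1) Generate a dataset of (state, expert command) pairs.
states = torch.randn(5000, 4)
with torch.no_grad():
    commands = expert_controller(states)

# 2) Train a lightweight policy to imitate the expert (behavior cloning).
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(policy(states), commands)
    loss.backward()
    optimizer.step()

print(f"imitation loss: {loss.item():.4f}")      # small network now runs in real time
```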

The results show how AI enables insect‑like agility:

  • Speed and acceleration: The learning‑based controller allows the robot to fly 447% faster and achieve a 255% increase in acceleration compared with their previous best hand‑tuned controller.
  • Complex maneuvers: The robot executed 10 somersaults in 11 seconds, staying within about 4–5 cm of the planned trajectory and maintaining performance despite wind gusts of more than 1 m/s.
  • Bio‑inspired saccades: The policy enabled saccadic flight behavior, where the robot rapidly pitches forward to accelerate and then back to decelerate, mimicking insect eye stabilization strategies.

An external motion‑capture system currently provides state estimation, and the controller runs offboard. However, because the neural policy is much less computationally demanding than full MPC, the authors argue that similar policies may be feasible on tiny onboard processors. The AI control thus paves the way for autonomous flight without external assistance.

A key message of MIT’s microrobot program is that agility at insect scale does not come from a single innovation. The soft actuators enable large strokes at lower voltage; the cross‑shaped airframe and long hinges reduce mechanical strain and aerodynamic interference; and the AI‑driven controller exploits the robot’s physical capabilities while respecting constraints. Chen’s group emphasizes that hardware advances pushed them to develop better controllers, and improved controllers made it worthwhile to refine the hardware.

This co‑design philosophy—optimizing materials, mechanisms and algorithms together—will be essential as researchers push toward untethered, autonomous swarms. The team plans to integrate miniature batteries and sensors into the central cavity freed by the four‑wing design, allowing the robots to navigate without a motion‑capture system. Future work includes landing and take‑off from flowers for mechanical pollination and coordinated flight to avoid collisions and operate in groups. There are also open questions about enabling onboard perception—using tiny cameras or event sensors to close the loop—and about energy management to extend endurance beyond 10,000s.

MIT’s bumblebee‑sized flapping robot illustrates how progress in materials science, precision fabrication, aerodynamics and AI can converge to solve a hard problem. The low‑voltage, power‑dense actuators prove that soft materials can outperform rigid designs, the four‑wing airframe with durable hinges unlocks long endurance and high torque, and the hybrid MPC/learning controller shows that sophisticated planning can be compressed into hardware‑friendly neural policies. Together, these advances give the microrobot insect‑like speed, agility and endurance.

While still reliant on external power and motion capture, the robot’s modular design and AI controller suggest a roadmap to fully autonomous operation. As the team integrates onboard sensors and batteries, insect‑scale robots could move from labs to fields and disaster zones, pollinating crops or searching collapsed buildings. In that future, the intersection of autonomy and aerodynamics will be defined not by a single breakthrough but by a careful co‑design of muscles, wings and brains.


AI Hackers

AI Systems Are Now Hunting Software Vulnerabilities—And Winning.

Three months ago, Google Security SVP Heather Adkins and cryptographer Bruce Schneier warned that artificial intelligence would unleash an “AI vulnerability cataclysm.” They said autonomous code-analysis systems would find and weaponize flaws so quickly that human defenders would be overwhelmed. The claim seemed hyperbolic. Yet October and November of 2025 have proven them prescient: AI systems are now dominating bug bounty leaderboards, generating zero-day exploits in minutes, and even rewriting malware in real time to evade detection.

On an October morning, a commercial security agent called XBOW—a fully autonomous penetration tester—shot to the top of HackerOne’s U.S. leaderboard, outcompeting thousands of human hackers. Over the previous 90 days it had filed roughly 1,060 vulnerability reports, including remote-code-execution, SQL injection and server-side request forgery bugs. More than 50 were deemed critical. “If you’re competing for bug bounties, you’re not just competing against other humans anymore,” one veteran researcher told me. “You’re competing against machines that work 24/7, don’t get tired and are getting better every week.”

XBOW is just a harbinger. Between August and November 2025, at least nine developments upended how software is secured. OpenAI, Google DeepMind and DARPA all released sophisticated agents that can scan vast codebases, find vulnerabilities and propose or even automatically apply patches. State-backed hackers have begun using large-language models to design malware that modifies itself in mid-execution, while researchers at SpecterOps published a blueprint for an AI-mediated “gated loader” that decides whether a payload should run based on a covert risk assessment. And an AI-enabled exploit generator published in August showed that published CVEs can be weaponized in 10 to 15 minutes—collapsing the time defenders once had to patch systems.

The pattern is clear: AI systems aren’t just assisting security professionals; they’re replacing them in many tasks, creating new capabilities that didn’t exist before and forcing a fundamental rethinking of how software security works. As bug-bounty programs embrace automation and threat actors deploy AI at every stage of the kill chain, the offense-defense balance is being recalibrated at machine speed.

A New Offensive Arsenal

The offensive side of cybersecurity has seen the most dramatic AI advances, but it’s important to understand a key distinction in what’s happening: foundational model companies like OpenAI and Anthropic are building the brains, while agent platforms like XBOW are building the bodies. This distinction matters when evaluating the different approaches emerging in AI-powered security.

OpenAI’s Aardvark, released on Oct. 30, is described by the company as a “security researcher” in software form. Rather than using static analysis or fuzzers, Aardvark uses large-language-model reasoning to build an internal representation of each codebase it analyzes. It continuously monitors commits, traces call graphs and identifies risky patterns. When it finds a potential vulnerability, Aardvark creates a test harness, executes the code in a sandbox and uses Codex to propose a patch. In internal benchmarks across open-source repositories, Aardvark reportedly detected 92 percent of known and synthetic vulnerabilities and discovered 10 new CVE-class flaws. It has already been offered pro bono to select open-source projects. But beneath the impressive numbers, Aardvark functions more like AI-enhanced static application security testing (SAST) than a true autonomous researcher—powerful, but incremental.

Google DeepMind’s CodeMender, unveiled on Oct. 6, takes the concept further by combining discovery and automated repair. It applies advanced program analysis, fuzzing and formal methods to find bugs and uses multi-agent LLMs to generate and validate patches. Over the past six months, CodeMender upstreamed 72 security fixes to open-source projects, some as large as 4.5 million lines of code. In one notable case, it inserted -fbounds-safety annotations into the WebP image library, proactively eliminating a buffer overflow that had been exploited in a zero-click iOS attack. All patches are still reviewed by human experts, but the cadence is accelerating.

Anthropic, meanwhile, is taking a fundamentally different path—one that involves building specialized training environments for red teaming. The company has devoted an entire team to training a foundational red team model. This approach represents a bet that the future of security AI lies not in bolting agents onto existing models, but in training models from the ground up to think like attackers.

The DARPA AI Cyber Challenge (AIxCC), concluded in October, showcased how far autonomous systems have come. Competing teams’ tools scanned 54 million lines of code, discovered 77 percent of synthetic vulnerabilities and generated working patches for 61 percent—with an average time to patch of 45 minutes. During the final four-hour round, participants found 54 new vulnerabilities and patched 43, plus 18 real bugs with 11 patches. DARPA announced that the winning systems will be open-sourced, democratizing these capabilities.

A flurry of attack-centric innovations soon followed. In August, researchers demonstrated an AI pipeline that can weaponize newly disclosed CVEs in under 15 minutes using automated patch diffing and exploit generation; the system costs about $1 per exploit and can scale to hundreds of vulnerabilities per day. A September opinion piece by Gadi Evron, Heather Adkins and Bruce Schneier noted that over the summer, autonomous AI hacking graduated from proof of concept to operational capability. XBOW vaulted to the top of HackerOne, DARPA’s challenge teams found dozens of new bugs, and Ukraine’s CERT uncovered malware using LLMs for reconnaissance and data theft, while another threat actor was caught using Anthropic’s Claude to automate cyberattacks. “AI agents now rival elite hackers,” the authors wrote, warning that the tools drastically reduce the cost and skill needed to exploit systems and could tip the balance towards the attackers.

Yet XBOW’s success reveals an important nuance about agent-based security tools. Unlike OpenAI’s Aardvark or Anthropic’s foundational approach, XBOW is an agent platform that uses these foundational models as backends. The vulnerabilities it finds tend to be surface-level—relatively easy targets like SQL injection, XSS and SSRF—not the deep architectural flaws that require sophisticated reasoning. XBOW’s real innovation wasn’t its vulnerability discovery capability; it was using LLMs to automatically write professional vulnerability reports and leveraging HackerOne’s leaderboard as a go-to-market strategy. By showing up on public rankings, XBOW demonstrated that AI could compete with human hackers at scale, even if the underlying vulnerabilities weren’t particularly complex.

Defense Gets More Automated—But Threats Evolve Faster

Even as defenders deploy AI, adversaries are innovating. The Google Threat Intelligence Group (GTIG) AI Threat Tracker, published on Nov. 5, is the most comprehensive look to date at AI in the wild. For the first time, GTIG identified “just-in-time AI” malware that calls large-language models at runtime to dynamically rewrite and obfuscate itself. One family, PROMPTFLUX, is a VBScript dropper that interacts with Gemini to generate new code segments on demand, making each infection unique. PROMPTSTEAL is a Python data miner that uses Qwen2.5-Coder to build Windows commands for data theft, while PROMPTLOCK demonstrates how ransomware can employ an LLM to craft cross-platform Lua scripts. Another tool, QUIETVAULT, uses an AI prompt to search JavaScript for authentication tokens and secrets. All of these examples show that attackers are moving beyond the 2024 paradigm of AI as a planning aide; in 2025, malware is beginning to self-modify mid-execution.

GTIG’s report also highlights the misuse of AI by state-sponsored actors. Chinese hackers posed as capture-the-flag participants to bypass guardrails and obtain exploitation guidance; Iranian group MUDDYCOAST masqueraded as university students to build custom malware and command-and-control servers, inadvertently exposing their infrastructure. These actors used Gemini to generate reconnaissance scripts, ransomware routines and exfiltration code, demonstrating that widely available models are enabling less-sophisticated hackers to perform advanced operations.

Meanwhile, SpecterOps researcher John Wotton introduced the concept of an AI-gated loader, a covert program that collects host telemetry—process lists, network activity, user presence—and sends it to an LLM, which decides whether the environment is a honeypot or a real victim. Only if the model approves does the loader decrypt and execute its payload; otherwise it quietly exits. The design, dubbed HALO, uses a fail-closed mechanism to avoid exposing a payload in a monitored environment. As LLM API costs fall, such evasive techniques become more practical.

Consolidation and Friction

These technological leaps are reshaping the business of cybersecurity. On Nov. 4, Bugcrowd announced that it will acquire Mayhem Security from my good friend David Brumley. His team was previously known as ForAllSecure and won the 2016 DARPA Cyber Grand Challenge. Mayhem’s technology automatically discovers and exploits bugs and uses reinforcement learning to prioritize high-impact vulnerabilities; it also builds dynamic software bills of materials and “chaos maps” of live systems. Bugcrowd plans to integrate Mayhem’s AI automation with its human hacker community, offering continuous penetration testing and merging AI with crowd-sourced expertise. “We’ve built a system that thinks like an attacker,” Mayhem founder David Brumley said, adding that combining with Bugcrowd brings AI to a global hacker network. The acquisition signals that bug bounty platforms will not remain purely human endeavours; automation is becoming a product feature.

The Mayhem acquisition also underscores the diverging strategies in the AI security space. While agent platforms like XBOW focus on automation at scale, foundational model teams are making massive capital investments in training infrastructure. Anthropic’s multi-billion-dollar commitment to building specialized red teaming environments dwarfs the iterative approach seen elsewhere. This created substantial competitive pressure: when word spread that Anthropic was spending at this scale, it generated significant fear of missing out among both startups and established players, accelerating consolidation moves like the Bugcrowd-Mayhem deal.

Yet adoption is uneven. Some folks I spoke with are testing Aardvark and CodeMender for internal red-teaming and patch generation but won’t deploy them in production without extensive governance. They worry about false positives, destabilizing critical systems and questions of liability if an AI-generated patch breaks something. The friction isn’t technological; it’s organizational—legal, compliance and risk management must all sign off.

The contrast between OpenAI’s and Anthropic’s approaches is striking. OpenAI’s Aardvark, while impressive in benchmarks, functions primarily as enhanced SAST—using AI to improve traditional static analysis rather than fundamentally rethinking how security research is done. Anthropic, by contrast, is betting that true autonomous security research requires training foundational models specifically for offensive security, complete with vast training environments that simulate real-world attack scenarios. This isn’t just a difference in tactics; it’s a philosophical divide about whether security AI should augment existing tools or replace them entirely.

Attackers, by contrast, face no such constraints. They can run self-modifying malware and LLM-powered exploit generators without worrying about compliance. GTIG’s report notes that the underground marketplace for illicit AI tooling is maturing, and the existence of PROMPTFLUX and PROMPTSTEAL suggests some criminal groups are already paying to call LLM APIs in operational malware. This asymmetry raises an unsettling question: Will AI adoption accelerate faster on the offensive side?

What Comes Next

Experts outline three scenarios. The Slow Burn assumes high friction on both sides leads to gradual, manageable adoption, giving regulators and organizations time to adapt. An Asymmetric Surge envisions attackers hurdling friction faster than defenders, driving a spike in breaches and forcing a reactive policy response. And the Cascade scenario posits simultaneous large-scale deployment by both offense and defense, producing the “vulnerability cataclysm” Adkins and Schneier warned about—just delayed by organizational inertia.

What we know: the technology exists. Autonomous agents can find and patch vulnerabilities faster than most humans and can generate exploits in minutes. Malware is starting to adapt itself mid-execution. Bug bounty platforms are integrating AI at their core. And nation-state actors are experimenting with open-source models to augment operations. The question isn’t whether AI will transform cybersecurity—that’s already happening—but whether defenders or attackers will adopt the technology faster, and whether policy makers will help shape the outcome.

Time is short. Patch windows have shrunk from weeks to minutes. Signature-based detection is increasingly unreliable against self-modifying malware. And AI systems like XBOW, Aardvark and CodeMender are running 24 hours a day on infrastructure that scales infinitely.


fly brain

The Connectome Scaling Wall: What Mapping Fly Brains Reveals About Autonomous Systems

In October 2024, the FlyWire consortium published the complete connectome of an adult Drosophila melanogaster brain: 139,255 neurons, 54.5 million synapses, 8,453 cell types, mapped at 8-nanometer resolution. This is only the third complete animal connectome ever produced, following C. elegans (302 neurons, 1986) and Drosophila larva (3,016 neurons, 2023). The 38-year gap between the first and third isn’t just a story about technology—it’s about fundamental scaling constraints that apply equally to biological neural networks and autonomous robotics.

A connectome is the complete wiring diagram of a nervous system. Think of it as the ultimate circuit board schematic, except instead of copper traces and resistors, you’re mapping biological neurons and the synapses that connect them. But unlike an electrical circuit you can trace with your finger, these connections exist in three-dimensional space at scales invisible to the naked eye. A single neuron might be a tenth of a millimeter long but only a few microns wide—about 1/20th the width of a human hair. The synapses where neurons connect are even tinier: just 20-40 nanometers across, roughly 2,000 times smaller than the width of a hair. And there are millions of them, tangled together in a three-dimensional mesh that makes the densest urban skyline look spacious by comparison.

The connectome doesn’t just tell you which neurons exist—it tells you how they talk to each other. Neuron A connects to neurons B, C, and D. Neuron B connects back to A and forward to E. That recurrent loop between A and B? That might be how the fly remembers the smell of rotting fruit. The connection strength between neurons—whether a synapse is strong or weak—determines whether a signal gets amplified or filtered out. It’s the difference between a whisper and a shout, between a fleeting thought and a committed decision.

Creating a connectome is almost absurdly difficult. First, you preserve the brain perfectly at nanometer resolution, then slice it into thousands of impossibly thin sections—the fruit fly required 7,000 slices, each just 40 nanometers thick, about 1/1,000th the thickness of paper, cut with diamond-knife precision. Each slice goes under an electron microscope, generating a 100-teravoxel dataset where each pixel represents an 8-nanometer cube of brain tissue. Then comes the nightmare part: tracing each neuron through all 7,000 slices, like following a single thread through 7,000 sheets of paper where the thread appears as just a dot on each sheet—and there are 139,255 threads tangled together. When Sydney Brenner’s team mapped C. elegans in the 1970s and 80s, they did this entirely by hand, printing electron microscopy images on glossy paper and tracing neurons with colored pens through thousands of images. It took 15 years for 302 neurons. The fruit fly has 460 times more.

This is where the breakthrough happened. The FlyWire consortium used machine learning algorithms called flood-filling networks to automatically segment neurons, but the AI made mistakes constantly—merging neurons, splitting single cells into pieces, missing connections. So they crowdsourced the proofreading: hundreds of scientists and citizen scientists corrected errors neuron by neuron, synapse by synapse, and the AI learned from each correction. This hybrid approach—silicon intelligence for speed, human intelligence for accuracy—took approximately 50 person-years of work. The team then used additional machine learning to predict neurotransmitter types from images alone and identified 8,453 distinct cell types. The final result: 139,255 neurons, 54.5 million synapses, every single connection mapped—a complete wiring diagram of how a fly thinks.

The O(n²) Problem

Connectomes scale brutally. Going from 302 to 139,255 neurons isn’t just 460× harder: connectivity scales roughly as n², so the work grows far faster than the neuron count. The worm has ~7,000 synapses. The fly has 54.5 million—a 7,800× increase in edges for a 460× increase in nodes. The fly brain also went from 118 cell classes to 8,453 distinct types, meaning the segmentation problem became orders of magnitude more complex.

Sydney Brenner’s team in 1986 traced C. elegans by hand through 8,000 electron microscopy images using colored pens on glossy prints. They tried automating it with a Modular I computer (64 KB memory), but 1980s hardware couldn’t handle the reconstruction. The entire project took 15 years.

The FlyWire consortium solved the scaling problem with a three-stage pipeline:


Stage 1: Automated segmentation via flood-filling networks
They sliced the fly brain into 7,000 sections at 40nm thickness (1/1000th the thickness of paper), generating a 100-teravoxel dataset. Flood-filling neural networks performed initial segmentation—essentially a 3D region-growing algorithm that propagates labels across voxels with similar appearance. This is computationally tractable because it’s local and parallelizable, but error-prone because neurons can look similar to surrounding tissue at boundaries.
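A toy region-growing version of the idea, using raw intensity instead of a learned model, shows why the approach is local and parallelizable and also why it struggles at ambiguous boundaries:

```python
# Toy flood-fill segmentation: grow a label outward from a seed voxel across
# neighboring voxels with similar intensity. Real flood-filling networks use a
# learned model to decide what belongs to the object; this BFS variant is only
# an illustration of the local, parallelizable (and boundary-prone) idea.
from collections import deque
import numpy as np

def flood_fill_3d(volume, seed, tol=0.1):
    """Label the connected region around `seed` whose voxels are within `tol`
    of the seed intensity. Returns a boolean mask the same shape as `volume`."""
    mask = np.zeros(volume.shape, dtype=bool)
    target = volume[seed]
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        if mask[z, y, x] or abs(volume[z, y, x] - target) > tol:
            continue
        mask[z, y, x] = True
        for dz, dy, dx in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
            nz, ny, nx = z + dz, y + dy, x + dx
            if 0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1] and 0 <= nx < volume.shape[2]:
                queue.append((nz, ny, nx))
    return mask

# Synthetic "EM volume": a bright tube (neuron) inside darker background.
vol = np.random.default_rng(0).uniform(0.0, 0.2, size=(20, 20, 20))
vol[8:12, 8:12, :] = 0.9
segment = flood_fill_3d(vol, seed=(10, 10, 10))
print("voxels labeled:", int(segment.sum()))
```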

Stage 2: Crowdsourced proofreading
The AI merged adjacent neurons, split single cells, and missed synapses constantly. Hundreds of neuroscientists and citizen scientists corrected these errors through a web interface. Each correction fed back into the training set, iteratively improving the model. This hybrid approach—automated first-pass, human verification, iterative refinement—consumed approximately 50 person-years.

Stage 3: Machine learning for neurotransmitter prediction
Rather than requiring chemical labeling for every neuron, they trained classifiers to predict neurotransmitter type (glutamate, GABA, acetylcholine, etc.) directly from EM morphology. This is non-trivial because neurotransmitter identity isn’t always apparent from structure alone, but statistical patterns in vesicle density, synapse morphology, and connectivity motifs provide signal.
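Conceptually it is a standard supervised setup. The sketch below uses synthetic morphology-style features and labels purely to show the shape of the problem; FlyWire's actual classifier works on EM imagery rather than hand-picked features:

```python
# Sketch of the neurotransmitter-prediction idea: train a classifier on
# per-synapse structural features to predict transmitter type. Features and
# labels here are synthetic stand-ins for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
# Illustrative features: vesicle density, synapse area, mean connection count.
X = rng.normal(size=(n, 3))
# Synthetic rule linking features to three transmitter classes (0/1/2).
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy on synthetic data: {clf.score(X_test, y_test):.2f}")
```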

The result is complete: every neuron traced from dendrite to axon terminal, every synapse identified and typed, every connection mapped. UC Berkeley researchers ran the entire connectome as a leaky integrate-and-fire network on a laptop, successfully predicting sensorimotor pathways for sugar sensing, water detection, and proboscis extension. Completeness matters because partial connectomes can’t capture whole-brain dynamics—recurrent loops, feedback pathways, and emergent computation require the full graph.
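A leaky integrate-and-fire network is only a few lines of NumPy, which is part of why whole-connectome simulation fits on a laptop. The sketch below uses a random sparse weight matrix as a stand-in for the measured synapse counts:

```python
# Minimal leaky integrate-and-fire (LIF) network: a weight matrix (random and
# sparse here, standing in for connectome-derived synapse counts) drives
# membrane potentials that decay, integrate input spikes, and fire on threshold.
import numpy as np

rng = np.random.default_rng(0)
N, STEPS, DT = 200, 500, 1e-3        # neurons, timesteps, step size [s]
TAU, V_TH, V_RESET = 20e-3, 1.0, 0.0 # membrane time constant, threshold, reset

# Sparse random connectivity as a stand-in for connectome-derived weights.
W = rng.normal(0.0, 0.4, size=(N, N)) * (rng.random((N, N)) < 0.05)

v = np.zeros(N)                       # membrane potentials
external = np.zeros(N)
external[:10] = 1.5                   # drive a few "sensory" neurons
spike_count = np.zeros(N)

for _ in range(STEPS):
    spikes = v >= V_TH
    spike_count += spikes
    v[spikes] = V_RESET
    syn_input = W @ spikes.astype(float)          # spikes propagate through synapses
    v += DT / TAU * (-v + external + syn_input)   # leaky integration

print(f"mean firing rate: {spike_count.mean() / (STEPS * DT):.1f} Hz")
```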

50 largest neurons of the fly brain connectome.
Credit: Tyler Sloan and Amy Sterling for FlyWire, Princeton University, (Dorkenwald et al., 2024)

Power Scaling: The Fundamental Constraint

Here’s the engineering problem: the fly brain operates on 5-10 microwatts. That’s 0.036-0.072 nanowatts per neuron. It enables:

  • Controlled flight at 5 m/s with 200 Hz wing beat frequency
  • Visual processing through 77,536 optic lobe neurons at ~200 fps
  • Real-time sensorimotor integration with ~10ms latencies
  • Onboard learning and memory formation
  • Navigation, courtship, and decision-making

Compare this to autonomous systems:

System | Power | Capability | Efficiency
Fruit fly brain | 10 μW | Full autonomy | 0.072 nW/neuron
Intel Loihi 2 | 2.26 μW/neuron | Inference only | ~31,000× worse than fly
NVIDIA Jetson (edge inference) | ~15 W | Vision + control | ~10⁶× worse
Boston Dynamics Spot | ~400 W total | 90 min runtime | n/a
Human brain | 20 W | ~1 exaFLOP equivalent | ~50 PFLOPS/W
Frontier supercomputer | 20 MW | ~1 exaFLOP | ~50 GFLOPS/W

The brain achieves roughly a 10⁶× energy-efficiency advantage over conventional computing at the same computational throughput (20 W versus 20 MW for an exaFLOP). This isn’t a transistor physics problem—it’s architectural.

Why Biology Wins: Event-Driven Sparse Computation

The connectome reveals three principles that explain biological efficiency:

1. Asynchronous event-driven processing
Neurons fire sparsely (~1-10 Hz average) and asynchronously. There’s no global clock. Computation happens only when a spike arrives. In the fly brain, within four synaptic hops nearly every neuron can influence every other (high recurrence), yet the network remains stable because most neurons are silent most of the time. Contrast this with synchronous processors where every transistor is clocked every cycle, consuming 30-40% of chip power just on clock distribution—even when idle.

2. Strong/weak connection asymmetry
The fly connectome shows that 70-79% of all synaptic weight is concentrated in just 16% of connections. These strong connections (>10 synapses between a neuron pair) carry reliable signals with high SNR. Weak connections (1-2 synapses) may represent developmental noise, context-dependent modulation, or exploratory wiring that rarely fires. This creates a core network of reliable pathways overlaid with a sparse exploratory graph—essentially a biological ensemble method that balances exploitation and exploration.

3. Recurrent loops instead of deep feedforward hierarchies
The Drosophila larval connectome (3,016 neurons, 548,000 synapses) revealed that 41% of neurons receive long-range recurrent input. Rather than the deep feedforward architectures in CNNs (which require many layers to build useful representations), insect brains use nested recurrent loops. This compensates for shallow depth: instead of composing features through 50+ layers, they iterate and refine through recurrent processing with 3-4 layers. Multisensory integration starts with sense-specific second-order neurons but rapidly converges to shared third/fourth-order processing—biological transfer learning that amortizes computation across modalities.

Neuromorphic Implementations: Narrowing the Gap

Event-driven neuromorphic chips implement these principles in silicon:

Intel Loihi 2 (2024)

  • 1M neurons/chip, 4nm process
  • Fully programmable neuron models
  • Graded spikes (not just binary)
  • 2.26 μW/neuron (vs. 0.072 nW for flies—still 31,000× worse)
  • 16× better energy efficiency than GPUs on audio tasks

The Hala Point system (1,152 Loihi 2 chips) achieves 1.15 billion neurons at 2,600W maximum—demonstrating that neuromorphic scales, but still consumes orders of magnitude more than biology per neuron.

IBM NorthPole (2023)

  • Eliminates von Neumann bottleneck by co-locating memory and compute
  • 22 billion transistors, 800 mm² die
  • 25× better energy efficiency vs. 12nm GPUs on vision tasks
  • 72× better efficiency on LLM token generation

NorthPole is significant because it addresses the memory wall: in traditional architectures, moving data between DRAM and compute costs 100-1000× more energy than the actual computation. Co-locating memory eliminates this overhead.

BrainChip Akida (2021-present)

  • First commercially available neuromorphic chip
  • 100 μW to 300 mW depending on workload
  • Akida Pico (Oct 2024): <1 mW operation
  • On-chip learning without backprop

The critical insight: event-driven cameras + neuromorphic processors. Traditional cameras output full frames at 30-60 fps whether or not anything changed. Event-based cameras (DVS sensors) output asynchronous pixel-level changes only—mimicking retinal spike encoding. Paired with neuromorphic processors, this achieves dramatic efficiency: idle power drops from ~30W (GPU) to <1mW (neuromorphic).
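A toy example of why that matters: synthesize events from two frames and count how many pixels actually need processing compared with a full frame (numbers are arbitrary, and real DVS sensors emit events asynchronously in hardware):

```python
# Sketch of event-driven sensing: instead of full frames, a DVS-style camera
# emits (x, y, timestamp, polarity) events only where brightness changed.
# Here we synthesize events from two frames and "process" only those pixels.
import numpy as np

rng = np.random.default_rng(0)
H, W, THRESH = 120, 160, 0.15

frame_prev = rng.random((H, W))
frame_next = frame_prev.copy()
frame_next[40:60, 70:90] += 0.5          # a small moving object changes ~400 pixels

diff = frame_next - frame_prev
ys, xs = np.nonzero(np.abs(diff) > THRESH)
polarity = np.sign(diff[ys, xs]).astype(int)
events = list(zip(xs.tolist(), ys.tolist(), [0.001] * len(xs), polarity.tolist()))

frame_pixels = H * W
print(f"events to process: {len(events)} vs {frame_pixels} pixels per full frame "
      f"({100 * len(events) / frame_pixels:.1f}%)")
```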

RoboBees
Credit: Wyss Institute at Harvard University

The Micro-Robotics Power Wall

The scaling problem becomes acute at insect scale. Harvard’s RoboBee (80-100 mg, 3 cm wingspan) flies at 120 mW. Solar cells deliver 0.76 mW/mg at full sun—but the RoboBee needs 3× Earth’s solar flux to sustain flight. This forces operation under halogen lights in the lab, not outdoor autonomy.

The fundamental constraint is energy storage scaling. As robots shrink, surface area (power collection) scales as r², volume (power demand) as r³. At sub-gram scale, lithium-ion provides 1.8 MJ/kg. Metabolic fat provides 38 MJ/kg—a 21× advantage that no battery chemistry on the roadmap approaches.

This creates a catch-22: larger batteries enable longer flight, but add weight requiring bigger actuators drawing more power, requiring even larger batteries. The loop doesn’t converge.

Alternative approaches:

  1. Cyborg insects: Beijing Institute of Technology’s 74 mg brain controller interfaces directly with bee neural tissue via three electrodes, achieving 90% command compliance. Power: hundreds of microwatts for control electronics. Propulsion: biological, running on metabolic fuel. Result: 1,000× power advantage over fully robotic micro-flyers.
  2. Chemical fuels: The RoBeetle (88 mg crawling robot) uses catalytic methanol combustion. Methanol: 20 MJ/kg—10× lithium-ion density. But scaling combustion to aerial vehicles introduces complexity (fuel pumps, thermal management) at millimeter scales.
  3. Tethered operation: MIT’s micro-robot achieved 1,000+ seconds of flight with double aerial flips, but remains tethered to external power.

For untethered autonomous micro-aerial vehicles, current battery chemistry makes hour-long operation physically impossible.

The Dragonfly Algorithm: Proof of Concept

The dragonfly demonstrates what’s possible with ~1 million neurons. It intercepts prey with 95%+ success, responding to maneuvers in 50ms:

  • 10 ms: photoreceptor response + signal transmission
  • 5 ms: muscle force production
  • 35 ms: neural computation

That’s 3-4 neuron layers maximum at biological signaling speeds (~1 m/s propagation, ~1 ms synaptic delays). The algorithm is parallel navigation: maintain constant line-of-sight angle to prey while adjusting speed. It’s simple, fast, and works with 1/100th the spatial resolution of human vision at 200 fps.
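That interception law is simple enough to sketch in a few lines. This 2-D constant-bearing pursuit uses arbitrary speeds and geometry and is only meant to show the structure of parallel navigation, not the dragonfly's actual circuitry:

```python
# Toy constant-bearing ("parallel navigation") interception in 2-D: the
# pursuer matches the target's velocity component perpendicular to the line of
# sight (so the bearing stays constant) and spends the rest of its speed
# budget closing along the line of sight. Numbers are illustrative.
import numpy as np

DT = 0.01
pursuer, target = np.array([0.0, 0.0]), np.array([5.0, 3.0])
v_target = np.array([-1.0, 0.5])          # target flies on a straight line
SPEED = 3.0                               # pursuer speed budget

for step in range(2000):
    los = target - pursuer
    dist = np.linalg.norm(los)
    if dist < 0.05:
        print(f"intercept at t = {step * DT:.2f} s")
        break
    los_hat = los / dist
    perp_hat = np.array([-los_hat[1], los_hat[0]])
    v_perp = (v_target @ perp_hat) * perp_hat          # match this to freeze the bearing
    closing_speed = np.sqrt(max(SPEED**2 - v_perp @ v_perp, 0.0))
    v_pursuer = v_perp + closing_speed * los_hat       # remaining budget closes range
    pursuer += v_pursuer * DT
    target += v_target * DT
```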

The dragonfly’s visual system contains 441 input neurons feeding 194,481 processing neurons. Researchers have implemented this in III-V nanowire optoelectronics operating at sub-picowatt per neuron. The human brain averages 0.23 nW/neuron—still 100,000× more efficient than conventional processors per operation. Loihi 2 and NorthPole narrow this to ~1,000× gap, but the remaining distance requires architectural innovation, not just process shrinks.

Distributed Control vs. Centralized Bottlenecks

Insect nervous systems demonstrate distributed control:

  • Local reflex arcs: 10-30 ms responses handled in ganglia, without brain involvement
  • Central pattern generators: rhythmic movements (walking, flying) produced locally in ganglia
  • Parallel sensory streams: no serialization bottleneck

Modern autonomous vehicles route all sensor data to central compute clusters, creating communication bottlenecks and single points of failure. The biological approach is hierarchical:

  • Low-level reactive control: fast (1-10 ms), continuous, local
  • High-level deliberative planning: slow (100-1000 ms), occasional, centralized

This matches the insect architecture: local ganglia handle reflexes and pattern generation, brain handles navigation and decision-making. The division of labor minimizes communication overhead and reduces latency.
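In code, the split is just two loops running at different rates. The numbers below are illustrative, but the structure (a fast local reflex plus a slow, occasional planner) is the point:

```python
# Sketch of the hierarchical split: a fast local reflex loop runs every
# millisecond and reacts to disturbances, while a slow "planner" updates the
# setpoint only every 100 cycles. Gains and rates are illustrative.
import numpy as np

rng = np.random.default_rng(0)
DT, STEPS = 0.001, 3000          # 1 ms reflex loop, 3 s of simulated time
state, setpoint = 0.0, 0.0

def reflex(state, setpoint):
    """Fast, local, continuous: simple proportional correction."""
    return 5.0 * (setpoint - state)

def planner(t):
    """Slow, occasional, centralized: pick a new goal every so often."""
    return 1.0 if t < 1.5 else -1.0

for i in range(STEPS):
    t = i * DT
    if i % 100 == 0:                       # planner runs at 10 Hz, not 1 kHz
        setpoint = planner(t)
    disturbance = rng.normal(0.0, 0.5)     # gusts, contacts, sensor noise
    u = reflex(state, setpoint)            # reflex handles it immediately
    state += (u + disturbance) * DT

print(f"final state = {state:.2f} (tracking setpoint {setpoint})")
```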

Scaling Constraints Across Morphologies

The power constraint manifests predictably across scales:

Micro-aerial (0.1-1 g): Battery energy density is the hard limit. Flight times measured in minutes. Solutions require chemical fuels, cyborg approaches, or accepting tethered operation.

Drones (0.1-10 kg): Power ∝ v³ for flight. Consumer drones: 20-30 min. Advanced commercial: 40-55 min. Headwinds cut range by half. Monarch butterflies migrate thousands of km on 140 mg of fat—200× better mass-adjusted efficiency.

Ground robots (10-100 kg): Legged locomotion is 2-5× less efficient than wheeled for the same distance (constantly fighting gravity during stance phase). Spot: 90 min runtime, 3-5 km range. Humanoids: 5-10× worse cost of transport than humans despite electric motors being more efficient than muscle (90% vs. 25%). The difference is energy storage and integrated control.

Computation overhead: At large scale (vehicles drawing 10-30 kW), AI processing (500-1000W) is 5-10% overhead. At micro-scale, computation dominates: a 3cm autonomous robot with CNN vision gets 15 minutes from a 40 mAh battery because video processing drains faster than actuation.

The Engineering Path Forward

The connectome provides a blueprint, but implementation requires system-level integration:

1. Neuromorphic processors for always-on sensing
Event-driven computation with DVS cameras enables <1 mW idle power. Critical for battery-limited mobile robots.

2. Hierarchical control architectures
Distribute reflexes and pattern generation to local controllers. Reserve central compute for high-level planning. Reduces communication overhead and latency.

3. Task-specific optimization over general intelligence
The dragonfly’s parallel navigation algorithm is simple, fast, and sufficient for 95%+ interception success. General-purpose autonomy is expensive. Narrow, well-defined missions allow exploiting biological efficiency principles.

4. Structural batteries and variable impedance actuators
Structural batteries serve as both energy storage and load-bearing elements, improving payload mass fraction. Variable impedance actuators mimic muscle compliance, reducing energy waste during impacts.

5. Chemical fuels for micro-robotics
At sub-gram scale, metabolic fuel’s 20-40× energy density advantage over batteries is insurmountable with current chemistry. Methanol combustion, hydrogen fuel cells, or cyborg approaches are the only paths to hour-long micro-aerial autonomy.

Minuscule RoBeetle Turns Liquid Methanol Into Muscle Power
Credit: IEEE Spectrum

Conclusion: Efficiency is the Constraint

The fruit fly connectome took 38 years after C. elegans because complexity scales exponentially. The same scaling laws govern autonomous robotics: every doubling of capability demands exponentially more energy, computation, and integration complexity.

The fly brain’s 5-10 μW budget for full autonomy isn’t just impressive—it’s the benchmark. Current neuromorphic chips are 1,000-30,000× worse per neuron. Closing this gap requires implementing biological principles: event-driven sparse computation, strong/weak connection asymmetry, recurrent processing over deep hierarchies, and distributed control.

Companies building autonomous systems without addressing energetics will hit a wall—the same wall that kept connectomics at 302 neurons for 38 years. Physics doesn’t care about better perception models if the robot runs out of power before completing useful work.

When robotics achieves even 1% of biological efficiency at scale, truly autonomous micro-robots become feasible. Until then, the scaling laws remain unforgiving. The connectome reveals both how far we are from biological efficiency—and the specific architectural principles required to close that gap.

The message for robotics engineers: efficiency isn’t a feature, it’s the product.


Explore the full FlyWire visualization at https://flywire.ai/


Autonomy

I lead autonomy at Boeing. What exactly do I do?

We engineers have kidnapped a word that doesn’t belong to us. Autonomy is not a tech word, it’s the ability to act independently. It’s freedom that we design in and give to machines.

It’s also a bit more. Autonomy is the ability to make decisions and act independently based on goals, knowledge, and understanding of the environment. It’s an exploding technical area, with new discoveries daily, and maybe one of the most exciting technology booms in human history.

We can fall into the trap of thinking autonomy is code, a set of instructions governing a system. Code is just language, a set of signals; it’s not a capability. We remember Descartes for his radical skepticism or for giving us the X and Y axes, but he is the first person who really gets credit for the concept of autonomy with his “thinking self,” the cogito. Descartes argued that the ability to think and reason independently was the foundation of autonomy.

But I work on giving life and freedom to machines. What does that look like? Goethe gives us a good mental picture in Der Zauberlehrling (later adapted in Disney’s “Fantasia”): the sorcerer’s apprentice uses magic to bring a broom to life to do his chores, only to lose his own autonomy as chaos ensues.

Giving human-like freedom to machines is dangerous, and every autonomy story gets at this emergent danger. This is why autonomy and ethics are inextricably linked, and why “containment” (keeping AI from taking over) and “alignment” (making AI share our values) are the most important (and challenging) technical problems today.

A lesser-known story gets at the promise, power, and peril of autonomy. The Golem of Prague emerged from Jewish folklore in the 16th century. After centuries of pogroms, the persecuted Jews of Eastern Europe found comfort in the story of a powerful creature with supernatural strength who patrolled the streets of the Jewish ghetto in Prague, protecting the community from attacks and harassment.

The golem was created by a rabbi known as the Maharal using clay from the banks of the Vltava River. He brought the golem to life by placing a shem (a paper with a divine name) into its mouth or by inscribing the word “emet” (truth) on its forehead. One famous story involves the golem preventing a mob from attacking the Jewish ghetto after a priest had accused the Jews of murdering a Christian child to use the child’s blood in Passover rituals. The golem found the real culprit and brought them to justice, exonerating the Jewish community.

However, as the legend goes, the golem grew increasingly unstable and difficult to control. Fearing that the golem might cause unintended harm, the Maharal was forced to deactivate it by removing the shem from its mouth or erasing the first letter of “emet” (which changes the word to “met,” meaning death) from its forehead. The deactivated golem was then stored in the attic of the Old New Synagogue in Prague, where some say it remains to this day.

The Golem of Prague

Power, protection of the weak, emergent properties, containment. The whole autonomy ecosystem in one story. From Terminator to Her, why does every autonomy story go bad in some way? Fundamentally, it’s because giving human agency to machines is playing God. My favorite modern philosopher, Alvin Plantinga, describes the qualifications we can accept in a creator: “a being that is all-powerful, all-knowing, and wholly good.” We share none of those properties; do we really have any business playing with stuff this powerful?

The Technology of Autonomy

We don’t have a choice; the world is going here, and there is much good work to be done. Engineers today have the honor of being modern-day Maharals, building safer and more efficient systems with the next generation of autonomy. But what specifically are we building, and how do we build it so it’s well understood, safe, and contained?

A good autonomous system requires software (intelligence), a system of trust, and human interface/control. At its core, autonomy is systems engineering: the ability to take dynamic, advanced technologies and make them control a system in effective and predictable ways. The heart of this capability is software. To delegate control to a system, it needs software for perception, decision-making, action, and communication. Let’s break these down; a minimal loop sketch follows the list.

  1. Perception: An autonomous system must be able to perceive and interpret its environment accurately. This involves sensors, computer vision, and other techniques to gather and process data about the surrounding world.
  2. Decision-making: Autonomy requires the ability to make decisions based on the information gathered through perception. This involves algorithms for planning, reasoning, and optimization, as well as machine learning techniques to adapt to new situations.
  3. Action: An autonomous system must be capable of executing actions based on its decisions. This involves actuators, controllers, and other mechanisms to interact with the physical world.
  4. Communication: Autonomous systems need to communicate and coordinate with other entities, whether they be humans or other autonomous systems. This requires protocols and interfaces for exchanging information and coordinating actions.
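
To show how the four pieces compose, here is a minimal loop sketch (the classes and stub functions below are hypothetical placeholders, not any particular product’s architecture):

  import time
  from dataclasses import dataclass

  @dataclass
  class WorldModel:
      obstacle_ahead: bool = False

  def perceive(range_m: float) -> WorldModel:
      """Turn a raw sensor signal into an interpretation of the environment."""
      return WorldModel(obstacle_ahead=range_m < 1.0)

  def decide(world: WorldModel) -> str:
      """Plan an action from the current world model and the system's goal."""
      return "stop" if world.obstacle_ahead else "continue"

  def act(command: str) -> None:
      """Send the chosen command to the actuators (stubbed with a print)."""
      print(f"actuate: {command}")

  def communicate(world: WorldModel, command: str) -> None:
      """Report state and intent to operators or other agents (stubbed)."""
      print(f"telemetry: obstacle={world.obstacle_ahead}, command={command}")

  def autonomy_loop(range_readings_m):
      for reading in range_readings_m:
          world = perceive(reading)
          command = decide(world)
          act(command)
          communicate(world, command)
          time.sleep(0.1)   # fixed loop rate; real systems run far faster

  autonomy_loop([2.5, 1.8, 0.6])   # simulated range readings in meters

Everything interesting in a real system lives inside those stubs, but the loop itself is the delegation of control.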

Building autonomous systems requires a diverse set of skills, including ethics, robotics, artificial intelligence, distributed systems, formal analysis, and human-robot interaction. Autonomy experts have a strong background in robotics, combining perception, decision-making, and action in physical systems, and they understand the principles of kinematics, dynamics, and control theory. They are proficient in AI techniques such as machine learning, computer vision, and natural language processing, which are essential for creating autonomous systems that can perceive, reason, and adapt to their environment. As autonomous systems become more complex and interconnected, expertise in distributed systems becomes increasingly important for designing systems that can coordinate and collaborate with each other. Finally, autonomy experts understand the principles of human-robot interaction and can design interfaces and protocols that enable seamless communication between humans and machines.

As technology advances, the field of autonomy is evolving rapidly. One of the most exciting developments is the emergence of collaborative systems of systems – large groups of autonomous agents that can work together to achieve common goals. These swarms can be composed of robots, drones, or even software agents, and they have the potential to revolutionize fields such as transportation, manufacturing, and environmental monitoring.

How would a boxer box if they could instantly decompose into a million pieces and re-emerge as any shape? Differently.

What is driving all this?

Two significant trends are rapidly transforming the landscape of autonomy: the standardization of components and major advancements in artificial intelligence (AI). Components like VOXL and Pixhawk are pioneering this shift by providing open platforms that significantly reduce the time and complexity involved in building and testing autonomous systems. VOXL, for example, is a powerful, SWaP-optimized computing platform that brings together machine vision, deep learning processing, and connectivity options like 5G and LTE, tailored for drone and robotic applications. Similarly, Pixhawk stands as a cornerstone of the drone industry, serving as a universal hardware autopilot standard that integrates seamlessly with various open-source software, fostering innovation and accessibility across the drone ecosystem. All this means you don’t have to be Boeing to start building autonomous systems.

Standard VOXL board
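
To make the accessibility point concrete: against a Pixhawk-class autopilot running PX4 (or the PX4 software-in-the-loop simulator), a few lines of MAVSDK-Python are enough to arm, take off, and land. A minimal sketch, assuming the simulator’s default UDP endpoint:

  import asyncio
  from mavsdk import System   # pip install mavsdk

  async def fly():
      drone = System()
      # Default endpoint for PX4 SITL; a real Pixhawk would use a serial or UDP link.
      await drone.connect(system_address="udp://:14540")

      # Wait until the autopilot is discovered.
      async for state in drone.core.connection_state():
          if state.is_connected:
              break

      await drone.action.arm()
      await drone.action.takeoff()
      await asyncio.sleep(15)   # hover briefly
      await drone.action.land()

  asyncio.run(fly())

That is the whole point of the standardization trend: the hard parts are packaged behind stable interfaces.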

These hardware advancements are complemented by cheap sensors, AI-specific chips, and other innovations, making sophisticated technologies broadly affordable and accessible. The common standards established by these components have not only simplified development processes but also ensured compatibility and interoperability across different systems. All the ingredients for a Cambrian explosion in autonomy.

The latest from NVIDIA and Google

These companies are building a bridge from software to real systems.

The latest advancements from NVIDIA’s GTC and Google’s robotics work highlight a pivotal moment where the realms of code and physical systems, particularly in digital manufacturing technologies, are increasingly converging. NVIDIA’s latest conference signaled a strong push into AI with some awesome new technologies:

  • Blackwell GPUs: NVIDIA introduced the Blackwell platform, which boasts a new level of computing efficiency and performance for AI, enabling real-time generative AI with trillion-parameter models. This advancement promises substantial cost and energy savings.
  • NVIDIA Inference Microservices (NIMs): NVIDIA is making strides in AI deployment with NIMs, a cloud-native suite designed for fast, efficient, and scalable development and deployment of AI applications.
  • Project GR00T: With humanoid robotics taking center stage, Project GR00T underlines NVIDIA’s investment in robotics learning and adaptability. These advancements imply that robots will be integral to motion and tasks in the future.

The overarching theme from NVIDIA’s GTC was a strong commitment to AI and robotics, driving not just computing but a broad array of applications in industry and everyday life. These developments hold potential for vastly improved efficiencies and capabilities in autonomy, heralding a new era where AI and robotics could become as commonplace and influential as computers are today.

Google is doing super empowering stuff too. Google DeepMind, in collaboration with partners from 33 academic labs, has made a groundbreaking advancement in the field of robotics with the introduction of the Open X-Embodiment dataset and the RT-X model. This initiative aims to transform robots from being specialists in specific tasks to generalists capable of learning and performing across a variety of tasks, robots, and environments. By pooling data from 22 different robot types, the Open X-Embodiment dataset has emerged as the most comprehensive robotics dataset of its kind, showcasing more than 500 skills across 150,000 tasks in over 1 million episodes.

The RT-X model, specifically RT-1-X and RT-2-X, demonstrates significant improvements in performance by utilizing this diverse, cross-embodiment data. These models not only outperform those trained on individual embodiments but also showcase enhanced generalization abilities and new capabilities. For example, RT-1-X showed a 50% success rate improvement across five different robots in various research labs compared to models developed for each robot independently. Furthermore, RT-2-X has demonstrated emergent skills, performing tasks involving objects and skills not present in its original dataset but found in datasets for other robots. This suggests that co-training with data from other robots equips RT-2-X with additional skills, enabling it to perform novel tasks and understand spatial relationships between objects more effectively.

These developments signify a major step forward in robotics research, highlighting the potential for more versatile and capable robots. By making the Open X-Embodiment dataset and the RT-1-X model checkpoint available to the broader research community, Google DeepMind and its partners are fostering open and responsible advancements in the field. This collaborative effort underscores the importance of pooling resources and knowledge to accelerate the progress of robotics research, paving the way for robots that can learn from each other and, ultimately, benefit society as a whole.

More components, readily available to more people, will create a cycle of more cyber-physical systems with increasingly sophisticated and human-like capabilities.

Parallel to these hardware advancements, AI is experiencing an unprecedented boom. Investments in AI are yielding substantial results, driving forward capabilities in machine learning, computer vision, and autonomous decision-making at an extraordinary pace. This synergy between accessible, standardized components and the explosive growth in AI capabilities is setting the stage for a new era of autonomy, where sophisticated autonomous systems can be developed more rapidly and cost-effectively than ever before.

AI is exploding and democratizing simultaneously

Autonomy and Combat

What does all of this mean for modern warfare? Everyone has access to this tech, and innovation is rapidly bringing these technologies into combat. We are right in the middle of a powerful new technology that will shape the future of war. Buckle up.

Let’s look at this in the context of Ukraine. The Ukraine-Russia war has seen unprecedented use of increasingly autonomous drones for surveillance, target acquisition, and direct attacks, significantly altering traditional warfare dynamics. Readily available components combined with rapid iteration cycles have democratized aerial warfare, allowing Ukraine to conduct operations that were previously the domain of nations with more substantial air forces and to level the playing field against a more conventionally powerful adversary. These technologies are both accessible and affordable. Because drones are expendable, they enable greater risk-taking: they don’t have to be survivable if they are numerous and inexpensive.

The future of warfare will require machine intelligence, mass and rapid iterations

The conflict has also underscored the importance of counter-drone technologies and tactics. Both sides have had to adapt to the evolving drone threat by developing means to detect, jam, or otherwise neutralize opposing drones. Moreover, drones have expanded the information environment, enabling unprecedented levels of surveillance and data collection that have been used to galvanize global support, create propaganda, boost morale, and document potential war crimes.

The effects are real. More than 200 companies manufacture drones within Ukraine, and some estimates suggest that 30% of the Russian Black Sea fleet has been destroyed by uncrewed systems. Larger military drones like the Bayraktar TB2 and Russian Orion have seen decreased use as they became easier targets for anti-air systems. Ukrainian forces have adapted with smaller drones, which have proved effective at a tactical level, providing real-time intelligence and precision strike capabilities. Ukraine has the capacity to produce 150,000 drones every month, may be able to produce two million drones by the end of the year, and has struck over 20,000 Russian targets.

As the war continues, innovations in drone technology persist, reflecting the growing importance of drones in modern warfare. The conflict has shown that while drones alone won’t decide the outcome of the war, they will undeniably influence future conflicts and continue to shape military doctrine and strategy.

Autonomy is an exciting and impactful field and the story is just getting started. Stay tuned.
