Robots and Chatbots

Before ChatGPT, human-looking robots defined AI in the public imagination. That might be true again in the near future. With AI models online, it’s awesome to have AI automate our writing and art, but we still have to wash the dishes and chop the firewood.

That may change soon. AI is finding bodies fast as AI and Autonomy merge. Autonomy (the field I lead at Boeing) is made of three parts: code, trust and the ability to interact with humans.

Let’s start with code. Code is getting easier to write, and new tools are accelerating development across the board. You can crank out Python scripts, tests, and web apps fast, but the really exciting superpowers are the ones that let you create AI software. Unsupervised learning allows code to be grown rather than written: expose sensors to the real world and let the model weights adapt into a high-performance system.

Recent history is well known. Frank Rosenblatt’s work on perceptrons in the 1950s set the stage. In the 1980s, Geoffrey Hinton and David Rumelhart’s popularization of backpropagation made training deep networks feasible.

The real game-changer came with the rise of powerful GPUs, thanks to companies like NVIDIA, which allowed for processing large-scale neural networks. The explosion of digital data provided the fuel for these networks, and deep learning frameworks like TensorFlow and PyTorch made advanced models more accessible.

In the early 2010s, building on Hinton’s earlier work on deep belief networks, the success of AlexNet in the 2012 ImageNet competition demonstrated the potential of deep learning. This was followed by the introduction of transformers in 2017 by Vaswani and colleagues, which revolutionized natural language processing with the attention mechanism.

Transformers allow models to focus on relevant parts of the input sequence dynamically and process the data in parallel. This mechanism helps models understand the context and relationships within data more effectively, leading to better performance in tasks such as translation, summarization, and text generation. This breakthrough has enabled the creation of powerful language models, transforming language applications and giving us magical software like BERT and GPT.
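
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python/NumPy. Shapes and names are illustrative only, not any particular library’s API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    # Score every pair of positions: how relevant is token j to token i?
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Softmax turns scores into attention weights that sum to 1 per row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of the value vectors, computed for all
    # positions at once -- the parallelism that makes transformers fast.
    return weights @ V

x = np.random.randn(4, 8)                                # 4 tokens, d_k = 8
print(scaled_dot_product_attention(x, x, x).shape)       # (4, 8) self-attention
```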

The impact of all this is that you can build a humanoid robot by moving its arms and legs in diverse enough ways to grow the AI inside. (This is called sensor-to-servo machine learning.)
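
As a rough illustration of the sensor-to-servo idea, simplified here to supervised regression on logged data with made-up dimensions, a sketch might look like this:

```python
import torch
import torch.nn as nn

sensor_dim, servo_dim = 64, 12        # assumed sizes, for illustration only

policy = nn.Sequential(
    nn.Linear(sensor_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, servo_dim),        # predicted servo commands
)

# Stand-in for a real log of (sensor reading, servo command) pairs gathered
# while the robot is moved through diverse motions.
sensors = torch.randn(10_000, sensor_dim)
servos = torch.randn(10_000, servo_dim)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(1_000):
    idx = torch.randint(0, sensors.shape[0], (256,))
    loss = nn.functional.mse_loss(policy(sensors[idx]), servos[idx])
    opt.zero_grad()
    loss.backward()                   # "grow" the weights to fit the behavior
    opt.step()
```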

This all gets very interesting with the arrival of multimodal models that combine language, vision, and sensor data. Vision-Language-Action Models (VLAMs) enable robots to interpret their environment and predict actions based on combined sensory inputs. This holistic approach reduces errors and enhances the robot’s ability to act in the physical world: combining vision and language processing with robotic control lets a robot interpret complex instructions and carry them out.

PaLM-E from Google Research provides an embodied multimodal language model that integrates sensor data from robots with language and vision inputs. This model is designed to handle a variety of tasks involving robotics, vision, and language by transforming sensor data into a format compatible with the language model. PaLM-E can generate plans and decisions directly from these multimodal inputs, enabling robots to perform complex tasks efficiently. The model’s ability to transfer knowledge from large-scale language and vision datasets to robotic systems significantly enhances its generalization capabilities and task performance.
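
A hedged sketch of that idea, with every module name and size invented for illustration: continuous image and sensor features get projected into the language model’s embedding space and interleaved with the text prompt, so one model attends over all of it.

```python
import torch
import torch.nn as nn

d_model = 512                               # assumed LM embedding width
text_embed = nn.Embedding(32_000, d_model)  # stand-in for the LM's token embeddings
image_proj = nn.Linear(2048, d_model)       # stand-in for a ViT feature projector
sensor_proj = nn.Linear(32, d_model)        # stand-in for a proprioception encoder

def build_multimodal_prompt(text_ids, image_feats, sensor_feats):
    """Return one sequence of embeddings a decoder-only LM can attend over."""
    text_tokens = text_embed(text_ids)          # (T, d_model)
    image_tokens = image_proj(image_feats)      # (N_img, d_model)
    sensor_tokens = sensor_proj(sensor_feats)   # (N_sens, d_model)
    # Interleave however the prompt template dictates; here: image, sensors, text.
    return torch.cat([image_tokens, sensor_tokens, text_tokens], dim=0)

seq = build_multimodal_prompt(
    torch.randint(0, 32_000, (16,)),   # token ids for "pick up the green block ..."
    torch.randn(4, 2048),              # four image patch features
    torch.randn(1, 32),                # one joint-state reading
)
print(seq.shape)                       # torch.Size([21, 512])
```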

So code is getting awesome; now let’s talk about trust, because explainability is exploding too. When models, including embodied AI in robots, can explain their actions, they are easier to program, debug, and, most importantly, trust. There has been some great work in this area. I’ve used interpretable models, attention mechanisms, saliency maps, and post-hoc explanation techniques like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). I got to be on the ground floor of DARPA’s Explainable Artificial Intelligence (XAI) program, but Anthropic really surprised me last week with their paper “Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.”
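
For a flavor of what the post-hoc techniques look like in practice, here is a minimal SHAP example on a toy scikit-learn model. The dataset and model are placeholders; the point is the explanation step, not the task.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])   # (100, n_features)

# Which features pushed each prediction up or down?
shap.summary_plot(shap_values, X.iloc[:100])
```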

In that paper, Anthropic identified specific combinations of neurons within Claude 3 Sonnet that activate in response to particular concepts or features. For instance, when Claude encounters text or images related to the Golden Gate Bridge, a specific set of neurons becomes active. This discovery is pivotal because it allows researchers to precisely tune these features, increasing or decreasing their activation and observing corresponding changes in the model’s behavior.

When the activation of the “Golden Gate Bridge” feature is increased, Claude’s responses heavily incorporate mentions of the bridge, regardless of the query’s relevance. This demonstrates the ability to control and predict the behavior of the AI based on feature manipulation. For example, queries about spending money or writing stories all get steered towards the Golden Gate Bridge, illustrating how tuning specific features can drastically alter output.
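
Conceptually (and this is my sketch, not Anthropic’s code), steering boils down to adding a scaled feature direction to a layer’s activations. The model, layer choice, and feature vector below are all hypothetical.

```python
import torch

def make_steering_hook(feature_direction: torch.Tensor, strength: float):
    """Forward hook that nudges a layer's output along one feature direction."""
    def hook(module, inputs, output):
        # Assumes the hooked module returns a plain activation tensor.
        return output + strength * feature_direction   # broadcasts across tokens
    return hook

# Hypothetical usage with some decoder-only transformer `model`:
#   layer = model.transformer.h[20]              # pick a middle block
#   handle = layer.register_forward_hook(
#       make_steering_hook(golden_gate_direction, strength=8.0))
#   ... generate text and watch completions drift toward the bridge ...
#   handle.remove()                              # restore normal behavior
```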

So this is all fun, but these techniques have significant implications for AI safety and reliability. By understanding and controlling feature activations, researchers can manage safety-related features such as those linked to dangerous behaviors, criminal activity, or deception. This control could help mitigate risks associated with AI and ensure models behave more predictably and safely. This is a critical capability to enable AI in physical systems. Read the paper, it’s incredible.

OpenAI is doing stuff too. In 2019, they introduced activation atlases, which build on the concept of feature visualization. This technique allows researchers to map out how different neurons in a neural network activate in response to specific concepts. For instance, they can visualize how a network distinguishes between frying pans and woks, revealing that the presence of certain foods, like noodles, can influence the model’s classification. This helps identify and correct spurious correlations that could lead to errors or biases in AI behavior.
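
The underlying trick, feature visualization, is easy to sketch: start from noise and ascend the gradient of one channel’s activation to see what excites it. Real tooling adds regularizers and image transformations; this shows only the core loop, with an arbitrary layer and channel chosen for illustration.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()

acts = {}
model.layer3.register_forward_hook(lambda m, i, o: acts.update(out=o))  # grab activations

img = torch.randn(1, 3, 224, 224, requires_grad=True)   # start from noise
opt = torch.optim.Adam([img], lr=0.05)
channel = 42                                             # arbitrary feature to visualize

for _ in range(200):
    model(img)
    loss = -acts["out"][0, channel].mean()   # maximize this channel's mean activation
    opt.zero_grad()
    loss.backward()
    opt.step()
# `img` now approximates the pattern this channel responds to.
```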

The final accelerator is the ability to learn quickly through imitation and generalize skills across different tasks. This is critical because the core skills needed to interact with the real world are flexibility and adaptability. You can’t expose a model during training to every scenario it will encounter in the real world. Models like RT-2 leverage internet-scale data to perform tasks they were not explicitly trained for, showing impressive generalization and emergent capabilities.

RT-2 also has an RT-X variant, RT-2-X, trained as part of the Open X-Embodiment project, which combines data from multiple robotic platforms to train generalizable robot policies. By leveraging a diverse dataset of robotic experiences, the RT-X models demonstrate positive transfer, improving the capabilities of multiple robots through shared learning. This approach allows them to generalize skills across different embodiments and tasks, making them highly adaptable to varied real-world scenarios.

I’m watching all this very closely, and it’s super cool: as AI escapes the browser and starts improving our physical world, there are all kinds of lifestyle and economic benefits around the corner. Of course, there are lots of risks too. I’m proud to be working at a company and in an industry obsessed with ethics and safety. All considered, I’m extremely optimistic, if also more than a little tired trying to track the various actors on a stage that keeps changing, with no one sure what the next act will be.


Power and AI

The Growing Power Needs for Large Language Models

In 2024, AI is awesome, empowering, and available to everyone. Unfortunately, while AI is free to consumers, these models are expensive to train and operate at scale. Training frontier models is on track to become one of the most expensive engineering efforts in history, drawing comparisons to the Manhattan Project and the Apollo program. No wonder companies with their own massive compute are dominating this space.

By 2025, the cost to train a frontier LLM is projected to surpass that of the Apollo program, a historical benchmark for massive expenditure. This projection emphasizes the increasing financial burden and resource demands of advancing AI capabilities, underscoring the need for more efficient and sustainable approaches to AI research and development. The data points to a future where the financial and energy requirements of AI could become unsustainable without significant technological breakthroughs or shifts in strategy.

Why?

Because of how deep learning works and how it’s trained. The first era of AI compute growth, marked by steady progress, followed a trend aligned with Moore’s Law, with training compute doubling approximately every two years. Notable milestones during this period include early models like the Perceptron and later advances such as NETtalk and TD-Gammon.

The second era, beginning around 2012 with the advent of deep learning, shows a dramatic increase in compute usage, following a much steeper trajectory in which training compute doubles approximately every 3.4 months. This surge is driven by the development of more complex models like AlexNet, ResNets, and AlphaGo Zero. Key factors behind the acceleration include massive datasets, advances in GPUs and specialized hardware, and significant investment in AI research. As models have become more sophisticated, the demand for computational resources has skyrocketed, driving innovation and an increased emphasis on sustainable, efficient energy sources to support the growth.

Training LLMs involves massive computational resources. For instance, models like GPT-3, with 175 billion parameters, require extensive parallel processing using GPUs. Training such a model on a single Nvidia V100 GPU would take an estimated 288 years, emphasizing the need for large-scale distributed computing setups to make the process feasible in a reasonable timeframe. This leads to higher costs, both financially and in terms of energy consumption.
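
A back-of-envelope check using the common approximation that training compute is roughly 6 × parameters × tokens lands in the same ballpark as that single-GPU figure. The token count and utilization below are assumptions, so treat this as order-of-magnitude only.

```python
params = 175e9             # GPT-3 parameter count
tokens = 300e9             # approximate training tokens (assumption)
flops = 6 * params * tokens                 # ~3.15e23 FLOPs total

v100_peak = 125e12         # V100 FP16 tensor-core peak, FLOP/s
utilization = 0.30         # assumed sustained fraction of peak
seconds = flops / (v100_peak * utilization)
print(seconds / (3600 * 24 * 365))          # roughly 266 years on one GPU
```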

Recent studies have highlighted the dramatic increase in computational power needed for AI training, which is rising at an unprecedented rate. Between 2012 and 2018, the compute used in the largest training runs grew roughly 300,000-fold, underscoring the escalating costs associated with these advancements. This increase not only affects financial expenditures but also contributes to higher carbon emissions, posing environmental concerns.
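
The arithmetic behind the two eras is easy to sanity-check: 300,000-fold growth is about 18 doublings, which at one doubling every 3.4 months spans roughly five years, whereas Moore’s-Law-style doubling would deliver only a handful of doublings over the same period.

```python
import math

growth = 300_000                        # reported compute growth since 2012
doublings = math.log2(growth)           # ~18.2 doublings
years = doublings * 3.4 / 12            # ~5.2 years at one doubling per 3.4 months
moores_law = 2 ** (years * 12 / 24)     # ~6x over the same span at 24-month doublings
print(doublings, years, moores_law)
```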

Infrastructure and Efficiency Improvements

To address these challenges, companies like Cerebras and Cirrascale are developing specialized infrastructure solutions. For example, Cerebras’ AI Model Studio offers a rental model that leverages clusters of CS-2 nodes, providing a scalable and cost-effective alternative to traditional cloud-based solutions. This approach aims to deliver predictable pricing and reduce the costs associated with training large models.

Moreover, researchers are exploring various optimization techniques to improve the efficiency of LLMs. These include model approximation, compression strategies, and innovations in hardware architecture. For instance, advancements in GPU interconnects and supercomputing technologies are critical to overcoming bottlenecks related to data transfer speeds between servers, which remain a significant challenge.
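
As one concrete example of a compression strategy, post-training dynamic quantization in PyTorch stores Linear weights as int8, which can shrink memory use and speed up CPU inference. A toy model stands in for a real LLM here.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8    # quantize only the Linear layers
)

x = torch.randn(1, 4096)
print(quantized(x).shape)                    # same interface, much smaller weights
```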

Implications for Commodities and Nuclear Power

The increasing power needs for AI training have broader implications for commodities, particularly in the energy sector. As AI models grow, the demand for electricity to power the required computational infrastructure will likely rise. This could drive up the prices of energy commodities, especially in regions where data centers are concentrated. Additionally, the need for advanced hardware, such as GPUs and specialized processors, will impact the supply chains and pricing of these components.

To address the substantial energy needs of AI, particularly in powering the growing number of data centers, various approaches are being considered. One notable strategy involves leveraging nuclear power. This approach is championed by tech leaders like OpenAI CEO Sam Altman, who views AI and affordable, green energy as intertwined essentials for a future of abundance. Nuclear startups, such as Oklo, which Altman supports, are working on advanced nuclear reactors designed to be safer, more efficient, and smaller than traditional plants. Oklo’s projects include a 15-megawatt fission reactor and a grant-supported initiative to recycle nuclear waste into new fuel.

However, integrating nuclear energy into the tech sector faces significant regulatory challenges. The Nuclear Regulatory Commission (NRC) denied Oklo’s application for its Idaho plant design due to insufficient safety information, and the Air Force rescinded a contract for a microreactor pilot program in Alaska. These hurdles highlight the tension between the rapid development pace of AI technologies and the methodical, decades-long process traditionally required for nuclear energy projects.

The demand for sustainable energy solutions is underscored by the rising energy consumption of AI servers, which could soon exceed the annual energy use of some small nations. Major tech firms like Microsoft, Google, and Amazon are investing heavily in nuclear energy to secure stable, clean power for their operations. Microsoft has agreements to buy nuclear-generated electricity for its data centers, while Google and Amazon have invested in fusion startups.
