
A humanoid robot is only as capable as the software that runs it. Beneath every fluid walking gait, every dexterous grasp, and every natural-language interaction lies a complex stack of software — from low-level motor controllers firing thousands of times per second to high-level AI models reasoning about what to do next. Understanding this software landscape is essential to understanding where humanoid robotics is heading, because the hardware is increasingly commoditised. The real battleground is code.
This article maps the key operating systems, middleware frameworks, and proprietary platforms that form the digital nervous system of modern humanoid robots — and explores the emerging AI-native approaches that may reshape the entire stack.
What Is a Robot Operating System?
Despite the name, a robot operating system is not an operating system in the way Windows or Linux is. Most robots actually run on top of a conventional OS — typically Ubuntu Linux. A robot operating system is better understood as a middleware framework: a structured set of libraries, communication tools, drivers, and conventions that sit between the underlying OS and the application code that makes a robot do useful things.
This middleware handles the messy plumbing of robotics: shuttling sensor data between processes, coordinating dozens of software modules running simultaneously, abstracting away hardware differences, and providing standard interfaces for common tasks like navigation, perception, and motion planning. Without it, every robotics team would spend most of its time reinventing basic infrastructure instead of working on the problems that matter.
ROS 2: The Industry Standard
The Robot Operating System — universally known as ROS — is the most widely adopted robotics middleware in the world. Originally developed at Willow Garage and now maintained by the Open Source Robotics Foundation (OSRF), ROS has become the de facto standard for robotics research and an increasingly important part of commercial deployment.
The original ROS (now referred to as ROS 1) served the community well for over a decade, but it carried significant limitations. It relied on a centralised master node that represented a single point of failure. It lacked native support for real-time systems. And it struggled with multi-robot deployments and unreliable networks — precisely the scenarios that modern humanoid robotics demands.
ROS 2 was built from the ground up to address these weaknesses. It moved to a decentralised, peer-to-peer architecture and adopted the Data Distribution Service (DDS) standard as its default communication middleware. This brought distributed discovery, configurable Quality of Service (QoS) policies, and better support for real-time and safety-critical applications. ROS 2 also introduced managed lifecycle nodes, giving developers more control over how software components start, run, and shut down — critical for robots operating in unpredictable environments.
ROS 1's final distribution, Noetic, reached end of life in May 2025. The transition to ROS 2 is now complete in principle, though in practice many legacy packages and workflows are still being migrated. The latest ROS 2 release, Kilted Kaiju (May 2025), promoted Eclipse Zenoh to Tier 1 middleware status — a significant development we will return to shortly. The next release, Lyrical Luth, is expected in May 2026.
What ROS 2 Actually Provides
ROS 2 is not a single piece of software but a sprawling ecosystem. At its core, it provides a publish-subscribe communication model, where software nodes exchange messages over named topics. A camera driver node might publish image data; a perception node subscribes to that data, processes it, and publishes detected objects; a planning node subscribes to those detections and decides what to do. This modular, message-passing architecture allows complex robot behaviours to be composed from relatively simple, interchangeable components.
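The pattern above can be sketched in a few lines of plain Python. This is a teaching toy, not the rclpy API — real ROS 2 nodes run as separate processes and exchange typed messages over DDS or Zenoh — but the camera-to-perception-to-planning flow is the same; all names here are illustrative:

```python
from collections import defaultdict
from typing import Any, Callable

class TopicBus:
    """Minimal in-process publish-subscribe bus (illustrative, not rclpy)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in self._subscribers[topic]:
            callback(message)

# Wire up the camera -> perception -> planning pipeline described above.
bus = TopicBus()
log = []

# Perception node: subscribes to raw images, publishes detected objects.
bus.subscribe("/camera/image", lambda img: bus.publish("/detections", f"objects in {img}"))
# Planning node: subscribes to detections and decides what to do.
bus.subscribe("/detections", lambda det: log.append(f"plan grasp for {det}"))

bus.publish("/camera/image", "frame_001")
print(log)  # ['plan grasp for objects in frame_001']
```

Because each node only knows topic names, any component can be swapped out — a simulated camera for a real one, say — without touching the rest of the pipeline.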
Beyond messaging, the ROS 2 ecosystem includes:
- Navigation2 (Nav2) — a complete navigation stack for autonomous mobile robots, handling path planning, obstacle avoidance, and localisation
- MoveIt 2 — a motion planning framework for robotic arms and manipulators, widely used for pick-and-place tasks and dexterous manipulation
- Gazebo — a physics-based simulation environment where robots can be tested in virtual worlds before touching real hardware
- tf2 — a transform library that tracks the spatial relationships between every part of the robot and its environment, essential for coordinating sensors and actuators
- ros2_control — a hardware abstraction layer for motors and actuators, providing a standard interface regardless of the underlying hardware
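tf2's core job — chaining coordinate frames together — can be illustrated with plain 2-D homogeneous transforms. This is a toy sketch (the real library works in 3-D, with timestamps and interpolation); the frame names and numbers are invented for illustration:

```python
import math

def make_transform(x: float, y: float, theta: float):
    """2-D homogeneous transform: rotate by theta, then translate by (x, y)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s,  c, y],
            [0,  0, 1]]

def compose(a, b):
    """Matrix product a @ b: chain transform b onto frame a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(t, px, py):
    """Express point (px, py) in the parent frame of transform t."""
    return (t[0][0] * px + t[0][1] * py + t[0][2],
            t[1][0] * px + t[1][1] * py + t[1][2])

# Toy frame chain: base_link -> torso -> camera.
base_to_torso = make_transform(0.0, 0.0, 0.0)            # torso directly above base
torso_to_camera = make_transform(0.1, 0.5, math.pi / 2)  # camera offset and rotated

base_to_camera = compose(base_to_torso, torso_to_camera)
# A point 1 m ahead of the camera, expressed in the robot's base frame:
print(apply(base_to_camera, 1.0, 0.0))  # approximately (0.1, 1.5)
```

This chaining is exactly what lets a planner reason about an object seen by a head camera in the same frame as the feet and hands.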
For humanoid robots specifically, ROS 2 provides the communication backbone and many of the building blocks. However, the unique challenges of bipedal locomotion, whole-body coordination, and dexterous manipulation mean that humanoid developers typically build heavily customised stacks on top of ROS 2 rather than relying on its off-the-shelf packages.
The Middleware Layer: DDS, Zenoh, and Beyond
One of the most significant architectural decisions in ROS 2 was the adoption of DDS as its communication middleware. DDS — the Data Distribution Service, a standard from the Object Management Group (OMG) — provides the actual data transport layer beneath ROS 2's topic-based messaging. Several DDS implementations are supported, including eProsima's Fast DDS (the default), Eclipse Cyclone DDS, and RTI Connext.
DDS brought real benefits: distributed discovery without a central server, configurable reliability and latency guarantees, and an industry-standard wire protocol. But it also brought frustrations. DDS discovery generates significant network traffic that scales poorly with the number of nodes, creating what the community calls "packet storms" when new participants enter large networks. Over Wi-Fi — a near-universal requirement for mobile and humanoid robots — these problems become severe.
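The scaling problem has a simple intuition. In a full-mesh discovery model — a simplification of DDS's real SPDP/SEDP protocols, used here only for a back-of-envelope estimate — every participant exchanges discovery data with every other, so traffic grows quadratically:

```python
# Back-of-envelope: full-mesh peer discovery scales quadratically.
# This is an intuition-building simplification, not a model of real
# SPDP/SEDP traffic, which also involves periodic announcements.

def full_mesh_exchanges(participants: int) -> int:
    """Number of directed discovery exchanges in a full mesh of n peers."""
    return participants * (participants - 1)

for n in (10, 50, 200):
    print(n, full_mesh_exchanges(n))  # 10 -> 90, 50 -> 2450, 200 -> 39800
```

Doubling the node count roughly quadruples the discovery chatter — manageable on wired Ethernet, punishing on a shared Wi-Fi channel.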
Enter Eclipse Zenoh. Selected by the ROS community as the official alternative middleware after a systematic evaluation by Intrinsic and Open Robotics, Zenoh is a lightweight publish-subscribe-query protocol designed from the ground up for efficiency over constrained and wireless networks. Users consistently report discovery traffic reductions of 97–99% compared to DDS. Zenoh's minimal wire overhead, flexible routing, and native support for internet-scale communication make it particularly well suited to humanoid robots that need to communicate reliably over Wi-Fi, with cloud services, and with each other.
With Kilted Kaiju's promotion of Zenoh to Tier 1 status, developers now have a genuine choice of communication middleware — and many in the humanoid robotics community see Zenoh as the future default.
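In practice, switching middleware is a runtime configuration choice: ROS 2 selects its RMW (ROS middleware) implementation via an environment variable. A minimal sketch, assuming a Kilted installation with the rmw_zenoh_cpp package installed:

```shell
# Default behaviour: Fast DDS (rmw_fastrtps_cpp), no configuration needed.

# Start the Zenoh router that rmw_zenoh relies on (in its own terminal):
ros2 run rmw_zenoh_cpp rmw_zenohd

# Then point any ROS 2 process at the Zenoh middleware:
export RMW_IMPLEMENTATION=rmw_zenoh_cpp
ros2 run demo_nodes_cpp talker
```

Because the RMW layer abstracts the transport, application code built against ROS 2's APIs runs unchanged whichever middleware is selected.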
NVIDIA Isaac: The GPU-Accelerated Platform
If ROS 2 is the community-driven open standard, NVIDIA's Isaac platform represents the GPU-accelerated commercial counterpart — and its influence on humanoid robotics is enormous.
NVIDIA Isaac is a comprehensive robotics development platform spanning simulation, training, and deployment. Its key components include:
- Isaac Sim — a high-fidelity simulation environment built on NVIDIA's Omniverse platform, capable of generating photorealistic synthetic environments for robot training
- Isaac Lab — a GPU-accelerated framework for running thousands of parallel simulations for reinforcement learning, dramatically reducing the time needed to train robot policies
- Isaac GR00T — a research initiative and development platform specifically for humanoid robot foundation models
- cuMotion — a CUDA-accelerated motion planning library that runs multiple trajectory optimisations simultaneously
The centrepiece for humanoid robotics is Isaac GR00T N1, announced at GTC in March 2025 as the world's first open foundation model for generalised humanoid robot reasoning and skills. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture — a "slow-thinking" vision-language module that interprets the environment and instructions, coupled with a "fast-thinking" diffusion transformer that generates fluid motor actions in real time.
GR00T N1 is cross-embodiment, meaning it can be adapted to different humanoid platforms through post-training rather than being built for a single robot. It has been trained on a mixture of real robot trajectories, human video data, and massive synthetic datasets generated through the Isaac GR00T Blueprint. NVIDIA demonstrated that it could generate 780,000 synthetic training trajectories — equivalent to over nine months of continuous human demonstration — in just 11 hours, improving model performance by 40% when combined with real data.
Major humanoid developers with early access to GR00T N1 include Agility Robotics, Boston Dynamics, 1X Technologies, Fourier Intelligence, Mentee Robotics, and NEURA Robotics. The model is open-weight with a permissive licence, available through GitHub and Hugging Face.
NVIDIA also announced Newton, an open-source physics engine developed in collaboration with Google DeepMind and Disney Research, optimised for robot learning and compatible with both MuJoCo and Isaac Lab. Combined with the MuJoCo-Warp collaboration — which reportedly accelerates robotics machine learning workloads by more than 70x — this represents a significant push to make simulation-based training faster and more accessible.
For a deeper look at how robots train in simulation before operating in the real world, see our article on Sim-to-Real Transfer.
Proprietary Stacks: The In-House Approach
While open platforms like ROS 2 and Isaac provide shared foundations, the leading humanoid robot companies are increasingly building deeply proprietary software stacks — particularly at the AI and control layers where competitive differentiation is greatest.
Figure AI: Helix
Figure AI's Helix is one of the most ambitious proprietary stacks in the field. Developed entirely in-house after Figure ended its collaboration with OpenAI, Helix is a Vision-Language-Action model that unifies perception, language understanding, and motor control in a single neural network.
Helix uses a tiered architecture. System 2 is a slower vision-language model that handles scene understanding, language comprehension, and high-level task planning, operating at around 7–9 Hz. System 1 is a fast visuomotor policy that converts System 2's semantic understanding into real-time robot actions at 200 Hz. The latest version, Helix 02, adds a third tier, System 0, for learned whole-body locomotion control, enabling the robot to walk, manipulate, and balance as one continuous system — what the field calls loco-manipulation.
Helix runs entirely on embedded GPUs aboard Figure's robots, with no reliance on cloud computing. This on-device approach eliminates latency concerns and enables deployment in environments without reliable connectivity. In demonstrations, Helix 02 has completed continuous four-minute autonomous tasks — unloading a dishwasher, navigating across a room, stacking items in cabinets — with no resets and no human intervention.
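The rate-decoupled pattern behind this kind of dual-system design — a slow planner refreshing a latent goal while a fast policy ticks many times per plan — can be sketched in a few lines. Everything here (function names, the string "latent goal", the tick counts) is illustrative, not Figure's implementation:

```python
SLOW_HZ = 8     # "slow" vision-language planning tier (~7-9 Hz in Figure's description)
FAST_HZ = 200   # "fast" visuomotor control tier
STEPS_PER_PLAN = FAST_HZ // SLOW_HZ  # fast control ticks per slow plan update

def slow_planner(observation: str) -> str:
    """Stand-in for the vision-language model: emits a semantic goal."""
    return f"latent_goal({observation})"

def fast_policy(goal: str, tick: int) -> str:
    """Stand-in for the high-rate visuomotor policy: emits a motor command."""
    return f"action[{tick}] toward {goal}"

actions = []
latest_goal = slow_planner("initial scene")
for tick in range(50):                                    # 50 fast ticks = 0.25 s
    if tick > 0 and tick % STEPS_PER_PLAN == 0:
        latest_goal = slow_planner(f"scene@tick{tick}")   # infrequent replanning
    actions.append(fast_policy(latest_goal, tick))        # motor command every tick

print(len(actions))   # 50 motor commands emitted from just 3 plans
print(actions[0])
```

The key property is that the fast loop never blocks on the slow one: it always acts on the most recent goal available, which is what keeps motor control smooth while the language model deliberates.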
Tesla: The FSD-to-Optimus Pipeline
Tesla's approach to Optimus is distinctive in that it leverages the same AI infrastructure built for its Full Self-Driving (FSD) vehicle autonomy programme. The core bet is that the intelligence required to drive a car through complex environments is substantially similar to the intelligence required for a humanoid robot to operate in homes and factories.
Tesla's software stack for Optimus uses vision-only perception (no LiDAR), end-to-end neural networks, and imitation learning from human demonstrations captured via camera. The company's fleet of millions of vehicles provides a continuous stream of real-world training data, and its massive Dojo and GPU training clusters provide the compute. Tesla has also recently announced "Digital Optimus" — a software AI agent developed jointly with xAI that mirrors the physical robot's capabilities in virtual environments.
Whether Tesla's automotive AI stack can genuinely transfer to the very different demands of humanoid manipulation and bipedal locomotion remains one of the field's most closely watched questions.
Other Proprietary Approaches
Most leading humanoid companies maintain proprietary control and AI stacks, even when they use open-source components elsewhere:
- Agility Robotics uses a proprietary locomotion and planning stack for its Digit platform, purpose-built for warehouse logistics environments
- UBTECH Robotics has developed its "Cloud AI" multimodal interaction system and proprietary motion control, backed by over 2,100 patents
- AgiBot (Zhiyuan) takes a comprehensive full-stack approach, building all critical components — software, hardware, AI, motion control, and cloud systems — entirely in-house
- NEURA Robotics has built its Neuraverse platform as a combined development environment, community hub, and ecosystem for its cognitive robots
For detailed profiles of these companies and their platforms, see our Company Profiles section.
The New Contenders: AI-Native Operating Systems
A new wave of software platforms is emerging that takes a fundamentally different approach to the robot software stack. Rather than starting from traditional robotics middleware and adding AI capabilities on top, these platforms are built from the ground up around large language models, vision models, and learned behaviours.
The most prominent example is OM1 from OpenMind, which launched in beta in September 2025 and bills itself as the world's first open-source operating system for intelligent robots. OM1 is hardware-agnostic, running across humanoids, quadrupeds, wheeled robots, and drones. It offers plug-and-play integration with major AI models — including those from OpenAI, Google, DeepSeek, and xAI — and provides preconfigured autonomous agents for popular platforms like the Unitree G1 humanoid.
OpenMind positions OM1 as the "Android moment" for robotics: a shared, open software layer that lets hardware manufacturers focus on building robots while the intelligence comes from a common platform. The company's FABRIC protocol adds a decentralised coordination layer, enabling robots from different manufacturers to verify identity, share context, and collaborate securely.
OM1 interfaces with existing infrastructure — it can communicate via ROS 2, CycloneDDS, Zenoh, or websockets — but its fundamental orientation is different. Where ROS 2 is a communication and tooling framework into which AI models can be integrated, OM1 is an AI runtime that treats robotics hardware as a peripheral. The long-term question is whether this AI-first architecture will complement or eventually challenge the traditional ROS-centred approach.
Similarly, several Chinese humanoid developers — notably AgiBot with its open-source Rhinoceros X1 architecture and RoboParty with its fully open-sourced Roboto Origin — are releasing complete software stacks alongside hardware, aiming to build developer ecosystems around their platforms.
How the Stack Fits Together
A modern humanoid robot's software stack can be understood as a series of layers, each building on the one below:
- Hardware Abstraction Layer (HAL) — low-level drivers and interfaces for motors, sensors, and communication buses, often running on embedded real-time processors
- Middleware — the communication backbone (ROS 2, DDS, Zenoh) that connects all software components and shuttles data between them
- Perception — software that processes camera, LiDAR, IMU, force/torque, and tactile sensor data to build an understanding of the robot's body and environment
- Planning and Control — motion planners, locomotion controllers, whole-body controllers, and grasp planners that determine how the robot should move
- AI and Reasoning — foundation models (VLAs, VLMs, LLMs) that provide high-level task understanding, natural-language interaction, and generalised decision-making
- Fleet and Cloud — telemetry, remote monitoring, over-the-air updates, and shared learning across multiple deployed robots
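The layering above can be made concrete with interface sketches. These class and method names are invented for illustration — no vendor's API looks like this — but the shape is representative: each layer exposes a narrow contract to the one above, and middleware normally carries the data between them:

```python
from typing import Protocol

class Perception(Protocol):
    """Layer contract: raw sensor data in, estimated world state out."""
    def estimate_state(self, sensor_data: dict) -> dict: ...

class Planner(Protocol):
    """Layer contract: world state plus a goal in, motion commands out."""
    def plan(self, state: dict, goal: str) -> list[str]: ...

class SimplePerception:
    def estimate_state(self, sensor_data: dict) -> dict:
        # A real stack fuses cameras, LiDAR, IMU, and force/torque sensors here.
        return {"objects": sensor_data.get("camera", [])}

class SimplePlanner:
    def plan(self, state: dict, goal: str) -> list[str]:
        # A real stack runs whole-body controllers and grasp planners here.
        return [f"reach({obj})" for obj in state["objects"] if obj in goal]

# Compose the layers top to bottom.
perception: Perception = SimplePerception()
planner: Planner = SimplePlanner()

state = perception.estimate_state({"camera": ["cup", "plate"]})
commands = planner.plan(state, goal="pick up the cup")
print(commands)  # ['reach(cup)']
```

End-to-end models effectively delete these internal contracts, mapping sensor input to motor output in one network — which is precisely why the modular-versus-monolithic question matters architecturally, not just scientifically.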
The boundaries between these layers are blurring. End-to-end neural networks like Helix and GR00T N1 are collapsing perception, planning, and control into single models that go directly from sensor input to motor output. Whether this end-to-end approach will fully replace the modular stack or coexist with it — handling some tasks while traditional controllers handle others — is one of the defining technical questions in the field today.
For more on the AI models driving this convergence, see our articles on Foundation Models for Robotics and Reinforcement Learning in the Physical World.
Open Source vs. Closed: The Strategic Tension
The humanoid robotics software landscape is defined by a fundamental tension between openness and proprietary control.
On one side, ROS 2, Zenoh, OM1, GR00T N1, MuJoCo, and a growing number of open-source platforms lower barriers to entry, enable interoperability, and accelerate the pace of innovation across the entire industry. Open-source projects like X-Humanoid's Tiangong and RoboParty's Roboto Origin are pushing this philosophy to its logical extreme — releasing complete hardware designs, software stacks, and engineering documentation to anyone who wants them.
On the other side, companies like Figure AI, Tesla, and Agility Robotics are investing heavily in proprietary AI stacks where they believe the decisive competitive advantage lies. The software that makes a robot genuinely useful in a real-world setting — the trained models, the control policies, the fleet learning systems — is increasingly seen as the primary source of value, not the hardware itself.
This mirrors the smartphone industry's evolution: Android (open) and iOS (closed) coexist, serving different market segments. In humanoid robotics, we may see a similar split — open platforms powering a broad ecosystem of research, education, and mid-market robots, while the highest-capability commercial systems run largely proprietary code.
What to Watch
The robot software stack is evolving faster than any other aspect of humanoid robotics. Several developments are worth watching closely:
- The Zenoh transition — as Zenoh becomes the preferred middleware for ROS 2, expect improvements in multi-robot coordination, cloud connectivity, and wireless performance that directly benefit humanoid deployments
- End-to-end models vs. modular stacks — the field is rapidly learning which tasks are best served by monolithic neural networks and which still need hand-engineered controllers; the answer will shape software architecture for years
- Sim-to-real infrastructure — the combination of NVIDIA Isaac, Newton, and MuJoCo-Warp is making simulation-based training dramatically faster and cheaper, which could accelerate the pace at which humanoids learn new skills
- The "Android for robots" race — OM1, AgiBot's open-source ecosystem, and Hugging Face's low-cost hardware initiatives are all competing to become the universal platform layer for humanoid robotics
- Edge AI compute — as foundation models grow more capable but also more demanding, the hardware available for on-device inference (NVIDIA Jetson, custom ASICs, neuromorphic chips) will be a key constraint on what robots can do without cloud connectivity
The software that runs a humanoid robot is no longer a collection of hand-tuned controllers and scripted behaviours. It is rapidly becoming a living, learning system — trained in simulation, refined on real-world data, and updated over the air. The companies and communities that build the best software infrastructure will ultimately determine which humanoid robots succeed in the real world and which remain impressive demonstrations.
For a broader view of how artificial intelligence is transforming humanoid robotics, see our article on AI & The Robot Brain.
Last updated/reviewed: March 2026