Locomotion & Balance: How Humanoid Robots Learn to Walk

Walking on two legs is something most humans master by age two. For robots, it remains one of the hardest unsolved problems in engineering. This article explains why bipedal locomotion is so difficult, how engineers are tackling it, and where the technology stands today.

Why Walking Is So Hard for Robots

When you walk, you are — from a physics perspective — in a state of controlled falling. At almost every point during a stride, your centre of mass sits outside your base of support. You stay upright not because you are statically balanced, but because your brain, muscles, and sensory systems are constantly making tiny corrections at extraordinary speed. You do this without thinking. A robot has to be explicitly engineered to replicate every part of that process.

The challenge is compounded by several factors. A humanoid robot is essentially a tall, heavy inverted pendulum on small feet — an inherently unstable structure. Every step involves a complex sequence of coordinated joint movements across hips, knees, ankles, and the upper body. The robot must handle shifting loads, uneven surfaces, sudden disturbances like a push or a slope change, and transitions between standing, walking, turning, and stopping. All of this must happen in real time, with decisions made in milliseconds.

This is why, even after decades of research, locomotion remains the single greatest engineering bottleneck in humanoid robotics — and why so much current investment is focused on cracking it.

Static vs. Dynamic Walking

Early bipedal robots used what is called static walking. In this approach, the robot's centre of mass always stays directly above its feet. The robot essentially shuffles: it shifts its weight completely over one foot before lifting and placing the other. It's stable, but painfully slow and unnatural — more like a cautious toddler than an adult human.

Dynamic walking is what humans actually do. The body's centre of mass moves ahead of the supporting foot, creating a forward fall that is caught by the next step. This is far more energy-efficient and enables faster, more natural movement, but it requires the robot to maintain balance through continuous active control rather than geometric stability alone.

Almost all modern humanoid robots — from Boston Dynamics' Atlas to Unitree's H1 — use dynamic walking. The shift from static to dynamic locomotion was one of the defining breakthroughs in the field, first demonstrated convincingly by Honda's research programmes in the 1990s and early 2000s, which culminated in the ASIMO robot.

The Zero Moment Point (ZMP)

For decades, the dominant framework for planning and controlling bipedal walking has been the Zero Moment Point, or ZMP. The concept was introduced in 1968 by Serbian researcher Miomir Vukobratović and has been foundational to humanoid locomotion ever since.

The ZMP is the point on the ground where the net moment of gravity and the robot's inertial forces has no component about the horizontal axes — the point about which the robot has no tendency to tip. In practical terms: as long as the ZMP stays within the "support polygon" — the area defined by the robot's feet on the ground — the robot will not tip over.

ZMP-based control works by planning the robot's walking trajectory so that the ZMP never leaves the support polygon. Engineers use simplified mathematical models of the robot's dynamics — most commonly the Linear Inverted Pendulum Model (LIPM), which treats the robot as a point mass balanced atop a massless leg — to calculate viable walking patterns in real time.
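
As a concrete illustration, the one-dimensional LIPM can be simulated in a few lines. The CoM height and time step below are assumed values, and this is the textbook model rather than any particular robot's controller:

```python
G = 9.81    # gravity (m/s^2)
Z_C = 0.9   # assumed constant CoM height (m)

def lipm_step(x, v, zmp, dt):
    """Advance the 1-D Linear Inverted Pendulum Model by one time step.

    LIPM dynamics: x_ddot = (g / z_c) * (x - p), where x is the horizontal
    CoM position and p is the ZMP. When the CoM is ahead of the ZMP, it
    accelerates further ahead — the "controlled falling" of walking.
    """
    a = (G / Z_C) * (x - zmp)
    v += a * dt
    x += v * dt
    return x, v

# Hold the ZMP at the origin and start the CoM 5 cm ahead of it:
x, v = 0.05, 0.0
for _ in range(100):                 # simulate one second at 100 Hz
    x, v = lipm_step(x, v, zmp=0.0, dt=0.01)
print(x > 0.05 and v > 0)  # True: the CoM is accelerating away from the ZMP
```

A walking-pattern generator effectively runs this model in reverse: given where the CoM should go, it solves for a ZMP trajectory that stays inside the support polygon, then plans footsteps to realise it.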

Honda's ASIMO was one of the most successful demonstrations of ZMP-based control, achieving smooth walking, turning, and even stair climbing. The HRP series of robots developed in Japan's national research programmes also relied heavily on ZMP planning.

ZMP works well for predetermined movements on flat, predictable surfaces. Its limitations become apparent in the real world: it struggles with unexpected disturbances, uneven or deformable terrain, and the kind of rapid reactive movements that humans make instinctively. This has driven the search for more flexible control methods.

Whole-Body Control

Whole-body control (WBC) takes a broader view of the problem. Rather than planning locomotion purely through the legs and feet, WBC coordinates the entire robot — torso, arms, head, and legs — as a single integrated system. This reflects what humans actually do when walking: we swing our arms, shift our hips, and lean our torso to maintain balance, especially when carrying objects, navigating obstacles, or recovering from a stumble.

In a WBC framework, the controller solves an optimisation problem at each time step, balancing multiple objectives simultaneously: track the desired walking trajectory, maintain balance, avoid joint limits, respect torque constraints, and — if the robot is carrying or manipulating something — handle the task with its upper body at the same time.
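
To make the idea concrete, here is a deliberately tiny sketch of the weighted least-squares core of such a controller, with made-up numbers standing in for real task Jacobians. A real WBC also enforces contact, torque, and joint-limit constraints, which this toy omits:

```python
import numpy as np

# Toy whole-body control step: find joint accelerations qdd that best satisfy
# two tasks at once. Each task has the form J @ qdd ~ xdd_des; the tasks are
# stacked into one weighted least-squares problem. All numbers are invented
# for illustration — a real controller derives J from the robot's kinematics.

J_balance   = np.array([[1.0, 0.4]])   # "accelerate CoM to recover balance" (hip, ankle)
xdd_balance = np.array([0.5])          # desired CoM acceleration after a push

J_posture   = np.eye(2)                # "keep the joints still" regulariser
xdd_posture = np.zeros(2)

w_balance, w_posture = 10.0, 1.0       # balance dominates when the tasks conflict

A = np.vstack([w_balance * J_balance, w_posture * J_posture])
b = np.concatenate([w_balance * xdd_balance, w_posture * xdd_posture])

qdd, *_ = np.linalg.lstsq(A, b, rcond=None)

# The heavily weighted balance task is met almost exactly; the posture
# task absorbs the compromise.
print((J_balance @ qdd)[0])  # close to 0.5
```

Raising a task's weight is how the controller encodes priorities: dropping a carried box is acceptable, falling over is not.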

Whole-body control is now standard in advanced humanoid platforms. It is essential for what the field calls loco-manipulation — performing physical tasks like carrying boxes, opening doors, or using tools while walking — which is ultimately what makes humanoid robots useful in practical settings.

Model Predictive Control (MPC)

Model Predictive Control is an approach where the robot continuously plans a short sequence of future actions, executes the first step of that plan, observes the result, and replans. It is like a chess player thinking several moves ahead, but recalculating after every move based on how the game actually unfolds.

MPC has proved particularly effective for locomotion because it can account for the robot's physical constraints — how fast its joints can move, how much torque its motors can produce, where its feet can safely land — while optimising for smooth, stable movement. It handles the transition between different gaits (standing to walking, walking to running) and can adapt quickly when conditions change.
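
The replanning loop itself can be sketched in a few lines. This toy planner brute-forces every short action sequence for a one-dimensional system — real MPC solves a structured optimisation over the robot's full dynamics — but the receding-horizon pattern is the same. All dynamics and costs here are invented:

```python
from itertools import product

H = 5                        # planning horizon (steps)
ACTIONS = (-0.3, 0.0, 0.3)   # candidate position adjustments per step

def model(x, a):
    """Toy process model used for prediction (a real MPC would use the
    robot's dynamics, joint limits, and footstep constraints here)."""
    return x + a

def plan_cost(x, seq):
    """Predicted cost of an action sequence: stay near the target x = 0."""
    cost = 0.0
    for a in seq:
        x = model(x, a)
        cost += x * x
    return cost

def mpc_action(x):
    """Search all H-step sequences, but return only the FIRST action."""
    best = min(product(ACTIONS, repeat=H), key=lambda seq: plan_cost(x, seq))
    return best[0]

# Closed loop: execute one action, observe the new state, replan from scratch.
x = 2.0
for _ in range(30):
    x = model(x, mpc_action(x))   # in reality, the physical robot moves here
print(abs(x) < 0.2)  # True: the state has been driven near the target
```

Replanning after every step is what gives MPC its robustness: if a disturbance knocks the state off the predicted trajectory, the next plan simply starts from wherever the robot actually is.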

KAIST's DRC-HUBO robot, which won the 2015 DARPA Robotics Challenge, used a form of MPC to dynamically switch between bipedal walking and wheeled locomotion. Boston Dynamics' Atlas uses advanced optimisation-based control that shares many principles with MPC, enabling the acrobatic feats — parkour, backflips, vaults — that have made the robot famous.

The Reinforcement Learning Revolution

The most significant shift in humanoid locomotion in recent years has been the rise of reinforcement learning (RL). Rather than engineers manually designing control policies and mathematical models, RL lets the robot discover its own walking strategies through trial and error — rewarding behaviours that work and penalising those that don't.
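
In practice the engineer's job shifts from designing the controller to designing the reward. A sketch of what such a reward function might look like — the terms and weights below are illustrative, not taken from any published training setup:

```python
def locomotion_reward(forward_vel, target_vel, joint_torques, torso_upright, fell):
    """Score one simulation step of a walking policy.

    Typical shaping terms: track a commanded speed, spend little energy,
    keep the torso level, and never fall. All weights are illustrative.
    """
    reward = 1.0 - abs(forward_vel - target_vel)          # speed tracking
    reward -= 0.001 * sum(t * t for t in joint_torques)   # energy penalty
    reward += 0.2 * torso_upright                         # 1.0 = perfectly level
    if fell:
        reward -= 10.0                                    # falling dominates everything
    return reward

# Walking at the commanded speed, upright, with modest torques:
good = locomotion_reward(1.0, 1.0, [5.0] * 10, torso_upright=1.0, fell=False)
# The same step, except the robot falls:
bad = locomotion_reward(1.0, 1.0, [5.0] * 10, torso_upright=1.0, fell=True)
print(good > 0 > bad)  # True
```

The policy maximises the sum of this score over time, and whatever gait does that best is the gait it learns — which is why poorly chosen reward terms often produce strange but technically high-scoring behaviour.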

In practice, RL training happens overwhelmingly in simulation. Using physics engines like NVIDIA's Isaac Sim or DeepMind's MuJoCo, engineers can run millions of simulated walking experiments in parallel, compressing years of physical practice into hours or days of computation. The UK-based company Humanoid, for example, reported accumulating over 52 million seconds of simulated locomotion experience in just two days using Isaac Sim — equivalent to roughly 20 months of real-time practice. Their robot achieved stable walking within 48 hours of final assembly.

The critical challenge is sim-to-real transfer: policies that work perfectly in simulation often fail on real hardware because the simulation doesn't perfectly capture the physics of the real world — friction, motor lag, sensor noise, the subtle flex of structural components. Techniques like domain randomisation (deliberately varying simulation parameters to force robust policies) and increasingly accurate simulation environments are closing this gap, but it remains an active area of research.
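
Domain randomisation itself is conceptually simple: every training episode samples a slightly different world. A sketch — the parameter names and ranges here are invented for illustration:

```python
import random

def sample_sim_params():
    """Draw one randomised simulation configuration.

    A policy trained across thousands of such draws cannot overfit to any
    single set of physics parameters — which is what makes it robust when
    the real world inevitably differs from the simulator.
    """
    return {
        "ground_friction": random.uniform(0.4, 1.2),   # slippery to grippy
        "motor_delay_s":   random.uniform(0.0, 0.02),  # actuation latency
        "mass_scale":      random.uniform(0.9, 1.1),   # +/- 10% link-mass error
        "imu_noise_std":   random.uniform(0.0, 0.05),  # sensor noise level
        "push_force_n":    random.uniform(0.0, 50.0),  # random external shove
    }

# Each parallel training episode gets its own physics.
episode_worlds = [sample_sim_params() for _ in range(10000)]
```

The ranges are the engineering judgement call: too narrow and the policy still overfits to the simulator; too wide and it learns an overly conservative gait.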

In early 2025, Boston Dynamics and the Robotics & AI Institute (RAI), led by Boston Dynamics founder Marc Raibert, formalised a partnership specifically focused on developing reinforcement learning pipelines for the electric Atlas robot. Their work aims at whole-body locomotion with "zero-shot" transfer — moving control policies directly from simulation to physical hardware without intermediate tuning. At CES 2026, Atlas demonstrated both natural-looking walking and gymnastic manoeuvres like cartwheels and backflips, all generated by the same underlying learning framework.

RL is increasingly being combined with other AI approaches. Imitation learning, where the robot learns from demonstrations of human movement captured via motion-capture suits, provides a starting point that RL can then refine and optimise. Boston Dynamics demonstrated this hybrid approach in March 2025, with Atlas learning locomotion policies seeded from human movement data.

Key Concepts in Robot Balance

Several technical concepts come up repeatedly when discussing how humanoid robots maintain balance:

Centre of Mass (CoM): The single point representing the average position of all the robot's mass. Keeping the CoM appropriately positioned relative to the support base is fundamental to balance.

Support Polygon: The area on the ground enclosed by the robot's contact points. When both feet are on the ground, this is the area between and including the feet. During single-leg support, it shrinks to the contact area of one foot — which is why mid-stride is the most unstable phase of walking.

Centre of Pressure (CoP): The point on the ground where the total ground reaction force effectively acts. While the robot is dynamically balanced, the CoP and the ZMP coincide. When the planned ZMP would fall outside the support polygon, the CoP saturates at the polygon's edge and the robot begins to tip.

Angular Momentum: A measure of the robot's overall rotational motion about its centre of mass. Controlling angular momentum — particularly through arm swings and torso rotation — is critical for balance recovery and for preventing the robot from spinning or toppling during dynamic movements.

Compliance: The ability of the robot's joints and structure to "give" slightly under force, rather than being perfectly rigid. Compliant joints absorb impact, smooth out walking, and make the robot safer around people. Series elastic actuators (SEAs) are one common way to introduce controlled compliance into a robot's legs.
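
Several of these concepts combine in the most basic balance check a controller performs at every tick: is the ZMP (or CoP) inside the support polygon? A minimal sketch, with invented foot geometry:

```python
def inside_support_polygon(point, polygon):
    """Return True if `point` lies inside the convex support polygon
    (vertices listed counter-clockwise, in metres)."""
    px, py = point
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Cross product: negative means the point is outside this edge.
        if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) < 0:
            return False
    return True

# Double support spans both feet; single support shrinks to one foot.
both_feet = [(0.0, 0.0), (0.3, 0.0), (0.3, 0.4), (0.0, 0.4)]
one_foot  = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.25), (0.0, 0.25)]

zmp = (0.2, 0.2)
print(inside_support_polygon(zmp, both_feet))  # True: safe in double support
print(inside_support_polygon(zmp, one_foot))   # False: would tip mid-stride
```

The same ZMP that is perfectly safe in double support lies outside the single-foot polygon — which is exactly why mid-stride is the most precarious phase of walking.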

The Sensors That Keep Robots Upright

Balance depends on information, and humanoid robots rely on a suite of sensors to know where they are and what's happening to them:

Inertial Measurement Units (IMUs) — typically mounted in the torso — measure acceleration and rotational velocity, giving the robot a sense of its own orientation and movement, analogous to the human vestibular system in the inner ear.

Force/torque sensors in the feet and ankles measure the ground reaction forces, allowing the robot to calculate its centre of pressure and detect when it is beginning to tip.

Joint encoders track the precise angle and velocity of every joint, feeding the data needed for the control system to know the robot's current configuration.

Cameras and depth sensors — including stereo cameras and LiDAR — provide information about upcoming terrain, steps, obstacles, and slopes, allowing the robot to plan its footsteps in advance rather than only reacting to what it feels underfoot.

Modern humanoid platforms fuse data from all of these sensors simultaneously, a process called sensor fusion, to build a real-time model of the robot's state and its environment.
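
The simplest form of such fusion is a complementary filter, which blends a fast but drift-prone gyroscope with a noisy but drift-free accelerometer. Real humanoids typically use Kalman-style estimators, but the idea is the same; all values below are illustrative:

```python
def complementary_filter(pitch_prev, gyro_rate, accel_pitch, dt, alpha=0.98):
    """Fuse gyroscope and accelerometer readings into one pitch estimate.

    The gyro term tracks fast motion; the small accelerometer term slowly
    pulls the estimate back toward gravity, cancelling gyro drift.
    """
    gyro_estimate = pitch_prev + gyro_rate * dt     # integrate angular velocity
    return alpha * gyro_estimate + (1 - alpha) * accel_pitch

# A stationary robot: the gyro reports a small spurious drift rate, while
# the accelerometer correctly reads zero pitch.
pitch = 0.1   # start from a wrong estimate (radians)
for _ in range(500):   # five seconds at 100 Hz
    pitch = complementary_filter(pitch, gyro_rate=0.001, accel_pitch=0.0, dt=0.01)
print(pitch < 0.01)  # True: the drift-free sensor has corrected the estimate
```

Full sensor fusion on a humanoid extends this idea across many more signals — joint encoders, foot forces, vision — but the principle is identical: combine sensors so that each one's weakness is covered by another's strength.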

The Terrain Problem

Walking on a flat, hard factory floor is a largely solved problem for current humanoid robots. The real world, however, is full of slopes, stairs, gravel, wet surfaces, soft ground, curbs, and clutter — and this is where most robots still struggle.

Stairs are an important benchmark. Robots like Atlas, UBTECH's Walker S2, and Agility Robotics' Digit have demonstrated stair climbing, but reliable, fast stair navigation in varying conditions remains challenging. Outdoor locomotion — handling rain, mud, snow, or uneven natural terrain — is even harder and largely unreliable in current platforms.

This is one area where reinforcement learning shows particular promise, because RL policies can be trained across thousands of randomised terrain types in simulation, building robustness that is difficult to achieve through hand-crafted control. Recent research has demonstrated RL-trained robots traversing narrow beams, recovering from trips, and navigating obstacles using only proprioceptive sensing (internal body awareness without vision), pushing the boundaries of what blind locomotion can achieve.

Legs vs. Wheels: The Ongoing Debate

Not all humanoid robots walk. Several companies — including Humanoid (HMND 01 Alpha), Apptronik, and some configurations from other manufacturers — use wheeled bases rather than legs, particularly for indoor industrial applications.

The argument for legs is compelling in theory: a legged robot can go anywhere a human can, including stairs, uneven terrain, and spaces designed for human navigation. Legs also allow the robot to step over obstacles, crouch, kneel, and position itself in ways that wheels cannot.

The argument for wheels is equally practical: most warehouses and factory floors are flat and smooth. Wheels are simpler, more energy-efficient, more stable, and cheaper to build and maintain. They don't fall over when power is lost — a significant safety consideration that the new ISO 25785-1 standard for humanoid robots specifically addresses.

Many in the industry see this as a spectrum rather than a binary choice. The immediate commercial deployments of humanoid robots are overwhelmingly in structured indoor environments where wheels may suffice. The long-term vision — robots in homes, construction sites, disaster zones, and outdoor environments — almost certainly requires legs. Some platforms, like KAIST's DRC-HUBO, have even combined both approaches, using wheeled locomotion on flat ground and converting to legs when terrain demands it.

Where We Stand Today

Locomotion in humanoid robotics has progressed remarkably in the past few years. Robots can now walk, run, jump, hop, sidestep, and recover from significant pushes. Boston Dynamics' Atlas can execute cartwheels and backflips. Unitree's H1 has demonstrated high-speed dynamic locomotion, including sprinting and recovering from heavy disturbances. Multiple startups are achieving stable walking within days of assembling new hardware, thanks to simulation-trained RL policies.

But a persistent gap remains between impressive demonstrations and reliable real-world deployment. As multiple industry analyses have noted, locomotion in current humanoid robots looks far more polished in video than it performs in sustained, uncontrolled environments. Walking on varied terrain, operating for full work shifts, handling unexpected obstacles, and doing all of this while simultaneously performing useful tasks with the upper body — this is where the real work remains.

The convergence of reinforcement learning, high-fidelity simulation, and increasingly capable hardware is accelerating progress at a pace that would have seemed implausible even five years ago. But the fundamental physics of balancing a heavy, top-heavy machine on two small feet, in a world full of surprises, ensures that locomotion will remain one of the defining challenges — and most active research frontiers — in humanoid robotics for years to come.


Further Reading on Droid Brief:

Actuators & Motors: The Muscles — Understand the hardware that powers robot movement.

Sensors & Perception — A deeper dive into how robots sense the world.

AI & The Robot Brain — How machine learning and reinforcement learning drive robot behaviour.

Key Terminology Glossary — Definitions of technical terms used in this article.