Learning Smooth Humanoid Locomotion
through Lipschitz-Constrained Policies



Supplementary Video


Abstract

Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and involve a large set of hyperparameters, they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated with automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP in simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers.

Method

RL-based policies are prone to producing jittery behaviors. Lipschitz continuity offers a way to characterize the smoothness of a function: a policy π is K-Lipschitz if ‖π(s₁) − π(s₂)‖ ≤ K‖s₁ − s₂‖ for all observations s₁, s₂, which bounds how quickly its outputs can change with its inputs. We propose Lipschitz-Constrained Policies (LCP), a simple method to train policies that produce smooth behaviors by enforcing a Lipschitz constraint on the policy. This constraint is implemented as a gradient penalty, which is differentiable and can be easily integrated into existing RL training pipelines with only a few lines of code, as sketched below. LCP provides a simple and effective alternative to commonly used non-differentiable smoothing techniques.
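
As a rough illustration of how few lines such a penalty takes, below is a minimal PyTorch sketch of one possible gradient-penalty implementation. It is not the paper's actual code: the names (policy, obs, coeff) are placeholders, and it uses the common shortcut of penalizing the gradient of the summed action outputs rather than the full input-output Jacobian.

import torch

def lipschitz_gradient_penalty(policy, obs, coeff=1e-3):
    """Soft Lipschitz constraint implemented as a gradient penalty.

    Penalizes E[ ||grad_obs (sum of action outputs)||^2 ], a cheap and
    widely used surrogate for the norm of the full input-output Jacobian.
    All names here (policy, obs, coeff) are illustrative placeholders.
    """
    obs = obs.detach().requires_grad_(True)
    actions = policy(obs)  # mean actions, shape (batch, action_dim)
    grad = torch.autograd.grad(
        outputs=actions.sum(),  # scalar output, so autograd returns per-sample gradients
        inputs=obs,
        create_graph=True,      # keep the graph so the penalty is itself differentiable
    )[0]                        # shape (batch, obs_dim)
    return coeff * grad.square().sum(dim=-1).mean()

In training, the returned penalty would simply be added to the actor's loss, e.g. loss = rl_loss + lipschitz_gradient_penalty(policy, obs_batch); the single coefficient trades smoothness against responsiveness.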

General Locomotion Framework

With LCP, we develop a single framework that can train locomotion controllers for a variety of distinct humanoid robots:

Fourier GR1T1
Fourier GR1T2
Berkeley Humanoid
Unitree H1

More Real-World Results


Sim-to-Sim Performance


Failure Cases


BibTeX

@article{chen2024lcp,
title={Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies},
author={Zixuan Chen and Xialin He and Yen-Jen Wang and Qiayuan Liao and Yanjie Ze and Zhongyu Li and S. Shankar Sastry and Jiajun Wu and Koushil Sreenath and Saurabh Gupta and Xue Bin Peng},
journal={arXiv preprint arXiv:2410.11825},
year={2024}
}