ResMimic: From General Motion Tracking to Humanoid Whole-Body Loco-Manipulation via Residual Learning

1Amazon FAR (Frontier AI & Robotics)   2University of Southern California   3Stanford University   4UC Berkeley   5Carnegie Mellon University
§Work done while interning at Amazon FAR   FAR Co-Lead

We present ResMimic, a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalizes across poses, and exhibits reactive behavior.

Abstract

Humanoid whole-body loco-manipulation holds transformative potential in daily service and industrial tasks. While recent advances in general motion tracking (GMT) enable humanoids to reproduce diverse human motions, such policies lack the precision and explicit modeling of object interaction necessary for loco-manipulation. We present ResMimic, a two-stage pretrain–post-train residual learning framework for precise humanoid loco-manipulation from human motion data. In the first stage, a GMT policy is pre-trained on large-scale human-only motion data to serve as a task-agnostic whole-body motion prior. In the second stage, a sample-efficient residual policy is post-trained to inject object-conditioned corrections, enabling accurate object interaction without re-learning general motion skills. Our framework employs a unified reward formulation shared across tasks, eliminating per-task reward engineering. To ensure stable and efficient training, we introduce a dense point-cloud object tracking reward, a contact tracking reward, and a virtual force curriculum. We validate ResMimic in simulation and on a real Unitree G1 humanoid. Results demonstrate substantial improvements in task success rate, robustness, and data efficiency over strong baselines.
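The two ideas at the core of the abstract, a residual correction added on top of a frozen base policy and a dense reward computed over object point clouds, can be illustrated with a minimal sketch. This is not the released implementation: the function names, the `residual_scale` and `sharpness` parameters, and the exponential form of the reward are assumptions made for illustration, and plain Python lists stand in for the tensors a real training pipeline would use.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def compose_action(base_action: Vector,
                   residual_action: Vector,
                   residual_scale: float = 1.0) -> List[float]:
    """Add a residual correction to the base (GMT) action, joint by joint.

    `residual_scale` is a hypothetical knob, not a parameter from the paper.
    """
    if len(base_action) != len(residual_action):
        raise ValueError("base and residual actions must have the same length")
    return [b + residual_scale * r for b, r in zip(base_action, residual_action)]


def point_cloud_tracking_reward(object_points: Sequence[Vector],
                                reference_points: Sequence[Vector],
                                sharpness: float = 5.0) -> float:
    """Assumed reward form: exp(-sharpness * mean point-to-point distance).

    Both point sets are expected to be sampled from the same object mesh in
    the same order, so point i in one set corresponds to point i in the other.
    """
    if not object_points or len(object_points) != len(reference_points):
        raise ValueError("point sets must be non-empty and of equal size")
    total = sum(math.dist(p, q) for p, q in zip(object_points, reference_points))
    mean_distance = total / len(object_points)
    return math.exp(-sharpness * mean_distance)


# Usage: a three-joint toy action and a two-point toy object.
final_action = compose_action([0.1, -0.2, 0.0], [0.05, 0.0, -0.1])
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reward_when_aligned = point_cloud_tracking_reward(reference, reference)
```

Because the base action passes through unchanged wherever the residual is zero, the post-trained policy only has to learn the object-conditioned corrections. A reward over many surface points penalizes position and orientation error of the object in a single term, which is one plausible reason to prefer it over separate pose terms.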

Method Overview

Expressive Whole-Body Loco-Manipulation

Carry Box onto Back (1x)
Kneel on One Knee & Lift Box (1x)

Heavy Object with Whole-Body Contact

Squat & Lift Box with Whole-Body Contact (1x)
4.5kg

Lift Chair with Whole-Body Contact (1x)

4.5kg

Lift Chair with Whole-Body Contact (1x)

5.5kg

Robustness Test

Lift Chair with Whole-Body Contact (1x)

4.5kg

Lift Chair with Whole-Body Contact (1x)

4.5kg

General Object Interaction

Sit on Chair (1x)

ResMimic (Success ✅)

Sit on Chair (1x)

Base Policy (Failure ❌)

Sit on Chair (1x)


Continuous Execution with MoCap Input

Lift Box with Random Object Initial Pose

Autonomous Consecutive Lift Box


Reactive Behavior to External Perturbation


Comparison with Baselines

ResMimic (Success ✅)

Base Policy (Failure ❌)

Train from Scratch (Failure ❌)

Base Policy + Finetune (Failure ❌)


Policy Visualization: Pretrained vs Residual

We visualize the difference between the pretrained policy and the residual policy in the joint action space. The green region represents the delta action, i.e., the correction that the residual policy adds to the pretrained policy's action. Notably, the delta actions are most pronounced in the wrist joints, suggesting that the residual policy concentrates its corrections on the joints most involved in object interaction.
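One way to quantify which joints the residual policy adjusts most is to average the absolute delta action per joint over a rollout. The sketch below assumes that logged base and final actions are available as per-timestep lists; the function name, the logging format, and the joint names are hypothetical and not taken from the ResMimic code.

```python
from typing import Dict, List, Sequence


def mean_abs_delta_per_joint(base_actions: Sequence[Sequence[float]],
                             final_actions: Sequence[Sequence[float]],
                             joint_names: Sequence[str]) -> Dict[str, float]:
    """Average |final - base| for each joint across all logged timesteps."""
    if not base_actions or len(base_actions) != len(final_actions):
        raise ValueError("action logs must be non-empty and of equal length")
    totals = [0.0] * len(joint_names)
    for base, final in zip(base_actions, final_actions):
        if len(base) != len(joint_names) or len(final) != len(joint_names):
            raise ValueError("each action must have one entry per joint")
        for j, (b, f) in enumerate(zip(base, final)):
            totals[j] += abs(f - b)
    steps = len(base_actions)
    return {name: total / steps for name, total in zip(joint_names, totals)}


def rank_joints_by_delta(mean_deltas: Dict[str, float]) -> List[str]:
    """Return joint names ordered from largest to smallest mean delta."""
    return sorted(mean_deltas, key=mean_deltas.get, reverse=True)


# Usage with a toy two-joint, two-step log (joint names are placeholders).
joints = ["left_knee", "left_wrist"]
base_log = [[0.0, 0.0], [0.0, 0.0]]
final_log = [[0.1, 0.4], [-0.1, 0.2]]
deltas = mean_abs_delta_per_joint(base_log, final_log, joints)
ranking = rank_joints_by_delta(deltas)
```

Taking the absolute value before averaging keeps corrections of opposite sign from cancelling, so a joint that is nudged back and forth still registers as heavily corrected.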

Ablation Results in Simulation

BibTeX

@misc{zhao2025resmimic,
        title={ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning}, 
        author={Siheng Zhao and Yanjie Ze and Yue Wang and C. Karen Liu and Pieter Abbeel and Guanya Shi and Rocky Duan},
        year={2025},
        eprint={2510.05070},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2510.05070}, 
  }