ResMimic: From General Motion Tracking to Humanoid Whole-Body Loco-Manipulation via Residual Learning

Humanoid whole-body loco-manipulation holds transformative potential for daily service and industrial tasks. While recent advances in general motion tracking (GMT) enable humanoids to reproduce diverse human motions, such policies lack the precision and explicit modeling of object interaction needed for loco-manipulation. We present ResMimic, a two-stage pretrain–post-train residual learning framework for precise humanoid loco-manipulation from human motion data. In the first stage, a GMT policy is pre-trained on large-scale human-only motion data to serve as a task-agnostic whole-body motion prior. In the second stage, a sample-efficient residual policy is post-trained to inject object-conditioned corrections, enabling accurate object interaction without re-learning general motion skills. Our framework uses a unified reward formulation shared across tasks, eliminating per-task reward engineering. To ensure stable and efficient training, we introduce a dense point-cloud object tracking reward, a contact tracking reward, and a virtual force curriculum. We validate ResMimic in simulation and on a real Unitree G1 humanoid. Results show substantial improvements in task success rate, robustness, and data efficiency over strong baselines.
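Two ideas from the abstract are easy to state concretely: adding a residual correction on top of a pretrained motion prior, and scoring object tracking with a dense point cloud rather than a single pose error. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the pretrained GMT policy is kept fixed and the residual is simply added to its joint targets, and it assumes the object reward is an exponential of the mean distance between corresponding object points under the simulated and reference poses. The function names, the yaw-only pose parameterization, and the scale `sigma` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# Hypothetical pose parameterization: (yaw in radians, translation), rotation about z only.
Pose = Tuple[float, Point]


def compose_action(gmt_action: Sequence[float], residual: Sequence[float]) -> List[float]:
    """Final joint targets = frozen GMT prior output + residual correction (assumed additive)."""
    if len(gmt_action) != len(residual):
        raise ValueError("GMT action and residual must have the same dimension")
    return [a + r for a, r in zip(gmt_action, residual)]


def transform(points: Sequence[Point], pose: Pose) -> List[Point]:
    """Rotate object-frame points about the z-axis by yaw, then translate them."""
    yaw, (tx, ty, tz) = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def point_cloud_tracking_reward(local_points: Sequence[Point], sim_pose: Pose,
                                ref_pose: Pose, sigma: float = 0.1) -> float:
    """Hypothetical dense object reward: exp(-mean corresponding-point distance / sigma)."""
    sim_pts = transform(local_points, sim_pose)
    ref_pts = transform(local_points, ref_pose)
    mean_dist = sum(math.dist(p, q) for p, q in zip(sim_pts, ref_pts)) / len(local_points)
    return math.exp(-mean_dist / sigma)


if __name__ == "__main__":
    # Hypothetical box: the 8 corners of a 0.4 x 0.3 x 0.2 m cuboid in its own frame.
    corners = [(sx * 0.2, sy * 0.15, sz * 0.1)
               for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
    print(compose_action([0.1, -0.2], [0.05, 0.0]))
    ref = (0.0, (0.5, 0.0, 0.8))
    print(point_cloud_tracking_reward(corners, ref, ref))                     # perfect tracking
    print(point_cloud_tracking_reward(corners, (0.3, (0.5, 0.0, 0.8)), ref))  # rotated only
    print(point_cloud_tracking_reward(corners, (0.0, (0.6, 0.0, 0.8)), ref))  # shifted 10 cm
```

In this sketch every surface point moves under both rotation and translation, so one distance in meters penalizes orientation and position errors together, and no hand-tuned weighting between a rotation term and a translation term is needed.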

Carry Box onto Back (1x)

Kneel on One Knee & Lift Box (1x)

Squat & Lift Box with Whole-Body Contact (1x)

4.5kg

Lift Chair with Whole-Body Contact (1x)

4.5kg

Lift Chair with Whole-Body Contact (1x)

5.5kg

Robustness Test

Lift Chair with Whole-Body Contact (1x)

4.5kg

Lift Chair with Whole-Body Contact (1x)

4.5kg

Sit on Chair (1x)

ResMimic (Success ✅)

Sit on Chair (1x)

Base Policy (Failure ❌)

Sit on Chair (1x)

Lift Box with Random Object Initial Pose

Autonomous Consecutive Lift Box

ResMimic (Success ✅)

Base Policy (Failure ❌)

Train from Scratch (Failure ❌)

Base Policy + Finetune (Failure ❌)

We visualize, in joint action space, the difference between the actions of the pretrained policy and those of the residual policy. The green region represents the delta action. Notably, the delta actions are especially pronounced in the wrist joints, suggesting that the residual policy focuses its corrections on object interaction.
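One simple way to turn this kind of visualization into a number is to average the absolute delta action per joint over a rollout and rank joints by it. The sketch below assumes per-step action vectors from both the pretrained policy and the residual-corrected policy have been logged; the helper names, joint names, and values in the example are hypothetical and are not data from the paper.

```python
from statistics import mean
from typing import Dict, List, Sequence


def per_joint_delta(base_actions: Sequence[Sequence[float]],
                    final_actions: Sequence[Sequence[float]],
                    joint_names: Sequence[str]) -> Dict[str, float]:
    """Mean |final - base| per joint over a rollout of T steps x J joints."""
    if len(base_actions) != len(final_actions) or not base_actions:
        raise ValueError("rollouts must be non-empty and of equal length")
    return {
        name: mean(abs(f[j] - b[j]) for b, f in zip(base_actions, final_actions))
        for j, name in enumerate(joint_names)
    }


def top_joints(deltas: Dict[str, float], k: int = 2) -> List[str]:
    """Joint names sorted by descending mean delta, truncated to k."""
    return sorted(deltas, key=deltas.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical 3-joint, 2-step rollout (illustrative numbers only).
    names = ["left_knee", "left_wrist_roll", "right_wrist_roll"]
    base = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
    final = [[0.01, 0.3, -0.2], [0.1, 0.4, -0.25]]
    deltas = per_joint_delta(base, final, names)
    print(top_joints(deltas))  # ['left_wrist_roll', 'right_wrist_roll']
```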

BibTeX

@misc{zhao2025resmimic,
  title={ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning},
  author={Siheng Zhao and Yanjie Ze and Yue Wang and C. Karen Liu and Pieter Abbeel and Guanya Shi and Rocky Duan},
  year={2025},
  eprint={2510.05070},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2510.05070}
}

We present ResMimic, a two-stage residual framework that builds on a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads of up to 5.5 kg without task-specific design, generalizes across initial object poses, and reacts to external perturbations.

Abstract

Method Overview

Expressive Whole-Body Loco-Manipulation

Heavy Object with Whole-Body Contact

Robustness Test

General Object Interaction

Continuous Execution with MoCap Input

Reactive Behavior to External Perturbation

Comparison with Baselines

Policy Visualization: Pretrained vs Residual

Ablation Results in Simulation
