Beyond the Majority: Long-tail Imitation Learning for Robotic Manipulation
Abstract
While generalist robot policies hold significant promise for learning diverse manipulation skills through imitation, their performance is often hindered by the long-tail distribution of training demonstrations. Policies learned on such data, which is heavily skewed towards a few data-rich head tasks, frequently exhibit poor generalization when confronted with the vast number of data-scarce tail tasks. In this work, we conduct a comprehensive analysis of the pervasive long-tail challenge inherent in policy learning. Our analysis begins by demonstrating the inefficacy of conventional long-tail learning strategies (e.g., re-sampling) for improving the policy's performance on tail tasks. We then uncover the underlying mechanism for this failure, revealing that data scarcity on tail tasks directly impairs the policy's spatial reasoning capability. To overcome this, we introduce Approaching-Phase Augmentation (APA), a simple yet effective scheme that transfers knowledge from data-rich head tasks to data-scarce tail tasks without requiring external demonstrations. Extensive experiments in both simulation and real-world manipulation tasks demonstrate the effectiveness of APA.
Motivation
As illustrated in Figure 1, owing to the skewed distribution of training demonstrations, the performance degradation of generalist robot policies is particularly pronounced for tail tasks relative to those in the head. Specifically, when shifting from a full to a long-tail dataset, the average success rate on data-rich head tasks experiences a relative decline of 27% (dropping from 74% to 54%). In stark contrast, the success rate on data-scarce tail tasks plummets by a dramatic 65% (dropping from 43% to 15%). This glaring disparity reveals that data scarcity on tail tasks severely impairs policy execution, underscoring the urgent need for strategies to mitigate the impact of long-tail distributions in robot learning.
Failure Analysis
To better understand the failure mechanisms of tail tasks and identify the core reason for the performance drop of policies trained on long-tail demonstrations, we decouple each complete task trajectory into two sequential phases:
- Target Approaching Phase, which encompasses all actions until the robot's end-effector reaches the immediate vicinity of the primary object.
- Subsequent Execution Phase, which includes all actions that follow a successful target approaching phase to complete the task.
Example task: "put the cream cheese in the bowl"
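The two-phase decomposition above can be sketched in code. The split criterion below, a fixed distance threshold between the end-effector and the target object, is a hypothetical choice for illustration; the paper's exact segmentation rule may differ.

```python
import numpy as np

def split_trajectory(ee_positions, obj_position, threshold=0.05):
    """Split a trajectory into approaching and execution phases.

    The split point (an assumed criterion for this sketch) is the first
    timestep at which the end-effector comes within `threshold` meters
    of the primary object; everything before it is the target
    approaching phase, everything after is the subsequent execution
    phase.
    """
    dists = np.linalg.norm(ee_positions - obj_position, axis=1)
    reached = np.flatnonzero(dists <= threshold)
    split = int(reached[0]) if reached.size else len(ee_positions)
    return ee_positions[:split], ee_positions[split:]
```

For the example task "put the cream cheese in the bowl", the approaching phase would end once the gripper is near the cream cheese, and the execution phase would cover grasping it and placing it in the bowl.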
Then, we use Relative Risk (RR) to quantify the increased failure rate of models trained on the long-tail (LT) dataset versus the full dataset, where an RR > 100% indicates an elevated risk of failure. Detailed calculations and experimental results can be found in Section III-D.
As illustrated in Figure 2, shifting from a full to a long-tail dataset increases the risk of failure across both task phases. However, this degradation is heavily skewed. The relative risk of failure during the target approaching phase reaches 400.89% across all tail tasks, compared to 164.34% in the subsequent execution phase. This indicates that, relative to a policy trained on the full dataset, a policy trained on long-tail data is roughly 4x more likely to fail during the target approaching phase. Because this critical phase relies heavily on precise robot-object coordination, the disparity reveals that data scarcity severely undermines the policy's spatial reasoning capability.
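One plausible reading of RR is the ratio of phase-wise failure rates under the two training sets; the exact formula used in the paper is given in Section III-D. The numbers in the usage example below are illustrative, not the paper's measurements.

```python
def relative_risk(fail_rate_lt, fail_rate_full):
    """Relative Risk (RR): the failure rate of the policy trained on
    the long-tail (LT) dataset divided by the failure rate of the
    policy trained on the full dataset, expressed as a percentage.
    RR > 100% indicates an elevated risk of failure under LT training.
    """
    return 100.0 * fail_rate_lt / fail_rate_full
```

For instance, if a phase fails 12% of the time under full-data training but 48% of the time under LT training, `relative_risk(0.48, 0.12)` gives 400%, i.e., a 4x higher chance of failure.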
Method
Motivated by these findings, we propose the Approaching-Phase Augmentation (APA) method, which leverages demonstrations from head tasks to generate new, high-quality training examples for tail tasks. Our method involves a three-step process:
- Head Task Trajectory Segmentation, which extracts crucial approaching-phase trajectory segments from successful demonstrations of data-rich head tasks.
- Tail to Head Object Grafting, which generates augmented training data by replacing the original objects within the segmented trajectories with long-tail task objects via asset replacement or image inpainting.
- Instruction Formatting and Co-Training, which formats the corresponding language instructions and then trains the policy on the combined dataset.
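The three steps can be sketched as a data-generation loop. The callables `graft_object` and `format_instruction` are hypothetical stand-ins for the paper's asset-replacement / image-inpainting step and its instruction-formatting step; their concrete implementations are not specified here.

```python
import random

def apa_augment(head_demos, tail_objects, graft_object,
                format_instruction, n_aug):
    """Sketch of Approaching-Phase Augmentation (APA) data generation.

    head_demos: successful head-task demonstrations, each a dict with
        "frames" (observation sequence) and "approach_end" (index where
        the approaching phase ends).
    tail_objects: identifiers of data-scarce tail-task objects.
    """
    augmented = []
    for _ in range(n_aug):
        demo = random.choice(head_demos)
        # Step 1: keep only the approaching-phase segment of the
        # head-task trajectory.
        segment = demo["frames"][:demo["approach_end"]]
        # Step 2: graft a tail-task object into each frame (via asset
        # replacement or image inpainting in the actual method).
        obj = random.choice(tail_objects)
        segment = [graft_object(frame, obj) for frame in segment]
        # Step 3: attach a formatted language instruction; the result
        # is co-trained alongside the original demonstrations.
        augmented.append({"frames": segment,
                          "instruction": format_instruction(obj)})
    return augmented
```

Because only the approaching phase is transferred, the grafted examples teach the policy where and how to reach tail-task objects without requiring any new tail-task demonstrations.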
Real-World Experiments
Our real-world deployments demonstrate the practical viability and robustness of the APA method on physical robotic systems. When tested in actual physical environments, APA effectively handled long-tail tasks, achieving a 38.4% relative improvement in success rates compared to the baseline.
Simulation Experiments
Complementing these physical results, our extensive simulation experiments rigorously validated APA's foundational ability to overcome data scarcity. In these controlled settings, the method significantly boosted the average success rate from 26.5% to 36.1%, delivering a substantial relative improvement of 36.2%.
BibTeX
@inproceedings{zhu2026beyond,
title={Beyond the Majority: Long-tail Imitation Learning for Robotic Manipulation},
author={Zhu, Junhong and Zhang, Ji and Song, Jingkuan and Gao, Lianli and Shen, Heng Tao},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2026}
}