The challenge is structured in two rounds. In Phase 1, all teams are evaluated on a standardized real-robot bimanual manipulation benchmark using a shared dataset. In Phase 2, the top teams are selected and supported in iterative rollout collection to further improve their policies via post-training.
Phase 1: Evaluation Tracks

Phase 1 evaluates submitted policies on 3–5 standardized real-robot tasks across two ranking tracks. Teams train offline on the released dataset (expert data plus baseline success and failure rollouts) and submit their policies for benchmark evaluation.
Single-Task track: policies are evaluated and ranked on each task independently. This track is best suited to specialists optimizing for individual task performance.
Multi-Task track: a single policy is evaluated jointly across all tasks. This track is best suited to generalist policies that share representations across skills.
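The difference between the two tracks can be sketched as two ranking rules. This is a minimal illustration, not the official scoring code: the team names and scores in `EXAMPLE_SCORES` are hypothetical, `rank_single_task` and `rank_multi_task` are illustrative helpers, and the sketch assumes that higher progress scores are better and that a Multi-Task entry must report a score on every task.

```python
from statistics import mean

TASKS = ["insert-mouse-battery", "tower-of-hanoi-game", "seal-water-bottle-cap"]

# Hypothetical per-task progress scores (0-100); these are not real leaderboard entries.
EXAMPLE_SCORES = {
    "team_a": {"insert-mouse-battery": 90, "tower-of-hanoi-game": 10, "seal-water-bottle-cap": 50},
    "team_b": {"insert-mouse-battery": 60, "tower-of-hanoi-game": 40, "seal-water-bottle-cap": 60},
    "team_c": {"insert-mouse-battery": 95},  # a specialist that entered one task only
}


def rank_single_task(scores):
    """Single-Task track: rank the entrants of each task independently (best first)."""
    rankings = {}
    for task in TASKS:
        entrants = [team for team, per_task in scores.items() if task in per_task]
        rankings[task] = sorted(entrants, key=lambda team: scores[team][task], reverse=True)
    return rankings


def rank_multi_task(scores):
    """Multi-Task track: one policy per team, ranked by its mean score over all tasks."""
    # Assumption: only entries with a score on every task are eligible.
    complete = {team: s for team, s in scores.items() if all(t in s for t in TASKS)}
    return sorted(complete, key=lambda team: mean(complete[team][t] for t in TASKS), reverse=True)


print(rank_single_task(EXAMPLE_SCORES)["insert-mouse-battery"])  # ['team_c', 'team_a', 'team_b']
print(rank_multi_task(EXAMPLE_SCORES))  # ['team_b', 'team_a']
```

Here `team_c` tops the insert-mouse-battery ranking but is not eligible for the Multi-Task ranking, while `team_b` wins the Multi-Task ranking on a higher mean (53.3 vs 50) despite never placing first on insert-mouse-battery.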
Phase 1 includes three real-robot bimanual manipulation tasks, each with a sample rollout video and its own scoring criteria:

- insert-mouse-battery
- tower-of-hanoi-game
- seal-water-bottle-cap

For Phase 1, we release a standardized real-robot bimanual dataset designed for offline training. The dataset contains four complementary components:
- Expert demonstrations: high-quality human teleoperation demonstrations on the benchmark tasks.
- Baseline success rollouts: trajectories in which a baseline policy successfully completed the task.
- Baseline failure rollouts: trajectories in which the baseline policy failed, a useful negative signal for post-training.
- Human feedback: human interventions, corrections, and preference labels collected during baseline rollouts.
All four components are intended for offline training in Phase 1. The dataset is hosted on Hugging Face Datasets.
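One way to route these components into an offline training pipeline is to pool episodes by their origin. The sketch below assumes each episode carries a component label; the `Source` values, the `Episode` fields, and `split_by_component` are hypothetical and do not describe the released dataset's actual schema, which should be checked on Hugging Face.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """The four dataset components (label values here are hypothetical)."""
    EXPERT = "expert"            # human teleoperation demonstrations
    SUCCESS = "success"          # baseline rollouts that completed the task
    FAILURE = "failure"          # baseline rollouts that failed
    HUMAN_FEEDBACK = "feedback"  # interventions, corrections, preference labels


@dataclass(frozen=True)
class Episode:
    episode_id: str
    task: str
    source: Source


def split_by_component(episodes):
    """Group episodes into positive, negative, and human-feedback pools."""
    pools = {"positive": [], "negative": [], "feedback": []}
    for ep in episodes:
        if ep.source in (Source.EXPERT, Source.SUCCESS):
            pools["positive"].append(ep)  # e.g. imitation targets
        elif ep.source is Source.FAILURE:
            pools["negative"].append(ep)  # e.g. negative signal for post-training
        else:
            pools["feedback"].append(ep)  # e.g. corrections or preference pairs
    return pools


episodes = [
    Episode("ep0", "tower-of-hanoi-game", Source.EXPERT),
    Episode("ep1", "tower-of-hanoi-game", Source.SUCCESS),
    Episode("ep2", "seal-water-bottle-cap", Source.FAILURE),
    Episode("ep3", "insert-mouse-battery", Source.HUMAN_FEEDBACK),
]
pools = split_by_component(episodes)
print({name: len(eps) for name, eps in pools.items()})  # {'positive': 2, 'negative': 1, 'feedback': 1}
```

How each pool is actually used (plain imitation, filtered imitation, preference optimization, or something else) is up to each team; the split only makes the four sources explicit.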
Step-by-step instructions for joining the challenge, including environment setup, dataset access, baseline reproduction, evaluation protocol, and submission format, are hosted in our GitHub repositories. Reference code and starter scripts are provided so that teams can get up and running quickly.
Based on Phase 1 results, the top 3 teams advance to Phase 2. During this round, selected teams collect rollouts by deploying their own policies and use those rollouts to further improve performance via post-training.
Listed prizes apply to each track (Single-Task & Multi-Task).
| # | Team / Model | Trials | insert-mouse-battery Score | insert-mouse-battery SR | tower-of-hanoi-game Score | tower-of-hanoi-game SR | seal-water-bottle-cap Score | seal-water-bottle-cap SR | Average Score | Average SR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Sii-Yitong / just-one-step | 6 / 5 / 5 | 73.3 | 66.7% | 100 | 100% | 70 | 60% | 81.1 | 75.6% |
| 2 | QianYe / jingshuo | 5 / — / — | 80 | 80% | — | — | — | — | 80 | 80% |
| 3 | Jingyu / SYSU | 5 / 5 / 5 | 96 | 80% | 0 | 0% | 100 | 60% | 65.3 | 46.7% |
| 4 | JiabingYang / CASIA | 5 / — / 5 | 100 | 100% | — | — | 24 | 0% | 62 | 50% |
| 5 | Zhangyu / Huazhong | 5 / 5 / 5 | 76 | 60% | 0 | 0% | 78 | 60% | 51.3 | 40% |
| 6 | Haruki / U-Tokyo | 5 / 5 / 5 | 76 | 60% | 26 | 20% | 50 | 40% | 50.7 | 40% |
| 7 | Zhangyu / Huazhong · HIL | — / — / 5 | — | — | — | — | 50 | 40% | 50 | 40% |
| 8 | Guanqi / HKU | 10 / 10 / 5 | 66 | 50% | 6 | 0% | 52 | 20% | 41.3 | 23.3% |
| 9 | VLAlab-JP | 20 / 5 / 5 | 80 | 80% | 0 | 0% | 20 | 20% | 33.3 | 33.3% |
| 10 | Benson / Tongji | 5 / 5 / — | 60 | 60% | 6 | 0% | — | — | 33 | 30% |
| 11 | Sii-Yitong / 1step | 10 / — / — | 30 | 10% | — | — | — | — | 30 | 10% |
| 12 | Sii-Yitong | 11 / — / — | 27.3 | 9.1% | — | — | — | — | 27.3 | 9.1% |
| 13 | Yangxin / SYSU | 5 / 5 / 5 | 40 | 40% | 0 | 0% | 0 | 0% | 13.3 | 13.3% |
| 14 | Zecheng / ICL | — / 5 / — | — | — | 6 | 0% | — | — | 6 | 0% |
| 15 | Pankhuri / KIT | 5 / 5 / 5 | 0 | 0% | 0 | 0% | 0 | 0% | 0 | 0% |
| 16 | Keweichen / Datawhale | — / — / 5 | — | — | — | — | 0 | 0% | 0 | 0% |
| 17 | Zhihaozhan / SYSU | — / — / 5 | — | — | — | — | 0 | 0% | 0 | 0% |
| — | Baseline / pi05 |  | 87 | 80% | 47 | 45% | 47 | 35% | 60.3 | 53.3% |

| # | Team / Model | Trials | insert-mouse-battery Score | insert-mouse-battery SR | tower-of-hanoi-game Score | tower-of-hanoi-game SR | seal-water-bottle-cap Score | seal-water-bottle-cap SR | Average Score | Average SR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Haruki / U-Tokyo | 5 / 5 / 5 | 100 | 80% | 0 | 0% | 60 | 60% | 53.3 | 46.7% |
| 2 | capV* / wesklake | 10 / 5 / 5 | 64 | 60% | 20 | 20% | 70 | 60% | 51.3 | 46.7% |
| 3 | VLAlab-JP | 5 / 5 / 5 | 76 | 60% | 0 | 0% | 70 | 60% | 48.7 | 40% |
| 4 | Sii-Yitong / just-one-step | 10 / 10 / 10 | 40 | 20% | 10 | 10% | 71.1 | 50% | 40.4 | 26.7% |
| 5 | JiabingYang / CASIA | 5 / 5 / 5 | 80 | 80% | 0 | 0% | 20 | 20% | 33.3 | 33.3% |
| 6 | Zhangyu / Huazhong | 5 / 5 / 5 | 75 | 60% | 6 | 0% | 0 | 0% | 27 | 20% |
| 7 | Jingyu / SYSU | 5 / 5 / 5 | 80 | 80% | 0 | 0% | 0 | 0% | 26.7 | 26.7% |
| 8 | ShijieGeng / Drexel | 5 / 5 / 5 | 80 | 80% | 0 | 0% | 0 | 0% | 26.7 | 26.7% |
| 9 | Zhihaozhan / SYSU | 5 / 5 / 4 | 8 | 0% | 0 | 0% | 0 | 0% | 2.7 | 0% |
| 10 | Pankhuri / KIT | 5 / 5 / 5 | 0 | 0% | 0 | 0% | 0 | 0% | 0 | 0% |
| 11 | Qichang / SYSU | 5 / 5 / 5 | 0 | 0% | 0 | 0% | 0 | 0% | 0 | 0% |
| 12 | Xingxin / HKUST | 5 / 5 / 5 | 0 | 0% | 0 | 0% | 0 | 0% | 0 | 0% |
| — | Baseline / pi05 |  | 68 | 60% | 53.7 | 50% | 35 | 35% | 52.3 | 48.3% |
Score: progress score per task (higher is better). SR: success rate over rollouts. Trials: number of evaluation rollouts per task, listed in column order insert-mouse-battery / tower-of-hanoi-game / seal-water-bottle-cap; a dash means the task was not evaluated for that entry. Average: mean Score and mean SR over the tasks evaluated for that entry. The final ranking uses the Average column.
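The Score, SR, and Average columns can be reproduced from per-rollout results roughly as follows. This is a sketch under stated assumptions: `task_metrics` and `average_over_evaluated` are hypothetical helpers, each rollout is assumed to yield one progress value and one success flag, and the task-specific progress criteria themselves are not reproduced here. Averaging over only the tasks an entry was evaluated on matches the table; for example, JiabingYang / CASIA in the first table has (100 + 24) / 2 = 62 and (100% + 0%) / 2 = 50%.

```python
from statistics import mean


def task_metrics(progress, successes):
    """Return (mean progress score, success rate in %) for one task's rollouts."""
    if not progress or len(progress) != len(successes):
        raise ValueError("need one progress value and one success flag per rollout")
    score = mean(progress)
    success_rate = 100.0 * sum(successes) / len(successes)
    return score, success_rate


def average_over_evaluated(per_task):
    """Average Score and SR over the tasks an entry was actually evaluated on.

    `per_task` maps task name -> (score, success_rate); tasks shown as a dash
    in the leaderboard are simply absent from the mapping.
    """
    scores = [score for score, _ in per_task.values()]
    rates = [rate for _, rate in per_task.values()]
    return mean(scores), mean(rates)


# Five hypothetical rollouts: four full completions and one zero-progress failure.
print(task_metrics([100, 100, 100, 100, 0], [True, True, True, True, False]))  # (80, 80.0)

# JiabingYang / CASIA, first table: insert-mouse-battery and seal-water-bottle-cap only.
print(average_over_evaluated({
    "insert-mouse-battery": (100, 100.0),
    "seal-water-bottle-cap": (24, 0.0),
}))  # (62, 50.0)
```

Note that this averaging lets an entry evaluated on a single task rank alongside entries evaluated on all three, as QianYe / jingshuo does in the first table.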
Note: The current leaderboard is in an iteration and debugging phase. To allow rapid iteration, submissions are currently evaluated with a small number of rollouts per task (typically 5–10). The final ranking will be re-run under a unified evaluation setting with a standardized trial count once all teams' submissions have stabilized.
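With 5 rollouts, a single success moves SR by 20 percentage points, which is one reason a standardized, larger trial count matters for the final ranking. As an illustration only (it is not part of the official protocol), the Wilson score interval below gives a rough 95% confidence range for a success rate; `wilson_interval` is a hypothetical helper and z = 1.96 is an assumed confidence level.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial success rate (bounds are fractions)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return center - half, center + half


for n in (5, 20):
    low, high = wilson_interval(round(0.8 * n), n)  # an observed 80% SR
    print(f"{n:>2} trials: 95% CI {low:.0%} to {high:.0%}")
```

For an observed 80% SR, this gives roughly 38% to 96% with 5 trials and roughly 58% to 92% with 20 trials, so small differences between entries evaluated on 5 rollouts are well within noise.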
Registration

Team registration is now open. Please fill in the registration Google Form with your team information and track preference. Submission instructions will be shared with registered teams.

Schedule

| Time | Event |
|---|---|
| 9:00 - 10:30 | Invited Talks: Karl Pertsch · Abhishek Gupta · Jianlan Luo · more speakers TBA |
| 10:30 - 11:00 | Coffee Break & Participating Teams' Policy Deployment Demonstrations |
| 11:00 - 12:00 | Participating Team Reports |
| 12:00 - 13:00 | Panel Discussion & Audience Q&A |
Discussion topics for this workshop include, but are not limited to:
If you use this dataset or reference the challenge in your research, please cite us:
```bibtex
@misc{posttraining_robotics_2026,
  title        = {Post-Training for Robotics Foundation Models Dataset and Challenge},
  author       = {Zhang, Shiduo and Wang, Yue and Chang, Haonan and Zhao, Hang and
                  Liu, Yicheng and Guizilini, Vitor and Bobu, Andreea and Wagenmaker, Andrew and
                  Dixit, Anushri and Yu, Chao and Shah, Dhruv and Simchowitz, Max},
  year         = {2026},
  howpublished = {RSS 2026 Workshop & Challenge},
  url          = {https://posttraining-for-robotics.github.io}
}
```
For any questions about the workshop or the challenge, please reach out to:
Shiduo Zhang · sdzhang23@m.fudan.edu.cn