The challenge is structured in two rounds. In Phase 1, all teams are evaluated on a standardized real-robot bimanual manipulation benchmark using a shared dataset. In Phase 2, the top teams are selected and supported in iterative rollout collection to further improve their policies via post-training.
Phase 1: Evaluation Tracks

Phase 1 evaluates submitted policies on 3–5 standardized real-robot tasks across two ranking tracks. Teams perform offline training on the released dataset (expert data + baseline success and failure rollouts) and submit policies for benchmark evaluation.
Single-Task track: Policies are evaluated and ranked on each task independently. Best for specialists optimizing for individual task performance.
Multi-Task track: A single policy is evaluated jointly across all tasks. Best for generalist policies that share representations across skills.
Phase 1 includes three real-robot bimanual manipulation tasks: insert-mouse-battery, tower-of-hanoi-game, and seal-water-bottle-cap.

For Phase 1, we release a standardized real-robot bimanual dataset designed for offline training. The dataset contains four complementary components:
High-quality human teleoperation demonstrations on the benchmark tasks.
Trajectories where a baseline policy successfully completed the task.
Trajectories where the baseline policy failed — useful negative signal for post-training.
Human interventions, corrections, and preference labels collected during baseline rollouts.
All four components are intended for offline training in Phase 1. Hosted on Hugging Face Datasets.
Step-by-step instructions for joining the challenge, including environment setup, dataset access, baseline reproduction, evaluation protocol, and submission format, are hosted in our GitHub repositories. Reference code and starter scripts are provided so teams can get up and running quickly.
Based on Phase 1 results, the top 3 teams advance to Phase 2. During this round, selected teams collect rollouts by deploying their own policies and use those rollouts to further improve performance via post-training.
Listed prizes apply to each track (Single-Task & Multi-Task).
| # | Team / Model | Trials | insert-mouse-battery Score | insert-mouse-battery SR | tower-of-hanoi-game Score | tower-of-hanoi-game SR | seal-water-bottle-cap Score | seal-water-bottle-cap SR | Average Score | Average SR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Zhangyu · Huazhong | 5 / 5 / 5 | 100 | 100% | 66 | 60% | 56 | 40% | 74 | 66.7% |
| 2 | Haruki · U-Tokyo | 5 / 10 / 10 | 80 | 80% | 0 | 0% | 90 | 90% | 56.7 | 56.7% |
| 3 | VLAlab-JP | 5 / 10 / 10 | 100 | 100% | 40 | 40% | 30 | 30% | 56.7 | 56.7% |
| 4 | PengfangQian · SII | 10 / 10 / 10 | 90 | 90% | 33 | 30% | 41.1 | 30% | 54.7 | 50% |
| 5 | Yitong · SII | 5 / 10 / 5 | 80 | 80% | 9 | 0% | 60 | 60% | 49.7 | 46.7% |
| 6 | RuikaiShi · HKUST | 5 / 5 / 5 | 80 | 80% | 0 | 0% | 40 | 40% | 40 | 40% |
| 7 | TongJiang · cityu | 5 / 5 / 5 | 93.3 | 40% | 14 | 0% | 70 | 60% | 59.1 | 33.3% |
| 8 | ShijieGeng · Drexel | 5 / 5 / 5 | 52 | 20% | 7.5 | 0% | 60 | 60% | 39.8 | 26.7% |
| 9 | HoKyunIm · Yonsei | 5 / 5 / 5 | 60 | 60% | 0 | 0% | 42 | 20% | 34 | 26.7% |
| 10 | JiabingYang · CASIA | 5 / 5 / 5 | 72 | 40% | 12 | 0% | 40 | 20% | 41.3 | 20% |
| 11 | QianYe · jingshuo | 5 / — / 5 | 52 | 20% | — | — | 40 | 40% | 30.7 | 20% |
| 12 | Guanqi · HKU | 5 / 5 / 5 | 44 | 20% | 6 | 0% | 40 | 20% | 30 | 13.3% |
| 13 | Jingyu · SYSU | 5 / 5 / 5 | 58 | 40% | 0 | 0% | 0 | 0% | 19.3 | 13.3% |
| 14 | Xingxin · HKUST | 5 / 5 / 5 | 40 | 40% | 18 | 0% | 0 | 0% | 19.3 | 13.3% |
| 15 | Qichang · SYSU | 5 / 5 / 5 | 16 | 0% | 0 | 0% | 0 | 0% | 5.3 | 0% |
| 16 | Yangxin · SYSU | 5 / 5 / 5 | 16 | 0% | 0 | 0% | 0 | 0% | 5.3 | 0% |
| 17 | SiyangZheng · SYSU | 5 / — / — | 8 | 0% | — | — | — | — | 2.7 | 0% |
| 18 | Benson · Tongji | — / 5 / — | — | — | 6 | 0% | — | — | 2 | 0% |
| 19 | Zhihaozhan · SYSU | 5 / 5 / 5 | 0 | 0% | 0 | 0% | 0 | 0% | 0 | 0% |

| # | Team / Model | Trials | insert-mouse-battery Score | insert-mouse-battery SR | tower-of-hanoi-game Score | tower-of-hanoi-game SR | seal-water-bottle-cap Score | seal-water-bottle-cap SR | Average Score | Average SR |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Haruki · U-Tokyo | 5 / 5 / 5 | 80 | 80% | 66 | 60% | 60 | 60% | 68.7 | 66.7% |
| 2 | ZechengZhu · ICL | 10 / 10 / 10 | 80 | 80% | 52 | 40% | 55 | 50% | 62.3 | 56.7% |
| 3 | PengfangQian · SII | 10 / 10 / 10 | 96 | 80% | 23 | 20% | 44.4 | 40% | 54.5 | 46.7% |
| 4 | VLAlab-JP | 10 / 10 / 10 | 98 | 90% | 30 | 30% | 35 | 20% | 54.3 | 46.7% |
| 5 | TongJiang · cityu | 10 / 10 / 10 | 75.6 | 60% | 10 | 10% | 75 | 70% | 53.5 | 46.7% |
| 6 | Zhangyu · Huazhong | 10 / 10 / 10 | 100 | 90% | 13 | 10% | 35 | 30% | 49.3 | 43.3% |
| 7 | HoKyunIm · Yonsei | 10 / 10 / 10 | 68 | 60% | 10 | 10% | 27 | 20% | 35 | 30% |
| 8 | QianYe · jingshuo | 5 / 5 / 2 | 80 | 80% | 12 | 0% | 25 | 0% | 39 | 26.7% |
| 9 | ZeyuPing · SYSU | 5 / 5 / 5 | 60 | 60% | 0 | 0% | 30 | 20% | 30 | 26.7% |
| 10 | JiabingYang · CASIA | 5 / 5 / — | 60 | 60% | 20 | 20% | — | — | 26.7 | 26.7% |
| 11 | Yitong · SII | — / 5 / 3 | — | — | 0 | 0% | 83.3 | 67% | 27.8 | 22.2% |
| 12 | HanZhao · Westlake | 5 / 5 / 5 | 60 | 60% | 12 | 0% | 32.5 | 0% | 34.8 | 20% |
| 13 | ShijieGeng · Drexel | 5 / 5 / — | 76 | 60% | 0 | 0% | — | — | 25.3 | 20% |
| 14 | Guanqi · HKU | 4 / 5 / 5 | 70 | 50% | 0 | 0% | 22 | 0% | 30.7 | 16.7% |
| 15 | Xingxin · HKUST | 5 / 5 / — | 40 | 40% | 0 | 0% | — | — | 13.3 | 13.3% |
| 16 | Zhihaozhan · SYSU | 5 / 5 / — | 16 | 0% | 38 | 20% | — | — | 18 | 6.7% |
| 17 | Qichang · SYSU | 5 / 5 / — | 24 | 0% | 0 | 0% | — | — | 8 | 0% |
| 18 | TianxingShi · Tongji | 5 / 5 / 5 | 8 | 0% | 0 | 0% | 0 | 0% | 2.7 | 0% |
| 19 | HaotongChen · Tongji | 5 / — / 5 | 0 | 0% | — | — | 0 | 0% | 0 | 0% |
| 20 | Jingyu · SYSU | — / 3 / — | — | — | 0 | 0% | — | — | 0 | 0% |

Score: progress score per task (higher is better). SR: success rate over rollouts. trials: number of evaluation rollouts per task, listed in column order insert-mouse-battery / tower-of-hanoi-game / seal-water-bottle-cap. Ranking is by Average SR first, then Average Score as a tie-breaker. In both tracks the Average is computed over all three tasks — any task a team did not evaluate (—) counts as 0. Runs with only a single rollout are excluded.
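
The averaging and ranking rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the organizers' evaluation code: the function names `average_over_tasks` and `rank_teams` and the dictionary layout are assumptions made for this example, and the sample numbers are copied from a handful of leaderboard rows. An unevaluated task (—) is represented as `None` and counted as 0.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Task order matches the leaderboard columns.
TASKS = ("insert-mouse-battery", "tower-of-hanoi-game", "seal-water-bottle-cap")

def average_over_tasks(values: Sequence[Optional[float]]) -> float:
    """Mean over all three tasks; an unevaluated task (None) counts as 0."""
    return sum(v or 0.0 for v in values) / len(TASKS)

def rank_teams(
    results: Dict[str, Tuple[Sequence[Optional[float]], Sequence[Optional[float]]]]
) -> List[Tuple[str, float, float]]:
    """Return (team, avg_score, avg_sr) rows sorted by Average SR, then Average Score."""
    rows = [
        (team, average_over_tasks(scores), average_over_tasks(srs))
        for team, (scores, srs) in results.items()
    ]
    # Higher is better for both keys, so sort on the negated values.
    return sorted(rows, key=lambda r: (-r[2], -r[1]))

# Per-task (scores, success rates in %) for a few rows of the first table.
EXAMPLE = {
    "Jingyu": ([58, 0, 0], [40, 0, 0]),
    "QianYe": ([52, None, 40], [20, None, 40]),
    "TongJiang": ([93.3, 14, 70], [40, 0, 60]),
    "Guanqi": ([44, 6, 40], [20, 0, 20]),
    "RuikaiShi": ([80, 0, 40], [80, 0, 40]),
}

if __name__ == "__main__":
    for team, avg_score, avg_sr in rank_teams(EXAMPLE):
        print(f"{team:10s} score={avg_score:.1f} sr={avg_sr:.1f}%")
```

In this sample, TongJiang has a higher Average Score than RuikaiShi but a lower Average SR, so it ranks below; Guanqi and Jingyu tie on Average SR and are separated by Average Score.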
Final results. These are the final-phase evaluation results. Each policy was evaluated with up to 5 rollouts per task under a unified setting.
Registration

Team registration is now open. Please fill in the Google Form below with your team information and track preference. Submission instructions will be shared with registered teams.
Register your team

| Time | Event |
|---|---|
| 9:00 - 10:30 | Invited Talks: Karl Pertsch · Abhishek Gupta · Jianlan Luo · more speakers TBA |
| 10:30 - 11:00 | Coffee Break & Participating Teams' Policy Deployment Demonstrations |
| 11:00 - 12:00 | Participating Team Reports |
| 12:00 - 13:00 | Panel Discussion & Audience Q&A |
Discussion topics for this workshop include, but are not limited to:
If you use this dataset or reference the challenge in your research, please cite us:
@misc{posttraining_robotics_2026,
  title = {Post-Training for Robotics Foundation Models Dataset and Challenge},
  author = {Zhang, Shiduo and Wang, Yue and Chang, Haonan and Zhao, Hang and
            Liu, Yicheng and Guizilini, Vitor and Bobu, Andreea and Wagenmaker, Andrew and
            Dixit, Anushri and Yu, Chao and Shah, Dhruv and Simchowitz, Max},
  year = {2026},
  howpublished = {RSS 2026 Workshop \& Challenge},
  url = {https://posttraining-for-robotics.github.io}
}
For any questions about the workshop or the challenge, please reach out to:
Shiduo Zhang · sdzhang23@m.fudan.edu.cn