RSS 2026 Workshop & Challenge

Post-training for Robotics Foundation Models

From Pretrained Policies to Real-World Mastery.
University of Technology Sydney · July 13, 2026

About
Post-training is a crucial stage in robotics foundation model research: it enables robots to go beyond large-scale pretraining on expert demonstrations and further improve performance through their own interaction with the environment. By collecting interaction rollouts, distinguishing good and bad behaviors, and iteratively optimizing policies, post-training can substantially enhance precision, failure recovery, and generalization—bringing robotics foundation models closer to reliable real-world deployment.

This challenge and workshop will focus on the foundational questions and key challenges in the post-training stage of robotics foundation models. The event consists of two tightly coupled components: (1) a pre-workshop challenge and (2) an in-person workshop at RSS. The workshop program will include four parts: the pre-workshop challenge, invited speaker talks, team presentations, and a panel discussion.

We expect the workshop to (i) establish shared problem definitions and evaluation protocols for robotics foundation model post-training, (ii) surface empirical lessons and failure modes from real-robot rollouts, and (iii) catalyze reusable resources (e.g., benchmarking protocol, reports, and baselines) that enable more reproducible and scalable research in this area.

Challenge

The challenge is structured in two phases. In Phase 1, all teams are evaluated on a standardized real-robot bimanual manipulation benchmark using a shared dataset. In Phase 2, the top teams are selected and supported in iterative rollout collection to further improve their policies via post-training.

Phase 1 — Evaluation Tracks

Phase 1 evaluates submitted policies on 3–5 standardized real-robot tasks across two ranking tracks. Teams perform offline training on the released dataset (expert data + baseline success and failure rollouts) and submit policies for benchmark evaluation.

Single-Task Track

Policies are evaluated and ranked on each task independently. Best for specialists optimizing for individual task performance.

Multi-Task Track

A single policy is evaluated jointly across all tasks. Best for generalist policies that share representations across skills.

Task Samples & Scoring

Phase 1 includes three real-robot bimanual manipulation tasks. Select a task below to view a sample rollout video and its scoring criteria.

Scoring Criteria · insert-mouse-battery
  • Mouse placed at the center of the table with the battery slot facing right. +0.4
  • Battery inserted with the correct polarity but not yet fully seated. +0.4
  • Battery fully inserted. +0.2
Steps are cumulative; completing the final step yields a total score of 1.0.
Sample rollout camera views: left wrist · top · right wrist.
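The cumulative scoring rubric can be sketched in a few lines of Python. The step identifiers, function name, and rollout representation below are illustrative assumptions, not the official evaluation code; only the weights come from the criteria above.

```python
# Sketch of the cumulative progress score for insert-mouse-battery.
# Step names are hypothetical labels; the +0.4 / +0.4 / +0.2 weights
# follow the published scoring criteria.

STEP_WEIGHTS = [
    ("mouse_placed_slot_right", 0.4),    # mouse centered, battery slot facing right
    ("battery_partially_inserted", 0.4),  # correct polarity, not fully seated
    ("battery_fully_inserted", 0.2),
]

def progress_score(completed_steps):
    """Sum the weights of completed steps. Steps are cumulative, so a
    rollout that reaches the final step earns the full 1.0."""
    return round(sum(w for name, w in STEP_WEIGHTS if name in completed_steps), 2)

print(progress_score({"mouse_placed_slot_right"}))  # 0.4
print(progress_score({"mouse_placed_slot_right",
                      "battery_partially_inserted",
                      "battery_fully_inserted"}))   # 1.0
```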
Dataset

For Phase 1, we release a standardized real-robot bimanual dataset designed for offline training. The dataset contains four complementary components:

Standardized Real-Robot Bimanual Dataset
Expert Data

High-quality human teleoperation demonstrations on the benchmark tasks.

Baseline Success Rollouts

Trajectories where a baseline policy successfully completed the task.

Baseline Failure Rollouts

Trajectories where the baseline policy failed — useful negative signal for post-training.

Human-in-the-Loop Data

Human interventions, corrections, and preference labels collected during baseline rollouts.

All four components are intended for offline training in Phase 1. Hosted on Hugging Face Datasets.
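One way the four components might be combined for offline training is to treat expert and successful rollouts as positive examples while retaining failures as negative signal. This is a minimal sketch under assumed record and label names; the released dataset's actual schema is documented on Hugging Face.

```python
# Hypothetical split of the dataset components for offline post-training.
# The "source" labels and reward field are assumptions for illustration,
# not the released dataset's actual schema.

rollouts = [
    {"source": "expert",            "reward": 1.0},
    {"source": "baseline_success",  "reward": 1.0},
    {"source": "baseline_failure",  "reward": 0.0},
    {"source": "human_in_the_loop", "reward": 1.0},
]

# Positive examples for imitation-style objectives...
positives = [r for r in rollouts if r["reward"] > 0]
# ...and failures kept as negative signal (e.g. for value learning or filtering).
negatives = [r for r in rollouts if r["reward"] == 0]

print(len(positives), len(negatives))  # 3 1
```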

Download on Hugging Face
Tutorial
How to Participate

Step-by-step instructions for joining the challenge — including environment setup, dataset access, baseline reproduction, evaluation protocol, and submission format — are hosted in our GitHub repositories. Reference code and starter scripts are provided so teams can get up and running quickly.

Phase 2 — Iterative Improvement

Based on Phase 1 results, the top 3 teams advance to Phase 2. During this round, selected teams collect rollouts by deploying their own policies and use those rollouts to further improve performance via post-training.

  • Top 3 teams selected from Phase 1 ranking
  • Iterative rollout collection supported by the organizers
  • Final policies presented at the workshop @ RSS 2026
Schedule
Key Dates
Now → May 31, 2026
Team registration open
Now → June 10, 2026
Phase 1 evaluation window
Early June, 2026
Top 3 teams announced
June 10 – 30, 2026
Phase 2 iterative improvement
July, 2026
Workshop @ RSS — team reports & panel
Phases
  • Registration — team sign-up and track selection
  • Phase 1: Ranking — single-task & multi-task evaluation
  • Phase 2: Iterative Improvement — top 3 teams
  • Workshop @ RSS — team presentations and panel discussion
Awards
💰 $20,000+ Total Prize Pool
🥇
1st Place
$5,000
+ Bimanual YAM Robot
🥈
2nd Place
$3,000
🥉
3rd Place
$2,000
🏅
4th – 10th
$500 each
Top-performing teams will also be invited to give a presentation at the workshop.
Practical issues, lessons, and experiences from the challenge will be compiled into a technical report shared with the community.
Leaderboard
Phase 1 evaluation results across both tracks. The pi05 baseline is provided as a reference; team submissions will populate the table during the evaluation window.
#   Team / Model       insert-mouse-battery   tower-of-hanoi-game   seal-water-bottle-cap   Average
                       Score    SR            Score    SR           Score    SR              Score    SR
--  pi05 (baseline)    87       80%           47       45%          47       35%             60.3     53.3%

Score: progress score per task (higher is better). SR: success rate over rollouts. The final ranking uses the Average column.
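The Average column used for the final ranking can be reproduced by averaging the per-task scores and success rates. The snippet below uses the pi05 baseline numbers from the leaderboard; the data structure is an assumption for illustration.

```python
# Reproducing the Average column from per-task results (pi05 baseline row).
# The dict layout is illustrative; the final ranking sorts teams by this average.

baseline = {
    "insert-mouse-battery":  {"score": 87.0, "sr": 0.80},
    "tower-of-hanoi-game":   {"score": 47.0, "sr": 0.45},
    "seal-water-bottle-cap": {"score": 47.0, "sr": 0.35},
}

def average_row(per_task):
    """Average progress score and success rate across tasks."""
    n = len(per_task)
    avg_score = round(sum(t["score"] for t in per_task.values()) / n, 1)
    avg_sr = round(sum(t["sr"] for t in per_task.values()) / n, 3)
    return avg_score, avg_sr

print(average_row(baseline))  # (60.3, 0.533)
```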

Registration

Team registration is now open. Please fill in the Google Form below with your team information and track preference. Submission instructions will be shared with registered teams.

Register your team

Invited Speakers

Karl Pertsch

Physical Intelligence

Abhishek Gupta

University of Washington

More Speakers

To be announced


Schedule

Time Event
9:00 - 10:30 Invited Talks
Karl Pertsch · Abhishek Gupta · more speakers TBA
10:30 - 11:00 Coffee Break & Participating Teams' Policy Deployment Demonstrations
11:00 - 12:00 Participating Team Reports
12:00 - 13:00 Panel Discussion & Audience Q&A

Discussion Topics

Discussion topics for this workshop include, but are not limited to:

Post-Training Paradigms for Robotics Foundation Models
Efficient Post-Training System Design
Q-function and Value Function Learning
Human-in-the-Loop and Feedback-Driven Methods
Learning from Human Videos
Continual Post-Training and Catastrophic Forgetting
Real-Robot Evaluation and Deployment Constraints
Robustness and Generalization under Distribution Shift
Safety and Alignment in Robotics Post-Training

Organizers

Andrew Wagenmaker

UC Berkeley

Dhruv Shah

Princeton University

Anushri Dixit

University of California, Los Angeles

Hang Zhao

Tsinghua University

Max Simchowitz

Carnegie Mellon University

Shiduo Zhang

Fudan University

Chao Yu

Tsinghua University

Vitor Guizilini

Toyota Research Institute

Yicheng Liu

Tsinghua University

Yue Wang

University of Southern California

Haonan Chang

WorldEngine


Industry Partners



Contact

For any questions about the workshop or the challenge, please reach out to:

Shiduo Zhang · sdzhang23@m.fudan.edu.cn