Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting
Abstract
Posterior-Transition Reweighting (PTR) improves offline robot policy adaptation by dynamically weighting training samples based on the attribution of their post-action consequences, enabling more conservative and effective learning from heterogeneous datasets.
Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to all samples and can therefore average over conflicting or low-attribution data. We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a candidate pool of mismatched targets, and uses a separate transition scorer to estimate a softmax identification posterior over target indices. The posterior-to-uniform ratio defines the PTR score, which is converted into a clipped-and-mixed weight and applied to the original action objective through self-normalized weighted regression. This construction requires no tractable policy likelihood and is compatible with both diffusion and flow-matching action heads. Rather than uniformly trusting all recorded supervision, PTR reallocates credit according to how attributable each sample's post-action consequence is under the current representation, improving conservative offline adaptation to heterogeneous robot data.
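The weighting pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ptr_weights` and `self_normalized_loss`, the convention that the true target sits at index 0 of the candidate pool, the clip range, and the mixing coefficient `mix` are all hypothetical choices made for the example. The abstract specifies only that the score is the posterior-to-uniform ratio, that it is clipped and mixed, and that the resulting weights are applied through self-normalized weighted regression.

```python
import numpy as np


def ptr_weights(logits, true_index=0, clip_min=0.5, clip_max=2.0, mix=0.5):
    """Turn transition-scorer logits into per-sample weights.

    logits: array of shape (batch, pool_size); each row scores every
        candidate target in the pool for one training sample.
    true_index, clip_min, clip_max, mix: assumed values for illustration.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the candidate pool.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Posterior-to-uniform ratio: p(true) / (1 / pool_size).
    pool_size = logits.shape[1]
    ratio = probs[:, true_index] * pool_size
    # Clip the ratio, then mix it with a uniform weight of 1 (assumed form).
    clipped = np.clip(ratio, clip_min, clip_max)
    return (1.0 - mix) + mix * clipped


def self_normalized_loss(per_sample_loss, weights):
    """Weighted mean of per-sample losses, normalized by the weight sum."""
    per_sample_loss = np.asarray(per_sample_loss, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return float(np.sum(weights * per_sample_loss) / np.sum(weights))


# Three samples, pool of four candidates (true target at index 0).
logits = np.array([
    [0.0, 0.0, 0.0, 0.0],    # scorer is indifferent: ratio is 1
    [10.0, 0.0, 0.0, 0.0],   # true target clearly identified
    [-10.0, 0.0, 0.0, 0.0],  # true target clearly not identified
])
weights = ptr_weights(logits)
loss = self_normalized_loss([0.2, 0.1, 0.9], weights)
```

In this sketch a sample whose consequence the scorer cannot distinguish from the mismatched candidates keeps a weight of 1, so the update reduces to uniform post-training when the scorer is uninformative. Because the weights multiply a per-sample loss rather than a likelihood, the same code applies to any action objective that yields one loss value per sample, which is consistent with the abstract's claim that no tractable policy likelihood is needed.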
Community
We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. PTR is particularly well suited to complex, heterogeneous robot data of varying quality.
arxiv: https://arxiv.org/abs/2603.16542
blog: https://research.beingbeyond.com/ptr