TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward • Paper 2603.07700 • Published about 1 month ago