Submitted by JackShu
SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning
BytedanceDouyinContent