Video: Pre-Trained Policy Discriminators are General Reward Models (Jul 2025)

Video ▶ Watch on YouTube

Video by AI Paper Slop