Pinned
We found that training (multi-agent) policies to win a race with sparse competitive rewards leads to agile policies with strategic behaviors. These policies outperform the ones trained with behavioral-prescribing dense rewards and transfer better to the real world.
00:00










