RL never went away.
Reading the DeepSeek (DS) paper:
2501.12948
While they aim to explore innovative solutions, they are by necessity following in the footsteps of OpenAI. They started with pure RL, but the results had problems (poor readability, language mixing), so they added human-curated data as a cold start. ChatGPT went all the way with this approach (RLHF, RL from Human Feedback) to make the chat more "human-like". There are many models out there similar or superior to DS, Chinese or not (I'm sure China has better models behind the digital wall). The only difference is that DS is marketed outside of China and as a research innovation.