Research Question

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.

Approach

DeepSeek-R1-Zero

DeepSeek-R1

DeepSeek-R1 incorporates a small amount of cold-start data and a four-stage training pipeline. The pipeline comprises two reinforcement learning (RL) stages, aimed at discovering improved reasoning patterns and aligning with human preferences, and two supervised fine-tuning (SFT) stages that seed the model's reasoning and non-reasoning capabilities.
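
The stage structure described above can be written down as a small data structure. This is only an illustrative sketch, not the authors' training code: the names `Stage`, `PIPELINE`, `stage_counts`, and `describe` are hypothetical, and the alternating SFT/RL ordering (with the cold-start data used in the first SFT stage) is an assumption that the text above does not state explicitly.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical helper type for illustration)."""
    kind: str     # "SFT" or "RL"
    purpose: str  # what the stage is meant to achieve


# ASSUMPTION: stages alternate SFT -> RL -> SFT -> RL, and the cold-start
# data is consumed by the first SFT stage. Only the stage counts and the
# stated purposes come from the text.
PIPELINE = [
    Stage("SFT", "seed reasoning and non-reasoning capabilities (cold-start data)"),
    Stage("RL", "discover improved reasoning patterns"),
    Stage("SFT", "seed reasoning and non-reasoning capabilities"),
    Stage("RL", "align with human preferences"),
]


def stage_counts(pipeline):
    """Count how many stages of each kind the pipeline contains."""
    return Counter(stage.kind for stage in pipeline)


def describe(pipeline):
    """Return one human-readable line per stage, numbered from 1."""
    return [f"Stage {i}: {s.kind} - {s.purpose}" for i, s in enumerate(pipeline, 1)]


if __name__ == "__main__":
    for line in describe(PIPELINE):
        print(line)
```

Keeping the stages as plain data makes it easy to check the summary's claim (two RL stages, two SFT stages) against whatever ordering one assumes.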