ABSTRACT

We release Qwen2, a comprehensive suite of foundational and instruction-tuned language models spanning 0.5 to 72 billion parameters, including both dense models and a Mixture-of-Experts model.
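For reference, the released checkpoints can be loaded with the Hugging Face transformers library. The sketch below is illustrative; the repository id "Qwen/Qwen2-7B" is an assumed example, and any released size (0.5B to 72B dense, or the Mixture-of-Experts variant) could be substituted.

```python
# Minimal sketch: loading a released Qwen2 checkpoint with transformers.
# The repository id "Qwen/Qwen2-7B" is an assumed example, not prescribed
# by this report; substitute any released model size.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-7B")

inputs = tokenizer("Hello, Qwen2!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```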

TOKENIZER & MODEL

TOKENIZER

MODEL ARCHITECTURE

QWEN2 DENSE MODEL

QWEN2 MIXTURE-OF-EXPERTS MODEL

PRE-TRAINING

In the pre-training of Qwen2, we focused on refining the dataset and on investigating effective methods for handling extended context lengths.
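As one illustration of context-length extension, a common technique is to enlarge the base frequency of rotary position embeddings (RoPE) so that positional phases rotate more slowly and distant positions remain distinguishable. The sketch below shows that technique in general form; the head dimension, context length, and base values (10,000 and 1,000,000) are assumptions for illustration, not the report's specific recipe.

```python
# Minimal sketch: RoPE base-frequency scaling, a common approach to
# extending usable context length. All numeric values are illustrative
# assumptions, not parameters taken from this report.
import numpy as np

def rope_frequencies(head_dim: int, base: float) -> np.ndarray:
    """Per-pair inverse frequencies: base^(-2i/d) for i = 0, 1, ..., d/2 - 1."""
    return 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

def rope_angles(positions: np.ndarray, head_dim: int, base: float) -> np.ndarray:
    """Rotation angles theta[p, i] = p * base^(-2i/d) for each position p."""
    return np.outer(positions, rope_frequencies(head_dim, base))

positions = np.arange(32_768)                            # a long-context range
short_base = rope_angles(positions, 128, 10_000.0)       # conventional base
long_base = rope_angles(positions, 128, 1_000_000.0)     # enlarged base

# With the larger base, the slowest-rotating dimension completes far fewer
# full rotations over the same range, so distant positions stay separable.
print(short_base[-1, -1] / (2 * np.pi))   # ~3.8 full rotations
print(long_base[-1, -1] / (2 * np.pi))    # ~0.04 full rotations
```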

PRE-TRAINING DATA