Research Question

Pretraining

We began with the pretraining approach described in the LLaMA paper, using an optimized auto-regressive transformer, but made several changes to improve performance: more robust data cleaning, an updated data mix, training on 40% more total tokens, a doubled context length, and grouped-query attention (GQA) to improve inference scalability for the larger models.
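
To make the "optimized auto-regressive transformer" concrete, here is a minimal sketch of the standard next-token prediction objective such a model is trained on, assuming a PyTorch-style decoder-only model. The function name, the model interface, and the tensor shapes are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Causal language-modeling loss.

    token_ids: LongTensor of shape (batch, seq_len).
    model: assumed to map (batch, T) token ids -> (batch, T, vocab_size) logits.
    """
    inputs = token_ids[:, :-1]    # positions 0 .. T-2 are the context
    targets = token_ids[:, 1:]    # each position predicts the following token
    logits = model(inputs)        # (batch, T-1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )
```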

Pretraining Data

Our training corpus includes a new mix of data from publicly available sources, which does not include data from Meta’s products or services.

Training Details

Architecture

We adopt most of the pretraining settings and model architecture from Llama 1: a standard decoder-only transformer with pre-normalization using RMSNorm, the SwiGLU activation function, and rotary positional embeddings (RoPE). The main architectural changes relative to Llama 1 are the longer context length and grouped-query attention.
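
As a concrete illustration of the pre-normalization carried over from Llama 1, below is a minimal RMSNorm sketch in PyTorch. The class and its default epsilon are written for illustration and are not the official implementation.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer normalization (no mean subtraction, no bias)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale each feature vector by the inverse RMS of its components,
        # then apply the learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

In the Llama-style block, this normalization is applied to the input of each attention and feed-forward sub-layer (pre-normalization) rather than to its output, which is generally credited with improving training stability for deep transformers.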