Research Question:

Approach:

Details of GenRMs:

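GenRM represents solution correctness using the LLM’s probability distribution over tokens (e.g., the probability it assigns to a "Yes" token when asked whether a solution is correct), instead of predicting a separate numerical score from a dedicated reward head.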
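The snippet below is a minimal sketch of the idea above. It assumes the verifier has been given a question, a candidate solution, and a Yes/No verification prompt, and that the verification score is the softmax probability of the "Yes" token at the next position. The toy vocabulary, the `YES_ID` index, and the hand-picked logits are hypothetical stand-ins for a real LLM forward pass. The helper `best_of_n` shows how such scores could rank candidate solutions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of next-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def genrm_score(next_token_logits, yes_id):
    """Correctness score = probability of the 'Yes' token as the next token.

    No extra scalar head is used: the score is read directly off the
    model's ordinary next-token distribution.
    """
    return softmax(next_token_logits)[yes_id]

def best_of_n(candidate_logits, yes_id):
    """Return the index of the candidate solution with the highest score."""
    scores = [genrm_score(logits, yes_id) for logits in candidate_logits]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy vocabulary: index 0 = "Yes", 1 = "No", 2 = any other token.
YES_ID = 0
logits_good = [2.0, 0.0, -1.0]  # verifier leans towards "Yes"
logits_bad = [0.0, 2.0, -1.0]   # verifier leans towards "No"

score_good = genrm_score(logits_good, YES_ID)  # ≈ 0.844
score_bad = genrm_score(logits_bad, YES_ID)    # ≈ 0.114
chosen = best_of_n([logits_bad, logits_good], YES_ID)  # 1
```

Because the score is just a token probability, the same model and decoding machinery used for generation can also serve as the verifier.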

Data:


Models & Training:

Experiments:

Setup: