Research Question

Motivated by several key limitations of current LLM reasoning, we propose a new LLM reasoning framework, Reasoning via Planning (RAP). RAP repurposes the LLM as both a world model and a reasoning agent, and incorporates a principled planning algorithm based on Monte Carlo Tree Search (MCTS) for strategic exploration in the vast reasoning space.

Approach


Language Model as World Model
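
In RAP, the LLM serves as the world model that predicts the next reasoning state given the current state and a candidate action. A minimal sketch of this idea follows; the prompt wording and the `generate` callable (any LLM text-completion function) are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch: an LLM used as a world model that predicts the next state.
# `generate` stands in for any LLM text-completion call; the prompt template
# and function name are illustrative assumptions, not the paper's prompts.

def predict_next_state(generate, state: str, action: str) -> str:
    """Predict the next reasoning state s_{t+1} from (s_t, a_t) with the LLM."""
    prompt = (
        "You are simulating the outcome of a reasoning step.\n"
        f"Current state:\n{state}\n\n"
        f"Action taken:\n{action}\n\n"
        "Describe the resulting new state:"
    )
    return generate(prompt).strip()
```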

Reward Design

The assessment of each reasoning step (i.e., applying an action $a_t$ to the state $s_t$) is performed by a reward function $r_t = r(s_t, a_t) \in \mathbb{R}$. Here we introduce several common rewards that are applicable to different tasks and shown to be effective in our experiments.
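
As a hedged example, one common pattern is to combine the likelihood the LLM assigns to the chosen action with an LLM self-evaluation of the step's correctness; the helper below and its weighting are assumptions for illustration, not the paper's exact reward definitions.

```python
import math

def step_reward(action_logprob: float, self_eval_prob: float, alpha: float = 0.5) -> float:
    """Illustrative reward r(s_t, a_t) combining two common signals.

    action_logprob: log-probability the LLM assigned to action a_t given state s_t.
    self_eval_prob: probability the LLM answers "yes" to a self-evaluation prompt
        such as "Is this reasoning step correct?".
    alpha: interpolation weight; 0.5 is an arbitrary illustrative default.
    """
    action_score = math.exp(action_logprob)          # map log-prob back to (0, 1]
    return alpha * action_score + (1.0 - alpha) * self_eval_prob
```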

Planning with Monte Carlo Tree Search
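
To make the planning loop concrete, here is a generic MCTS sketch (UCT selection, expansion, rollout, back-propagation) over reasoning states. The `propose_actions`, `transition`, `reward`, and `is_terminal` callables are hypothetical stand-ins for the LLM reasoning agent, the LLM world model, the reward function above, and a task-specific termination check; the hyperparameters are arbitrary.

```python
import math
import random

class Node:
    """A node in the search tree: one reasoning state reached by one action."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action          # the action that produced this state
        self.children = []
        self.visits = 0
        self.value = 0.0              # sum of rewards backed up through this node

def uct(child, parent_visits, c=1.4):
    """Upper Confidence bound for Trees: balance exploitation and exploration."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, propose_actions, transition, reward, is_terminal,
         n_iters=100, max_depth=6):
    """Pick the next reasoning action from root_state via Monte Carlo Tree Search."""
    root = Node(root_state)
    for _ in range(n_iters):
        # 1) Selection: walk down via UCT until reaching an unexpanded node.
        node, total_reward, depth = root, 0.0, 0
        while node.children and depth < max_depth:
            node = max(node.children, key=lambda ch: uct(ch, max(node.visits, 1)))
            total_reward += reward(node.parent.state, node.action)
            depth += 1
        # 2) Expansion: create children from candidate actions at this node.
        if not is_terminal(node.state) and depth < max_depth:
            for a in propose_actions(node.state):
                node.children.append(Node(transition(node.state, a), parent=node, action=a))
            if node.children:
                node = random.choice(node.children)
                total_reward += reward(node.parent.state, node.action)
                depth += 1
        # 3) Simulation: random rollout until a terminal state or the depth limit.
        state = node.state
        while not is_terminal(state) and depth < max_depth:
            actions = propose_actions(state)
            if not actions:
                break
            a = random.choice(actions)
            total_reward += reward(state, a)
            state = transition(state, a)
            depth += 1
        # 4) Back-propagation: update statistics along the selected path.
        while node is not None:
            node.visits += 1
            node.value += total_reward
            node = node.parent
    # Choose the most-visited action at the root as the next reasoning step.
    best = max(root.children, key=lambda ch: ch.visits) if root.children else None
    return best.action if best else None
```

Returning the most-visited root action is one standard MCTS choice; a RAP-style planner can also extract a complete high-reward reasoning path from the finished tree.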