Research Question
- Language Models can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role.
- We introduce “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving.
Approach
Notations
- General Notations
- Pre-trained Language Model: $p_\theta$
- Language Sequence:
- Represented using lowercase letters: $x, y, z, s, \dots$
- A sequence is defined as $x = (x[1], \dots, x[n])$
- Probability of a Language Sequence: $p_\theta(x) = \prod_{i=1}^n p_\theta(x[i] \mid x[1 \dots i-1])$
- Collection of Language Sequences: represented using uppercase letters $S, \dots$
- Input-Output (IO) Prompting
- Output Sampling: $y \sim p_\theta(y \mid \text{prompt}_{IO}(x))$
- Prompting Mechanism: $\text{prompt}_{IO}(x)$: Wraps input with task-specific instructions and/or few-shot input-output examples.
- Simplified Notation: $p_\theta^{\text{prompt}}(\text{output} \mid \text{input}) = p_\theta(\text{output} \mid \text{prompt}(\text{input})).$
- Final Formulation: IO prompting is expressed as $y \sim p_\theta^{IO}(y \mid x)$ (see the sketch below).
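As a point of reference, here is a minimal sketch of the IO baseline in code. `sample_lm` is a hypothetical stand-in for drawing completions from the pre-trained LM $p_\theta$ (e.g. an API call), and the prompt wording is illustrative, not the paper's.

```python
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def prompt_io(x: str) -> str:
    # prompt_IO(x): wrap the raw input with task instructions; few-shot examples could be appended here.
    return f"Answer the following task.\nInput: {x}\nOutput:"

def io_prompting(sample_lm: SampleLM, x: str) -> str:
    # y ~ p_theta(y | prompt_IO(x)): a single sampled completion is the answer.
    return sample_lm(prompt_io(x), 1)[0]
```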
- Chain-of-Thought (CoT) Prompting
- Key Idea:
- Introduces a chain of thoughts $z_1, \dots, z_n$ to bridge $x$ and $y$.
- Each $z_i$ is a coherent language sequence representing an intermediate step toward problem-solving, namely a thought (e.g., an equation for math QA).
- Sequential Thought Sampling: Each thought is sampled sequentially. $z_i \sim p_\theta^{\text{CoT}}(z_i \mid x, z_1 \dots z_{i-1})$
- Final Output: $y \sim p_\theta^{\text{CoT}}(y \mid x, z_1 \dots z_n)$ (see the sketch below).
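A minimal sketch of CoT prompting under the same hypothetical `sample_lm` interface. It samples each thought $z_i$ with a separate LM call to mirror the factorized formula above; in practice the whole chain is usually decoded in a single call, and the prompt strings here are illustrative only.

```python
from typing import Callable, List, Tuple

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def cot_prompting(sample_lm: SampleLM, x: str, n_steps: int) -> Tuple[List[str], str]:
    """Sample z_i ~ p_theta^CoT(z_i | x, z_1..z_{i-1}) one step at a time, then y given the full chain."""
    thoughts: List[str] = []
    for i in range(n_steps):
        # The prompt carries the input x and all thoughts sampled so far.
        prompt = f"Question: {x}\n" + "\n".join(thoughts) + f"\nStep {i + 1}:"
        thoughts.append(sample_lm(prompt, 1)[0].strip())
    # y ~ p_theta^CoT(y | x, z_1..z_n)
    y = sample_lm(f"Question: {x}\n" + "\n".join(thoughts) + "\nAnswer:", 1)[0].strip()
    return thoughts, y
```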
- Self-Consistency with CoT (CoT-SC)
- Samples $k$ i.i.d. chains of thought $[z_1^{(i)}, \dots, z_n^{(i)}, y^{(i)}] \sim p_\theta^{\text{CoT}}(z_1 \dots z_n, y \mid x)$ for $i = 1, \dots, k$
- Returns the most frequent output $y$ via majority voting: $\arg\max_y \#\{i \mid y^{(i)} = y\}$ (see the sketch below).
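A sketch of CoT-SC, again assuming the hypothetical `sample_lm` helper; extracting the final answer after an "Answer:" marker is an illustrative convention, not the paper's prompt format.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def cot_sc(sample_lm: SampleLM, x: str, k: int) -> str:
    """Sample k i.i.d. chains [z_1..z_n, y] and return the most frequent final answer y."""
    prompt = f"Question: {x}\nThink step by step, then give the final answer after 'Answer:'."
    chains = sample_lm(prompt, k)  # k i.i.d. chains of thought, each ending in an answer
    answers = [chain.split("Answer:")[-1].strip() for chain in chains]  # extract y^(i)
    # Majority vote: arg max_y #{i | y^(i) = y}
    return Counter(answers).most_common(1)[0][0]
```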
ToT Framework
ToT frames any problem as a search over a tree, where each node is a state $s = [x, z_1, \dots, z_i]$ representing a partial solution with the input and the sequence of thoughts so far. A specific instantiation of ToT involves answering four questions:
- Thought Decomposition
- A thought could be a couple of words (Crosswords), a line of an equation (Game of 24), or a whole paragraph-long writing plan (Creative Writing).
- In general, a thought should be “small” enough so that LMs can generate promising and diverse samples (e.g. generating a whole book is usually too “big” to be coherent), yet “big” enough so that LMs can evaluate its prospect toward problem solving (e.g. generating one token is usually too “small” to evaluate).
- Thought Generator $G(p_\theta, s, k)$
- Definition: Generates $k$ candidates for the next thought step given a tree state $s = [x, z_1, \dots, z_i]$
- Strategies:
- Sample i.i.d. Thoughts:
- $z^{(j)} \sim p_\theta^{\text{CoT}}(z_{i+1} \mid x, z_1 \dots z_i)$ for $j = 1 \dots k$
- Works better when the thought space is rich (e.g., each thought is a paragraph), since independent and identically distributed (i.i.d.) samples lead to diversity.
- Propose Thoughts Sequentially:
- $[z^{(1)}, \dots, z^{(k)}] \sim p_\theta^{\text{propose}}(z_{i+1}^{(1 \dots k)} \mid s).$
- Works better when the thought space is more constrained (e.g., each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication (both strategies are sketched below).
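A sketch of both generation strategies behind a single hypothetical `generate_thoughts` helper, assuming the same `sample_lm` stand-in for the LM; the prompt wordings and the line-splitting convention for proposals are illustrative assumptions, not the paper's exact prompts.

```python
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def generate_thoughts(sample_lm: SampleLM, state: str, k: int, mode: str = "sample") -> List[str]:
    """G(p_theta, s, k): produce k candidate next thoughts for a state s = [x, z_1..z_i]."""
    if mode == "sample":
        # i.i.d. sampling: z^(j) ~ p_theta^CoT(z_{i+1} | s) for j = 1..k.
        # Suits rich thought spaces (e.g. paragraphs), where independent samples add diversity.
        return sample_lm(f"{state}\nWrite the next step of the solution:", k)
    # Sequential proposal: one completion lists k candidates, [z^(1)..z^(k)] ~ p_theta^propose(. | s).
    # Suits constrained thought spaces (a word or line), where shared context avoids duplicates.
    proposal = sample_lm(f"{state}\nPropose {k} different next steps, one per line:", 1)[0]
    return [line.strip() for line in proposal.splitlines() if line.strip()][:k]
```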
- State Evaluator $V(p_\theta, S)$
- The state evaluator serves as a heuristic for the search algorithm to determine which states to keep exploring and in which order. We propose to use the LM to deliberately reason about states.
- Similar to the thought generator, we consider two strategies to evaluate states either independently or together:
- Value each state independently: $V(p_\theta, \mathcal{S})(s) \sim p_\theta^{\text{value}}(v \mid s) \quad \forall s \in \mathcal{S},$ where a value prompt reasons about the state $s$ to generate a scalar value $v$ (e.g. 1-10) or a classification (e.g. sure/likely/impossible) that could be heuristically turned into a value.
- Vote across states: $V(p_\theta, \mathcal{S})(s) = \mathbb{1}[s = s^*]$, where a “good” state $s^* \sim p_\theta^{\text{vote}}(s^* \mid \mathcal{S})$ is voted out based on deliberately comparing different states in $\mathcal{S}$ in a vote prompt. This is similar in spirit to a “step-wise” self-consistency strategy, i.e. cast “which state to explore” as a multi-choice QA, and use LM samples to vote for it (both evaluation strategies are sketched below).
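Both evaluation strategies, sketched under the same assumptions: `sample_lm` is a hypothetical LM call, and the sure/likely/impossible-to-value mapping and vote-parsing convention are illustrative choices rather than the paper's exact prompts.

```python
import re
from typing import Callable, Dict, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

# Illustrative mapping from a classification judgment to a scalar heuristic value.
LABEL_TO_VALUE = {"sure": 1.0, "likely": 0.5, "impossible": 0.0}

def value_states(sample_lm: SampleLM, states: List[str]) -> Dict[str, float]:
    """Value each state independently: v ~ p_theta^value(v | s) for every s in S."""
    values: Dict[str, float] = {}
    for s in states:
        judgment = sample_lm(f"{s}\nIs reaching a correct solution from here sure, likely, or impossible?", 1)[0]
        values[s] = LABEL_TO_VALUE.get(judgment.strip().lower(), 0.0)
    return values

def vote_states(sample_lm: SampleLM, states: List[str], n_votes: int = 5) -> Dict[str, float]:
    """Vote across states: s* ~ p_theta^vote(s* | S); vote counts serve as the heuristic value."""
    listing = "\n".join(f"Choice {i}: {s}" for i, s in enumerate(states))
    votes = sample_lm(f"{listing}\nWhich choice is most promising? Answer with a choice number.", n_votes)
    counts = [0] * len(states)
    for vote in votes:
        match = re.search(r"\d+", vote)  # parse the voted choice index, if any
        if match and int(match.group()) < len(states):
            counts[int(match.group())] += 1
    return dict(zip(states, (float(c) for c in counts)))
```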
- Search Algorithm
- Different search algorithms can be plugged and played depending on the tree structure; the paper instantiates two relatively simple ones.
- Breadth-first search (BFS) maintains a set of the $b$ most promising states per step, suited to shallow trees (Game of 24, Creative Writing); a sketch follows below.
- Depth-first search (DFS) explores the most promising state first and backtracks, pruning a subtree once the state evaluator deems the problem impossible to solve from the current state (Crosswords).
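A minimal sketch of the BFS variant, written against the hypothetical `generate_thoughts`/`value_states`-style helpers above (any functions matching these signatures would do, e.g. the helpers with the `sample_lm` argument bound via `functools.partial`); the default `k`, `b`, and `depth` values are placeholders rather than the paper's settings.

```python
from typing import Callable, Dict, List

def tot_bfs(
    x: str,
    generate: Callable[[str, int], List[str]],          # G(p_theta, s, k): candidate next thoughts
    evaluate: Callable[[List[str]], Dict[str, float]],  # V(p_theta, S): heuristic value per state
    k: int = 5,      # candidates generated per state
    b: int = 5,      # breadth: states kept after each step
    depth: int = 3,  # number of thought steps T
) -> str:
    """Breadth-first ToT search: expand, evaluate, and keep the b most promising states per level."""
    frontier = [x]  # each state s = [x, z_1..z_i] is encoded here as a single string
    for _ in range(depth):
        # Expand every frontier state with k candidate next thoughts.
        candidates = [f"{s}\n{z}" for s in frontier for z in generate(s, k)]
        # Score the candidates with the state evaluator and keep the top b.
        values = evaluate(candidates)
        frontier = sorted(candidates, key=lambda c: values[c], reverse=True)[:b]
    return frontier[0]  # most promising final state under the heuristic
```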
Experiments