Research Question
- Language Models can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role.
- We introduce “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving.
Approach
Notations
- General Notations
- Pre-trained Language Model: $p_\theta$
- Language Sequence:
- Represented using lowercase letters: $x, y, z, s, \dots$
- A sequence is defined as $x = (x[1], \dots, x[n])$
- Probability of a Language Sequence: $p_\theta(x) = \prod_{i=1}^n p_\theta(x[i] \mid x[1 \dots i-1])$
- Collection of Language Sequences: represented using uppercase letters $S, \dots$
- Input-Output (IO) Prompting
- Output Sampling: $y \sim p_\theta(y \mid \text{prompt}_{IO}(x))$
- Prompting Mechanism: $\text{prompt}_{IO}(x)$: Wraps input with task-specific instructions and/or few-shot input-output examples.
- Simplified Notation: $p_\theta^{\text{prompt}}(\text{output} \mid \text{input}) = p_\theta(\text{output} \mid \text{prompt}(\text{input})).$
- Final Formulation: IO prompting is expressed as $y \sim p_\theta^{IO}(y \mid x)$ (see the sketch below).
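As a point of reference, here is a minimal sketch of the IO baseline in code. `sample_lm` is a hypothetical stand-in for drawing completions from the pre-trained LM $p_\theta$ (e.g. an API call), and the prompt wording is illustrative, not the paper's.

```python
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def prompt_io(x: str) -> str:
    # prompt_IO(x): wrap the raw input with task instructions; few-shot examples could be appended here.
    return f"Answer the following task.\nInput: {x}\nOutput:"

def io_prompting(sample_lm: SampleLM, x: str) -> str:
    # y ~ p_theta(y | prompt_IO(x)): a single sampled completion is the answer.
    return sample_lm(prompt_io(x), 1)[0]
```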
- Chain-of-Thought (CoT) Prompting
- Key Idea:
- Introduces a chain of thoughts $z_1, \dots, z_n$ to bridge $x$ and $y$.
- Each $z_i$ is a coherent language sequence representing an intermediate step toward problem-solving, namely a thought (e.g., an equation for math QA).
- Sequential Thought Sampling: Each thought is sampled sequentially. $z_i \sim p_\theta^{\text{CoT}}(z_i \mid x, z_1 \dots z_{i-1})$
- Final Output: $y \sim p_\theta^{\text{CoT}}(y \mid x, z_1 \dots z_n)$ (see the sketch below).
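A minimal sketch of CoT prompting under the same hypothetical `sample_lm` interface. It samples each thought $z_i$ with a separate LM call to mirror the factorized formula above; in practice the whole chain is usually decoded in a single call, and the prompt strings here are illustrative only.

```python
from typing import Callable, List, Tuple

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def cot_prompting(sample_lm: SampleLM, x: str, n_steps: int) -> Tuple[List[str], str]:
    """Sample z_i ~ p_theta^CoT(z_i | x, z_1..z_{i-1}) one step at a time, then y given the full chain."""
    thoughts: List[str] = []
    for i in range(n_steps):
        # The prompt carries the input x and all thoughts sampled so far.
        prompt = f"Question: {x}\n" + "\n".join(thoughts) + f"\nStep {i + 1}:"
        thoughts.append(sample_lm(prompt, 1)[0].strip())
    # y ~ p_theta^CoT(y | x, z_1..z_n)
    y = sample_lm(f"Question: {x}\n" + "\n".join(thoughts) + "\nAnswer:", 1)[0].strip()
    return thoughts, y
```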
- Self-Consistency with CoT (CoT-SC)
- Samples $k$ i.i.d. chains of thought $[z_1^{(i)}, \dots, z_n^{(i)}, y^{(i)}] \sim p_\theta^{\text{CoT}}(z_1 \dots z_n, y \mid x)$ for $i = 1, \dots, k$
- Returns the most frequent output $y$ via majority voting: $\arg\max_y \#\{i \mid y^{(i)} = y\}$ (see the sketch below).
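A sketch of CoT-SC, again assuming the hypothetical `sample_lm` helper; extracting the final answer after an "Answer:" marker is an illustrative convention, not the paper's prompt format.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def cot_sc(sample_lm: SampleLM, x: str, k: int) -> str:
    """Sample k i.i.d. chains [z_1..z_n, y] and return the most frequent final answer y."""
    prompt = f"Question: {x}\nThink step by step, then give the final answer after 'Answer:'."
    chains = sample_lm(prompt, k)  # k i.i.d. chains of thought, each ending in an answer
    answers = [chain.split("Answer:")[-1].strip() for chain in chains]  # extract y^(i)
    # Majority vote: arg max_y #{i | y^(i) = y}
    return Counter(answers).most_common(1)[0][0]
```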
ToT Framework
ToT frames any problem as a search over a tree, where each node is a state $s = [x, z_1, \dots, z_i]$ representing a partial solution with the input and the sequence of thoughts so far. A specific instantiation of ToT involves answering four questions:
- Thought Decomposition
- A thought could be a couple of words (Crosswords), a line of an equation (Game of 24), or a whole paragraph-long writing plan (Creative Writing).
- In general, a thought should be “small” enough so that LMs can generate promising and diverse samples (e.g. generating a whole book is usually too “big” to be coherent), yet “big” enough so that LMs can evaluate its prospect toward problem solving (e.g. generating one token is usually too “small” to evaluate).
- Thought Generator $G(p_\theta, s, k)$
- Definition: Generates $k$ candidates for the next thought step given a tree state $s = [x, z_1, \dots, z_i]$
- Strategies:
- Sample i.i.d. Thoughts:
- $z^{(j)} \sim p_\theta^{\text{CoT}}(z_{i+1} \mid x, z_1 \dots z_i)$ for $j = 1 \dots k$
- Works better when the thought space is rich (e.g., each thought is a paragraph), since independent and identically distributed (i.i.d.) samples lead to diversity.
- Propose Thoughts Sequentially:
- $[z^{(1)}, \dots, z^{(k)}] \sim p_\theta^{\text{propose}}(z_{i+1}^{(1 \dots k)} \mid s).$
- Works better when the thought space is more constrained (e.g., each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication (both strategies are sketched below).
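A sketch of both generation strategies behind a single hypothetical `generate_thoughts` helper, assuming the same `sample_lm` stand-in for the LM; the prompt wordings and the line-splitting convention for proposals are illustrative assumptions, not the paper's exact prompts.

```python
from typing import Callable, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

def generate_thoughts(sample_lm: SampleLM, state: str, k: int, mode: str = "sample") -> List[str]:
    """G(p_theta, s, k): produce k candidate next thoughts for a state s = [x, z_1..z_i]."""
    if mode == "sample":
        # i.i.d. sampling: z^(j) ~ p_theta^CoT(z_{i+1} | s) for j = 1..k.
        # Suits rich thought spaces (e.g. paragraphs), where independent samples add diversity.
        return sample_lm(f"{state}\nWrite the next step of the solution:", k)
    # Sequential proposal: one completion lists k candidates, [z^(1)..z^(k)] ~ p_theta^propose(. | s).
    # Suits constrained thought spaces (a word or line), where shared context avoids duplicates.
    proposal = sample_lm(f"{state}\nPropose {k} different next steps, one per line:", 1)[0]
    return [line.strip() for line in proposal.splitlines() if line.strip()][:k]
```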
- State Evaluator $V(p_\theta, S)$
- The state evaluator serves as a heuristic for the search algorithm to determine which states to keep exploring and in which order. We propose to use the LM to deliberately reason about states.
- Similar to the thought generator, we consider two strategies to evaluate states either independently or together:
- Value each state independently: $V(p_\theta, \mathcal{S})(s) \sim p_\theta^{\text{value}}(v \mid s) \quad \forall s \in \mathcal{S},$ where a value prompt reasons about the state $s$ to generate a scalar value $v$ (e.g. 1-10) or a classification (e.g. sure/likely/impossible) that could be heuristically turned into a value.
- Vote across states: $V(p_\theta, \mathcal{S})(s) = \mathbb{1}[s = s^*]$, where a “good” state $s^* \sim p_\theta^{\text{vote}}(s^* \mid \mathcal{S})$ is voted out based on deliberately comparing different states in $\mathcal{S}$ in a vote prompt. This is similar in spirit to a “step-wise” self-consistency strategy, i.e. cast “which state to explore” as a multi-choice QA, and use LM samples to vote for it (both evaluation strategies are sketched below).
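Both evaluation strategies, sketched under the same assumptions: `sample_lm` is a hypothetical LM call, and the sure/likely/impossible-to-value mapping and vote-parsing convention are illustrative choices rather than the paper's exact prompts.

```python
import re
from typing import Callable, Dict, List

# Hypothetical LM interface: sample_lm(prompt, n) returns n sampled completions from p_theta.
SampleLM = Callable[[str, int], List[str]]

# Illustrative mapping from a classification judgment to a scalar heuristic value.
LABEL_TO_VALUE = {"sure": 1.0, "likely": 0.5, "impossible": 0.0}

def value_states(sample_lm: SampleLM, states: List[str]) -> Dict[str, float]:
    """Value each state independently: v ~ p_theta^value(v | s) for every s in S."""
    values: Dict[str, float] = {}
    for s in states:
        judgment = sample_lm(f"{s}\nIs reaching a correct solution from here sure, likely, or impossible?", 1)[0]
        values[s] = LABEL_TO_VALUE.get(judgment.strip().lower(), 0.0)
    return values

def vote_states(sample_lm: SampleLM, states: List[str], n_votes: int = 5) -> Dict[str, float]:
    """Vote across states: s* ~ p_theta^vote(s* | S); vote counts serve as the heuristic value."""
    listing = "\n".join(f"Choice {i}: {s}" for i, s in enumerate(states))
    votes = sample_lm(f"{listing}\nWhich choice is most promising? Answer with a choice number.", n_votes)
    counts = [0] * len(states)
    for vote in votes:
        match = re.search(r"\d+", vote)  # parse the voted choice index, if any
        if match and int(match.group()) < len(states):
            counts[int(match.group())] += 1
    return dict(zip(states, (float(c) for c in counts)))
```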
- Search Algorithm
- Different search algorithms can be plugged and played depending on the tree structure; the paper instantiates two relatively simple ones.
- Breadth-first search (BFS) maintains a set of the $b$ most promising states per step, suited to shallow trees (Game of 24, Creative Writing); a sketch follows below.
- Depth-first search (DFS) explores the most promising state first and backtracks, pruning a subtree once the state evaluator deems the problem impossible to solve from the current state (Crosswords).
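A minimal sketch of the BFS variant, written against the hypothetical `generate_thoughts`/`value_states`-style helpers above (any functions matching these signatures would do, e.g. the helpers with the `sample_lm` argument bound via `functools.partial`); the default `k`, `b`, and `depth` values are placeholders rather than the paper's settings.

```python
from typing import Callable, Dict, List

def tot_bfs(
    x: str,
    generate: Callable[[str, int], List[str]],          # G(p_theta, s, k): candidate next thoughts
    evaluate: Callable[[List[str]], Dict[str, float]],  # V(p_theta, S): heuristic value per state
    k: int = 5,      # candidates generated per state
    b: int = 5,      # breadth: states kept after each step
    depth: int = 3,  # number of thought steps T
) -> str:
    """Breadth-first ToT search: expand, evaluate, and keep the b most promising states per level."""
    frontier = [x]  # each state s = [x, z_1..z_i] is encoded here as a single string
    for _ in range(depth):
        # Expand every frontier state with k candidate next thoughts.
        candidates = [f"{s}\n{z}" for s in frontier for z in generate(s, k)]
        # Score the candidates with the state evaluator and keep the top b.
        values = evaluate(candidates)
        frontier = sorted(candidates, key=lambda c: values[c], reverse=True)[:b]
    return frontier[0]  # most promising final state under the heuristic
```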
Experiments