By Albert Mao
Jan 3, 2024
Chain of Thought Prompting
As large language models (LLMs) grow larger and more capable, their output becomes increasingly accurate. Still, even the best LLMs make mistakes on tasks that require complex reasoning.
Prompt engineering has proved to be key to obtaining relevant results when working with AI, and several techniques already exist that significantly improve LLMs' performance on tasks such as arithmetic, commonsense, and symbolic reasoning. Below, we discuss chain-of-thought prompting and the various techniques aimed at eliciting reasoning abilities in large language models.
What Is Chain-of-Thought Prompting?
Chain-of-thought (CoT) prompting is a prompting method that elicits reasoning abilities in large language models and significantly increases their performance. To this end, CoT prompting induces a large language model to articulate its reasoning steps before giving the final answer to the initial question.
A classic example of chain-of-thought prompting appears in the work by Wei et al. (2022), Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. At the beginning of the paper, the researchers explicitly demonstrate the difference between standard prompting and chain-of-thought prompting on a simple math word problem. Here is the figure from the study, comparing both approaches side by side and highlighting the portions of text that contain the chain-of-thought reasoning process:
Figure 1: Standard Prompting vs CoT Prompting | Image source: Wei et al. (2022)
In the experiment, the LLM returned an incorrect answer to an arithmetic question when given the standard prompt shown on the left-hand side of the figure. On the right-hand side, however, chain-of-thought prompting elicited successful multi-step reasoning and led to the correct solution of the arithmetic task.
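Since the figure may not render in every format, here is a close paraphrase of the Wei et al. (2022) example. Standard prompting gives the model one worked example with a bare answer before the target question:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:

Chain-of-thought prompting changes only the demonstration's answer, spelling out the reasoning: "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11." In the study, the standard prompt led the model to answer 27, while the chain-of-thought version produced the correct answer, 9.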
When to Use Chain of Thought Prompting
As the example above shows, chain-of-thought prompting enhances the performance of LLMs by inducing the models to go through a step-by-step process before arriving at a solution. This ability is particularly useful for tasks where large language models frequently struggle, such as arithmetic word problems, commonsense reasoning about physical or human interactions, and symbolic manipulation of letters, digits, or logical operators.
The benefits of chain-of-thought prompting make this technique helpful in various other scenarios, for example:
1. When the problem to be solved includes multiple reasoning steps.
2. When the user needs to show the LLM which steps to follow to correct its performance.
3. When the user needs to know how the model arrives at a particular answer, or to check the specific stage where the reasoning might go wrong.
Types of Chain-of-Thought Prompting
Currently, there are several approaches to chain-of-thought prompting, which can be successfully implemented depending on the specifics of each task. These methods include zero-shot chain-of-thought prompting (Zero-Shot-CoT), manual chain-of-thought prompting (Manual-CoT), and automatic chain-of-thought prompting (Auto-CoT).
Zero-Shot Chain-of-Thought Prompting
The simplest method to induce an LLM to solve a problem in steps, known as Zero-Shot-CoT, is to ask the model, "Let's think step by step." When given this prompt, LLMs generate a rationale for their reasoning before giving a final answer to a question.
Studies have demonstrated that while large LLMs such as GPT-3 and its successors are "decent zero-shot reasoners," they can still make mistakes during the reasoning process. This flaw in complex reasoning can be mitigated by applying the other two CoT methods.
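As a minimal illustration, here is a Python sketch of Zero-Shot-CoT. It assumes the OpenAI chat-completions client; the model name is a placeholder, and any comparable LLM API would work the same way. The example question comes from Kojima et al. (2022):

# Minimal Zero-Shot-CoT sketch: append a "think step by step"
# trigger to the question and let the model write out its rationale.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable chat model works
    messages=[
        {"role": "user", "content": f"Q: {question}\nA: Let's think step by step."}
    ],
)

# The reply contains the reasoning chain followed by the final answer.
print(response.choices[0].message.content)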
Manual Chain-of-Thought Prompting
With manual chain-of-thought prompting, the large language model relies on manually designed demonstrations that prompt it to go through specific reasoning steps, instead of being given full freedom as in the "let's think step by step" approach. The example of chain-of-thought prompting illustrated in Figure 1 is a classic case of Manual-CoT, where the user gives the LLM a specific example to follow in its reasoning process.
Studies, including the already mentioned Wei et al. (2022) and another study by Kojima et al. (2022), demonstrated superior LLM performance with Manual-CoT compared to the Zero-Shot-CoT approach. The obvious drawback of manual chain-of-thought prompting is the heavier effort required to draft effective demonstrations for the LLM's reasoning.
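A Manual-CoT sketch differs from the zero-shot version above only in the prompt: a hand-written demonstration with worked-out reasoning (here, the tennis-ball example from Figure 1) is prepended to the new question. The model name is again a placeholder:

# Manual-CoT sketch: prepend a hand-crafted demonstration whose answer
# spells out the reasoning, then ask the new question in the same format.
from openai import OpenAI

client = OpenAI()

demonstration = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

question = (
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\nA:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": demonstration + "\n\n" + question}],
)

# The model imitates the demonstration, so the reply should walk through
# the steps and end with something like "The answer is 9."
print(response.choices[0].message.content)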
Automatic Chain-of-Thought Prompting
The work done by a group of researchers from Shanghai Jiao Tong University, Zhang et al. (2022), introduced an automatic chain-of-thought approach (Auto-CoT). The method builds upon zero-shot chain-of-thought prompting, eliminating the need to create demonstrations manually. At the same time, Auto-CoT effectively mitigates errors in LLM reasoning by leveraging the diversity of demonstrations generated through Zero-Shot-CoT.
The process of automatic chain-of-thought prompting goes through two stages:
1. Question clustering, which partitions the task's questions into several clusters based on semantic similarity,
2. Demonstration sampling, which involves selecting a representative question from each cluster and prompting the LLM with "Let's think step by step" to generate the reasoning chain.
In the Zhang et al. (2022) study, this process is demonstrated in a diagram (a code sketch of both stages follows the figure):
Figure 2: Automatic Chain of Thought Prompting in Large Language Models | Image source: Zhang et al. (2022)
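To make the two stages concrete, below is a rough Python sketch of Auto-CoT. It assumes the sentence-transformers and scikit-learn libraries for the clustering stage and the same OpenAI client as above; the paper itself uses Sentence-BERT encodings with k-means plus simple heuristics for filtering demonstrations (e.g., preferring short questions and rationales), which are omitted here:

# Auto-CoT sketch: cluster the task's questions, then build one
# Zero-Shot-CoT demonstration per cluster.
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
import numpy as np

client = OpenAI()
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder sentence encoder

def build_demonstrations(questions, n_clusters=4):
    # Stage 1: question clustering by semantic similarity.
    embeddings = encoder.encode(questions)  # shape: (n_questions, dim)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)

    demonstrations = []
    for cluster_id in range(n_clusters):
        # Stage 2: pick the question closest to each cluster center ...
        members = np.where(kmeans.labels_ == cluster_id)[0]
        center = kmeans.cluster_centers_[cluster_id]
        distances = np.linalg.norm(embeddings[members] - center, axis=1)
        question = questions[members[np.argmin(distances)]]

        # ... and let Zero-Shot-CoT generate its reasoning chain.
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user",
                       "content": f"Q: {question}\nA: Let's think step by step."}],
        )
        rationale = reply.choices[0].message.content
        demonstrations.append(
            f"Q: {question}\nA: Let's think step by step. {rationale}"
        )

    # Join the demonstrations and prepend them to any new question,
    # exactly as in the Manual-CoT sketch above.
    return "\n\n".join(demonstrations)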
Limits to Chain-of-Thought Prompting
Even though chain-of-thought prompting greatly enhances LLMs' reasoning, these methods have limitations of their own. First, researchers observed that CoT prompting does not improve the performance of smaller LLMs with fewer than 100 billion parameters. When given CoT prompts, such models produce illogical reasoning chains, leading to lower performance than standard prompting.
Second, while larger LLMs perform better on reasoning tasks under chain-of-thought prompting, they still produce incorrect answers with the Zero-Shot-CoT method. This tendency is mitigated to a considerable extent by providing the LLM with human-crafted demonstrations (Manual-CoT) or automatically generated ones (Auto-CoT).
Leverage LLM Prompting Techniques with VectorShift
Chain-of-thought prompting has proved to be a powerful method for improving the performance of LLMs on reasoning tasks by decomposing complex problems into simple steps for better accuracy. While the Zero-Shot-CoT method nudges an LLM into thinking step by step, the manual and automatic CoT methods provide context and a diversity of demonstrations, reducing mistakes in the reasoning chain.
VectorShift is an AI automation platform that lets you try out techniques such as chain-of-thought prompting and build them directly into AI applications, through either no-code or SDK (software development kit) interfaces. Try it out here for free, or book a free consultation with the VectorShift team if you have any questions.