Many of us use large language models (LLMs) for all kinds of activities. Since the release of ChatGPT in late 2022, their adoption has exploded. People use ChatGPT and its siblings out of curiosity, for recreation, or for business, but how many are aware that the answers they receive are not always truthful? This problem is called hallucination: the answer we get is incorrect or meaningless. In short, the LLM we are using is making up an answer!
What causes hallucinations? First, we must be aware that LLMs do not understand the output they generate: these models do not analyze their responses to grasp their meaning. And although the responses seem linguistically fluent to us, the word sequence is generated, to simplify, on a conditional-probability basis. Because the model has seen a large amount of textual data, the answer merely sounds correct. Other causes of hallucination are missing information about a given context or about recent facts, and training that did not explicitly include a phase to discourage false content.
Although hallucinations can happen, the probability of their occurrence can be reduced by carefully structuring the prompts we provide to these models. Several prompt engineering techniques, in addition to those already seen in Prompt engineering: prompting techniques [part 1] and Prompt engineering: prompting techniques [part 2], focus on guiding the model toward producing more truthful, rational, and meaningful responses.
In this article we will explore a number of techniques. For each we will provide an overview of how they work, some examples to illustrate the structure of the prompt, and an analysis of their strengths and limitations. Let’s get started!
Retrieval augmented generation (RAG)
Retrieval augmented generation (RAG) systems combine the capabilities of language models with external sources of information. The key idea is to retrieve relevant context from knowledge bases before text generation, so that the language model's output is grounded in verified contextual data.
RAG systems reduce hallucination by ensuring that every generated response has supporting evidence, rather than letting the model speculate blindly. If no contextual evidence is found, the system can honestly admit its ignorance.
Let’s look at some examples to better understand RAG prompts in action.
- When was the first refrigerator invented?
- 1854: The first working refrigerator is built by James Harrison, a Scottish inventor. His ether vapor compression refrigeration system paved the way for modern mechanical refrigeration technology.
- Who was the main actress in the movie Titanic?
- Kate Winslet: She played the character of Rose DeWitt Bukater in the 1997 film Titanic directed by James Cameron. The film also starred Leonardo DiCaprio as Jack Dawson.
In the RAG queries above, the system searches a knowledge source for relevant facts before generating an answer. The output is thus based on the retrieved information rather than on unsupported facts invented by the language model.
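The retrieve-then-prompt flow just described can be sketched in a few lines of Python. The in-memory knowledge base, the keyword-overlap scoring, and the function names are all illustrative stand-ins; a production system would use a vector store and an actual model API.

```python
# Minimal sketch of a RAG-style prompt builder. The knowledge base and
# the naive keyword-overlap retrieval are illustrative, not a real API.

KNOWLEDGE_BASE = [
    "1854: James Harrison, a Scottish-born inventor, built the first "
    "working ether vapor-compression refrigerator.",
    "Kate Winslet played Rose DeWitt Bukater in James Cameron's 1997 "
    "film Titanic, opposite Leonardo DiCaprio as Jack Dawson.",
]

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(question: str) -> str:
    """Ground the model in retrieved context and allow 'I don't know'."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

print(build_rag_prompt("When was the first refrigerator invented?"))
```

The key design point is the instruction to answer only from the retrieved context: it is what lets the model admit ignorance instead of inventing facts.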
Advantages
RAG systems, by incorporating external knowledge in addition to what the model saw during training, reduce the likelihood of unfounded hallucinations in many contexts. The model can instead indicate a lack of knowledge about a given topic. In addition, the output can cite the sources from which the response was generated, increasing its credibility.
Disadvantages
To be reliable, these systems require large datasets or knowledge bases for contextual retrieval, which can be expensive to create and maintain. For the same reason, they also scale less easily than pure language-model approaches. Nor do they rule out the risk of repeating facts at the surface level, without deeper understanding, when the knowledge source itself contains errors or factual gaps.
Although RAG prompts help reduce hallucination through facts retrieved from predefined databases, the technique is most effective when the underlying knowledge sources are extensive, of high quality and regularly updated. The use of multiple complementary knowledge bases can also provide greater robustness.
ReAct
ReAct prompts, short for Reasoning and Acting, interleave the model's reasoning with explicit actions. Here we apply the idea by recursively querying the model about its thought process and confidence levels at each stage. The main objectives are:
- encourage the model to reason deliberately rather than respond impulsively
- enable calibration of uncertainty and bring knowledge gaps to the surface
- improve truthfulness and avoid false content.
Here is an example of a ReAct prompt.
- What is Australia's capital city?
- The capital of Australia is Canberra.
- On a scale of 0 to 10, how confident are you in your answer? Explain your reasoning clearly.
- Confidence 9/10. I am very confident that the capital of Australia is Canberra. Australia is a large, well-known country, and its capital is common factual knowledge that I encountered often in my training data.
We can deepen the search recursively:
- Great, could you provide 1-2 factual sources where you have seen evidence that Canberra is the capital of Australia? Please cite the specific references.
- I have seen Canberra listed as the capital of Australia in maps, encyclopedias, and datasets of world capitals used in my training. However, upon further reflection, I have no specific textual references to cite as concrete evidence for this answer.
The lack of specific sources of evidence may indicate potential knowledge gaps. We can therefore rephrase the initial question.
- What is the capital of Australia and what factual sources can you cite to confirm this with certainty?
If the model is still unable to provide verified external references, it demonstrates introspection about the limits of its knowledge, reducing the chances of a false invention.
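The recursive questioning shown in the dialogue above can be scripted as a loop. Everything here is a sketch: `query_llm` is a hypothetical stand-in for a real chat-completion call, and its canned answers exist only to keep the example self-contained and runnable.

```python
# Sketch of the recursive confidence-probing loop described above.
# query_llm is a hypothetical placeholder for an actual model API call.

def query_llm(messages: list[dict]) -> str:
    """Placeholder model: returns canned answers for the demo."""
    canned = {
        "What is Australia's capital city?":
            "The capital of Australia is Canberra.",
    }
    return canned.get(messages[-1]["content"], "I am not certain.")

def probe_answer(question: str, rounds: int = 2) -> list[str]:
    """Ask a question, then recursively probe confidence and sources."""
    follow_ups = [
        "On a scale of 0 to 10, how confident are you? Explain.",
        "Cite 1-2 specific sources supporting your answer.",
    ]
    messages = [{"role": "user", "content": question}]
    answer = query_llm(messages)
    transcript = [answer]
    for follow_up in follow_ups[:rounds]:
        # Feed the previous answer back and ask the next probing question.
        messages += [{"role": "assistant", "content": answer},
                     {"role": "user", "content": follow_up}]
        answer = query_llm(messages)
        transcript.append(answer)
    return transcript
```

Each round appends the model's own answer to the conversation before probing it, which is what forces the model to reflect on, rather than merely repeat, its previous claim.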
Advantages
ReAct prompts build recursive transparency into the model's reasoning process without relying on external sources. Through a chain of reasoning, the technique surfaces knowledge gaps and encourages calibration of the model's confidence.
Disadvantages
Applying this technique requires several interactions with the model, which can make the chat tedious and/or inefficient. Nor does it remove the risk that the model merely learns to respond articulately to the recursive questioning, without any improvement in the integrity of its underlying knowledge. Unlike RAG systems, it does not incorporate an external knowledge base that can be relied upon to verify the response.
Although ReAct queries alone do not guarantee full veracity, recursively questioning the model's confidence and reasoning is a useful technique for reducing blind hallucination. The approach can be made more robust by combining it with corroborating evidence from external knowledge sources.
Chain-of-Verification (CoVe) Prompt
Chain-of-Verification (CoVe) prompts explicitly require the model to provide step-by-step verification of its answers, citing authoritative external sources.
The prompt is formulated as a series of verifiable logical inferences to reach the final answer:
- {initial fact} -> verifiably implies -> {logical deduction 1} -> verifiably implies -> {logical deduction 2} -> ... -> verifiably implies -> {final answer}
For instance:
- Tom Cruise was born in Syracuse, New York -> verifiably implies -> Syracuse is a city in the U.S. state of New York -> verifiably implies -> New York is in the U.S. -> verifiably implies -> Tom Cruise was born in the U.S.
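A chain like the one above can be assembled programmatically. The helper names below are illustrative, not part of any library; the second helper turns each step into a verification question that could then be checked against external references.

```python
# Illustrative helpers for building a CoVe-style implication chain.
# Function names are hypothetical; the steps are supplied by the caller.

def build_cove_chain(steps: list[str]) -> str:
    """Join reasoning steps with an explicit verification marker."""
    return " -> verifiably implies -> ".join(steps)

def verification_questions(steps: list[str]) -> list[str]:
    """One check per step, to be answered against external sources."""
    return [f"Can you verify that {step}?" for step in steps]

chain = build_cove_chain([
    "Tom Cruise was born in Syracuse, New York",
    "Syracuse is a city in the U.S. state of New York",
    "New York is in the U.S.",
    "Tom Cruise was born in the U.S.",
])
print(chain)
```

Separating the chain construction from the per-step verification questions mirrors the technique itself: each deduction stands alone and can be checked independently before the final answer is accepted.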
By setting up a chain of reasoning grounded in verification at each step, CoVe prompts reduce speculation disconnected from reality. Let's analyze the strengths and weaknesses of this approach.
Advantages
Chained logical inferences reinforce systematic and structured thinking, and the explicit verification requirement minimizes blind assumptions. The gradually revealed context focuses the answer without leaving room for hallucinations.
Disadvantages
The CoVe technique trades some of the prompt's linguistic flexibility for a gain in logical transparency. It works well for queries with fairly linear reasoning flows, but can become tedious for analyses that require more unstructured inference. If the reasoning is complex or, in some cases, ambiguous, this technique can be difficult to apply. Finally, it requires external references that are not always available.