AI: engineering prompts to reduce hallucinations [part 2]

Hallucinations, i.e., responses that appear to make sense but are actually incorrect, afflict all large language models (LLMs). There are some techniques that can be used to mitigate this behavior. Let us discover some of them through examples and by analyzing the advantages and disadvantages.



In the previous article, “AI: prompt engineering to reduce hallucinations [part 1]”, we introduced what hallucinations of large language models (LLMs) such as ChatGPT are, along with some techniques to limit their occurrence. We looked specifically at RAG systems and at the ReAct and CoVe (Chain-of-Verification) prompts. Other prompting techniques can also help us, including Chain-of-Note and Chain-of-Knowledge. For each of them, we will analyze how it works, show examples that illustrate the structure of the prompt, and discuss its advantages and disadvantages. Let’s get started.

Chain-of-Note (CoN) prompt

Chain-of-Note (CoN) prompts aim to improve the model’s understanding by explicitly recording the context, thoughts, and gaps encountered along the path to the answer.

The structure is as follows:

{Initial question}

Note 1: <Initial understanding and thoughts of the model>.

Note 2: <Identification of any knowledge gaps>.

Note 3: <Additional relevant context>.

… possible chain of notes …

Final answer: <Answer that incorporates the complete reasoning outlined in the notes>.

For example:

When did the French Revolution begin?

Note 1: The French Revolution refers to the period of great social and political upheaval in France in the late 18th century. I initially understood that it culminated with the abolition of the French monarchy.

Note 2: I do not have the exact start date available. I may need to look for outside sources to find accurate information about the start date of the events that led to the revolution.

Note 3: After further analysis, most historical documents cite 1789 as the starting year of the French Revolution, marked by the storming of the Bastille on July 14.

Final answer: The commonly accepted date for the beginning of the French Revolution is July 14, 1789, when protesters stormed the Bastille fortress in Paris, kicking off the revolutionary momentum to overthrow King Louis XVI and establish a republic based on Enlightenment principles of liberty, equality and fraternity. The main initial events occurred in the period 1789-1799.

The notes track context and gaps, reducing the risk of answers that sound precise but are false. Chaining the notes also has an educational function, improving the model’s ability to evaluate subsequent questions.
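To make the structure concrete, here is a minimal Python sketch that wraps a question in a CoN template. The `ask_llm` callable and the parsing of the final answer are assumptions standing in for whatever LLM client and output format you actually use:

```python
from typing import Callable

# Template mirroring the CoN structure described above.
CON_TEMPLATE = """{question}

Before answering, reason in explicit notes:
Note 1: <your initial understanding and thoughts>.
Note 2: <any knowledge gaps you need to cover>.
Note 3: <additional relevant context>.
(Add further notes if needed.)
Final answer: <an answer that incorporates the reasoning in the notes>."""


def chain_of_note(question: str, ask_llm: Callable[[str], str]) -> str:
    """Wrap the question in the CoN template and return the model's raw reply."""
    return ask_llm(CON_TEMPLATE.format(question=question))


# Usage, given any ask_llm function that sends a prompt to a model:
# reply = chain_of_note("When did the French Revolution begin?", ask_llm)
# answer = reply.split("Final answer:")[-1].strip()
```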

Advantages

The use of notes keeps a diagnostic trace of how the model’s reasoning evolves, bringing blind spots to the surface and clarifying what the model knows versus what it does not. It also provides an opportunity to incorporate additional context that enriches the response.

Disadvantages

The CoN approach essentially trades speed for transparency. There is a significant increase in the length of the prompt and, consequently, in the time taken to produce each answer. Although the verbiage can become cumbersome, making context and unknowns explicit provides useful insight into the model’s moment-to-moment understanding. Further fine-tuning may be needed to ensure that the chained notes actually improve the integrity of the underlying knowledge; there is also a risk that the model overfits to verbose, hedged descriptions rather than becoming genuinely more accurate.

Chain-of-Knowledge (CoK) prompt

Chain-of-Knowledge (CoK) prompts explicitly require the model to derive its answers from chains of knowledge to reduce logical leaps or false inferences. The structure is as follows:

{Topic} according to the experts of <field 1>, <field 2>, <field 3> etc. is: {explanation of the model derived from the quoted chains of experts}.

Here are some examples:

  • The impact of global warming on Arctic ecosystems according to climate scientists, marine biologists and conservation biologists is: {model response citing expert domain perspectives}
  • Best practices for secure passwords according to cryptography experts, user experience designers and policy strategists are: {model response that relies on the chains of expertise}

The concatenation of domain expertise sources acts as a kind of peer review, forcing the model to place its answers within established knowledge. Unsupported opinions and incorrect inferences are more likely to surface when the answer has to align with specialized authorities across multiple areas.
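As a minimal sketch, the template can be parameterized with a small helper; the function name and the wording of the instruction are illustrative, not a published CoK implementation:

```python
def chain_of_knowledge(topic: str, fields: list[str]) -> str:
    """Build a CoK prompt that grounds the answer in named expert domains."""
    experts = ", ".join(fields)
    return (
        f"{topic} according to the experts of {experts} is: "
        "explain by citing what each of these expert communities has "
        "established, and flag any points on which they disagree."
    )


print(chain_of_knowledge(
    "The impact of global warming on Arctic ecosystems",
    ["climate science", "marine biology", "conservation biology"],
))
```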

Advantages

The CoK technique pushes the model toward established expert knowledge rather than unreliable opinion, providing a form of fact-checking based on the “wisdom of crowds.”

Aligning understanding with expert knowledge reduces speculative errors.

Disadvantages

By requiring the assembly of explanations from expert perspectives, CoK prompts compel adherence to grounded discourse. However, care must be taken to incorporate a diversity of scholarly perspectives rather than limiting the prompt to confirmatory evidence. In addition, identifying the relevant fields and experts may itself require domain familiarity that is not always available. The choice of experts is thus a critical factor: in some contexts, expert viewpoints may even diverge in their interpretations or share the same blind spots.

More advanced prompting techniques

In addition to the approaches described in this article and in part 1, various other prompt engineering techniques have been proposed to further reduce hallucinations. Below is a brief overview of some promising methods.

Veracity classification prompt

These prompts explicitly require the model to rate the likely truthfulness or trustworthiness of its responses on a defined scale, such as:

{Question} … My answer is {Answer}. On a scale of 1 (unreliable) to 5 (certainly true), I rate the accuracy of this answer as {truthfulness score} because of {justifications}.

Requiring a self-assessment of response integrity against clear criteria discourages blindly confident hallucination. Equally important, the model must introspect and reveal the knowledge gaps that justify its uncertainty.
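A minimal sketch of how this could be automated, assuming the model follows the requested `Score:` format (a real pipeline would need more robust parsing):

```python
import re

VERACITY_SUFFIX = (
    "\n\nOn a scale of 1 (unreliable) to 5 (certainly true), rate the "
    "accuracy of your answer and justify the score, in the form:\n"
    "Score: <1-5>\nJustification: <reasons>"
)


def with_veracity_rating(question: str) -> str:
    """Append the self-assessment request to the question."""
    return question + VERACITY_SUFFIX


def parse_score(reply: str) -> int | None:
    """Extract the self-reported truthfulness score, if the model gave one."""
    match = re.search(r"Score:\s*([1-5])", reply)
    return int(match.group(1)) if match else None


# Replies scoring below a chosen threshold (say, 4) can be discarded,
# re-asked, or routed to a retrieval step for verification.
```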

Factual history and suggestions for the future

An interesting technique links past facts with logically inferable future facts to bring out inconsistency. For example, the structure of a prompt might be as follows.

Based on the factual history {insert context}, predict the most reasonable future 10 years from now. Then go back 5 years and see if the predicted future makes rational sense.

This mental leap pushes the model to distinguish grounded projections from ungrounded ones: contradictions between a plausible history and a common-sense future expose the risk of hallucination.
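A sketch of the template as a reusable Python function, with wording adapted from the structure above:

```python
TEMPORAL_TEMPLATE = (
    "Based on the factual history below, predict the most reasonable future "
    "10 years from now. Then step back 5 years within your own prediction and "
    "check whether that intermediate state makes rational sense given the "
    "history. Report any contradiction you find.\n\n"
    "Factual history:\n{context}"
)


def temporal_consistency_prompt(context: str) -> str:
    """Ask the model to cross-check its own projection against the history."""
    return TEMPORAL_TEMPLATE.format(context=context)
```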

Alternative perspectives prompting

Seeking out alternative worldviews exposes blind spots in the model’s dominant position. The structure of the prompt can follow this model.

Answer the question from the perspective of {demographic X} and critique any factual inconsistencies with other evidence-based perspectives.

Prompting for an opposing view surfaces gaps in the model’s assumptions and increases the chances of catching falsehoods. Reconciling any factual inconsistencies that are found strengthens the integrity of the answer.
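A minimal sketch that generates one such prompt per perspective; the example question and perspectives are purely illustrative:

```python
def alternative_perspective_prompt(question: str, perspective: str) -> str:
    """Request an answer from one viewpoint plus a factual cross-check."""
    return (
        f"Answer the following question from the perspective of {perspective}, "
        "then critique any factual inconsistencies between that answer and "
        "other evidence-based perspectives.\n\n"
        f"Question: {question}"
    )


# One prompt per perspective; disagreement between the replies flags
# claims that deserve verification before being trusted.
prompts = [
    alternative_perspective_prompt("Is remote work more productive?", p)
    for p in ("an economist", "an occupational psychologist")
]
```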

There are many other promising directions for prompts, such as interweaving unknown facts, overconfidence analysis, and co-modeling with other agents. The unifying theme is the demand not only for a final answer, but also for the underlying reasoning, calibration of uncertainty, external consistency checks, and alignment of evidence, all of which promote truthful answers.

Conclusions

As language models become increasingly sophisticated, hallucination remains a fundamental challenge: models produce fluent text but lack a broader basis for judging the credibility of what they state. Fortunately, prompt engineering offers a partial remedy by explicitly encoding the evidentiary, logical, and contextual support needed to obtain reliable statements. RAG systems, ReAct prompts, chained verification (CoVe), and expert sourcing are all techniques that reduce the likelihood of false information by making the burden of proof explicit.

However, to ensure the highest level of accuracy, we need to focus on developing more reliable, introspective, and grounded intelligence. Prompting is a useful diagnostic tool for identifying gaps in model capabilities that require intervention, but it is not a complete solution to AI safety. Hybrid approaches that address the model’s limitations while expanding its capabilities are promising. Regardless of the technical approach, it is important to be honest about the frontiers of a model’s capabilities in order to manage expectations of these systems. Recognizing the need for diligence in structuring AI transparency today will help us build the interpretability and accountability that are essential to designing beneficial cooperation between humans and machines tomorrow.
