Prompt engineering: prompting techniques [part 2]

Prompt engineering techniques make it possible to tackle even very complex tasks. Some of them, such as zero-shot, few-shot, and Chain-of-Thought (CoT), produce excellent results in certain contexts. Where models still fall short, more advanced techniques such as Self-Consistency, Generated Knowledge, and Tree of Thoughts can help.


LLMs are increasingly widespread, and their text generation capabilities keep improving. Even so, they still have limitations. For example, as we saw in article XXX, their reasoning on mathematical problems is limited and in many cases produces wrong results. We also saw, however, that these limits can be overcome by working only on the prompt rather than retraining a model. This lets us improve the result without investing in additional computational resources, relying on what we already have. In article XXX we introduced techniques such as few-shot and chain-of-thought prompting. Below we look at other prompt engineering techniques that can be effective for certain tasks.

Self-Consistency

Self-consistency is based on the idea of sampling multiple, diverse reasoning paths through few-shot CoT and using those generations to select the most consistent answer (Wang et al., 2022). This helps improve the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.

Let's try the following arithmetic reasoning example:

				
					Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.

Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.

Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.

Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.

Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.

Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.

Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.

Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: She bought 5 bagels for $3 each. This means she spent $15. She has $8 left.

Q: When I was 6 my sister was half my age. Now I’m 70 how old is my sister?
A:

Output 1

				
					If when you were 6, your sister was half your age, that means she was 3 years old at the time. Since then, the age gap between you two remains constant, so she will always be 3 years younger than you. Therefore, if you are now 70 years old, your sister would be 67 years old.

Output 2

				
					When you were 6, your sister was half your age, which means she was 3 years old at that time. The age gap between you and your sister is 6 - 3 = 3 years. So, when you are now 70 years old, your sister would be 70 - 3 = 67 years old.
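The selection step of self-consistency can be sketched as a majority vote over the final answers of the sampled reasoning paths. This is a minimal sketch, not the authors' implementation: the helper names `extract_answer` and `self_consistent_answer` are hypothetical, the rule "the last integer in the text is the answer" is a simplifying assumption, and the third sample is an invented wrong path added only to show the vote at work.

```python
import re
from collections import Counter


def extract_answer(text):
    """Return the last integer found in a reasoning path, or None.

    Assumption: the final number mentioned in the text is the answer.
    """
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None


def self_consistent_answer(samples):
    """Pick the most frequent final answer among the sampled paths."""
    answers = [a for a in map(extract_answer, samples) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


samples = [
    # Endings of the two outputs shown above.
    "Therefore, if you are now 70 years old, your sister would be 67 years old.",
    "So, when you are now 70 years old, your sister would be 70 - 3 = 67 years old.",
    # Hypothetical wrong path, invented for illustration.
    "Half of 70 is 35, so your sister is 35 years old.",
]

print(self_consistent_answer(samples))
```

In practice the samples would come from several calls to the same model with the same few-shot CoT prompt and a non-zero sampling temperature, so that the reasoning paths differ.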

The outputs we obtained from ChatGPT, although worded differently, reach the same final result. The original paper points out that in some cases the LLM used could give wrong answers. This also happened when the question was simply asked directly. If we submit the following question directly:

				
					When I was 6 my sister was half my age. Now
I’m 70 how old is my sister?

ChatGPT's answer is the following:

				
					When you were 6, your sister was half your age, which means she was 3 years old (half of 6).

Now, if you're 70 years old, and assuming your sister is still alive, she would be 67 years old, because the age difference between you two remains the same.

This shows that OpenAI's LLM has improved considerably in recent times. In Wang et al. (2022), by contrast, the answer reported for this direct question was wrong.

The technique seen above was designed to overcome this limitation, which may still arise on certain tasks and/or with different LLMs.

Generated Knowledge Prompting

LLMs continue to be improved. One widely used technique is to incorporate knowledge or information into the prompt to help the model make more accurate predictions.

Using a similar idea, can the model also be used to generate knowledge before making a prediction? This is what Liu et al. (2022) propose: generating knowledge to be used as part of the prompt. In particular, how useful is this for tasks such as commonsense reasoning?

Let's try a simple prompt:

				
					Part of golf is trying to get a higher point total than others. Yes or No?

For some LLMs the answer might be Yes. This kind of error reveals the limits of some LLMs on tasks that require more knowledge about the world. If you try it with ChatGPT, however, the answer is the following:

				
					No, that statement is incorrect. In golf, the objective is to complete the course in as few strokes as possible, not to achieve a higher point total. The ultimate goal is to have the lowest score, not the highest. Each stroke is counted, and the player with the fewest strokes at the end of the round is the winner.

If the LLM has gaps in its knowledge of a given context, how can we improve things with knowledge generation? First, we generate a few pieces of “knowledge”:

				
					Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.

Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.

Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.

Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.

Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).

Input: Part of golf is trying to get a higher point total than others.
Knowledge:

The answers reported by Liu et al. (2022) are the following.

Output 1:

				
					The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.

Output 2:

				
					Golf is a precision club-and-ball sport in which competing players (or golfers) use many types of clubs to hit balls into a series of holes on a course using the fewest number of strokes. The goal is to complete the course with the lowest score, which is calculated by adding up the total number of strokes taken on each hole. The player with the lowest score wins the game.

At this point the authors suggest integrating the knowledge to obtain a prediction. We reformat the question in Q/A format to guide the format of the answer.

				
					Question: Part of golf is trying to get a higher point total than others. Yes or No?

Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.

Explain and Answer:
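The two prompts used by this technique (knowledge generation, then knowledge integration) can be assembled programmatically. The following is a minimal sketch that only builds the prompt strings in the same layout as the examples above; the function names `build_knowledge_prompt` and `build_answer_prompt` are hypothetical, and sending the prompts to a model is deliberately left out.

```python
def build_knowledge_prompt(demonstrations, statement):
    """Build the few-shot prompt that asks the model to generate knowledge.

    `demonstrations` is a list of (input, knowledge) pairs.
    """
    blocks = [f"Input: {inp}\nKnowledge: {know}" for inp, know in demonstrations]
    # The final block is left open so the model completes the knowledge.
    blocks.append(f"Input: {statement}\nKnowledge:")
    return "\n\n".join(blocks)


def build_answer_prompt(question, knowledge):
    """Build the second prompt, which integrates the generated knowledge."""
    return f"Question: {question}\n\nKnowledge: {knowledge}\n\nExplain and Answer:"


# Demonstration taken from the prompt above (only one pair, for brevity).
demonstrations = [
    (
        "Greece is larger than mexico.",
        "Greece is approximately 131,957 sq km, while Mexico is approximately "
        "1,964,375 sq km, making Mexico 1,389% larger than Greece.",
    ),
]
statement = "Part of golf is trying to get a higher point total than others."

knowledge_prompt = build_knowledge_prompt(demonstrations, statement)
print(knowledge_prompt)

# Placeholder for the model's completion of `knowledge_prompt`.
generated = "The objective of golf is to play a set of holes in the least number of strokes."
answer_prompt = build_answer_prompt(statement + " Yes or No?", generated)
print(answer_prompt)
```

Keeping the two steps as separate functions makes it easy to generate several pieces of knowledge and try each one in the answer prompt.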

In the paper, different answers are obtained depending on the LLM used. Some are correct, because the model is more confident thanks to the information provided. In other cases the answer is still wrong. But how does ChatGPT handle the original question? This is its answer.

				
					No, that statement is incorrect. In golf, the objective is to complete the course in as few strokes as possible, not to achieve a higher point total. The ultimate goal is to have the lowest score, not the highest. Each stroke is counted, and the player with the fewest strokes at the end of the round is the winner.

As you can see, the model behind ChatGPT was trained on a large amount of data, which lets it answer correctly even in niche contexts and/or ones that depart from the usual trend. Even so, providing knowledge directly in the prompt can come to the rescue on certain tasks and contexts.

Tree of Thoughts (ToT)

As we have seen in this article and the previous one, traditional prompting techniques fall short on more complex tasks that require, for example, more strategy or the exploration of several alternatives. Yao et al. (2023) and Long (2023) proposed Tree of Thoughts (ToT), a generalization of chain-of-thought prompting that encourages the exploration of multiple “thoughts” used as intermediate steps for problem solving with Language Models (LMs).

ToT builds a tree of thoughts, as shown in the figure, where thoughts are language sequences that represent the steps toward solving the problem. This approach lets an LM evaluate its own intermediate progress toward the solution. The LM's ability to generate and evaluate “thoughts” is combined with search algorithms (e.g., breadth-first search and depth-first search) so that thoughts can be explored with lookahead and backtracking.

When using ToT, you need to define the number of candidate thoughts (the most promising ones) and the number of steps the LM must take to reach the solution. In Yao et al. (2023), the Game of 24 is used as a mathematical reasoning task that requires a decomposition into 3 steps, each with an intermediate equation. At each step, the best b=5 candidates are kept.

To run a BFS in ToT for the Game of 24, each candidate step is evaluated on whether the number 24 can be reached through the proposed mathematical operation. Each one is assigned a label among “sure/maybe/impossible”. As the authors state, the aim is to promote correct partial solutions that can be verified by looking ahead a few steps, to eliminate impossible partial solutions based, for example, on the size of the number (“the number is too small/large to reach 24 in the next steps”), and to keep the rest, those labeled “maybe”. Values are sampled 3 times for each step.
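The BFS loop described above (3 steps, keep the best b=5 candidates, evaluate each candidate 3 times) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: `propose` enumerates every arithmetic step instead of asking an LM for candidates, `evaluate` is a brute-force stand-in for the LM evaluator (so it only ever returns "sure" or "impossible", while a real LM would also answer "maybe"), and the numeric weights in `SCORE` are assumed for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Assumed weights used to rank the evaluator's labels.
SCORE = {"sure": 2, "maybe": 1, "impossible": 0}


def propose(state):
    """Yield every state reachable by combining two numbers with one operation."""
    nums, trace = state
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        options = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
                   (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
        if b != 0:
            options.append((a / b, f"{a} / {b}"))
        if a != 0:
            options.append((b / a, f"{b} / {a}"))
        for value, expr in options:
            yield rest + (value,), trace + (f"{expr} = {value}",)


def reachable(nums):
    """Exhaustively check whether 24 can still be reached from `nums`."""
    if len(nums) == 1:
        return nums[0] == 24
    return any(reachable(nxt) for nxt, _ in propose((nums, ())))


def evaluate(state):
    """Stand-in for the LM evaluator: label a partial solution."""
    return "sure" if reachable(state[0]) else "impossible"


def tot_bfs(numbers, n_steps=3, breadth=5, n_eval=3):
    """Breadth-first search over thoughts, keeping `breadth` states per step."""
    frontier = [(tuple(Fraction(n) for n in numbers), ())]
    for _ in range(n_steps):
        candidates = [nxt for state in frontier for nxt in propose(state)]
        # Each candidate is evaluated n_eval times and the scores are summed.
        scored = [(sum(SCORE[evaluate(c)] for _ in range(n_eval)), c)
                  for c in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [c for score, c in scored[:breadth] if score > 0]
    solved = [trace for nums, trace in frontier if nums == (Fraction(24),)]
    return solved[0] if solved else None


for line in tot_bfs([4, 9, 10, 13]) or ["no solution found"]:
    print(line)
```

With an LM in place of `evaluate`, the repeated evaluation matters because the labels vary between samples; here it is kept only to mirror the structure of the search.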

The results reported by the authors show that this technique outperforms the ones we saw earlier. Of course, it requires writing ad hoc code both to submit requests to the model and to evaluate the various responses. You can find an example here.

But what if we want to apply this technique using only a text prompt? Hulbert (2023) proposed a possible prompt structure to answer this question.

An example is the following:

				
					Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is...
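To reuse this prompt with different questions, it can be wrapped in a small template. A minimal sketch, where the names `TOT_PROMPT` and `build_tot_prompt` are hypothetical and the sample question is the one used earlier in this article:

```python
# The prompt text above, with a placeholder for the question.
TOT_PROMPT = (
    "Imagine three different experts are answering this question.\n"
    "All experts will write down 1 step of their thinking,\n"
    "then share it with the group.\n"
    "Then all experts will go on to the next step, etc.\n"
    "If any expert realises they're wrong at any point then they leave.\n"
    "The question is... {question}"
)


def build_tot_prompt(question):
    """Fill the single-prompt ToT template with a concrete question."""
    return TOT_PROMPT.format(question=question.strip())


prompt = build_tot_prompt(
    "When I was 6 my sister was half my age. Now I'm 70 how old is my sister?"
)
print(prompt)
```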
