38.1 Agents that generate text: what’s the underlying decision-making problem?
In § 34.1 we saw that present-day Large Language Models determine their belief about what the next word or token should be based only on the sequence of words seen so far, and on the frequencies of such sequences in a huge collection of texts. These beliefs do not take into account the possible future outcomes of the choice of words, as is instead the case in human conversation. We managed to use our Optimal Predictor Machine in the same way, in a simplified setting.
Large Language Models do not output probabilities, however: they output words. So at every step they are effectively choosing one of the possible words about which they calculated their beliefs. This is a decision, and to be optimal and self-consistent it should be based on some outcomes and their utilities.
What are the outcomes and utilities underlying word-choice in Large Language Models? This is still an open question. The approaches followed so far in the literature have been based more on intuition and "playing around" than on framing the problem in a systematic way. This means that there is a lot of room for an AI engineer to bring about a better understanding and major improvements.
Let’s consider our OPM agent used as a “small language model” in § 34.2. It was only calculating degrees of belief about the \(n\)th token of a string of \(n\) tokens. First of all let’s ask again: beliefs about what? We could say: the belief that this \(n\)th token is the correct one, in this particular sequence. Keep in mind that this is a suspicious point of view; can we really say that there’s a “correct” token?
If we want our OPM agent to also choose one of the possible tokens, we need to find the appropriate set of outcomes and their utilities for this decision problem.
Exercise 38.1
On your own or in a group, think about the problem above.
What kind of meaningful outcomes are there in this problem?
Can they be easily assessed? Do they depend on the choice of the present token alone, or on future ones as well?
What are the utility values for the outcomes? How can they be assessed?
Can we find alternative points of view about outcomes and utilities – points of view that do not reflect natural language but may make sense somehow in the present context?
38.2 A tentative decision-making point of view
A tentative and somewhat vague point of view is the following: in the long run, the agent should generate text that "looks natural". Now if we assign a given utility, say \(+1\), to the "correct" choice of token and \(0\) to all others, then at each step the agent should choose the token with the highest probability. This, however, eventually leads to circular repetition: the choice is then a deterministic function of the preceding \(n-1\) tokens, so as soon as a string of \(n-1\) tokens appears a second time, the text from that point on repeats itself. Let's see an example of this phenomenon with our OPM agent.
## Load main functions
source('tplotfunctions.R')
source('OPM_nominal.R')
source('SLMutilities.R')

## Set seed for reproducibility
set.seed(700)

## Prepare metadata and training data
ngramfiles <- preparengramfiles(
    inputfile = 'texts/human_rights.txt',
    outsuffix = 'rights',
    n = 3,
    maxtokens = Inf,
    numbersasx = TRUE)
We give an initial prompt of two tokens: ARTICLE, x. The agent then generates text by repeating the following steps:

1. We let the agent calculate the degrees of belief about the next token.

2. The next token is chosen as the one having the highest probability. This corresponds to a decision process with a unit utility matrix: correct token choices have utility \(+1\), and wrong choices \(0\) (see the short sketch after this list).

3. The first token is discarded, the second becomes the first, and the last generated token becomes the second.

4. Repeat from 1.
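Step 2 can be made concrete with a small numerical sketch. The beliefs below are made-up numbers, not produced by the OPM agent; the point is only that with a unit utility matrix, maximizing expected utility amounts to choosing the most probable token:

## Hypothetical beliefs about the next token (made-up numbers, for illustration)
probs <- c(THE = 0.5, A = 0.3, AND = 0.2)

## Unit utility matrix: +1 if the chosen token is the correct one, 0 otherwise
utilities <- diag(length(probs))
dimnames(utilities) <- list(chosen = names(probs), correct = names(probs))

## Expected utility of each possible choice
expectedutilities <- drop(utilities %*% probs)

## The choice with maximal expected utility...
names(which.max(expectedutilities))

## ...is the same as the most probable token
names(which.max(probs))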
## Starting tokens
word1 <- 'ARTICLE'
word2 <- 'x'

## Convenience variable to wrap text
textlength <- 0

for(i in 1:100){
    ## Print the starting tokens if first iteration
    if(i == 1){ cat('\n', word1, word2, '') }

    ## Calculate beliefs about next token
    probs <- infer(agent = opmSLM,
        predictor = list(word1 = word1, word2 = word2),
        predictand = 'word3')

    ## Decision: next token is the one with max prob.
    word3 <- names(which.max(probs))

    ## Wrap text if it'll get too long
    textlength <- textlength + nchar(word3) + 1
    if(textlength > 50){
        cat('\n')
        textlength <- 0
    }

    ## Print chosen token
    cat(word3, '')

    ## Use last two tokens for new prediction
    word1 <- word2
    word2 <- word3
}
ARTICLE x EVERYONE HAS THE RIGHT TO FREEDOM OF OPINION AND
EXPRESSION ; THIS RIGHT INCLUDES FREEDOM TO CHANGE HIS
RELIGION OR BELIEF , AND THE RIGHT TO FREEDOM OF OPINION
AND EXPRESSION ; THIS RIGHT INCLUDES FREEDOM TO
CHANGE HIS RELIGION OR BELIEF , AND THE RIGHT TO FREEDOM
OF OPINION AND EXPRESSION ; THIS RIGHT INCLUDES
FREEDOM TO CHANGE HIS RELIGION OR BELIEF , AND THE RIGHT
TO FREEDOM OF OPINION AND EXPRESSION ; THIS RIGHT
INCLUDES FREEDOM TO CHANGE HIS RELIGION OR BELIEF , AND
THE RIGHT TO FREEDOM OF OPINION AND EXPRESSION ; THIS
RIGHT INCLUDES FREEDOM TO
As you can see, the agent started looping at "THE RIGHT". The text does not look natural, so from this point of view the set of decisions and utilities we chose for the agent is not appropriate.
Let’s try to define a little more precisely what we mean by “look natural”.
One condition is the absence of text loops like the one above. The text should be somewhat unpredictable to us.
Also, the frequencies with which 3-grams appear in the output text should reflect those of natural language – like the frequencies in the texts used to train the agent. A simple way to check this is sketched below.
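Here is a minimal sketch of how this second condition could be checked; the function count3grams() below is not part of the course code, it is only an illustration. It counts the 3-grams in a vector of tokens, so that the most frequent 3-grams of a generated text can be compared with those of the training text:

## Count the 3-grams in a character vector of tokens (illustration only)
count3grams <- function(tokens){
    n <- length(tokens)
    grams <- paste(tokens[1:(n - 2)], tokens[2:(n - 1)], tokens[3:n])
    sort(table(grams), decreasing = TRUE)
}

## Example use, assuming `trainingtext` and `generatedtext` are
## single strings of space-separated tokens (hypothetical variables):
## head(count3grams(strsplit(trainingtext, ' ')[[1]]))
## head(count3grams(strsplit(generatedtext, ' ')[[1]]))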
These conditions suggest that we use an “unpredictable decision”, in the sense discussed in § 36.3.
Let’s introduce the following decision:
“Unpredictably select and output the next token according to its belief”.
In the long run, making this decision will produce text that is unpredictable and whose 3-gram frequencies reflect those of the training texts, because the agent bases its beliefs on those frequencies (though beliefs and frequencies are not exactly equal). Therefore it seems that, in the long run, this decision will lead to the highest utility.
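As a minimal sketch, such an unpredictable choice can be implemented in R with the sample() function, assuming probs is a named vector of beliefs like the one returned by infer() in the loop above. Exercise 38.2 asks you to check whether generatetext() does essentially the same:

## Unpredictable decision (sketch): draw the next token at random,
## with probabilities equal to the agent's beliefs
word3 <- sample(names(probs), size = 1, prob = probs)

## In the generation loop above, this line would replace the
## deterministic choice word3 <- names(which.max(probs))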
The function generatetext() in the file SLMutilities.R applies the heuristic strategy above and outputs the resulting text. We saw an example output in § 34.5. Let’s see another example output:
text <- generatetext(agent = opmSLM,
    prompt = c('ARTICLE', 'x'),
    stopat = 70,
    online = FALSE)

wrapprint(text, wrapat = 50)
ARTICLE x EVERYONE HAS THE RIGHT TO JUST AND
FAVOURABLE CONDITIONS OF WORK AND TO SHARE IN SCIENTIFIC
ADVANCEMENT AND ITS BENEFITS. EVERYONE HAS THE RIGHT TO
FREEDOM OF MOVEMENT AND RESIDENCE WITHIN THE BORDERS OF
EACH STATE. EVERYONE HAS THE RIGHT TO BE PRESUMED
INNOCENT UNTIL PROVED GUILTY ACCORDING TO LAW IN A
DEMOCRATIC SOCIETY. THESE RIGHTS AND FREEDOMS, EVERYONE
SHALL BE GIVEN TO THEIR ...
We’re approximating
The discussion above is not a rigorous application of the principle of maximal expected utility. We did not properly list the decisions, the outcomes, or the utility values. The reasoning was purely heuristic and intuitive.
For this reason it is quite possible that the strategy adopted is not optimal.
Exercise 38.2
Try to formalize our heuristic reasoning above and to derive the strategy rigorously, from the principle of maximal expected utility. (This is not an easy task, but why not try?)
Reverse-engineer the function generatetext() and verify that it indeed implements the decision discussed above.
Try generating text with the OPM agent using other prompts. You should notice that sometimes the output text suddenly becomes gibberish, or is gibberish right from the start. It is also possible that gibberish output suddenly becomes more coherent again.
Can you explain why these transitions between sensible and nonsensical text occur?
Use other texts to train the OPM agent and explore its output. You can prepare texts yourself, or use the ones in the directory code/texts. In size order: