2024-09-17

Google Docs provides language-model driven autocorrect, where it highlights “unlikely” strings of words and offers to replace them for you.

It makes a lot of sense. But it’s also a bit ironic, given that statistical language models started with Shannon’s information theory, which identifies “information” with surprisal: the less probable a word is in its context, the more bits it carries. You could imagine that if you just keep right-clicking the text and accepting the suggestions, eventually Google will rewrite it for you into something carrying zero bits of information.
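To make “surprisal” concrete, here is a minimal sketch. The word list and probabilities are made up for illustration (they are not taken from any real model, and certainly not from Google’s); the only real thing here is Shannon’s formula, surprisal = log2(1/p).

```python
import math


def surprisal_bits(p: float) -> float:
    """Shannon surprisal, in bits, of an event with probability p."""
    return math.log2(1.0 / p)


# Hypothetical probabilities a language model might assign to the
# candidates for one word slot. Invented numbers, for illustration only.
candidate_probs = {
    "pang": 0.5,
    "feeling": 0.25,
    "flicker": 0.125,
    "thunderstorm": 0.125,
}

for word, p in candidate_probs.items():
    print(f"{word}: {surprisal_bits(p):.1f} bits")

# An autocorrect that always suggests the likeliest candidate picks
# the word that carries the fewest bits.
most_likely = max(candidate_probs, key=candidate_probs.get)
print("suggested:", most_likely)
```

In this toy table the suggested word is also the least informative one, and a word the model was already certain of (p = 1) would carry exactly zero bits.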

That’s actually what LLM text generation tries to optimize for. And I believe you can notice it!

Take an LLM-written poem. There is no “idea” in it except what is forced on it by the prompt; the text generation algorithm specifically tries to put as few bits as possible into the text. The more clichéd the poem is, the closer it is to optimal.

Sometimes humans fall into the same trap. I keep thinking about the editor who changed “a feeling of jealousy” into “a pang of jealousy”, replacing the phrase with a cliché and deliberately erasing the bits of information contained in it.