Party tricks are a bad way to evaluate ChatGPT

People talk about ChatGPT and similar large language models as being “trained on the Internet as a whole”, which makes it seem even more magical when one manages to do something ridiculous like knock out a rap battle between George Carlin and Julius Caesar, or explain quantum physics in haiku. These are essentially party tricks, but they feel more impressive than something useful like summarizing a document or writing a cover letter, because they seem like a good test of generalization. After all, what are the odds that the model happened to be trained on something so random?

Unfortunately the answer seems to be “a lot higher than you’d think”, and it’s kind of alarming how often I come up with a “novel” task to give to ChatGPT or Stable Diffusion, only to find something close to it with a quick web search. Turns out the Internet is really vast (who knew?), and apparently I’m also not nearly as creative as I think I am. Add the fact that, starting with GPT-3.5, OpenAI has included human-written answers to commonly-asked prompts in its training data, and it becomes especially hard to figure out whether what the model is doing is magic or “just” clever interpolation.
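
You can run a crude version of this check yourself. The sketch below is a toy, not anything OpenAI actually does: given a “novel” prompt and a couple of made-up scraped pages, it scores each page by the fraction of the prompt’s 5-grams it shares. N-gram overlap like this is the standard rough test for training-set contamination (the GPT-3 paper used 13-grams for its contamination analysis); everything here, including the corpus, is illustrative.

```python
import re

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercase, tokenize on word characters, and return the set of n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(prompt: str, document: str, n: int = 5) -> float:
    """Fraction of the prompt's n-grams that also appear in the document."""
    prompt_grams = ngrams(prompt, n)
    if not prompt_grams:
        return 0.0
    return len(prompt_grams & ngrams(document, n)) / len(prompt_grams)

# Hypothetical corpus standing in for "a quick web search".
prompt = "a rap battle between George Carlin and Julius Caesar"
corpus = {
    "improv-review": "Last night's improv show featured a rap battle between "
                     "George Carlin and Julius Caesar, judged by the crowd.",
    "poetry-howto": "A haiku is a three-line poem with a 5-7-5 syllable pattern.",
}

# Rank pages by how much of the "novel" prompt they already contain.
for name, doc in sorted(corpus.items(),
                        key=lambda kv: -overlap_score(prompt, kv[1])):
    print(f"{name}: {overlap_score(prompt, doc):.2f}")
```

Here the “novel” rap-battle prompt scores a perfect 1.0 against the first page, which is exactly the deflating experience I keep having: the random thing I dreamed up is already out there, nearly word for word.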