Santa's AI Problems

Santa looked exhausted. Christmas always takes a lot out of him, but normally he has recovered by now.
"So what happened?" I asked him.
He stared at me, then whispered the word as if saying it out loud would cause his troubles to return: “ChatGPT”.
Not ChatGPT specifically, he quickly explained. But the idea of ChatGPT. That was the cause of the series of disasters that had almost caused Christmas not to happen.
"Our IT operation here is one of the largest on the planet. We have to determine who deserves presents, handle vast amounts of correspondence in many languages, and manage the logistics of sourcing, constructing, and delivering items from all over the world.
"Our elves have always been at the forefront of technology. They have to be. And when we read about the recent advances in Large Language Models (LLMs), we immediately looked for ways AI could be used to streamline our operations.
"Language translation has always been an issue for us. Wouldn’t it be great if our elves could work in their native language, with machine translation to and from other languages? That would solve so many problems. Do you know how difficult it is to find an elf who can read letters written in Xhosa or Mirandese? Our elves would even be able to say QISmaS DatIvjaj ‘ej DIS chu’ DatIvjaj wih or Ĝojan Kristnaskon.
"Then we thought about the Naughty or Nice assessment. We have to handle billions of reports to decide whether people are naughty or nice. In AI terms, that’s a problem known as “sentiment analysis”.
"We also considered using AI to help our Human Resources department. Perhaps we could evaluate all our elf workers with a simple AI interview. That would save our managers a lot of effort and improve consistency in assessments.
“Finally there’s public relations. Our brand looks old. We haven’t done much to modernize our image since our questionable tie-up with Coca-Cola in the 1930s. Some people think we should project a more modern image.”
“It all sounds very ambitious”, I remarked, “but it's quite conservative compared to some of the proposals I’ve been hearing recently. So what went wrong?”
He sighed. “Where shall I start? It wasn’t like we didn’t prototype things. But although we did pilot studies, things started to go wrong when we moved to production”.
“Perhaps we should start with the Naughty and Nice list?”
“Yes, that one’s definitely the biggie. The Naughty or Nice list drives our entire operation. Easy, we thought. We have over a thousand years of training data - written reports made by our agents around the world, each helpfully tagged with a classification of Naughty or Nice. We immediately started digitizing our archive and feeding the results into our learning algorithm.

“But when we started testing on real data we realized we had a problem. Does a report of showing too much skin in public mean you are Naughty or Nice, or is it neutral? If you often hang out with a group of naughty people, does that make you naughty too? It depends upon your sex, your status, and where and when you are. It’s easy to create accidental bias if you don’t understand local social norms. We found that by combining all the reports we’d created a monster of a model, one that was both sexist and racist, with a penchant for Victorian moral values. That’s probably due to the excessive number of written reports from that era.

“By the time we’d realized the approach wasn’t viable - worldwide historical data couldn’t provide the situation-specific classification we needed - we’d wasted a lot of effort. It took a lot of manual processing to get things back on track. If you didn’t get a Christmas present or a lump of coal last year, that might be the reason.”
“So the Naughty or Nice list didn’t go so well. How did using large language models for email work out? Machine translation is getting pretty good now, isn’t it?”
Santa looked me straight in the face. “You realize how little is written in Elvish, don’t you? There’s almost nothing published in modern Elvish, just a few poems composed by Tolkien, who wasn’t even a native speaker. We quickly gave up on machine translation into modern Elvish.

“But even though they had been told not to, some elves started using ChatGPT to write email to our suppliers. And since the output was in a foreign language, often they didn’t read the email before sending it. Apparently the language model we were using had been trained on online reviews. It couldn’t explain why, but it started inserting plausible reasons for delaying or avoiding payment into outgoing emails. We discovered the problem when a supplier received an email complaining about the poor audio quality of Christmas trees. Shortly afterwards, we falsely accused Apple of making fake Apple products. We had to fire a few elves over that one. Once our elves understood the possibility of AI hallucinations and what might happen if they didn’t check the output, it didn’t happen again.”
“You said you tried using large language models for HR?”
“Yes. Somebody in our HR department thought it would be a good idea to use a widely available LLM to generate our annual employment reviews. But we’re not like most organizations. It generated bland questionnaires and tried to set our elves quite unreasonable objectives. ‘Where do you see yourself five years from now?’ isn’t something to ask an elf working in the toy factory at the North Pole. Asking an elf about their ‘personal growth objective’ is particularly insulting. People accept that somebody can make the occasional mistake, but with the sudden burst of AI-assisted mistake-making, we nearly had a strike on our hands.”
“Surely you had some successes?”
“Public relations and branding. That mostly worked. As you know, our brand is very important to us. Think what the world would be like if everybody still thought about Krampus at Christmas! We used generative AI to come up with some new ideas, something that is very hard when you’ve been doing the same thing for hundreds of years. We came up with a few good ideas which, unfortunately, I can’t tell you about, as we prefer to run our PR campaigns in stealth mode.

“We also tried using generative AI to come up with some new images. That didn’t work out so well. Initially our creatives were worried about their jobs. But when they discovered that the generative AI model we were using had an obsession with buxom females in fantasy landscapes… Perfect, perhaps, if our brand were aimed at teenage boys. But we’re family-oriented. Their jobs are definitely safe for another year.”
Santa looked quickly to one side. “I have to go shortly. Do you have any other questions?”
“Can you tell me what I’ll be getting for Christmas next year?” It was worth a try.
“As a large language model I only have knowledge of the world before last December. Is there anything else I can help you with today?”
“You’re pretending to be an AI generated Santa so you don’t have to answer my question, aren’t you?”
Santa smiled for the first time and guffawed loudly. “Ho, ho, ho” he roared. “I bet you didn’t think of using Artificial Intelligence as an excuse to avoid questions! But trust me, you really don’t want to know the answer.”
And at that point Santa closed the connection.