
Tesla first started making promises about full self-driving vehicles in 2013. Since then, AI models have been trained on millions of miles of driving. Surely that means they've seen every possible situation, and that full self-driving is just around the corner. So where's my self-driving car?
Let’s look at a list of some of the conditions I’ve encountered while driving or being a passenger in a vehicle. (It's a long list. When you think you’ve got the idea, you may want to skip to the next section!):
I’m not a commercial driver, but I’ve encountered all except one of these at least once since I started driving. Each problem is unique. Each problem is less about the mechanics of driving and more about understanding the wider environment and taking the appropriate action. Some of these are hard problems even for a human (ever had to negotiate which way you should be turning with a police officer directing traffic? Ever had to negotiate your way past a police road block? Is it better to hit a deer or drive off the road into a ditch?)
Self-driving trains work because they operate in a highly constrained environment. Problems requiring human judgement have mostly been engineered out of the system with fences, barriers, etc. Trains travel fast, can’t stop quickly, and many lives have been lost in previous accidents. Multiple redundant safety systems are incorporated to reduce the risk of collision with another train.
A real full self-driving car needs to work in an unconstrained environment, where the workings of the outside world need to be understood and taken into account. That requires a degree of general intelligence which no autonomous vehicle is likely to have in the foreseeable future. Expect driver assist, not driver replacement. Self-driving taxis will therefore continue to require occasional remote human intervention, whatever the grandiose promises being made to investors.
The AI model for self-driving cars may be able to train on common conditions — but it’s the unforeseen uncommon conditions that will kill you.
Stay up to date with the free Risky Thinking Newsletter.

Santa looked exhausted. Christmas always takes a lot out of him, but normally he has recovered by now.
"So what happened?" I asked him.
He stared at me, then whispered the word as if saying it out loud would cause his troubles to return: “ChatGPT”.
Not ChatGPT specifically, he quickly explained. But the idea of ChatGPT. That was the cause of the series of disasters that had almost caused Christmas not to happen.
"Our IT operation here is one of the largest on the planet. We have to determine who deserves presents, handle vast amounts of correspondence in many languages, and handle the logistics of sourcing, constructing, and delivering items from all over the world.
"Our elves have always been at the forefront of technology. They have to be. And when we read about the recent advances in Large Language Models (LLMs), we immediately looked for ways AI could be used to streamline our operations.
"Language translation has always been an issue for us. Wouldn’t it be great if our elves could work in their native language, with machine translation to and from other languages? That would solve so many problems. Do you know how difficult it is to find an elf who is able to read letters written in Xhosa or Mirandese? Our elves would even be able to say QISmaS DatIvjaj ‘ej DIS chu’ DatIvjaj wih or Ĝojan Kristnaskon.
"Then we thought about the Naughty or Nice assessment. We have to handle billions of reports to decide whether people are naughty or nice. In AI terms, that’s a problem known as “sentiment analysis”.
"We also considered using AI to help our Human Resources department. Perhaps we could evaluate all our elf workers with a simple AI interview. That would save our managers a lot of effort and improve consistency in assessments.
“Finally there’s public relations. Our brand looks old. We haven’t done much to modernize our image since our questionable tie-up with Coca-Cola in the 1930s. Some people think we should project a more modern image.”
“It all sounds very ambitious”, I remarked, “but it's quite conservative compared to some of the proposals I’ve been hearing recently. So what went wrong?”
He sighed. “Where shall I start? It wasn’t like we didn’t prototype things. But although we did pilot studies, things started to go wrong when we moved to production”.
“Perhaps we should start with the Naughty and Nice list?”
“Yes, that one’s definitely the biggie. The Naughty or Nice list drives our entire operation. Easy, we thought. We have over a thousand years of training data - written reports made by our agents around the world, each helpfully tagged with a classification of Naughty or Nice. We immediately started digitizing our archive and feeding the results into our learning algorithm. But when we started testing on real data we realized we had a problem. Does a report of showing too much skin in public mean you are Naughty or Nice, or is it neutral? If you often hang out with a group of naughty people, does that make you naughty too? It depends upon your sex, your status, and where and when you are. It’s easy to create accidental bias if you don’t understand local social norms. We found that by combining all the reports we’d created a monster of a model which was both sexist and racist, with a penchant for Victorian moral values. That’s probably due to the excessive number of written reports from that era. By the time we’d realized the approach wasn’t viable - world-wide historical data couldn’t provide the situation-specific classification we needed - we’d wasted a lot of effort. It took a lot of manual processing to get things back on track. If you didn’t get a Christmas present or lump of coal last year, that might be the reason.”
“So the Naughty or Nice list didn’t go so well. How did using large language models for email work out? Machine translation is getting pretty good now, isn’t it?”
Santa looked me straight in the face. “You realize how little is written in Elvish, don’t you? There’s almost nothing published in modern Elvish, just a few poems composed by Tolkien, who wasn’t even a native speaker. We quickly gave up on machine translation into modern Elvish. But even though they had been told not to, some elves started using ChatGPT to write email to our suppliers. And since the output was in a foreign language, often they didn’t read the email before sending it. Apparently the language model we were using had been trained on online reviews. It couldn’t explain why, but it started inserting plausible reasons for delaying or avoiding payments into outgoing emails. We discovered the problem when a supplier received an email complaining about the poor audio quality of Christmas Trees. Shortly afterwards, we falsely accused Apple of making fake Apple products. We had to fire a few elves over that one. Once our elves understood the possibility of AI hallucinations and what might happen if they didn’t check the output, it didn’t happen again.”
“You said you tried using large language models for HR?”
“Yes. Somebody in our HR department thought it would be a good idea to use a widely available LLM to generate our annual employment reviews. We’re not like most organizations. It generated bland questionnaires and tried to set our elves quite unreasonable objectives. “Where do you see yourself five years from now?” isn’t something to ask an elf working in the toy factory at the North Pole. Asking an elf about their “personal growth objective” is particularly insulting. People accept that somebody can make the occasional mistake, but with the sudden burst of AI-assisted mistake making, we nearly had a strike on our hands.”
“Surely you had some successes?”
“Public relations and branding. That mostly worked. As you know, our brand is very important to us. Think what the world would be like if everybody still thought about Krampus at Christmas! We used generative AI to come up with some new ideas, something that is very hard when you’ve been doing the same thing for hundreds of years. We came up with a few good ideas which unfortunately I can’t tell you about as we prefer to run our PR campaigns in stealth mode. We also tried using generative AI to come up with some new images. That didn’t work out so well. Initially our creatives were worried about their jobs. But when they discovered that the generative AI network we were using had an obsession with buxom females in fantasy landscapes… Perfect, perhaps, if our brand was aimed at teenage boys. But we’re family-oriented. Their jobs are definitely safe for another year.”
Santa looked quickly to one side. “I have to go shortly. Do you have any other questions?”
“Can you tell me what I’ll be getting for Christmas next year?” It was worth a try.
“As a large language model I only have knowledge of the world before last December. Is there anything else I can help you with today?”
“You’re pretending to be an AI generated Santa so you don’t have to answer my question, aren’t you?”
Santa smiled for the first time and guffawed loudly. “Ho, ho, ho” he roared. “I bet you didn’t think of using Artificial Intelligence as an excuse to avoid questions! But trust me, you really don’t want to know the answer.”
And at that point Santa closed the connection.
My interest in accurate timekeeping probably dates from my school days, when there were serious bragging rights to be earned from having an accurate watch which exactly matched the beeps of the Greenwich Time Signal on BBC Radio. Therefore, when a YouTube channel I followed showed how to create your own network time server using a cheap ESP32 micro-controller, I thought about building it. So I examined the AI-generated example code.
I’ve tried to keep this at a high level, but to understand the problems you have to understand some of the details - and that is partly the point.
The code has to do two things:
The AI-generated code:
The code “works”, in that it compiles and would give a syntactically correct reply if sent an NTP packet. It looks plausible.
But the code is wrong. Very wrong. And looking at how it is wrong gives some indication of the dangers of “vibe” coding.
I think you get the idea.
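For a sense of the details involved: here is a minimal sketch of a correct NTP server reply, written by me in Python for readability rather than the ESP32’s C++. The stratum, precision, and reference ID values are illustrative assumptions for a GPS-disciplined stratum-1 server, and the 1900-based NTP epoch offset is exactly the kind of detail that is easy to get wrong while still producing plausible-looking packets.

```python
import struct

# Assumption: a stratum-1, GPS-disciplined server. The NTP timescale counts
# seconds from 1900-01-01, Unix from 1970-01-01; confusing the two epochs
# is a classic source of plausible-looking but wrong replies.
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def ntp_timestamp(unix_time: float) -> tuple[int, int]:
    """Split a Unix time into NTP seconds and a 32-bit fraction."""
    seconds = int(unix_time) + NTP_EPOCH_OFFSET
    fraction = int((unix_time % 1) * (1 << 32))
    return seconds, fraction

def build_reply(request: bytes, now: float) -> bytes:
    """Build a minimal 48-byte NTP server reply (mode 4)."""
    li_vn_mode = (0 << 6) | (4 << 3) | 4    # leap=0, version=4, mode=4 (server)
    secs, frac = ntp_timestamp(now)
    return struct.pack(
        "!BBBb11I",
        li_vn_mode, 1, 0, -20,              # stratum 1, poll 0, precision 2^-20
        0, 0,                               # root delay, root dispersion
        0x47505300,                         # reference ID "GPS\0" (illustrative)
        secs, frac,                         # reference timestamp
        *struct.unpack("!2I", request[40:48]),  # originate = client's transmit
        secs, frac,                         # receive timestamp
        secs, frac,                         # transmit timestamp
    )
```

A real server would also bind UDP port 123, timestamp the packet on receipt rather than reuse one time for every field, and report root dispersion honestly - all places where “looks plausible” and “is correct” diverge.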
But perhaps, I thought, I’m being unfair to the human joint-author of this code. He doesn’t claim to be an expert programmer. I therefore experimented with some “vibe coding” myself using Google’s gemini-cli for comparison. I tried a number of simple Python coding tasks, from adding functionality to an existing script to generating a simple command line application in Python. The results were mixed:
However, it was a great deal of fun and it felt as if it was productive…
The “vibe” code worked except:
But in the GPS case the code did look plausible and would probably pass some simple tests. Apart from the off-by-one second error, the faults would be unpredictable and intermittent - i.e. the type of faults which are most difficult and expensive to find and diagnose, especially after deployment.
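The faulty code isn’t shown here, so the following is purely my illustration of how an off-by-one (or off-by-eighteen) second error creeps into GPS time handling. The constants are real, well-known values; the scenario of a device reporting GPS week and seconds-of-week is an assumption.

```python
# Hypothetical illustration - not the code discussed above.
GPS_EPOCH_UNIX = 315964800      # 1980-01-06 00:00:00 UTC (the GPS epoch) as Unix time
SECONDS_PER_WEEK = 7 * 24 * 3600
GPS_UTC_LEAP_SECONDS = 18       # correct as of 2017; changes when new leap seconds are announced

def gps_to_unix(week: int, seconds_of_week: float,
                leap_seconds: int = GPS_UTC_LEAP_SECONDS) -> float:
    """Convert GPS week + seconds-of-week to Unix (UTC) time.
    GPS time is ahead of UTC by an integer number of leap seconds;
    omitting leap_seconds, or hard-coding a stale value, produces a small
    constant error that still passes a casual glance at the clock."""
    return GPS_EPOCH_UNIX + week * SECONDS_PER_WEEK + seconds_of_week - leap_seconds
```

An intermittent variant of the same trap: an NMEA sentence describes the fix that preceded it, so pairing sentences naively with PPS pulses can tag a second boundary with the wrong second.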
The code would probably fool an inexperienced programmer or someone who was unfamiliar with the hardware specifications. And that’s the problem. Does the person doing “vibe coding” actually know enough to know what they don’t know?

There are two business continuity plans at work here: the VoIP provider’s and the data center’s.
From the VoIP provider’s perspective this should have been a quick (if not automated) switch-over to a different data center with partial (if not full) functionality. It’s clear that the possibility of a complete data center failure was either (a) forgotten, (b) ignored, or (c) judged too costly to fully mitigate given the probability of it happening.
Whatever the reason, it doesn’t make them look good. And the fact that I only found out about their problems by encountering them myself rather than being warned in a helpful customer email or in a statement on their website does little to inspire confidence.
Even a rudimentary business continuity plan should include warning customers about service problems.
So the VoIP provider is handling the outage badly. They are cheap and full-featured, but I can no longer say that they are reliable. I won’t recommend them any more, and will be looking for a suitable replacement.
But what of the data center? Why isn’t it back yet?
Fortunately (if you know the right place to look and the right discussion board to follow) you can find out that the data center they used actually cared about notifying its customers. Here’s what happened:
Power remains off at our data center in REDACTED per the local fire marshal.
We have had an electrical failure with one of our redundant UPS’ that started to smoke and then had a small fire in the UPS room. The fire department was dispatched and the fire was extinguished quickly. The fire department subsequently cut power to the entire data center and disabled our generators while they and the utility verify electrical system. We have been working with both the fire department and the utility to expedite this process.
We are currently waiting on the fire marshal and local utility to re-energize the site. We are completely dependent upon their inspection and approval. We are hoping to get an update that we can share in the next hour.
At the current time, the fire department is controlling access to the building and we will not be able to let customers in.
And this is what business continuity plans often fail to take into account when considering the risk of a small fire. If it’s electrical, the priority of the fire department and the local electrical utility is safety. The fire may have been small. The fire may have been swiftly extinguished. But the fire marshal’s job is to ensure safety. That means shutting off all electrical systems and not taking any chances.
And when everything is shut down, and after repairs are made, it takes a surprisingly long time to switch everything back on and get it working correctly. Here’s an update from twenty-four hours later:
The REDACTED data center remains powered down at this time per the fire marshal. We are continuing with our cleanup efforts into the evening and working overnight as we make progress towards our 9AM EDT meeting time with the fire marshal and electrical inspectors in order to reinstate power at the site.
Once we receive approval and utility is restored, we will turn up critical systems. This will take approximately 5 hours. After the critical systems are restored, we will be turning up the carriers and then will start to turn the servers back on.
The fire marshal has requested replacement of the smoke detectors in the affected area as well as a full site inspection of the fire life safety system prior to allowing customers to enter the facility. Assuming that all goes as planned, the earliest that clients will be allowed back into the site to work on their equipment would be late in the day Wednesday.
The points to note here are that:
TL;DR? When planning remember that even small fires with limited damage can have major consequences.
I received an email survey today. Not surprising. Many companies send them out. This one was from Southwest Airlines, and it offered me a reward of $100 for completing their survey.
Except it wasn’t. I knew this immediately as I have never flown on Southwest Airlines.
This was a phishing email directing me to a plausible-sounding survey website. The fraudsters hadn’t bothered to get a free SSL certificate for their plausible domain, perhaps because newly issued certificates appear in public Certificate Transparency logs, which can trigger alerts.
So should I warn Southwest Airlines? I went to their website. Was there an email address or form to fill in to let them know about the website “southwestairlinessurvey.today” phishing their customers? No there wasn’t. Plenty of ways to report lost baggage, but no obvious way to report an issue to their security team - assuming they have one. I could have found out how long their customer service queues were by calling their 1-800 number, and found out whether their customer service agents knew how to contact their security team, but I didn’t.
I’m neither a customer nor a shareholder, so in a literal sense it’s none of my business. I will do things that are easy to do if it helps make society better. But I won’t put in a lot of effort to help a company that doesn’t make it easy to help them. I suspect I’m not alone in this.
If you care about people phishing your customers, make it easy to report it: otherwise you will only hear about it much later, from disgruntled customers who believe you cheated them out of rewards, goods, or services.