Reflections on Vibe Coding

One of my hobbies is playing with micro-controllers: low-cost, single-chip computers. I follow an interesting YouTube channel on the subject. Recently the channel’s creator used vibe coding to create example code. How well did it work?

An Unhappy NTP Server

My interest in accurate timekeeping probably dates from school days, when there were serious bragging rights to be earned from having an accurate watch which exactly matched the beeps of the Greenwich Time Signal on BBC Radio. So when a YouTube channel I follow showed how to create your own network time server using a cheap ESP32 micro-controller, I thought about building one, and examined the AI generated example code.

I’ve tried to keep this at a high level, but to understand the problems you have to understand some of the details, and that is partly the point.

The code has to do two things:

Read the time from the GPS/GNSS module and set the local clock to match it.
Wait for an NTP query packet on the network interface, and construct a reply which includes the time when the query was received and the time when the reply was sent back. (Using these, a client can calculate round-trip delays and decide how to adjust its clock to best match the server’s clock.)

The AI generated code:

Created a task which waited for a report from the GPS module, parsed it to get the time, waited in a “busy loop” for a variable to be set indicating a new second had started (see below), then set the local time.
Created a task which waited for an NTP packet on the network interface, constructed a reply including the time (from the local clock) when the packet was received and the time (from the local clock) when the reply was prepared, and sent the reply.
Created a mutex - a synchronization primitive that can be used by a program to prevent two tasks trying to read and write the same variables at the same time.
Created an interrupt handler which, every time the GPS module indicated the start of a new second, set a variable to true.

The code “works”, in that it compiles and would give a syntactically correct reply if sent an NTP packet. It looks plausible.

But the code is wrong. Very wrong. And looking at how it is wrong gives some indication of the dangers of “vibe” coding.

Although the code declares a mutex to prevent the two tasks from accessing the same variables at the same time, it is never used. Where is task synchronization needed? The shared variables aren’t in this code; they’re inside the system time library. (This is possibly why the AI generated code misses them.) What happens if one task is setting the time while the other is reading it? The result is undefined. Will this happen often? No. It might show up in a long test, but will probably not show up in any simple test.
What happens on power-up, before the GPS module has received its first fix? Unless we add a real-time clock, the local time will start shortly after 1 January 1970, which is time zero (the uninitialized value) for the time libraries. This time will be served on the network until the GPS gets its first fix, possibly minutes later.
The GPS code waits for a variable to be set by the interrupt handler in a busy loop. There are a number of problems here:
- The variable is not declared (in C++) as volatile. This means that a compiler can assume its value is not being changed by another task and keep its value in a register rather than reading it from memory. So the time task might loop indefinitely depending upon the compiler being used and the options given to the compiler.
- Simple busy loops, where the code repeatedly tests whether a variable is true, are a bad thing: the processor is prevented from doing useful work while the task is looping. This may cause some unnecessary delays in the other task.
- There’s an off-by-one error in the timing. The task waits for the start-of-second variable to become true after reading the time from the GPS module, then uses the time previously read from the GPS module. This is the start of the next second, not the start of the second whose time was read.
- GPS signals are not perfect. They depend upon moving satellites and can be affected by nearby objects etc. A GPS receiver can lose lock, in which case the next start-of-second can be seconds (or minutes or hours) after the previous one. This results in the clock being set incorrectly until a new signal is received.
The interrupt service routine didn’t use the correct type of memory, and didn’t use the best method to communicate with the rest of the program. Micro-controllers have a confusing array of different types of memory, and code that runs in interrupt handlers should be placed in a specific type of memory. There’s also a specific primitive an interrupt handler should use to communicate with a task (which would eliminate the busy-wait loop), but it wasn’t used here.
There’s a questionable choice of hardware: the ESP32 natively uses WiFi, which isn’t really suitable for NTP time servers because there’s a significant amount of jitter (unpredictable delay) on a WiFi network due to other traffic or radio interference. I think this is acceptable in this case (cheap and widely available hardware for teaching purposes), but it wouldn’t be a good idea in an area with multiple WiFi networks.
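To make the first two problems concrete, here is a minimal sketch of what using the lock and handling startup could look like. It assumes a hypothetical `SharedClock` wrapper standing in for the calls into the system time library; on the ESP32 the same idea would mean taking one lock (typically a FreeRTOS mutex) around every call that sets or reads the time. I've used `std::mutex` so the sketch compiles on a desktop, and it stores only whole seconds for simplicity. For startup, the NTP specification (RFC 5905) already provides a way to say "don't trust me yet": leap indicator 3 means the clock is unsynchronized, and stratum 16 means the same. Simply not answering until the first fix would be another option.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical wrapper: one lock guards both the time and its validity flag,
// so the GPS task and the NTP task never see a half-updated clock.
class SharedClock {
public:
    // Called by the GPS task once it has a trustworthy fix.
    void set(int64_t unixSeconds) {
        std::lock_guard<std::mutex> lock(mutex_);
        seconds_ = unixSeconds;
        synced_ = true;
    }

    // Called by the GPS task if it decides the fix has been lost.
    void markUnsynced() {
        std::lock_guard<std::mutex> lock(mutex_);
        synced_ = false;
    }

    // Called by the NTP task; returns a consistent (time, valid) pair.
    void read(int64_t& unixSeconds, bool& synced) const {
        std::lock_guard<std::mutex> lock(mutex_);
        unixSeconds = seconds_;
        synced = synced_;
    }

private:
    mutable std::mutex mutex_;
    int64_t seconds_ = 0;  // 0 is 1 January 1970, i.e. "never set"
    bool synced_ = false;  // stays false until the first GPS fix
};

// First byte of an NTP header: leap indicator (2 bits), version (3 bits),
// mode (3 bits). LI = 3 flags an unsynchronized clock; mode 4 is "server".
uint8_t ntpFirstByte(bool synced) {
    const unsigned li = synced ? 0u : 3u;
    const unsigned version = 4u;
    const unsigned mode = 4u;
    return static_cast<uint8_t>((li << 6) | (version << 3) | mode);
}

// Stratum 1 for a server disciplined by GPS; 16 means "unsynchronized".
uint8_t ntpStratum(bool synced) {
    return synced ? 1 : 16;
}
```

The important design point is that the time and the "is it valid?" flag are read under the same lock, so the NTP task can never pair a freshly set time with a stale flag, or an uninitialized 1970 time with a "synchronized" header.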
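The busy loop, the missing `volatile`, and the lost-lock problem can all be addressed by blocking on a notification with a timeout instead of spinning on a flag. On the ESP32 the natural tool is a FreeRTOS task notification: the interrupt handler (marked `IRAM_ATTR` so it is placed in internal RAM) would call something like `vTaskNotifyGiveFromISR`, and the time task would wait in `ulTaskNotifyTake` with a timeout. An interrupt handler cannot block on a `std::mutex`, so the sketch below is only a portable stand-in for that pattern, useful for trying the logic on a desktop. `PpsSignal` and its methods are hypothetical names.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for the GPS "start of second" (PPS) signal.
// The pulse source calls pulse(); the time task sleeps in waitForPulse()
// instead of spinning, leaving the CPU free for the NTP task.
class PpsSignal {
public:
    void pulse() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Waits for a pulse newer than lastSeen and updates lastSeen.
    // Returns false on timeout, e.g. when the receiver has lost lock
    // and stopped producing pulses.
    bool waitForPulse(uint64_t& lastSeen, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool got = cv_.wait_for(lock, timeout,
                                      [&] { return count_ != lastSeen; });
        if (got) {
            lastSeen = count_;
        }
        return got;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t count_ = 0;  // total pulses seen so far
};
```

Counting pulses rather than setting a boolean means the waiting task can tell a fresh pulse from one it has already handled, and the timeout gives it an explicit way to notice that pulses have stopped, rather than silently waiting until the next one and then setting the clock from stale data.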
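The off-by-one error and the lost-lock case both come down to deciding what time to load at a start-of-second pulse. Here is a minimal sketch, assuming (as described above) that the GPS sentence read before the pulse gives the time of the second already in progress, so the pulse that follows starts the next second. `timeAtPulse` and the one-second age limit are my own illustrative choices; the actual timing of sentences relative to pulses should be checked in the receiver's datasheet.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: what time should be set at this pulse?
// nmeaSeconds     - UTC seconds parsed from the last GPS time sentence.
// msSinceSentence - time between parsing that sentence and this pulse.
// Returns the time to set, or nothing if the sentence is too old to trust
// (for example after the receiver lost lock and pulses stopped for a while).
std::optional<int64_t> timeAtPulse(int64_t nmeaSeconds, uint32_t msSinceSentence) {
    constexpr uint32_t kMaxAgeMs = 1000;  // assumption: at most one second old
    if (msSinceSentence > kMaxAgeMs) {
        return std::nullopt;  // stale: leave the clock alone rather than set it wrongly
    }
    return nmeaSeconds + 1;  // this pulse begins the *next* second
}
```

Returning "nothing" for a stale sentence makes the caller decide explicitly what to do (typically keep the clock free-running and perhaps mark it unsynchronized), instead of quietly setting a time that is seconds or minutes out.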

I think you get the idea.

Personal Tests

But perhaps, I thought, I’m being unfair to the human joint-author of this code. He doesn’t claim to be an expert programmer. I therefore experimented with some “vibe coding” myself using Google’s gemini-cli for comparison. I tried a number of simple Python coding tasks, from adding functionality to an existing script to generating a simple command line application in Python. The results were mixed:

The simple case of adding functionality worked excellently, except that I had to point gemini to some file format documentation: its general web search was inconclusive. There was a package it should have used to format the CSV file, but its own implementation worked in this instance.
The second attempt had to be abandoned: gemini hallucinated what a documented function did and built an entire program around using it. Testing the resulting code produced runtime errors, and a quick investigation showed that it had completely misunderstood the documentation of the Python package it was using. The assumption was a fundamental error, and as a result none of the generated code was remotely useful.
The third attempt produced incorrect code which could be corrected. Once again gemini misinterpreted the specification it was using and chose the wrong values, this time giving plausible but incorrect output. Along the way it wasted a lot of time trying to use XML namespaces (which weren’t being used). It also decided to delete the database it was creating each time the program was run: an error which might have let test cases pass, but would have had serious consequences if the program were ever used in production. In the end I had to make several code corrections by hand.

However, it was a great deal of fun and it felt as if it was productive…

To sum up…

The “vibe” code worked except:

It didn’t handle startup conditions correctly.
It didn’t handle error conditions correctly.
It didn’t handle race conditions between different tasks.
It used undefined C++ behavior and in doing so made unwarranted assumptions about how the compiler would behave. A compiler change could cause the code to stop working.
It didn’t understand the specification of the GPS module it was using, resulting in an off-by-one-second error even when it worked as intended.
In general, it didn’t understand online documentation in sufficient depth.
It displayed a tendency to hallucinate functions or function arguments, and to assume it knew what the function did.
There was no questioning of the wider scope: in the GPS case, could this hardware choice ever give the required performance?

But in the GPS case the code did look plausible and would probably pass some simple tests. Apart from the off-by-one-second error, the faults would be unpredictable and intermittent. (That is, it produced the type of faults which are the most difficult and expensive to find and diagnose, especially after deployment.)

The code would probably fool an inexperienced programmer or someone who was unfamiliar with the hardware specifications. And that’s the problem. Does the person doing “vibe coding” actually know enough to know what they don’t know?

Resources

It’s easy to feel that “vibe coding” makes you more efficient: at least one randomized controlled trial suggests that the opposite is the case.
I don’t think it would be fair to mention the YouTube channel or GitHub repository containing the code here. The moral is to make sure you understand the code in depth before trusting it.

Michael Z. Bell

15 July 2025