Last year, I wrote an article about famous programming mistakes throughout history, covering Mariner 1, the Millennium Bug, the World of Warcraft Corrupted Blood pandemic, the Heartbleed Bug, and the GitLab backup incident.
Programming has its usual categories of errors, such as syntax errors, logical errors, and compilation errors, and yes, they are crucial to understand and address. This blog post, however, focuses on the underlying causes of these mistakes and how we can prevent them.
And don't worry—I won't just be talking about types of programming errors that happen to "someone else far, far away." I'll also share some mistakes I made at Devōt that taught me a thing or two.
Types of errors in programming that most developers make
1. Not preparing before you start the work
Ever heard of Mariner 1? It's a classic example of what happens when proper preparation is neglected before diving into work.
NASA developed this nifty little spacecraft to gather data on temperature, magnetic fields, and other interesting stuff during a flyby of Venus. They named it Mariner 1. Sending a spacecraft to Venus requires waiting until Earth and Venus are at their closest, which happens roughly every 19 months. Calculating this optimal launch window and the entire flight path involves a ton of complex math.
So, picture the scene: Mariner 1's launch day. Everything looks perfect. But just moments after liftoff, it veers off course, loses control, and the decision is made to self-destruct to prevent any potential catastrophic damage on Earth. No, I don't have a photo of that explosion—I'm not that old—but you get the picture.
Back to Mariner 1 and its untimely demise. Post-incident investigations revealed that the spacecraft's trajectory had been programmed incorrectly. And for the programmers who always ask: yes, the programming language in question was FORTRAN.
The error? A missing bar, usually described as a hyphen, over the letter R in the code. As we all remember from high school (or maybe not), R with a bar over it represents the "n-th average (smoothed) value of the time derivative of radius R." Without the bar, the symbol means something different.
Further investigation revealed that the same mark was also missing from the calculation papers from which the code was copied. So, is the programmer to blame if he was just transcribing what a physicist gave him?
I don't know who NASA eventually blamed for this mess or who got fired, but in my opinion, the programmers share some of the responsibility.
I've written plenty of code for financial and medical institutions, as well as American tax systems and equestrian shows, without having studied economics, medicine, or equestrianism. Yet, my code miraculously worked fine. The main reason is thorough preparation before starting, involving consultations with subject matter experts. Based on their specifications, I ensured all normal and edge cases were covered with tests.
2. Sometimes, you are not the smartest person in the room
Let me share a lesson I learned the hard way while working on applications for finance, medicine, and equestrian shows. On one of these projects, I almost made a huge mistake—ironically, on the one involving the most complicated math: the horse show application.
I was tasked with changing something related to the scoring system.
While reviewing the tests for that class, I noticed some scenarios were oddly set up, such as cases where multiple competitors tied for a place. So I started writing "better" tests for those scenarios and found what I took to be a bug. I wasn't sure how the scoring should work when, for example, two competitors tie for second place, but the current implementation looked clearly wrong. Without ties, the scoring went like this: 1st place got 10 points, 2nd place got 9 points, 3rd place got 8 points, and so on, with all 10 competitors receiving points from 10 down to 1.
However, if two people tied for 2nd place, 1st place would get 10 points, the two tied for 2nd would both get 9 points, and the next competitor, who actually finished fourth, would get 8 points, and so on. I prepared two alternatives: one where the tied competitors would both get the higher number of points (9 each) and another where they would get the average of the places they occupied (8.5 each). In both, the fourth finisher would, of course, get the points for 4th place: 7.
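The three ways of handling a tie described above are easier to compare side by side. The following is a minimal Python sketch, not the horse show application's actual code: the function names, the `groups` input format, the scheme labels, and the competitor names are all invented for illustration.

```python
def points_for_place(place, top=10):
    """Points for a finishing place: 1st gets `top`, 2nd gets top - 1, and so on."""
    return max(top - place + 1, 0)


def score(groups, scheme, top=10):
    """Score competitors given as tie groups in finishing order.

    `groups` is a list of lists; competitors in the same inner list are tied.
    (Hypothetical input format, chosen only to keep the sketch short.)
    """
    scores = {}
    ahead = 0  # number of competitors who finished ahead of the current group
    for index, group in enumerate(groups):
        if scheme == "traditional":
            # A tie uses up only one points value; the next group gets the next one.
            points = points_for_place(index + 1, top)
        elif scheme == "higher":
            # Tied competitors all get the points of the best place they occupy.
            points = points_for_place(ahead + 1, top)
        elif scheme == "average":
            # Tied competitors split the points of all the places they occupy.
            places = range(ahead + 1, ahead + len(group) + 1)
            points = sum(points_for_place(p, top) for p in places) / len(group)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        for name in group:
            scores[name] = points
        ahead += len(group)
    return scores


results = [["Ann"], ["Ben", "Cy"], ["Dee"]]  # Ben and Cy tie for 2nd place
for scheme in ("traditional", "higher", "average"):
    print(scheme, score(results, scheme))
```

With this input, the "traditional" scheme gives Dee 8 points, while the other two give her 7; the "average" scheme gives Ben and Cy 8.5 each instead of 9. All three are internally consistent, which is exactly why the code alone cannot tell you which one the business wants.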
I deployed one of those versions to the staging environment and informed the client that I had discovered a system error but not to worry—I had already prepared a fix before he even woke up (he was in Canada).
The client requested a call—which was a bit nerve-wracking for me as a junior developer since it hadn’t happened before. On the call, the client very politely explained that the previous version wasn’t a bug and that the strange scoring method was an old, traditional system retained for historical reasons. I felt so embarrassed.
This experience taught me an important lesson: sometimes, even when you think you're the smartest person in the room, you might not have all the context. Understanding the historical and business logic behind certain systems is crucial before jumping to conclusions and making changes.
3. Trusting other programmers too much
Let's move to a more recent event, one that you might have heard about—a lot was written about it. You might know it as the Linux hack or backdoor, but it's often called the XZ Utils backdoor because of the open-source library involved.
First, a quick rundown:
In programming, a library is a collection of prewritten code designed to solve specific problems. For instance, you wouldn't write your own functions for complex mathematical problems like derivatives or logarithms. Instead, you'd use a library that someone else wrote for that purpose.
Open source means the code is publicly shared, allowing others to suggest improvements or even write code to solve issues. However, any changes must be approved by maintainers.
XZ Utils is one such library. It handles data compression and is included in many systems.
A backdoor in programming refers to a method of bypassing normal authentication processes.
Now, how did this happen? Lasse Collin has been maintaining this library since its inception in 2009. Jia Tan, who started contributing improvements (pull requests) in 2021, gained commit and release rights in late 2022 after over a year of building trust with Collin. Jia Tan wrote high-quality code and solved complex problems, all the while people complained that updates for the library were too slow and that it was absurd for such an important library to be maintained by just one person.
This is a classic example of social engineering, which, as far as I know, is the most common hacking method. Statistics quoted online claim that as many as 98% of cyberattacks involve social engineering, though figures like this vary by source and are hard to verify. Either way, it's much easier to obtain someone's password by convincing them you're helping than by cracking it with software.
Where did the mistake happen, and who is to blame besides the attacker?
As a good practice, programmers should not blindly trust other programmers' code. Not because all programmers are malicious, but because everyone makes mistakes. That's why we have code reviews, where we objectively look at each other's code to find potential errors or areas for improvement.
In this case, the backdoor was hidden in binary files, which are much harder to read unless you're from the Matrix. However, one should still question why those files were changed and analyze exactly what was altered.
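A first step toward answering "what exactly was altered?" for files you can't read as text is to compare checksums between two versions of a project. Below is a minimal Python sketch under a few assumptions: both versions are unpacked into local directories, and the function names are made up for this example. It only tells you which files changed, not whether a change is malicious, so it narrows down where to ask questions rather than replacing a review.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's path (relative to `root`) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def changed_files(old_root, new_root):
    """Report files that were added, removed, or modified between two trees."""
    old, new = snapshot(old_root), snapshot(new_root)
    report = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            report[name] = "added"
        elif name not in new:
            report[name] = "removed"
        elif old[name] != new[name]:
            report[name] = "modified"
    return report
```

Calling `changed_files("release-old", "release-new")` (hypothetical directory names) returns a dictionary such as `{"tests/data.bin": "modified"}`. Every binary file that shows up there should come with an explanation from whoever changed it.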
Here, the lack of time played a role. If Collin didn't have enough time to work on the project alone, he also didn't have time to double-check everything done by someone helping him.
Commercial companies using open-source code and taking it for granted should also consider the potential consequences. This isn't the first time something like this has happened; older readers will surely remember many such anecdotes.
4. Sharing knowledge is great - but RTFM
Often, instead of verifying and researching information ourselves, we rely on knowledge we get from others. This reminds me of an acronym that used to be quite popular—RTFM, or "read the freaking manual."
RTFM is typically a slightly impolite response to questions that could easily be answered by simply opening the documentation.
One challenge that programmers face is that reading documentation can be boring.
We've all probably used a politer version of RTFM when someone asked us something they could easily have learned on their own. While sharing knowledge is fantastic, and we should support those who want to learn, navigating and understanding documentation is also a highly valuable skill for programmers.
That skill matters especially now, when AI tools are so good at writing decent code and almost every question you'll have in your first year of programming has already been answered on Stack Overflow.
At some point, you'll encounter incorrect answers, which can be difficult to recognize if you're relying solely on ChatGPT or similar tools as your knowledge source. My suggestion is to use new tools that make programming easier, but at the same time, be skeptical and try to prove the tools wrong. Ask for sources, find sources yourself, and compare them with the tool's responses.
In the end, reading the manual or documentation isn’t just about finding answers—it's about building a deeper understanding of the tools and languages you're using. This foundational knowledge can save you from making mistakes and help you become a more self-sufficient developer.
5. Overengineering
Overengineering is something every programmer has encountered or been guilty of at some point. It's when you add unnecessary complexity to your code or project, often because you're trying to anticipate every possible future need.
The problem with overengineering is that it not only wastes time but also makes your code harder to maintain. The more complex the system, the more room there is for errors, and the harder it is for someone else (or even you, six months down the line) to understand what’s going on.
Here's a pro tip: focus on the core functionality first. Make sure it works perfectly before adding any bells and whistles. Ask yourself, "Is this feature really necessary?" and "Is there a simpler way to achieve this?"
Remember, simplicity is key. It’s not just about writing less code but about writing clear, maintainable, and efficient code. So, before you implement that next big idea, take a step back and consider whether it's really needed or if you're just over-engineering.
Why do programming errors happen?
Programming errors are inevitable, even for the most experienced developers. They can arise from various sources, often rooted in common practices and habits that can be easily overlooked. Understanding why these errors occur is crucial for improving code quality and reducing bugs. Here are some common reasons why programming errors happen:
Lack of communication
Not writing tests
Using someone else’s code without understanding
Not reading the documentation
Not doing thorough code reviews
Are we really saving time?
All the errors we've discussed so far have something in common: we make them because we're trying to do things faster and save time. However, the consequences of these mistakes often end up costing us more time in the long run.
If we rush through tasks without talking to anyone, skip writing tests, use random libraries without reading the documentation (just because someone on Stack Overflow said they might help), and merge everything into production without thorough code reviews, we might get lucky and have everything work fine. But have we really avoided making any errors?
By cutting corners in these ways, we're not truly saving time. Instead, we're likely to encounter more significant issues later, leading to more time spent debugging, fixing bugs, and dealing with runtime errors and incorrect output. It's essential to remember that taking the time to follow best practices initially can save us from bigger headaches down the road.
Agreements as the answer to programming errors
Breaking any agreement you've previously committed to is a mistake because agreements are designed to minimize potential errors. When you're working solo on a project, this might not seem like a big deal, but what happens when you're part of a team?
In a collaborative environment, following structured agreements is crucial. The Software Development Life Cycle (SDLC) provides a framework for this. Key components such as the Definition of Ready, Definition of Done, test coverage, security score, the two-person rule, and a consistent Git workflow are all agreements that help ensure quality and reduce errors.
By staying committed to these practices, teams can improve communication, ensure thorough testing, maintain security standards, and streamline code integration. This structured approach helps to catch potential issues early and keeps everyone aligned, ultimately leading to fewer programming errors.
I mean, even NASA introduced extra procedures, tests, and programs to check the code and ensure such catastrophic errors couldn't happen again. They say that thanks to these new procedures, the Apollo moon landing was possible—despite some minor software glitches.