10 Developer Horror Stories To Keep You Up at Night

October 19th, 2020 • By Yoz Grahame

This is a guest post from our partners at LaunchDarkly.

We software engineers like to think ourselves unflappable. Consider that we spend most of our days staring at glowing pages of eldritch horror that no mortal was meant to witness. We whisper and type our otherworldly incantations, all the while feeling the hungry gaze of a lurking cross-site scripting bug, or a shadowy use-after-free, or an accidental summoning of ZALGO. (H̨e̛ ̕c͢om͢es, you know.)

But no. Truthfully, we’re far more fragile than that. Living our lives on a tightrope over an ocean of chaos (or “unspecified behaviour”), we’re only one bad deploy away from a manic screaming fit, followed by a move to the countryside and banishment of any technology invented after 1947. So we consume horror novels by the truckload in an attempt to persuade ourselves that... well, things could be worse, you know? When you see that a senior engineer dresses all in black, listens to Sisters Of Mercy and Dimmu Borgir, and has a line of Melanie Tem novels above the O’Reilly manuals, remember that she uses them to calm down. Because she’s seen things.

As, likely, have you.

We know every developer has at least one horror story that still haunts them to this day. Likely, they have more than they’d care to remember.

For All Hallows’ Eve, we decided to share some of the most dreadful stories we’ve come across over the years. We hope that some will be educational to the innocents in our industry. Maybe some will be entertaining to the more experienced among you. And if none of them quickens your pulse, then we’d like to hear your stories on Twitter (just tag @Rollbar and @LaunchDarkly with #errorhorrorstory so we can share around the virtual campfire).

Now, are you ready? Roll your Herman Miller chair up by the fireplace, pour a glass of Knocker’s port, and prepare to be chilled...

1. The “Crashing Every Single Computer in the Data Center” Error

When working on social network ads at Google (remember Myspace?), I wrote some C++ code that looked something like this:

  for (int i = 0; i < user->interests->length(); i++) {
    for (int j = 0; j < user->interests(i)->keywords.length(); j++) {
      keywords->add(user->interests(i)->keywords(i));
    }
  }

Readers who are programmers probably see the mistake: The last argument should be j not i. My unit tests didn’t catch the mistake, nor did my reviewer. After going through the launch process, my code was pushed late one night—and promptly crashed all the computers in a data center.

- Ellen Spertus, Professor of Computer Science, Mills College
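
For comparison, here is a minimal Python sketch of the same nested traversal. It is not the original C++ code, and the `Interest` class and `collect_keywords` function are hypothetical names made up for illustration. Iterating over the elements directly, rather than over indices, leaves no `i` or `j` to mix up in the first place.

```python
from dataclasses import dataclass, field


@dataclass
class Interest:
    """Hypothetical stand-in for one entry in a user's interest list."""
    keywords: list = field(default_factory=list)


def collect_keywords(interests):
    """Gather every keyword from every interest, in order."""
    collected = []
    for interest in interests:
        # No index variables here, so there is nothing to swap by accident.
        for keyword in interest.keywords:
            collected.append(keyword)
    return collected


interests = [Interest(["goth", "horror"]), Interest(["perl"])]
print(collect_keywords(interests))  # ['goth', 'horror', 'perl']
```

Most languages with index-free iteration (including modern C++ range-based `for` loops) allow the same style.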

2. The “I Simply Forgot to Save All My Work” Error

My simple answer might be the same as many who read this. The worst is forgetting to save. The only thing that can be worst [sic] is to save another file over another file which there is no backup. Now I press Ctrl + S so often it became a habit.

- Maynas Eric Chua, CEO and founder, R.BZ

3. The “Break That Actually Broke Everything” Error

On January 15, 1990, AT&T's long-distance telephone switching system crashed. This was a strange, dire, huge event. Sixty thousand people lost their telephone service completely. During the nine long hours of frantic effort that it took to restore service, some seventy million telephone calls went uncompleted.

An obscure software fault in an aging switching system in New York was to lead to a chain reaction of legal and constitutional trouble all across the country. As it happened, the problem itself—the problem per se—took this form. A piece of telco software had been written in C language, a standard language of the telco field. Within the C software was a long "do ... while" construct. The "do ... while" construct contained a "switch" statement. The "switch" statement contained an "if" clause. The "if" clause contained a "break." The "break" was SUPPOSED to "break" the "if clause." Instead, the "break" broke the "switch" statement.

- Bruce Sterling, Author of The Hacker Crackdown
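
Python has no C-style `switch`, so the following is only an analogy, but it shows the same rule at work: `break` never exits an `if`. It exits the nearest enclosing loop (in C, the nearest enclosing loop or `switch`). The function names and message values are hypothetical and have nothing to do with the actual telco code.

```python
def process(messages):
    """Return the messages handled before processing stopped."""
    handled = []
    for message in messages:
        if message == "skip-me":
            # Intent: leave only this `if` and carry on with the next message.
            # Reality: `break` ends the whole `for` loop.
            break
        handled.append(message)
    return handled


def process_fixed(messages):
    """Same loop, using `continue` to skip a single message."""
    handled = []
    for message in messages:
        if message == "skip-me":
            continue
        handled.append(message)
    return handled


calls = ["a", "skip-me", "b"]
print(process(calls))        # ['a']
print(process_fixed(calls))  # ['a', 'b']
```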

4. The “I Was Experimenting and Made a Mistake I Couldn’t Reverse” Error

A long time ago, when it was still a fairly common and feasible practice to put an entire app's database on a few floppy disks, I made the mistake of fiddling with the .DBF files without first making a backup. Needless to say, I screwed something up and had to spend the rest of my weekend fixing the files using a C program I cobbled together to gather up all the old data into new tables. Luckily, I had enough information from reference materials on hand to be able to figure out the file format and where all the data was on the disks (this was in pre-Internet times). Still wasn't fun and my supervisor rightfully chewed me out for not taking proper precautions.

- Junilu Lacar, Agile Transformation Coach & Software Developer

5. The “It’s Taking Me Longer to Fix Than to Build” Error

Yoz here. I’d like to note that I hope this is one that’s in the past for many of you LaunchDarkly and Rollbar users...

“It’ll only take me a few hours to implement the feature,” we sometimes say. But after finishing, we find that every few weeks, we’re either fixing a bug with the feature, explaining it to another engineer, or helping answer a question from customer support about how it works. The total investment of time to maintain the feature far exceeds the initial few hours of development.

When code is too complex, it becomes harder to ramp up, harder to reason about it, harder to fix bugs. It’s difficult to untangle the dependencies and data flows to track down the source of errors. Engineers may actively avoid the most complex parts of the codebase, opting to work around them even if they’re the most logical place to make a certain change. Or they may avoid working in those areas altogether, even if the work can be high-impact.

- Edmond Lau, Co-founder at Co Leadership, Author of The Effective Engineer

6. The “User Input Is Only Input by the User” Error

A junior sent me his work for code review and was very proud about the exhaustive validation he put on the user input. When I asked him why he didn't validate the User-Agent header, he argued that it is not entered by the user and it cannot contain special characters or HTML anyway. The best way to convince him was to inject a nice JavaScript alert in the admin panel through the User-Agent header.

- Ilyes Kooli, Lead Software Engineer
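
A minimal Python sketch of the lesson, assuming an admin panel that builds its HTML by hand: request headers come from the client, so they need the same escaping as a form field. The `render_user_agent_cell` function and the shape of the `headers` dictionary are hypothetical; only `html.escape` is a real standard-library call.

```python
import html


def render_user_agent_cell(headers):
    """Return an HTML table cell showing the request's User-Agent."""
    # Headers are sent by the client, so treat them like any other user input.
    user_agent = headers.get("User-Agent", "")
    return "<td>" + html.escape(user_agent) + "</td>"


malicious = {"User-Agent": "<script>alert('boo')</script>"}
print(render_user_agent_cell(malicious))
```

With the escaping in place, the browser displays the script tag as text instead of running it. In a real application, a templating engine with auto-escaping would usually do this job.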

7. The “Division of Integers Isn’t Adding Up” Error

A couple of friends and I were at a hackathon last year, building this cool tool that could play the piano along with a user in real-time. Basically, it would play out notes on the piano which would complement any tune the user was attempting.

We were terribly stuck and we had no idea why. The tool was playing a diarrhea of notes all at the same time, and it was “unpleasant” to put it politely.

It was about 3:30 AM, we’d been awake for about 18 hours, and we were exhausted. Like zombie level exhausted. No amount of coffee was keeping us fresh and awake anymore.

I slapped myself, sat in front of my screen and squinted long and hard. Mentally going through every line, every expression, every calculation. We’d spent close to 3 hours in this pit, and I wanted to dig us out of it. (Perhaps “dig” isn’t a good verb, as that would only put one deeper into the pit. But yeah, you get the idea)

I’m looking at this one snippet, and then it hits me. I wanted to stab myself. Repeatedly. I look up to the heavens in despair and motion my friends to come over. I very slowly demonstrate what the bug is, and their reactions are quite poetic. One walks away in disgust and stares at the wall for a good 5 minutes, while the other hurled an expletive and went to bed. I fix the error, and everything from there worked like a charm. You see, Python 2.x does this funky thing where an expression like “1/2” evaluates to 0, and not 0.5 as humans would expect. Integer division, for those who know their jargon.

  >>> x = 1/2 # what we were doing
  >>> y = 1/2.0 # what we should have done
  >>> x,y
  (0, 0.5)

All along, the value for the time delay between playing consecutive notes was being calculated this way resulting in a delay of zero seconds. In other words, we were essentially telling the tool to go batshit crazy and spit out every note at the exact same instant, thus explaining the aforementioned diarrhea.

Three hours, my friends. Three hours. That’s how long it took to figure this one out. I know that I come off very heroic in this tale I tell, but perhaps that image of me will be altered when you learn that the bug was introduced by yours truly.

- Chittaranjan Velambur, Senior Software Engineer at Nasdaq
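
For readers on Python 3, where this particular trap was removed: `/` is always true division, and `//` is the explicit floor-division operator. (In Python 2, `from __future__ import division` opts in to the same behaviour.) Here is a short sketch, with `note_delay_seconds` as a hypothetical helper rather than the hackathon code itself:

```python
def note_delay_seconds(total_seconds, note_count):
    """Seconds to wait between notes (hypothetical helper)."""
    # In Python 3, `/` is true division even when both operands are ints.
    return total_seconds / note_count


print(1 / 2)                     # 0.5
print(1 // 2)                    # 0
print(note_delay_seconds(1, 2))  # 0.5
```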

8. The “I Know What I’m Doing is Completely Backwards” Error

Yesterday, I had a smugness-related disaster.

I’d very confidently checked in a new feature, declaring to the world around me how confident I was it was working.

‘I’ve got loads of unit tests’, I said.

‘You’ll hardly have to manual test it at all’, I said.

Failed the first manual test when a checkbox worked exactly backwards.

Turned out I had got those unit tests exactly backwards, then wrote code to make them pass.

D’Oh!

So the repeating fault here would be my hubristic tendency to feel a bit smug from time to time.

- Alan Mellor, Senior Software Engineer at BJSS

9. The “Error That Should Not Have Worked—But It Did” Error

So… Many moons ago, I ended up writing a fairly short Perl program - maybe 150 lines. Its purpose was to automate several functions of our backup system that the software package didn’t provide in a reasonable manner.

So it goes into production, and runs every morning at 8:03AM. Works perfectly for some 6–7 years. And then we have a procedure change, and I have to take a line of code that said “if A is equal to B1, B2, or B3, do this”, and change it to “if A is equal to B1, B2, B3, or B4, do this”.

And while I’m there, I notice that another if statement a few lines down is backwards. And it’s the most crucial if statement in the whole 150 lines, and the code can’t possibly work - the program would obviously crash in a very specific way.

So I carefully quit out of the editor, and check… Yes, the datestamp on the production code is over 6 years back, and it’s run correctly at 8:03AM every morning for 6 years. And the if statement is backwards.

I get 4 co-workers to look at it individually, and they all agree the if statement is backwards, and it should be crashing in a very specific way, and nobody understands how this worked for 6 years.

The next morning, I get an email from our monitoring software at 8:04AM saying that the program crashed. And sure enough, the datestamp is 6 years back, the if statement is backwards, and it crashed exactly the way you’d expect it to crash if it was backwards.

I went in, reversed the if statement to be correct, added B4 to the other if statement, and the program continued to work properly until we decommissioned that backup system 5 years later.

I’m still mystified how that line of code worked until somebody looked at it.

- Valdis Kletnieks, Former Computer System Senior Engineer

10. The “Fear of Making a Mistake and Being Hunted Down” Error

To end, one of the best programming adages I've received...

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. — John Woods, Programmer and former UK based computer game producer

So, did they chill you to your very core? Did you feel your soul leave your body in fright? What, not even when you read the Perl one? (Have you seen Perl?) Oh, you’ve experienced worse? Really? You poor soul. Please, we need to hear about it.

Post it on Twitter and tag it with #errorhorrorstory and tag us @Rollbar and @LaunchDarkly.

Remember: Rollbar makes it much easier to catch them when they start rampaging, and LaunchDarkly can speed their return to the nether dimensions with the flick of a switch. Join us for a live chat on November 12. RSVP now.
