“What’s the weight of the world worth to your side?
Here is how you got lost.
Here is how you got by.”
“Mutiny, I Promise You” – The New Pornographers
Gather ’round, kids, and let me tell you the story of the Smoke Detector That Didn’t Beep.
Like any homeowner, I have a lot of these things in my house:
Everybody should know what these are (?!) – smoke detectors go BEEP BEEP BEEP if they detect smoke. They are meant to let you know when #1: your house is burning down or #2: your dinner is slightly above average in its smokiness. Primarily the first one.
And, as everyone knows, they usually run on batteries and must periodically be tested. They are tested like so:
You press the button, wipe the blood out of your ears, and then you go on your way, knowing that your smoke detector is working and will probably work for a while. Repeat the process for each of these in your home every few months. It’s a best practice. 😀
So there I was a few months ago down in my basement, checking my smoke detector like a good guy, and it didn’t beep. Whoa! Paydirt! I replaced the battery because I am a 21st-century QA: I can find *and* fix problems. I pressed the button again. And again. No beep. I took the battery out and put it back in, pressed the button. Nothing. “Ok, wow! I guess these things just wear out,” I said to no one in particular, then I pulled it off the ceiling and threw it in a trashcan in my tool room. And I promptly forgot about the whole thing.
(SPOILER: this article is not about my house burning down! I should probably replace that smoke detector!)
Fast-forward three months to a morning a few weeks ago. I’m getting ready and I hear a beep through the floor. Hmm, one of the smoke detectors is running low on batteries, I thought, I better go replace its battery. I follow the sound downstairs into the tool room. That’s funny, I thought, I don’t remember there even being a smoke detector in here. There’s one in the finished side and one in the stairwell, but not in here… *BEEP* Ok, Ok! Where is this thing!? And, of course, it’s in the trashcan. It’s the one I pulled off the ceiling. (I don’t go in the tool room very often; I’m not very handy.)
It turns out that the smoke detector was working the whole time. The button wasn’t working. For all I know it would have detected a fire just fine. But I can’t trust it, so in the trash it remains. I pulled the battery and left it for dead.
The Tale of Our Tests
I was thinking about that smoke detector this week as my fellow testers and I started reworking the way our automated tests run. Why are we reworking them? Because they had the same problem. Let me explain:
We have a nice set of automated UI tests for my product that, back-to-back, take about four hours to run. Some of them require processes to complete that take a few minutes, so that four-hour run includes about 65 tests. It covers a wide swath of functionality, with most parts of the product getting a happy-path test and some parts getting quite a bit more than that. The tests run overnight after our nightly build is completed, and in general, we were pretty happy with how much we knew about our product every morning when we walked in. We could always stand to know more (and we are working to expand our automated testing coverage) but we were happy. Or so we thought!
Here’s the thing: the tests all work individually. If you run any of them on its own, it will succeed. But sometimes, for some reason, one of them will fail as part of that larger suite. There are four or five of them that just aren’t as reliable as the others. Probably three of those are small things that are intermittently failing in the product that we haven’t been able to reproduce outside the nightly test run. The other two are failing because some other test (not sure which out of the 65) isn’t smart enough about cleaning up after itself. Usually we walk in in the morning and 63/65 tests have passed, and we understand why the other two failed and we don’t care to fix it right this minute. We rolled like that for a long time.
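That cleanup problem has a well-known shape: a test creates something in a shared environment and fails to remove it, and some later test trips over the leftovers. Here is a minimal sketch of one way to enforce cleanup; everything in it (the `TestEnvironment` class, `create_record`, the `isolated` helper) is invented for illustration, not from our actual suite.

```python
# Sketch: make each test clean up its own state so it can't poison the
# next one. All names here are hypothetical.
import contextlib

class TestEnvironment:
    """Stands in for the shared environment the whole suite runs against."""
    def __init__(self):
        self.records = []

    def create_record(self, name):
        self.records.append(name)

    def delete_record(self, name):
        self.records.remove(name)

@contextlib.contextmanager
def isolated(env):
    """Track whatever a test creates and remove it afterward, pass or fail."""
    created = []
    original_create = env.create_record

    def tracking_create(name):
        created.append(name)
        original_create(name)

    env.create_record = tracking_create
    try:
        yield env
    finally:
        env.create_record = original_create
        # Tear down in reverse creation order, even if the test blew up.
        for name in reversed(created):
            env.delete_record(name)

env = TestEnvironment()
with isolated(env):
    env.create_record("order-123")  # the test does its thing
# Nothing is left behind for the next test to stumble over.
assert env.records == []
```

The point isn’t this particular mechanism; it’s that cleanup should be guaranteed by the harness rather than left to each test author’s diligence.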
The problem with that is this: when the test build fails, I get an email telling me that the overnight tests failed. I get one of those every single morning. Yeah, yeah, I know, one of the tests failed. I know about that, nothing to see here. Except last week, on Thursday, I found out that one of our builds had been unusable since Monday. I had been in meetings and working on other things, and never noticed. No one did! The email telling us that the build was broken was just noise.
That email is my smoke detector. What is a smoke detector, but an automated testing device you keep in your house? And what does it tell you? IT TELLS YOU WHETHER YOUR F#$%ING HOUSE IS BURNING DOWN. It’s important! I had lost sight of that.
So what have we done to fix it?
We got smoke detection working
The first thing we did was take almost all of the tests out of that nightly build. When we check in code, we already have a large battery of unit tests that runs before the code is committed. That’s good. Fast feedback is good. This episode made us realize that we want that for our UI tests as well.
Now, we can’t run our UI tests until after the code is committed and an installer is built, but we can automatically run UI tests immediately afterward. We decided as a team that when we check in and kick off an installer, we want to know whether the house is on fire in less than 60 minutes. Our installer takes about 20 minutes to build, and when we run our UI tests the deployments take about 15-20 minutes. That means that our smoke test suite must take < 20 minutes to run. We identified the tests that can hit all of the functionality representing a sane, testable build in that much time. Those are in the smoke test. Everything else is out. Every installer gets a smoke test, from now on. And that f#$%er shall be green or we will stop everything and fix it.
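To make the budget concrete: 20 minutes of installer build plus 15–20 minutes of deployment leaves under 20 minutes for the tests themselves if we want an answer inside the hour. Here is a toy sketch of picking a smoke suite against that budget; the test names, durations, and the simple greedy selection are all made up for illustration.

```python
# Illustrative only: choose tests that fit inside a 20-minute smoke
# budget. Test names and durations are invented.
BUDGET_MINUTES = 20

# (test name, minutes it takes, feature it touches) -- hypothetical data
candidates = [
    ("login_happy_path",      2,  "auth"),
    ("import_small_dataset",  6,  "import"),
    ("run_basic_report",      5,  "reporting"),
    ("export_results",        3,  "export"),
    ("upgrade_from_previous", 25, "upgrade"),  # too slow for smoke
]

def pick_smoke_suite(tests, budget):
    """Greedy pass: take tests in priority order while they still fit."""
    suite, total = [], 0
    for name, minutes, _feature in tests:
        if total + minutes <= budget:
            suite.append(name)
            total += minutes
    return suite, total

suite, total = pick_smoke_suite(candidates, BUDGET_MINUTES)
print(suite, total)  # the 25-minute upgrade test is excluded
```

In practice we picked the smoke tests by hand, by coverage rather than by algorithm, but the constraint is the same: everything in the smoke suite has to fit the budget, and everything else is out.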
We relegated the rest of the tests to feature inspection builds
This is a win that we got from our adoption of the BDD tools: our automated tests are now neatly organized into nice, self-documenting feature files. Anybody can crack one of those things open and read what would be tested if we were to run all the tests for a feature. So we hooked each feature up to a build as well. We can kick each of those manually.
This is nice for the following reasons:
- It is now stupid easy to find an automated test, and to run it you just kick a build. This is much easier than anything we’ve had before in terms of arbitrarily running tests.
- Some of the features (e.g. “upgrading from a prior version”) differ from the others in some important way that alters how the testing environment is deployed. I was having trouble wrapping my head around how to handle this in the epic nightly builds, but with this system it’s easy mode: just change how the environment is deployed for each build.
- We’re setting a standard that each of the feature builds should take < 1 hour. That means that:
- Anyone can assume that they will know the status of a feature in less than an hour if they want to know.
- We have a bit of clarity on how big a feature definition can be. If its tests take > 1 hour then it’s too big.
(That last bullet might seem absurd to some, how can a single feature take an hour!? That’s the world I’m living in, my product does heavy duty data processing.)
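The feature-build idea boils down to a small amount of configuration: one build per feature file, each carrying its own deployment flavor and held to the one-hour ceiling. A sketch, with every feature name, duration, and deploy step invented for illustration:

```python
# Hypothetical mapping of feature files to builds. Each build knows how
# its environment is deployed and how long its tests take.
FEATURE_BUDGET_MINUTES = 60

feature_builds = {
    "data_import.feature": {"deploy": "standard",      "test_minutes": 40},
    "reporting.feature":   {"deploy": "standard",      "test_minutes": 25},
    "upgrade.feature":     {"deploy": "prior_version", "test_minutes": 55},
}

def over_budget(builds, budget):
    """Flag any feature whose tests exceed the budget -- a sign the
    feature definition has grown too big and should be split."""
    return [name for name, cfg in builds.items()
            if cfg["test_minutes"] > budget]

print(over_budget(feature_builds, FEATURE_BUDGET_MINUTES))  # prints []
```

The nice side effect is that the budget check doubles as a design review: when a feature’s build creeps past an hour, that is the system telling you the feature file needs splitting.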
The big downside for all this is, of course, that we have huge swaths of tests that are ready to run, but which aren’t running on a regular schedule. We’re considering our options here and also wondering how important that is: if you know that tests are available for a feature, you’re going to run the tests when you mess with the feature… right?
But the moral of the story is: the single most important test in your house, and also in your product, is the one that lets you know whether your house is on fire. Make sure it’s working.
Feedback and questions in the comments are always welcome!