One of the most important lessons I began to learn in my time as a Computer Science student was the value of software tests. (I say “began” because I still haven’t learned to habitually write tests first.) The past couple of weeks have reminded me of this, so I’d like to talk about it today.
If a piece of software isn’t all that important, it can get by without formal tests; if—when—something goes wrong, we’ll find out when it breaks. But for a library used by other applications, a core system utility, or (for that matter) a program that I’m trying to write, it’s far better to find that something’s wrong when the tests fail than when some apparently-unrelated piece of code breaks.
Just this past week or so, I’ve been having very frustrating problems in my development of the map viewer for Strategic Primer. With the help of unit tests to verify whether what I thought of as the lowest layers were working properly, I finally tracked down what turned out to be two bugs in Java itself: first, TreeSet not only assumes that compareTo is consistent with equals, but treats two objects as equal whenever compareTo returns zero; and second, if an object’s hashCode changes after it’s been added to a HashSet, testing for it with contains will fail. (Both of those are now marked as documentation bugs, which is unfortunate but understandable. And by the way, I worked around them by using a HashMap rather than a TreeMap, and by changing the few set-keys that themselves contained collections or otherwise used mutable state to produce their hash-codes to instead give constant hash-codes.) Without unit tests, debugging this would have been much harder; and if Java’s test suite had been sufficiently extensive (and developed by testers based on the interface, not the actual behavior …) these bugs wouldn’t have made it into the wild.
One of my favorite features of Gentoo Linux is that you can require any package that has a test suite to run and pass it before installation. When I was running binary distributions like Red Hat (way back when), Mandriva, Debian, and Ubuntu, I wouldn’t have insisted on this even if I could have, because it’s the packagers’ job to test the software as they build it; if they did their job properly, any problems I ran into were mostly either hardware problems (which test suites might, incidentally, help diagnose) or configuration problems and unexpected interactions (which test suites would be unlikely to catch). But in Gentoo (despite claims by some—I thought I’d read a blog post on Planet Gentoo a while back, but I can’t find it now—that users shouldn’t use this feature) every sysadmin is in effect the packager, so I’ve had FEATURES=test enabled nearly as long as I’ve known that it existed. And I don’t resent that, for example, Autoconf takes a few minutes to build and an hour or two to run its tests.
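Enabling this is a one-line change in the standard Portage configuration file:

```shell
# /etc/portage/make.conf
# Run each package's test suite during emerge and treat a failure
# like a build failure, aborting the installation.
FEATURES="test"
```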
But test suites aren’t very helpful if they take hours to run, are sufficiently complex and brittle that fixing or disabling broken tests is a Herculean task, and (especially) have large numbers of “expected failures” that are nevertheless not marked as such. Most egregiously, for the past five years and more, every single version of the GNU C library (glibc) in the Gentoo Portage tree has failed its tests. Unlike other, less important packages, if glibc breaks, the whole system can break, and (further) downgrading it usually guarantees that the system will break, so it’s very risky to upgrade glibc without the test suite to show that it’s all right … and the test suite always fails. What’s worse, the stated policy of its maintainers was at one time (I can only find evidence of a change quite recently) that test failures should at most rarely be reported upstream as bugs, since the build system is delicate and results are so sensitive to the build environment: “if there is any real problem [the developers] will notice it and fix it quickly.”
I disable tests for a small number of packages. In one case (the most recent version of LibreOffice marked stable), I’m willing to live with the failing tests, and I don’t want to go through the build (which took over ten hours!) over and over trying to diagnose the problem—and someone else has already reported the failure as a bug. In a few other cases, there are dependency problems—the tests require programs that aren’t in the package tree yet, require libraries that conflict with previously-installed ones, or simply have large numbers of undocumented dependencies—and these are usually packages I installed just to try something out. In a few other cases, building or running the tests breaks the build itself through no fault of the software, so I disabled them for that version. And so on. The only package for which I’ve disabled tests for all versions is Boost, because its test suite eats vast amounts of time, memory, and disk space, and it’s not so core a package that I couldn’t easily downgrade it if it broke the programs that use it.
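Portage supports exactly this kind of per-package exception through package.env; a sketch (the overlay file name and the package list are illustrative):

```shell
# /etc/portage/env/notest.conf
# An environment overlay that turns the test feature back off.
FEATURES="-test"

# /etc/portage/package.env
# Apply the overlay only to the problem packages, e.g.:
#   dev-libs/boost          notest.conf
#   app-office/libreoffice  notest.conf
```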
The point of this somewhat unfocused post, I think, so far as there is a point, is twofold: First, that software packagers and users of source-based distributions—those who build software from source rather than using binary packages prepared by others—should routinely use the test suites shipped with such software to ensure that it arrives in proper working order. And second, that software developers should maintain test suites to help (themselves, but also users) ensure that their software works as expected, builds correctly, and so on, treating test failures as seriously as build failures, standards noncompliance, or other bugs. Preferably, this test suite should make it easy for other developers and users to see which tests failed, add more tests, mark tests as expected to fail, and so on; I think JUnit is nearly ideal (at least as I see it from my IDE), and DejaGNU (written in Expect, of all things …) is about as far from that as you can get. Test suites should be as extensive as necessary and as easy to use as possible.
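For contrast, here is roughly what that ease of use looks like in JUnit 4 (the class and test names are made up for illustration): a known failure is one annotation away from being documented and skipped, rather than showing up red on every run with no explanation.

```java
import org.junit.Ignore;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class LowestLayerTest {
    // An ordinary test: pass or fail, with the failure localized
    // to one named method.
    @Test
    public void parsesCoordinates() {
        assertEquals(42, Integer.parseInt("42"));
    }

    // A known failure, explicitly marked and explained, so the suite
    // stays green and the problem stays visible in the report.
    @Ignore("fails pending an upstream fix")
    @Test
    public void knownFailure() {
        assertEquals(0, 1); // would fail if run
    }
}
```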