Friday, May 27, 2011

Take-Away Points From my Software Craftmanship 2011 Session

At the Software Craftsmanship 2011 conference yesterday I had the pleasure of running a workshop on how to deal with "legacy" test suites. A part from having a lot fun I also learned a lot from the discussions during the workshop.

The premise of the workshop was a problem that I think is becoming more and more common: Even though more and more code bases have test suites around them, the quality of those tests are low. This lack of quality can stem from a number of things. First of all most automated tests, in my experience, still aren't written in a TDD workflow, they are written some time after the production code. Sometimes right after, sometimes days or weeks later. Secondly automated testing efforts sometimes seem to lack a clear purpose. Lastly test suites are often (as in very often) abandoned as soon the project comes under any sort of pressure - from deadlines, budgets, changing requirements or whatever else. These things lead to test suites that are treated as second or third class code; code that is not seen as important. That leads to a range of problems, such as tests with bland names - like test001 - tests that assert nothing or next to nothing, tests that just mimic the structure of the implementation without showing or exercising the behaviour from a domain/specification perspective and to low coverage of both code and features.

To get concrete we took a hard look at the tests for the NerdDinner sample application. In there we found most of the problems listed above, which begs the question given that we where taking over maintainance and future development of NerdDinner how would we proceed with regard to the tests. Clearly the existing tests are not at a quality level that would allow us to have confidence in the changes to NerdDinner we make just based on the tests running. On the other hand they do cover about a third of the code, and most of them do assert stuff. So if one of them was to break that likely indicates a problem.

The discussion during the workshop seemed to divide people into two camps: One that said, that these test are not worth carrying forward because they will require maintainance while not providing any real value as regressions guards or as specifications. The other camp said, keep them and incrementally improve or delete them, keeping in mind that we dont know how valid the individual tests are.

To sum up the main points I got from the throw the test away camp were:
  • Since coverage is way too low, we're better off with a small number of broad acceptance tests. We can write those in the initial iteration, where we need to familiarize ourselves with the code base anyway.
  • Tests that don't assert much just gives us false confidence, so they are a liability that we don't want to keep them around.
  • Tests that don't work as specifications are mainly a confusion to us, because we don't have access to the people that wrote, and consequently can't find out what the purpose of those tests was.
  • It's simply quicker to write a bunch of exploratory tests, than it is to try to understand and maintain the existing ones.
And the main points I got from the keep the tests camp were:
  • Even though there are problems with the tests, each of them were written with some sort of a goal in mind. As such we suspect that they can point out something we haven't realized yet.
  • The tests are just another part of the code base, and as such it's natural that we will have to refactor - and improve - them over time.
  • We can put the legacy tests in some kind of a quarantine, where we keep running them but don't necessarily trust them, and when one breaks we take look at it and then decide to either delete the test or fix it and move it from the quarantined suite to our normal test suite.
  • The Boy Scout Rule applies to tests as well. Use that as the discipline to incrementally improve the tests.
In general I'm in the keep the tests camp, but I guess it really comes down to making an informed decision on a case-by-case basis. Which camp are you in? Or which points do think we missed?