Tests Are Good. But What Are Good Tests?

Tests are an essential part of development. A well-tested application often has more lines of code in its tests than in its production code. But not all tests are equal and, unfortunately, some are written only to reach a code coverage target or to make QA happy to see the test base grow.

What is a test anyway?

Let’s kill a myth. Automated tests have never ensured that your application works correctly. They ensure that your application works like it did before. It is up to the developer, QA, the Product Owner and finally the user to decide whether the application behaves correctly. Once you have the correct behavior, you set it in stone by writing a test.

But what about TDD, which advocates writing tests before writing code? If we look at the big picture, TDD’s real added value is not so much to enforce written requirements as to ensure that the design you are conceiving is testable. TDD katas and textbook exercises are often toys with clearly defined specifications and simple use cases. In the real world, you often iterate on your tests because you realize the unit test you wrote last week contradicts the integration test you are writing today: because the API has changed, because the result was not the right one after all, because the user changed their mind and now prefers logging an error to throwing an exception, and so on. And that is not an issue. Because you wrote the test first, you ensure your design is testable now, and also in the future. Even if requirements change, you can easily update your test assertions to match the new expected result.

In some cases, you don’t even know what result to expect until you have implemented your algorithm. This often happens when dealing with scientific computation. Let’s imagine this not-so-fictional conversation between a biologist and a developer:

Biologist: Hi, I want to compute the size of a coronavirus from a cryo-EM acquisition.
Developer: OK, do you have a dataset?
Biologist: Sure, here it is.
Developer: Cool, and what is the expected size of the virus in this dataset?
Biologist: Are you kidding me? That’s what I’m asking for!

So you go home with input data but no expected outcome. You start writing a test, because you know TDD is a good practice, but you don’t know what to write as the expected result. Never mind: you leave it blank and start implementing the algorithm. Once you have a result, you give a preliminary version to your user, who decides whether the result is coherent or not (in scientific domains it often happens that you don’t know what result to expect but can validate a result a posteriori). If the result is OK, you put it in a test as the expected result to prevent regressions in the future.
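As a minimal sketch of that workflow in Python: the function `estimate_diameter_nm`, the toy radii and the baseline value below are all hypothetical stand-ins, not the real algorithm or real cryo-EM measurements. The only point is that the expected value comes from a result the user has already validated.

```python
import math


def estimate_diameter_nm(radii_nm):
    """Hypothetical stand-in for the real algorithm: mean diameter of the particles."""
    return 2.0 * sum(radii_nm) / len(radii_nm)


# Toy input standing in for the dataset handed over by the user.
TOY_RADII_NM = [48.0, 50.0, 52.0]

# Value produced by the first implementation and accepted by the user.
# It is frozen here as the baseline; it is not a real measurement.
VALIDATED_DIAMETER_NM = 100.0


def test_diameter_matches_validated_baseline():
    result = estimate_diameter_nm(TOY_RADII_NM)
    # A small tolerance keeps harmless floating-point noise from failing the test.
    assert math.isclose(result, VALIDATED_DIAMETER_NM, rel_tol=1e-9)
```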

It is also important to understand that a failing test does not always indicate that something is broken. It indicates that something has changed. If you improve your algorithm, the test will fail, indicating that the result is different from before. The scientist will tell you whether the result is better or worse, and if it is better, you have to update the test baseline to freeze this new behavior.
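One way to make “changed, not necessarily broken” explicit is to keep the baseline in a file and overwrite it deliberately once the new result has been accepted. The sketch below assumes a JSON baseline file and a hypothetical `check_against_baseline` helper; neither comes from a specific test framework.

```python
import json
import math
from pathlib import Path


def check_against_baseline(result, baseline_path, update=False, rel_tol=1e-6):
    """Compare a numeric result with a stored baseline (hypothetical helper).

    Returns True when the result matches the baseline within rel_tol.
    With update=True the baseline is overwritten with the new result,
    which should only be done after a human has validated it.
    """
    path = Path(baseline_path)
    if update:
        path.write_text(json.dumps({"value": result}))
        return True
    baseline = json.loads(path.read_text())["value"]
    return math.isclose(result, baseline, rel_tol=rel_tol)
```

A `False` return only says that the result moved. Deciding whether it moved in the right direction stays with the person who can judge the result.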

In the previous example, you may argue that only the final result is really validated. You cannot guarantee there are no bugs in the intermediate results. You just hope that, as the final result is good, the intermediate results should be good too, no? Unfortunately, no. And that leads me to a statement I discovered painfully in my career:

Any working software contains an even number of internal bugs which correct each other.

And worse than that, some unit tests exist to ensure that this buggy behavior stays in place, to avoid collapsing the whole system.
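A deliberately contrived Python illustration of two bugs that cancel each other, with a unit test that pins one of them. All function names are invented for the example.

```python
def load_radius(raw_radius):
    # Bug 1: the value is doubled on load, although it is already a radius.
    return raw_radius * 2


def diameter_from_radius(radius):
    # Bug 2: the factor 2 is missing, which happens to cancel bug 1.
    return radius


def measure_diameter(raw_radius):
    return diameter_from_radius(load_radius(raw_radius))


def test_final_result_is_correct():
    # The end-to-end check passes: a radius of 5 gives a diameter of 10.
    assert measure_diameter(5) == 10


def test_pins_buggy_loader():
    # This unit test freezes bug 1. "Fixing" load_radius alone would make
    # it fail and would also break measure_diameter.
    assert load_radius(5) == 10
```

Both tests are green, yet neither intermediate function does what its name says.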

Gross value and net value

Now that we have seen what a test is, let’s talk about its value.

As with your salary, gross value doesn’t matter that much; what matters in the end is net value. Tests, too, have a gross value and a net value. The gross value is the test’s ability to detect regressions. The net value is much more subtle.

A test that fails half of the time for no reason, takes several minutes to execute and takes a day to deploy on the developer environment has a negative net value. You waste more time maintaining it than you gain by catching regressions early. On the other hand, a test that executes in a few seconds, on both the test and developer environments, and that reports only true positives is much more valuable.

To have a high net value, a test should:

  • Have a high probability of detecting regressions
  • Have a low probability of creating a false positive
  • Be fast and easy to execute, on both the test and dev environments
  • Have a low maintenance cost
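There is no exact formula for net value; one rough way to put numbers on the criteria above is to count hours per month. The class name, the field names and the idea of a simple subtraction are illustrative assumptions, and the figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SuiteEntry:
    """Rough monthly accounting for one test (all fields are illustrative)."""
    name: str
    hours_saved: float        # gross value: debugging time saved by early detection
    false_alarm_hours: float  # time lost investigating failures that were not regressions
    run_hours: float          # developer time spent launching or waiting for the test
    maintenance_hours: float  # time spent keeping the test and its environment alive

    def net_value(self):
        costs = self.false_alarm_hours + self.run_hours + self.maintenance_hours
        return self.hours_saved - costs


flaky = SuiteEntry("flaky_end_to_end", hours_saved=4.0, false_alarm_hours=6.0,
                   run_hours=2.0, maintenance_hours=3.0)
fast = SuiteEntry("fast_unit", hours_saved=3.0, false_alarm_hours=0.0,
                  run_hours=0.25, maintenance_hours=0.5)

print(flaky.name, flaky.net_value())  # negative: costs more than it saves
print(fast.name, fast.net_value())    # positive
```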

Filter your tests

Whether a test is valuable or not really depends on your domain. A critical DO-178A aeronautics system should be fully tested, whatever the cost. Temporary mobile applications don’t need such robustness. It is up to you to decide where to place the cursor. Then you (I mean, not you personally, but your organization) have to be honest with yourselves and drop the tests you judge not profitable enough. The time you save by not maintaining useless tests can be invested in writing new, highly valuable tests.
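As a rough illustration of that filtering step, the sketch below splits tests around a threshold that represents where the organization places the cursor. The helper `filter_tests`, the test names and the net-value estimates are all hypothetical.

```python
def filter_tests(net_values, threshold=0.0):
    """Split tests into kept and dropped according to a net-value threshold.

    net_values maps a test name to its estimated net value (hypothetical
    figures); threshold is where the organization places the cursor.
    """
    kept = sorted(name for name, value in net_values.items() if value >= threshold)
    dropped = sorted(name for name, value in net_values.items() if value < threshold)
    return kept, dropped


estimates = {"fast_unit": 2.25, "flaky_end_to_end": -7.0, "slow_nightly": -0.5}

# A throwaway application keeps only the tests that pay for themselves.
kept, dropped = filter_tests(estimates, threshold=0.0)

# A safety-critical project accepts a much higher cost per test.
kept_critical, dropped_critical = filter_tests(estimates, threshold=-10.0)
```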

