When talking about clean code, DRY (Don’t Repeat Yourself) often comes up. DRY is a good practice: two pieces of code that have the same purpose should be merged into a single one. Unfortunately, when a good practice becomes a dogma, it quickly gets misapplied, and whenever two pieces of code look alike, developers are tempted to merge them into one. The issue here is the difference between “looking alike” and “having the same purpose”.
A small example
Let’s imagine a YAML file describing a continuous integration pipeline on Azure DevOps. This script must build a C++ application on Windows and Linux. It would look like this:
One can see a small repetition between the two configurations. Let’s try to factor it out:
Good. Now, add a test step to our pipeline. Our tests are located in build\Release\test.exe on Windows, but in ./build/Release/test on Linux. Yeah, that good old slash/backslash issue…
Well… At least nothing is repeated. Now, imagine that the dev team works mostly on Windows and would like to generate a Debug build and store the binaries to help with debugging. We don’t want to generate a Debug build on Linux because our build is quite long and the Linux build machines are already overloaded. Yeah, we are still not on an “AWS-Docker-Cloud-Based-On-Demand-Auto-Scale-Up” infrastructure. Anyway, here is our script:
I’m not sure of the syntax, but you get the idea. At least we didn’t repeat ourselves, as not a single line appears twice. So our code is clean, maintainable and readable, no?
Now, what would this script look like if we hadn’t factored out those lines at the beginning?
The result is much more readable. OK, a few lines were copied, but is it a big deal?
Two identical blocks of code may not serve the same purpose
Let’s remember an important point: copy/pasting doesn’t prevent software from working. It prevents software from evolving. DRY only makes sense with regard to the evolution of the software.
Now, when looking at two identical pieces of code, we should ask ourselves: should these pieces of code evolve together? Should they do the same thing, and should they keep changing in sync in the future? If the answer is yes, then you must factor those lines into a common function. But if the answer is no, then factoring those lines into a common function might later force you to implement a function that handles several different behaviors.
In the example above, we have two identical blocks of code:
The goal of one is to build on Linux, whereas the goal of the other is to build on Windows. Their similarity is more a coincidence than a duplication of code. You can argue that “their goal is to build, it is the same”, but C++ developers know that, despite all the efforts made by tools like CMake to make builds similar across platforms, there WILL be differences one day or another.
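To make the point concrete outside of pipeline syntax, here is a minimal Python sketch that models the pipeline as a list of shell commands. The function names and the CMake command lines are assumptions made for illustration (the only details taken from the example are the two test paths and the Windows-only Debug build). Both versions produce the same steps, but the merged one needs a platform branch for every difference:

```python
# Assumed command lines, chosen only for illustration.
CONFIGURE = "cmake -S . -B build"
BUILD_RELEASE = "cmake --build build --config Release"
BUILD_DEBUG = "cmake --build build --config Debug"


def merged_steps(platform):
    """Merged version: a single function that branches on the platform."""
    steps = [CONFIGURE, BUILD_RELEASE]
    if platform == "windows":
        steps.append(r"build\Release\test.exe")
        steps.append(BUILD_DEBUG)  # the Debug build is wanted on Windows only
    else:
        steps.append("./build/Release/test")
    return steps


def windows_steps():
    """Separate version: the Windows pipeline, readable top to bottom."""
    return [CONFIGURE, BUILD_RELEASE, r"build\Release\test.exe", BUILD_DEBUG]


def linux_steps():
    """Separate version: the Linux pipeline, free to diverge on its own."""
    return [CONFIGURE, BUILD_RELEASE, "./build/Release/test"]
```

Each new platform-specific requirement adds another branch to merged_steps, while the separate functions only change where the requirement actually applies.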
So we just copy/paste and that’s it?
No, don’t get me wrong. In the example above, only a block of two lines is copied for the Windows and Linux builds. That is not a big deal as is, but in a real use case it may be a dozen lines. In that case, a proper solution would be to factor these lines into, let’s say, a common_build.sh script, and call common_build.sh in our pipeline:
DRY extremists may argue that the line common_build.sh Release is still repeated, but in this case it is not an issue. We can consider this single line of code as atomic, so it cannot diverge as the code evolves.
OK, how could you summarize this?
Let’s take another example: a piece of Python code managing users in a database. You can get, create or delete a user using some imaginary framework.
This is dirty copy/paste:
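As a minimal sketch, assuming a hypothetical FakeDatabase class that stands in for the imaginary framework, copy/pasted code might look like the following. The name check and the connection handling are duplicated in all three functions, even though they are meant to stay identical forever:

```python
class FakeDatabase:
    """Hypothetical in-memory stand-in for the imaginary framework."""

    def __init__(self):
        self.users = {}
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1

    def disconnect(self):
        self.open_connections -= 1


def get_user(db, name):
    if not name:
        raise ValueError("user name must not be empty")
    db.connect()
    try:
        return db.users.get(name)
    finally:
        db.disconnect()


def create_user(db, name):
    if not name:
        raise ValueError("user name must not be empty")
    db.connect()
    try:
        db.users[name] = {"name": name}
        return db.users[name]
    finally:
        db.disconnect()


def delete_user(db, name):
    if not name:
        raise ValueError("user name must not be empty")
    db.connect()
    try:
        return db.users.pop(name, None)
    finally:
        db.disconnect()
```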
This is misunderstood DRY:
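A sketch of what this could look like, again assuming a hypothetical FakeDatabase class in place of the imaginary framework: no line is written twice, but three unrelated behaviors are now squeezed into a single handle_user function (a made-up name) that dispatches on a string:

```python
class FakeDatabase:
    """Hypothetical in-memory stand-in for the imaginary framework."""

    def __init__(self):
        self.users = {}
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1

    def disconnect(self):
        self.open_connections -= 1


def handle_user(db, action, name):
    """One function for everything: get, create and delete share a body."""
    if not name:
        raise ValueError("user name must not be empty")
    db.connect()
    try:
        if action == "get":
            return db.users.get(name)
        elif action == "create":
            db.users[name] = {"name": name}
            return db.users[name]
        elif action == "delete":
            return db.users.pop(name, None)
        raise ValueError(f"unknown action: {action}")
    finally:
        db.disconnect()
```

Any change that concerns only one action (say, extra fields on creation) has to be threaded through this shared function and its growing list of branches.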
This is DRY:
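One possible sketch, still assuming a hypothetical FakeDatabase class in place of the imaginary framework: only the parts that must evolve together (the name check and the connection handling) are shared, through the illustrative helpers check_name and connection. The three operations stay separate because their purposes differ:

```python
from contextlib import contextmanager


class FakeDatabase:
    """Hypothetical in-memory stand-in for the imaginary framework."""

    def __init__(self):
        self.users = {}
        self.open_connections = 0

    def connect(self):
        self.open_connections += 1

    def disconnect(self):
        self.open_connections -= 1


def check_name(name):
    """Shared rule: every operation rejects an empty user name."""
    if not name:
        raise ValueError("user name must not be empty")


@contextmanager
def connection(db):
    """Shared rule: open a connection and always close it afterwards."""
    db.connect()
    try:
        yield db
    finally:
        db.disconnect()


def get_user(db, name):
    check_name(name)
    with connection(db) as conn:
        return conn.users.get(name)


def create_user(db, name):
    check_name(name)
    with connection(db) as conn:
        conn.users[name] = {"name": name}
        return conn.users[name]


def delete_user(db, name):
    check_name(name)
    with connection(db) as conn:
        return conn.users.pop(name, None)
```

The two with/check lines are still repeated, but like the single common_build.sh call they can be treated as atomic: there is nothing inside them that could drift apart.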