To DRY or Not to DRY ?

When talking about clean code, DRY – Don’t Repeat Yourself – often comes into place. DRY is a good practice. Two codes having the same purpose should be merged into single one. Unfortunately when a good practice becomes a dogma, it quickly becomes badly applied, and whenever two codes looks alike, developers are tempted to merge them into single one. The issue here is about the difference between “looking alike” and “having the same purpose”.

A small example

Let’s imaging a YAML file handling continuous integration pipeline on Azure DevOps. This script must build a C++ application on Windows and Linux. Its would look like this:

jobs:
  – job: Linux
    pool:
    vmImage: “ubuntu-latest”
    steps:  
    – script: |
        cmake .
        cmake –build .

  – job: Windows
    pool:
    vmImage: “windows-latest”
    steps:  
    – script: |
        cmake .
        cmake –build .

One can see there is a small repetition for both config. Lets try to factorize this:

strategy:
  matrix:
    linux:
      imageName: ‘ubuntu-latest’
    windows:
      imageName: ‘windows-latest’

pool:
  vmImage: $(imageName)
steps:
task: Build
script: |
    cmake .
    cmake –build .

Good. Now, add a test step to our pipeline. Our tests are located in build\Release\test.exe on Windows, but in ./build/Release/test on Linux. Yeah, that good old slash/backslash issue…

strategy:
  matrix:
    linux:
      imageName: ‘ubuntu-latest’
      testExecutable: ‘build/Release/test’
    windows:
      imageName: ‘windows-latest’
      testExecutable: ‘build\Release\test.exe’

pool:
  vmImage: $(imageName)
steps:
task: Build
script: |
    cmake .
    cmake –build .
   $(testExecutable)

Well… At least, nothing is repeated twice. Now, imagine that dev team works mostly on Windows and would like to generate a Debug build and store binaries to help debugging. We don’t want to generate Debug build on Linux because our build is quite long and Linux build machine are already overloaded. Yeah, we are still not on an “AWS-Docker-Cloud-Based-On-Demand-Auto-Scale-Up” infrastructure. Anyway, here is our script:

strategy:
  matrix:
    linux:
      imageName: ‘ubuntu-latest’
      testExecutable: ‘./build/Release/test’
    windows:
      imageName: ‘windows-latest’
      testExecutable: ‘build\Release\test.exe’

pool:
  vmImage: $(imageName)
steps:
– task: Build
– script: |
    cmake .
    cmake –build .
    $(testExecutable)
    if ($env:AGENT_OS -eq “Windows_NT”) {
      cmake –build . -build_type=Debug
    }
– publish: $(System.DefaultWorkingDirectory)/Debug
  artifact: Debug_windows
  condition: eq(variables[‘imageName’], ‘windows-latest’)

I’m not sure of the syntax but you get the idea. At least we didn’t repeat ourself as not a single line was present twice. So our code is clean, maintanable and readable, no ?

Now, what would this script looked like if we didn’t wanted to factorizes lines at the beginning ?

jobs:
job: Linux
  pool:
    vmImage: ‘ubuntu-latest’
  steps:
  – script: |
      cmake .
      cmake –build .
      ./build/Release/UnitTestSuite

job: Windows
  pool:
    vmImage: ‘windows-latest’
  steps:
  – script: |
      cmake .
      cmake –build .
      cmake –build . –config Debug
      cd Release
      UnitTestSuite.exe
  – publish: ./build/Debug
     artifact: Debug_windows

The result is much more readable. Ok, a few lines were copied twice, but is it a big deal ?

Two identical bloc of code may not serve the same purpose

Lets remember an important point: Copy/pasting doesn’t prevent a software to work. It prevent a software to evolve. DRY only has a sense regarding the evolution of the software.

Now when looking at 2 pieces of identical code, we should ask ourselves: does these pieces of code should evolve together ? Should they do the same thing and should they evolve synchronously in the future ? If the answer is yes, then you should must factorize those lines into a common function. But if the answer is no, then factorizing those lines into common function might force you implementing a function that handle different behavior in the future.

In the example above, we have 2 identical bloc of code :

script: |
        cmake .
        cmake –build .

The goal of one is to build on Linux, whereas the goal of the other is to build on Windows. Their similarity is more a coincidence than a duplication of code. You can argue that “their goal is to build, it is the same”, but C++ developers know that, despite all the effort done by tools like CMake to make build similar across platforms, there WILL be differences one day or the other.

So we just copy/past and that’s it ?

No, don’t get me wrong. In the example above, we only have a block of 2 lines that are copied for Windows and Linux build. It is not that a big deal as is, but in real use case it may be a dozen of lines. In this case, a proper solution would be to factorize these lines into lets says a common_build.sh script, and call this common_build.sh in our pipeline:

jobs:
job: Linux
  pool:
    vmImage: ‘ubuntu-latest’
  steps:
  – script: |
      common_build.sh Release
      ./build/Release/UnitTestSuite

job: Windows
  pool:
    vmImage: ‘windows-latest’
  steps:
  – script: |
      common_build.sh Release
      common_build.sh Debug
      cd build/Release
      test.exe
  – publish: ./build/Debug
     artifact: Debug_windows

DRY extremists may argue that the line common_build.sh Release is still repeated, but in this case it is not an issue. We can consider this single line of code as atomic, and thus cannot diverge with evolution of code.

Ok, How could-you summarize this ?

Let’s take another example of a python piece of code managing user in a database. You can get, create or delete a user using some imaginary framework.

This is dirty copy/past:

def create_user(user_name):
  try:
    db = get_db()
    db.connect(“username”, “password”)
    table = db.get_table(“user”)
  except Exception as e:
    logging.error(str(e))
    db.cleanup()
   
  table.create(user_name)
 
def get_user(user_name):
  try:
    db = get_db()
    db.connect(“username”, “password”)
    table = db.get_table(“user”)
  except Exception as e:
    logging.error(str(e))
    db.cleanup()
   
  return table.get(user_name)
 
def delete_user(user_name):
  try:
    db = get_db()
    db.connect(“username”, “password”)
    table = db.get_table(“user”)
  except Exception as e:
    logging.error(str(e))
    db.cleanup()
   
  table.delete(user_name)

This is misunderstood DRY:

def _user_action(user_name, action):
  try:
    db = get_db()
    db.connect(“username”, “password”)
    table = db.get_table(“user”)
  except Exception as e:
    logging.error(str(e))
    db.cleanup()
   
  if action == “create”:
    table.create(user_name)
  elif action == “get”:
    return table.get(user_name)
  elif action == “delete”:
    table.delete(user_name)
   
def create_user(user_name):
  _user_action(user_name, “create”)
 
def get_user(user_name):
  return _user_action(user_name, “get”)
 
def delete_user(user_name):
  _user_action(user_name, “delete”)

This is DRY:


def _get_table(tablename):
  try:
    db = get_db()
    db.connect(“username”, “password”)
    return db.get_table(“user”)
  except Exception as e:
    logging.error(str(e))
    db.cleanup()

def create_user(user_name):
  table = _get_table(“user”)
  table.create(user_name)
 
def get_user(user_name):
  table = _get_table(“user”)
  return table.get(user_name)
 
def delete_user(user_name):
  table = _get_table(“user”)
  table.delete(user_name)

Home


Similar topics

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s