I Can't Remove It, It's Not a Clone

Posted on 01/17/2018 by Dr. Nils Göde

Redundant source code fragments, known as code clones, are a major indication of low-quality code. Code clones require you to propagate every change to multiple locations, always at the risk of overlooking some of them. This not only increases the time needed to change the code but also leads to incomplete and costly bug fixes. Because code clones are among the top-rated bad smells, Teamscale has a sophisticated algorithm to locate redundant code fragments and report them.

However, when discussing code clones with Teamscale users, we often hear that »this is not a clone because there is no sensible way to eliminate the redundancy«. I wrote this post to dispel this misunderstanding.

Clone Removal

Certainly, you should always try to remove clones if you can get rid of the redundancy with a reasonable abstraction. The most popular refactoring for clone removal is to unify the redundant code in a single method (or function) that can be called from all the places where it is needed. In some cases, the redundant code fragments are close to each other in the class hierarchy and can be removed with a pull-up method refactoring that moves the functionality into a common base class. If your programming language offers these abstraction mechanisms and a sensible removal is possible: always do so.
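To make the first refactoring concrete, here is a minimal sketch in Python. The function names and the discount rule are invented for illustration and do not come from any real system. The two "before" functions share a cloned discount rule; the "after" version moves that rule into a single function.

```python
# Before: the discount rule is cloned in both functions.
# (All names and the rule itself are hypothetical.)
def invoice_total_before(prices_in_cents):
    total = sum(prices_in_cents)
    if total > 10_000:
        total -= total // 10  # 10% bulk discount
    return total + 500  # flat shipping fee


def quote_total_before(prices_in_cents):
    total = sum(prices_in_cents)
    if total > 10_000:
        total -= total // 10  # 10% bulk discount
    return total


# After: the rule lives in exactly one place.
def discounted_total(prices_in_cents):
    total = sum(prices_in_cents)
    if total > 10_000:
        total -= total // 10  # 10% bulk discount
    return total


def invoice_total(prices_in_cents):
    return discounted_total(prices_in_cents) + 500  # flat shipping fee


def quote_total(prices_in_cents):
    return discounted_total(prices_in_cents)
```

A change to the discount rule now touches only discounted_total, so there is no second location to overlook.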

While many clones can be removed that way, others are hard or even impossible to remove sensibly. That is, there is certainly a technical way to get rid of these clones, but not without compromising the readability and understandability of the code. Think of C or C++ code with its very universal abstraction mechanism in the form of the preprocessor. The preprocessor allows you to get rid of any redundancy, but often with a huge drop in readability. Apart from that, in any language you will find redundant code parts that cannot be unified in a reasonable way.

Change Coupling

The common reaction to these clones that you cannot really remove is: »Then it’s not a clone and irrelevant to me«. This is a common and serious misunderstanding. Whether a clone can be removed has nothing to do with whether it is a true or false positive. Even though some clones cannot be removed, they are still relevant because they require change propagation and are always prone to inconsistent changes.

So the question you really want to ask is »If I change one of the cloned code fragments, do I need to change the others as well?«. If the answer is »yes«, »maybe«, »sometimes« or »I need to decide from case to case«, the clone is relevant. Only if the answer is »definitely not« is the clone a false positive. Admittedly, Teamscale does occasionally report false positives, but this is an inherent trade-off between precision and recall.
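The decision rule above is simple enough to write down as code. The following is only a sketch of the reasoning, not something Teamscale exposes; the enum and the function are hypothetical names. Note that the removable parameter is accepted but deliberately ignored, because removability plays no role in the decision.

```python
from enum import Enum


class ChangeCoupling(Enum):
    """Possible answers to: 'If I change one fragment, must I change the others?'"""
    YES = "yes"
    MAYBE = "maybe"
    SOMETIMES = "sometimes"
    CASE_BY_CASE = "I need to decide from case to case"
    DEFINITELY_NOT = "definitely not"


def is_relevant_clone(answer: ChangeCoupling, removable: bool) -> bool:
    """Return True if the clone is relevant, False if it is a false positive.

    'removable' is intentionally unused: whether the redundancy can be
    refactored away does not affect relevance.
    """
    return answer is not ChangeCoupling.DEFINITELY_NOT
```

For example, is_relevant_clone(ChangeCoupling.MAYBE, removable=False) returns True: the clone cannot be removed, yet it still matters.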

In conclusion, change coupling is what makes a clone relevant or not. The next time you are tempted to mark a clone as irrelevant, ask yourself if it needs to change together with its counterparts, regardless of whether there is a way to remove the clone.

The Myth of No Clones

Defining clones by change coupling often triggers the question of how to get a redundancy-free system. The answer is: you can’t. There will always be a certain number of clones that are relevant but cannot be removed. These clones exist in every system and although you should try to keep the number of clones as low as possible, no system is completely free of redundancy. To give you an impression of what to expect: good systems have a clone coverage of around 5%. That means 5% of the code is similar to at least one other code fragment in the system. It is hard to get any lower and a professional assessment will take this into account.
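As a rough illustration of what such a number means, the sketch below computes a line-based clone coverage. It assumes that clone regions are given as (file, first line, last line) tuples and that coverage is the share of lines that belong to at least one clone. The input format and function name are made up for this example, and the exact computation in Teamscale may differ.

```python
def clone_coverage(file_lengths, clone_regions):
    """Fraction of all lines that lie inside at least one clone region.

    file_lengths:  dict mapping file path -> number of lines (assumed input)
    clone_regions: iterable of (path, first_line, last_line), inclusive
    """
    covered = set()
    for path, first_line, last_line in clone_regions:
        # A set ensures that overlapping regions count each line only once.
        covered.update((path, line) for line in range(first_line, last_line + 1))
    total_lines = sum(file_lengths.values())
    return len(covered) / total_lines if total_lines else 0.0


# Two files of 100 lines each; the third region overlaps the first.
coverage = clone_coverage(
    {"a.py": 100, "b.py": 100},
    [("a.py", 1, 5), ("b.py", 11, 15), ("a.py", 4, 5)],
)
```

Here 10 of 200 lines are covered, which corresponds to the 5% mentioned above.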

I hope this post helps to sharpen your view of what is a relevant clone and what is not. Even if a clone cannot be removed, the information that it exists is valuable in itself.