Where should I start to clean up my code?

How many of you know the feeling when an incoming change request forces you to dig into code you never wanted to touch? And how many of you have concluded while reading that code: »I don’t get what’s going on.« With the immediate follow-up question: »Who the hell wrote this code?«

As we have all probably experienced, software systems evolve over time, and without effective countermeasures their quality gradually decays, making the system hard to understand and to maintain. In this blog post, we present a practical way to start cleaning up the code of a grown software system and thereby prevent further decay.

As code quality decreases, changes become harder to perform and hence more expensive. Researchers and practitioners have proposed many different static code analyses (structural metrics, redundancy measurements, test assessment, or bug pattern detection) that reveal quality defects and assess the quality of your code. When these analyses are first introduced to a software system, they typically reveal thousands of findings.

Confronted with a huge number of findings, you as a software developer lack a concrete starting point for a long-term software improvement process. Often, no findings get fixed at all, because who knows which findings are worth the refactoring effort…

In our research paper »Prioritizing Maintainability Defects by Refactoring Recommendation« (accepted for publication at the International Conference on Program Comprehension, ICPC, Hyderabad, India, 2-3 June 2014), we address the problem for two specific types of findings: code clones and long methods. To help you get started with cleaning up your code, we provide an analysis of the low-hanging fruit in the refactoring process. We identify those quality defects in your code that are very easy to remove with a common refactoring pattern. Examples include duplicated methods that can be pulled up to an existing super class, or long methods containing a long block (for example, a for- or while-loop) that can be extracted into a new method. We detect this low-hanging fruit with a heuristic data flow analysis on source code that finds extractable blocks of code using a definition-usage analysis of local variables.
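To give a rough idea of what a definition-usage analysis of local variables can look like, here is a minimal Python sketch. It is not the implementation from the paper. It assumes that each statement of a method has already been reduced to the sets of local variables it writes and reads, and it assumes a simple rule: a block counts as extractable if at most one variable assigned inside it is still read afterwards, since a Java method can return only one value. The names Statement, classify and is_extractable are made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement of a method body, reduced to the local variables it touches.

    Hypothetical representation for this sketch; a real analysis would
    derive these sets from the parsed source code.
    """
    text: str
    defs: set = field(default_factory=set)  # local variables written
    uses: set = field(default_factory=set)  # local variables read


def classify(body, start, end):
    """Classify the local variables of the candidate block body[start:end].

    Returns (parameters, outputs):
      parameters: variables read in the block before the block assigns them,
                  so the extracted method would need them as arguments
      outputs:    variables assigned in the block and read after it,
                  so the extracted method would have to return them
    """
    parameters, assigned = set(), set()
    for stmt in body[start:end]:
        parameters |= stmt.uses - assigned
        assigned |= stmt.defs

    outputs, redefined = set(), set()
    for stmt in body[end:]:
        # A read only depends on the block if nothing after the block
        # has overwritten the variable in the meantime.
        outputs |= (stmt.uses & assigned) - redefined
        redefined |= stmt.defs
    return parameters, outputs


def is_extractable(body, start, end):
    """Assumed rule of thumb: at most one value has to leave the block."""
    _, outputs = classify(body, start, end)
    return len(outputs) <= 1


# A small method body, written down by hand for the example.
body = [
    Statement("int[] values = load();", defs={"values"}),
    Statement("int sum = 0;", defs={"sum"}),
    Statement("for (int v : values) {", defs={"v"}, uses={"values"}),
    Statement("sum += v;", defs={"sum"}, uses={"sum", "v"}),
    Statement("print(sum);", uses={"sum"}),
]

parameters, outputs = classify(body, 1, 4)
```

For the block covering the initialization of sum and the loop (statements 1 to 3), this sketch reports values as the only parameter and sum as the only output, so the block could become a method that takes values and returns sum. The straight-line walk over statements ignores branches and loop back edges, which is why it can only serve as a heuristic.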

In a case study with developers, we investigated how useful it is to remove the low-hanging fruit. We showed each quality defect that was identified as easy to remove to a developer and asked whether they would actually sit down and spend the time to remove it. It turned out that roughly 80% of these findings were accepted for removal. Hence, our analysis reveals not only findings that can be refactored quickly but also findings that developers consider worth removing.

Interestingly, the reasons why developers decided not to remove a certain finding differed between code clones and long methods. The reasons not to refactor a long method were mostly obvious from the code itself: the methods contained only UI code, only logging code, or only filled data objects. In contrast, the reasons not to remove a clone could hardly be recognized from the code itself, but required context knowledge about the entire system or even external knowledge about its environment. In one case study system, developers did not remove several clones because they all belonged to two entirely cloned components. The developers preferred to wait for a general design decision on how to proceed with the two cloned components before starting to fix individual clones. In another case study system, the redundancy in the code actually stemmed from redundant XML schemata on which the code is based. As removing the redundancy in the XML schemata was out of scope for the developers, they could not remove the clones in their code either.

To wrap it up, the approach presented in our paper helps you as a developer to find the low-hanging fruit in the refactoring process. It gives you an easy and useful starting point if you want to clean up your code.