We have had countless discussions about code clones with the teams responsible for their maintenance. These teams generally accept that some of the clones in their software are the product of copy & paste. In many cases this is obvious, since the clones share some quirks that can only be plausibly explained by copy & paste (e.g. identical typos in comments).

 

One hypothesis that comes up time and again, however, is that some or many of the clones were not created by copy & paste, but instead were written independently and then evolved into the same form.

 

Convergent Evolution

 

This hypothesis reminds me of convergent evolution, where environmental factors drive independent evolution of similar traits in species whose ancestors do not show those traits…
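To make the copy & paste hypothesis concrete, here is a minimal sketch of how a clone detector can recognize copied code even after identifiers were renamed. The tokenizer, the placeholder scheme, and the two snippets are illustrative assumptions, not the actual analysis discussed in the post.

```python
import re

def tokens(code):
    """Crude tokenizer: identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

KEYWORDS = {"def", "return", "if", "for", "in"}

def normalize(toks):
    """Replace identifiers with a placeholder so renamed copies still match."""
    return ["ID" if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in KEYWORDS else t
            for t in toks]

# Two snippets that plausibly arose from copy & paste plus renaming:
a = "def total(items):\n    return sum(x.price for x in items)"
b = "def gesamt(posten):\n    return sum(p.price for p in posten)"

print(normalize(tokens(a)) == normalize(tokens(b)))  # True: same clone class
```

Such token-based normalization finds the clone regardless of whether it arose by copying or by convergence — which is exactly why the two hypotheses are hard to tell apart from the code alone.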


Dr. Benjamin Hummel

Every software system has been built by copy & paste at least to some degree. Some of this redundancy is caused by technical limitations in the languages and tools. Some is caused by time pressure, where duplicating and modifying an existing piece of code is faster than developing suitable abstractions. Some is also caused by developer laziness or a lack of awareness of the possible long-term consequences. While these duplications often work fine when they are created, they typically cause additional maintenance overhead during software evolution, as duplicates have to be changed consistently. But the real risk of these duplications lies in inconsistent changes, which frequently lead to actual bugs in the system.
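The inconsistent-change risk can be shown in a few lines. The two functions below are hypothetical stand-ins for a duplicated piece of validation logic; a fix lands in one copy but the forgotten duplicate keeps the old behavior.

```python
# Two copies of the same validation logic, originally created by copy & paste.

def validate_order(amount):
    # Bug fix applied here: amounts must be strictly positive.
    return amount > 0

def validate_refund(amount):
    # The duplicate was forgotten during the fix: zero still passes.
    return amount >= 0

# The copies now disagree -- an inconsistent change, i.e. a latent bug:
print(validate_order(0), validate_refund(0))  # False True
```

A clone detector that tracks both instances would have flagged the second copy when the first one was changed.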


Dr. Lars Heinemann

If you are controlling software quality in a continuous manner, the absolute values of the quality measures at a specific point in time are often not the most important thing. What matters more is the direction in which you are going. While there may be short-term decreases in quality over time, e.g. due to larger refactorings, the general trend should be towards improvement. An effective way to determine this trend is a Delta Analysis.

 

A Delta Analysis compares two snapshots of the code base and determines how the changes in the time frame affected the software quality. To do this correctly, a tool has to be able to differentiate between old and new quality deficits (we call them findings). Many existing tools have major limitations in their tracking of…
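The core of a Delta Analysis can be sketched as a set comparison between the findings of two snapshots. The finding identifiers and snapshot names below are made up for illustration; note that this naive version keys findings by file and line, which is precisely the kind of brittle tracking the post says many tools suffer from.

```python
def delta(old_findings, new_findings):
    """Compare the finding sets of two snapshots of the code base."""
    old, new = set(old_findings), set(new_findings)
    return {
        "added":   sorted(new - old),  # deficits introduced in the time frame
        "removed": sorted(old - new),  # deficits fixed in the time frame
        "kept":    sorted(old & new),  # legacy deficits carried along
    }

# Hypothetical findings, identified naively by kind, file, and line:
snapshot_may  = {"clone:util.c:120", "nesting:parser.c:40"}
snapshot_june = {"clone:util.c:120", "unused:io.c:15"}

print(delta(snapshot_may, snapshot_june))
```

If `util.c` were renamed between the snapshots, this naive matching would wrongly report the clone finding as both removed and added — hence the need for robust finding tracking.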


Dr. Nils Göde

Almost every long-living software system has accumulated an abundance of quality deficits over time. It’s not only impossible to remove all findings; I also do not recommend doing so. You may very well argue against removing any legacy finding by saying »It has worked all the time« or »It wasn’t me who introduced the problem«. But then, on the other hand, you should make sure that you don’t introduce any new problems. To check this, you need a tool that can reliably differentiate between legacy findings and findings that have been introduced recently. This must also work if directories or files are renamed, code is moved between files, and findings change their appearance in the code. Making the distinction between legacy and recent findings is one of the many…
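One way to make finding identity survive renames and moves is to key findings by their content rather than their location. The sketch below is an assumption about how such tracking could work in the simplest case, not the tool’s actual algorithm, and the code snippets are invented examples.

```python
import hashlib
import re

def fingerprint(snippet):
    """Location-independent identity for a finding: hash the code content with
    all whitespace removed, so renames, moves, and reformatting do not change it."""
    normalized = re.sub(r"\s+", "", snippet)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]

# Baseline snapshot: the finding's code lived in some old file.
legacy_findings = {fingerprint("if (x == null)  { return ; }")}

# Current snapshot: the file was renamed and the code reformatted.
current = fingerprint("if (x == null) { return; }")
print(current in legacy_findings)  # True: still recognized as a legacy finding
```

A real implementation additionally has to cope with findings whose code itself changes slightly, which pure hashing cannot handle.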


Dr. Andreas Göb

As my colleague Fabian explained a few weeks ago, a combination of change detection and execution logging can substantially increase transparency regarding which recent changes of a software system have actually been covered by the testing process. I will not repeat all the details of the Test Gap Analysis approach here, but instead just summarize the core idea: untested new or changed code is much more likely to contain bugs than other parts of a software system. Therefore, it makes sense to use information about code changes and code execution during testing in order to identify those changed but untested areas.

 

Several times we have heard from customers that they like the idea but are not sure about its applicability in their specific…


Code quality audits aim to assess the quality of a system’s source code and identify its weak points. Two areas of quality audits discussed in previous posts by my colleagues are the redundancy caused by copy & paste and the anomalies that go undetected unless static analysis tools like FindBugs are used periodically to check the source code for defects. In the following, I will outline a small experiment to see whether the findings of the static analysis tool FindBugs reside in code blocks that have been copied to other parts of a system’s source code. To illustrate this experiment, I will use a »Big Data« open-source project, namely Apache Hadoop. It is worth mentioning that, with regard to its code quality, Apache Hadoop was…
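The experiment boils down to intersecting two analysis results: finding locations and clone regions. The file names, line ranges, and the FindBugs pattern name below are invented examples; the actual study on Apache Hadoop used real tool output.

```python
# Clone regions as (file, start_line, end_line); one clone class, two instances.
clones = [("A.java", 10, 40), ("B.java", 100, 130)]

# FindBugs findings as (file, line, bug_pattern).
findings = [("A.java", 25, "NP_NULL_ON_SOME_PATH"), ("C.java", 7, "DM_EXIT")]

def findings_in_clones(findings, clones):
    """A finding inside one clone instance likely lurks in every copy of that code."""
    return [f for f in findings
            if any(f[0] == file and start <= f[1] <= end
                   for file, start, end in clones)]

print(findings_in_clones(findings, clones))  # only the finding inside a clone
```

A finding that falls inside a clone region deserves extra attention: its siblings in the other clone instances may contain the same defect without being reported.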


Continuous Integration (CI) is one of the dominant terms when talking about software engineering methods with developers. The initial idea behind continuous integration was to merge the changes performed on developers’ working copies several times a day, in order to prevent integration problems. By keeping the changes since the last integration small, new integration problems become much easier to analyze and solve, and thus the »integration hell« becomes less scary.

 


Fabian Streitel

When you say »software test«, most people will immediately have a mental picture of automated unit tests, continuous integration, fancy mutation testing tools, etc. But in reality, a large portion of testing activities is still manual, i.e., someone clicking through the UI, entering values, hitting buttons, and comparing on-screen results with an Excel sheet. Such tests are sometimes exploratory (i.e., the tester performs random actions in the UI and reports any bugs encountered along the way) and sometimes structured (i.e., someone writes a natural-language document with step-by-step instructions for the tester).

 

In our long-running research cooperation with TU München and our partner HEJF GbR, we have encountered large…


Dr. Florian Deißenböck

Although there are numerous excellent resources about the correct handling of character encodings in software systems, many systems still get it wrong because their architects and developers never understood what this is all about.

This blog post aims to remedy this by focusing on the single most important rule for developers while abstracting away everything that is not required to understand this rule.
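The post’s actual rule is not included in this excerpt, so the example below assumes a common formulation of it: never rely on an implicit default encoding; always name the encoding explicitly wherever text crosses a byte boundary.

```python
# ASSUMPTION: the rule illustrated here is "always state the encoding
# explicitly when converting between text and bytes" -- the post's own
# formulation is not part of this excerpt.
text = "Deißenböck"

data = text.encode("utf-8")          # explicit encoding at the boundary
assert data.decode("utf-8") == text  # round-trips losslessly

# Decoding the same bytes under a wrong assumption produces mojibake:
print(data.decode("cp1252"))  # DeiÃŸenbÃ¶ck
```

The garbled umlauts in the last line are the classic symptom of a system in which one component wrote UTF-8 bytes and another read them assuming a legacy Windows code page.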

 


Most of the posts in this blog focus on measuring and improving the quality of software systems. When talking about software quality, most of us think about the quality of its source code. However, with the recent trend to continuously deliver new releases, the quality of build scripts, and thus their maintenance costs, is becoming increasingly important. From auditing the build systems of our customers, we know that clones in build scripts significantly increase maintenance costs and hamper the implementation of new build features. This post summarizes how we analyzed 3,872 systems to interpret cloning in build scripts and examine how it can be avoided.

