Let me ask you a simple question: How many lines of code does your application comprise? If you do not want to rely on gut feeling alone, it’s pretty easy to install some tool, run it over your system, let it do the counting and get a result. But is that the answer? Maybe. It depends on what you counted. Did you include test code? Should it be included? Did you exclude generated code? Oh, and what about that one component that was copied and pasted for some experiments – which, of course, did not work out in the end?

READ_MORE

Counting lines of code is a very simple metric, yet the scope of the analyzed code has a substantial impact on the result. This holds even more for more complex metrics such as clone coverage or comment completeness. Defining the analysis scope is what we call “code discrimination”. And whenever we use any sort of metric to gain insights into a software system, we devote significant resources to getting it right.
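To make this concrete, here is a minimal Python sketch – assuming a hypothetical Java project in the current directory, with made-up directory names – that counts lines of code twice, once over everything and once over a roughly discriminated application scope, just to show how much the answer can shift:

```python
from pathlib import Path

def loc(files):
    """Sum of physical lines over the given files."""
    return sum(len(f.read_text(errors="ignore").splitlines()) for f in files)

all_files = list(Path(".").rglob("*.java"))
# Naive discrimination: drop anything that looks like test or generated code.
app_files = [f for f in all_files if "test" not in f.parts and "target" not in f.parts]

print("everything:      ", loc(all_files))
print("application only:", loc(app_files))
```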

To analyze software quality, we care about the application code of a system – the actual code running in production – as well as about the test code. Usually, we analyze the test code separately though, because many customers have different quality expectations for it. While application and test code are included in the analysis scope, generated code is excluded. As this code is re-generated rather than changed manually, its quality neither affects the developers nor the maintainability of the system – the input of the generator should be analyzed rather than its output.

By design, generated code is often structurally similar and can therefore significantly distort clone detection results, for example. If the code discrimination is not done properly, it is not surprising to end up with a clone coverage of 50% and above, while the clone coverage of the actual application code may only be about 7% or 8%. However, excluding generated code completely from the analysis is a somewhat defensive approach, because certain quality aspects, such as potential null pointer dereferences, would also be relevant when they occur in generated code.
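A small back-of-the-envelope calculation shows how this distortion happens; the line counts below are made up, but in a realistic order of magnitude:

```python
# Hypothetical line counts, chosen only to illustrate the distortion.
app_total, app_cloned = 100_000, 7_500     # hand-written application code, 7.5% cloned
gen_total, gen_cloned = 120_000, 110_000   # generated code, highly repetitive by design

print(f"application only:    {app_cloned / app_total:.1%}")                               # 7.5%
print(f"including generated: {(app_cloned + gen_cloned) / (app_total + gen_total):.1%}")  # 53.4%
```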

And then there is a whole lot of other code that should be excluded from the analysis scope, too. Very often, when developers are confronted with a finding, they respond with “Yeah, but we do not change this code.” This is, in fact, often true for third-party or library code that does not have to be patched. As the developers are neither responsible for the quality of third-party code nor affected by its maintainability, we exclude it from the analysis. Experimental code is excluded as well, along with old (i.e. unused) code, tool code and simulation code.
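In practice, this often boils down to a list of exclusion patterns for the analysis tool. The following sketch uses hypothetical directory names – every real system needs its own list, confirmed with the developers:

```python
from fnmatch import fnmatch
from pathlib import Path

EXCLUDES = [
    "*/lib/*",           # bundled third-party libraries that are never patched
    "*/third_party/*",
    "*/experimental/*",  # prototypes that never made it into production
    "*/legacy/*",        # old, unused code
    "*/tools/*",         # build and maintenance scripts
    "*/simulation/*",
]

def in_scope(path: Path) -> bool:
    return not any(fnmatch(path.as_posix(), pattern) for pattern in EXCLUDES)

in_scope_files = [f for f in Path(".").rglob("*.java") if in_scope(f)]
print(len(in_scope_files), "files remain in the analysis scope")
```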

Depending on how well a system is organized, code discrimination can be fairly easy. With the Maven standard directory layout, for example, application code resides under /src/main/, test code under /src/test/, and generated code goes directly into /target/. However, many systems are not that well organized. Many of our code quality audits have revealed that multiple factors can lead to a chaotic system organization: the length of the system’s history, the size of the team, developer fluctuation, technology changes, or simply a lack of organizational discipline. For software systems that have grown over the years, discriminating the code properly can take days. For us as software analysts, this comprises the continuous manual effort of configuring and running our analysis tools, manually inspecting the results, discovering new code to discriminate, reconfirming with the developers, reconfiguring the analysis tools – repeat.
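With a layout like Maven’s, the whole discrimination reduces to a few path rules, as in this sketch (the paths are illustrative, not taken from a real project):

```python
from pathlib import Path

def classify(path: Path) -> str:
    """Rough scope classification based on the Maven standard directory layout."""
    parts = path.parts
    pairs = list(zip(parts, parts[1:]))  # consecutive directory pairs
    if "target" in parts:
        return "generated"    # build output, re-generated on every build
    if ("src", "test") in pairs:
        return "test"
    if ("src", "main") in pairs:
        return "application"
    return "unknown"          # needs manual inspection and a chat with the team

print(classify(Path("module/src/main/java/com/example/Service.java")))      # application
print(classify(Path("module/src/test/java/com/example/ServiceTest.java")))  # test
print(classify(Path("module/target/generated-sources/Dto.java")))           # generated
```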

An unorganized system structure creates several problems during the software life cycle. First, any new developer joining your team will find it harder to get an overview of the system. But existing developers will also face the question “Is this code still needed?” over and over again, for example when migrating the code to a new technology or when refactoring central functionality. And that is before we even get to testing and covering the application code: you cannot measure code coverage on your application code if you don’t really know what the application code is.