Living in the #ifdef Hell
Posted on 10/28/2015 by Dr. Florian Deißenböck
C/C++ programs often use conditional compilation to implement variations of a program. While conditional compilation is extremely flexible and easy to use, it leads to code that is hard to maintain. Using examples from open-source systems, this post demonstrates why such code is often referred to as the »#ifdef Hell« and what can be done to keep conditional compilation in check.
C and C++ (like many other languages) feature a preprocessor that prepares the source code before it is handed to the actual compiler. This preprocessor offers features like file inclusion, macro expansion and conditional compilation. Conditional compilation makes it possible to exclude parts of a source code file from compilation by the C/C++ compiler depending on a compile-time condition. To support this, the preprocessor provides a set of if-else directives. An example is shown in the figure below (taken from the Firefox source code but actually an arithmetic module provided by IBM¹). The code between the #if and #endif directives will only be compiled if the macro DECSUBSET is defined, making the code use a special arithmetic subset defined in ANSI X3.274. Otherwise the code in the red frame will not be compiled (and, hence, never executed).
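To make the mechanism concrete, here is a minimal sketch. It is not the IBM code from the figure; the macro USE_SUBSET and the function arithmetic_variant are hypothetical names chosen for illustration only.

```c
/* Hypothetical compile-time switch (an assumption of this sketch).
   Normally the build sets it, e.g. with -DUSE_SUBSET=1; it defaults to 0. */
#ifndef USE_SUBSET
#define USE_SUBSET 0
#endif

/* Returns a label for the arithmetic variant compiled into this binary. */
const char *arithmetic_variant(void)
{
#if USE_SUBSET
    /* The compiler only sees this line when USE_SUBSET is non-zero. */
    return "subset";
#else
    /* Compiled in all other cases. */
    return "full";
#endif
}
```

Built without any extra flags, the function returns "full". The "subset" branch never reaches the compiler and therefore is not even checked for errors.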
In many systems this mechanism is used to express variations that cannot or should not be handled at runtime. Types of variation include:
- Support for specific features. Conditional compilation can be used to enable/disable specific features, e.g. the special arithmetic subset in the example above.
- Support for different hardware platforms. Conditional compilation can be used to implement hardware-specific code, i.e. code that needs to be written differently for different hardware platforms.
- Feature Toggles. Conditional compilation can be used to temporarily disable features that are not complete yet. This enables something akin to version control branches without actually using branches.
- Logging and Tracing. Conditional compilation can be used to remove logging statements that only serve debugging purposes from production code.
- Compiler Switches. Conditional compilation can be used to make the code work with different compilers.
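As an illustration of the logging and tracing case, the following sketch assumes a hypothetical switch ENABLE_TRACE and a hypothetical macro TRACE. Neither is taken from any of the systems discussed here.

```c
#include <stdio.h>

/* Hypothetical switch: debug builds would pass -DENABLE_TRACE. */
#ifdef ENABLE_TRACE
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0) /* expands to a no-op in production builds */
#endif

/* Adds two numbers; the trace call disappears unless ENABLE_TRACE is defined. */
int add(int a, int b)
{
    TRACE("add called");
    return a + b;
}
```

Note that the compile-time condition appears only once, at the macro definition, and not around every single logging statement.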
Conditional Compilation considered Harmful
Conditional compilation is extremely flexible and, hence, often used to implement variants. However, it creates several challenges in the code and in the architecture as well as for testing.
Negative Impact on Code Understandability
The code easiest to understand is code without any conditions; just one statement after the other. It is easy to understand because for each statement it is crystal clear under which condition it is executed: always! Conversely, every condition, e.g. an if or while statement, makes code harder to understand. This is true for »normal« conditions that are evaluated at runtime as well as for compile-time conditions used by conditional compilation. However, compile-time conditions add a completely new layer of complexity as they are not part of the programming language, e.g. C++, but part of the preprocessor language. Particularly if runtime and compile-time conditions are intermixed (as in the example above), the reader of the code always has to keep track of the two language layers. Hence, the mental load for understanding code using conditional compilation is high.
As the preprocessor defines a language of its own, one can easily create source code artifacts that are valid w.r.t. the preprocessor but not w.r.t. the C language. This, however, is only discovered after the preprocessing step. Moreover, preprocessor conditions can be used literally everywhere, even within a runtime condition, which obviously makes the code more difficult to understand. This is illustrated by Example #2, again taken from Firefox (inflate.c²), where a compile-time condition is used to »inject« a ternary expression into a runtime condition. Liebig and other researchers refer to this type of compile-time condition as undisciplined preprocessor annotations.
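The difference between undisciplined and disciplined annotations can be sketched as follows. This is a hypothetical illustration, not the code from inflate.c; the switch WITH_FLAGS and all function names are assumptions.

```c
/* Hypothetical switch, enabled by default in this sketch. */
#ifndef WITH_FLAGS
#define WITH_FLAGS 1
#endif

/* Undisciplined: the directive wraps only a fragment of an expression. */
int check_undisciplined(int flags, int raw, int swapped, int expected)
{
    if ((
#if WITH_FLAGS
         flags ? raw :
#endif
         swapped) != expected) {
        return 0;
    }
    return 1;
}

/* Disciplined: the directive wraps whole functions, and the runtime
   condition in check_disciplined stays plain C. */
#if WITH_FLAGS
static int select_value(int flags, int raw, int swapped)
{
    return flags ? raw : swapped;
}
#else
static int select_value(int flags, int raw, int swapped)
{
    (void)flags;
    (void)raw;
    return swapped;
}
#endif

int check_disciplined(int flags, int raw, int swapped, int expected)
{
    return select_value(flags, raw, swapped) == expected;
}
```

Both functions compute the same result, but only in the second one can each line be read as complete C without mentally running the preprocessor first.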
Example #3 is taken from the Linux source code (atariNCR5380.c³) and shows how difficult things can become in code that uses lots of compile-time conditions.
Negative Impact on the Architecture
In the simplest case a system doesn’t have any variation, i.e. all customers get exactly the same binaries. Even then, lifecycle management is non-trivial as usually different versions of these binaries float around. If you add variants, however, things quickly become a lot more complex because you are essentially maintaining not one but multiple systems. This becomes most obvious in the area of testing as you have to test not one but multiple systems (see below). All this is still manageable as long as you are dealing with few, clearly defined variants. If you face a proliferation of variants, however, things can easily get out of control.
Hence, variation should not simply happen but be a central architectural concept that is as explicit as possible. Also, variation points should be few and limited to certain places in a system. Conditional compilation, however, can be used everywhere and, hence, is bound to grow in an uncontrolled manner. To illustrate this, the following treemap shows the source code of Firefox version 41.0.2. Each rectangle in the treemap symbolizes a .c or .cpp file. The size of the rectangle reflects the size of the file measured in lines of code. The colors in the treemap show whether a file uses conditional compilation (red) or not (green)*.
The analyzed code comprises 9,608 files that contain about 5.6 million lines of code (MLOC). Of these, almost 3,000 files contain conditional compilation. These files contain about 3.5 MLOC, i.e. more than 60% of the code. As the treemap illustrates, conditional compilation is not limited to specific parts of the system but almost omnipresent.
I did not perform a historical study for the system and, hence, do not know if the amount of conditional compilation grew over time. However, my experience is that it is hard to concentrate conditional compilation in specific parts of the system. If a system uses conditional compilation to the extent the example above does, variation obviously becomes hard to handle as it becomes virtually impossible to actively manage the variants of the system.
Negative Impact on Testing
Next to the effects on code understanding and architecture, conditional compilation also poses a major challenge for testing. If you attempt to cover all the paths in a piece of code that has compile-time conditions, you have to create a test setup that compiles the code for all combinations of the conditions. With n independent on/off switches, this means up to 2^n builds. And for each combination you have to execute all the test cases that cover the runtime conditions. This makes testing a lot harder as the setup required for re-compiling the variants is complicated. Moreover, the additional compilation steps take time and thereby lengthen the overall test execution time.
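How quickly the number of builds grows can be sketched in a few lines. The sketch assumes that all switches are independent on/off macros; the function names and the -DNAME=value flag layout are hypothetical choices for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of build variants for n independent on/off switches: 2^n.
   This sketch is only valid for n < 64. */
unsigned long long variant_count(unsigned n)
{
    return 1ULL << n;
}

/* Writes the compiler flags for one variant into buf, e.g. "-DA=1 -DB=0".
   Bit i of mask holds the value of names[i].
   Returns 0 on success and -1 if buf is too small. */
int variant_flags(const char *const names[], size_t n,
                  unsigned long long mask, char *buf, size_t bufsize)
{
    size_t used = 0;

    if (bufsize == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int value = (int)((mask >> i) & 1ULL);
        int written = snprintf(buf + used, bufsize - used, "%s-D%s=%d",
                               i == 0 ? "" : " ", names[i], value);
        if (written < 0 || (size_t)written >= bufsize - used)
            return -1;
        used += (size_t)written;
    }
    return 0;
}
```

A test driver would loop over all masks from 0 to variant_count(n) - 1, compile with the resulting flags and run the full test suite each time. Ten switches already mean 1,024 builds.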
Other Negative Impacts
While the list of problems associated with conditional compilation is already quite long, there’s more to come:
- Code with conditional compilation is known to be hard to reuse as the required preprocessor macros must be set, too. This, in turn, often leads to the use of copy&paste programming to reuse code. As several posts on this blog have pointed out, e.g. Benjamin’s, this is a quality defect by itself.
- Tools that analyze the source code are often challenged by conditional compilation. This applies to metrics and quality analysis tools but, most importantly, to refactoring tools. It is incredibly hard to implement a safe refactoring tool for source code with compile-time conditions and in some cases, impossible. The lack of good refactoring tools, however, makes creating and maintaining high-quality code a lot harder.
What to do about it?
I hope the above convinced you that conditional compilation is a problem for software maintenance. But what to do about it? From my point of view, the answer strongly depends on your goals and the position of your system in its lifecycle. Hence, the following paragraphs outline strategies for avoiding, managing and removing conditional compilation.
Avoiding Conditional Compilation
When implementing new systems, I strongly advocate an implementation of variations without conditional compilation. This doesn’t mean that all variations have to be managed at runtime. If compile-time management of variations is required, however, it should be on a coarse-grained level, e.g. by including or excluding complete files. This can be managed with the build system or the now common dependency injection containers (which technically introduce load-time variants).
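One possible shape of a variation point without conditional compilation is an explicit interface with several implementations. The following sketch selects the implementation at runtime; the struct rounding_ops and all names in it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variation point: one interface, several implementations. */
struct rounding_ops {
    const char *name;
    int (*round_div)(int numerator, int denominator);
};

/* Variant 1: truncating division. */
static int round_div_truncate(int n, int d) { return n / d; }

/* Variant 2: division rounded half up (non-negative n, positive d only). */
static int round_div_half_up(int n, int d) { return (n + d / 2) / d; }

static const struct rounding_ops variants[] = {
    { "truncate", round_div_truncate },
    { "half-up",  round_div_half_up  },
};

/* Looks up a variant by name, e.g. read from a configuration file.
   Returns NULL if the name is unknown. */
const struct rounding_ops *select_variant(const char *name)
{
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++) {
        if (strcmp(variants[i].name, name) == 0)
            return &variants[i];
    }
    return NULL;
}
```

For coarse-grained compile-time variation, the same idea applies at file level: each implementation would live in its own source file and the build system would compile and link exactly one of them, without a single #ifdef in the code.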
Managing Conditional Compilation
The big question, however, is how to deal with grown systems that already exhibit a fair amount of conditional compilation. In my experience, it is rarely possible to sit down and fully re-engineer the system for several reasons. First, there is usually no time and budget to do this. Second, this type of re-engineering is highly error-prone, particularly if there are no good automatic tests. Hence, I propose an approach to manage the existing conditional compilation. This management should cover the following aspects:
- Conditional compilation must be limited to specific areas, e.g. selected architectural components or layers. In these areas conditional compilation will be tolerated; outside of them it is prohibited.
- The current state of the system is accepted as a baseline, i.e. no active steps are undertaken to remove conditional compilation, even outside the designated areas. However, new code outside these areas may not use conditional compilation.
- Rules for the use of conditional compilation must be agreed on, e.g. it is allowed to enable/disable whole blocks of code but not to enable/disable individual tokens (undisciplined preprocessor annotations).
- Baselining applies for these rules, too: old code that violates the rules is tolerated, new code must adhere to them.
- A process needs to be established that identifies new code with conditional compilation outside the designated areas as well as undisciplined conditional compilation. All violations in new code must be addressed swiftly.
- The processes must be supported by a tool like Teamscale that supports the analysis of preprocessor directives as well as baselining.
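The core of such a baseline check can be sketched as follows. This is not how Teamscale or any other tool implements it; the path prefixes and function names are purely hypothetical, and the per-file counts of conditional blocks are assumed to come from some upstream analysis.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical path prefixes where conditional compilation is tolerated. */
static const char *const allowed_areas[] = { "src/platform/", "src/compat/" };

/* Returns 1 if path lies inside one of the designated areas. */
int in_allowed_area(const char *path)
{
    for (size_t i = 0; i < sizeof allowed_areas / sizeof allowed_areas[0]; i++) {
        if (strncmp(path, allowed_areas[i], strlen(allowed_areas[i])) == 0)
            return 1;
    }
    return 0;
}

/* Returns 1 if a file violates the rule "no new conditional compilation
   outside the designated areas". baseline and current are the numbers of
   conditional blocks in the file at the baseline and now. */
int is_new_violation(const char *path, int baseline, int current)
{
    if (in_allowed_area(path))
        return 0;              /* tolerated inside designated areas */
    return current > baseline; /* old findings are accepted, growth is not */
}
```

A check like this could run on every commit, so that violations in new code surface while the change is still fresh in the developer’s mind.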
Removing Conditional Compilation
With such a management of conditional compilation, one cannot improve the quality of the code but one can ensure that things don’t deteriorate further, which is a lot better than doing nothing about the problem at all. If the system is expected to live a lot longer, one should seriously think about complementing the management strategy with an actual improvement strategy. For this, I propose the following steps:
- An analysis of the current use of conditional compilation. Questions that should be answered are: 1) Why do we use conditional compilation? To accommodate conflicting customer requirements or differences in hardware or something else? 2) Where in the system do we use conditional compilation? 3) Are the different types of conditional compilation (e.g. customer-driven vs hardware-driven) clearly separated or intermixed in the same modules or even files? 4) Which variants do we really have, i.e. which combinations of compile-time switches lead to variants we actually use? 5) Do we use conditional compilation in a disciplined manner, i.e. only to enable/disable whole blocks of code and not to enable/disable individual tokens?
- An analysis of the type of variation that is actually required. We found that an in-depth discussion of this seemingly fundamental question often brings interesting insights, e.g. about hardware platforms that are supported but used by none of the customers. Often the type of variation that is really required is not obvious to everybody. Hence, this analysis can only be carried out with all relevant stakeholders involved. In particular it requires the involvement of technically minded people (developers, architects, testers,…) as well as business-minded people (product owner, sales). The latter are required as they are often the driving force behind variants (»But customer X really requires this to be the other way round.« or »We could really sell this to Y if it was the other way round«).
- After this, it should be analyzed which types of variation and which concrete variants are not required anymore. These should be removed.
- Based on this analysis, it should be discussed which types of variation could be addressed by other means than conditional compilation. In some cases, the system already provides other variation mechanisms, e.g. plugins, that are not used everywhere because a lot of the conditional compilation pre-dates the introduction of these mechanisms. These mechanisms are obvious candidates and should be preferred over mechanisms that are completely new to the system.
- Only after all these analysis steps have been carried out thoroughly can one create a project plan outlining the concrete steps required to re-engineer the system.
- Once the plan has been made, the parts of the system affected by the re-engineering have to be analyzed w.r.t. their test coverage. If these parts are not covered sufficiently by automatic tests, I highly recommend canceling the whole endeavor or investing in these tests first (which is a good idea anyway if the system is supposed to live for another couple of years).
- During the re-engineering (which may take several years) the lightweight management process introduced above should always be in place to ensure that other development activities that run in parallel to the re-engineering do not make things worse.
Conditional compilation is a technique that is widespread in C/C++ and other languages that feature a preprocessor. Nevertheless, it is known to make systems hard to maintain. To deal with it, one needs to define clear rules and must ensure that these rules are adhered to. Removing conditional compilation from systems is hard but achievable and worthwhile if the system is expected to be long-lived. If you are faced with maintaining a system infested with conditional compilation and don’t know how to go forward with it, please don’t hesitate to contact me. We at CQSE will not be able to magically solve your problem but we provide an in-depth analysis of your system and the right tools to manage conditional compilation in large and grown code bases. This may help you and your team to survive the #ifdef hell.
Conditional compilation is a topic that has been well researched over the last couple of years. If you are interested in the scientific background, I can recommend the following papers that also served as inspiration when writing this post:
- Jörg Liebig et al.: Analyzing the Discipline of Preprocessor Annotations in 30 Million Lines of C Code, AOSD 2011
- Sandro Schulze et al.: Analyzing the Effect of Preprocessor Annotations on Code Clones, SCAM 2011
- Janet Feigenspan et al.: Do Background Colors Improve Program Comprehension in the #ifdef Hell?, Journal of Empirical Software Engineering 2013
¹ Copyright © IBM Corporation, 2000–2012. All rights reserved. This software is made available under the terms of the ICU License – ICU 1.8.1 and later.
² Copyright © 1995–2012 Mark Adler
³ Copyright © 1993, Drew Eckhardt
* To determine the color, each source file was searched for the #endif preprocessor directive. The search was performed on the token stream, so comments containing the string »#endif« were excluded. If a file contains two or more #endifs, it is colored red. If it contains zero or one, it is colored green. Requiring at least two #endifs is a simple heuristic to exclude files whose only conditional directive is an include guard. Overall 34,843 instances of the #endif preprocessor directive were found in the Firefox source code.
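The heuristic can be approximated with a small scanner. This sketch is not the analysis that produced the numbers above: it works on raw text instead of a real token stream and, as a stated simplification, ignores string literals and line continuations. The function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Counts #endif directives in C source text, skipping comments. */
int count_endif(const char *src)
{
    int count = 0;
    int line_start = 1; /* only whitespace or comments so far on this line */
    const char *p = src;

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {        /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;
        } else if (p[0] == '/' && p[1] == '/') { /* line comment */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (*p == '\n') {
            line_start = 1;
            p++;
        } else if (*p == ' ' || *p == '\t' || *p == '\r') {
            p++;
        } else if (*p == '#' && line_start) {     /* preprocessor directive */
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            if (strncmp(p, "endif", 5) == 0 &&
                !isalnum((unsigned char)p[5]) && p[5] != '_')
                count++;
            line_start = 0;
        } else {
            line_start = 0;
            p++;
        }
    }
    return count;
}

/* Heuristic from the footnote: two or more #endifs mean "red". */
int uses_conditional_compilation(const char *src)
{
    return count_endif(src) >= 2;
}
```

A header that contains nothing but an include guard yields a count of one and stays green, while any additional conditional block pushes the file to red.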