Migrating our JUnit 3 Tests to JUnit 4

If you are developing a legacy code base with tests, you might still be stuck with JUnit 3—up until recently, I was in the same boat. However, we decided to take the plunge and migrate to JUnit 4. It was well worth it! If you want to know why and how we did it, read on…

Types of Tests

Let’s talk a minute about testing and test types in general, just to establish a common understanding of terms and concepts. By now, the fact that proper testing is one of the most essential pillars supporting professional software engineering is all but undisputed. Individual understanding of what exactly »proper« might mean in this context differs, but some guidelines are generally accepted: Good tests ought to be automated, fast, specific and as independent as possible. On the other hand, there are always concerns counteracting these goals. For example, a UI test is by its nature unspecific (and often none too fast, either), but in lots of scenarios it’s still the best you can aim for.

From these requirements, a pyramid shape (often called the test pyramid) emerges almost naturally; it reflects how often each kind of test should appear. At its bottom, making up the bulk of the pyramid, is a multitude of unit tests: small and independent, each focusing on a single unit (often just a single function or, at most, a class). Here, dependencies are mocked away instead of supplied. Note: UI code can be unit tested just the same as backend code, using e.g. Jasmine. In this case, I still refer to the test as a unit test.

In the middle of the pyramid, we find integration tests that make sure that components that have passed individual unit tests also work as intended in concert. Boundaries of individual units are crossed so that classes and packages can talk to each other. However, the testing environment in which they live is still artificial (i.e. mocked).

The first time a complete system is fired up is during the aptly named system test located in the higher parts of the pyramid, housing only a handful of tests. In a system test, a real live instance of the system is free to perform the I/O operations and expensive computations that are frowned upon in the lower layers.

At the very top of the pyramid are UI tests, in which the execution path of every test begins or ends in the user-visible frontend of your application.

These are the kinds of tests we are considering in this post. If you are unlucky, the enterprise you are working for may also include manual tests in the test pyramid one step above the UI tests, in which case you might want to keep an eye open for new job opportunities ;)

[Figure: The Test Pyramid]

By the way: All of the above tests are usually automated using a testing framework such as JUnit, TestNG or Spock. This is unfortunate insofar as automating an integration test using JUnit does not make it a unit test. It just makes it a JUnit test, i.e. a test that can be executed using the JUnit testing framework. It’s a good habit to always distinguish between automated tests and proper unit tests in order to avoid confusion on this point.

Motivation for the Migration

At CQSE, we maintain a code base of about 725k LOC for our software intelligence suite Teamscale. Of these, about 112k LOC belong to tests. These tests include unit tests, integration tests, a handful of system tests and UI tests. However, many of these tests were written during a time when JUnit 3 was the de facto standard. Consequently, for some of these tests we had to implement functionality that JUnit 4 would have provided out of the box. In particular, we were missing parameterized tests and a rule system.

One example where not having parameterized tests was hurting us appeared in our »custom checks« framework. These are quality checks on the specifics of a single programming language, like avoiding using == in JavaScript. The structure of the unit tests for these custom checks was always the same: load some test input for a language and run it against that language’s checks. To simplify this, we wrote some code to dynamically create JUnit 3 test cases based on the language and its test data—sounds suspiciously like parameterization, doesn’t it? A sure way to recognize that you need to upgrade your framework is when you find that your own implementation duplicates functionality from a new framework release!
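
For comparison, here is a minimal sketch of what such a test can look like with JUnit 4's built-in Parameterized runner. The class name, the test data and the toy »check« (counting loose == comparisons with a regular expression) are made up for illustration; they are not our actual custom checks framework.

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

/** Hypothetical stand-in for a per-language custom check test. */
@RunWith(Parameterized.class)
public class CustomCheckParameterizedTest {

    /** Toy check: "==" that is not part of "===" or "!==". */
    private static final Pattern LOOSE_EQUALITY = Pattern.compile("(?<![=!])==(?!=)");

    /** Each row: language, test input, expected number of findings. */
    @Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
            { "javascript", "if (a == b) {}", 1 },
            { "javascript", "if (a === b) {}", 0 },
            { "javascript", "x = (a == b) || (c == d);", 2 },
        });
    }

    private final String language;
    private final String input;
    private final int expectedFindings;

    /** The runner calls this constructor once per row returned by data(). */
    public CustomCheckParameterizedTest(String language, String input, int expectedFindings) {
        this.language = language;
        this.input = input;
        this.expectedFindings = expectedFindings;
    }

    @Test
    public void reportsExpectedNumberOfFindings() {
        assertEquals("findings for " + language, expectedFindings, countFindings(input));
    }

    /** Counts the matches of the toy check in the given code. */
    private static int countFindings(String code) {
        Matcher matcher = LOOSE_EQUALITY.matcher(code);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

The runner creates one instance of the class per data row and runs every @Test method against it, so adding a new case means adding a row rather than generating test cases by hand.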

Another problem was the structure of our UI tests. These were not implemented as independent test cases but as »testlets«, executed by one overarching UI test. The UI test was also responsible for bootstrapping the test execution environment, which consisted of a full-fledged Teamscale server instance complete with some test data. Additionally, it reset the environment between runs of different test cases so that every test had a clean execution environment.

On the one hand, this had the advantage of being faster than starting up separate Teamscale instances for all the testlets. On the other hand, the structure was one more thing to be learned and understood by developers working on UI-relevant code. Adding insult to injury, it made debugging harder because there wasn’t an obvious and easy method to execute a specific single testlet by itself. If you wanted to do that, you had to manually remove all other testlets from the UI test registration routine for that test run, which of course leaves something to be desired.

After JUnit 4 was released, even more technical debt accumulated in the test code base, since our historic Ant build (which, by the way, has been migrated to Gradle in the meantime) still used a JUnit 3 test runner by default. This meant that people had to wrap their JUnit 4 tests explicitly by annotating them with @RunWith(JUnit4.class) and providing a suite method like this:

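// Lets a JUnit 3 runner execute a JUnit 4 test class
// (requires: import junit.framework.JUnit4TestAdapter;)
public static junit.framework.Test suite() {
    return new JUnit4TestAdapter(AbapCoverageUploadServiceTest.class);
}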

In the interest of removing such pain points and, of course, forward compatibility (with JUnit 5 already looming on the horizon), we decided to go ahead and remove all JUnit 3 references from our code. Other frameworks were not considered, since we were already familiar with JUnit 4 and did not expect significant improvement from a switch to a different framework.

Migration Procedure

The migration itself was fairly straightforward. Our migration task force worked on a shared feature branch, which is perfectly suited for this kind of group work: it always gave us a shared frame of reference for what was already working and what still needed to be done. If you need some inspiration on the general workflow, Stack Overflow has a fairly complete rundown of everything you need to do. In our case, I used Notepad++ with its handy »Search and Replace in Files« feature, though sed would have done the job just as well. The only slight difficulty here was that I didn’t want to touch files serving as raw test data for our parsers, which in our system reside in folders named test-data. This turned out to be something of a hassle because I found no easy way to exclude specific directories in a recursive search and replace. In the end, I just performed the replacements on all files and afterwards locally reset the files in the test-data folders using Git. To make sure that no tests got lost along the way, I did a before/after comparison of the Jenkins test results.
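
If you would rather script the replacement than reset files afterwards, the directory exclusion can be sketched with plain Java NIO. This is not what we actually did; it is just one way to skip the test-data folders. The class and method names are hypothetical, and the sketch assumes UTF-8 encoded files with a .java suffix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: recursive search and replace that skips excluded folders. */
public class ReplaceOutsideTestData {

    /**
     * Replaces every occurrence of target with replacement in all .java files
     * below root, without descending into directories named excludedDirName.
     * Returns the files that were actually modified.
     */
    public static List<Path> replaceInTree(Path root, final String excludedDirName,
            final String target, final String replacement) throws IOException {
        final List<Path> changed = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                Path name = dir.getFileName();
                if (name != null && name.toString().equals(excludedDirName)) {
                    // Leave raw test data untouched.
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                if (file.toString().endsWith(".java")) {
                    String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    String after = before.replace(target, replacement);
                    if (!after.equals(before)) {
                        Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                        changed.add(file);
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return changed;
    }
}
```

A call such as replaceInTree(projectRoot, "test-data", "junit.framework.Assert", "org.junit.Assert") would then rewrite one import across the code base; the concrete strings depend on which migration step you are automating.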

The other big area of work was our UI testlets. We ended up restructuring them so each testlet could be run as an independent JUnit test (which would set up its very own Teamscale instance), while at the same time keeping the possibility of running all of them in sequence, sharing the same Teamscale instance. The ExternalResource class rule offered by JUnit 4 came in very handy for this: it let us model the running Teamscale instance as an external resource for the tests, to be started before the first test run and stopped after the last one. If you have not yet familiarized yourself with JUnit 4 Rules, you definitely should; they are quite powerful.
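
To give an idea of the mechanism, here is a minimal sketch of an ExternalResource used as a class rule. FakeServer is a made-up stand-in that only flips a flag; a real rule would boot and shut down the server in before() and after(). None of the names come from our code base.

```java
import static org.junit.Assert.assertTrue;

import org.junit.ClassRule;
import org.junit.Test;
import org.junit.rules.ExternalResource;

/** Hypothetical UI test that needs a running server. */
public class ServerBackedUiTest {

    /** Made-up stand-in for a server instance managed by JUnit. */
    public static class FakeServer extends ExternalResource {
        private boolean running;

        @Override
        protected void before() {
            running = true; // a real rule would start the server here
        }

        @Override
        protected void after() {
            running = false; // ... and stop it here
        }

        public boolean isRunning() {
            return running;
        }
    }

    /** Started once before the first test of this class, stopped after the last one. */
    @ClassRule
    public static final FakeServer SERVER = new FakeServer();

    @Test
    public void dashboardIsReachable() {
        assertTrue(SERVER.isRunning());
    }

    @Test
    public void loginPageIsReachable() {
        assertTrue(SERVER.isRunning());
    }
}
```

Because the rule is a class rule, both tests see the same started resource. The same kind of rule can also be declared on a class run with the Suite runner, which is what makes sharing one instance across many test classes possible.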

Unfortunately, to share the running Teamscale instance among all of the tests, we had to resort to a Singleton pattern that keeps track of the current Teamscale instance. This is somewhat justified by not being able to have more than one running instance of the server (on the port used by the test configuration) in any case, but it still strikes me as ugly. If you had the same problem and solved it differently, please let me know in the comments!
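
For illustration only, one possible shape of such a singleton is a reference-counting holder: the first user starts the instance, the last one stops it. This is a sketch under that assumption and not our actual implementation; the class name and the »server« (again just a flag) are hypothetical.

```java
/** Hypothetical singleton that tracks the one shared server instance. */
public final class SharedServerHolder {

    private static final SharedServerHolder INSTANCE = new SharedServerHolder();

    private int users;
    private int startCount;
    private boolean running;

    private SharedServerHolder() {
        // Singleton: use getInstance().
    }

    public static SharedServerHolder getInstance() {
        return INSTANCE;
    }

    /** Starts the server for the first user; later users share the running one. */
    public synchronized void acquire() {
        if (users == 0) {
            running = true; // a real holder would boot the server here
            startCount++;
        }
        users++;
    }

    /** Stops the server once the last user has released it. */
    public synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        users--;
        if (users == 0) {
            running = false; // a real holder would shut the server down here
        }
    }

    public synchronized boolean isRunning() {
        return running;
    }

    /** Number of times the server was actually started. */
    public synchronized int getStartCount() {
        return startCount;
    }
}
```

With this shape, a rule would call acquire() in before() and release() in after(). When a suite holds the instance, the individual tests inside it reuse it; when a test runs on its own, it starts and stops its own instance.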

During the migration, we tried to be careful to let everybody know what we were doing at all times. It’s important to remember the value of transparent communication even for rather technical tasks such as this one.

Results

We have already started seeing the pay-off for our efforts:

  • Writing and maintaining tests is easier now, because there’s just one single framework we need to know about (instead of mixing JUnit 3 and 4)
  • Test method names can be a bit more descriptive because they don’t have to begin with »test« any longer
  • We have a better overview of disabled test cases (when you just can’t get around that), because an explicit @Ignore is easier to track than a test method that was renamed and then forgotten. To be fair, the convention we had before, which was to annotate such methods with a TODO, also worked quite well. Teamscale can pick up those TODOs and remind you about them.
  • We recently added a JUnit 4 Rule to take a screenshot and gather additional context data on UI test failures. It is quite unclear how such a behaviour could have been implemented in the old system, at least with the same degree of elegance.
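
The last three points can be sketched in one small test class. The example below uses a TestWatcher rule that merely records the names of failed tests; a real rule would take a screenshot and gather context data at that point. All names are invented, and one test fails on purpose to trigger the rule.

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.List;

import org.junit.Ignore;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;

/** Hypothetical test class showing a failure rule, @Ignore and free-form method names. */
public class FailureContextTest {

    /** Names of failed tests; a real rule would store screenshots instead. */
    public static final List<String> FAILED = new ArrayList<>();

    /** Invoked by JUnit around every test method of this class. */
    @Rule
    public final TestWatcher captureOnFailure = new TestWatcher() {
        @Override
        protected void failed(Throwable e, Description description) {
            FAILED.add(description.getMethodName());
        }
    };

    @Test
    public void addsTwoNumbers() {
        assertEquals(4, 2 + 2);
    }

    /** Fails deliberately so that the rule above has something to record. */
    @Test
    public void failsOnPurposeToTriggerTheRule() {
        assertEquals("expected", "actual");
    }

    @Ignore("hypothetical reason: waiting for a fix")
    @Test
    public void isReportedAsIgnored() {
        assertEquals(1, 2);
    }
}
```

Running this class reports the ignored test separately from passed and failed ones, and the rule sees the failure without the test method itself containing any reporting code.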

Last but not least, we are now ready for the next migration to JUnit 5, which is just around the corner. We’re keeping our fingers crossed for an early 2017 release! Hopefully, our preparatory work will enable us to switch with less effort than it would have taken us coming directly from JUnit 3. Judging by the milestone releases it certainly looks that way, so we are excited to see what’s coming next and optimistic that we have done our best to meet the future head-on.