Lessons from code reviews in the wild (part 2): Get rid of unnecessary code

Posted on 03/21/2016 by Martin Pöhlmann

One side-effect that I have observed from performing code reviews for years is that the code after review is mostly way shorter then it was before. Reducing the size of code will increase maintainability as we have less code to read and comprehend in the future. This stresses one of the primary goals of code reviews: Producing clean and concise code that is easy to understand.

In my last blog post I explained why there can never be an exhaustive checklist for manual code reviews, but a reviewer has to use his own rationale instead. For example, I especially take a look at code that may be written in a more concise fashion to reduce its overall size. As in the previous blog post I want to refer to the article of Kasper B. Graversen on the changing notion of code readability. He mentions one interesting fact, that I fully agree with:

Beginners do not write verbose code out of malice. They do it to help themselves. To make the code readable to them.

In this blog post I want to show a few examples that support this statement and shortly outline how to improve readability and how the overall code size will shrink with these refactorings. The examples are written in C# but apply to other languages as well.

Caveat

Let me warn you beforehand: This post contains my personal opinion on clean and concise code. Hence it is highly subjective and you may agree or disagree at some points.

Unnecessary corner cases

You are familiar with the code and different ways you tried getting something implemented. However, some code may still be reminiscent of what you had in mind writing at first, but discarded afterwards. This can easily be overlooked by the developer but will leave questions and uncertainty for everyone who touches the code in the future. One crucial part of code reviews is to get an neutral opinion on your code, hence the reviewer might detect these unneeded reminiscences. A trivial example of this are superfluous null-checks like this:

1var list = new List<string>();
2
3// some code obviously not assigning to list
4
5if (list != null)
6{
7  // do something
8}

You may have started working with an uninitialized list at first but changed the code to initialize the list beforehand. As redundant or unnecessary null-checks express uncertainty of the variable state, I recommend to remove these.

Unnecessary specialization

At some point you may also lose yourself in the special case you have to address. The reviewer may have a look at your assumptions as outsider and should think about whether they hold in general. For an example let’s have a look at the representation of a symbolic link

1class SymLink
2{
3  public string Source {get; set;}
4  public string Target {get; set;}
5}

Consider you have a list of these symbolic links and you want to check whether you have already a link targeting the same file as another given link. In a naive implementation you might want to simply override the Equals method of the SymLink object like this:

 1public override bool Equals(object o)
 2{
 3  SymLink link = o as SymLink;
 4  if (link == null)
 5  {
 6    return false;
 7  }
 8
 9  return this.Target == link.Target;
10}

In the code that will check whether a link is targeting the same file you may then simply use listOfLinks.Contains(otherLink). While the code will perfectly work for your case, it will have the following drawbacks:

  • You are making the assumption that two links are equal when the target is equal. However, there might be cases where this is not what you want to check.
  • In the special case of overriding the Equals method you have to further ensure to also provide a method GetHashCode, add the override keyword and actually not overloading the method by specifying a second method bool Equals(SymLink link).

The above implementation looks like a simple solution to the given problem but has several implications that need to be considered. It exploits the equality of objects and is better implemented focusing on the exact problem instead of specializing a method that is intended for something else:

1public bool HasSameTarget(SymLink link)
2{
3  return this.Target == link?.Target;
4}

On the calling side one can use listOfLinks.Any(SymLink.HasSameTarget) to check whether a link with the same target exists. Again, the overall size of the code will be reduced after the refactoring. Also note that the identifiers on the calling side express now exactly what we are searching for (Any SymLink HasSameTarget) and we do not have to jump in the definition of the method to understand its intend.

Unnecessary local variables

1var result = CalculateResult(/* pass some data */);
2return result;

What we see here is a pattern that I often encounter: before returning the result of a method, it is stored in a local variable instead of simply writing return CalculateResult(..);. Obviously the local variable is not needed, but why is the code written like this? Some explanation may be that the programmer wanted to insert a breakpoint on the line containing the return statement to inspect what value result actually contains in the debug window of the IDE or that he/she had a debugging statement in between the two lines while writing the code (e.g., Console.WriteLine(result);).

I prefer (and suggest during reviews) not having an extra local variable and directly returning the result. For debugging purposes most modern IDEs (like VisualStudio or Eclipse) allow inserting a breakpoint at the end of a method or step over the last statement. Hence, one can easily inspect what the actual return value is and remove the local variable.

Unnecessary complicated control flow

Another important aspect I look at during reviews is the control flow within methods. I aim to remove nested code whenever possible as each nesting puts some constraint on your mental stack that you have to keep track of when reading and understanding a method. The following (not-so-complicated) example tries to explain this:

 1public string GetUserName(User user)
 2{
 3  string name;
 4  if (user != null)
 5  {
 6    name = user.Name;
 7  }
 8  else
 9  {
10    name = null;
11  }
12
13  return name;
14}

As consequence of having a single return point at the end of the method, the variable name has to be defined at the beginning and is in this case even uninitialized (which may be error-prone if writing code in a language that does not pre-initialize variables with e.g., null). There are two solutions to this:

  • Initializing the variable with the default value intended to return if some constraints do not hold.
  • Handle constraints first by usage of immediate return statements, e.g., by first checking whether user is null and simply return null in that case. Afterwards one can safely access properties of the user variable. The code may be simplified as follows:
 1public string GetUserName(User user)
 2{
 3  if (user == null)
 4  {
 5    return null;
 6  }
 7
 8  return user.Name;
 9}

The up-to-date reader may notice that this could be written even shorter using recent programming language features: public string GetUserName(User user) => user?.Name;.

In general I am highly objecting the single-method-return-point pattern and the related define-all-method-variables-at-the-top pattern as they make code complex and hard to read. Let me explain my opinion with another example. We are searching a list for a user with a certain name and return this user if found, otherwise we return a guest user. With both previously named anti-patterns, the code may end up like this:

 1public User FindUser(List<User> users, string userName)
 2{
 3  User result = null;
 4
 5  foreach (User user in users)
 6  {
 7    if (user.Name == userName)
 8    {
 9      result = user;
10      break;
11    }
12  }
13
14  if (result == null)
15  {
16    result = GuestUser;
17  }
18
19  return result;
20}

This may be refactored to initialize result with GuestUser at the top of the method, but a cleaner solution, that does not follow the single-method-return-point pattern, simply returns the found user from within the foreach loop.

 1public User FindUser(List<User> users, string userName)
 2{
 3  foreach (User user in users)
 4  {
 5    if (user.Name == userName)
 6    {
 7      return user;
 8    }
 9  }
10
11  return GuestUser;
12}

Again you might argue that with recent language features this can be expressed even more concise using language integrated queries in C# or the Java8 Collection Stream API, but that is not what I wanted to show. Especially if conditions get more complex having chained filter/map/reduce queries are harder to read as the well-known iterative approach.

Summary

These are just a few examples that I put emphasis on when reviewing code with respect to clean and concise code. As already detailed in the predecessor post, this list is not exhaustive. I could go on with unnecessary type conversions, redundant statements, unneeded backing fields for type properties, …

In the examples we see how maintainability may be improved twofold: First, by simplifying the code by removing unnecessary parts and rearranging the control flow. Second, refactoring the code will mostly reduce its size, so we have less code to read in the future.