Quality Control for Matlab Code

Posted on 11/03/2016 by Dr. Corneliu Popeea

At CQSE, we use Teamscale for code quality control, both for our customers' code and for our own. The range of programming languages we support is broad, including C#, ABAP, Java, JavaScript and Matlab.

In this post, I give an overview of how to establish a code quality control process for a Matlab codebase, as we did for one of our customers. Some of the code examples are taken from three popular open-source applications listed on the Matlab Central webpage: export_fig, T-MATS and CNN-for-Image-Retrieval.

Discussion of relevant code quality criteria

One objective of a code quality control process is to keep a codebase in an easy-to-maintain state. To achieve this objective, we discuss with a customer the relevance of the following criteria:

  • Important guidelines should be documented in a coding guidelines document and followed consistently in the codebase.

  • Important architectural decisions should be documented in the form of architectural diagrams. The diagrams should be consistent with the dependencies in the codebase.

  • Code should be well structured into easy-to-understand entities: files, functions and code blocks.

  • A single conceptual change should require edits in only a few code locations that need to be kept consistent.

  • Functions that are part of an API should be thoroughly documented in code, including their exceptional cases.

Concrete steps towards establishing the process

Based on the previous set of code quality criteria, the following activities are undertaken to formally define a control process based on five code quality indicators:

  • Coding Guideline Violations: We analyze the customer-specific coding guidelines document and propose a subset of the rules that can be meaningfully checked using available static analysis tools. Naming conventions are one important example. Established programming style recommendations for Matlab do not agree on a naming convention for function names; several consistent schemes have been proposed: alllowercase, lowerCamelCase, or UpperCamelCase identifiers (for details, see this guideline document authored by Richard K. Johnson). In any case, mixing more than one convention can cause confusion.

  • Architecture Conformance: We analyze architecture documents. Then we propose one or more architectural views that can be modelled in Teamscale and checked for consistency with respect to dependencies in code.

  • Code Structure: Regarding structure of code in entities, we decided on the following thresholds:
    • Ideally, a file should have a size of up to 300 SLOC; a close-to-ideal file has between 300 and 750 SLOC; a file with more than 750 SLOC is badly structured.
    • Ideally, a function should have a size of up to 50 SLOC; a close-to-ideal function has between 50 and 75 SLOC; a function with more than 75 SLOC is badly structured.
    • Ideally, a code block should not be nested deeper than level 3; a close-to-ideal code block is nested no deeper than level 5; a code block nested deeper than level 5 is badly structured.
  • Duplicated Code: Regarding code blocks that are cloned, we decided to focus on cloned blocks with a length of at least 10 statements. Cloned blocks are a maintainability challenge, since they need to be updated consistently.

  • Documentation: We identify the subset of entities that should be documented in code. One objective could be that all functions are documented. In this case, the interactive command ‘help’ provides useful output when used in the Matlab IDE.
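
The size and nesting thresholds above can be read as a three-level rating. The following Python sketch is only an illustration of that reading, not Teamscale's implementation. The names Rating, THRESHOLDS and rate are hypothetical. The sketch assumes that the upper bound of each range is inclusive and that nesting levels 4 and 5 both count as close to ideal.

```python
from enum import Enum


class Rating(Enum):
    GREEN = "ideal"
    YELLOW = "close to ideal"
    RED = "badly structured"


# (upper bound for "ideal", upper bound for "close to ideal") per metric,
# taken from the thresholds in the list above. Treating the bounds as
# inclusive is an assumption of this sketch.
THRESHOLDS = {
    "file_sloc": (300, 750),
    "function_sloc": (50, 75),
    "nesting_depth": (3, 5),
}


def rate(metric, value):
    """Return the rating of a measured value for the given metric."""
    green_limit, yellow_limit = THRESHOLDS[metric]
    if value <= green_limit:
        return Rating.GREEN
    if value <= yellow_limit:
        return Rating.YELLOW
    return Rating.RED
```

For example, rate("function_sloc", 60) falls into the close-to-ideal range, while rate("file_sloc", 900) is rated as badly structured.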

After agreeing on the above configuration, we set up a Matlab analysis profile and let Teamscale analyze the code quality regularly (i.e., with incremental analysis and timely feedback to developers for each commit).
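
To make the naming-convention item from the list above more concrete, here is a minimal Python sketch that classifies function names into the three schemes mentioned there and reports whether a codebase mixes them. It is a simplified stand-in for a real static analysis check. The names PATTERNS, classify, convention_counts and is_mixed are hypothetical, and names that fit none of the three schemes (for example, names containing underscores) are simply counted as "other".

```python
import re
from collections import Counter

# One regular expression per naming scheme mentioned in the list above.
PATTERNS = {
    "alllowercase": re.compile(r"[a-z][a-z0-9]*"),
    "lowerCamelCase": re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+"),
    "UpperCamelCase": re.compile(r"(?:[A-Z][a-z0-9]*)+"),
}


def classify(name):
    """Return the first naming scheme that the whole name matches."""
    for convention, pattern in PATTERNS.items():
        if pattern.fullmatch(name):
            return convention
    return "other"


def convention_counts(function_names):
    """Count how many function names follow each naming scheme."""
    return Counter(classify(name) for name in function_names)


def is_mixed(function_names):
    """Report whether the names follow more than one naming scheme."""
    return len(convention_counts(function_names)) > 1
```

A call such as is_mixed(["readData", "WriteData"]) returns True, which corresponds to the situation where a mix of conventions can cause confusion.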

Initial feedback from Teamscale

To illustrate the feedback one can expect from a code quality control process based on Teamscale, this section includes some findings and observations from the analysis of the open-source applications mentioned at the beginning of this article. Note that none of the three applications is regularly analyzed with the static analysis tool Teamscale.

  • Coding Guideline Violations: Regarding function naming conventions, two of the three codebases indeed use a mix of more than one naming convention, which is likely to cause confusion.

  • Architecture Conformance: No architecture views from any of the three applications were available, so this post does not include any observations on architecture conformance.

  • Code Structure: Only one file across all three codebases is longer than 750 SLOC. A screenshot from Teamscale shows how the SLOC count of this file evolved over the last two years, as additional functionality accumulated in the same file. A second screenshot shows an example of a deeply nested code block: the assignment to the variable skipNext on line 973 lies within the scope of three nested conditionals, a try-catch statement and a switch statement. (Not shown in the screenshot: the switch statement is itself within the scope of two further conditionals and a for loop.) A typical suggestion would be to extract a utility function that handles the otherwise case of the switch statement.

  • Duplicated Code: One codebase shows many findings because two different versions of MatConvNet (a MATLAB toolbox implementing Convolutional Neural Networks) coexist in it. An additional screenshot shows two cloned blocks: the left block, from the function TMATSC_compressor(..), and the right block, from the function TMATSC_turbine(..), have significant code logic in common. A typical solution would be to extract a utility function that is invoked from both original functions.

  • Documentation: Regarding function comments, two of the three codebases have function comments for almost all functions defined in the codebase. For the third codebase, Teamscale provides an overview of the subfolders for which the completeness of code documentation could be improved.
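
As a rough illustration of the documentation indicator, the Python sketch below lists the functions in a piece of Matlab source text that have no comment directly after their declaration line. It is a simplified heuristic, not Teamscale's analysis. The names DECLARATION and undocumented_functions are hypothetical. The sketch assumes that a function's help text is a comment starting on the line immediately after the declaration, and it ignores line continuations, block comments and comments placed before the declaration.

```python
import re

# Matches Matlab declarations such as
#   function name
#   function out = name(args)
#   function [a, b] = name(args)
# and captures the function name.
DECLARATION = re.compile(r"^\s*function\s+(?:[^=%]*=\s*)?([A-Za-z]\w*)")


def undocumented_functions(source):
    """Return names of functions not directly followed by a comment line."""
    lines = source.splitlines()
    missing = []
    for index, line in enumerate(lines):
        match = DECLARATION.match(line)
        if not match:
            continue
        # Simplifying assumption: the help text starts on the very next line.
        next_line = lines[index + 1].strip() if index + 1 < len(lines) else ""
        if not next_line.startswith("%"):
            missing.append(match.group(1))
    return missing
```

Applied to all files in a folder, the ratio of documented to declared functions would give a per-folder overview of the kind described above.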

Conclusion

To summarize, this article lists the conceptual steps for establishing a code quality control process that monitors five code quality indicators on a continuous basis for a Matlab codebase. One way to understand the effect of the control process is to relate it to the book The Elements of Matlab Style by Richard K. Johnson, which is the standard reference for guidelines on creating Matlab code that is easy to understand, enhance, and maintain. With our control process, we aim to bring the solid programming principles from the book to actual application codebases.

I would be interested to know which aspects of such a code quality process you would use during active development of an application implemented in Matlab. Which of the guidelines listed in this book would you consider relevant when developing your application? Let me know your opinions in the comments. If you want to try Teamscale yourself, download your free trial today.