Incremental global ABAP security analysis with Teamscale

Our mission at CQSE is to help customers improve the quality of their code. Our tool Teamscale checks source code and reports issues such as logical flaws, copy&paste programming and possible performance bottlenecks.

However, there is one aspect of code quality that we have not addressed so far: code security. Code is secure if it cannot be used by an attacker to perform unintended, dangerous actions on the host system. For example, if an attacker enters »'; DROP TABLE Customers;« in an input field, this might cause a system to delete the Customers table, a well-known »SQL injection« attack. In this post, I will explain how new analyses in Teamscale can efficiently detect vulnerabilities for such attacks and report them to developers.
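To illustrate the underlying pattern, here is a minimal sketch in Java with JDBC (not ABAP; the class and method names are hypothetical and only serve this example):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class CustomerLookup {

  // VULNERABLE: user input is concatenated directly into the SQL string.
  // The payload »'; DROP TABLE Customers;« closes the string literal and
  // smuggles a second statement into the query.
  static void findCustomerVulnerable(Connection conn, String name) throws Exception {
    Statement stmt = conn.createStatement();
    stmt.execute("SELECT * FROM Customers WHERE name = '" + name + "'");
  }

  // SAFE: a parameterized query treats the input strictly as data, never as SQL.
  static void findCustomerSafe(Connection conn, String name) throws Exception {
    PreparedStatement stmt =
        conn.prepareStatement("SELECT * FROM Customers WHERE name = ?");
    stmt.setString(1, name);
    stmt.execute();
  }
}

The vulnerable variant shows exactly the kind of data flow this post is about: a value from an untrusted source reaches a sensitive operation without any sanitization in between.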

Security Threats in Source Code

Security threats can be roughly categorized into three groups: code that allows attackers to

  • execute their own code on the host (e.g., inject a virus or logger),
  • extract confidential information, or
  • inject wrong information.

Good code quality (in terms of maintainability) helps to avoid these risks, since it is easier for developers to spot vulnerable statements in well-maintained code than in chaotic code. However, generally rising threat levels and regulatory requirements often call for an automated check that explicitly focuses on known security issues.

We implemented a security analysis for ABAP (the main programming language on SAP systems). Our analysis finds occurrences of known security issues and shows them to the development team. The findings are integrated with the Teamscale infrastructure, which means that we can show, for example, a trend of security findings over the project history (e.g., over the last years). In combination with other knowledge about the system, an expert can use this information to roughly assess the system's vulnerability.

One of Teamscale's strongest features is the ability to analyze the entire commit history of a project. So how did we implement these typically time-intensive security analyses in a context where we have to analyze every commit of a project? To answer this question, I will first explain why security analyses are typically expensive, and then how we mitigated this cost.

The cost of global analysis

Our ABAP security analyses are, in part, based on a data-flow analysis that tracks which values are stored in which variables during execution of the program. In the following example report, a parameter (defined by the user calling the report) is passed to the CALL 'SYSTEM' command and subsequently executed by the operating system. Essentially, this allows the caller to execute any command on the host machine.

REPORT /some/test.
PARAMETERS p_cmd TYPE string LOWER CASE.
START-OF-SELECTION.
  CALL 'SYSTEM' ID 'COMMAND' FIELD p_cmd.

This problem would not be hard to detect, even without a data-flow analysis. In the next, slightly more complex example, we use an intermediary function that receives and returns the user-defined parameter. Additionally, we define the intermediary function in a different ABAP source repository object.

FUNCTION zfunc_intermediary
  IMPORTING i_cmd TYPE string
  EXPORTING result TYPE string.
  result = i_cmd.
ENDFUNCTION.

REPORT /some/test.
PARAMETERS p_cmd TYPE string LOWER CASE.
START-OF-SELECTION.
  DATA lv_local TYPE string.
  CALL FUNCTION 'ZFUNC_INTERMEDIARY'
    EXPORTING i_cmd  = p_cmd
    IMPORTING result = lv_local.
  CALL 'SYSTEM' ID 'COMMAND' FIELD lv_local.

This is more difficult to detect, since we have to handle the data flow across different variables, a function call, and file boundaries. Usually, data-flow analyses are implemented as intra-procedural analyses, because analyzing all possible control flows within a single procedure is already a computationally expensive problem. Even with optimizations that avoid enumerating all flows, this cost can easily make static-analysis tools unusable. If we additionally include the control-flow branching that results from called functions (which, in turn, might call other functions), we have to find other ways to tackle the problem.

Even more challenging is our requirement to analyze the security of each commit in the repository. This means that if one file changes, we have to re-analyze all files whose control flow might be affected by this change.

Efficient, incremental analysis

Our first step to make the analysis efficient is to not use data-flow analysis for simple checks. We used our custom check framework to implement 32 checks that detect, for example, whether a specific dangerous function such as CALL 'SYSTEM' is used in the analyzed code. This way, we report such suspicious code even if our data-flow analysis would not detect it.
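Conceptually, such a check is a simple traversal over the parsed statements and needs no data-flow information at all. The following Java sketch illustrates the idea with a heavily simplified, hypothetical token model (not our actual check API):

import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model: each statement is a list of tokens. */
class DangerousKernelCallCheck {

  // Flags every CALL 'SYSTEM' statement, regardless of where its argument
  // comes from, even if the data-flow analysis finds nothing suspicious.
  List<String> run(List<List<String>> statements) {
    List<String> findings = new ArrayList<>();
    for (List<String> tokens : statements) {
      if (tokens.size() >= 2
          && tokens.get(0).equalsIgnoreCase("CALL")
          && tokens.get(1).equals("'SYSTEM'")) {
        findings.add("Dangerous kernel call: " + String.join(" ", tokens));
      }
    }
    return findings;
  }
}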

Screenshot of findings based on custom checks

Second, we try to re-analyze code as rarely as possible. Our data-flow analysis is technically a taint-propagation analysis: it statically (i.e., without actually executing the code) tracks tainted data (data stemming from user input) along the control flow and reports a finding if tainted data is passed to a critical function (e.g., a system call).
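To show what taint propagation means in code, here is a deliberately simplified Java sketch. It assumes a hypothetical three-address form where every statement assigns one target variable from a list of source variables; a real analysis additionally has to handle branches and loops (by iterating to a fixed point), which the sketch omits:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical statement form: target = f(sources), optionally a sink. */
record Stmt(String target, List<String> sources, boolean isCriticalCall) {}

class TaintPropagation {

  /** Returns all critical calls that receive tainted (user-controlled) data. */
  static List<Stmt> analyze(List<Stmt> body, Set<String> userInputVariables) {
    Set<String> tainted = new HashSet<>(userInputVariables);
    List<Stmt> findings = new ArrayList<>();
    for (Stmt stmt : body) {
      boolean readsTaintedData =
          stmt.sources().stream().anyMatch(tainted::contains);
      if (stmt.isCriticalCall() && readsTaintedData) {
        findings.add(stmt); // tainted data reaches a system call
      }
      if (readsTaintedData) {
        tainted.add(stmt.target()); // assignments propagate taint
      } else {
        tainted.remove(stmt.target()); // variable overwritten with clean data
      }
    }
    return findings;
  }
}

For the first example report above, p_cmd would be seeded as tainted because of the PARAMETERS statement, and the CALL 'SYSTEM' statement reading it would be reported.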

To solve the global data-flow problem, our analysis uses a divide-and-conquer approach. It analyzes code on a local (method) level first, and later combines this local information to get global results. For each repository change set (e.g., a commit), the analysis traverses the code of each changed method and builds a simplified representation of the input/output data flows of the method. This representation is cached and remains unchanged until the code of the method changes again.
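One possible shape for such a cached representation is sketched below (again hypothetical Java, not our internal data structures): for each method, we keep only which input parameters can flow into which output parameters, keyed by a hash of the method's code:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical summary: which outputs a tainted input parameter can reach. */
record MethodSummary(Map<String, Set<String>> inputToOutputs) {}

class SummaryCache {

  private final Map<String, MethodSummary> cache = new HashMap<>();

  /** The expensive local analysis runs only if the method's code changed. */
  MethodSummary summaryFor(String methodId, String codeHash,
                           Supplier<MethodSummary> localAnalysis) {
    return cache.computeIfAbsent(methodId + "@" + codeHash,
        key -> localAnalysis.get());
  }
}

For zfunc_intermediary from the example above, the summary would simply record that i_cmd flows into result; the method body never has to be revisited as long as its code hash stays the same.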

In the next step, we compute global data flows. To implement this, we need to know for each function (or method) call which functions (or methods) can be targets of that call. Typically, this call graph does not change much after a single commit. Therefore, if the commit changes the implementation of only one function, we don’t need to reconsider functions in completely unrelated parts of the call graph. This saves a lot of computational effort.
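The incremental part can be sketched as a reachability question on the reverse call graph (hypothetical Java again): when a commit changes some methods, only their transitive callers can observe different global results and need to be reconsidered:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IncrementalScheduler {

  /** Reverse call-graph edges: callersOf.get(m) = methods that call m. */
  private final Map<String, Set<String>> callersOf = new HashMap<>();

  /** Collects the changed methods plus all their transitive callers. */
  Set<String> methodsToReanalyze(Set<String> changedMethods) {
    Set<String> affected = new HashSet<>(changedMethods);
    Deque<String> worklist = new ArrayDeque<>(changedMethods);
    while (!worklist.isEmpty()) {
      String method = worklist.pop();
      for (String caller : callersOf.getOrDefault(method, Set.of())) {
        if (affected.add(caller)) {
          worklist.push(caller); // walk further up the call graph
        }
      }
    }
    return affected;
  }
}

In practice, one can prune even further: if re-running the local analysis of a changed method yields the same summary as the cached one, its callers do not need to be revisited at all.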

Statistics

Our security analysis can analyze 12,600,000 source lines of code with 270,000 methods in under three hours (on a developer notebook). This is the time for the initial analysis of the entire source code; a typical commit that changes only a small part of the system is then analyzed in seconds.

Result

As a result, we can efficiently find and display security vulnerabilities in source code. The following screenshot shows a finding generated by the data-flow analysis. It illustrates the data flow that propagates a »tainted« value from a user input (the PARAMETERS statement) to a system call.

Screenshot of tainted data flow [1]

This information enables developers to easily understand and debug the reported vulnerability.

Feel free to share your opinions by commenting on this article: I'd be very interested to hear about your experiences with security issues in ABAP code and corresponding analysis tools. If you are working on a security-sensitive project and want to try Teamscale for your development, contact us and we'll help you get started!

References

[1] Full screenshot of the data-flow finding.