Is there any point in creating your own metrics in a code analysis tool? What benefits can we expect? What are the disadvantages? What obstacles will we face?
It always strikes me that these questions are mostly ignored when choosing a tool. Most of the time, people ask how easy or difficult it is to customize the software and create their own rules, rather than wondering whether this is really needed.
The objective of this series of articles is therefore to clarify these questions and help you choose the right criteria if you want to acquire a code analysis tool. If you are a quality consultant assisting a client with such a choice, you can discuss these issues with him to guide his selection.
After looking at the simplest metrics, which are the easiest to create, in this article we will discuss much more complex metrics, and even some standards or best practices I have come across at customers that are impossible to implement.
Most of the truly useful metrics are complex because they rely on syntax or programming practices that are hard to identify. For example:
- All conditional statements (such as ‘IF … THEN … ELSE … ENDIF’) and loop statements (such as ‘While … EndWhile’ or ‘Loop … EndLoop’) are structured in multiple blocks: ‘IF’ or ‘While’ followed by several lines of code (sometimes more than a page) before the final ‘ENDIF’ or ‘EndWhile’. Trying to identify these statements with a single RegExp (see the previous post about regular expressions) can quickly become a real headache. This is a fairly common source of false positives in many code analysis tools.
- Some instructions can be memory-intensive or affect response time without posing a real performance risk … unless they are called inside a loop. Here again, identifying such a syntax within a loop statement is very complex and requires a good knowledge of regular expressions.
- Most best practices for the readability and understandability of the code, and therefore for its maintainability, are about avoiding nested instructions: nested IFs, loops within loops, usually spread over several pages of code. Such nesting leads to a higher risk of error and higher maintenance costs, because any modification of this code requires more programming effort. Again, these rules are complex or very complex to implement with regular expressions.
- Other good programming practices depend on the type of object. The best example of such a rule, found in all languages and technologies, is the one that prohibits accessing the database from the presentation layer. It is probably the first and most important security rule. To implement it in a code analysis tool, you must identify all the different instructions that can access the database and check that none of them is used in a screen (presentation layer).
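To make the first and third points more concrete, here is a minimal Python sketch run on a toy snippet in the ‘IF … ENDIF’ pseudo-syntax used above (`SOURCE`, `naive_blocks` and `match_blocks` are hypothetical names chosen for the illustration). A single non-greedy RegExp pairs the outer ‘IF’ with the inner ‘ENDIF’, while a simple stack that counts IF and ENDIF tokens pairs the blocks correctly and yields their nesting depth. It is still only a token counter, not a parser: an ‘IF’ inside a string literal or a comment would fool it, which is why serious tools rely on a real lexer and parser.

```python
import re

# Toy source in the IF ... ENDIF pseudo-syntax used above, with one nested IF.
SOURCE = """\
IF A > 0 THEN
  IF B > 0 THEN
    CALL X
  ENDIF
  CALL Y
ENDIF
"""

# Naive rule: one non-greedy RegExp per IF ... ENDIF block.
NAIVE_BLOCK = re.compile(r"\bIF\b.*?\bENDIF\b", re.DOTALL)

# Tokens for the stack-based approach; \b keeps the "IF" inside "ENDIF" from matching.
TOKEN = re.compile(r"\b(?:IF|ENDIF)\b")


def naive_blocks(src):
    """Return what the single RegExp believes are the IF blocks."""
    return NAIVE_BLOCK.findall(src)


def match_blocks(src):
    """Pair each IF with its ENDIF using a stack.

    Returns (start_line, end_line, depth) tuples, depth 1 being the outermost level.
    """
    stack, blocks = [], []
    for lineno, line in enumerate(src.splitlines(), start=1):
        for tok in TOKEN.findall(line):
            if tok == "IF":
                stack.append(lineno)
            elif not stack:
                raise ValueError(f"ENDIF without IF at line {lineno}")
            else:
                start = stack.pop()
                blocks.append((start, lineno, len(stack) + 1))
    if stack:
        raise ValueError(f"IF without ENDIF at line {stack[-1]}")
    return blocks


if __name__ == "__main__":
    print("RegExp sees:", naive_blocks(SOURCE))
    blocks = match_blocks(SOURCE)
    print("Stack sees:", blocks)
    print("Max nesting depth:", max(depth for _, _, depth in blocks))
```

The RegExp returns a single "block" running from line 1 to the inner ‘ENDIF’ on line 4, whereas the stack reports the inner block (lines 2 to 4, depth 2) and the outer block (lines 1 to 6, depth 1).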
It becomes very difficult, even impossible, to design and implement such metrics using regular expressions. It usually requires the software to analyze the syntax tree of all the instructions in a program (or a class), which not all tools can do. And it takes quite an advanced understanding, as well as specific tools and languages, to browse this ‘Abstract Syntax Tree’ and define your metric. Not for newbies.
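As an illustration of what browsing an Abstract Syntax Tree involves, here is a sketch that uses Python's standard `ast` module on Python code, standing in for whatever parser a tool provides for your own language. It implements the ‘costly instruction inside a loop’ rule from the list above. The names in `EXPENSIVE_CALLS` (`fetch_all`, `execute_query`) are hypothetical placeholders for database or I/O calls, and `CallInLoopChecker` is an illustrative name, not a library class. The sketch only looks at calls syntactically inside `for`/`while` loops: a costly call hidden in a helper function that is itself called from a loop, or inside a comprehension, would go unnoticed without further (inter-procedural) analysis.

```python
import ast

# Hypothetical names of costly calls (database access, I/O, ...); adapt to your own code base.
EXPENSIVE_CALLS = {"fetch_all", "execute_query"}


class CallInLoopChecker(ast.NodeVisitor):
    """Records calls to EXPENSIVE_CALLS that are executed on every iteration of a loop."""

    def __init__(self):
        self.loop_depth = 0
        self.findings = []  # (line number, called name)

    def _visit_repeated(self, nodes):
        # Everything visited here runs once per iteration.
        self.loop_depth += 1
        for n in nodes:
            self.visit(n)
        self.loop_depth -= 1

    def visit_For(self, node):
        # Target and iterable are evaluated once, before the loop repeats.
        self.visit(node.target)
        self.visit(node.iter)
        self._visit_repeated(node.body)
        for n in node.orelse:  # the else clause runs at most once
            self.visit(n)

    visit_AsyncFor = visit_For

    def visit_While(self, node):
        # The condition is re-evaluated before each iteration.
        self._visit_repeated([node.test, *node.body])
        for n in node.orelse:
            self.visit(n)

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):
            name = node.func.id
        elif isinstance(node.func, ast.Attribute):
            name = node.func.attr  # db.execute_query(...) -> "execute_query"
        else:
            name = None
        if self.loop_depth > 0 and name in EXPENSIVE_CALLS:
            self.findings.append((node.lineno, name))
        self.generic_visit(node)  # keep looking inside the arguments


def calls_in_loops(source):
    """Parse Python source and return the costly calls found inside loops."""
    checker = CallInLoopChecker()
    checker.visit(ast.parse(source))
    return checker.findings


SAMPLE = """\
rows = fetch_all("SELECT id FROM customer")
for row in rows:
    detail = db.execute_query("SELECT * FROM orders WHERE id = ?", row)
while pending():
    fetch_all("SELECT 1")
"""

if __name__ == "__main__":
    for lineno, name in calls_in_loops(SAMPLE):
        print(f"line {lineno}: {name}() called inside a loop")
```

On `SAMPLE`, the checker flags the calls on lines 3 and 5 but not the `fetch_all` on line 1, which runs only once. The same walk could be extended to the presentation-layer rule by also recording which module or class each call belongs to.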
The more complex a metric is, the more difficult it is to implement. But some custom good practices are simply impossible to automate as a metric.
One of the first questions I ask a customer who wants to implement his own rules is whether he has a list of rules or any document describing his ‘best practices’. This is not always, or even often, the case, but I am glad when he answers yes, because it means that his question about the ‘ease of implementing custom metrics’ is justified.
Reading such a document, however, will reveal that a significant proportion of these rules are simply impossible to implement. This proportion varies with the technology (we will discuss this in the next post on this topic). Some examples:
- “A class, a method, a function, a procedure, a paragraph (COBOL), etc. should be documented” with a header following an in-house template. Just try to define a RegExp that checks whether a comment is present at the top of each class, method, function, etc. Now build a RegExp that verifies this comment follows a specific format spanning several lines, with at least the author, the date, a description of the object, etc. Good luck.
- “The header of each program must include a description of the implemented functionality.” This rule is a more advanced version of the previous one: not only must you verify that the comment header follows a given format, but also that it actually describes the functional purpose of the program. Impossible.
- “The error messages should be translated into Spanish, Catalan and Basque.” Again, this rule, very common in Spain, is impossible to verify.
- Identification of dead code. I love it when a software vendor claims that his tool can identify dead code. Here is something a (somewhat advanced) programmer can do to comment out a whole block of code, by wrapping it in a condition that is always false:
… dead code
Of course, it is quite easy to define a RegExp that finds all occurrences of ‘IF 1 = 0’. But how many RegExps would you need to catch ‘IF A = B’ or ‘IF 1 = 2’? Forget it.
The next time a salesman tells you that his software can identify dead code, give him this example (and try not to smile).
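The dead-code point can be sketched in Python on single lines of the IF pseudo-syntax (the function `classify` and both patterns are hypothetical illustrations, not any vendor's rule). The literal RegExp catches ‘IF 1 = 0’ only. Folding comparisons between two integer literals also catches ‘IF 1 = 2’. But ‘IF A = B’ stays ‘unknown’, because its outcome depends on the data A and B hold at run time; deciding whether an arbitrary condition can ever be true is undecidable in general, so no tool can settle every case.

```python
import re

# The "easy" rule: matches only the literal idiom IF 1 = 0.
LITERAL_IDIOM = re.compile(r"^\s*IF\s+1\s*=\s*0\b", re.IGNORECASE)

# Any "IF <operand> = <operand>" line; operands are identifiers or integer literals.
IF_EQUALS = re.compile(r"^\s*IF\s+(\w+)\s*=\s*(\w+)", re.IGNORECASE)


def classify(line):
    """Classify an 'IF x = y' line: 'always false', 'always true', 'unknown', or None."""
    m = IF_EQUALS.match(line)
    if m is None:
        return None  # not an IF equality test
    left, right = m.groups()
    if left.isdecimal() and right.isdecimal():
        # Both operands are literals: the comparison can be folded at analysis time.
        # 'always false' -> the IF body is dead; 'always true' -> any ELSE branch is dead.
        return "always true" if int(left) == int(right) else "always false"
    # At least one variable: the outcome depends on run-time data.
    return "unknown"


if __name__ == "__main__":
    for src in ["IF 1 = 0", "IF 1 = 2", "IF A = B"]:
        print(src, "| literal RegExp:", bool(LITERAL_IDIOM.match(src)), "| folded:", classify(src))
```

Each refinement catches a few more cases, but the ‘unknown’ bucket never disappears: that is exactly where a claim of "dead code detection" stops being verifiable.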
And I am not even talking about general, imprecise rules such as:
- “Syntax control: correct critical errors and justify exceptions to non-critical rules.” Handle that.
- “It is forbidden to develop a function if there is already a function performing the same treatment.” Oh yes, of course, it makes sense, but thanks for the reminder.
- “Check that the called program exists.” That would indeed be preferable. Note that this rule can in fact be implemented easily by checking the return code of the call … when there is one.
I just want to show that some rules are completely impossible to implement, and that there are more of them than you think.
Seriously, if your client has a document with such rules, don’t laugh at him: the very existence of such a document proves that he cares about the quality of his applications. It is simply that most of these books contain rules for manual code reviews, and you cannot automate every manual rule.
When asked whether a code analysis tool can create custom rules, the usual first reaction is to demonstrate what the tool can do, generally by programming a metric based on a simple syntax, such as those we saw in the previous post. Your customer will be satisfied with that, but it is not the correct answer. What I do is:
- I ask him whether he has defined any custom rules and whether these rules are formalized in a document.
- If this is not the case, I remind him that 100% of the people who wish to acquire a code analysis tool ask this same question, but 90% never use this feature. And 90% of the remaining 10% abandon the maintenance of their custom metrics after 2 years because it is not cost-effective.
- If the answer is some vague mumbling like “we are thinking about it” or “possibly”, I treat it as a negative response (previous point), or I move on to the following question:
Custom rules? For which technologies?
This will be the subject of the next post.