New to the series? Start from the beginning!
Product quality metrics measure the excellence of a product and its features. They measure the “goodness” inherent in the product, apart from how the product was developed. High-quality processes and tests contribute to, but do not alone guarantee, high-quality products. That’s why quality must be built into the product from the start and checked throughout all phases of development. Below are metrics for assuring quality in the delivered products.
|Quality Aspect||Does the product work correctly?|
|Desired State||True – Features either work, or they don’t.|
|Metrics||Test Failure Rate – The whole purpose of functional testing is to determine which features work and which don’t. Assuming test quality is high, the test failure rate is the single best indicator of product functionality. Higher failure rates mean more broken features. Teams should target low-to-zero test failures. It may be useful to keep a failure history for each test. For large products, it may also be useful to break down failure rates by feature area.
It is imperative to recognize, however, that the test failure rate is meaningful only if test quality is high – meaning that tests have good coverage and reliability. Poor-quality tests will give untrustworthy results. For example, weak coverage could mean that failure rate is low because functionality is not truly exercised, and poor reliability could mean that failure rate is high because tests always crash. Be sure to back up any reporting on test failure rate with assurance that test quality is high (using test quality metrics).
|Quality Aspect||Does the product work reliably?|
|Desired State||High – Product functionality should be consistently good and available.|
|Metrics||Build Failure Rate – The build failure rate is the proportion of builds that have failed for whatever reason over a given period of time. While process metrics focus on response times to fix broken builds, the build failure rate itself indicates the health of the product while it is being developed. It does not track how badly a build failed like test failure rate does, but instead it impartially tracks ultimate success or failure. Make sure to limit the history of builds included in the calculation to keep it relevant (such as the last 30 days or so). Occasional build failures are acceptable as long as they are fixed quickly. High build failure rates indicate product instability, which could be due to design flaws, weak pre-check-in testing, tricky bugs, or even pipeline faults.
Uptime – Uptime refers to the total time a system is usable. For example, consider a website that must go down for a one-hour service window every week – its uptime would be 167/168 = 99.4%. Not all downtime is planned, however. A bad deployment during maintenance could knock that website offline for an additional 3 hours – dragging uptime down to 97.6% for the week. This may not seem bad at first, but it’s quite terrible when considering that (a) lost time is lost money and (b) the goal of Six Sigma is 99.99966%. A product should have near-perfect availability. System monitoring tools can easily measure uptime. Low uptime indicates either poor design or lack of failover redundancy.
|Quality Aspect||Does the product work optimally?|
|Desired State||Optimal – Performance should be at its best in all areas.|
|Metrics||There are four classic software performance metrics. They may be applied in various ways to aspects of product behavior. Ultimately, software products should have a minimal impact on the system while providing a maximal capacity for work.
Processor Usage – Processor cycles should not be needlessly wasted. Make sure algorithms are efficient in terms of computational complexity (big O) and implementation details.
Memory Usage – Watch out for both memory bloat (when features take up a lot of memory unnecessarily) and memory leaks (when memory is not freed up after it is no longer needed.)
Response Time – Response time, or latency, measures the turnaround time from when an action is taken to when the actor receives feedback that the action is completed. Common examples of response time are web page loading, REST API call responses, and database queries. Response time should be as short as possible.
Throughput – Throughput measures how much load a system can handle. It could refer to data I/O bandwidth, transactions per time unit, number of concurrent users, etc. Typically, higher stress on a system will cause other performance metrics to degrade. The “sweet spot” to find is the maximum throughput value that does not unacceptably impact other performance aspects.
|Quality Aspect||Is the software code unnecessarily complicated?|
|Desired State||Minimal – Simple is better than complex. Complex is better than complicated. (See The Zen of Python.)|
|Metrics||There are a number of code metrics that indicate complexity in various ways.
Lines of Code – One of the most rudimentary metrics is to count the lines of code. All things equal, line count indicates the magnitude of the software product, with the assumption that fewer lines will be easier to maintain. Any modern IDE (or, worst case, shell scripting) can yield line counts. However, all things are not equal, and line count alone does not indicate quality or efficiency.
Cyclomatic Complexity – Cyclomatic complexity measures the number of different execution paths the code can take. It is more meaningful than counting sheer lines of code because it indicates the magnitude of testing needed for full coverage. Lower values are better. Cyclomatic complexity is a popular code metric, and many modern analysis tools can measure it.
Depth of Inheritance – For object-oriented languages, the depth of inheritance measures the maximum length of a class inheritance tree from child class to its ultimate root. For example, in the class inheritance tree of Tiger > Cat > Animal > Object, Tiger would have an inheritance depth of 3. Lower values are desirable because they make classes easier to understand.
There are countless other code metrics available. For example, Microsoft Visual Studio calculates the metrics above plus a maintainability index and class coupling. Halstead metrics are another way to measure complexity.
|Quality Aspect||Does the product satisfy the end user?|
|Desired State||High – The product should meet the end user’s needs, and the end user should like using it.|
|Metrics||Customer satisfaction is inherently subjective, so trying to measure it is difficult. Ultimately, the end users must find compelling value in the product over other alternatives, or else they won’t use it or buy it. There are many ways to attempt to gauge customer satisfaction: surveys, interviews, A/B testing, etc. Statistics and psychology also play a part. Check out articles here, here, and here to get some ideas.|
Interesting approach! I’ve been thinking a lot lately about how one might define and achieve software quality. What I’ve come up with is the acronym ACRUMEN (try saying “acronym ACRUMEN” ten times fast!), for the six key aspects. What you call Functionality, I call Correctness; your Stability is my Robustness; Performance is an aspect of Efficiency; Complexity (or rather, the lack thereof) is an aspect of Maintainability; and Satisfaction is an aspect of Appropriateness, with perhaps some Usability thrown in. More details (including what the N stands for) at https://bit.ly/jop-acrumen-talk . 🙂