Author: Andy Knight

I'm a software engineer who specializes in test automation.

Python Testing 101: behave

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use the behave framework.

Overview

behave is a behavior-driven development (BDD) test framework that is very similar to Cucumber, Cucumber-JVM, and SpecFlow. BDD frameworks are unique in that test cases are not written in raw programming code but rather in a plain specification language that is then “glued” to code. The “behavior specs” help to define what the behavior is, and steps can be reused by multiple test cases (or “scenarios”). This is very different from more traditional frameworks like unittest and pytest. Although behave is not an official Cucumber variant, it still uses the Gherkin language (“Given-When-Then”) for behavior specification.

Test scenarios are written in Gherkin “.feature” files. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. The behave framework essentially runs feature files like test scripts. Hooks (in “environment.py”) and fixtures can also insert helper logic for test execution.

behave is officially supported on Python 2, but it also runs just fine on Python 3.

Installation

Use pip to install the behave module.

pip install behave

Project Structure

Since behave is an opinionated framework, it has a very opinionated project structure. All code must be located under a directory named “features”. Gherkin feature files and the “environment.py” file for hooks must appear under “features”, and step definition modules must appear under “features/steps”. Configuration files can store common execution settings and even override the path to the “features” directory.

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

[project root directory]
|-- [product code packages]
|-- features
|   |-- environment.py
|   |-- *.feature
|   `-- steps
|       `-- *_steps.py
`-- [behave.ini|.behaverc|tox.ini|setup.cfg]
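
For example, a minimal “behave.ini” could set common options like these (a sketch; the “paths” option can override the location of the features directory):

# behave.ini (a minimal sketch)
[behave]
paths = features
junit = true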

Example Code

An example project named behavior-driven-python, located on GitHub, shows how to write tests using behave. This section will explain how the Web tests are designed.

The top layer in a behave project is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step defs can use different types of step matchers and can also take parametrized inputs:

from behave import *
from selenium.webdriver.common.keys import Keys

DUCKDUCKGO_HOME = 'https://duckduckgo.com/'

@given('the DuckDuckGo home page is displayed')
def step_impl(context):
  context.browser.get(DUCKDUCKGO_HOME)

@when('the user searches for "{phrase}"')
def step_impl(context, phrase):
  search_input = context.browser.find_element_by_name('q')
  search_input.send_keys(phrase + Keys.RETURN)

@then('results are shown for "{phrase}"')
def step_impl(context, phrase):
  links_div = context.browser.find_element_by_id('links')
  assert len(links_div.find_elements_by_xpath('.//div')) > 0
  search_input = context.browser.find_element_by_name('q')
  assert search_input.get_attribute('value') == phrase

The “environment.py” file can specify hooks to execute additional logic before and after steps, scenarios, features, and even the whole test suite. Hooks should handle automation concerns that should not be exposed through Gherkin. For example, Selenium WebDriver setup and cleanup should be handled by hooks instead of step definitions because hooks always run even when a failure aborts the scenario, whereas any steps after the failing step are skipped.

from selenium import webdriver

def before_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser = webdriver.Firefox()
    context.browser.implicitly_wait(10)

def after_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser.quit()

Test Launch

behave boasts a powerful command line with many options. Below are common use case examples when running tests from the project root directory:

# Run all scenarios in the project
behave

# Run all scenarios in a specific feature file
behave features/web.feature

# Filter tests by tag
behave --tags-help
behave --tags @duckduckgo
behave --tags ~@unit
behave --tags @basket --tags @add,@remove

# Write a JUnit report (useful for Jenkins and other CI tools)
behave --junit

# Don't print skipped scenarios
behave -k

Pros and Cons

Like all BDD test frameworks, behave is opinionated. It works best for black box testing due to its behavior focus. Web testing would be a great use case because user interactions can easily be described using plain language. Reusable steps also foster a snowball effect for automation development. However, behave would not be good for unit testing or low-level integration testing – the verbosity would become more of a hindrance than a helper.

My recommendation is to use behave for black box testing if the team has bought into BDD. I would also strongly consider pytest-bdd as an alternative BDD framework because it leverages all the goodness of pytest.

5 Things I Love About SpecFlow

SpecFlow, a.k.a. “Cucumber for .NET,” is a leading BDD test automation framework for .NET. Created by Gáspár Nagy and maintained as a free, open source project on GitHub by TechTalk, SpecFlow presently has almost 3 million total NuGet downloads. I’ve used it myself at a few companies, and, I must say as an automationeer, it’s awesome! SpecFlow shares a lot in common with other Cucumber frameworks like Cucumber-JVM, but it is not a knockoff – it excels in many ways. Below are five features I love about SpecFlow.

#1: Declarative Specification by Example

SpecFlow is a behavior-driven test framework. Test cases are written as Given-When-Then scenarios in Gherkin “.feature” files. For example, imagine testing a cucumber basket:

Feature: Cucumber Basket
  As a gardener,
  I want to carry many cucumbers in a basket,
  So that I don’t drop them all.
  
  @cucumber-basket
  Scenario: Fill an empty basket with cucumbers
    Given the basket is empty
    When "10" cucumbers are added to the basket
    Then the basket is full

Notice a few things:

  • It is declarative in that steps indicate what should be done at a high level.
  • It is concise in that a full test case is only a few lines long.
  • It is meaningful in that the coverage and purpose of the test are intuitively obvious.
  • It is focused in that the scenario covers only one main behavior.

Gherkin makes it easy to specify behaviors by example. That way, everybody can understand what is happening. C# code will implement each step in lower layers. Even if your team doesn’t do the full-blown BDD process, using a BDD framework like SpecFlow is still great for test automation. Test code naturally abstracts into separate layers, and steps are reusable, too!

#2: Context is King

Safely sharing data (i.e., “context”) between steps is a big challenge in BDD test frameworks. Using static variables is a simple yet terrible solution – any class can access them, but they create collisions for parallel test runs. SpecFlow provides much better patterns for sharing context.

Context injection is SpecFlow’s simple yet powerful mechanism for inversion of control (using BoDi). Any POCOs can be injected into any step definition class, either using default values or using a specific initialization, by declaring the POCO as a step def constructor argument. Those instances will also be shared instances, meaning steps across different classes can share the same objects! For example, steps for Web tests will all need a reference to the scenario’s one WebDriver instance. The context-injected objects are also created fresh for each scenario to protect test case independence.

Another powerful context mechanism is ScenarioContext. Every scenario has a unique context: title, tags, feature, and errors. Arbitrary objects can also be stored in the context object like a Dictionary, which is a simple way to pass data between steps without constructor-level context injection. Step definition classes can access the current scenario context using the static ScenarioContext.Current variable, but a better, thread-safe pattern is to make all step def classes extend the Steps class and simply reference the ScenarioContext instance variable.

#3: Hooks for Any Occasion

Hooks are special methods that insert extra logic at critical points of execution. For example, WebDriver cleanup should happen after a Web test scenario completes, no matter the result. If the cleanup routine were put into a Then step, then it would not be executed if the scenario had a failure in a When step. Hooks are reminiscent of Aspect-Oriented Programming.

Most BDD frameworks have some sort of hooks, but SpecFlow stands out for its hook richness. Hooks can be applied before and after steps, scenario blocks, scenarios, features, and even around the whole test run. (Cucumber-JVM, by contrast, does not support global hooks.) Hooks can be selectively applied using tags, and they can be assigned an order if a project has multiple hooks of the same type. Hook methods will also be picked up from any step definition class. SpecFlow hooks are just awesome!

#4: Thorough Outline Templating

Scenario Outlines are a standard part of Gherkin syntax. They’re very useful for templating scenarios with multiple input combinations. Consider the cucumber basket again:

Feature: Cucumber Basket
  
  Scenario Outline: Add cucumbers to the basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are added to the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | total |
      | 1       | 2    | 3     |
      | 5       | 3    | 8     |

All BDD frameworks can parametrize step inputs (shown in double quotes). However, SpecFlow can also parametrize the non-input parts of a step!

Feature: Cucumber Basket
  
  Scenario Outline: Use the cucumber basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are <handled-with> the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | handled-with | total |
      | 1       | 2    | added to     | 3     |
      | 5       | 3    | removed from | 2     |

The step definitions for the add and remove steps are separate. The step text for the action is parametrized, even though it is not a step input:

[When(@"""(\d+)"" cucumbers are added to the basket")]
public void WhenCucumbersAreAddedToTheBasket(int count) { /* */ }

[When(@"""(\d+)"" cucumbers are removed from the basket")]
public void WhenCucumbersAreRemovedFromTheBasket(int count) { /* */ }

That’s cool!

#5: Test Thread Affinity

SpecFlow can use any unit test runner (like MSTest, NUnit, and xUnit.net), but TechTalk provides the official SpecFlow+ Runner as a paid, licensed product. I’m not associated with TechTalk in any way, but the SpecFlow+ Runner is worth the cost for enterprise-level projects. It has a friendly command line, a profile file to customize execution, parallel execution, and nice integrations.

The major differentiator, in my opinion, is its test thread affinity feature. When running tests in parallel, the major challenge is avoiding collisions. Test thread affinity is a simple yet powerful way to control which tests run on which threads. For example, consider testing a website with user accounts. No two tests should use the same user at the same time, for fear of collision. Scenarios can be tagged for different users, and each thread can have the affinity to run scenarios for a unique user. Some sort of parallel isolation management like test thread affinity is absolutely necessary for test automation at scale. Given that the SpecFlow+ Runner can handle up to 64 threads (according to TechTalk), massive scale-up is possible.

But Wait, There’s More!

SpecFlow is an all-around great test automation framework, whether or not your team is doing full BDD. Feel free to add comments below about other features you love (or *gasp* hate) about SpecFlow!

 

Django Admin Translations

Django is a fantastic Python Web framework, and one of its great out-of-the-box features is internationalization (or “i18n” for short). It’s pretty easy to add translations to nearly any string in a Django app, but what about translating admin site pages? Titles, names, and actions all need translations. Those admin pages are automatically generated, so how can their words be translated? This guide shows you how to do it easily.


Want an internationalized admin site like this? Follow this guide to find out how!

i18n Review

If you are new to translations in Django, definitely read the official Translation page first. In a nutshell, mark all strings that need translation by passing them into a translation function (in Python code) or a translation block (in Django template code). Django management commands then generate language-specific message files, translators fill in translations for the marked strings in those files, and a final command compiles the messages for app use. Note that translations require the gettext tools to be installed on your machine. Django also provides advanced logic for handling special cases like date formats and pluralization. It’s really that simple!
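
For example, marking a string for translation in Python code is a one-liner (a minimal sketch):

# views.py (a minimal sketch)
from django.utils.translation import gettext as _

def welcome_message():
    # The wrapped string is looked up in the compiled message file at runtime
    return _('Welcome to my site!')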

Initial Setup

A Django project needs some basic config before doing translations, which is needed for both the main site and the admin.

Enabling Internationalization

Make sure the following settings are given in settings.py:

# settings.py

LANGUAGE_CODE = 'en-us'  # or other appropriate code
USE_I18N = True
USE_L10N = True

They were probably added by default. The Booleans could be set to False to give apps with no internationalization a small performance boost, but we need them to be True so that translations happen.

Changing Locale Paths

By default, message files will be generated into locale directories for each app with strings marked for translation. You may optionally want to set LOCALE_PATHS to change the paths. For example, it may be easiest to put all message files into one directory like this, rather than splitting them out by app:

# settings.py

LOCALE_PATHS = [os.path.join(BASE_DIR, 'locale')]

This will avoid translation duplication between apps. It’s a good strategy for small projects, but be warned that it won’t scale well for larger projects.

Middleware for Automatic Translation

Django provides LocaleMiddleware to automatically translate pages using “context clues” like URL language prefixes, session values, and cookies. (The full pecking order is documented under How Django discovers language preference on the official doc page.) So, if a user accesses the site from China, then they should automatically receive Chinese translations! To use the middleware, add django.middleware.locale.LocaleMiddleware to the MIDDLEWARE setting in settings.py. Make sure it comes after SessionMiddleware and CacheMiddleware and before CommonMiddleware, if those other middlewares are used.

# settings.py

MIDDLEWARE = [
    # ...
    'django.middleware.locale.LocaleMiddleware',
    # ...
]

URL Pattern Language Prefixes

Getting automatic translations from context clues is great, but it’s nevertheless useful to have direct URLs to different page translations. The i18n_patterns function can easily add the language code as a prefix to URL patterns. It can be applied to all URLs for the site or only a subset of URLs (such as the admin site). Optionally, patterns can be set so that URLs without a language prefix will use the default language. The main caveat for using i18n_patterns is that it must be used from the root URLconf and not from included ones. The project’s root urls.py file should look like this:

# urls.py

from django.conf.urls.i18n import i18n_patterns
from django.contrib import admin
from django.urls import path

urlpatterns = i18n_patterns(
    # ...
    path('admin/', admin.site.urls),
    # ...

    # If no prefix is given, use the default language
    prefix_default_language=False
)

Limiting Language Choices

When adding language prefixes to URLs, I strongly recommend limiting the available languages. Django includes ready-made message files for several languages. A site would look bad if, for example, the “/fr/” prefix were available without any French translations. Set the available languages using LANGUAGES in settings.py:

# settings.py

from django.utils.translation import gettext_lazy as _

LANGUAGES = [
    ('en', _('English')),
    ('zh-hans', _('Simplified Chinese')),
]

Note that language codes follow the ISO 639-1 standard.

Doing the Translations

With the configurations above, translations can now be added for the main site! The steps below show how to add translations specifically for the admin. Unless there is a specific need, use lazy translation for all cases.

Out-of-the-Box Phrases

Admin site pages are automatically generated using out-of-the-box templates with lots of canned phrases for things like “login,” “save,” and “delete.” How do those get translated? Thankfully, Django already has translations for many major languages. Check out the list under django/contrib/admin/locale for available languages. Django will automatically use translations for these languages in the admin site – there’s nothing else you need to do! If you need a language that’s not available, I strongly encourage you to contribute new translations to the Django project so that everyone can share them. (I suspect that you could also try to manually create messages files in your locale directory, but I have not tested that myself.)

Custom Admin Titles

There are a few ways to set custom admin site titles. My preferred method is to set them in the root urls.py file. Wherever they are set, mark them for lazy translation. It’s easy to overlook them!

from django.contrib import admin
from django.utils.translation import gettext_lazy as _

admin.site.index_title = _('My Index Title')
admin.site.site_header = _('My Site Administration')
admin.site.site_title = _('My Site Management')

App Names

App names are another set of phrases that can be easily missed. Add a verbose_name field with a translatable string to every AppConfig class in the project. Do not simply try to translate the string given for the name field: Django will yield a runtime exception!

from django.apps import AppConfig
from django.utils.translation import gettext_lazy as _

class CustomersConfig(AppConfig):
    name = 'customers'
    verbose_name = _('Customers')

Model Names

Models are full of strings that need translations. Here are the things to look for:

  • Give each field a verbose_name value, since the identifiers cannot be translated.
  • Mark help texts, choice descriptions, and validator messages as translatable.
  • Add a Meta class with verbose_name and verbose_name_plural values.
  • Look out for any other strings that might need translations.

Here is an example model:

from django.db import models
from django.core.validators import RegexValidator
from django.utils.translation import gettext_lazy as _

class Customer(models.Model):
    name = models.CharField(
        max_length=100,
        help_text=_('First and last name.'),
        verbose_name=_('name'))
    address = models.CharField(
        max_length=100,
        verbose_name=_('address'))
    phone = models.CharField(
        max_length=10,
        validators=[RegexValidator(
            r'^\d{10}$',
            _('Phone must be exactly 10 digits.'))],
        verbose_name=_('phone number'))

    class Meta:
        verbose_name = _('customer')
        verbose_name_plural = _('customers')

Run the Commands

Once all strings are marked for translation, generate the message files:

# Generate message files for a desired language
python manage.py makemessages -l zh_Hans

# After adding translations to the .po files, compile the messages
python manage.py compilemessages

Warning: The language code and the locale name may be different! For example, take Simplified Chinese: the language code is “zh-hans”, but the locale name is “zh_Hans”. Notice the underscore and the caps. Locale names often include a country code to differentiate language nuances, like American English vs. British English. Refer to django/contrib/admin/locale for a list of examples.

Bonus: Admin Language Buttons

With LocaleMiddleware and i18n_patterns, pages should be automatically translated based on context or URL prefix. However, it would still be great to let the user manually switch the language from the admin interface. Clicking a button is more intuitive than fumbling with URL prefixes.

There are many ways to add language switchers to the admin site. To me, the most sensible way is to add flag icons to the title bar. Behind the scenes, each flag icon would be linked to a language-prefixed URL for the page. That way, whenever a user clicks the flag, then the same page is loaded in the desired language.


It’s pretty easy to make something like this, but it needs a few steps.

Language Code Prefix Switcher

Since URL paths use i18n_patterns, their language codes can be trusted to be uniform. A utility function can easily add or substitute the desired language code as a URL path prefix. For example, it would convert “/admin/” and “/en/admin/” into “/zh-hans/admin/” for Simplified Chinese. This function should also validate that the path and language are correct. It can be put anywhere in the project. Below is the code:

from django.conf import settings

def switch_lang_code(path, language):

    # Get the supported language codes
    lang_codes = [c for (c, name) in settings.LANGUAGES]

    # Validate the inputs
    if path == '':
        raise Exception('URL path for language switch is empty')
    elif path[0] != '/':
        raise Exception('URL path for language switch does not start with "/"')
    elif language not in lang_codes:
        raise Exception('%s is not a supported language code' % language)

    # Split the parts of the path
    parts = path.split('/')

    # Add or substitute the new language prefix
    if parts[1] in lang_codes:
        parts[1] = language
    else:
        parts[0] = "/" + language

    # Return the full new path
    return '/'.join(parts)
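
For example, a quick sanity check of the function (assuming the LANGUAGES setting shown earlier):

assert switch_lang_code('/admin/', 'zh-hans') == '/zh-hans/admin/'
assert switch_lang_code('/en/admin/', 'zh-hans') == '/zh-hans/admin/'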

Prefix Switch Template Filter

Ultimately, this function must be called by Django templates in order to provide links to language-specific pages. Thus, we need a custom template filter. The filter implementation module can be put into any app, but it must be in a sub-package named templatetags – that’s how Django knows to look for custom template tags and filters. The new filters will be easy to write because we already have the switch_lang_code function. (Separating the logic to handle the prefix from the filter itself makes both more testable and reusable.) The code is below:

# [app]/templatetags/i18n_switcher.py

from django import template
from django.template.defaultfilters import stringfilter

# switch_lang_code must be imported from wherever it was defined;
# this import path is just a placeholder for your project's layout
from i18n_switcher.utils import switch_lang_code

register = template.Library()

@register.filter
@stringfilter
def switch_i18n_prefix(path, language):
    """takes in a string path"""
    return switch_lang_code(path, language)

@register.filter
def switch_i18n(request, language):
    """takes in a request object and gets the path from it"""
    return switch_lang_code(request.get_full_path(), language)

Admin Template Override

Finally, admin templates must be overridden so that we can add new elements to the admin pages. Any admin template can be overridden by creating new templates of the same name under [project-root]/templates/admin. Parent content will be used unless explicitly overridden within the child template file. Since we want to change the title bar, create a new template file for base_site.html. A minimal sketch of such an override (using the flag image files from the project tree shown later) could look like this:
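
{# templates/admin/base_site.html – a minimal sketch of the override #}
{# assumes the "django.template.context_processors.request" context processor is enabled #}
{% extends "admin/base.html" %}
{% load i18n static i18n_switcher %}

{% block extrastyle %}
  <link rel="stylesheet" type="text/css" href="{% static 'css/custom_admin.css' %}"/>
{% endblock %}

{% block userlinks %}
  <a href="{{ request|switch_i18n:'en' }}">
    <img class="i18n-flag" src="{% static 'images/flag-usa-16.png' %}" alt="English"/>
  </a>
  <a href="{{ request|switch_i18n:'zh-hans' }}">
    <img class="i18n-flag" src="{% static 'images/flag-china-16.png' %}" alt="简体中文"/>
  </a> /
  {% if user.has_usable_password %}
    <a href="{% url 'admin:password_change' %}">{% trans 'Change password' %}</a> /
  {% endif %}
  <a href="{% url 'admin:logout' %}">{% trans 'Log out' %}</a>
{% endblock %}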

The static CSS file named css/custom_admin.css then needs only a little styling for the flag icons; something simple like this would do:
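
/* static/css/custom_admin.css – a minimal sketch */
.i18n-flag {
    vertical-align: middle;
    margin: 0 2px;
}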

Notice that the whole userlinks block had to be rewritten to fit the flag into place. The static image files for the flags are simply free flag emojis. They are hyperlinked to the appropriate language URL for the page: the switch_i18n filter is applied to the active request object to get the desired language-prefixed path. (Note: In my example code, I removed the “View Site” link because my site didn’t need it.)

Completed View

The admin site should now show the flag buttons in the title bar, with pages translated into whichever language is selected.


The files in my project needed for the admin language buttons are organized like this (without showing other files in the project):

[root]
|- i18n_switcher
|  |- templatetags
|  |  |- __init__.py
|  |  `- i18n_switcher.py
|  |- __init__.py
|  `- apps.py
|- locale
|  `- zh_Hans
|     `- LC_MESSAGES
|        |- django.mo
|        `- django.po
|- static
|  |- css
|  |  `- custom_admin.css
|  `- images
|     |- flag-china-16.png
|     `- flag-usa-16.png
`- templates
   `- admin
      `- base_site.html

As mentioned before, flag icons in the title bar are simply one way to provide easy links to translated pages. It works well when there are only a few language choices available. A different view would be better for more languages, like a dropdown, a second line in the title bar, or even a page footer.

With a bit more polishing, this would also make a nifty little Django app package that others could use for their projects. Maybe I’ll get to that someday.

Pipenv: Python Packagement for Champions!

While recently deploying a new Python Django app to Heroku, I noticed the documentation mentioned a tool I hadn’t known before: pipenv. I thought to myself, “Great, now I need to learn a new tool. What was so bad about pip and virtualenv?” So, I did my research, and BOOM! Yes. Mind blown. Life changed. This.

What It Is

Pipenv is the Python packaging and environments tool for champions.

  • It unites pip, Pipfile, and virtualenv into a sophisticated workflow with simple commands.
  • It automatically creates virtual environments for projects.
  • It automatically updates package dependencies (and their dependencies).
  • It locks versions for deterministic builds.

I strongly recommend using pipenv for all new Python projects. Python.org officially recommends it, too.

What It’s About

Packages and environments (“packagement”) are essential to Python development. Typically, Pythoneers create a virtual environment for each project and install dependent packages into it locally using pip. They then “freeze” the dependencies into a requirements.txt file so that others can easily recreate the environment. Virtual environments thus enable different projects to use different package versions without global conflict.
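
For comparison, the traditional workflow looks something like this (using Python 3’s built-in venv module):

# The traditional virtual environment workflow
python -m venv venv
source venv/bin/activate
pip install requests
pip freeze > requirements.txt
deactivate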

Unfortunately, this traditional workflow has some problems:

  • It uses multiple tools instead of one and requires many commands.
  • Different projects can do the workflow differently, which can be confusing.
  • The requirements.txt file must be manually generated and can easily fall out of date.
  • Dev-only dependencies are a hassle to separate.
  • Uninstalling packages will not remove sub-packages.
  • Dependencies with version ranges instead of fixed versions cause nondeterministic builds.

Pipenv solves these problems by combining pip, Pipfile, and virtualenv into a standard workflow that automatically handles and locks package updates.

How to Use It

See how simple it is to use pipenv with a Python project:

# Install pipenv
pip install pipenv

# Create a new project directory
mkdir panda_project
cd panda_project
echo "print('hello')" > main.py

# Init pipenv:
# Creates a virtual environment
# Then creates Pipfile and Pipfile.lock files
pipenv install

# Install a package:
# Updates the Pipfiles
pipenv install requests

# Install a dev-only package:
# Updates the Pipfiles
pipenv install --dev pytest

# Run commands in the environment
pipenv run python --version
pipenv run python main.py
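
After running the commands above, the generated Pipfile would look roughly like this (exact contents vary by pipenv and Python versions):

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
requests = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.7"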

More Info

There’s no need for me to repeat what other people have already said: the official pipenv documentation covers everything else you need.

Quality Metrics 101: Product Quality

New to the series? Start from the beginning!

Product quality metrics measure the excellence of a product and its features. They measure the “goodness” inherent in the product, apart from how the product was developed. High-quality processes and tests contribute to, but do not alone guarantee, high-quality products. That’s why quality must be built into the product from the start and checked throughout all phases of development. Below are metrics for assuring quality in the delivered products.


Functionality

Quality Aspect: Does the product work correctly?
Desired State: True – Features either work, or they don’t.
Metrics: Test Failure Rate – The whole purpose of functional testing is to determine which features work and which don’t. Assuming test quality is high, the test failure rate is the single best indicator of product functionality. Higher failure rates mean more broken features. Teams should target low-to-zero test failures. It may be useful to keep a failure history for each test. For large products, it may also be useful to break down failure rates by feature area.

It is imperative to recognize, however, that the test failure rate is meaningful only if test quality is high – meaning that tests have good coverage and reliability. Poor-quality tests will give untrustworthy results. For example, weak coverage could mean that failure rate is low because functionality is not truly exercised, and poor reliability could mean that failure rate is high because tests always crash. Be sure to back up any reporting on test failure rate with assurance that test quality is high (using test quality metrics).


Stability

Quality Aspect: Does the product work reliably?
Desired State: High – Product functionality should be consistently good and available.
Metrics: Build Failure Rate – The build failure rate is the proportion of builds that have failed for whatever reason over a given period of time. While process metrics focus on response times to fix broken builds, the build failure rate itself indicates the health of the product while it is being developed. It does not track how badly a build failed like test failure rate does, but instead it impartially tracks ultimate success or failure. Make sure to limit the history of builds included in the calculation to keep it relevant (such as the last 30 days or so). Occasional build failures are acceptable as long as they are fixed quickly. High build failure rates indicate product instability, which could be due to design flaws, weak pre-check-in testing, tricky bugs, or even pipeline faults.

Uptime – Uptime refers to the total time a system is usable. For example, consider a website that must go down for a one-hour service window every week – its uptime would be 167/168 = 99.4%. Not all downtime is planned, however. A bad deployment during maintenance could knock that website offline for an additional 3 hours – dragging uptime down to 97.6% for the week. This may not seem bad at first, but it’s quite terrible when considering that (a) lost time is lost money and (b) the goal of Six Sigma is 99.99966%. A product should have near-perfect availability. System monitoring tools can easily measure uptime. Low uptime indicates either poor design or lack of failover redundancy.
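
The arithmetic behind these uptime numbers is simple enough to sketch:

def uptime_percentage(hours_in_period, hours_down):
    """Percentage of the period during which the system was usable."""
    return 100.0 * (hours_in_period - hours_down) / hours_in_period

print(uptime_percentage(168, 1))  # 1-hour weekly service window: ~99.4
print(uptime_percentage(168, 4))  # plus a 3-hour bad deployment: ~97.6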


Performance

Quality Aspect: Does the product work optimally?
Desired State: Optimal – Performance should be at its best in all areas.
Metrics: There are four classic software performance metrics. They may be applied in various ways to aspects of product behavior. Ultimately, software products should have a minimal impact on the system while providing a maximal capacity for work.

Processor Usage – Processor cycles should not be needlessly wasted. Make sure algorithms are efficient in terms of computational complexity (big O) and implementation details.

Memory Usage – Watch out for both memory bloat (when features take up a lot of memory unnecessarily) and memory leaks (when memory is not freed after it is no longer needed).

Response Time – Response time, or latency, measures the turnaround time from when an action is taken to when the actor receives feedback that the action is completed. Common examples of response time are web page loading, REST API call responses, and database queries. Response time should be as short as possible.

Throughput – Throughput measures how much load a system can handle. It could refer to data I/O bandwidth, transactions per time unit, number of concurrent users, etc. Typically, higher stress on a system will cause other performance metrics to degrade. The “sweet spot” to find is the maximum throughput value that does not unacceptably impact other performance aspects.


 

Complexity

Quality Aspect: Is the software code unnecessarily complicated?
Desired State: Minimal – Simple is better than complex. Complex is better than complicated. (See The Zen of Python.)
Metrics: There are a number of code metrics that indicate complexity in various ways.

Lines of Code – One of the most rudimentary metrics is to count the lines of code. All things equal, line count indicates the magnitude of the software product, with the assumption that fewer lines will be easier to maintain. Any modern IDE (or, worst case, shell scripting) can yield line counts. However, all things are not equal, and line count alone does not indicate quality or efficiency.
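
For example, a quick shell command can tally line counts (here, for Python files under the current directory):

find . -name "*.py" | xargs wc -l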

Cyclomatic Complexity – Cyclomatic complexity measures the number of linearly independent execution paths through the code. It is more meaningful than counting sheer lines of code because it indicates the magnitude of testing needed for full coverage. Lower values are better. Cyclomatic complexity is a popular code metric, and many modern analysis tools can measure it.

Depth of Inheritance – For object-oriented languages, the depth of inheritance measures the maximum length of a class inheritance tree from child class to its ultimate root. For example, in the class inheritance tree of Tiger > Cat > Animal > Object, Tiger would have an inheritance depth of 3. Lower values are desirable because they make classes easier to understand.

There are countless other code metrics available. For example, Microsoft Visual Studio calculates the metrics above plus a maintainability index and class coupling. Halstead metrics are another way to measure complexity.


Satisfaction

Quality Aspect: Does the product satisfy the end user?
Desired State: High – The product should meet the end user’s needs, and the end user should like using it.
Metrics: Customer satisfaction is inherently subjective, so trying to measure it is difficult. Ultimately, the end users must find compelling value in the product over other alternatives, or else they won’t use it or buy it. There are many ways to attempt to gauge customer satisfaction: surveys, interviews, A/B testing, etc. Statistics and psychology also play a part, and plenty of published articles offer further ideas.

Quality Metrics 101: Process Quality

New to the series? Start from the beginning!

Process quality metrics make sure that software development practices build good, high-quality features. Healthy software processes identify and resolve issues as early as possible because later bug discovery means higher cost to fix. Quality starts at inception, when features are first brainstormed, and it carries through design, implementation, and testing. Every step in the development process should have quality checkpoints: acceptance criteria for planning, reviews for design and implementation, and reports for testing. Process quality metrics primarily focus on delivery speed or the effectiveness of feedback loops to make sure a team is responding appropriately to change.

Note: Standard software development methodologies often come with canned metrics. For example, Agile Scrum focuses heavily on velocity for determining a team’s capacity for work, while Agile Kanban focuses heavily on lead time and cycle time for measuring how fast work gets done. This article will not cover methodology-specific metrics – please refer to external resources to learn more about them. Instead, this article will cover generic aspects of process quality.


Delivery Speed

Quality Aspect: How fast are new features with high quality delivered to the end user?
Desired State: ASAP – Deliver them fast without compromising quality.
Metrics: People are impatient – they always want things as soon as possible. Fast delivery speed is thus crucial for businesses to meet client expectations and respond quickly to change. However, delivery speed is not the sole metric for success: it must be counterbalanced with safety measures. Delivery time could be minimized by committing changes directly to production, but that’s a terrible practice because the damage risk is too high. The best strategy is to pursue the fastest speed without sacrificing too much coverage.

Time to Production – Time to production focuses on the time it takes for a developer’s checked-in code to become useful to end users. It’s a decent way to judge from a business perspective how quickly new stuff gets out the door. Measure the total time for each code check-in from when it is first committed to when it is deployed to production. Source control logs and deployment histories can be pieced together to measure the total time. It may be beneficial to split check-ins by feature area and to review distributions rather than averages. Short, consistent times are desirable. Long times reveal delays in testing, fixing, and deploying changes.

Pipeline Speed – Pipeline speed is a DevOps-y metric. Measure the total start-to-end time from triggering the build pipeline to the final deployment, and measure the time taken by each stage. This will give insights into bottlenecks, such as: system resource exhaustion, network delays, being stuck in job queues, tests that are too long, etc. Knowing each stage will indicate where the greatest optimizations can occur. For example, parallel test execution can significantly reduce total pipeline time. Use pipeline speed metrics to find efficiencies, not to justify cutting vital stages. Most modern continuous integration systems should provide time metrics.

Test Coverage per Time Period – There is always a tradeoff between test coverage and delivery speed. Assuming tests have optimally efficient execution times, higher coverage means slower delivery. Whenever time periods are fixed (such as CI pipeline limits or release deadlines), the best strategy is to maximize test coverage during the available time. For this purpose, coverage should be heuristically scored in terms of feature coverage priority (or the importance of the behaviors under test), not so much in terms of numerical code coverage. Then, for each test, divide the coverage score by the execution time. Sort tests by this ratio, and select the tests with the greatest ratios until the total test execution time reaches the time limit. This greedy, knapsack-style approach comes close to maximizing test coverage for the given period. It may also be advantageous to determine a threshold score for minimal coverage – if the maximum score for a given time period is below the minimal coverage threshold, then the time period should be increased. This metric is compelling if, for example, a CI pipeline needs more time for tests but managers are hesitant to slow down delivery.
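
This greedy selection is easy to sketch in Python (assuming each test has a heuristic coverage score and a known execution time):

def select_tests(tests, time_limit):
    """Pick tests to maximize total coverage score within a time budget.

    `tests` is a list of (name, coverage_score, exec_time) tuples.
    """
    # Rank tests by coverage score per unit of execution time
    ranked = sorted(tests, key=lambda t: t[1] / t[2], reverse=True)
    selected = []
    total_time = 0.0
    for name, score, exec_time in ranked:
        if total_time + exec_time <= time_limit:
            selected.append(name)
            total_time += exec_time
    return selected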

Note: The metrics here cover speed after code is checked in, focusing on operational excellence. Metrics covering speed before code is checked in are important but are typically already covered by standard processes (like Scrum’s velocity). There are several ways to measure speed before code check-in: development time, backlog age, story completion rate, etc. Slow times before check-in indicate that a team is overloaded with work, lacks focus on priorities, or is being disrupted too frequently. However, one major caution for these metrics is that they are difficult to accurately measure, and they presume artifacts are logged precisely at event times. For example, if a story ticket is not created until a week after a new feature was first inspired, then the actual times measured will be inaccurate.


Feedback Notification

Quality Aspect: How quickly does a team identify problems?
Desired State: Fast – Fast feedback helps teams resolve issues quickly before they become more costly.
Metrics: Software development is the poster child for Murphy’s Law: anything that can go wrong will. Problems will happen. Metrics targeting perfection (such as 100% pass rates or 0-bug counts) are foolishly impossible and hopelessly destructive. Instead, metrics should gauge feedback loops – how well a team handles problems as they arise. Feedback has two parts: (1) notification time to discover and report problems, and (2) response time to fix problems. Ultimately, the sum should be minimal, but separating the parts identifies bottlenecks. This section covers notification.

Code Review Effectiveness – Code reviews are often the second line of defense against bugs (the first line being the author themselves). They grant an opportunity for other experts to inspect code for problems before fully committing changes. However, measuring the effectiveness of code reviews can be tricky. A few metrics to consider are:

  • Percentage of code check-ins that undergo review, if the team notoriously skips reviews
  • Average review turnaround time, if reviews are ignored
  • Code change size in terms of line number or another similar unit, if reviews are too large for teams to handle effectively
  • Issues caught, whenever a review successfully identifies and resolves an issue

Issue Discovery Time – The sooner issues are discovered, the less costly they are to resolve. “Issues” typically mean defects in the product (e.g., “bugs”), but they could include problems with the environment, deployment, or tests. The simplest form of issue discovery time is the measurement from when a pipeline starts to the time the issue is discovered. More advanced measurements can track time back to the root cause, such as when code containing a bug was committed, but these may be difficult to gather or may be less accurate. Issue types should be analyzed as separate distributions. Look specifically for blocking issues that appear late in the pipeline, such as critical services being down, and add checks early in the pipeline to discover them ASAP.

Bugs per Phase – Raw bug counts, like test counts, are not helpful beyond soundbites, but the proportions of bug counts per phase are useful for determining test effectiveness. A well-engineered pipeline should have meaningful phases (or “stages” or “steps”) with feedback after each one. A typical pipeline could have phases for build, unit tests, integration tests, end-to-end tests, and production deployment. Ideally, bugs should be caught in the shortest time, at the lowest level, and in the earliest phase. For example, if the majority of bugs are caught by end-to-end tests or (gasp!) in production, then the lower-level tests might need stronger coverage.


Feedback Response

Quality Aspect: How quickly does a team resolve problems once they are found?
Desired State: Fast – Again, resolve issues quickly before they become more costly.
Metrics: Time to Fix a Broken Build – Build health is vital for successful software development, especially in continuous integration. After a build is broken, it must be fixed ASAP so that it does not block progress. “Fixing” a build means that the pipeline can run to completion with an acceptable test passing rate. Fixing a build may mean:

  • Fixing a bug in the product
  • Fixing a problem in the environment, deployment, or tests
  • Reverting a code check-in that caused a bug
  • Updating tests to somehow flag the failure

Subverting safety checks (like removing tests or skipping phases) is not acceptable because it doesn’t truly fix the build’s underlying problems.

Measure the time it takes from when a pipeline reports a broken build to when the pipeline produces the first subsequent working build. The distribution of these times will reveal the team’s dedication to build stability. Clearly, shorter times are better. When broken builds are caused by code changes, the author should favor reverting check-ins over attempting fixes for faster recovery speed.

Time to Resolve Bugs – While the time to fix a broken build focuses on immediate product stability, the time to resolve bugs focuses instead on ultimate correctness. Just because a build is fixed does not mean a bug is necessarily fixed – tests may mark it as an acceptable failure, or the code containing the bug may simply be reverted. The time to resolve a bug is the total time from when the bug was first discovered to when it is fixed or otherwise closed (such as being marked as invalid or won’t fix). Bug tracker tools should easily provide this data. Bugs should be separated by severity when analyzing resolution times. Bugs should be resolved quickly, with priority given to higher-severity bugs. Resolution time metrics indicate if bugs are addressed adequately and in the proper order. Long resolution times may indicate overloaded teams, tolerance of low quality, or the need for redesign/refactoring.