
JavaScript Testing with Jasmine

Table of Contents

  1. Introduction
  2. Setup and Installation
  3. Project Structure
  4. Unit Tests for Functions
  5. Unit Tests for Classes
  6. Unit Tests with Mocks
  7. Integration Tests for REST APIs
  8. End-to-End Tests for Web UIs
  9. Basic Test Execution
  10. Advanced Test Execution with Karma
  11. Angular Testing

Introduction

Jasmine is one of the most popular JavaScript test frameworks available. Its tests are intuitively recognizable by their describe/it format. Jasmine is inspired by Behavior-Driven Development and comes with many basic features out-of-the-box. While Jasmine is renowned for its Node.js support, it also supports Python and Ruby. Jasmine also works with JavaScript-based languages like TypeScript and CoffeeScript.

This guide shows how to write tests in JavaScript on Node.js using Jasmine. It uses the jasmine-node-js-example project (hosted on GitHub). Content includes:

  • Basic white-box unit tests
  • REST API integration tests with frisby
  • Web UI end-to-end tests with Protractor
  • Spying with sinon
  • Monkeypatching with rewire
  • Handling config data with JSON files
  • Advanced execution features with Karma
  • Special considerations for Angular projects

The Jasmine API Reference is also indispensable when writing tests.

Setup and Installation

The official Jasmine Node.js Setup Guide explains how to set up and install Jasmine. Jasmine tests may be added to an existing project or to an entirely new project. As a prerequisite, Node.js must already be installed. Use the following commands to set things up.

# Initialize a new project (if necessary)
# This will create the package.json file
$ mkdir [project-name]
$ cd [project-name]
$ npm init

# Install Jasmine locally for the project and globally for the CLI
$ npm install jasmine
$ npm install -g jasmine

# Create a spec directory with configuration file for Jasmine
$ jasmine init

# Optional: Install official Jasmine examples
# Do this only for self-education in a separate project
$ jasmine examples

The code used by this guide is available on GitHub in the jasmine-node-js-example repository. Feel free to clone it to try things out yourself!

Recommended editors and IDEs include Visual Studio Code with the Jasmine Snippets extensions, Atom, and JetBrains WebStorm.

Project Structure

Jasmine does not require the project to have a specific directory layout, but it does use a configuration file to specify where to find tests. The default, conventional project structure created by “jasmine init” puts all Jasmine code into a “spec” directory, which contains “*spec.js” files for tests, helpers that run before specs, and a support directory for config. The JASMINE_CONFIG_PATH environment variable can be set to change the config file used. (The default config file is spec/support/jasmine.json.)

[project-name]
|-- [product source code]
|-- spec
|   |-- [spec sub-directory]
|   |   `-- *spec.js
|   |-- helpers
|   |   `-- [helper sub-directory]
|   `-- support
|       `-- jasmine.json
`-- package.json

This structure may be changed using the “spec_dir”, “spec_files”, and “helpers” properties in the config file. For example, it may be useful to add more levels of directories to the hierarchy. However, it is typically best to leave the conventional directory layout in place. The default config values as of Jasmine 2.8 are below.

{
  "spec_dir": "spec",
  "spec_files": [
    "**/*[sS]pec.js"
  ],
  "helpers": [
    "helpers/**/*.js"
  ],
  "stopSpecOnExpectationFailure": false,
  "random": false
}

It is also a best practice to separate tests between different levels of the Testing Pyramid. The example project has spec subdirectories for unit, integration, and end-to-end tests. Directory-level organization makes it easy to filter tests by level when executed.
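For illustration, a layered spec directory might look roughly like the tree below. The “unit” and “integration” directories match spec paths used later in this guide; the end-to-end directory name is an assumption.

spec
|-- unit
|   `-- calculator.function.spec.js
|-- integration
|   `-- wikipedia.service.spec.js
|-- e2e
|   `-- [end-to-end specs]
|-- helpers
`-- support
    `-- jasmine.json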

Unit Tests for Functions

The most basic unit of code to be tested in JavaScript is a function. The “lib/calculator.functions.js” module contains some basic math functions for easy testing.

// --------------------------------------------------
// lib/calculator.functions.js
// --------------------------------------------------

// Calculator Functions

function add(a, b) {
    return a + b;
}

function subtract(a, b) {
    return a - b;
}

function multiply(a, b) {
    return a * b;
}

function divide(a, b) {
    let value = a * 1.0 / b;
    if (!isFinite(value))
        throw new RangeError('Divide-by-zero');
    else
        return value;
}

function maximum(a, b) {
    return (a >= b) ? a : b;
}

function minimum(a, b) {
    return (a <= b) ? a : b;
}

// Module Exports

module.exports = {
    add: add,
    subtract: subtract,
    multiply: multiply,
    divide: divide,
    maximum: maximum,
    minimum: minimum,
}

Its tests are in “spec/unit/calculator.function.spec.js”. Below is a snippet showing simple tests for the “add” function. A describe block groups a “suite” of specs together. Each it block is an individual spec (or test). Titles for specs are often written as what the spec should do. Describe blocks may be nested for hierarchical grouping, but it blocks (being bottom-level) may not. Assertions are made using Jasmine’s fluent-like expect and matcher methods. Since the functions are stateless, no setup or cleanup is needed. Tests for other math functions are similar.

// --------------------------------------------------
// spec/unit/calculator.function.spec.js
// --------------------------------------------------

const calc = require('../../lib/calculator.functions');

describe("Calculator Functions", function() {

  describe("add", function() {

    it("should add two positive numbers", function() {
      let value = calc.add(3, 2);
      expect(value).toBe(5);
    });

    it("should add a positive and a negative number", function() {
      let value = calc.add(3, -2);
      expect(value).toBe(1);
    });

    it("should give the same value when adding zero", function() {
      let value = calc.add(3, 0);
      expect(value).toBe(3);
    });

  });

});

The divide-by-zero test for the “divide” function is special because it must verify that an exception is thrown. The divide call is wrapped in a function so that it may be passed into the expect call.

  describe("divide", function() {

    // ...

    it("should throw an exception when dividing by zero", function() {
      let divideByZero = function() { calc.divide(3, 0); };
      expect(divideByZero).toThrowError(RangeError, 'Divide-by-zero');
    });

    // ...

  });

The “maximum” and “minimum” functions have parametrized tests using the Array class’s forEach method. This is a nifty trick for hitting multiple input sets without duplicating code or combining specs. Note that the spec titles are also parametrized. Tests for “maximum” are shown below.

  describe("maximum", function() {

    [
      [1, 2, 2],
      [2, 1, 2],
      [2, 2, 2],
    ].forEach(([a, b, expected]) => {
      it(`should return ${expected} when given ${a} and ${b}`, () => {
        let value = calc.maximum(a, b);
        expect(value).toBe(expected);
      });
    });

  });

Unit Tests for Classes

Jasmine can also test classes. When testing classes, setup and cleanup routines become more helpful. The Calculator class in the “lib/calculator.class.js” module calls the math functions and caches the last answer.

// --------------------------------------------------
// lib/calculator.class.js
// --------------------------------------------------

// Imports

const calcFunc = require('./calculator.functions');

// Calculator Class

class Calculator {

  constructor() {
      this.last_answer = 0;
  }

  do_math(a, b, func) {
      return (this.last_answer = func(a, b));
  }

  add(a, b) {
      return this.do_math(a, b, calcFunc.add);
  }

  subtract(a, b) {
      return this.do_math(a, b, calcFunc.subtract);
  }

  multiply(a, b) {
      return this.do_math(a, b, calcFunc.multiply);
  }

  divide(a, b) {
      return this.do_math(a, b, calcFunc.divide);
  }

  maximum(a, b) {
      return this.do_math(a, b, calcFunc.maximum);
  }

  minimum(a, b) {
      return this.do_math(a, b, calcFunc.minimum);
  }

}

// Module Exports

module.exports = {
  Calculator: Calculator,
}

The Jasmine specs in “spec/unit/calculator.class.spec.js” are very similar but now call the beforeEach method to construct the Calculator object before each scenario. (Jasmine also has methods for afterEach, beforeAll, and afterAll.) The verifyAnswer helper function also makes assertions easier. The addition tests are shown below.

// --------------------------------------------------
// spec/unit/calculator.class.spec.js
// --------------------------------------------------

const calc = require('../../lib/calculator.class');

describe("Calculator Class", function() {

  let calculator;

  beforeEach(function() {
    calculator = new calc.Calculator();
  });

  function verifyAnswer(actual, expected) {
    expect(actual).toBe(expected);
    expect(calculator.last_answer).toBe(expected);
  }

  describe("add", function() {

    it("should add two positive numbers", function() {
      verifyAnswer(calculator.add(3, 2), 5);
    });

    it("should add a positive and a negative number", function() {
      verifyAnswer(calculator.add(3, -2), 1);
    });

    it("should give the same value when adding zero", function() {
      verifyAnswer(calculator.add(3, 0), 3);
    });

  });

  // ...

});

Unit Tests with Mocks

Mocks help to keep unit tests focused narrowly upon the unit under test. They are essential when units of code depend upon other callable entities. For example, mocks can be used to provide dummy test values for REST APIs instead of calling the real endpoints so that receiving code can be tested independently.

Jasmine’s out-of-the-box spies can do some mocking and spying, but they are not very powerful. For example, they don’t work when members of one module call members of another, or even when members of the same module call each other (unless they are within the same class). It is better to use rewire for monkey-patching (mocking via member substitution) and sinon for stubbing and spying.
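As a point of reference, Jasmine’s built-in spyOn works fine when the dependency is a member of an object the test can reach directly. Below is a minimal sketch (the api object is a hypothetical stand-in for a dependency):

describe("Built-in Jasmine spies", function() {

  it("should stub a member of an object", function() {
    let api = { getValue: function() { return 0; } };  // hypothetical dependency object
    spyOn(api, 'getValue').and.returnValue(42);        // replace the member with a spy stub
    expect(api.getValue()).toBe(42);                   // the stubbed value is returned
    expect(api.getValue).toHaveBeenCalled();           // the spy also records calls
  });

});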

The “lib/weather.js” module shows how mocking can be done with member dependencies. The WeatherCaller class’s “getForecast” method calls the “callForecast” function, which is meant to represent a service call to get live weather forecasts. The “callForecast” function returns an empty object, but the specs will “rewire” it to return dummy test values that can be used by the WeatherCaller class. Rewiring will work even though “callForecast” is not exported!

// --------------------------------------------------
// lib/weather.js
// --------------------------------------------------

function callForecast(month, day, year, zipcode) {
  return {};
}

class WeatherCaller {

  constructor() {
    this.forecasts = {};
  }

  getForecast(month, day, year, zipcode) {
    let key = `${month}/${day}/${year} for ${zipcode}`;
    if (!(key in this.forecasts)) {
      this.forecasts[key] = callForecast(month, day, year, zipcode);
    }
    return this.forecasts[key];
  }

}

module.exports = {
  WeatherCaller: WeatherCaller,
}

The tests in “spec/unit/weather.mock.spec.js” monkey-patch the “callForecast” function with a sinon stub in the beforeEach call so that each test has a fresh spy count. Note that the weather method is imported using “rewire” instead of “require” so that it can be monkey-patched. Even though the original function returns an empty object, the tests pass because the mock returns the dummy test value.

// --------------------------------------------------
// spec/unit/weather.mock.spec.js
// --------------------------------------------------

// Imports

const rewire = require('rewire');
const sinon = require('sinon');

// Rewirings

const weather = rewire('../../lib/weather');

// WeatherCaller Specs
describe("WeatherCaller Class", function() {

  // Test constants
  const dummyForecast = {"high": 42, "low": 26};

  // Test variables
  let callForecastMock;
  let weatherModuleRestore;
  let weatherCaller;

  beforeEach(function() {
    // Mock the inner function's return value using sinon
    // Do this for each test to avoid side effects of call count
    callForecastMock = sinon.stub().returns(dummyForecast);
    weatherModuleRestore = weather.__set__("callForecast", callForecastMock);

    // Construct the main caller object
    weatherCaller = new weather.WeatherCaller();
  });

  it("should be empty upon construction", function() {
    // No mocks required here
    expect(Object.keys(weatherCaller.forecasts).length).toBe(0);
  });

  it("should get a forecast for a date and a zipcode", function() {
    // This simply verifies that the return value is correct
    let forecast = weatherCaller.getForecast(12, 25, 2017, 21047);
    expect(forecast).toEqual(dummyForecast);
  });

  it("should get a fresh forecast the first time", function() {
    // The inner function should be called and the value should be cached
    // Note the sequence of assertions, which guarantee safety
    let forecast = weatherCaller.getForecast(12, 25, 2017, 21047);
    const forecastKey = "12/25/2017 for 21047";
    expect(callForecastMock.called).toBeTruthy();
    expect(Object.keys(weatherCaller.forecasts).length).toBe(1);
    expect(forecastKey in weatherCaller.forecasts).toBeTruthy();
    expect(weatherCaller.forecasts[forecastKey]).toEqual(dummyForecast);
  });

  it("should get a cached forecast the second time", function() {
    // The inner function should be called only once
    // The same object should be returned by both method calls
    let forecast1 = weatherCaller.getForecast(12, 25, 2017, 21047);
    let forecast2 = weatherCaller.getForecast(12, 25, 2017, 21047);
    expect(callForecastMock.calledOnce).toBeTruthy();
    expect(forecast1).toBe(forecast2);
  });

  it("should get and cache multiple forecasts", function() {
    // The other tests verify the mechanics of individual calls
    // This test verifies that the caller can handle multiple forecasts

    // Initial forecasts
    let forecast1 = weatherCaller.getForecast(12, 25, 2017, 27518);
    let forecast2 = weatherCaller.getForecast(12, 25, 2017, 27518);
    let forecast3 = weatherCaller.getForecast(12, 25, 2017, 21047);

    // Change forecast value
    weatherModuleRestore();
    const newForecast = {"high": 39, "low": 18}
    callForecastMock = sinon.stub().returns(newForecast);
    weatherModuleRestore = weather.__set__("callForecast", callForecastMock);

    // More forecasts
    let forecast4 = weatherCaller.getForecast(12, 26, 2017, 21047);
    let forecast5 = weatherCaller.getForecast(12, 27, 2017, 21047);

    // Assertions
    expect(Object.keys(weatherCaller.forecasts).length).toBe(4);
    expect("12/25/2017 for 27518" in weatherCaller.forecasts).toBeTruthy();
    expect("12/25/2017 for 21047" in weatherCaller.forecasts).toBeTruthy();
    expect("12/26/2017 for 21047" in weatherCaller.forecasts).toBeTruthy();
    expect("12/27/2017 for 21047" in weatherCaller.forecasts).toBeTruthy();
    expect(forecast1).toEqual(dummyForecast);
    expect(forecast2).toEqual(dummyForecast);
    expect(forecast3).toEqual(dummyForecast);
    expect(forecast4).toEqual(newForecast);
    expect(forecast5).toEqual(newForecast);
  });

  afterEach(function() {
    // Undo the monkeypatching
    weatherModuleRestore();
  });

});

Integration Tests for REST APIs

Jasmine can do black-box tests just as well as it can do white-box tests. Tests for REST API service calls are among the most common integration-level tests. There are many REST request packages for Node.js, but frisby is designed specifically for testing. Frisby even has its own expect methods (though the standard Jasmine expect and matchers may still be used).

A best practice for black-box tests is to put config data for external dependencies into a config file. Config data for REST API calls could be URLs, usernames, and passwords. Never hard-code config data into test automation. JavaScript config files are super simple: just write a JSON file and read it during test setup using the “require” function, just like any module. The config data will be automatically parsed as a JavaScript object!

Below is an example test for calling Wikipedia’s REST API. It reads the base URL from a config file and uses it in the frisby call. The config file:

// --------------------------------------------------
// spec/support/env.json
// --------------------------------------------------
{
  "integration" : {
    "wikipediaServiceBaseUrl": "https://en.wikipedia.org/api/rest_v1"
  }
}

And the spec:

// --------------------------------------------------
// spec/integration/wikipedia.service.spec.js
// --------------------------------------------------

const frisby = require('frisby');

describe("English Wikipedia REST API", function() {

  const ENV = require("../support/env.json");
  const BASE_URL = ENV.integration.wikipediaServiceBaseUrl;

  describe("GET /page/summary/{title}", function() {

    it("should return the summary for the given page title", function(done) {
      frisby
        .get(BASE_URL + "/page/summary/Pikachu")
        .then(function(response) {
          expect(response.status).toBe(200);
          expect(response.json.title).toBe("Pikachu");
          expect(response.json.pageid).toBe(269816);
          expect(response.json.extract).toContain("Pokémon");
        })
        .done(done);
    })

  });

  // ...
});

End-to-End Tests for Web UIs

Jasmine can also be used for end-to-end Web UI tests. One of the most popular packages for web browser automation is Selenium WebDriver, which uses programming calls to interact with a browser like a real user. Selenium provides a WebDriver package for JavaScript on Node.js, but it is typically better to use Protractor.

Protractor integrates WebDriver with JavaScript test frameworks to make it easier to use. Jasmine is Protractor’s default framework, but Mocha, Cucumber, and any other JavaScript framework could be used. One of Protractor’s biggest advantages over plain WebDriver is automatic waiting: explicit calls to wait for page elements are not necessary. This is a wonderful feature that eliminates a lot of repetitive automation code. Protractor also provides tools to easily set up the Selenium Server and browsers (including mobile browsers). Even though Protractor is designed for Angular apps, it can nevertheless be used for non-Angular front-ends.

Web UI tests can be quite complicated because they cover many layers and require extra configuration. Web page interactions frequently need to be reused, too. It is a best practice to use a pattern like the Page Object Model to handle web interactions in one reusable layer. Page objects pull WebDriver locators and actions out of test fixtures (like describe/it functions) so that they may be updated more easily when the actual web pages change. (In fact, some teams choose to co-locate page object classes with product source code for the web app so that both are updated simultaneously.) The Page Object Model is a great way to manage the inherent complexity of Web UI automation.
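For a rough idea of the pattern, a Protractor page object might look something like the sketch below. The page, its URL, and its locators are hypothetical; Protractor’s global browser, element, and by helpers are assumed.

// --------------------------------------------------
// A hypothetical Protractor page object sketch
// --------------------------------------------------

class SearchPage {

  constructor() {
    // Locators live in one place so UI changes require only one update
    this.searchBox = element(by.css('input.search-box'));
    this.searchButton = element(by.css('button.search-submit'));
  }

  load() {
    return browser.get('https://example.com/search');
  }

  searchFor(phrase) {
    // Protractor handles waiting automatically; no explicit waits needed
    return this.searchBox.sendKeys(phrase)
      .then(() => this.searchButton.click());
  }

}

module.exports = { SearchPage };

A spec would then construct the page object and call its methods instead of touching locators directly.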

This guide does not provide a custom example for Protractor with Jasmine because the Protractor documentation is pretty good. It contains a decent tutorial, setup and config instructions, framework integrations, and a full reference. Furthermore, Protractor requires careful local setup with a live site to test. Please refer to the official doc for more information. Most of the examples in the doc use Jasmine.

Basic Test Execution

The simplest way to run Jasmine tests is to use the “jasmine” command. Make sure you are in the project’s root directory when running tests. Below are example invocations.

# Run all specs in the project (according to the Jasmine config)
$ jasmine

# Run a specific spec by file path
$ jasmine spec/integration/wikipedia.service.spec.js

# Run all specs that match a path pattern
# Warning: this call is NOT recursive and will not search sub-directories!
$ jasmine spec/unit/*

# Run all specs whose titles match a regex filter
# This searches both "describe" and "it" titles
$ jasmine --filter="Calculator"

# Stop testing after the first failure happens
$ jasmine --stop-on-failure=true

# Run tests in a random order
# Optionally include a seed value
$ jasmine --random=true --seed=4321

Test execution options may also be set in the Jasmine config file.

Advanced Test Execution with Karma

Karma is a self-described “spectacular test runner for JavaScript.” Its main value is that it runs JavaScript tests in live web browsers (rather than merely on Node.js), testing actual browser compatibility. In fact, developers can keep Karma running while they develop code so they can see test results in real time as they make changes. Karma integrates with many test tools (including Istanbul for code coverage) and frameworks (including Jasmine). Karma itself runs on Node.js and is distributed as a number of packages for different browsers and frameworks. Check out this Google Testing Blog article to learn the original impetus behind developing Karma, originally called “Testacular.”

Karma and Protractor are similar in that they run tests against real web browsers, but they serve different purposes. Karma is meant for running unit tests against JavaScript code, whereas Protractor is meant for running end-to-end tests against a full, live site like a user. Karma tests go through a “back door” to exercise pieces of a site. Karma and Protractor are not meant to be used together for the same tests (see Protractor Issue #9 on GitHub). However, one project can use both tools at their appropriate test layers, as done for standard Angular testing.

This guide does not provide a custom example for Karma with Jasmine because it requires local setup with the right packages and browser versions. Karma packages are distributed through npm. Karma with Jasmine requires the main karma package, the karma-jasmine package, and a launcher package for each desired browser (like karma-chrome-launcher). There are also plenty of decent examples available online. Please refer to the official Karma documentation for more info.
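Still, to give a rough picture of the setup, a minimal Karma configuration for Jasmine might look something like the sketch below. The file patterns and browser list are assumptions that would need to match the actual project.

// --------------------------------------------------
// karma.conf.js (minimal sketch)
// --------------------------------------------------

module.exports = function(config) {
  config.set({
    // Use the Jasmine adapter from the karma-jasmine package
    frameworks: ['jasmine'],

    // Files to load into the browser; module imports may need bundling (see below)
    files: [
      'lib/**/*.js',
      'spec/**/*[sS]pec.js'
    ],

    // Launch real browsers via launcher packages (e.g., karma-chrome-launcher)
    browsers: ['Chrome'],

    // Print results to the console
    reporters: ['progress'],

    // Run once and exit; set to false to keep watching files during development
    singleRun: true
  });
};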

Running Jasmine tests with Karma is not without its difficulties, however. One challenge is handling modules and imports. ECMAScript 6 (ES6) has a totally new syntax for modules and imports that is incompatible with the CommonJS require-based module system used by Node.js. Node.js is working on ES6-style module support, but at the time this article was written, full support was not yet available. Module imports are troublesome for Karma because Karma is launched from Node.js (which uses require) but runs tests in a browser (which doesn’t support require). There are a few workarounds:

  • Use RequireJS to load modules.
  • Use Browserify to make require work in browsers.
  • Use rollup.js to bundle all modules into one to sidestep imports.
  • Use Angular with TypeScript, which builds and links everything automatically.

Angular Testing

Angular is a very popular front-end Web framework. It is a complete rewrite of AngularJS and is seen as an alternative to React. One of Angular’s perks is its excellent support for testing. Out of the box, new Angular projects come with config for unit testing with Jasmine/Karma and end-to-end testing with Jasmine/Protractor. It’s easy to integrate other automation tools like Istanbul code coverage or HTML reporting. Standard Angular projects using TypeScript also don’t suffer from the module import problem: imports are linked properly when TypeScript is compiled into JavaScript.

Angular unit tests are written just like any other Jasmine unit tests except for one main difference: the Angular testing utilities. These extra packages create a test environment (a “TestBed”) for testing each part of the Angular app internally and independently. Dependencies can be easily stubbed and mocked using Jasmine’s spies, with no need for sinon, since the TestBed binds everything together. NGRX also provides extended test utilities. The Angular testing utilities can seem overwhelming at first, but together with Jasmine, they make it easy to write laser-precise unit tests.

Another interesting best practice for Angular unit tests is to co-locate them with the modules they cover. For every *.js/*.ts file, there should be a *.spec.js/*.spec.ts file with the covering describe/it tests. This is not common practice for unit tests, but the Angular doc notes many advantages: tests are easy to find, coverage gaps are easy to spot, and updates are less likely to be forgotten. The automatically-generated test config has settings to search the whole project for spec files.

Angular end-to-end tests are treated differently from unit tests, however. Since they test the app as a whole, they don’t use the Angular testing utilities, and they should be located in their own directory (usually named “e2e”). Thus, Angular end-to-end tests are really no different than any other Web UI tests that use Protractor. Jasmine is the default test framework, but it may be worthwhile to switch to Cucumber.js to gain the benefits of BDD.

This guide does not provide Angular testing examples because the official Angular documentation is stellar. It contains a tutorial, a whole page on testing, and live examples of tests (linked from the testing page).

To Infinity and Beyond: A Guide to Parallel Testing

Are your automated tests running in parallel? If not, then they probably should be. Together with continuous integration, parallel testing is the best way to fail fast during software development and ultimately enforce higher software quality. Switching tests from serial to parallel execution, however, is not a simple task. Tests themselves must be designed to run concurrently without colliding, and extra tools and systems are needed to handle the extra stress. This article is a high-level guide to good parallel testing practices.

What is Parallel Testing?

Parallel testing means running multiple automated tests simultaneously to shorten the overall start-to-end runtime of a test suite. For example, if 10 tests take a total of 10 minutes to run, then 2 parallel processes could execute 5 tests each and cut the total runtime down to 5 minutes. Even better, 10 processes could execute 1 test each to shrink runtime to 1 minute. Parallel testing is usually managed by either a test framework or a continuous integration tool. It also requires more compute resources than serial testing.

Why Go Parallel?

Running automated tests in parallel does require more effort (and potentially cost) than running tests serially. So, why go through the trouble?

The answer is simple: time. It is well documented that software bugs cost more when they are discovered later. That’s why current development practices like Agile and BDD strive to avoid problems from the start through small iterations and healthy collaboration (“shift left”), while CI/CD defensively catches regressions as soon as they happen (“fail fast”). Reducing the time to discover a problem after it has been introduced means higher quality and higher productivity.

Ideally, a developer should be told if a code change is good or bad immediately after committing it. The change should automatically trigger a new build that runs all tests. Unfortunately, tests are not instantaneous – they could take minutes, hours, or even days to complete. A test automation strategy based on the Testing Pyramid will certainly shorten start-to-end execution time but will likely still require parallelization. Consider the layers of the Testing Pyramid and the average runtime of a typical test in each layer – the Testing Pyramid “Rule of 1’s”:

The Testing Pyramid with times: roughly 1 millisecond per unit test, 1 second per integration test, and 1 minute per end-to-end test.
Each layer is listed above with the rough runtime of a typical test. Though actual runtimes will vary, the Rule of 1’s focuses on orders of magnitude. Unit tests typically run in milliseconds because they often exercise product code in memory. Integration tests exercise live products but are limited in scope and often cover low-level areas (like REST service calls). End-to-end tests, however, cover full paths through a live system, which requires extra setup and waiting (like Selenium WebDriver interaction).

Now, consider how many tests from each layer could be run within given time limits, if the tests are run serially:

Test Layer    | 1 Minute (Near-Instant) | 10 Minutes (Coffee Break) | 1 Hour (There Goes Today)
Unit          | 60,000                  | 600,000                   | 3,600,000
Integration   | 60                      | 600                       | 3,600
End-to-End    | 1                       | 10                        | 60

Unit test numbers look pretty good, though keep in mind 1 millisecond is often the best-case runtime for a unit test. Integration and end-to-end runtimes, however, pose a more pressing problem. It is not uncommon for a project to have thousands of above-unit tests, yet not even a hundred end-to-end tests could complete within an hour, nor could a thousand integration tests complete within 10 minutes. Now, consider two more facts: (1) tests often run as different phases in a CI pipeline, so total runtimes are stacked, and (2) multiple commits would trigger multiple builds, which could cause a serious backup. Serial test execution would starve engineering feedback in any continuous integration system of scale. A team would need to drastically shrink test coverage or give up on being truly “continuous” in favor of running tests daily or weekly. Neither alternative is acceptable these days. CI needs parallel testing to be truly continuous.

The Danger of Collisions

The biggest danger for parallel testing is collision – when tests interfere with each other, causing invalid test failures. Collisions may happen in the product under test if product state is manipulated by more than one test at a time, or they may happen in the automation code itself if the code is not thread-safe. Collisions are also inherently intermittent, which makes them all the more difficult to diagnose. As a design principle, automated tests must avoid collisions for correct parallel execution.

Making tests run in parallel is not as simple as flipping a switch or adding a new config file. Automated tests must be specifically designed to run in parallel. A team may need to significantly redevelop their automation code to make parallel execution work right.

A train collision in Mannheim, Germany in 2014. Don’t let this happen to your tests!

Handling Product-Level Collisions

Product-level collisions essentially reduce to how environments are set up and handled.

Separate Environments

The most basic way to avoid product-level collisions would be to run each test thread or process against its own instance of the product in an exclusive environment. (In the most extreme case, every single test could have its own product instance.) No collisions would happen in the product because each product instance would be touched by only one test instance at a time. Separate environments are possible to implement using various configuration and deployment tools. Docker containers are quick and easy to spin up. VMs with Vagrant, Puppet, Chef, and/or Ansible can also get it done.

However, it may not always be sensible to make separate environments for each test thread/process:

  • Creating a new environment is inefficient – it takes extra time to set up that may cancel out any time saved from parallel execution.
  • Many projects simply don’t have the money or the compute resources to handle a massive scale-out.
  • Some tests may not cause collisions and therefore may not need total isolation.
  • Some product environments are extremely large and complicated and would not be practical to replicate for each test individually.

Shared Environments

Environments with a shared product instance are quite common. One could be a common environment that everyone on a team shares, or one could be freshly created during a CI run and accessed by multiple test threads/processes. Either way, product-level collisions are possible, and tests must be designed to avoid clashing product states. Any test covering a persistent state is vulnerable; usually, this is the vast majority of tests. Consider web app testing as an example. Tests to load a page and do some basic interactions can probably run in parallel without extra protection, but tests that use a login to enter data or change settings could certainly collide. In this case, collisions could be avoided by using different logins for each simultaneous test instance – by using either a pool of logins, a unique login per test case, or a unique login per thread/process. Each product is different and will require different strategies for avoiding collisions.
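As a simple illustration, a unique login per test process could be generated with something like the snippet below; the naming scheme is hypothetical and would have to match what the product accepts.

// Build a username that is unique to this test process and run
function uniqueLogin() {
  return `testuser_${process.pid}_${Date.now()}`;
}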

We all share certain environments. Take care of them when you do. (Photo: The Blue Marble, taken by the Apollo 17 crew on Dec 7, 1972)

Handling Automation-Level Collisions

Automation-level collisions can happen when automation code is not thread-safe, which could mean more than simply locks and semaphores.

#1: Test Independence

Test cases must be completely independent of each other. One test must not require another test to run before it for the sake of setup. A test case should be able to run by itself without any others. A test suite should be able to run successfully in random order.

#2: Proper Variable Scope

If parallel tests will be run in the same memory address space, then it is imperative to properly scope all variables. Global or static mutable variables (i.e., “non-constants”) must not be allowed because they could be changed unexpectedly. The best pattern for handling scope is dependency injection. Thread-safe singletons would be a second choice. (Typically, global or static variables are used to subvert design patterns, so they may reveal further necessary automation rework when discovered.)

#3: External Resources

Automation may sometimes interact with external resources, such as test config files or test result databases/services. Make sure no external interactions collide. For example, make sure test run updates don’t overwrite each other.

#4: Logging

Logs are very difficult to trace when multiple tests are simultaneously printed to the same file. The best practice is to generate separate log files for each test case, thread, or process to make them readable.
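For example, a Node.js test process could write to its own log file with a sketch like the one below; the directory and message format are assumptions.

// Write log messages to a file named after the current process
const fs = require('fs');

fs.mkdirSync('logs', { recursive: true });
const logFile = `logs/test-run-${process.pid}.log`;

function log(message) {
  fs.appendFileSync(logFile, `${new Date().toISOString()} ${message}\n`);
}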

#5: Result Aggregation

A test suite is a unified collection of tests, no matter how many threads/processes are used to run its tests in parallel. Make sure test results are aggregated together into one report. Some frameworks will do this automatically, while others will require custom post-processing.

#6: Test Filtering

One strategy to avoid collisions may be to run non-colliding partitions (subsets) of tests in parallel. Test tagging and filtering would make this possible. For example, tests that require a special login could be tagged as such and run together on one thread.

Test Scalability

The previous section on collisions discussed how to handle product environments. It is also important to consider how to handle the test automation environment. These are two different things: the product environment contains the live product under test, while the test environment contains the automation software and resources that run tests against the product. The test environment is where the parallel tests will be executed, and, as such, it must be scalable to handle the parallelization. A common example of a test environment could be a Jenkins master with a few agents for running build pipelines. There are two primary ways to scale the test environment: scale-up and scale-out.

Parallel Scale-Up

Scale-up is when one machine is configured to handle more tests in parallel. For example, scale-up would be when a machine switches from one (serial) thread to two, three, or even more in parallel. Many popular test runners support this type of scale-up by spawning and joining threads in a common memory address space or by forking processes. (For example, the SpecFlow+ Runner lets you choose.)

Scale-up is a simple way to squeeze as much utility out of an existing machine as possible. If tests are designed to handle collisions, and the test runner has out-of-the-box support, then it’s usually pretty easy to add more test threads/processes. However, parallel test scale-up is inherently limited by the machine’s capacity. Each additional test process succumbs to the law of diminishing returns as more memory and processor cycles are used. Eventually, adding more threads will actually slow down test execution because the processor(s) will waste time constantly switching between tests. (Anecdotally, I found the optimal test-thread-to-processor ratio to be 2-to-1 for running C#/SpecFlow/Selenium-WebDriver tests on Amazon EC2 M4 instances.) A machine itself could be upgraded with more threads and processors, but nevertheless, there are limits to a single machine’s maximum capacity. Weird problems like TCP/IP port exhaustion may also arise.

Scale-up adds more threads to one machine.

Parallel Scale-Out

Scale-out is when multiple machines are configured to run tests in parallel. Whereas scale-up had one machine running multiple tests, scale-out has multiple machines each running tests. Scale-out can be achieved in a number of ways. A few examples are:

  • One master test execution machine launches multiple Web UI tests that each use a remote Selenium WebDriver with a service like Selenium Grid, Sauce Labs, LambdaTest, or BrowserStack.
  • A Jenkins pipeline launches tests across ten agents in parallel, in which each agent executes a tenth of the tests independently.

Scale-out is a better long-term solution than scale-up because scale-out can handle an unlimited number of machines for parallel testing. The limiting factor with scale-out is not the maximum capacity of the hardware but rather the cost of running more machines. However, scale-out is much harder to implement than scale-up. It requires tests to be evenly divided with some sort of balancer and filter. It also requires some sort of test result aggregation for joint reporting – people won’t want to piece together a bunch of separate reports to get an overall snapshot of quality. Plus, the test environment is more complicated to build and maintain (though tools like CloudBees Jenkins Enterprise or Amazon EC2 can make it easier.)

Scale-out distributes tests across multiple machines.

Upwards and Outwards

Of course, scale-up and scale-out are not mutually exclusive. Scaled-out nodes could individually be scaled-up. Consider a test environment with 10 powerful VMs that could each handle 10 tests in parallel – that means 100 tests could run simultaneously. Using the Rule of 1’s, it would take only about a minute to run 100 Web UI tests, which serially would have taken over an hour and a half! Use both strategies to shorten start-to-end runtime as much as possible.

Conclusion

Parallel testing is a worthwhile endeavor. When done properly, it will not only reduce development time but also improve the development experience. For readers who want to start doing parallel testing, I recommend researching the tools and frameworks you want to use. Many popular test frameworks support parallel execution, and even if the one you choose doesn’t, you can always invoke tests in parallel from the command line. Do well!

Software Testing Lessons from Luigi’s Mansion

How can lessons from Luigi’s Mansion apply to software testing and automation?

Luigi’s Mansion is a popular Nintendo video game series. It’s basically Ghostbusters in the Super Mario universe: Luigi must use a special vacuum cleaner to rid haunted mansions of the ghosts within. Along the way, Luigi also solves puzzles, collects money, and even rescues a few friends. I played the original Luigi’s Mansion game for the Nintendo GameCube when I was a teenager, and I recently beat the sequel, Luigi’s Mansion: Dark Moon, for the Nintendo 3DS. They were both quite fun! And there are some lessons we can apply from Luigi’s Mansion to software testing and automation.

#1: Exploratory Testing is Good

The mansions are huge – Luigi must explore every nook and cranny (often in the dark) to spook ghosts out of their hiding places. There are also secrets and treasure hiding in plain sight everywhere. Players can easily miss ghosts and gold alike if they don’t take their time to explore the mansions thoroughly. The same is true with testing: engineers can easily miss bugs if they overlook details. Exploratory testing lets engineers freely explore the product under test to uncover quality issues that wouldn’t turn up through rote test procedures.

#2: Expect the Unexpected

Ghosts can pop out from anywhere to scare Luigi. They also can create quite a mess of the mansion – blocking rooms, stealing items, and even locking people into paintings! Software testing is full of unexpected problems, too. Bugs happen. Environments go down. Network connections break. Even test automation code can have bugs. Engineers must be prepared for any emergency regardless of origin. Software development and testing is about solving problems, not about blame-games.

#3: Don’t Give Up!

Getting stuck somewhere in the mansion can be frustrating. Some puzzles are small, while others may span multiple rooms. Sometimes, a player may need to backtrack through every room and vacuum every square inch to uncover a new hint. Determination nevertheless pays off when puzzles get solved. Software engineers must likewise never give up. Failures can be incredibly complex to identify, reproduce, and resolve. Test automation can become its own nightmare, too. However, there is always a solution for those tenacious (or even hardheaded) enough to find it.

 

Want to see what software testing lessons can be learned from other games? Check out Gotta Catch ’em All! for Pokémon!

Gherkin Syntax Highlighting in Chrome

Google Chrome is one of the most popular web browsers around. Recently, I discovered that Chrome can edit and display Gherkin feature files. The Chrome Web Store has two useful extensions for Gherkin: Tidy Gherkin and Pretty Gherkin, both developed by Martin Roddam. Together, these two extensions provide a convenient, lightweight way to handle feature files.

Tidy Gherkin

Tidy Gherkin is a Chrome app for editing and formatting feature files. Once it is installed, it can be reached from the Chrome Apps page (chrome://apps/). The editor appears in a separate window. Gherkin text is automatically colored as it is typed. The bottom preview pane automatically formats each line, and clicking the “TIDY!” button in the upper-left corner will format the user-entered text area as well. Feature files can be saved and opened like a regular text editor. Templates for Feature, Scenario, and Scenario Outline sections may be inserted, as well as tables, rows, and columns.

Another really nice feature of Tidy Gherkin is that the preview pane automatically generates step definition stubs for Java, Ruby, and JavaScript! The step def code is compatible with the Cucumber test frameworks. (The Java code uses the traditional step def format, not the Java 8 lambdas.) This feature is useful if you aren’t already using an IDE for automation development.

Tidy Gherkin has pros and cons when compared to other editors like Notepad++ and Atom. The main advantages are automatic formatting and step definition generation – features typically seen only in IDEs. It’s also convenient for users who already use Chrome, and it’s cross-platform. However, it lacks richer text editing features offered by other editors, it’s not extendable, and the step def gen feature may not be useful to all users. It also requires a bit of navigation to open files, whereas other editors may be a simple right-click away. Overall, Tidy Gherkin is nevertheless a nifty, niche editor.


Pretty Gherkin

Pretty Gherkin is a Chrome extension for viewing Gherkin feature files through the browser with syntax highlighting. After installing it, make sure to enable the “Allow access to the file URLs” option on the Chrome Extensions page (chrome://extensions/). Then, whenever Chrome opens a feature file, it should display pretty text. For example, try the GoogleSearch.feature file from my Cucumber-JVM example project, cucumber-jvm-java-example. Unfortunately, though, I could not get Chrome to display local feature files – every time I would try to open one, Chrome would simply download it. Nevertheless, Pretty Gherkin seems to work for online SCM sites like GitHub and BitBucket.

Since Pretty Gherkin is simply a display tool, it can’t really be compared to other editors. I’d recommend Pretty Gherkin to Chrome users who often read feature files from online code repositories.


 

Be sure to check out other Gherkin editors, too!

Unpredictable Test Data

Test data is a necessary evil for testing and automation. It is necessary because tests simply can’t run without test case values, configuration data, and ready state (as detailed in BDD 101: Test Data). It is evil because it is challenging to handle properly. Test data may be even more dastardly when it is unpredictable, but thankfully there are decent strategies for handling unpredictability.

What is Unpredictable Test Data?

Test data is unpredictable when its values are not explicitly the same every time a test runs. For example, let’s suppose we are writing tests for a financial system that must obtain stock quotes. Mocking a stock quote service with dummy predictable data would not be appropriate for true integration or end-to-end tests. However, stock quotes act like random walks: they change values in real time, often perpetually. The name “unpredictable” could also be “non-deterministic” or “uncertain.”

Below are a few types of test data unpredictability:

  • Values may be missing, mistyped, or outside of expected bounds.
  • Time-sensitive data may change rapidly.
  • Algorithms may yield non-deterministic results (like for machine learning).
  • Data formats may change with software version updates.
  • Data may have inherent randomness.

Strategies for Handling Unpredictability

Any test data is prone to be unpredictable when it comes from sources external to the automation codebase. Tests must be robust enough to handle the inherent unpredictability. Below are 5 strategies for safety and recovery. The main goal is test completion – pass or fail, tests should not crash and burn due to bad test data. When in doubt, skip the test and log warnings. When really in doubt, fail it as a last resort.

Automation’s main goal is to complete tests despite unpredictability in test data.

#1: Make it Predictable

Ask, is it absolutely necessary to fetch data from unpredictable sources? Or can they be avoided by using predictable, fake data? Fake data can be provided in a number of ways, like mocks or database copies. It’s a tradeoff between test reliability and test coverage. In a risk-based test strategy, the additional test coverage may not be worthwhile if all input cases can be covered with fake data. Nevertheless, unpredictable data sometimes cannot or should not be avoided.

#2: Write Defensive Assertions

When reading data, make assertions to guarantee correctness. Assertions are an easy way to abort a test immediately if any problems are found. Assertions could make sure that values are not null, contain all required pieces, and fit the expected format.

#3: Handle Healthy Bounds

Tests using unpredictable data should be able to handle acceptable ranges of values instead of specific pinpointed values. This could mean including error margins in calculations or using regular expressions to match strings. Assertions may need to do some extra preliminary processing to handle ranges instead of singular values. Any anomalies should be reported as warnings.

For the stock quote example, the following would be ways to handle healthy bounds:

  • Abort if the quote value is non-numeric or negative.
  • Warn if the value is $0 or greater than $1M.
  • Continue for values between $0 and $1M.
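A sketch of how those bounds might be checked in a Jasmine spec is below; the quote-fetching helper and its fields are hypothetical.

describe("Stock quotes", function() {

  it("should return a quote within healthy bounds", function() {
    let quote = fetchStockQuote("ACME");        // hypothetical call to the real service
    expect(typeof quote.value).toBe('number');  // abort if the value is non-numeric
    expect(quote.value).not.toBeLessThan(0);    // abort if the value is negative
    if (quote.value === 0 || quote.value > 1000000) {
      console.warn('Suspicious quote value: ' + quote.value);  // warn on anomalies, but continue
    }
  });

});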

#4: Scrub the Data

Sometimes, data problems can be “scrubbed” away. Formats can be fixed, missing values can be populated, and given values can be adjusted or filtered. Scrubbing data may not always be appropriate, but if possible, it can mean a test will be completed instead of aborted.
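A small sketch of what scrubbing might look like for the stock quote example is below; the field names are assumptions.

// Clean up a raw quote object before it is used in assertions
function scrubQuote(raw) {
  return {
    symbol: (raw.symbol || 'UNKNOWN').trim().toUpperCase(),  // fix format, fill a missing value
    value: Number(raw.value)                                 // coerce numeric strings to numbers
  };
}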

#5: Do Retries

Data may need to be fetched again if it isn’t right the first time. Retries are applicable for data that changes frequently or is random. The automation framework should have a mechanism to retry data access after a waiting period. Set retry limits and wait times appropriately – don’t waste too much time. Retries should also be done as close to the point of failure as possible. Retrying the whole test is possible but not as efficient as retrying a single service call.
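A minimal retry helper might look like the sketch below; the retry limit and wait time are placeholders to tune.

// Retry an async data fetch a few times, waiting between attempts
async function fetchWithRetries(fetchFn, retries = 3, waitMs = 1000) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fetchFn();  // retry as close to the point of failure as possible
    } catch (error) {
      lastError = error;
      await new Promise(resolve => setTimeout(resolve, waitMs));  // wait before the next attempt
    }
  }
  throw lastError;  // give up after hitting the retry limit
}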

Final Advice

Unpredictable test data shouldn’t be a show-stopper – it just needs special attention. Nevertheless, try to limit test automation’s dependence on external data sources.

Cucumber-JVM for Java

This post is a concise-yet-comprehensive overview of Cucumber-JVM for Java. It is an introduction, a primer, a guide, and a reference. If you are new to BDD, please learn about it before using Cucumber-JVM.

Introduction


Cucumber is an open-source software test automation framework for behavior-driven development. It uses a business-readable, domain-specific language called Gherkin for specifying feature behaviors that become tests. The Cucumber project started in 2008 when Aslak Hellesøy released the first version of the Cucumber framework for Ruby.

Cucumber-JVM is the official port for Java. Every Gherkin step is “glued” to a step definition method that executes the step. The English text of a step is glued using annotations and regular expressions. Cucumber-JVM integrates nicely with other testing packages. Anything that can be done with Java can be handled by Cucumber-JVM. Cucumber-JVM is ideal for black-box, above-unit, functional tests.

[Update on 7/29/2018: As of version 3.0.0, Cucumber-JVM no longer supports JVM languages other than Java – namely Groovy, Scala, Clojure, and Gosu. Please read Cucumber-JVM is dropping support of JVM Languages on the official Cucumber blog. The Gherkin parser is also now written in Go instead of in Java.]

Example Projects

GitHub contains two Cucumber-JVM example projects for this guide:

  • cucumber-jvm-java-example – traditional annotation-style step definitions
  • cucumber-jvm-java8-example – Java 8 lambda-style step definitions

The projects use Java, Apache Maven, Selenium WebDriver, and AssertJ. The README files include practice exercises as well.

Prerequisite Skills

To be successful with Cucumber-JVM for Java, the following skills are required:

  • Java programming
  • Behavior-driven development and Gherkin
  • Web UI automation with Selenium WebDriver (for the example projects)

Prerequisite Tools

Test machines must have the Java Development Kit (JDK) installed to build and run Cucumber-JVM tests. They should also have the desired build tool installed (such as Apache Maven). The build tool should automatically install Cucumber-JVM packages through dependency management.

An IDE such as JetBrains IntelliJ IDEA (with the Cucumber for Java plugin) or Eclipse (with the Cucumber JVM Eclipse Plugin) is recommended for Cucumber-JVM test automation development. Software configuration management (SCM) with a tool like Git is also strongly recommended.

Versions

Cucumber-JVM 2.0 was released in August 2017 and should be used for new Cucumber-JVM projects. Releases may be found under Maven Group ID io.cucumber. Older Cucumber-JVM 1.x versions may be found under Maven Group ID info.cukes.

Build Management

Apache Maven is the preferred build management tool for Cucumber-JVM projects. All Cucumber-JVM packages are available from the Maven Central Repository. Maven can automatically run Cucumber-JVM tests as part of the build process. Projects using Cucumber-JVM should follow Maven’s Standard Directory Layout. The examples use Maven. Gradle may also be used, but it requires extra setup.

Every Maven project has a POM file for configuration. The POM should contain appropriate Cucumber-JVM dependencies. There is a separate package for each JVM language, dependency injection framework, and underlying unit test runner. Since Cucumber-JVM is a test framework, its dependencies should use test scope. Check io.cucumber on the Maven site for the latest packages and versions.

Project Structure

Cucumber-JVM test automation has the same layered approach as other BDD frameworks:

BDD Automation Layers

The higher layers focus more on specification, while the lower layers focus more on implementation. Gherkin feature files and step definition classes are BDD-specific.

Cucumber-JVM tests may be included in the same project as product code or in a separate project. Either way, projects using Cucumber-JVM should follow Maven’s Standard Directory Layout: test code should be located under src/test.

Cucumber-JVM Example Project

Screenshot of the example project from IntelliJ IDEA’s Project view.

Gherkin Feature Files

Gherkin feature files are text files that contain Gherkin behavior scenarios. They use the “.feature” extension. In a Maven project, they belong under src/test/resources, since they are not Java source files. They should also be organized into a sensible package hierarchy. Refer to other BDD pages for writing good Gherkin.

Gherkin Feature File

A feature file from the example projects, opened in IntelliJ IDEA.

Step Definition Classes

Step definition classes are Java classes containing methods that implement Gherkin steps. Step def classes are like regular Java classes: they have variables, constructors, and methods. Steps are “glued” to methods using regular expressions. Feature file scenarios can use steps from any step definition class in the project. In a Maven project, step defs belong in packages under src/test/java, and their class names should end in “Steps”.

The Basics

Below is a step definition class from the cucumber-jvm-java-example project, which uses the traditional method annotation style for step defs as part of the cucumber-java package. Each method should throw Throwable so that exceptions are raised up to the Cucumber-JVM framework.

package com.automationpanda.example.stepdefs;

import com.automationpanda.example.pages.GooglePage;
import cucumber.api.java.After;
import cucumber.api.java.Before;
import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import static org.assertj.core.api.Assertions.assertThat;

public class GoogleSearchSteps {

  private WebDriver driver;
  private GooglePage googlePage;

  @Before(value = "@web", order = 1)
  public void initWebDriver() throws Throwable {
    driver = new ChromeDriver();
  }

  @Before(value = "@google", order = 10)
  public void initGooglePage() throws Throwable {
    googlePage = new GooglePage(driver);
  }

  @Given("^a web browser is on the Google page$")
  public void aWebBrowserIsOnTheGooglePage() throws Throwable {
    googlePage.navigateToHomePage();
  }

  @When("^the search phrase \"([^\"]*)\" is entered$")
  public void theSearchPhraseIsEntered(String phrase) throws Throwable {
    googlePage.enterSearchPhrase(phrase);
  }

  @Then("^results for \"([^\"]*)\" are shown$")
  public void resultsForAreShown(String phrase) throws Throwable {
    assertThat(googlePage.pageTitleContains(phrase)).isTrue();
  }

  @After(value = "@web")
  public void disposeWebDriver() throws Throwable {
    driver.quit();
  }
}

Alternatively, in Java 8, step definitions may be written using lambda expressions. As shown in the cucumber-jvm-java8-example project, lambda-style step defs are more concise and may be defined dynamically. The cucumber-java8 package is required:

package com.automationpanda.example.stepdefs;

import com.automationpanda.example.pages.GooglePage;
import cucumber.api.Scenario;
import cucumber.api.java8.En;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import static org.assertj.core.api.Assertions.assertThat;

public class GoogleSearchSteps implements En {

  private WebDriver driver;
  private GooglePage googlePage;

  // Warning: Make sure the timeouts for hooks using a web driver are zero

  public GoogleSearchSteps() {
    Before(new String[]{"@web"}, 0, 1, (Scenario scenario) -> {
      driver = new ChromeDriver();
    });
    Before(new String[]{"@google"}, 0, 10, (Scenario scenario) -> {
      googlePage = new GooglePage(driver);
    });
    Given("^a web browser is on the Google page$", () -> {
      googlePage.navigateToHomePage();
    });
    When("^the search phrase \"([^\"]*)\" is entered$", (String phrase) -> {
      googlePage.enterSearchPhrase(phrase);
    });
    Then("^results for \"([^\"]*)\" are shown$", (String phrase) -> {
      assertThat(googlePage.pageTitleContains(phrase)).isTrue();
    });
    After(new String[]{"@web"}, (Scenario scenario) -> {
      driver.quit();
    });
  }
}

Either way, steps from any feature file are glued to step definition methods/lambdas from any class at runtime:

Step Def Glue

Gluing a Gherkin step to its Java definition using regular expressions. IDEs have features to automatically generate definition stubs for steps.

As a best practice, class inheritance should be avoided for step definition classes: step bindings in superclasses will trigger DuplicateStepDefinitionException at runtime, and any step definition concern handled by inheritance can be handled better with other design patterns. Class constructors should be used primarily for dependency injection, while setup operations should instead be handled in Before hooks.

Hooks

Scenarios sometimes need automation-centric setup and cleanup routines that should not be specified in Gherkin. For example, web tests must first initialize a Selenium WebDriver instance. Step definition classes can have Before and After hooks that run before and after a scenario. They are analogous to setup and teardown methods from other test frameworks like JUnit. Hooks may optionally specify tags for the scenarios to which they apply, as well as an order number, which makes them similar to advice in Aspect-Oriented Programming. After hooks will run even if a scenario fails with an exception or an abortive assertion, so use them instead of Gherkin steps for cleanup routines to guarantee that cleanup runs.

The code snippet below shows Before and After hooks from the traditional-style example project. The order given to the Before hooks guarantees the web driver is initialized before the page object is created.

  @Before(value = "@web", order = 1)
  public void initWebDriver() throws Throwable {
    driver = new ChromeDriver();
  }

  @Before(value = "@google", order = 10)
  public void initGooglePage() throws Throwable {
    googlePage = new GooglePage(driver);
  }

  @After(value = "@web")
  public void disposeWebDriver() throws Throwable {
    driver.quit();
  }

Before and After hooks surround scenarios only. Cucumber-JVM does not provide hooks to surround the whole test suite. This protects test case independence but makes global setup and cleanup challenging. The best workaround is to use the singleton pattern with lazy initialization. The solution is documented in Cucumber-JVM Global Hook Workarounds.
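
Below is a minimal sketch of that workaround, assuming suite-wide setup only needs to happen once before the first scenario. The class name, the setup contents, and the shutdown hook for global cleanup are illustrative, not taken from the example projects.

import cucumber.api.java.Before;

public class GlobalHooks {

  private static boolean initialized = false;

  @Before(order = 0)
  public void beforeAll() throws Throwable {
    // Lazy initialization: the body runs only once, before the first scenario
    if (!initialized) {
      // ... one-time, suite-wide setup goes here (e.g., reading config data) ...
      initialized = true;
      // Optional: register suite-wide cleanup to run when the JVM exits
      Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        // ... one-time, suite-wide cleanup goes here ...
      }));
    }
  }
}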

Dependency Injection

Cucumber-JVM supports dependency injection (DI) as a way to share objects between step definition classes. For example, steps in different classes may need to share the same web driver instance. Cucumber-JVM supports many DI modules, and each has its own dependency package. As a warning, do not use static variables for sharing objects between step definition classes – static variables can break test independence and parallelization.

PicoContainer is the simplest DI framework and is recommended for most needs. Dependency injection hinges upon step definition class constructors. Without DI, step def constructors must not have parameters. With DI, PicoContainer will automatically construct each object in a step def constructor signature and pass them in when the step def object is constructed. Furthermore, the same object is injected into all step def classes that have its type as a constructor parameter. Objects that require constructor parameters should use a holder or caching class to provide the necessary arguments. Note that dependency-injected objects are created fresh for each scenario.

Below is a trivial example for how to apply dependency injection using PicoContainer to initialize the web driver in the example projects. (A more advanced example would read browser type from a config file and set the web driver accordingly.)

import cucumber.api.java.Before;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class WebDriverHolder {

  private WebDriver driver;

  public WebDriver getDriver() {
    return driver;
  }

  public void initWebDriver() {
    driver = new ChromeDriver();
  }
}

public class GoogleSearchSteps {

  private WebDriverHolder holder;

  // PicoContainer constructs the holder and injects it automatically
  public GoogleSearchSteps(WebDriverHolder holder) {
    this.holder = holder;
  }

  @Before
  public void initWebDriver() throws Throwable {
    if (holder.getDriver() == null) {
      holder.initWebDriver();
    }
  }
}
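
To see the sharing in action, a second step definition class could declare the same holder type in its constructor. With the cucumber-picocontainer package on the classpath, PicoContainer injects the same WebDriverHolder instance into every step def class that asks for it. The class and step below are illustrative only, not part of the example projects.

import cucumber.api.java.en.When;
import org.openqa.selenium.By;

public class ImageSearchSteps {

  private WebDriverHolder holder;

  // PicoContainer passes in the same WebDriverHolder used by GoogleSearchSteps
  public ImageSearchSteps(WebDriverHolder holder) {
    this.holder = holder;
  }

  @When("^the user clicks the images link$")
  public void theUserClicksTheImagesLink() throws Throwable {
    holder.getDriver().findElement(By.linkText("Images")).click();
  }
}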

Automation Support Classes

Automation support classes are extra classes outside of the Cucumber-JVM framework itself that are needed for test automation. They could come from the same test project, a separate but proprietary package, or an open-source package. Regardless of the source, they should fold into build management. They can integrate seamlessly with Cucumber-JVM. Step definitions should be very short because the bulk of automation work should be handled by support classes for maximum code reusability.

Popular open-source Java packages for test automation support include Selenium WebDriver for browser automation, REST Assured for REST API calls, and AssertJ for fluent assertions. Page objects, file readers, and data processors also count as support classes.

Configuration Files

Configuration files are extra files outside of the Cucumber-JVM framework that provide environment-specific data to the tests, such as URLs, usernames, passwords, logging/reporting settings, and database connections. They should be saved in standard formats like CSV, XML, JSON, or Java Properties, and they should be read into memory once at the start of the test suite using global hook workarounds. The automation code should look for files at predetermined locations or using paths passed in as environment variables or properties.

Not all test automation projects need config files, but many do. Never hard-code config data into the automation code. Avoid non-text-based formats like Microsoft Excel so that version control can easily do diffs, and avoid non-standard formats that require custom parsers because they require extra development and maintenance time.
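
As a rough sketch, config data could be read once into memory using the global hook workaround and then cached for all scenarios. The example below assumes the Jackson library for JSON parsing and a hypothetical “test.config” system property for the file path; neither is mandated by Cucumber-JVM.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class TestConfig {

  private static JsonNode config = null;

  // Reads the config file once and caches it for the rest of the test suite
  public static synchronized JsonNode get() throws IOException {
    if (config == null) {
      String path = System.getProperty("test.config", "src/test/resources/config.json");
      config = new ObjectMapper().readTree(new File(path));
    }
    return config;
  }
}

A Before hook or support class could then fetch values with calls like TestConfig.get().get("baseUrl").asText(), where “baseUrl” is just an illustrative key.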

Running Tests

Cucumber-JVM tests may be run in a number of ways.

Using JUnit or TestNG

The cucumber-junit and cucumber-testng packages enable JUnit and TestNG respectively to run Cucumber-JVM tests. They require test runner classes that provide CucumberOptions for how to run the tests. A project may have more than one runner class. The example projects use the JUnit runner like this:

package com.automationpanda.example.runners;

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import org.junit.runner.RunWith;

@RunWith(Cucumber.class)
@CucumberOptions(
  plugin = {"pretty", "html:target/cucumber", "junit:target/cucumber.xml"},
  features = "src/test/resources/com/automationpanda/example/features",
  glue = {"com.automationpanda.example.stepdefs"})
public class PandaCucumberTest {
}

JUnit and TestNG runners can also be picked up by build management tools. For example, Maven’s Surefire plugin will automatically run any runner classes named *Test.java during the test phase, and the Failsafe plugin will run runner classes named *IT.java during the verify phase. Be sure to include the clean option to delete old test results. Avoid duplicate test runs by making sure runner classes do not cover the same tests; use tags to avoid duplicate coverage.

Using the Command Line Runner

Cucumber-JVM provides a CLI runner that can run feature files directly from the command line. To use it, invoke:

java cucumber.api.cli.Main

Run with “--help” to see all available options.

Using IDEs

Both JetBrains IntelliJ IDEA (with the Cucumber for Java plugin) and Eclipse (with the Cucumber JVM Eclipse Plugin) are great IDEs for Cucumber-JVM test development. They provide features for linking steps to definitions, generating definition stubs, and running tests with various options.

Cucumber Options

Cucumber options may be specified either in a runner class or from the command line as a Java system property. Set options from the command line using “-Dcucumber.options”, which works for any java or mvn command. To see all available options, set the options to “--help”, or check the official Cucumber-JVM doc page.

The most useful option is probably the tags option. Selecting tags to run dynamically at runtime, rather than statically in runner classes, is very useful. In Cucumber-JVM 2.0, tag expressions use a basic English Boolean language:

@automated and @web
@web or @service
not @manual
(@web or @service) and (not @wip)

Older versions of Cucumber-JVM used a more complicated syntax with tildes and commas.
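
For example, a runner class could select tags statically, while the same option could still be overridden at runtime with “-Dcucumber.options”. The sketch below assumes Cucumber-JVM 2.x, where the tags option accepts the Boolean expressions shown above:

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import org.junit.runner.RunWith;

@RunWith(Cucumber.class)
@CucumberOptions(
  features = "src/test/resources/com/automationpanda/example/features",
  glue = {"com.automationpanda.example.stepdefs"},
  tags = {"(@web or @service) and (not @wip)"})
public class TaggedCucumberTest {
}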

Parallel Execution

Parallel test execution greatly reduces total start-to-end testing time. However, it requires additional machines to run tests, tools and config to handle parallel runs, and tests to be written to avoid collisions. As of version 4.0.0, Cucumber-JVM supports parallel execution out of the box. Previously, the most common way to do parallel test runs in Cucumber-JVM was to use a Maven plugin like the Cucumber-JVM Parallel Plugin or the Cucable plugin from Trivago.

[Update on 9/24/2018: Mentioned that Cucumber-JVM supports parallel execution starting with version 4.0.0.]

BDD 101: Unit, Integration, and End-to-End Tests

There are many types of software tests. BDD practices can be incorporated into all aspects of testing, but BDD frameworks are not meant to handle all test types. Behavior scenarios are inherently functional tests – they verify that the product under test works correctly. While instrumentation for performance metrics could be added, BDD frameworks are not intended for performance testing. This post focuses on how BDD automation fits into the Testing Pyramid. Please read BDD 101: Manual Testing for manual test considerations. (Check the Automation Panda BDD page for the full table of contents.)

The Testing Pyramid

The Testing Pyramid is a functional test development approach that divides tests into three layers: unit, integration, and end-to-end.

  • Unit tests are white-box tests that verify individual “units” of code, such as functions, methods, and classes. They should be written in the same language as the product under test, and they should be stored in the same repository. They often run as part of the build to indicate immediate success or failure.
  • Integration tests are black-box tests that verify integration points between system components work correctly. The product under test should be active and deployed to a test environment. Service tests are often integration-level tests.
  • End-to-end tests are black-box tests that test execution paths through a system. They could be seen as multi-step integration tests. Web UI tests are often end-to-end-level tests.

Below is a visual representation of the Testing Pyramid:

The Testing Pyramid

From bottom to top, the tests increase in complexity: unit tests are the simplest and run very fast, while end-to-end tests require lots of setup, logic, and execution time. Ideally, there should be more tests at the bottom and fewer tests at the top. Test coverage is easier to implement and isolate at lower levels, so fewer high-investment, more-fragile tests need to be written at the top. Pushing tests down the pyramid can also mean wider coverage with less execution time. Different layers of testing mitigate risk at their optimal returns-on-investment.

Behavior-Driven Unit Testing

BDD test frameworks are not meant for writing unit tests. Unit tests are meant to be low-level, program-y tests for individual functions and methods. Writing Gherkin for unit tests is doable, but it is overkill. It is much better to use established unit test frameworks like JUnit, NUnit, and pytest.

Nevertheless, behavior-driven practices still apply to unit tests. Each unit test should focus on one main thing: a single call, an individual variation, or a specific input combo. In other words, each unit test should cover a single behavior. Furthermore, in the software process, feature-level behavior specs draw a clear dividing line between unit and above-unit tests. The developer of a feature is often responsible for its unit tests, while a separate engineer is responsible for integration and end-to-end tests, which provides accountability. Behavior specs carry a gentleman’s agreement that unit tests will be completed separately.

Integration and End-to-End Testing

BDD test frameworks shine at the integration and end-to-end testing levels. Behavior specs expressively and concisely capture test case intent. Steps can be written at either integration or end-to-end levels. Service tests can be written as behavior specs, as in Karate. End-to-end tests are essentially multi-step integration tests. Note how a seemingly basic web interaction is truly a large end-to-end test:

Given a user is logged into the social media site
When the user writes a new post
Then the user's home feed displays the new post
And all friends' home feeds display the new post

Making a simple social media post involves web UI interaction, backend service calls, and database updates all in real time. That’s a full pathway through the system. The automated step definitions may choose to cover these layers implicitly or explicitly, but they are nevertheless covered.

Lengthy End-to-End Tests

Terms often mean different things to different people. When many people say “end-to-end tests,” what they really mean are lengthy procedure-driven tests: tests that cover multiple behaviors in sequence. That makes BDD purists shudder because it goes against the cardinal rule of BDD: one scenario, one behavior. BDD frameworks can certainly handle lengthy end-to-end tests, but careful consideration should be given to whether and how to do so.

There are five main ways to handle lengthy end-to-end scenarios in BDD:

  1. Don’t bother. If BDD is done right, then every individual behavior would already be comprehensively covered by scenarios. Each scenario should cover all equivalence classes of inputs and outputs. Thus, lengthy end-to-end scenarios would primarily be duplicate test coverage. Rather than waste the development effort, skip lengthy end-to-end scenario automation as a small test risk, and compensate with manual and exploratory testing.
  2. Combine existing scenarios into new ones. Each When-Then pair represents an individual behavior. Steps from existing scenarios could be smashed together with very little refactoring. This violates good Gherkin rules and could result in very lengthy scenarios, but it would be the most pragmatic way to reuse steps for large end-to-end scenarios. Most BDD frameworks don’t enforce step type order, and if they do, steps could be re-typed to work. (This approach is the most pragmatic but least pure.)
  3. Embed assertions in Given and When steps. This strategy avoids duplicate When-Then pairs and ensures validations are still performed. Each step along the way is validated for correctness with explicit Gherkin text. However, it may require a number of new steps.
  4. Treat the sequence of behaviors as a unique, separate behavior. This is the best way to think about lengthy end-to-end scenarios because it reinforces behavior-driven thinking. A lengthy scenario adds value only if it can be justified as a uniquely separate behavior. The scenario should then be written to highlight this uniqueness. Otherwise, it’s not a scenario worth having. These scenarios will often be very declarative and high-level.
  5. Ditch the BDD framework and write them purely in the automation programming. Gherkin is meant for collaboration about behaviors, while lengthy end-to-end tests are meant exclusively for intense QA work. Biz roles will write behavior specs but will never write end-to-end tests. Forcing behavior specification on lengthy end-to-end scenarios can inhibit their development. A better practice could be coexistence: acceptance tests could be written with Gherkin, while lengthy end-to-end tests could be written in raw programming. Automation for both test sets could still nevertheless share the same automation code base – they could share the same support modules and even step definition methods.

Pick the approach that best meets the team’s needs.

BDD 101: Manual Testing

Behavior-driven development takes an automation-first philosophy: behavior specs should become automated tests. However, BDD can also accommodate manual testing. Manual testing has a place and a purpose, even in BDD. Remember, behavior scenarios are first and foremost behavior specifications, and they provide value beyond testing and automation. Any behavior scenario could be run as a manual test. The main questions, then, are (1) when is manual testing appropriate and (2) how should it be handled.

(Check the Automation Panda BDD page for the full BDD 101 table of contents.)

When is Manual Testing Appropriate?

Automation is not a silver bullet – it doesn’t satisfy all testing needs. Scenarios should be written for all behaviors, but they likely shouldn’t be automated under the following circumstances:

  • The return-on-investment to automate the scenarios is too low.
  • The scenarios won’t be included in regression or continuous integration.
  • The behaviors are temporary (ex: hotfixes).
  • The automation itself would be too complex or too fragile.
  • The nature of the feature is non-functional (ex: performance, UX, etc.).
  • The team is still learning BDD and is not yet ready to automate all scenarios.

Manual testing is also appropriate for exploratory testing, in which engineers rely upon experience rather than explicit test procedures to “explore” the product under test for bugs and quality concerns. It complements automation because both testing styles serve different purposes. However, behavior scenarios themselves are incompatible with exploratory testing. The point of exploring is for engineers to go “unscripted” – without formal test plans – to find problems only a user would catch. Rather than writing scenarios, the appropriate way to approach behavior-driven exploratory testing is more holistic: testers should assume the role of a user and exercise the product under test as a collection of interacting behaviors. If exploring uncovers any glaring behavior gaps, then new behavior scenarios should be added to the catalog.

How Should Manual Testing Be Handled?

Manual testing fits into BDD in much the same way as automated testing because both formats share the same process for behavior specification. Where the two ways diverge is in how the tests are run. There are a few special considerations to make when writing scenarios that won’t be automated.

Repository

Both manual and automated behavior scenarios should be stored in the same repository. The natural way to organize behaviors is by feature, regardless of how the tests will be run. All scenarios should also be managed by some form of version control.

Furthermore, all scenarios should be co-located for document-generation tools like Pickles. Doc tools make it easy to expose behavior specs and steps to everyone. They make it easier for the Three Amigos to collaborate. Non-technical people are not likely to dig into programming projects.

Tags

Scenarios must be classified as manual or automated. When BDD frameworks run tests, they need a way to exclude tests that are not automated. Otherwise, test reports would be full of errors! In Gherkin, scenarios should be classified using tags. For example, scenarios could be tagged as either “@manual” or “@automated”. A third tag, “@automatable”, could be used to distinguish scenarios that are not yet automated but are targeted for automation.

Some BDD frameworks have nifty features for tags. In Cucumber-JVM, tags can be set as runner class options for convenience. This means that tag options could be set to “~@manual” to avoid manual tests. In SpecFlow, any scenario with the special “@ignore” tag will automatically be skipped. Nevertheless, I strongly recommend using custom tags to denote manual tests, since there are many reasons why a test may be ignored (such as known bugs).

Extra Comments

The conciseness of behavior scenarios is problematic for manual testing because steps don’t provide all the information a tester may need. For example, test data may not be written explicitly in the spec. The best way to add extra information to a scenario is to add comments. Gherkin allows any number of lines for comments and description. Comments provide extra information to the reader but are ignored by the automation.

It may be tempting to simply write new Gherkin steps to handle the extra information for manual testing. However, this is not a good approach. Principles of good Gherkin should be used for all scenarios, regardless of whether or not the scenarios will be automated. High-quality specification should be maintained for consistency, for documentation tools, and for potential future automation.

An Example

Below is a feature that shows how to write behavior scenarios for manual tests:

Feature: Google Searching

  @automated
  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  @manual
  Scenario: Image search
    # The Google home page URL is: http://www.google.com/
    # Make sure the images shown include pandas eating bamboo
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

It’s not really different from any other behavior scenarios.


As stated in the beginning, BDD should be automation-first. Don’t use the content of this article to justify avoiding automation. Rather, use the techniques outlined here for manual testing only as needed.


Test Automation Myth-Busting

Test automation is a vital part of software quality assurance, now more than ever. However, it is a discipline that is often poorly understood. I’ve heard all sorts of crazy claims about automation throughout my career. This post debunks a number of commonly held but erroneous beliefs about automation.

Myth #1: Every test should be automated.

“100% automation” seems to be a new buzz-phrase. If automation is so great, why not automate every test? Not every test is worth automating in terms of return-on-investment. Automation requires significant expertise to design, implement, and maintain. There are limits to how many tests a team can reasonably produce and manage. Furthermore, not all tests are equal. Some require more effort to handle, or may not be run as frequently, or cover less important features. Just because a test could be automated does not mean that it should be automated. Using a risk-based test strategy, tests to automate should be prioritized by highest ROI.

Automated testing does not completely replace manual testing, either. Automated testing is defensive: it protects a code line by consistently running scripted tests for core functionality. However, manual testing is offensive: it uses human expertise to explore features off-script, test-to-break, and evaluate holistic quality. Returns-on-investment for the same tests are often opposites between automated and manual approaches. Automated and manual testing together fulfill vital, complementary roles.

Myth #2: Automation means we can downsize QA.

Executives often see test automation as a way to automate QA out of a job. This is simply not true: Automation makes QA jobs more efficient and all the more necessary. Automation is software and thus requires strong software development skills. It also requires extra tools, processes, and labor to maintain. The benefit is that more tests can be run more quickly. QA jobs won’t vanish due to automation – they simply assume new responsibilities.

Myth #3: Automation will catch all bugs.

By their very nature, automated tests are “scripted” – each test always follows the same pre-programmed steps. This is a very good thing for catching regression bugs, but it inherently cannot handle new, unforeseen situations. That’s why manual, exploratory testing is needed. Automation, being software, may also have its own bugs. Automation is not a silver bullet.

Myth #4: Automation must be written in the same language as the product code.

Automation must be written in the same programming language as the product code for white-box unit tests. However, any programming language may be used for black-box functional tests. Black-box functional tests (like integration and end-to-end tests) test a live product. There’s no direct connection between the automation code and the product code. For example, a web app could have a REST service layer written in Java, a Web UI frontend written in .NET and JavaScript, and test automation written in Python using requests and Selenium WebDriver. It may be helpful to write automation in the same language as the product so that developers can more easily contribute, but it is not required. Choose the best language for test automation to meet the present needs.

Myth #5: All tests should be such-and-such-level tests.

This argument varies by product type and team. For web apps, it could be phrased as, “All tests should be web UI tests, since that’s how the user interacts with the product.” This is nonsense – different layers of testing mitigate risk at their optimal returns-on-investment. The Testing Pyramid applies this principle. Consider a web app with a service layer in terms of automation – service calls have faster response times and are more reliable than web UI interactions. It would likely be wise to test input combinations at the service layer while focusing on UI-specific functionality only at the web layer.

Myth #6: Unit tests aren’t necessary because the QA team does the testing.

The existence of a QA team or of a black-box automated test suite does not negate the need for unit tests. Unit tests are an insurance policy – they make sure the software programming is fundamentally working. In continuous integration, they make sure builds are good. They are essential for good software development. Many times, I caught bugs in my own code through writing unit tests – bugs that nobody else ever saw because I fixed them before committing code. Personally, I would never want to work on a product without strong unit tests.

Myth #7: We can complete a user story this sprint and automate its tests next sprint.

In Agile Scrum, teams face immense pressure to finish user stories within a sprint. Test automation is often the last part of a story to be done. If the automation isn’t completed by the end of the sprint, teams are tempted to mark the story as complete and finish the test automation in the future. This is a terrible mistake and a violation of Agile principles. Test automation should be included in the definition of done. A story isn’t complete without its prescribed tests! Punting tests into the next sprint merely builds technical debt and forces QA into constant catch-up. To mitigate the risk of incomplete stories, teams should size stories to include automation work, shift left to start QA sooner, or reduce the total sprint commitment size. Incomplete test automation often happens when product code is delivered late or a team’s capacity is overestimated.

Myth #8: Automation is just a bunch of “test scripts.”

It’s quite common to hear developers or managers refer to automated tests as “test scripts.” While this term itself is not inherently derogatory, it oversimplifies the complexity of test automation. “Scripts” sound like short, hacky sequences of commands to do system dirty-work. Test automation, however, is a full stack: in addition to the product under test, automation involves design patterns, dependency packages, development processes, version control, builds, deployments, reporting, and failure triage. Referring to test automation as “scripting” leads to chronic planning underestimations. Automation is a discipline, and the investment it requires should be honored.


Do you have any other automation myths to debunk? Share them in the comment section below!

BDD 101: Test Data

How should test data be handled in a behavior-driven test framework? This is a common question I hear from teams working on BDD test automation. A better question to ask first is, What is test data? This article will explain different types of test data and provide best practices for handling each. The strategies covered here can be applied to any BDD test framework. (Check the Automation Panda BDD page for the full table of contents.)

Types of Test Data

Personally, I hate the phrase “test data” because its meaning is so ambiguous. For functional test automation, there are three primary types of test data:

  1. Test Case Values. These are the input and expected output values for test cases. For example, when testing calculator addition “1 + 2 = 3”, “1” and “2” would be input values, and “3” would be the expected output value. Input values are often parameterized for reusability, and output values are used in assertions.
  2. Configuration Data. Config data represents the system or environment in which the tests run. Changes in config data should allow the same test procedure to run in different environments without making any other changes to the automation code. For example, a calculator service with an addition endpoint may be available in three different environments: development, test, and production. Three sets of config data would be needed to specify URLs and authentication in each environment (the config data), but 1 + 2 should always equal 3 in any environment (the test case values).
  3. Ready State. Some tests require initial state to be ready within a system. “Ready” state could be user accounts, database tables, app settings, or even cluster data. If testing makes any changes, then the data must be reverted to the ready state.

Each type of test data has different techniques for handling it.

Test Case Values

There are four main ways to specify test case values in BDD frameworks, ranging from basic to complex.

In The Specs

The most basic way to specify test case values is directly within the behavior scenarios themselves! The Gherkin language makes it easy – test case values can be written into the plain language of a step, as step parameters, or in Examples tables. Consider the following example:

Scenario Outline: Simple Google searches
  Given a web browser is on the Google page
  When the search phrase "<phrase>" is entered
  Then results for "<phrase>" are shown
  
  Examples: Animals
    | phrase   |
    | panda    |
    | elephant |
    | rhino    |

The test case value used is the search phrase. The When and Then steps both have a parameter for this phrase, which will use three different values provided by the Examples table. It is perfectly suitable to put these test case values directly into the scenario because the values are small and descriptive.

Furthermore, notice how specific result values are not specified for the Then step. Values like “Panda Express” or “Elephant man” are not hard-coded. The step wording presumes that the step definition will have some sort of programmed mechanism for checking that result links relate to the search phrase (likely through regular expression matching).

Key-Value Lookup

Direct specification is great for small sets of simple values, but one size does not fit all needs. Key-value lookups are appropriate when test data is lengthier. For example, I’ve often seen steps like this:

Given the user navigates to "http://www.somewebsite.com/long/path/to/the/profile/page"

URLs, hexadecimal numbers, XML blocks, and comma-separated lists are all the usual suspects. While it is not incorrect to put these values directly into a step parameter, something like this would be more readable:

Given the user navigates to the "profile" page

Or even:

Given the user navigates to their profile page

The automation would store URLs in a lookup table so that these new steps could easily fetch the URL for the profile page by name. These steps are also more declarative than imperative and better resist changes in the underlying environment.
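
Below is a minimal sketch of such a lookup table; the class name, keys, and URLs are illustrative only.

import java.util.HashMap;
import java.util.Map;

public class PageUrls {

  private static final Map<String, String> URLS = new HashMap<>();

  static {
    URLS.put("profile", "http://www.somewebsite.com/long/path/to/the/profile/page");
    URLS.put("home", "http://www.somewebsite.com/");
  }

  // Returns the full URL for a page named in a Gherkin step
  public static String get(String pageName) {
    return URLS.get(pageName);
  }
}

The “profile” page step definition would then do something like driver.get(PageUrls.get(pageName)).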

Another way to use key-value lookup is to refer to a set of values by one name. Consider the following scenario for entering an address:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user sets the street address to "<street>"
  And the user sets the second address line to "<second>"  
  And the user sets the city to "<city>"
  And the user sets the state to "<state>"
  And the user sets the zipcode to "<zipcode>"
  And the user sets the country to "<country>"
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | street | second | city | state | zipcode | country |
    ...

An address has a lot of fields. Specifying each in the scenario makes it very imperative and long. Furthermore, if the scenario is an outline, the Examples table can easily extend far to the right, off the page. This, again, is not readable. This scenario would be better written like this:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user enters the "<address-type>" address
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | address-type |
    | basic        |
    | two-line     |
    | foreign      |

Rather than specifying all the values for different addresses, this scenario names the classifications of addresses. The step definition can be written to link the name of the address class to the desired values.

Data Files

Sometimes, test case values should be stored in data files apart from the specs or the automation code. Reasons could be:

  • The data is simply too large to reasonably write into Gherkin or into code.
  • The data files may be generated by another tool or process.
  • The values are different between environments or other circumstances.
  • The values must be selected or switched at runtime (without re-compiling code).
  • The files themselves are used as payloads (ex: REST request bodies or file upload).

Scenario steps can refer to data files using the key-value lookup mechanisms described above. Lightweight, text-based, tabular file formats like CSV, XML, or JSON work the best. They can be parsed easily and efficiently, and changes to them can easily be diff’ed. Microsoft Excel files are not recommended because they have extra bloat and cannot be easily diff’ed line-by-line. Custom text file formats are also not recommended because custom parsing is an extra automation asset requiring unnecessary development and maintenance. Personally, I like using JSON because its syntax is concise and its parsing tools seem to be the simplest in most programming languages.
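
For instance, the named address sets from the previous section could live in a JSON data file and be fetched by key at runtime. The sketch below again assumes the Jackson library and an illustrative file path:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class AddressData {

  // Hypothetical data file keyed by address type ("basic", "two-line", "foreign", ...)
  private static final String DATA_FILE = "src/test/resources/data/addresses.json";

  public static JsonNode forType(String addressType) throws IOException {
    JsonNode allAddresses = new ObjectMapper().readTree(new File(DATA_FILE));
    return allAddresses.get(addressType);
  }
}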

External Sources

An external dependency exists when the data for test case values exists outside of the automation code base. For example, test case values could reside in a database instead of a CSV file, or they could be fetched from a REST service instead of a JSON file. This would be appropriate if the data is too large to manage as a set of files or if the data is constantly changing.

As a word of caution, external sources should be used only if absolutely necessary:

  1. External sources introduce an additional point-of-failure. If that database or service goes down, then the test automation cannot run.
  2. External sources degrade performance. It is slower to get data from a network connection than from a local machine.
  3. Test case values are harder to audit. When they are in the specs, the code, or data files, history is tracked by version control, and any changes are easy to identify in code reviews.
  4. Test case values may be unpredictable. The automation code base does not control the values. Bad values can fail tests.

External sources can be very useful, if not necessary, for performance / stress / load / limits testing, but they are not necessary for the vast majority of functional testing. It may be convenient to mock external sources with either a mocking framework like Mockito or with a dummy service.
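
For instance, a data source that would normally be reached over the network could be mocked with Mockito so that tests stay fast and deterministic. The interface and values below are hypothetical:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.Arrays;
import java.util.List;

public class MockedDataSourceExample {

  // Hypothetical interface that the automation would normally call over the network
  public interface SearchPhraseSource {
    List<String> getPhrases();
  }

  public static SearchPhraseSource mockedSource() {
    SearchPhraseSource source = mock(SearchPhraseSource.class);
    when(source.getPhrases()).thenReturn(Arrays.asList("panda", "elephant", "rhino"));
    return source;
  }
}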

Configuration Data

Config data pertains to the test environments, not the test cases. Test automation should never contain hard-coded values for config data like URLs, usernames, or passwords. Rather, test automation should read config data when it launches tests and make references to the required values. This should be done in Before hooks and not in Gherkin steps. In this way, automated tests can run against any configuration, such as the different test environments a product passes through before release to production.

Config data can be stored in data files or accessed through some other dependency. (Read the previous section for pros and cons of those approaches.) The config to use should be somehow dynamically selectable when tests run. For example, the path to the config file to use could be provided as a command line argument to the test launch command.

Config data can be used to select test values to use at runtime. For example, different environments may need different test value data files. Conversely, scenario tagging can control what parts of config data should be used. For example, a tag could specify a username to use for the scenario, and a Before hook could use that username to fetch the right password from the config data.

For efficiency, only the necessary config data should be accessed or read into memory. In many cases, fetching the config data should also be done once globally, rather than before each test case.

Ready State

All scenarios have a starting point, and often, that starting point involves data. Setup operations must bring the system into the ready state, and cleanup operations must return the system to the ready state. Test data should leave no trace – temporary files should be deleted and records should be reverted. Otherwise, disk space may run out or duplicate records may fail tests. Maintaining the ready state between tests is necessary for true test independence.

During the Test Run

Simple setup and cleanup operations may be done directly within the automation. For example, when testing CRUD operations, records must be created before they can be retrieved, updated, or deleted. Setup would create a record, and cleanup would guarantee the record’s deletion. If the setup is appropriate to mention as part of the behavior, then it should be written as Given steps. This is true of CRUD operations: “Given a record has been created, When it is deleted, …”. If multiple scenarios share this same setup, then those Given steps should be put into a Background section.

However, sometimes setup details are not pertinent to the behavior at hand. For example, perhaps fresh authentication tokens must be generated for those CRUD calls. Those operations should be handled in Before hooks. The automation will take care of it, while the Gherkin steps can focus exclusively on the behavior.

No matter what, After hooks must do cleanup. It is incorrect to write final Then steps to do cleanup. Then steps should verify outcomes, not take more actions. Plus, the final Then steps will not be run if the test has a failure and aborts!
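
As a sketch, a step definition class could track the records it creates and guarantee their deletion in an After hook. The record-tracking details below are hypothetical:

import cucumber.api.java.After;

import java.util.ArrayList;
import java.util.List;

public class RecordSteps {

  // IDs of records created by the When steps during the scenario (hypothetical)
  private final List<String> createdRecordIds = new ArrayList<>();

  @After("@crud")
  public void deleteCreatedRecords() throws Throwable {
    // Runs even if the scenario fails, so no test records are left behind
    for (String id : createdRecordIds) {
      // ... call the record API (hypothetical) to delete the record by id ...
    }
    createdRecordIds.clear();
  }
}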

External Preparation

Some data simply takes too long to set up fresh for each test launch. Consider complicated user accounts or machine learning data: these are things that can be created outside of the test automation. The automation can simply presume that they exist as a precondition. These types of data require tool automation to prepare. Tool automation could involve a set of scripts to load a database, make a bunch of service calls, or navigate through a web portal to update settings. Automating this type of setup outside of the test automation enables engineers to more easily replicate it across different environments. Then, tests can run in much less time because the data is already there.

However, this external preparation must be carefully maintained. If any damage is done to the data, then test case independence is lost. For example, deleting a user account without replacing it means that subsequent test runs cannot log in! Along with setup tools, it is important to create maintenance tools to audit the data and make repairs or updates.

Advice for Any Approach

Use the minimal amount of test data necessary to test the functionality of the product under test. More test data requires more time to develop and manage. As a corollary, use the simplest approach that can pragmatically handle the test data. Avoid external dependencies as much as possible.

To minimize test data, remember that BDD is specification by example: scenarios should use descriptive values. Furthermore, variations should be reduced to input equivalence classes. For example, in the first scenario example on this page, it would probably be sufficient to test only one of those three animals, because the other two animals would not exhibit any different searching behavior.

Finally, be cautioned against randomization in test data. Functional tests are meant to be deterministic – they must always pass or fail consistently, or else test results will not be reliable. (Not only could this drive a tester crazy, but it would also break a continuous integration system.) Using equivalence classes is the better way to cover different types of inputs. Use a unique number counting mechanism whenever values must be unique.
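
A minimal sketch of such a mechanism is below; the class and prefix scheme are illustrative, and the values stay unique without resorting to random test data.

import java.util.concurrent.atomic.AtomicLong;

public class UniqueNames {

  // Seeded with the launch time so values are also unique across test runs
  private static final AtomicLong COUNTER = new AtomicLong(System.currentTimeMillis());

  // Returns values like "user-1537802096123", "user-1537802096124", ...
  public static String next(String prefix) {
    return prefix + "-" + COUNTER.incrementAndGet();
  }
}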

For handling unpredictable test data, check out Unpredictable Test Data.