To Infinity and Beyond: A Guide to Parallel Testing

Are your automated tests running in parallel? If not, then they probably should be. Together with continuous integration, parallel testing is the best way to fail fast during software development and ultimately enforce higher software quality. Switching tests from serial to parallel execution, however, is not a simple task. Tests themselves must be designed to run concurrently without colliding, and extra tools and systems are needed to handle the extra stress. This article is a high-level guide to good parallel testing practices.

What is Parallel Testing?

Parallel testing means running multiple automated tests simultaneously to shorten the overall start-to-end runtime of a test suite. For example, if 10 tests take a total of 10 minutes to run, then 2 parallel processes could execute 5 tests each and cut the total runtime down to 5 minutes. Even better, 10 processes could execute 1 test each to shrink runtime to 1 minute. Parallel testing is usually managed by either a test framework or a continuous integration tool. It also requires more compute resources than serial testing.

Why Go Parallel?

Running automated tests in parallel does require more effort (and potentially cost) than running tests serially. So, why go through the trouble?

The answer is simple: time. It is well documented that software bugs cost more when they are discovered later. That’s why current development practices like Agile and BDD strive to avoid problems from the start through small iterations and healthy collaboration (“shift left”), while CI/CD defensively catches regressions as soon as they happen (“fail fast”). Reducing the time to discover a problem after it has been introduced means higher quality and higher productivity.

Ideally, a developer should be told if a code change is good or bad immediately after committing it. The change should automatically trigger a new build that runs all tests. Unfortunately, tests are not instantaneous – they could take minutes, hours, or even days to complete. A test automation strategy based on the Testing Pyramid will certainly shorten start-to-end execution time but will likely still require parallelization. Consider the layers of the Testing Pyramid and their tests’ average runtimes – the Testing Pyramid Rule of 1’s:

The Testing Pyramid Rule of 1’s:
  • Unit tests: ~1 millisecond each
  • Integration tests: ~1 second each
  • End-to-end tests: ~1 minute each

Each layer is listed above with the rough runtime of a typical test. Though actual runtimes will vary, the Rule of 1’s focuses on orders of magnitude. Unit tests typically run in milliseconds because they often exercise product code in memory. Integration tests exercise live products but are limited in scope and often cover low-level areas (like REST service calls). End-to-end tests, however, cover full paths through a live system, which requires extra setup and waiting (like Selenium WebDriver interaction).

Now, consider how many tests from each layer could be run within given time limits, if the tests are run serially:

Test Layer     1 Minute    10 Minutes (Coffee Break)    1 Hour (There Goes Today)
Unit           60,000      600,000                      3,600,000
Integration    60          600                          3,600
End-to-End     1           10                           60

Unit test numbers look pretty good, though keep in mind 1 millisecond is often the best-case runtime for a unit test. Integration and end-to-end runtimes, however, pose a more pressing problem. It is not uncommon for a project to have thousands of above-unit tests, yet not even a hundred end-to-end tests could complete within an hour, nor could a thousand integration tests complete within 10 minutes. Now, consider two more facts: (1) tests often run as different phases in a CI pipeline, so total runtimes are stacked, and (2) multiple commits would trigger multiple builds, which could cause a serious backup. Serial test execution would starve engineering feedback in any continuous integration system of scale. A team would need to drastically shrink test coverage or give up on being truly “continuous” in favor of running tests daily or weekly. Neither alternative is acceptable these days. CI needs parallel testing to be truly continuous.

The Danger of Collisions

The biggest danger for parallel testing is collision – when tests interfere with each other, causing invalid test failures. Collisions may happen in the product under test if product state is manipulated by more than one test at a time, or they may happen in the automation code itself if the code is not thread-safe. Collisions are also inherently intermittent, which makes them all the more difficult to diagnose. As a design principle, automated tests must avoid collisions for correct parallel execution.

Making tests run in parallel is not as simple as flipping a switch or adding a new config file. Automated tests must be specifically designed to run in parallel. A team may need to significantly redevelop their automation code to make parallel execution work right.


A train collision in Iran in November 2016. Don’t let this happen to your tests!

Handling Product-Level Collisions

Product-level collisions essentially reduce to how environments are set up and handled.

Separate Environments

The most basic way to avoid product-level collisions would be to run each test thread or process against its own instance of the product in an exclusive environment. (In the most extreme case, every single test could have its own product instance.) No collisions would happen in the product because each product instance would be touched by only one test instance at a time. Separate environments can be implemented using various configuration and deployment tools. Docker containers are quick and easy to spin up. VMs with Vagrant, Puppet, Chef, and/or Ansible can also get it done.
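
For illustration, here is a minimal sketch of per-worker environments in Python, assuming pytest with the pytest-xdist plugin for parallel workers, Docker installed locally, and a hypothetical product image named “myapp”:

    import os
    import subprocess
    import pytest

    @pytest.fixture(scope="session")
    def app_url():
        # one product container per parallel test worker
        worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
        name = f"myapp-{worker}"
        # publish container port 80 on a random free host port so workers never clash
        subprocess.run(
            ["docker", "run", "-d", "--rm", "--name", name, "-p", "80", "myapp"],
            check=True)
        mapping = subprocess.run(
            ["docker", "port", name, "80"],
            check=True, capture_output=True, text=True).stdout
        port = mapping.splitlines()[0].rsplit(":", 1)[-1]
        yield f"http://localhost:{port}"
        # tear the environment down when the worker finishes
        subprocess.run(["docker", "stop", name], check=True)

Each worker gets its own product instance at its own URL, so no two workers ever touch the same product state.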

However, it may not always be sensible to make separate environments for each test thread/process:

  • Creating a new environment is inefficient – the extra setup time may cancel out any time saved from parallel execution.
  • Many projects simply don’t have the money or the compute resources to handle a massive scale-out.
  • Some tests may not cause collisions and therefore may not need total isolation.
  • Some product environments are extremely large and complicated and would not be practical to replicate for each test individually.

Shared Environments

Environments with a shared product instance are quite common. The shared instance could be a common environment that everyone on a team shares, or it could be freshly created during a CI run and accessed by multiple test threads/processes. Either way, product-level collisions are possible, and tests must be designed to avoid clashing product states. Any test covering a persistent state is vulnerable; usually, this is the vast majority of tests. Consider web app testing as an example. Tests to load a page and do some basic interactions can probably run in parallel without extra protection, but tests that use a login to enter data or change settings could certainly collide. In this case, collisions could be avoided by giving each simultaneous test instance a different login – either from a pool of logins, a unique login per test case, or a unique login per thread/process. Each product is different and will require different strategies for avoiding collisions.
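
As a minimal sketch of the login-per-process strategy, assuming Python with pytest-xdist and hypothetical pre-provisioned accounts named after each worker:

    import os
    import pytest

    @pytest.fixture(scope="session")
    def login():
        # pytest-xdist names its workers gw0, gw1, ...; serial runs get "master"
        worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
        username = f"testuser_{worker}"              # hypothetical account naming scheme
        password = os.environ["TEST_USER_PASSWORD"]  # supplied by CI, not hardcoded
        return username, password

    def test_change_settings(login):
        username, password = login
        # sign in with this worker's exclusive account, so settings
        # changes cannot collide with tests running on other workers
        ...

Because every worker owns its account, simultaneous tests never clash over the same user’s data.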


We all share certain environments. Take care of them when you do. (Photo: The Blue Marble, taken by the Apollo 17 crew on Dec 7, 1972)

Handling Automation-Level Collisions

Automation-level collisions can happen when automation code is not thread-safe, and thread safety involves more than simply locks and semaphores.

#1: Test Independence

Test cases must be completely independent of each other. One test must not require another test to run before it for the sake of setup. A test case should be able to run by itself without any others. A test suite should be able to run successfully in random order.
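
As a minimal sketch in Python with pytest, each test builds its own setup through a fixture instead of depending on a previous test (the “db” fixture here is hypothetical):

    import pytest

    @pytest.fixture
    def saved_record(db):
        # each test creates its own record; no test relies on another test's leftovers
        return db.insert({"name": "widget"})

    def test_update_record(db, saved_record):
        db.update(saved_record, name="gadget")
        assert db.get(saved_record)["name"] == "gadget"

This test passes whether it runs alone, in random order, or in parallel, because its setup never comes from another test.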

#2: Proper Variable Scope

If parallel tests will be run in the same memory address space, then it is imperative to properly scope all variables. Global or static mutable variables (e.g., “non-constants”) must not be allowed because they could be changed unexpectedly. The best pattern for handling scope is dependency injection. Thread-safe singletons would be a second choice. (Typically, global or static variables are used to subvert design patterns, so they may reveal further necessary automation rework when discovered.)
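
Here is a minimal sketch of the difference in Python, assuming pytest:

    import pytest

    # BAD: a mutable module-level variable; any parallel test may overwrite it
    current_user = None

    # BETTER: inject the dependency, so each test owns an isolated instance
    @pytest.fixture
    def user():
        return {"name": "pat", "logged_in": False}

    def test_login(user):
        user["logged_in"] = True  # this mutation is local to this test
        assert user["logged_in"]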

#3: External Resources

Automation may sometimes interact with external resources, such as test config files or test result databases/services. Make sure no external interactions collide. For example, make sure test run updates don’t overwrite each other.
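
For example, a minimal sketch in Python, assuming pytest-xdist: give each worker its own output file so that result writes never overwrite each other:

    import os

    def results_path(base_dir="results"):
        # a separate results file per worker avoids write collisions
        worker = os.environ.get("PYTEST_XDIST_WORKER", "master")
        os.makedirs(base_dir, exist_ok=True)
        return os.path.join(base_dir, f"results-{worker}.json")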

#4: Logging

Logs are very difficult to trace when multiple tests are simultaneously printed to the same file. The best practice is to generate separate log files for each test case, thread, or process to make them readable.
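
A minimal sketch in Python, assuming pytest and the standard logging module – an autouse fixture gives every test its own log file:

    import logging
    import pytest

    @pytest.fixture(autouse=True)
    def per_test_log(request, tmp_path):
        # one log file per test case, so parallel output never interleaves
        logger = logging.getLogger(request.node.name)
        logger.setLevel(logging.DEBUG)  # capture everything for this test
        handler = logging.FileHandler(tmp_path / "test.log")
        logger.addHandler(handler)
        yield logger
        logger.removeHandler(handler)
        handler.close()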

#5: Result Aggregation

A test suite is a unified collection of tests, no matter how many threads/processes are used to run its tests in parallel. Make sure test results are aggregated together into one report. Some frameworks will do this automatically, while others will require custom post-processing.
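
As a minimal sketch of custom post-processing in Python: merge per-worker JUnit XML files (hypothetically written to a “results” directory) into a single report:

    import glob
    import xml.etree.ElementTree as ET

    def merge_reports(pattern="results/results-*.xml", out="results/all.xml"):
        merged = ET.Element("testsuites")
        for path in glob.glob(pattern):
            root = ET.parse(path).getroot()
            # accept either a bare <testsuite> or a <testsuites> wrapper
            for suite in ([root] if root.tag == "testsuite" else list(root)):
                merged.append(suite)
        ET.ElementTree(merged).write(out, encoding="utf-8", xml_declaration=True)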

#6: Test Filtering

One strategy to avoid collisions may be to run non-colliding partitions (subsets) of tests in parallel. Test tagging and filtering would make this possible. For example, tests that require a special login could be tagged as such and run together on one thread.
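
A minimal sketch with pytest markers (the “special_login” tag is hypothetical; register it in pytest.ini to avoid warnings):

    import pytest

    @pytest.mark.special_login
    def test_admin_settings():
        ...

    # Run the colliding partition serially on one node:
    #     pytest -m special_login
    # Run everything else in parallel elsewhere (with pytest-xdist):
    #     pytest -m "not special_login" -n 4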

Test Scalability

The previous section on collisions discussed how to handle product environments. It is also important to consider how to handle the test automation environment. These are two different things: the product environment contains the live product under test, while the test environment contains the automation software and resources that run tests against the product. The test environment is where the parallel tests will be executed, and, as such, it must be scalable to handle the parallelization. A common example of a test environment could be a Jenkins master with a few agents for running build pipelines. There are two primary ways to scale the test environment: scale-up and scale-out.

Parallel Scale-Up

Scale-up is when one machine is configured to handle more tests in parallel. For example, scale-up would be when a machine switches from one (serial) thread to two, three, or even more in parallel. Many popular test runners support this type of scale-up by spawning and joining threads in a common memory address space or by forking processes. (For example, the SpecFlow+ Runner lets you choose.)
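
As a minimal illustration of the threading flavor in Python (real runners like pytest-xdist handle this far more robustly):

    from concurrent.futures import ThreadPoolExecutor

    def run_suite(tests, threads=4):
        # spawn and join threads in a common memory address space
        with ThreadPoolExecutor(max_workers=threads) as pool:
            return list(pool.map(lambda test: test(), tests))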

Scale-up is a simple way to squeeze as much utility out of an existing machine as possible. If tests are designed to handle collisions, and the test runner has out-of-the-box support, then it’s usually pretty easy to add more test threads/processes. However, parallel test scale-up is inherently limited by the machine’s capacity. Each additional test process succumbs to the law of diminishing returns as more memory and processor cycles are used. Eventually, adding more threads will actually slow down test execution because the processor(s) will waste time constantly switching between tests. (Anecdotally, I found the optimal test-thread-to-processor ratio to be 2-to-1 for running C#/SpecFlow/Selenium-WebDriver tests on Amazon EC2 M4 instances.) A machine itself could be upgraded with more threads and processors, but nevertheless, there are limits to a single machine’s maximum capacity. Weird problems like TCP/IP port exhaustion may also arise.

Scale-up adds more threads to one machine.

Parallel Scale-Out

Scale-out is when multiple machines are configured to run tests in parallel. Whereas scale-up had one machine running multiple tests, scale-out has multiple machines each running tests. Scale-out can be achieved in a number of ways. A few examples are:

  • One master test execution machine launches multiple Web UI tests that each use a remote Selenium WebDriver with a service like Selenium Grid, Sauce Labs, or BrowserStack (see the sketch after this list).
  • A Jenkins pipeline launches tests across ten agents in parallel, in which each agent executes a tenth of the tests independently.
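
As a minimal sketch of the first example in Python, assuming the selenium package and a Selenium Grid hub at a hypothetical URL:

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(
        command_executor="http://grid.example.com:4444/wd/hub",
        options=options)
    try:
        driver.get("http://www.example.com")  # the grid routes the session to a remote node
    finally:
        driver.quit()

The test machine stays lightweight – the browsers themselves run on the grid’s nodes.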

Scale-out is a better long-term solution than scale-up because scale-out can handle an unlimited number of machines for parallel testing. The limiting factor with scale-out is not the maximum capacity of the hardware but rather the cost of running more machines. However, scale-out is much harder to implement than scale-up. It requires tests to be evenly divided with some sort of balancer and filter. It also requires some sort of test result aggregation for joint reporting – people won’t want to piece together a bunch of separate reports to get an overall snapshot of quality. Plus, the test environment is more complicated to build and maintain (though tools like CloudBees Jenkins Enterprise or Amazon EC2 can make it easier).

Scale-out distributes tests across multiple machines.

Upwards and Outwards

Of course, scale-up and scale-out are not mutually exclusive. Scaled-out nodes could individually be scaled-up. Consider a test environment with 10 powerful VMs that could each handle 10 tests in parallel – that means 100 tests could run simultaneously. Using the Rule of 1’s, it would take only about a minute to run 100 Web UI tests, which serially would have taken over an hour and a half! Use both strategies to shorten start-to-end runtime as much as possible.


Parallel testing is a worthwhile endeavor. When done properly, it will not only reduce development time but also improve the development experience. For readers who want to start doing parallel testing, I recommend researching the tools and frameworks you want to use. Many popular test frameworks support parallel execution, and even if the one you choose doesn’t, you can always invoke tests in parallel from the command line. Do well!

The Spark: What Makes Coders Great

I first started programming back in 2002. In fact, I stumbled into it unintentionally. My high school, Parkville High School, required all students in their Magnet Program for Mathematics, Science, and Computer Science to have a TI-83 Plus graphing calculator. As an incoming freshman, I considered a graphing calculator a big luxury – I had spent my entire 8th grade Algebra I class without one, completing all problems by hand. Embarrassingly, it took me ten minutes to figure out how to turn it off the first time! I presumed that my new calculator would be used exclusively for math classes, but in the first few weeks of my computer class, we started programming the calculator. I’m sure we wrote some sort of basic “Hello World” print program, but we quickly moved on to programming math formulas.

It blew my mind.

At the time, I didn’t know what “computer programming” was. In fact, I didn’t even like computers very much. And yet there I was, thirteen years old, telling a calculator how to automatically solve my math problems for me. It was one of the greatest thrills I had ever experienced. I could command a machine to do cool things and make my life easier. I quickly started writing programs outside of class for every math formula I could find: areas, volumes, circumferences, the Quadratic Formula, the Pythagorean Theorem – and I shared my formulas with my classmates. Then, I moved on to calculator games. At the end of the year, our class started programming simple graphics and animations in C++, for which I made a fireworks show.

Something just clicked for me and coding. It all made sense. I could solve any problem. I could make any feature. I could teach myself how to do anything. And doing it brought on this wonderful euphoria – the “coder’s high” – even stronger than the feelings of Christmas morning or playing video games. For me, coding was practically addictive.

When my mom told me that there were well-paying careers in software, I never looked back. I took my first Java programming course as a sophomore (which I still consider my “mother language”) and then AP Computer Science AB as a junior (which was the first year it was offered in Java; I scored a 5/5). I went to RIT for college, where I graduated with a combined BS/MS in Computer Science in 2010. The rest is my professional history.

There were many times along the way that I doubted my path. There were times in high school that my code simply wouldn’t compile or run and I had no idea why (in the dark days before Stack Overflow). There were times at RIT when I felt like the most computer-illiterate student in my sink-or-swim program, and I even considered switching to a math major. There were times when the corporate grind was so tough I considered dropping back into academia. But, every time I doubted myself, I remembered what inspired me to pursue software at the beginning – the spark. The click. The it factor. The undeniable tenacity in my soul to solve real-world problems with elegance and efficiency through the sheer power of logical processes. I’m convinced that one of God’s greatest gifts to me has been the software spark. Though tempted, I have never wavered in my vocational clarity.

I’m not the only one who’s experienced the “spark,” either. In fact, it has consistently been my litmus test for identifying truly great coders. Many people have recounted nearly identical stories to me of how they first got into software – they were hooked at “Hello World.” I’ve heard people say things like, “I didn’t want to do it at first, but I discovered it was the coolest thing ever!” or, “What I love is that you can do anything!” or, “It seemed so basic and almost stupid, but it was so awesome!” Conversely, I’ve seen those who lack the spark struggle tremendously with computing and software. And it’s not a matter of grit or intelligence – these are often smart, hard-working people who just lack that X-factor.

Now, please do not misunderstand me by thinking that it’s impossible for those without the spark to be successful in software. I’m not trying to be elitist or condescending. Rather, based on my experience, I’ve seen the spark to be the single greatest determining factor in what makes someone naturally talented at programming. Furthermore, having the spark doesn’t make the journey easy. A career in software still requires grit and elbow grease. The challenges are tough. Having the spark simply makes it worthwhile.

If you have the spark, you’ll be able to overcome any software obstacle. I encourage you to go do awesome things. And never, ever give up!


This post is dedicated to my parents, who always supported my software aspirations from the very beginning.


Missing Error Messages with Angular Testing

Logs are an essential part of test automation – they leave a trace of execution that is indispensable when backtracking through failures. Missing logs can make it much, much harder to figure out problems in the code. Recently, I hit this problem while writing unit tests for an Angular project: neither the console nor Google Chrome’s debugger showed any helpful error messages! Thankfully, there was a pretty easy solution. This article will explain the problem and the solution.

Update (January 18, 2018):

After further research, it appears that this problem was fixed in the @angular/cli 1.3.x release. I updated to 1.3.2, removed the “--sourcemaps=false” option, and verified that the error messages are printed. Furthermore, the source mapping is correct – the errors map to the correct line and column in the source files!

If you are stuck using a version prior to 1.3.x, then use the workaround detailed below. Otherwise, upgrade the package and avoid the problem altogether!


Disable source maps when running Angular tests:

$ ng test --sourcemaps=false

Angular Project Setup

This article presumes the standard Angular 4 project setup, as automatically generated by the “ng new” command. Jasmine unit tests are written in “*.spec.ts” files and run with Karma using Google Chrome as the browser.

The Problem

The Angular testing utilities provide great support for isolating and exercising parts of Angular code for unit testing. However, programmers need to use them properly, or else they won’t work. When I tried writing some unit tests for ngrx, I quickly hit dependency problems. However, it took me hours to figure it out because the console output was not helpful – all it would print was “ERROR”:

[Screenshot: the console output showing nothing but “ERROR”]

As a newbie, I had no idea what went wrong. I tried debugging with Chrome, but the error message I got there was cryptic and not much more helpful:

[Screenshot: the cryptic error message shown in Chrome’s debugger]

The Solution

After googling for a while, I discovered that there is a bug with source maps in the Angular CLI (Issue #7296). The workaround is to add the “--sourcemaps=false” option to the “ng test” command. If the package.json file contains a “test” script that calls “ng test”, the option may be added there. Now, the console prints error messages:

[Screenshot: the console now printing full error messages]

Errors also appear on the Karma page in the browser:

[Screenshot: error messages displayed on the Karma page in the browser]

One side effect of this workaround, however, is that the line and column numbers don’t correctly line up to the TypeScript files. I presume that they map to the compiled JavaScript files instead. Nevertheless, error messages with wrong line numbers are better than no error messages at all. There may be a way to fix the source mapping, but that’s a problem for another day. Hopefully, the Angular team will fix this “feature” for us.

Now, time to go fix those test errors!

Debugging Angular Apps through Visual Studio Code

Angular is a great front-end framework for web apps. Visual Studio Code is a great source code editor. Their powers combined let you not only develop Angular app code but also debug it through the editor! VS Code debugging even works for TypeScript.

The Basic Guide

To set up debugging, simply follow the steps in the Debugging Angular section of the official Using Angular in VS Code guide. (This guide is really helpful for other VS Code Angular topics, too.) The basic steps are:

  1. Make sure VS Code, Google Chrome, and all the Angular parts are already installed.
  2. Install the Debugger for Chrome extension in VS Code.
  3. Create a launch.json config file (by clicking the gear icon in the Debug view).
  4. Set an appropriate config spec in the .vscode/launch.json file (example below).
  5. Set breakpoints in the editor.
  6. Launch the Angular app separately from the debugger (such as by running “ng serve” from the command line).
  7. Run the VS Code debugger “launch” job against the app (by clicking the green arrow in the Debug view).

The launch.json file should look like this, with values changed to reflect your environment:

    "version": "0.2.0",
    "configurations": [
            "type": "chrome",
            "request": "launch",
            "name": "Launch Chrome against localhost",
            "url": "http://localhost:4200",
            "webRoot": "${workspaceFolder}"
            "type": "chrome",
            "request": "attach",
            "name": "Attach to Chrome",
            "port": 9222,
            "webRoot": "${workspaceFolder}"

Note that the app must already be running before the debugger is launched! (This point is not entirely clear in the official guide.) The debugger will launch the Google Chrome browser and load the URL provided in the launch.json config. Whenever execution hits a breakpoint, it will pause and let VS Code step through the code.

The original guide provides screenshots to better illustrate these steps. Please follow it for more precise instructions.

Browser Options

Microsoft publishes the Debugger for Chrome and Debugger for Edge extensions for this sort of debugging. It looks like other non-Microsoft VS Code extensions are available for Firefox, PhantomJS, and Safari on iOS, but the launch.json config looks different.

Debugger Config and Source Control

Typically, it’s a best practice to avoid committing user-specific config files to source control. One user’s settings could conflict with another’s, potentially breaking workspaces. Personally, I would caution against submitting anything in the .vscode directory to source control unless (a) everyone on the team uses VS Code exclusively for the project and (b) the config file entries are usable by everyone on the team.

Please Hang Up and Dial Again: Handling Test Interruptions in CI/CD

This post was originally published by Sealights on December 19, 2017 as part of their article Test Quality in CI/CD – Expert Roundup. I was honored to contribute my thoughts on automatic recovery in test automation, and I reblogged the text of my contribution here for Automation Panda readers. Please check out contributions from other experts in the full article!

Test automation is an essential part of CI/CD, but it must be extremely robust.
Unfortunately, tests running in live environments (integration and end-to-end)
often suffer rare but pesky “interruptions” that, if unhandled, will cause tests to fail.
These interruptions could be network blips, web pages not fully loaded, or
temporarily downed services – any environment issues unrelated to product bugs.
Interruptive failures are problematic because they (a) are intermittent and thus
difficult to pinpoint, (b) waste engineering time, (c) potentially hide real failures,
and (d) cast doubt over process/product quality. Furthermore, CI/CD magnifies
even rare issues. If an interruption has only a 1% chance of happening during a test,
then considering binomial probabilities, there is a 63% chance it will happen after
100 tests, and a 99% chance it will happen after 500 tests. Keep in mind that it is not
uncommon for thousands of tests to run daily in CI – Google Guava had over 286K
tests back in July 2012!

It is impossible to completely avoid interruptions – they will happen. Therefore, it is
imperative to handle interruptions at multiple layers:

  1. First, secure the platform upon which the tests run. Make sure system
    performance is healthy and that network connections are stable.
  2. Second, add failover logic to the automated tests. Any time an interruption
    happens, catch it as close to its source as possible, pause briefly, and retry the
    operation(s). Do not catch just any type of error: pinpoint specific interruption
    signatures to avoid false positives. Build failover logic into the framework
    rather than implementing it for one-off cases. For example, wrappers around
    web element or service calls could automatically perform retries (see the
    sketch after this list). Aspect-oriented programming can help here
    tremendously. Repeating failed tests in their entirety also works and may be
    easier to implement, but it takes much more time to run.
  3. Third, log any interruptions and recovery attempts as warnings. Do not
    neglect to report them because they could indicate legitimate problems,
    especially if patterns appear.
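
As a minimal sketch of framework-level failover logic in Python: a wrapper that retries only specific interruption exception types, pauses briefly between attempts, and logs each recovery as a warning:

    import functools
    import logging
    import time

    def retry_on_interruption(exceptions, attempts=3, pause=2.0):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(1, attempts + 1):
                    try:
                        return func(*args, **kwargs)
                    except exceptions as e:
                        if attempt == attempts:
                            raise  # out of retries: fail the test (the safer approach)
                        logging.warning(
                            "Interruption in %s (attempt %d): %s",
                            func.__name__, attempt, e)
                        time.sleep(pause)
            return wrapper
        return decorator

A framework could apply such a wrapper to web element or service call helpers, passing only the exception types known to be interruption signatures.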

It may be difficult to differentiate interruptions from legitimate bugs. Or, certain
retry attempts might take too long to be practical. When in doubt, just fail the test –
that’s the safer approach.

Jenkins Declarative Pipeline Resources

This post is intended to be a quick personal reference for Jenkins Pipelines so I don’t forget things I learned or lose links to valuable info. Feel free to recommend additional resources!

Today, a few of my LexisNexis coworkers and I went to the CloudBees office down the street (since we are both located at NCSU Centennial Campus) for a Jenkins Pipeline workshop. I’ve used Jenkins for a few years now, and I handle my team’s freestyle projects for running .NET/SpecFlow/Selenium automated tests, but the declarative pipeline style for Jenkins jobs is new to me. (I feel so behind the times.) I’m glad I attended the workshop because I learned a few cool things.

Below are links to helpful resources for learning about Jenkins Pipelines:

Pipelines are definitely a major improvement over freestyle projects:

  • They make it much easier to chain tasks together.
  • They are written in code (a Groovy-like DSL) and can support advanced logic.
  • They can be managed by source control (like Git).
  • They keep running even when the Jenkins master goes down.
  • Stages can be paused to wait for user input.
  • The DSL can be extended for custom steps.

I can’t wait to rewrite my team’s jobs!

Pipe Character Escape for Gherkin Tables

For the first time today, I had to write a Gherkin behavior scenario in which table text needed to use the pipe character “|”. I wanted a generic step that would find and click web page links by name, and one of the link names had the pipe in it! The first version of the step I wrote looked like this:

When the user follows the links:
  | link              |
  | Category          |
  | Sub-Category      |
  | Index|Description |

Naturally, this step didn’t parse – the “|” was parsed as a table delimiter instead of the intended link text. I could have rewritten the step to search for partial link text, or I could have done a key-value lookup, but I wanted to keep the step simple and direct.

The solution was simple: escape the pipe character “|” with a backslash character “\”. Easy! Thanks, Stack Overflow! The updated table looks like this:

When the user follows the links:
  | link               |
  | Category           |
  | Sub-Category       |
  | Index\|Description |

The “\|” escape works for both step tables and scenario outline example tables, and it appears to be fairly standard across test frameworks that use Gherkin. I verified that Cucumber-JVM and SpecFlow support it, Cucumber for Ruby appears to support it as well, and behave will support it in 1.2.6.

After learning this trick, I updated the BDD 101: The Gherkin Language page.

Note that backslash escape sequences won’t work for quotes in Gherkin steps. Quotes in steps are merely conventions and not part of the Gherkin language standard.