
Managing the Test Data Nightmare

On April 22, 2021, I delivered a talk entitled “Managing the Test Data Nightmare” at SauceCon 2021. SauceCon is Sauce Labs’ annual conference for the testing community. Due to the COVID-19 pandemic, the conference was virtual, but I still felt a bit of that exciting conference buzz.

My talk covers the topic of test data, which can be a nightmare to handle. Data must be prepped in advance, loaded before testing, and cleaned up afterwards. Sometimes, teams don’t have much control over the data in their systems under test—it’s just dropped in, and it can change arbitrarily. Hard-coding values that reference the system under test can make tests brittle, especially when they run in different environments.

In this talk, I covered strategies for managing each type of test data: test case variations, test control inputs, config metadata, and product state. I also covered how to “discover” test data instead of hard-coding it, how to pass inputs into automation (including secrets like passwords), and how to manage data in the system. After watching this talk, you can wake up from the nightmare and handle test data cleanly and efficiently like a pro!

Here are some other articles I wrote about test data, included below: Unpredictable Test Data and BDD 101: Test Data.


Many thanks to Sauce Labs and all the organizers who made SauceCon 2021 happen. If SauceCon was this awesome as a virtual event, then I can’t wait to attend in person (hopefully) in 2022!

Unpredictable Test Data

Test data is a necessary evil for testing and automation. It is necessary because tests simply can’t run without test case values, configuration data, and ready state (as detailed in BDD 101: Test Data). It is evil because it is challenging to handle properly. Test data may be even more dastardly when it is unpredictable, but thankfully there are decent strategies for handling unpredictability.

What is Unpredictable Test Data?

Test data is unpredictable when its values are not guaranteed to be the same every time a test runs. For example, suppose we are writing tests for a financial system that must obtain stock quotes. Mocking a stock quote service with dummy, predictable data would not be appropriate for true integration or end-to-end tests, yet real stock quotes act like random walks: their values change continually in real time. Unpredictable data could also be called “non-deterministic” or “uncertain.”

Below are a few types of test data unpredictability:

  • Values may be missing, mistyped, or outside of expected bounds.
  • Time-sensitive data may change rapidly.
  • Algorithms may yield non-deterministic results (like for machine learning).
  • Data formats may change with software version updates.
  • Data may have inherent randomness.

Strategies for Handling Unpredictability

Any test data is prone to be unpredictable when it comes from sources external to the automation codebase. Tests must be robust enough to handle that inherent unpredictability. Below are five strategies for safety and recovery. The main goal is test completion: pass or fail, tests should not crash and burn due to bad test data. When in doubt, skip the test and log warnings. When really in doubt, fail it as a last resort.

Automation’s main goal is to complete tests despite unpredictability in test data.

#1: Make it Predictable

Ask: is it absolutely necessary to fetch data from unpredictable sources, or can they be avoided by using predictable, fake data? Fake data can be provided in a number of ways, such as mocks or database copies. It’s a tradeoff between test reliability and test coverage. In a risk-based test strategy, the additional coverage from live data may not be worthwhile if all input cases can be covered with fake data. Nevertheless, unpredictable data sometimes cannot or should not be avoided.
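For example, a test could replace a live stock quote lookup with a canned response. Here is a minimal Python sketch using unittest.mock; the portfolio_value function and the get_quote call are hypothetical stand-ins for real code under test:

from unittest.mock import MagicMock

def portfolio_value(shares, quote_source):
    # Toy function under test: total value of a stock holding.
    return shares * quote_source.get_quote("ACME")

def test_portfolio_value_with_fake_quote():
    # A fake quote service returns a fixed, predictable price,
    # so the test never depends on live market data.
    fake_quotes = MagicMock()
    fake_quotes.get_quote.return_value = 150.25
    assert portfolio_value(10, fake_quotes) == 1502.5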

#2: Write Defensive Assertions

When reading data, make assertions to guarantee correctness. Assertions are an easy way to abort a test immediately if any problems are found. Assertions could make sure that values are not null, contain all required pieces, and fit the expected format.
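For instance, a test that fetches a stock quote could assert basic well-formedness before using the value. This is a sketch; the quote dictionary and its field names are hypothetical:

def check_quote_response(quote):
    # Abort the test immediately if the response is malformed.
    assert quote is not None, "Quote response was empty"
    assert "symbol" in quote and "price" in quote, "Quote is missing required fields"
    assert isinstance(quote["price"], (int, float)), "Quote price is not numeric"

# Example usage with a hypothetical service response:
check_quote_response({"symbol": "ACME", "price": 150.25})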

#3: Handle Healthy Bounds

Tests using unpredictable data should be able to handle acceptable ranges of values instead of specific pinpointed values. This could mean including error margins in calculations or using regular expressions to match strings. Assertions may need to do some extra preliminary processing to handle ranges instead of singular values. Any anomalies should be reported as warnings.

For the stock quote example, the following would be ways to handle healthy bounds:

  • Abort if the quote value is non-numeric or negative.
  • Warn if the value is $0 or greater than $1M.
  • Continue for values between $0 and $1M.
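In code, those bounds might look something like the sketch below, using Python’s warnings module; the dollar limits are simply the example values listed above:

import warnings

def check_quote_bounds(price):
    # Abort if the quote value is non-numeric or negative.
    assert isinstance(price, (int, float)), "Quote price is not numeric"
    assert price >= 0, "Quote price is negative"

    # Warn, but keep going, for suspicious-but-possible values.
    if price == 0 or price > 1_000_000:
        warnings.warn(f"Quote price {price} is outside the expected range")

    # Otherwise, continue the test normally.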

#4: Scrub the Data

Sometimes, data problems can be “scrubbed” away. Formats can be fixed, missing values can be populated, and given values can be adjusted or filtered. Scrubbing data may not always be appropriate, but if possible, it can mean a test will be completed instead of aborted.
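As a sketch, scrubbing a quote record might mean normalizing formats and filling in defaults. The field names and defaults here are hypothetical:

def scrub_quote(raw):
    # Normalize a raw quote record so the test can proceed.
    scrubbed = dict(raw)
    # Fix the format: strip currency symbols and convert to a number.
    price = str(scrubbed.get("price", "")).replace("$", "").strip()
    scrubbed["price"] = float(price) if price else 0.0
    # Populate missing values with sensible defaults.
    scrubbed.setdefault("currency", "USD")
    return scrubbed

# "$150.25 " becomes 150.25, and the missing currency is filled in.
print(scrub_quote({"symbol": "ACME", "price": "$150.25 "}))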

#5: Do Retries

Data may need to be fetched again if it isn’t right the first time. Retries are applicable for data that changes frequently or is random. The automation framework should have a mechanism to retry data access after a waiting period. Set retry limits and wait times appropriately – don’t waste too much time. Retries should also be done as close to the point of failure as possible. Retrying the whole test is possible but not as efficient as retrying a single service call.
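A retry helper can wrap the data access itself so that only the fetch is repeated, not the whole test. This is a generic Python sketch; the retry limit and wait time are arbitrary example values:

import time

def fetch_with_retries(fetch, retries=3, wait_seconds=2):
    # Call a data-fetching function, retrying a few times on failure.
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            time.sleep(wait_seconds)  # wait before trying again
    raise RuntimeError(f"Data fetch failed after {retries} attempts") from last_error

# Example usage with a hypothetical quote service call:
# quote = fetch_with_retries(lambda: quote_service.get_quote("ACME"))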

Final Advice

Unpredictable test data shouldn’t be a show-stopper; it just needs special attention. Nevertheless, try to limit test automation’s dependence on external data sources.

BDD 101: Test Data

How should test data be handled in a behavior-driven test framework? This is a common question I hear from teams working on BDD test automation. A better question to ask first is, What is test data? This article will explain different types of test data and provide best practices for handling each. The strategies covered here can be applied to any BDD test framework. (Check the Automation Panda BDD page for the full table of contents.)

Types of Test Data

Personally, I hate the phrase “test data” because its meaning is so ambiguous. For functional test automation, there are three primary types of test data:

  1. Test Case Values. These are the input and expected output values for test cases. For example, when testing calculator addition “1 + 2 = 3”, “1” and “2” would be input values, and “3” would be the expected output value. Input values are often parameterized for reusability, and output values are used in assertions.
  2. Configuration Data. Config data represents the system or environment in which the tests run. Changes in config data should allow the same test procedure to run in different environments without making any other changes to the automation code. For example, a calculator service with an addition endpoint may be available in three different environments: development, test, and production. Three sets of config data would be needed to specify URLs and authentication in each environment (the config data), but 1 + 2 should always equal 3 in any environment (the test case values).
  3. Ready State. Some tests require initial state to be ready within a system. “Ready” state could be user accounts, database tables, app settings, or even cluster data. If testing makes any changes, then the data must be reverted to the ready state.

Each type of test data has different techniques for handling it.

Test Case Values

There are four main ways to specify test case values in BDD frameworks, ranging from basic to complex.

In The Specs

The most basic way to specify test case values is directly within the behavior scenarios themselves! The Gherkin language makes it easy – test case values can be written into the plain language of a step, as step parameters, or in Examples tables. Consider the following example:

Scenario Outline: Simple Google searches
  Given a web browser is on the Google page
  When the search phrase "<phrase>" is entered
  Then results for "<phrase>" are shown
  
  Examples: Animals
    | phrase   |
    | panda    |
    | elephant |
    | rhino    |

The test case value used is the search phrase. The When and Then steps both have a parameter for this phrase, which will use three different values provided by the Examples table. It is perfectly suitable to put these test case values directly into the scenario because the values are small and descriptive.

Furthermore, notice how specific result values are not specified for the Then step. Values like “Panda Express” or “Elephant man” are not hard-coded. The step wording presumes that the step definition will have some sort of programmed mechanism for checking that result links relate to the search phrase (likely through regular expression matching).
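As one possible implementation, with Python and the behave framework, the Then step definition could check each result link with a regular expression. The results_page object and its result_links property are hypothetical page-object helpers:

import re
from behave import then

@then('results for "{phrase}" are shown')
def step_verify_results(context, phrase):
    # context.results_page and result_links are hypothetical page-object helpers;
    # the point is matching result text against the phrase, not the UI code.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    for link_text in context.results_page.result_links:
        assert pattern.search(link_text), f'"{link_text}" does not match "{phrase}"'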

Key-Value Lookup

Direct specification is great for small sets of simple values, but one size does not fit all needs. Key-value lookups are appropriate when test data is lengthier. For example, I’ve often seen steps like this:

Given the user navigates to "http://www.somewebsite.com/long/path/to/the/profile/page"

URLs, hexadecimal numbers, XML blocks, and comma-separated lists are all the usual suspects. While it is not incorrect to put these values directly into a step parameter, something like this would be more readable:

Given the user navigates to the "profile" page

Or even:

Given the user navigates to their profile page

The automation would store URLs in a lookup table so that these new steps could easily fetch the URL for the profile page by name. These steps are also more declarative than imperative and better resist changes in the underlying environment.
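One way to implement the lookup is a simple dictionary in the automation code. This sketch assumes Python with behave; the paths are placeholders, and context.base_url and context.browser would be set up elsewhere (for example, in a Before hook):

from behave import given

# Page names mapped to their URL paths (placeholder values).
PAGE_URLS = {
    "profile": "/long/path/to/the/profile/page",
    "settings": "/account/settings",
}

@given('the user navigates to the "{page_name}" page')
def step_navigate_to_page(context, page_name):
    # Look up the full URL by name instead of hard-coding it in the step.
    url = context.base_url + PAGE_URLS[page_name]
    context.browser.get(url)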

Another way to use key-value lookup is to refer to a set of values by one name. Consider the following scenario for entering an address:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user sets the street address to "<street>"
  And the user sets the second address line to "<second>"  
  And the user sets the city to "<city>"
  And the user sets the state to "<state>"
  And the user sets the zipcode to "<zipcode>"
  And the user sets the country to "<country>"
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | street | second | city | state | zipcode | country |
    ...

An address has a lot of fields. Specifying each in the scenario makes it very imperative and long. Furthermore, if the scenario is an outline, the Examples table can easily extend far to the right, off the page. This, again, is not readable. This scenario would be better written like this:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user enters the "<address-type>" address
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | address-type |
    | basic        |
    | two-line     |
    | foreign      |

Rather than specifying all the values for different addresses, this scenario names the classifications of addresses. The step definition can be written to link the name of the address class to the desired values.
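The step definition behind that scenario could map each address class to a full set of field values. Below is a minimal sketch with Python and behave; the field values and the fill_address_form page-object method are hypothetical:

from behave import when

# Named address classes mapped to full sets of field values (example data).
ADDRESSES = {
    "basic": {"street": "123 Main St", "city": "Raleigh",
              "state": "NC", "zipcode": "27601", "country": "USA"},
    "two-line": {"street": "123 Main St", "second": "Apt 4B", "city": "Raleigh",
                 "state": "NC", "zipcode": "27601", "country": "USA"},
}

@when('the user enters the "{address_type}" address')
def step_enter_address(context, address_type):
    # fill_address_form is a hypothetical page-object method that types
    # each field value into the profile edit page.
    context.profile_page.fill_address_form(ADDRESSES[address_type])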

Data Files

Sometimes, test case values should be stored in data files apart from the specs or the automation code. Reasons could be:

  • The data is simply too large to reasonably write into Gherkin or into code.
  • The data files may be generated by another tool or process.
  • The values are different between environments or other circumstances.
  • The values must be selected or switched at runtime (without re-compiling code).
  • The files themselves are used as payloads (ex: REST request bodies or file upload).

Scenario steps can refer to data files using the key-value lookup mechanisms described above. Lightweight, text-based file formats like CSV, XML, or JSON work best. They can be parsed easily and efficiently, and changes to them are easy to diff. Microsoft Excel files are not recommended because they carry extra bloat and cannot easily be diffed line by line. Custom text file formats are also not recommended because custom parsing is an extra automation asset requiring unnecessary development and maintenance. Personally, I like using JSON because its syntax is concise and its parsing tools tend to be the simplest in most programming languages.
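For example, the address data from the earlier scenario could live in a JSON file that the automation loads at runtime. A small Python sketch; the file path and structure are assumptions:

import json

# data/addresses.json might look like:
# { "basic": { "street": "123 Main St", "city": "Raleigh", ... } }
def load_addresses(path="data/addresses.json"):
    with open(path) as data_file:
        return json.load(data_file)

# A step definition could then fetch a set of values by name:
# address = load_addresses()["basic"]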

External Sources

An external dependency exists when the data for test case values exists outside of the automation code base. For example, test case values could reside in a database instead of a CSV file, or they could be fetched from a REST service instead of a JSON file. This would be appropriate if the data is too large to manage as a set of files or if the data is constantly changing.

As a word of caution, external sources should be used only if absolutely necessary:

  1. External sources introduce an additional point-of-failure. If that database or service goes down, then the test automation cannot run.
  2. External sources degrade performance. It is slower to get data from a network connection than from a local machine.
  3. Test case values are harder to audit. When they are in the specs, the code, or data files, history is tracked by version control, and any changes are easy to identify in code reviews.
  4. Test case values may be unpredictable. The automation code base does not control the values. Bad values can fail tests.

External sources can be very useful, if not necessary, for performance, stress, load, and limits testing, but they are not necessary for the vast majority of functional testing. It may be convenient to mock external sources, either with a mocking framework like Mockito or with a dummy service.

Configuration Data

Config data pertains to the test environments, not the test cases. Test automation should never contain hard-coded values for config data like URLs, usernames, or passwords. Rather, test automation should read config data when it launches tests and reference the required values. This should be done in Before hooks and not in Gherkin steps. In this way, automated tests can run against any configuration, such as different test environments before a release to production.

Config data can be stored in data files or accessed through some other dependency. (Read the previous section for pros and cons of those approaches.) The config to use should be somehow dynamically selectable when tests run. For example, the path to the config file to use could be provided as a command line argument to the test launch command.

Config data can be used to select test values to use at runtime. For example, different environments may need different test value data files. Conversely, scenario tagging can control what parts of config data should be used. For example, a tag could specify a username to use for the scenario, and a Before hook could use that username to fetch the right password from the config data.

For efficiency, only the necessary config data should be accessed or read into memory. In many cases, fetching the config data should also be done once globally, rather than before each test case.
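With Python and behave, for example, a before_all hook in environment.py could read a config file whose path is chosen at launch time through a command-line user data option or an environment variable. This is a sketch; the option name, variable name, and file structure are assumptions:

# environment.py (behave hooks)
import json
import os

def before_all(context):
    # Select the config file at runtime, e.g.:
    #   behave -D config=configs/test-env.json
    config_path = context.config.userdata.get(
        "config", os.environ.get("TEST_CONFIG", "configs/default.json"))
    with open(config_path) as config_file:
        context.config_data = json.load(config_file)  # read once, globally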

Ready State

All scenarios have a starting point, and often, that starting point involves data. Setup operations must bring the system into the ready state, and cleanup operations must return the system to the ready state. Test data should leave no trace – temporary files should be deleted and records should be reverted. Otherwise, disk space may run out or duplicate records may fail tests. Maintaining the ready state between tests is necessary for true test independence.

During the Test Run

Simple setup and cleanup operations may be done directly within the automation. For example, when testing CRUD operations, records must be created before they can be retrieved, updated, or deleted. Setup would create a record, and cleanup would guarantee the record’s deletion. If the setup is appropriate to mention as part of the behavior, then it should be written as Given steps. This is true of CRUD operations: “Given a record has been created, When it is deleted, …”. If multiple scenarios share this same setup, then those Given steps should be put into a Background section.

However, sometimes setup details are not pertinent to the behavior at hand. For example, perhaps fresh authentication tokens must be generated for those CRUD calls. Those operations should be handled in Before hooks. The automation will take care of it, while the Gherkin steps can focus exclusively on the behavior.

No matter what, After hooks must do cleanup. It is incorrect to write final Then steps to do cleanup. Then steps should verify outcomes, not take more actions. Plus, the final Then steps will not be run if the test has a failure and aborts!
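As a sketch with Python and behave, hooks could generate a fresh authentication token before each scenario and guarantee record cleanup afterward. The create_auth_token and delete_record helpers are hypothetical:

# environment.py (behave hooks)
def before_scenario(context, scenario):
    # Setup details that are not part of the behavior itself,
    # such as generating a fresh authentication token.
    context.auth_token = create_auth_token()   # hypothetical helper
    context.created_record_ids = []

def after_scenario(context, scenario):
    # Cleanup always runs, even if the scenario failed partway through.
    for record_id in context.created_record_ids:
        delete_record(record_id, context.auth_token)  # hypothetical helper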

External Preparation

Some data simply takes too long to set up fresh for each test launch. Consider complicated user accounts or machine learning data: these are things that can be created outside of the test automation. The automation can simply presume that they exist as a precondition. These types of data require tool automation to prepare. Tool automation could involve a set of scripts to load a database, make a bunch of service calls, or navigate through a web portal to update settings. Automating this type of setup outside of the test automation enables engineers to more easily replicate it across different environments. Then, tests can run in much less time because the data is already there.

However, this external preparation must be carefully maintained. If any damage is done to the data, then test case independence is lost. For example, deleting a user account without replacing it means that subsequent test runs cannot log in! Along with setup tools, it is important to create maintenance tools to audit the data and make repairs or updates.

Advice for Any Approach

Use the minimal amount of test data necessary to test the functionality of the product under test. More test data requires more time to develop and manage. As a corollary, use the simplest approach that can pragmatically handle the test data. Avoid external dependencies as much as possible.

To minimize test data, remember that BDD is specification by example: scenarios should use descriptive values. Furthermore, variations should be reduced to input equivalence classes. For example, in the first scenario example on this page, it would probably be sufficient to test only one of those three animals, because the other two animals would not exhibit any different searching behavior.

Finally, be cautioned against randomization in test data. Functional tests are meant to be deterministic – they must always pass or fail consistently, or else test results will not be reliable. (Not only could this drive a tester crazy, but it would also break a continuous integration system.) Using equivalence classes is the better way to cover different types of inputs. Use a unique number counting mechanism whenever values must be unique.

For handling unpredictable test data, check out Unpredictable Test Data.