Test data is a necessary evil for testing and automation. It is necessary because tests simply can’t run without test case values, configuration data, and ready state (as detailed in BDD 101: Test Data). It is evil because it is challenging to handle properly. Test data may be even more dastardly when it is unpredictable, but thankfully there are decent strategies for handling unpredictability.
What is Unpredictable Test Data?
Test data is unpredictable when its values are not explicitly the same every time a test runs. For example, let’s suppose we are writing tests for a financial system that must obtain stock quotes. Mocking a stock quote service with dummy predictable data would not be appropriate for true integration or end-to-end tests. However, stock quotes act like random walks: they change values in real time, often perpetually. The name “unpredictable” could also be “non-deterministic” or “uncertain.”
Below are a few types of test data unpredictability:
- Values may be missing, mistyped, or outside of expected bounds.
- Time-sensitive data may change rapidly.
- Algorithms may yield non-deterministic results (like for machine learning).
- Data formats may change with software version updates.
- Data may have inherent randomness.
Strategies for Handling Unpredictability
Any test data is prone to be unpredictable when it comes from sources external to the automation codebase. Test must be robust enough to handle the inherent unpredictability. Below are 5 strategies for safety and recovery. The main goal is test completion – pass or fail, tests should not crash and burn due to bad test data. When in doubt, skip the test and log warnings. When really in doubt, fail it as a last resort.
Automation’s main goal is to complete tests despite unpredictability in test data.
#1: Make it Predictable
Ask, is it absolutely necessary to fetch data from unpredictable sources? Or can they be avoided by using predictable, fake data? Fake data can be provided in a number of ways, like mocks or database copies. It’s a tradeoff between test reliability and test coverage. In a risk-based test strategy, the additional test coverage may not be worthwhile if all input cases can be covered with fake data. Nevertheless, unpredictable data sometimes cannot or should not be avoided.
#2: Write Defensive Assertions
When reading data, make assertions to guarantee correctness. Assertions are an easy way to abort a test immediately if any problems are found. Assertions could make sure that values are not null, contain all required pieces, and fit the expected format.
#3: Handle Healthy Bounds
Tests using unpredictable data should be able to handle acceptable ranges of values instead of specific pinpointed values. This could mean including error margins in calculations or using regular expressions to match strings. Assertions may need to do some extra preliminary processing to handle ranges instead of singular values. Any anomalies should be reported as warnings.
For the stock quote example, the following would be ways to handle healthy bounds:
- Abort if the quote value is non-numeric or negative.
- Warn if the value is $0 or greater than $1M.
- Continue for values between $0 and $1M.
#4: Scrub the Data
Sometimes, data problems can be “scrubbed” away. Formats can be fixed, missing values can be populated, and given values can be adjusted or filtered. Scrubbing data may not always be appropriate, but if possible, it can mean a test will be completed instead of aborted.
#5: Do Retries
Data may need to be fetched again if it isn’t right the first time. Retries are applicable for data that changes frequently or is random. The automation framework should have a mechanism to retry data access after a waiting period. Set retry limits and wait times appropriately – don’t waste too much time. Retries should also be done as close to the point of failure as possible. Retrying the whole test is possible but not as efficient as retrying a single service call.
Unpredictable test data shouldn’t be a show-stopper – it just need special attention. Nevertheless, try to limit test automation’s dependence on external data sources.