Writing effective tests is hard. Tests that are flaky, confusing, or slow are effectively useless because they do more harm than good. The Arrange-Act-Assert pattern gives good structure, but what other characteristics should test cases have? Here are 12 traits for highly effective tests.
At its core, a test is just a step-by-step procedure. It exercises a behavior and verifies the outcome. In a sense, tests are living specifications – they detail exactly how a feature should function. Everyone should be able to intuitively understand how a test works. Follow conventions like Arrange-Act-Assert or Given-When-Then. Seek conciseness without vagueness. Avoid walls of text.
If you find yourself struggling to write a test in plain language, then you should review the design for the feature under test. If you can’t explain it, then how will others know how to use it?
Each test case in a suite should cover a unique behavior. Don’t Repeat Yourself – repetitive tests with few differences bear a heavy cost to maintain and execute without delivering much additional value. If a test can cover multiple inputs, then focus on one variation per equivalence class.
For example, equivalence classes for the absolute value function could be a positive number, a negative number, and zero. There’s little need to cover multiple negative numbers because the absolute value function performs the same operation on all negatives.
Test one thing at a time. Tests that each focus on one main behavior are easier to formulate and automate. They naturally become understandable and maintainable. When a test covering only one behavior fails, then its failure reason is straightforward to deduce.
Any time you want to combine multiple behaviors into one test, consider separating them into different tests. Make a clear distinction between “arrange” and “act” steps. Write atomic tests as much as possible. Avoid writing “world tours,” too. I’ve seen repositories where tests are a hundred steps long and meander through an application like Mr. Toad’s Wild Ride.
Each test should be independent of all other tests. That means testers should be able to run each test as a standalone unit. Each test should have appropriate setup and cleanup routines to do no harm and leave no trace. Set up new resources for each test. Automated tests should use patterns like dependency injection instead of global variables. If one test fails, others should still run successfully. Test case independence is the cornerstone for scalable, parallelizable tests.
Modern test automation frameworks strongly support test independence. However, folks who are new to automation frequently presume interdependence – they think the end of one test is the starting point for the next one in the source code file. Don’t write tests like that! Write your tests as if each one could run on its own, or as if the suite’s test order could be randomized.
Testing tends to be a repetitive activity. Test suites need to run continuously to provide fast feedback as development progresses. Every time they run, they must yield deterministic results because teams expect consistency.
Unfortunately, manual tests are not very repeatable. They require lots of time to run, and human testers may not run them exactly the same way each iteration. Test automation enables tests to be truly repeatable. Tests can be automated once and run repeatedly and continuously. Automated scripts always run the same way, too.
Tests must run successfully to completion, whether they return PASS or FAIL results. “Flaky” tests – tests that occasionally fail for arbitrary reasons – waste time and create doubt. If a test cannot run reliably, then how can its results be trusted? And why would a team invest so much time developing tests if they don’t run well?
You shouldn’t need to rerun tests to get good results. If tests fail intermittently, find out why. Correct any automation errors. Tune automation timeouts. Scale infrastructure to the appropriate sizes. Prioritize test stability over speed. And don’t overlook any wonky bugs that could be lurking in the product under test!
Providing fast feedback is testing’s main purpose. Fast feedback helps teams catch issues early and keep developing safely. Fast tests enable fast feedback. Slow tests cause slow feedback. They force teams to limit coverage. They waste time and money, and they increase the risk that bugs do more damage.
Optimize tests to be as efficient as possible without jeopardizing stability. Don’t include unnecessary steps. Use smart waits instead of hard sleeps. Write atomic tests that cover individual behaviors. For example, use APIs instead of UIs to prep data. Set up tests to run in parallel. Run tests as part of Continuous Integration pipelines so that they deliver results immediately.
An effective test has a clear identity:
- Purpose: Why run this test?
- Coverage: What behavior or feature does this test cover?
- Level: Should this test be a unit, integration, or end-to-end test?
Identity informs placement and type. Make sure tests belong to appropriate suites. For example, tests that interact with Web UIs via Selenium WebDriver do not belong in unit test suites. Group related tests together using subdirectories and/or tags.
Functional tests yield PASS or FAIL results with logs, screenshots, and other artifacts. Large suites yield lots of results. Reports should present results in a readable, searchable format. They should make failures stand out with colors and error messages. They should also include other helpful information like duration times and pass rates. Unit test reports should include code coverage, too.
Publish test reports to public dashboards so everyone can see them. Most Continuous Integration servers like Jenkins include some sort of test reporting mechanism. Furthermore, capture metrics like test result histories and duration times in data formats instead of textual reports so they can be analyzed for trends.
Tests are inherently fragile because they depend upon the features they cover. If features change, then tests probably break. Furthermore, automated tests are susceptible to code duplication because they frequently repeat similar steps. Code duplication is code cancer – it copies problems throughout a code base.
Fragility and duplication cause a nightmare for maintainability. To mitigate the maintenance burden, develop tests using the same practices as developing products. Don’t Repeat Yourself. Simple is better than complex. Do test reviews. For automation, follow good design principles like separating concerns and building solution layers. Make tests easy to update in the future!
A test is “successful” if it runs to completion and yields a correct PASS or FAIL result. The veracity of the outcome matters. Tests that report false failures make teams waste time doing unnecessary triage. Tests that report false passing results give a false sense of security and let bugs go undetected. Both ways ultimately cause teams to mistrust the tests.
Unfortunately, I’ve seen quite a few untrustworthy tests before. Sometimes, test assertions don’t check the right things, or they might be missing entirely! I’ve also seen tests for which the title does not match the behavior under test. These problems tend to go unnoticed in large test suites, too. Make sure every single test is trustworthy. Review new tests carefully, and take time to improve existing tests whenever problems are discovered.
Testing takes a lot of work. It takes time away from developing new things. Therefore, testing must be worth the effort. Since covering every single behavior is impossible, teams should apply a risk-based strategy to determine which behaviors pose the most risk if they fail and then prioritize testing for those behaviors.
If you are unsure if a test is genuinely valuable, ask this question: If the test fails, will the team take action to fix the defect? If the answer is yes, then the test is very valuable. If the answer is no, then look for other, more important behaviors to cover with tests.
Any more traits?
These dozen traits certainly make tests highly effective. However, this list is not necessarily complete. Do you have any more traits to add to the list? Do you agree or disagree with the traits I’ve given? Let me know by retweeting and commenting my tweet below!
(Note: I changed #8 from “Leveled Appropriately” to “Organized” to be more concise. The tweet is older than the article.)
Very informative! Thanks for the compilation!
Somewhat off-topic, but since you mentioned it… 😉
«There’s little need to cover multiple negative numbers because the absolute value function performs the same operation on all negatives.»
The input data format may add the requirement for another equivalence class. For example, in two’s-complement format, there’s no positive value that corresponds to the most negative value.
The example speaks in broad generalities. You, as the tester, must decide what the appropriate equivalence classes should be.