Testing Pyramid

The Airing of Grievances: Test Automation Process

Test automation is a big deal for me. It is my chosen specialty within the broad field of software. When I see things done wrong, or when people just don’t get what it’s about, it really grinds my gears. I got a lot of problems with bad test automation processes, and now you’re gonna hear about it!

Saying “They’re Just Test Scripts”

Test automation is not just a bunch of test scripts: it is a full technology stack that requires design, integration, and expertise. Test automation development is a discipline. Saying it is just a bunch of test scripts is derogatory and demeaning. It devalues the effort test automation requires, which can lead to poor work item sizings and an “us vs. them” attitude between developers and QA.

Not Applying the Same Software Development Best Practices

Test automation is software development, and all the same best practices should thus apply. Write clean, well-designed code. Use version control with code reviews. Add comments and doc. Don’t get lazy because “they’re just test scripts” – wrong attitude!

Lip Service

Don’t say automation is important but then never dedicate time or resources to work on it. Don’t leave automation as a task to complete only if there’s time after manual testing is done. Make automation a priority, or else it will never get done! I once worked on an Agile team where automation framework stories were never included into the sprint because there weren’t “enough points to go around.” So, even though this company hired me explicitly to do test automation, I always got shunted into a manual testing scramble every sprint.

Confusing Test Automation with Deployment Automation

Test automation is the automation of test scenarios (for either functional or performance tests). Deployment automation is the automation of product build distribution and installation in a software environment. They are two different concerns. Cucumber is not Ansible.

Forcing 100% Automation

Some people think that automation will totally eliminate the need for any manual testing. That’s simply not true. Automation and manual testing are complementary. Automation should handle deterministic scenarios with a worthwhile return-on-investment to automate, while manual testing should focus on exploratory testing, user experience (UX), and tests that are too complicated to automate properly. Forcing 100% automation will make teams focus on metrics instead of quality and effectiveness.

Downsizing or Eliminating QA

Test automation doesn’t reduce or eliminate the need for testers. On the contrary, test automation requires even more advanced skills than old-school manual testing. There is still a need for testing roles, whether as a dedicated position or as shared collectively by a bunch of developers. The work done by that testing role just becomes more technical.

Saying Product Code and Test Code Must Use the Same Language

For unit tests, this is true, but for above-unit tests, it is simply false. Any general purpose programming language could be used to automate black-box tests. For example, Python tests could run against an Angular web app. A team may choose to use the same language for product and test code for simplicity, but it is not mandatory.

Not Classifying Test Types

Not all tests are the same. You can play buzzword bingo with all the different test type names: unit, integration, end-to-end, functional, performance, system, contract, exploratory, stress, limits, longevity, test-to-break, etc. Different tests need different tools or frameworks. Tests should also be written at the appropriate Testing Pyramid level.

Assuming All Tests Are Equal

Again, not all tests are the same, even within the same test type. Tests vary in development time, runtime, and maintenance time. It’s not accurate to compare individuals or teams merely on test numbers.

Not Prioritizing Tests to Automate

There’s never enough time to automate everything. Pick the most important ones to automate first – typically the core, highest-priority features. Don’t dilly-dally on unimportant tests.

Not Running Tests Regularly

Automated tests need to run at least once daily, if not continuously in Continuous Integration. Otherwise, the return-on-investment is just too low to justify the automation work. I once worked on a QA team that would run an automated test only once or twice during a 2-year release! How wasteful.

HP Quality Center / ALM

This tool f*&@$#! sucks. Don’t use it for automated tests. Pocket the money and just develop a good codebase with decent doc in the code, and rely upon other dashboards (like Jenkins or Kibana) for test reporting.

BDD 101: Unit, Integration, and End-to-End Tests

There are many types of software tests. BDD practices can be incorporated into all aspects of testing, but BDD frameworks are not meant to handle all test types. Behavior scenarios are inherently functional tests – they verify that the product under test works correctly. While instrumentation for performance metrics could be added, BDD frameworks are not intended for performance testing. This post focuses on how BDD automation works into the Testing Pyramid. Please read BDD 101: Manual Testing for manual test considerations. (Check the Automation Panda BDD page for the full table of contents.)

The Testing Pyramid

The Testing Pyramid is a functional test development approach that divides tests into three layers: unit, integration, and end-to-end.

  • Unit tests are white-box tests that verify individual “units” of code, such as functions, methods, and classes. They should be written in the same language as the product under test, and they should be stored in the same repository. They often run as part of the build to indicate immediate success or failure.
  • Integration tests are black-box tests that verify integration points between system components work correctly. The product under test should be active and deployed to a test environment. Service tests are often integration-level tests.
  • End-to-end tests are black-box tests that test execution paths through a system. They could be seen as multi-step integration tests. Web UI tests are often end-to-end-level tests.

Below is a visual representation of the Testing Pyramid:

The Testing Pyramid

The Testing Pyramid

From bottom to top, the tests increase in complexity: unit tests are the simplest and run very fast, while end-to-end require lots of setup, logic, and execution time. Ideally, there should be more tests at the bottom and fewer tests at the top. Test coverage is easier to implement and isolate at lower levels, so fewer high-investment, more-fragile tests need to be written at the top. Pushing tests down the pyramid can also mean wider coverage with less execution time. Different layers of testing mitigate risk at their optimal returns-on-investment.

Behavior-Driven Unit Testing

BDD test frameworks are not meant for writing unit tests. Unit tests are meant to be low-level, program-y tests for individual functions and methods. Writing Gherkin for unit tests is doable, but it is overkill. It is much better to use established unit test frameworks like JUnit, NUnit, and pytest.

Nevertheless, behavior-driven practices still apply to unit tests. Each unit test should focus on one main thing: a single call, an individual variation, a specific input combo; a behavior. Furthermore, in the software process, feature-level behavior specs draw a clear dividing line between unit and above-unit tests. The developer of a feature is often responsible for its unit tests, while a separate engineer is responsible for integration and end-to-end tests for accountability. Behavior specs carry a gentleman’s agreement that unit tests will be completed separately.

Integration and End-to-End Testing

BDD test frameworks shine at the integration and end-to-end testing levels. Behavior specs expressively and concisely capture test case intent. Steps can be written at either integration or end-to-end levels. Service tests can be written as behavior specs like in Karate. End-to-end tests are essentially multi-step integrations tests. Note how a seemingly basic web interaction is truly a large end-to-end test:

Given a user is logged into the social media site
When the user writes a new post
Then the user's home feed displays the new post
And the all friends' home feeds display the new post

Making a simple social media post involves web UI interaction, backend service calls, and database updates all in real time. That’s a full pathway through the system. The automated step definitions may choose to cover these layers implicitly or explicitly, but they are nevertheless covered.

Lengthy End-to-End Tests

Terms often mean different things to different people. When many people say “end-to-end tests,” what they really mean are lengthy procedure-driven tests: tests that cover multiple behaviors in sequence. That makes BDD purists shudder because it goes against the cardinal rule of BDD: one scenario, one behavior. BDD frameworks can certainly handle lengthy end-to-end tests, but careful considerations should be taken for if and how it should be done.

There are five main ways to handle lengthy end-to-end scenarios in BDD:

  1. Don’t bother. If BDD is done right, then every individual behavior would already be comprehensively covered by scenarios. Each scenario should cover all equivalence classes of inputs and outputs. Thus, lengthy end-to-end scenarios would primarily be duplicate test coverage. Rather than waste the development effort, skip lengthy end-to-end scenario automation as a small test risk, and compensate with manual and exploratory testing.
  2. Combine existing scenarios into new ones. Each When-Then pair represents an individual behavior. Steps from existing scenarios could be smashed together with very little refactoring. This violates good Gherkin rules and could result in very lengthy scenarios, but it would be the most pragmatic way to reuse steps for large end-to-end scenarios. Most BDD frameworks don’t enforce step type order, and if they do, steps could be re-typed to work. (This approach is the most pragmatic but least pure.)
  3. Embed assertions in Given and When steps. This strategy avoids duplicate When-Then pairs and ensures validations are still performed. Each step along the way is validated for correctness with explicit Gherkin text. However, it may require a number of new steps.
  4. Treat the sequence of behaviors as a unique, separate behavior. This is the best way to think about lengthy end-to-end scenarios because it reinforces behavior-driven thinking. A lengthy scenario adds value only if it can be justified as a uniquely separate behavior. The scenario should then be written to highlight this uniqueness. Otherwise, it’s not a scenario worth having. These scenarios will often be very declarative and high-level.
  5. Ditch the BDD framework and write them purely in the automation programming. Gherkin is meant for collaboration about behaviors, while lengthy end-to-end tests are meant exclusively for intense QA work. Biz roles will write behavior specs but will never write end-to-end tests. Forcing behavior specification on lengthy end-to-end scenarios can inhibit their development. A better practice could be coexistence: acceptance tests could be written with Gherkin, while lengthy end-to-end tests could be written in raw programming. Automation for both test sets could still nevertheless share the same automation code base – they could share the same support modules and even step definition methods.

Pick the approach that best meets the team’s needs.