Our Test Automation Has Problems. Should We Start Over?

Test automation is the cornerstone for continuous software delivery pipelines. Automation repeatedly hits new features with a barrage of tests that could never be completed manually in time. From my experiences, though, test automation code can be some of the worst code in the software industry. Teams frequently overlook its importance, its workload size, and its unique technical challenges. The resulting code can become a heapin’ mess! You might even call it “bankrupt.”

In this situation, should a team give up on their current test automation solution and just start over from scratch? Maybe, but maybe not. Don’t be quick to nuke everything and start over! No project is perfect, and some can be recovered. Starting a whole new test automation solution is not a light decision.

Red Herrings

Here are a few problems that should be addressed within the existing test automation solution instead of starting over:

  • Tests don’t add value? That’s easy to fix: just remove the low-value tests. The framework is separate from the test cases.
  • Tests are flaky? Find the root cause. Typically, flaky tests can be fixed with small updated to the test case or to an aspect of the framework. If the feature under test itself is flaky, then fix the feature or consider testing it manually instead of with automation.
  • The original authors are gone? Figure out their code before writing your own. Their code might be good once you understand it.
  • The team uses poor practices? Fix the practices before fixing the code. Otherwise, the new code won’t be any better than the old code.

Signs to Consider

Nevertheless, there are times when a team should start over. Look for these signs, and proceed thoughtfully.

  • The test frameworks and packages are deprecated. Although the tests may run today, they may not run tomorrow. Finding others to work on it will be difficult, too. For example, nose was once the Python framework of choice, but now it’s dead. Pick a more modern framework. Hopefully, parts of the old tests can be salvaged.
  • Test case independence is systemically violated. Each test case must be independent. It should not depend upon the outputs of any other tests. It should not interrupt any other tests. The litmus tests for independence is that tests should be able to run successfully in any random order. Interdependent tests are not scalable, difficult for reruns, and potentially dangerous. The only way to fix them is to completely rewrite them.
  • There is no separation between unit tests and feature tests. White-box unit tests cover code, whereas black-box feature tests cover live features. They occupy different Testing Pyramid layers and should happen at different stages in a CI/CD pipeline. An automation solution with no separation between these types of tests reveals a lack of planning and strategy, and continuous testing will be much tougher to achieve.
  • The framework lacks cohesive architecture, designs, and patterns. Good solutions are designed, not hacked. Designs scale. Hacks don’t. Building new tests on a shaky platform will yield shaky tests.
  • Critical fixes would require a majority of tests to be rewritten. Some issues are pervasive, especially when code is repeatedly duplicated. If framework problems are so widespread that tests would need to be rewritten anyway, then it might be easier to start fresh with a new solution.

The Nuclear Option

If you feel like you really do need to start a new test automation solution from scratch, make sure to do it right. You won’t want to redo everything again in a few years! Here’s some advice:

  1. Define your goals. What problems do you want to solve? How can testing and automation help? What can you reasonably achieve? Are you willing to make necessary changes to achieve these goals?
  2. Treat test automation as a project. Test automation is software and requires the same rules and practices. It takes time and expertise. Make sure to allocate time, money, and resources to its development and execution.
  3. Learn how to develop test automation well. It’s not “just writing scripts” – it’s a special domain. Take courses from Test Automation University. Read more Automation Panda articles. Attend webinars and conferences. Seek consulting help if necessary.
  4. Draw a line in the sand between the old and new solutions. Commit to writing all new tests in the new solution. Meanwhile, continue to run tests from the old solution as appropriate – there’s no need to axe its test coverage. Decide if migrating old tests to the new solution is worthwhile for your team. This could also be an opportunity to clean up old tests and cut out low-value areas.

Starting a new test solution is a lot of work, but it can also be rewarding. Good luck!

Python BDD Framework Comparison

Almost every major programming language has BDD test frameworks, and Python is no exception. In fact, Python has several! So, how do they compare, and which one is best? Let’s find out.

Head-to-Head Comparison


behave is one of the most popular Python BDD frameworks. Although it is not officially part of the Cucumber project, it functions very similarly to Cucumber frameworks.


  • It fully supports the Gherkin language.
  • Environmental functions and fixtures make setup and cleanup easy.
  • It has Django and Flask integrations.
  • It is popular with Python BDD practitioners.
  • Online docs and tutorials are great.
  • It has PyCharm Professional Edition support.


  • There’s no support for parallel execution.
  • It’s a standalone framework.
  • Sharing steps between feature files can be a bit of a hassle.


pytest-bdd is a plugin for pytest that lets users write tests as Gherkin feature files rather than test functions. Because it integrates with pytest, it can work with any other pytest plugins, such as pytest-html for pretty reports and pytest-xdist for parallel testing. It also uses pytest fixtures for dependency injection.


  • It is fully compatible with pytest and major pytest plugins.
  • It benefits from pytest‘s community, growth, and goodness.
  • Fixtures are a great way to manage context between steps.
  • Tests can be filtered and executed together with other pytest tests.
  • Step definitions and hooks are easily shared using
  • Tabular data can be handled better for data-driven testing.
  • Online docs and tutorials are great.
  • It has PyCharm Professional Edition support.


  • Step definition modules must have explicit declarations for feature files (via “@scenario” or the “scenarios” function).
  • Scenario outline steps must be parsed differently.


radish is a BDD framework with a twist: it adds new syntax to the Gherkin language. Language features like scenario loops, scenario preconditions, and constants make radish‘s Gherkin variant more programmatic for test cases.




  • Gherkin language extensions empower testers to write better tests.
  • The website, docs, and logo are on point.
  • Feature files and step definitions come out very clean.


  • It’s a standalone framework with limited extensions.
  • BDD purists may not like the additions to the Gherkin syntax.


lettuce is another vegetable-themed Python BDD framework that’s been around for years. However, the website and the code haven’t been updated for a while.





  • Its code is simpler.
  • It’s tried and true.


  • It lacks the feature richness of the other frameworks.
  • It doesn’t appear to have much active, ongoing support.


freshen was one of the first BDD test frameworks for Python. It was a plugin for nose. However, both freshen and nose are no longer maintained, and their doc pages explicitly tell readers to use other frameworks.

My Recommendations

None of these frameworks are perfect, but some have clear advantages. Overall, my top recommendation is pytest-bdd because it benefits from the strengths of pytest. I believe pytest is one of the best test frameworks in any language because of its conciseness, fixtures, assertions, and plugins. The 2018 Python Developers Survey showed that pytest is, by far, the most popular Python test framework, too. Even though pytest-bdd doesn’t feel as polished as behave, I think some TLC from the open source community could fix that.

Here are other recommendations:

  • Use behave if you want a robust, clean experience with the largest community.
  • Use pytest-bdd if you need to integrate with other plugins, already have a bunch of pytest tests, or want to run tests in parallel.
  • Use radish if you want more programmatic control of testing at the Gherkin layer.
  • Don’t use lettuce or freshen.

Web Element Locators for Test Automation

Do you want a full course? Check out Web Element Locator Strategies on Test Automation University!

If you do any Web UI test automation (like with Selenium WebDriver), then you probably spend a large chunk of your test development time finding elements on a page, like buttons, inputs, and divs. Finding the right elements, however, can be challenging, especially when they lack unique IDs or class names. This guide will show you how to locate any Web element like a pro.

What are Web elements?

A Web element is an individual entity rendered on a Web page. Everything a user sees on a Web page (and even some things they don’t see) are elements: title headers, okay buttons, input fields, text areas, and more. Elements are specified in HTML by tag name, attributes, and contents. They may also have child elements, such as a table containing rows. CSS may be applied to elements to style them with colors, sizes, position, etc. Programming languages typically access Web elements as nodes in the Document Object Model (DOM).

What are Web element locators?

Web elements and locators are two different things. A Web element locator is an object that finds and returns Web elements on a page using a given query. In short, locators find elements.

Why are locators needed? As human users, we interact with Web pages visually: We look, scroll, click, and type through a browser. However, test automation interacts with Web pages programmatically: it needs a coded way to find and manipulate those same elements. Traditional automation won’t “look” at the page like a human* – it will search the DOM instead.

(*Newer automation technologies enable visual testing, which will be discussed later in this article.)

Selenium WebDriver separates the concerns of element location and interaction. WebDriver calls for these two concerns are frequently written back-to-back:

// WebDriver example: typing a search phrase at
// This code is written in C#, but the calls are the same in any language

// First, element location
IWebElement searchField = driver.FindElement(By.Name("q"));

// Second, element interaction

WebDriver provides the following locator query types using “By”:

Which one is best? We’ll discuss that below.

Locators may also return multiple elements, or none at all! For example:

// Get the list of results from a Google search
// Using "FindElements" will return a list of all elements found in order
// Using "FindElement" would return the first element found (or throw an exception if no elements were found)
IList<IWebElement> results = driver.FindElements(By.CssSelector("div.r"));

Large test frameworks often use design patterns for structuring locators and interactions. The Page Object Model organizes locators and action methods together in classes by page or component. However, I strongly recommend the Screenplay Pattern over page objects because its pieces are more reusable and scalable. Whatever the pattern, locators are needed.

How do I find elements?

Elements can be a hassle to find when writing locators for test automation. To simplify my work flow, I use Google Chrome’s Developer Tools side-by-side with my IDE. Why choose Chrome?

To inspect any Web page in Chrome, simply right-click anywhere on the page:

Voila! DevTools will open. For finding Web elements, we want to use the Elements tab.

Visually pinpointing an element is easy. Click the “select” tool in the upper-left corner of the DevTools pane. (It looks like a square with a cursor on it.) The icon should turn blue.

Then, move the cursor to the desired element on the page. You will see each element highlighted in different colors as the mouse moves over. The corresponding HTML source code in the Elements tab will simultaneously be highlighted, too. Nice! Click on the desired element to set the highlighting so that it won’t disappear when you move the cursor elsewhere.

From here, you can check out the element’s tag, classes, attributes, contents, parents, and children.

How do I write good locators?

Finding the element is half the battle. Forming a unique locator query is the other half. If a locator is too broad, then it could return false positives. However, if a locator is too specific, then it could be susceptible to break whenever the DOM changes, and it could also be difficult for others to read. The best philosophy is this: Write the simplest locator query that uniquely identifies the target element(s).

My locator query type order-of-preference is:

  1. ID (if unique)
  2. Name (if unique)
  3. Class name
  4. CSS Selector
  5. XPath without text or indexing
  6. Link text / partial link text
  7. XPath with text and/or indexing

Unique IDs, names, and class names make locators super easy to write: queries are short and don’t need extra anchors. Always encourage developers on the team to use unique identifiers like class names for all elements. However, many elements do not have them, which means locators must fall back on more complicated CSS selectors and XPaths (*shiver*). Whenever this happens, here’s some advice:

  • Use parents as anchors if they have unique identifiers.
    • CSS selector example: “#some-list > li”
    • XPath example: “//ul[@id=’some-list’]/li”
  • Avoid XPaths that use text or indexing if possible.
    • Bad example: “//div[3]//span[text()=’hello’]”
    • Those tend to be the most brittle checks.
  • Use the “contains” function when checking for classes in XPath.
    • Example: “//div[contains(@class, ‘some-class’)]”
    • Elements frequently have more than one class.
    • “contains” will check a substring instead of the full class string.
    • However, be careful because “some-class2” would be matched!

Always test locators, too. Syntax errors and false positives happen frequently. Chrome DevTools makes testing locators easy. Simply hit Ctrl-F on the Elements tab and then paste the locator query into the finder field. DevTools will highlight all the matching elements in order. Spiffy!

Sometimes, when I can’t figure out why a locator isn’t working for a test case, I’ll do the following:

  1. Run the test case with debugging from my IDE.
  2. Set a break point on the locator.
  3. Wait for the test case to stop at the break point.
  4. Enter DevTools on the active Chrome window.
  5. Check the DOM and test the locators on the live page.

What if my tests are flaky?

Web UI testing is roundly criticized for being “flaky” because tests often crash for unexpected reasons. However, much of the unreliability people hit with Web UI testing (and often with Selenium WebDriver itself) is that all Web interactions inherently pose race conditions. The automation and the browser execute independently, so interactions must be synchronized with page state. Otherwise, WebDriver will throw exceptions for timeouts, stale elements, and elements not found. Many times, these issues happen intermittently, so they can be difficult to trace and resolve.

The best way to avoid race conditions is this: Always wait for an element to exist before interacting with it. This may seem basic, but it’s easy to overlook. Selenium WebDriver packages all offer some sort of WebDriverWait object that will force the driver to wait for a given condition to be true before proceeding. The easiest way to check if an element exists is to check if the list of elements returned by a FindElements (plural) call is non-empty. Adding another call for each interaction may feel burdensome, but design patterns within well-designed frameworks (like the Screenplay Pattern) can make these checks happen automatically.

Another good practice is this: Always fetch fresh elements. Sometimes, automation will first get some elements and then use a second query to get more elements. Or, in the case of the Page Object Factory (which should never be used because, bluntly, its design is terrible), elements are fetched once when the page object is constructed and referenced thereafter. No matter which way, the longer a Web element object exists, the more prone it is to become stale and cause exceptions. I’ve seen elements turn stale inexplicably even when they still seem to be on the page, too. Always get an element in the moment when it is needed. That way, it can’t go stale!

Want some helpful tips for clicking tricky elements? Check out this article: Clicking Web Elements with Selenium WebDriver.

How can AI help Web UI testing?

Several new AI-based projects/products aim to improve automated Web UI testing over traditional methods:

  • Applitools extends Selenium WebDriver automation with checks for nontrivial visual differences.
  • Testim can automatically heal locators whenever they break, avoiding test flakiness due to front-end changes.
  • Mabl is an assistant that will learn and rerun tests that developers teach it without writing any code.
  • runs common user tests like login, searching, and shopping on mobile apps based on what its AI has learned from several other apps.
  • Rainforest QA uses crowdsourcing plus AI to run manual tests specified by a team almost like they are automated.

Test Automation University also offers a free course on using AI for element selection: AI for Element Selection: Erasing the Pain of Fragile Test Scripts.

Many AI testing tools definitely add value, but keep in mind, under the hood, locators are still used somewhere.

Testing Web Services with Karate

Karate is a relatively new open source framework for testing Web services. Even though Karate is written in Java, its main value proposition is that testers don’t need to do any Java programming in order to write fully automated tests. Instead, testers use a Gherkin-like language with steps for making requests and validating responses. It’s like Cucumber with out-of-the-box Web API steps! There are a bunch of other nifty features, too.

This article is my quick-start guide for Karate. As a prerequisite, make sure you understand how Web services work (like REST APIs). Knowing BDD will also help.

My System

Since Karate is an open-source Java project, it can run almost anywhere. Here’s my system config:

  • macOS 10.13.6 (High Sierra)
  • Java 1.8.0_191
  • Apache Maven 3.6.0
  • Karate 0.9.0

Warning: I initially attempted to run my Karate project using Java 11.0.1, but I repeatedly hit SSL handshake exceptions despite trying many fixes. Downgrading to Java 8 fixed the errors. See Issue #617.

Project Setup

I created my Karate project using the Maven archetype since I am quite familiar with Maven. I named my project “firstchop”:

mvn archetype:generate \ \
  -DarchetypeArtifactId=karate-archetype \
  -DarchetypeVersion=0.9.0 \
  -DgroupId=com.automationpanda \

The directory layout was standard for Java Maven / Cucumber-JVM projects. (In fact, Karate was based on Cucumber-JVM until version 0.8.0.) Interestingly, the Karate docs recommend placing feature files under src/test/java instead of src/test/resources.


The Karate docs recommend using Eclipse or IntelliJ IDEA for developing Karate tests. Both IDEs offer support for JUnit and Cucumber, which Karate can leverage not only for editing but also for running tests. I’d probably use IntelliJ IDEA for serious testing.

However, for my initial exploration, I chose to use Visual Studio Code. Why?

  • It’s fast and easy.
  • It has good support for Java and Gherkin.
  • It has a file explorer and an integrated terminal.


Here’s what my Karate project looked like inside Visual Studio Code. Nice!

The Examples

The archetype project included example tests in the users.feature file. The first scenario from the file, copied below, tests getting users from the JSONPlaceholder REST API:

Feature: sample karate test script

* url ''

Scenario: get all users and then get the first user by id

Given path 'users'
When method get
Then status 200

* def first = response[0]

Given path 'users',
When method get
Then status 200

Anyone familiar with Cucumber will immediately recognize the Given/When/Then format for scenarios. However, Karate’s standard steps make the language more powerful than raw Gherkin:

  • Given steps build requests
  • When steps make request calls
  • Then steps validate responses
  • Catch-all steps (*) provide additional directives, like setting variables

All steps are quite concise. The Java implementation is essentially hidden from the tester (unless, of course, they want to plunge into the framework’s lower levels). Furthermore, this scenario shows how to use data from one response as the input for a second request.

Running Tests

Since I chose to use Maven and Visual Studio Code, the easiest way to run tests without any additional configuration was through the command line using “mvn test”. The example tests came with an file that will run all feature files in the package when it is discovered during Maven’s “test” phase. The project defaulted to using JUnit 4, but JUnit 5 is also supported.

Running the tests will print many lines to the console. Every request and response will be printed. Below is the tail end of a successful run for users.feature:

$ mvn test
feature: classpath:examples/users/users.feature
scenarios:  2 | passed:  2 | failed:  0 | time: 1.5232
HTML report: (paste into browser to view) | Karate version: 0.9.0

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.09 sec

Results :

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  5.635 s
[INFO] Finished at: 2018-12-09T23:32:29-05:00
[INFO] ------------------------------------------------------------------------

Karate can generate helpful test reports, too. The project generates JUnit reports by default, but other report formats like Cucumber reports are also possible.


The JUnit HTML report shows a step-by-step log for each scenario.


Full requests and responses are automatically logged, which is great for debugging.


Failures are visually easy to identify. Above, I hacked the example to deliberately fail.

My Scenarios

Seeing examples is great, but I wanted to write my own scenarios with a real Web service for some hands-on experience. So, I wrote a test for Recipe Puppy, a search engine that provides links to recipes for input ingredients:

Feature: Recipe Puppy

* url ''

Scenario Outline: Get a recipe for <ingredient>

Given path 'api'
And params {i: '<ingredient>'}
When method get
Then status 200
And match response contains {results: '#array'}
And match response.results[*] contains
    title: '#string',
    href: '#string',
    ingredients: '#regex <ingredient>',
    thumbnail: '#string'

    | ingredient |
    | tomato     |
    | pepperoni  |
    | cheese     |


Here are a few things I learned while writing this new feature file:

  • Scenario Outlines are supported, but data-driven features are preferred.
  • JSON objects can be used as step arguments or as variables.
  • Matching syntax is simple but sophisticated.
  • Markers like “#array” support fuzzy matching when specific values are unknown.
  • Multi-line JSON expressions need block quotes.

This new scenario ran just as successfully as the examples!

Standalone Execution

Even though writing tests using Karate’s domain-specific language does not require Java development skills, setting up the full Karate project does. Thankfully, Karate provides a standalone JAR that can run feature files without any other dependencies or configuration. It simply takes in paths to feature files, runs them, and generates Cucumber reports. The standalone JAR would be a good option for testers who don’t have strong programming skills.


This was the Cucumber report generated by running my Recipe Puppy test with the standalone JAR.

Other Features

Despite being a fairly young project (with a GitHub creation date of February 7, 2017), Karate is full of nifty features that I have yet to try:

  • Parallel test execution
  • A mock servlet
  • A UI for visually debugging scripts
  • Reading files of many types into variables
  • Calling a feature file from another feature file
  • Calling JavaScript code
  • Calling Java code
  • Using scenarios as Gatling performance tests

Project contributors are also experimenting to extend Karate to test browser, mobile, and desktop UIs using Selenium WebDriver.


Overall, Karate is a great tool for testing Web services. It handles all of the programming implementation details so that testers can focus more on testing. Its syntax is concise, clear, and versatile. Its native support for JSON object makes request and response handling feel natural. The GitHub docs are on-point. Anyone testing REST APIs should give it serious consideration.

Even though Karate uses Gherkin-like syntax, it is not truly a behavior-driven test framework. Instead, it simply leverages the strengths of Cucumber – readability, reusability, and tooling – to make automated Web service testing easier. I have long stated that one of the best solutions to test automation challenges is to create a domain-specific testing language that automatically handles low-level details. (See Behavior-Driven Blasphemy.) The Karate project validates my claim. Karate’s DSL acts much more like a testing language than a business-oriented specification language. Its Gherkin keywords simply provide structure and familiarity to its steps. And it’s perfectly okay that Karate isn’t “pure BDD” – just ask the project’s creator.

With that said, I would argue that a tester probably needs basic programming skills to be successful with Karate. Execution needs Java setup no matter what, and feature files are really just fancy test scripts. And Web service APIs are inherently code-y.

I look forward to seeing how the Karate project grows, especially if/when WebDriver-based steps are added to the language.



Python Testing 101: pytest-bdd

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use pytest-bdd. Also, make sure that you are already familiar with the pytest framework.


pytest-bdd is a behavior-driven (BDD) test framework that is very similar to behaveCucumber and SpecFlow. BDD frameworks are very different from more traditional frameworks like unittest and pytest. Test scenarios are written in Gherkin “.feature” files using plain language. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. This means that there is a separation of concerns between test cases and test code. Gherkin steps may also be reused by multiple scenarios.

pytest-bdd is very similar to other Python BDD frameworks like behave, radish, and lettuce. However, unlike the others, pytest-bdd is not a standalone framework: it is a plugin for pytest. Thus, all of pytest‘s features and plugins can be used with pytest-bdd. This is a huge advantage!


Use pip to install both pytest and pytest-bdd.

pip install pytest
pip install pytest-bdd

Project Structure

Project structure for pytest-bdd is actually pretty flexible (since it is based on pytest), but the following conventions are recommended:

  • All test code should appear under a test directory named “tests”.
  • Feature files should be placed in a test subdirectory named “features”.
  • Step definition modules should be placed in a test subdirectory named “step_defs”.
  • files should be located together with step definition modules.

Other names and hierarchies may be used. For example, large test suites can have feature-specific directories of features and step defs. pytest should be able to discover tests anywhere under the test directory.

[project root directory]
|‐‐ [product code packages]
|-- [test directories]
|   |-- features
|   |   `-- *.feature
|   `-- step_defs
|       |--
|       `-- test_*.py
`-- [pytest.ini|tox.ini|setup.cfg]

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

Example Code

An example project named behavior-driven-python located in GitHub shows how to write tests using pytest-bdd. This section will explain how the Web tests are designed.

The top layer for pytest-bdd tests is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step definitions are written in Python test modules, as shown below:

import pytest
from pytest_bdd import scenarios, given, when, then, parsers
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Constants


# Scenarios


# Fixtures

def browser():
    b = webdriver.Firefox()
    yield b

# Given Steps

@given('the DuckDuckGo home page is displayed')
def ddg_home(browser):

# When Steps

@when(parsers.parse('the user searches for "{phrase}"'))
def search_phrase(browser, phrase):
    search_input = browser.find_element_by_name('q')
    search_input.send_keys(phrase + Keys.RETURN)

# Then Steps

@then(parsers.parse('results are shown for "{phrase}"'))
def search_results(browser, phrase):
    links_div = browser.find_element_by_id('links')
    assert len(links_div.find_elements_by_xpath('//div')) > 0
    search_input = browser.find_element_by_name('q')
    assert search_input.get_attribute('value') == phrase

Notice how each Given/When/Then step has a function with an appropriate decorator. Arguments, such as the search “phrase,” may also be passed from step to function. pytest-bdd provides a few argument parsers out of the box and also lets programmers implement their own. (By default, strings are compared using equality.) One function can be decorated for many steps, too.

pytest fixtures may also be used by step functions. The code above uses a fixture to initialize the Firefox WebDriver before each scenario and then quit it after each scenario. Fixtures follow all the same rules, including scope. Any step function can use a fixture by declaring it as an argument. Furthermore, any “@given” step function that returns a value can also be used as a fixture. Please read the official docs for more info about fixtures with pytest-bdd.

One important, easily-overlooked detail is that scenarios must be explicitly declared in test modules. Unlike other BDD frameworks that treat feature files as the main scripts, pytest-bdd treats the “test_*.py” module as the main scripts (because that’s what pytest does). Scenarios may be specified explicitly using scenario decorators, or all scenarios in a list of feature files may be included implicitly using the “scenarios” shortcut function shown above.

To share steps across multiple feature files, add them to the “” file instead of the test modules. Since scenarios must be declared within a test module, they can only use step functions available within the same module or in “”. As a best practice, put commonly shared steps in “” and feature-specific steps in the test module. The same recommendation also applies for hooks.

Scenario outlines require special implementation on the Python side to run successfully. Unfortunately, steps used by scenario outlines need unique step decorators and extra converting. Please read the official docs or the example project to see examples.

Test Launch

pytest-bdd can leverage the full power of pytest. Tests can be run in full or filtered by tag. Below are example commands using the example project:

# run all tests

# filter tests by test module
# note: feature files cannot be run directly
pytest tests/step_defs/
pytest tests/step_defs/
pytest tests/step_defs/
pytest tests/step_defs/

# filter tests by tags
# running by tag is typically better than running by path
pytest -k "unit"
pytest -k "service"
pytest -k "web"
pytest -k "add or remove"
pytest -k "unit and not outline"

# print JUnit report
pytest -junitxml=/path/for/output

pytest-bdd tests can be executed and filtered together with regular pytest tests. Tests can all be located within the same directory. Tags work just like pytest.mark.

All other pytest plugins should work, too. For example:

Pros and Cons

Just like for other BDD frameworks, pytest-bdd is best suited for black-box testing because it forces the developer to write test cases in plain, descriptive language. In my opinion, it is arguably the best BDD framework currently available for Python because it rests on the strength and extendability of pytest. It also has PyCharm support (in the Professional Edition). However, it can be more cumbersome to use than behave due to the extra code needed for declaring scenarios, implementing scenario outlines, and sharing steps. Nevertheless, I would still recommend pytest-bdd over behave for most users because it is more powerful – pytest is just awesome!

The Software Engineer in Test

I am a Software Engineer in Test (SET). Many people don’t know quite what that means, though. Developers frequently refer to me as a “tester” or “QA,” and a former director once thought I did DevOps. While my work covers these areas, they aren’t the main focus. Let’s clarify what it means to be a SET.

What is a Software Engineer in Test?

A “Software Engineer in Test” (a.k.a. “Software Development Engineer in Test”) is a software developer who develops software for testing: tools, frameworks, and automated tests. SETs focus primarily on automation for running tests quickly and repeatedly. Test automation is a software product: just as front-end developers write web pages and back-end developers write microservices, SETs write automated tests. The same practices and coding skills apply. I frequently say that a Software Engineer in Test must have the heart of a developer.

So they just write test scripts?

No. Never say this to a SET. Test automation involves much more than just “writing test scripts.” A serious testing solution requires serious design and effort. The top-level automation of a test case is usually just a small piece. SETs are responsible for:

  • Collaborating with developers and product owners
    • Contributing to planning and design
    • Reviewing product code
    • Formulating test scenarios
  • Developing test automation frameworks
  • Automating test scenarios using the frameworks
  • Knowing and using design patterns where appropriate
  • Setting up the infrastructure to run tests
    • Running tests in CI/CD
    • Running tests in parallel
    • Running tests with the appropriate test data
  • Setting up dashboards for reporting test results in real time
  • Teaching others good quality and testing practices
  • Developing tools to assist manual and exploratory testing

Saying that testing is “just scripting” belittles the role of the SET and underestimates the workload it requires.

How did this role start?

Software “tester” and “QA” roles have existed for decades, but the SET role first became distinct in the 2000s when large-scale test automation became both feasible and necessary. According to Wikipedia, Microsoft coined the title “Software Developer Engineer in Test” (SDET) in 2005, and others like Amazon and Apple quickly adopted it. Google coined the name “Software Engineer in Test” for the same type of role. I personally prefer the SET title over the SDET title simply because it is more concise.

How is it different from “QA” or “testing” roles?

To me, the SET role is distinct from the traditional “QA” or “testing” role. Software testers historically focused on manual testing and thus didn’t need strong programming skills. Their fortes were product domain knowledge, intuition, out-of-the-box thinking, test planning, and test system setup. And these certainly are important, indispensable skills! SETs, however, live in both the development and testing worlds. They use developer skills to provide software solutions for testing problems. Automation is much more central to the SET role. These days, almost all “testing” job openings are SET roles; manual-only test roles have been largely deprecated. Nevertheless, there will always be a need for manual testing because there are some problems a human can catch much more easily than a script (or an AI agent).

Personally, I avoid using the titles “QA” and “software tester” for myself because they don’t accurately describe all that I do. I also avoid the title “automation engineer” because, again, it is reductionist. I tackle software testing with the heart of a developer, and I set up test automation solutions from the ground up. I’m proud to be a software engineer who specializes in testing.


As a bonus, check out Test and Code episode 47, in which Brian Okken and I discuss what it means to be a Software Engineer in Test (among other topics).


Book Review: Python Testing with pytest


Title Python Testing with pytest
Author Brian Okken (@brianokken)
Publication 2017 (The Pragmatic Programmers)
Summary How to use all the features of pytest for Python test automation – “simple, rapid, effective, and scalable.”
Prerequisites Intermediate-level Python programming.


Python Testing with pytest is the book on pytest*. Brian Okken covers all the ins and outs of the framework. The book is useful both as tutorial for learning pytest as well as a reference for specific framework features. It covers:

  • Getting started with pytest
  • Writing simple tests as functions
  • Writing more interesting tests with assertions, exceptions, and parameters
  • Using all the different execution options
  • Writing fixtures to flexibly separate concerns and reuse code
  • Using built-in fixtures like tmpdir, pytestconfig, and monkeypatch
  • Using configuration files to control execution
  • Integrating pytest with other tools like pdb, tox, and Jenkins

Appendices also cover:

  • Using Python virtual environments
  • Installing packages with pip
  • An overview of popular plugins like pytest-xdist and pytest-cov
  • Packaging and distributing Python packages

[* Update on 9/27/2018: Check out my review on another excellent pytest book, pytest Quick Start Guide by Bruno Oliveira.]


This book is a comprehensive guide to pytest. It thoroughly covers the framework’s features and gives pointers to more info elsewhere. Even though pytest has excellent online documentation, I still recommend this book to anyone who wants to become a pytest master. Online docs tend to be fragmented with each piece limited in scope, whereas books like this one are designed to be read progressively and orderly for maximal understanding of the material.

I love how this book is example-driven. Each section follows a simple yet powerful outline: idea → code → output → explanation. Having real code with real output truly cements the point of each mini-lesson. New topics are carefully unfolded so that they build upon previous topics, making the book read like a collection of tutorials. Examples at the end of every chapter challenge the readers to practice what they learn. The formatting of each section also looks great.

The extra info on related topics like pip and virtualenv is also a nice touch. Python pros probably don’t need it, but beginners might get stuck without it.

The rocket ship logo on the cover is also really cool!


pytest is one of the best functional test frameworks currently available in any language. It breaks the clunky xUnit mold, in which class structures were awkwardly superimposed over test cases as if one size fits all. Instead, pytest is lightweight and minimalist because it relies on functions and fixtures. Scope is much easier to manage, code is more reusable, and side effects can more easily be avoided. pytest has taken over Python testing because it is so Pythonic.

Brian’s concise writing style has also inspired me to be more direct in my own writing. I tend to be rather verbose out of my desire to be descriptive. However, fewer words often leave a more powerful impression. They also make the message easier to comprehend. Python is beloved for its concise expressiveness, and as a Pythonista, it would be fitting for me to adopt that trait into my English.

If I had a wish list for a second edition, I’d love to see more info about assertions and other plugins (namely pytest-bdd). I think an appendix with comparisons to other Python test frameworks would also be valuable.

A Warning

I ordered a physical copy of this book directly from Amazon (not a third-party seller). Unfortunately, that copy was missing all the introductory content: the table of contents, the acknowledgements, and the preface. The first page after the front cover was Chapter 1. Befuddled, I reached out to Brian Okken (who I personally met at PyCon 2018). We suspected that it was either a misprint or a bootleg copy. Either way, we sent the evidence to the publisher, and Amazon graciously exchanged my defective copy for the real deal. Please look out for this sort of problem if you purchase a printed copy of this book for yourself!

If you want to learn more about pytest, please read my article Python Testing 101: pytest.