BDD

Introducing Boa Constrictor: The .NET Screenplay Pattern

Today, I’m excited to announce the release of a new open source project for test automation: Boa Constrictor, the .NET Screenplay Pattern!

The Screenplay Pattern helps you make better interactions for better automation. The pattern can be summarized in one line: Actors use Abilities to perform Interactions.

Actors initiate Interactions. Every test has an Actor.
Abilities enable Actors to perform Interactions. They hold objects that Interactions need, like WebDrivers or REST API clients.
Interactions exercise behaviors under test. They could be clicks, requests, commands, and anything else.

This separation of concerns makes Screenplay code very reusable and scalable, much more so than traditional page objects. Check it out, here’s a C# script to test a search engine:

// Create the Actor
IActor actor = new Actor(logger: new ConsoleLogger());

// Add an Ability to use a WebDriver
actor.Can(BrowseTheWeb.With(new ChromeDriver()));

// Load the search engine
actor.AttemptsTo(Navigate.ToUrl(SearchPage.Url));

// Get the page's title
string title = actor.AsksFor(Title.OfPage());

// Search for something
actor.AttemptsTo(Search.For("panda"));

// Wait for results
actor.AttemptsTo(Wait.Until(
    Appearance.Of(ResultPage.ResultLinks),
    IsEqualTo.True()));

Boa Constrictor provides many interactions for Selenium WebDriver and RestSharp out of the box, like Navigate, Title, and Appearance shown above. It also lets you compose interactions together, like how Search is a composition of typing and clicking.

Over the past two years, my team and I at PrecisionLender, a Q2 Company, developed Boa Constrictor internally as the cornerstone of Boa, our comprehensive end-to-end test automation solution. We were inspired by Serenity BDD‘s Screenplay implementation. After battle-hardening Boa Constrictor with thousands of automated tests, we are releasing it publicly as an open source project. Our goal is to help everyone make better interactions for better test automation.

If you’d like to give Boa Constrictor a try, then start with the tutorial. You’ll implement that search engine test from above in full. Then, once you’re ready to use it for some serious test automation, add the Boa.Constrictor NuGet package to your .NET project and go!

You can view the full source code on GitHub at q2ebanking/boa-constrictor. Check out the repository for full information. In the coming weeks, we’ll be developing more content and code. Since Boa Constrictor is open source, we’d love for you to contribute to the project, too!

Test-Driving TestProject’s New Python SDK

TestProject recently released its new OpenSDK, and one of its major features is the inclusion of Python testing support! Since I love using Python for test automation, I couldn’t wait to give it a try. This article is my crash-course tutorial on writing Web UI tests in Python with TestProject.

What is TestProject?

TestProject is a free end-to-end test automation platform for Web, mobile, and API tests. It provides a cloud-based way to teams to build, run, share, and analyze tests. Manual testers can visually build tests for desktop or mobile sites using TestProject’s in-browser recorder and test builder. Automation engineers can use TestProject’s SDKs in Java, C#, and now Python for developing coded test automation solutions, and they can use packages already developed by others in the community through TestProject’s add-ons. Whether manual or automated, TestProject displays all test results in a sleek reporting dashboard with helpful analytics. And all of these features are legitimately free – there’s no tiered model or service plan.

Recently, TestProject announced the official release of its new OpenSDK. This new SDK (“software development kit”) provides a simple, unified interface for running tests with TestProject across multiple platforms and languages (now including Python). Things look exciting for the future of TestProject!

What’s My Interest?

It’s no secret that I love testing with Python. When I heard that TestProject added Python support, I knew I had to give it a try. I never used TestProject before, but I was interested to learn what it could do. Specifically, I wanted to see the value it could bring to reporting automated tests. In the Python space, test automation is slick, but reporting can be rough since frameworks like pytest and unittest are command-line-focused. I also wanted to see if TestProject’s SDK would genuinely help me automate tests or if it would get it my way. Furthermore, I know some great people in the TestProject community, so I figured it was time to jump in myself!

The Python SDK

TestProject’s Python SDK is an open-source project. It was originally developed by Bas Dijkstra, with the support of the TestProject team, and its code is hosted on GitHub. The Python SDK supports Selenium for Web UI automation (which will be the focus for this tutorial) and Appium for Android and iOS UI automation as well!

Since I’d never used TestProject before, let alone this new Python SDK, I wanted to review the code to see how to use it. Thankfully, the README included lots of helpful information and example code. When I looked at the code for TestProject’s BaseDriver, I discovered that it simply extend’s Selenium WebDriver’s RemoteDriver class. That means all the TestProject WebDrivers use exactly the same API as Python’s Selenium WebDriver implementation. To me, that was a big relief. I know WebDriver’s API very well, so I wouldn’t need to learn anything different in order to use TestProject. It also means that any test automation project can be easily retrofitted to use TestProject’s SDKs – they just need to swap in a new WebDriver object!

Setup Steps

TestProject has a straightforward architecture. Users sign up for free TestProject accounts online. Then, they set up their own machines for running tests. Each testing machine must have the TestProject agent installed and linked to a user’s account. When tests run, agents automatically push results to the TestProject cloud. Users can then log into the TestProject portal to view and analyze results. They can invite team mates to share results, and they can also set up multiple test machines with agents. Users can even integrate TestProject with other tools like Jenkins, qTest, and Sauce Labs. The TestProject docs, especially the ecosystem diagram, explain everything in more detail.

When I did my test drive, I created a TestProject account, installed the agent on my Mac, and ran Python Web UI tests from my Mac. I already had the latest version of Python installed (Python 3.8 at the time of writing this article). I also already had my target browsers installed: Google Chrome and Mozilla Firefox.

Below are the precise steps I followed to set up TestProject:

1. Sign up for an account

TestProject accounts are “free forever.” Use this signup link.

2. Download the TestProject Agent

The signup wizard should direct you to download the TestProject agent. If not, you can always download it from the TestProject dashboard. Be warned, the download package is pretty large – the macOS package was 345 MB. Alternatively, you can fetch the agent as a container image from Docker Hub.

The TestProject agent contains all the stuff needed to run tests and upload results to the TestProject app in the cloud. You don’t need to install WebDriver executables like ChromeDriver or geckodriver. Once the agent is downloaded, install it on the machine and register the agent with your account. For me, registration happened automatically.

This is what the TestProject agent looks like when running on macOS. You can also close this window to let it run in the background.

3. Find your developer token

You’ll need to use your developer token to connect your automated tests to your account in the TestProject app. The signup wizard should reveal it to you, but you can always find it (and also reset it) on the Integrations page.

The Integrations page. Check here for your developer token. No, you can’t use mine.

4. Install the Python SDK

TestProject’s Python SDK is distributed as a package through PyPI. To install it, simply run pip install testproject-python-sdk at the command line. This package will also install dependencies like selenium and requests.

A Classic Web UI Test

After setting up my Mac to use TestProject, it was time to write some Web UI tests in Python! Since I discovered that TestProject’s WebDriver objects could easily retrofit any existing test automation project, I thought, “What if I try to run my PyCon 2020 tutorial project with TestProject?” For PyCon 2020, I gave an online tutorial about building a Web UI test automation project in Python from the ground up using pytest and Selenium WebDriver. The tutorial includes one test case: a DuckDuckGo web search and verification. I thought it would be easy to integrate with TestProject since I already had the code. Thankfully, it was!

Below, I’ll walk though my code. You can check out my example project repository from GitHub at AndyLPK247/testproject-python-sdk-example. My code will be a bit more advanced than the examples shown in the Python SDK’s README or in Bas Dijkstra’s tutorial article because it uses the Page Object Model and pytest fixtures. Make sure to pip install pytest, too.

1. Write the test steps

The test case covers a simple DuckDuckGo web search. Whenever I automate tests, I always write out the steps in plain language. Good tests follow the Arrange-Act-Assert pattern, and I like to use Gherkin’s Given-When-Then phrasing. Here’s the test case:

Scenario: Basic DuckDuckGo Web Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then the search result query is "panda"
    And the search result links pertain to "panda"
    And the search result title contains "panda"

2. Specify automation inputs

Inputs configure how automated tests run. They can be passed into a test automation solution using configuration files. Testers can then easily change input values in the config file without changing code. Automation should read config files once at the start of testing and inject necessary inputs into every test case.

In Python, I like to use JSON for config files. JSON data is simple and hierarchical, and Python includes a module in its standard library named json that can parse a JSON file into a Python dictionary in one line. I also like to put config files either in the project root directory or in the tests directory.

Here’s the contents of config.json for this test:

{
  "browser": "Chrome",
  "implicit_wait": 10,
  "testproject_projectname": "TestProject Python SDK Example",
  "testproject_token": ""
}

browser is the name of the browser to test
implicit_wait is the implicit waiting timeout for the WebDriver instance
testproject_projectname is the project name to use for this test suite in the TestProject app
testproject_token is the developer token

3. Read automation inputs

Automation code should read inputs one time before any tests run and then inject inputs into appropriate tests. pytest fixtures make this easy to do.

I created a fixture named config in the tests/conftest.py module to read config.json:

import json
import pytest


@pytest.fixture
def config(scope='session'):

  # Read the file
  with open('config.json') as config_file:
    config = json.load(config_file)
  
  # Assert values are acceptable
  assert config['browser'] in ['Firefox', 'Chrome', 'Headless Chrome']
  assert isinstance(config['implicit_wait'], int)
  assert config['implicit_wait'] > 0
  assert config['testproject_projectname'] != ""
  assert config['testproject_token'] != ""

  # Return config so it can be used
  return config

Setting the fixture’s scope to “session” means that it will run only one time for the whole test suite. The fixture reads the JSON config file, parses its text into a Python dictionary, and performs basic input validation. Note that Firefox, Chrome, and Headless Chrome will be supported browsers.

4. Set up WebDriver

Each Web UI test should have its own WebDriver instance so that it remains independent from other tests. Once again, pytest fixtures make setup easy.

The browser fixture in tests/conftest.py initialize the appropriate TestProject WebDriver type based on inputs returned by the config fixture:

from selenium.webdriver import ChromeOptions
from src.testproject.sdk.drivers import webdriver


@pytest.fixture
def browser(config):

  # Initialize shared arguments
  kwargs = {
    'projectname': config['testproject_projectname'],
    'token': config['testproject_token']
  }

  # Initialize the TestProject WebDriver instance
  if config['browser'] == 'Firefox':
    b = webdriver.Firefox(**kwargs)
  elif config['browser'] == 'Chrome':
    b = webdriver.Chrome(**kwargs)
  elif config['browser'] == 'Headless Chrome':
    opts = ChromeOptions()
    opts.add_argument('headless')
    b = webdriver.Chrome(chrome_options=opts, **kwargs)
  else:
    raise Exception(f'Browser "{config["browser"]}" is not supported')

  # Make its calls wait for elements to appear
  b.implicitly_wait(config['implicit_wait'])

  # Return the WebDriver instance for the setup
  yield b

  # Quit the WebDriver instance for the cleanup
  b.quit()

This was the only section of code I needed to change to make my PyCon 2020 tutorial project work with TestProject. I had to change the WebDriver invocations to use the TestProject classes. I also had to add arguments for the project name and developer token, which come from the config file. (Note: you may alternatively set the developer token as an environment variable.)

5. Create page objects

Automated tests could make direct calls to the WebDriver interface to interact with the browser, but WebDriver calls are typically low-level and wordy. The Page Object Model is a much better design pattern. Page object classes encapsulate WebDriver gorp so that tests can call simpler, more readable methods.

The DuckDuckGo search test interacts with two pages: the search page and the result page. The pages package contains a module for each page. Here’s pages/search.py:

from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


class DuckDuckGoSearchPage:

  URL = 'https://www.duckduckgo.com'

  SEARCH_INPUT = (By.ID, 'search_form_input_homepage')

  def __init__(self, browser):
    self.browser = browser

  def load(self):
    self.browser.get(self.URL)

  def search(self, phrase):
    search_input = self.browser.find_element(*self.SEARCH_INPUT)
    search_input.send_keys(phrase + Keys.RETURN)

And here’s pages/result.py:

from selenium.webdriver.common.by import By

class DuckDuckGoResultPage:
  
  RESULT_LINKS = (By.CSS_SELECTOR, 'a.result__a')
  SEARCH_INPUT = (By.ID, 'search_form_input')

  def __init__(self, browser):
    self.browser = browser

  def result_link_titles(self):
    links = self.browser.find_elements(*self.RESULT_LINKS)
    titles = [link.text for link in links]
    return titles
  
  def search_input_value(self):
    search_input = self.browser.find_element(*self.SEARCH_INPUT)
    value = search_input.get_attribute('value')
    return value

  def title(self):
    return self.browser.title

Notice that this code uses the “regular” WebDriver interface because TestProject’s WebDriver classes extend the Selenium WebDriver classes.

To make setup easier, I added fixtures to tests/conftest.py to construct each page object, too. They call the browser fixture and inject the WebDriver instance into each page object:

from pages.result import DuckDuckGoResultPage
from pages.search import DuckDuckGoSearchPage


@pytest.fixture
def search_page(browser):
  return DuckDuckGoSearchPage(browser)


@pytest.fixture
def result_page(browser):
  return DuckDuckGoResultPage(browser)

6. Automate the test case

All the automation plumbing is finally in place. Here’s the test case in tests/traditional/test_duckduckgo.py:

import pytest


@pytest.mark.parametrize('phrase', ['panda', 'python', 'polar bear'])
def test_basic_duckduckgo_search(search_page, result_page, phrase):
  
  # Given the DuckDuckGo home page is displayed
  search_page.load()

  # When the user searches for the phrase
  search_page.search(phrase)

  # Then the search result query is the phrase
  assert phrase == result_page.search_input_value()
  
  # And the search result links pertain to the phrase
  titles = result_page.result_link_titles()
  matches = [t for t in titles if phrase.lower() in t.lower()]
  assert len(matches) > 0

  # And the search result title contains the phrase
  assert phrase in result_page.title()

I parametrized the test to run it for three different phrases. The test function does not interact with the WebDriver instance directly. Instead, it interacts exclusively with the page objects.

7. Run the tests

The tests run like any other pytest tests: python -m pytest at the command line. If everything is set up correctly, then the tests will run successfully and upload results to the TestProject app.

In the TestProject dashboard, the Reports tab shows all the test you have run. It also shows the different test projects you have.

You can also drill into results for individual test case runs. TestProject automatically records the browser type, timestamps, pass-or-fail results, and every WebDriver call. You can also download PDF reports!

What if … BDD?

I was delighted to see how easily I could run a traditional pytest suite using TestProject. Then, I thought to myself, “What if I could use a BDD test framework?” I personally love Behavior-Driven Development, and Python has multiple BDD test frameworks. There is no reason why a BDD test framework wouldn’t work with TestProject!

So, I rewrote the DuckDuckGo search test as a feature file with step definitions using pytest-bdd. The BDD-style test uses the same fixtures and page objects as the traditional test.

Here’s the Gherkin scenario in tests/bdd/features/duckduckgo.feature:

Feature: DuckDuckGo
  As a Web surfer,
  I want to search for websites using plain-language phrases,
  So that I can learn more about the world around me.


  Scenario Outline: Basic DuckDuckGo Web Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "<phrase>"
    Then the search result query is "<phrase>"
    And the search result links pertain to "<phrase>"
    And the search result title contains "<phrase>"

    Examples:
      | phrase     |
      | panda      |
      | python     |
      | polar bear |

And here’s the step definition module in tests/bdd/step_defs/test_duckduckgo_bdd.py:

from pytest_bdd import scenarios, given, when, then, parsers
from selenium.webdriver.common.keys import Keys


scenarios('../features/duckduckgo.feature')


@given('the DuckDuckGo home page is displayed')
def load_duckduckgo(search_page):
  search_page.load()


@when(parsers.parse('the user searches for "{phrase}"'))
@when('the user searches for "<phrase>"')
def search_phrase(search_page, phrase):
  search_page.search(phrase)


@then(parsers.parse('the search result query is "{phrase}"'))
@then('the search result query is "<phrase>"')
def check_search_result_query(result_page, phrase):
  assert phrase == result_page.search_input_value()


@then(parsers.parse('the search result links pertain to "{phrase}"'))
@then('the search result links pertain to "<phrase>"')
def check_search_result_links(result_page, phrase):
  titles = result_page.result_link_titles()
  matches = [t for t in titles if phrase.lower() in t.lower()]
  assert len(matches) > 0


@then(parsers.parse('the search result title contains "{phrase}"'))
@then('the search result title contains "<phrase>"')
def check_search_result_title(result_page, phrase):
  assert phrase in result_page.title()

There’s one more nifty trick I added with pytest-bdd. I added a hook to report each Gherkin step to TestProject with a screenshot! That way, testers can trace each test case step more easily in the TestProject reports. Capturing screenshots also greatly assists test triage when failures arise. This hook is located in tests/conftest.py:

def pytest_bdd_after_step(request, feature, scenario, step, step_func):
  browser = request.getfixturevalue('browser')
  browser.report().step(description=str(step), message=str(step), passed=True, screenshot=True)

Since pytest-bdd is just a pytest plugin, its tests run using the same python -m pytest command. TestProject will group these test results into the same project as before, but it will separate the traditional tests from the BDD tests by name. Here’s what the Gherkin steps with screenshots look like:

Custom Gherkin step with screenshot reported in the TestProject app

This is Awesome!

As its name denotes, TestProject is a great platform for handling project-level concerns for testing work: reporting, integrations, and fast feedback. Adding TestProject to an existing automation solution feels seamless, and its sleek user experience gives me what I need as a tester without getting in my way. The one word that keeps coming to mind is “simple” – TestProject simplifies setup and sharing. Its design takes to heart the renowned Python adage, “Simple is better than complex.” As such, TestProject’s new Python SDK is a welcome addition to the Python testing ecosystem.

I look forward to exploring Python support for mobile testing with Appium soon. I also look forward to seeing all the new Python add-ons the community will develop.

Using Multiple Test Frameworks Simultaneously

Someone recently asked me the following question, which I’ve paraphrased for better context:

Is it good practice to use multiple test frameworks simultaneously? For example, I’m working on a Python project. I want to do BDD with behave for feature testing, but pytest would be better for unit testing. Can I use both? If so, how should I structure my project(s)?

The short answer: Yes, you should use the right frameworks for the right needs. Using more than one test framework is typically not difficult to set up. Let’s dig into this.

The F-word

I despise the F-word – “framework.” Within the test automation space, people use the word “framework” to refer to different things. Is the “framework” only the test package like pytest or JUnit? Does it include the tests themselves? Does it refer to automation tools like Selenium WebDriver?

For clarity, I prefer to use two different terms: “framework” and “solution.” A test framework is software package that lets programmers write tests as methods or functions, run the tests, and report the results. A test solution is a software implementation for a testing problem. It typically includes frameworks, tools, and test cases. To me, a framework is narrow, but a solution is comprehensive.

The original question used the word “framework,” but I think it should be answered in terms of solutions. There are two potential solutions at hand: one for unit tests written in pytest, while another for feature tests written in behave.

One Size Does Not Fit All

Always use the right tools or frameworks for the right needs. Unit tests and feature tests are fundamentally different. Unit tests directly access internal functions and methods in product code, whereas feature tests interact with live versions of the product as an external user or caller. Thus, they need different kinds of testing solutions, which most likely will require different tools and frameworks.

For example, behave is a BDD framework for Python. Programmers write test cases in plain-language Gherkin with step definitions as Python functions. Gherkin test cases are intuitively readable and understandable, which makes them great for testing high-level behaviors like interacting with a Web page. However, BDD frameworks add complexity that hampers unit test development. Unit tests are inherently “code-y” and low-level because they directly call product code. The pytest framework would be a better choice for unit testing. Conversely, feature tests could be written using raw pytest, but behave provides a more natural structure for describing features. Hence, separate solutions for different test types would be ideal.

Same or Separate Repositories?

If more than one test solution is appropriate for a given software project, the next question is where to put the test code. Should all test code go into the same repository as the product code, or should they go into separate repositories? Unfortunately, there is no universally correct answer. Here are some factors to consider.

Unit tests should always be located in the same repository as the product code they test. Unit tests directly depend upon the product code. They mus be written in the same language. Any time the product code is refactored, unit tests must be updated.

Feature tests can be placed in the same repository or a separate repository. I recommend putting feature tests in the same repository as product code if feature tests are written in the same language as the product code and if all the product code under test is located in the same repository. That way, tests are version-controlled together with the product under test. Otherwise, I recommend putting feature tests in their own separate repository. Mixed language repositories can be confusing to maintain, and version control must be handled differently with multi-repository products.

Same Repository Structure

One test solution in one repository is easy to set up, but multiple test solutions in one repository can be tricky. Thankfully, it’s not impossible. Project structure ultimately depends upon the language. Regardless of language, I recommend separating concerns. A repository should have clearly separate spaces (e.g., subdirectories) for product code and test code. Test code should be further divided by test types and then coverage areas. Testers should be able to run specific tests using convenient filters.

Here are ways to handle multiple test solutions in a few different languages:

In Python, project structure is fairly flexible. Conventionally, all tests belong under a top-level directory named “tests.” Subdirectories may be added thereunder, such as “unit” and “feature”. Frameworks like pytest and behave can take search paths so they run the proper tests. Furthermore, if using pytest-bdd instead of behave, pytest can use the markings/tags instead of search paths for filtering tests.
In .NET (like C#), the terms “project” and “solution” have special meanings. A .NET project is a collection of code that is built into one artifact. A .NET solution is a collection of projects that interrelate. Typically, the best practice in .NET would be to create a separate project for each test type/suite within the same .NET solution. I have personally set up a .NET solution that included separate projects for NUnit unit tests and SpecFlow feature tests.
In Java, project structure depends upon the project’s build automation tool. Most Java projects seem to use Maven or Gradle. In Maven’s Standard Directory Layout, tests belong under “src/test”. Different test types can be placed under separate packages there. The POM might need some extra configuration to run tests at different build phases.
In JavaScript, test placement depends heavily upon the project type. For example, Angular creates separate directories for unit tests using Jasmine and end-to-end tests using Protractor when initializing a new project.

Do What’s Best

Different test tools and frameworks meet different needs. No single one can solve all problems. Make sure to use the right tools for the problems at hand. Don’t force yourself to use the wrong thing simply because it is already used elsewhere.

4 Rules for Writing Good Gherkin

In Behavior-Driven Development, Gherkin is a specification language for defining behaviors. Gherkin scenarios can easily be automated into test cases. There are many points to writing good Gherkin, but to keep things simple, I like to focus on four major rules:

Gherkin’s Golden Rule
The Cardinal Rule of BDD
The Unique Example Rule
The Good Grammar Rule

Check out my TechBeacon article to learn about these rules in depth!

Get out of a #BDD pickle … Writing good #Gherkin may be challenging, but it certainly isn’t impossible. @AutomationPanda shares four rules that will help you to write readable, automatable, scalable Gherkin #TSQA2020 https://t.co/5inyn4WZsB
— TechBeacon (@TechBeaconCom) February 21, 2020

How Do We Write Good Gherkin as Part of BDD? (Webinar + Q&A)

On July 23, 2019, I gave a webinar entitled, “How Do We Write Good Gherkin as Part of BDD?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. This webinar was the follow-up to a previous webinar, What Is BDD, and How Do We Practice It? It was an honor to partner with Paul again to go further into BDD practices. (If you want to learn more about BDD, check out Beaufort Fairmont’s two-day BDD training offering, as well as their blog and other webinars.)

To see my webinar recording, register here. Definitely watch the previous webinar first.

Just like last time, attendees asks several great questions that we simply could not answer live. I categorized all questions we received and answered them below. Please note that some questions might be rephrased or combined with others.

Questions about BDD

What is BDD?

Behavior-Driven Development! Read more here.

In a typical Agile development process, who should write feature files?

The Three Amigos! Product owners, developers, and testers should all come together to figure out behaviors. I recommend doing Example Mapping to formulate before writing Gherkin scenarios. The green example cards should be turned into feature files. The specific person who writes the feature files is up to team preference. It could be a collaborative effort, or it could be divided-and-conquered. Any one of the Three Amigos can do it.

How can we apply BDD to SAFe (Scaled Agile Framework) teams?

BDD practices like Three Amigos meetings, Example Mapping, Behavior Specification with Gherkin, and Behavior Implementation can become part of any process. All of these practices happen at the level of the development teams. Teams could even share Gherkin steps and test frameworks wherever sharing makes sense. Check out BDD 101: Behavior-Driven Agile.

What advice can you give to teams that use BDD tests frameworks solely as an automation tool and not part of a greater BDD process?

Do the best with what you’ve got. Try to show how other BDD practices can pragmatically improve your team’s development and delivery work. See also:

Questions about Gherkin Syntax

What is the difference between a scenario and a scenario outline?

A scenario is a procedure of Given-When-Then steps that covers one example for one behavior. If there are any parameters for steps, then a scenario has exactly one combination of possible inputs. A scenario outline is a Given-When-Then procedure that can have multiple examples of one behavior provided as a table of input combos. Each input row will run the same steps once, just with different parameter inputs. See BDD 101: Gherkin by Example to see examples.

What do you think about long tables in scenarios?

Long tables in Gherkin usually look terrible. They’re hard to read, and they create a wall of text. They may also include unnecessary variations. Stick to the Unique Example rule.

Are Given steps mandatory, or can scenarios start directly with When steps?

None of the step types are mandatory. It is valid to write a scenario that skips the Given and has only When-Then steps. It is also valid to write scenarios that are Given-Then or Given-When. In fact, it is syntactically valid to put steps in any order. However, I strongly recommend keeping Given-When-Then step order to properly frame behaviors.

Are quotation marks required for parameters?

No, quotation marks are not required for parameters, but they are a popular convention, and one that I recommend. Quotes make parameters easy to identify.

Questions about Gherkin Scenarios

How do we make sure each scenario focuses on an individual, independent behavior?

Do Example Mapping first as a team. Write scenarios together, or review them with others. Ask, “What makes this behavior unique?” Make sure to use strict Given-When-Then step order when defining the behavior. Rethink the scenario if it is more than 10 lines long. Look out for unnecessary complication.

What does it mean for a scenario to be “chronological”?

Scenario steps should be written as if they were on a timeline. Each step will be executed after the previous one, so its description must start where the previous one ended. Remember, steps will be automated as if they were scripts.

How do we write a very low-level scenario without having a wall of text?

Don’t write low-level scenarios! Gherkin is best for feature testing, not unit testing. Steps should focus on intention and business value. Instead of writing “type, type, click, wait,” write “log into the app.” If you absolutely must write a low-level scenario, remember that the same principles apply. Be intuitively descriptive. Focus on individual behaviors. Keep scenarios concise.

If all scenarios in a feature file have only one user, is it okay to use first-person perspective instead of third-person?

In my opinion, no. I favor third-person perspective universally. Trying to limit usage to one feature file won’t work because any step can be used by any feature file within a test project. The entire solution must be either first-person or third-person. There’s no middle ground.

Can we write Gherkin scenarios with personas?

Yes! Personas can make scenarios more meaningful and understandable. Make sure to define the personas well – they could be described under the Feature section or in a separate text file.

How do we write Gherkin scenarios that need to validate lots of information on a page?

Pick the most important pieces of information to check. You could write separate Then steps for each assertion, or you could push small-but-similar validations down to the automation level to avoid Gherkin clutter.

How do we write Gherkin scenarios for validating Web UI fields?

Typically, I treat each field validation as an independent behavior, and thus I write separate scenarios to check each field. If the scenario steps simply enter a textual value and verify a specific message, then I might make a Scenario Outline with example rows for each equivalence class of inputs.

How do we write Gherkin scenarios that have multiple inputs and setup steps? (Example: an API with ten parameters)

Gherkin allows multiple steps of the same type to be written using “And” and “But” keywords. It’s not a problem to have “Given-And-And” or “When-And-And”. If you discover that different scenarios repeat the same setup steps, then I recommend either moving those common steps to a Background section or writing a new step that covers multiple calls (for conciseness).

One example from the webinar showed searching for shoes and adding them to a shopping cart as part of one scenario. Aren’t those two different behaviors?

Here’s the scenario in question:

Scenario: Add shoes to the shopping cart
  Given the ShoeStore home page is displayed
  When the shopper searches for “red pumps”
  And the shopper adds the first result to the cart
  Then the cart has one pair of “red pumps”

We could have split this scenario into two. I just chose to define the behavior this way. This scenario is a bit more end-to-end because it covers a basic but typical workflow. It may also have leveraged existing steps, which expedites automation development. Overall, the scenario is still concise, chronological, and intuitively understandable. Remember, there is an art as well as a science to writing good Gherkin.

Questions about Automation

Do scenarios need to be independent of each other?

Yes, unequivocally. Tests that are not independent could interfere with each other and cause unexpected failures. Independence also reinforces singular behavioral focus.

How do we start a scenario “in media res” without it depending on other tests?

At the Gherkin level, write Given steps that define a new starting point for the behavior. For example, many teams develop Web apps. It’s common to think that the starting point for all tests is login. However, the starting point can be a few pages after login.

At the automation level, it may be useful to implement the Given steps by calling other steps. For example, if a Given step should start at a user’s profile page, then perhaps it could internally call the login step and the click-the-profile-link step. Test steps may repetitively do the same operations for different tests, but test case independence will be preserved, and unique failures will be reported.

What is the best way to handle preconditions like logging into a Web app?

The simplest way to handle preconditions is to write Given steps. If those Given steps are shared by all scenarios in a feature file, then move them to a Background section. Automation hooks can also perform common setup and cleanup actions, depending upon the test framework. Personally, I prefer to use hooks to do automatic login rather than repeat Given steps for many scenarios.

Is it better to set up and tear down new test objects for each test case, or is it better to use shared, pre-created objects?

That depends upon the object. Most objects like WebDrivers and page objects should have scenario scope, meaning they are created fresh for each scenario and then torn down when the scenario ends. The only time an object should be shared across scenarios is if it is immutable or very expensive to create. For example, configuration data could be read in once before all tests and then injected immutably into each scenario. The safe position is always to use fresh objects; justify why sharing is needed before trying it.

I want to use Serenity for BDD and testing. Should I use Cucumber-like Gherkin feature files, or should I use Serenity’s native methods?

That’s up to you and your team. Personally, I would still use Gherkin feature files with Serenity. I like to separate my test case from my test code. Everyone can read Gherkin feature files, but not everyone can read Java or JavaScript test methods.

If a company already has a large BDD test solution that is poorly implemented, would it be better to keep it going or try to change it?

This question can be applied to all software projects, not just BDD test solutions. The answer is situational. Personally, I favor doing things right, even if it means refactoring. Please read Our Test Automation Has Problems. Should We Start Over? for a thorough answer.

Final Questions

Why do you call yourself “Pandy” and the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker. The nickname “Pandy” came about in the Python community to distinguish me from other folks named “Andy.”

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

Python BDD Framework Comparison

Almost every major programming language has BDD test frameworks, and Python is no exception. In fact, Python has several! So, how do they compare, and which one is best? Let’s find out.

Head-to-Head Comparison

behave

behave is one of the most popular Python BDD frameworks. Although it is not officially part of the Cucumber project, it functions very similarly to Cucumber frameworks.

Resources

Logo

Pros

It fully supports the Gherkin language.
Environmental functions and fixtures make setup and cleanup easy.
It has Django and Flask integrations.
It is popular with Python BDD practitioners.
Online docs and tutorials are great.
It has PyCharm Professional Edition support.

Cons

There’s no support for parallel execution.
- behave-parallel, a spinoff framework, is needed.
It’s a standalone framework.
Sharing steps between feature files can be a bit of a hassle.

pytest-bdd

pytest-bdd is a plugin for pytest that lets users write tests as Gherkin feature files rather than test functions. Because it integrates with pytest, it can work with any other pytest plugins, such as pytest-html for pretty reports and pytest-xdist for parallel testing. It also uses pytest fixtures for dependency injection.

Resources

Logo

Pros

It is fully compatible with pytest and major pytest plugins.
It benefits from pytest‘s community, growth, and goodness.
Fixtures are a great way to manage context between steps.
Tests can be filtered and executed together with other pytest tests.
Step definitions and hooks are easily shared using conftest.py.
Tabular data can be handled better for data-driven testing.
Online docs and tutorials are great.
It has PyCharm Professional Edition support.

Cons

Step definition modules must have explicit declarations for feature files (via “@scenario” or the “scenarios” function).
Scenario outline steps must be parsed differently.

radish

radish is a BDD framework with a twist: it adds new syntax to the Gherkin language. Language features like scenario loops, scenario preconditions, and constants make radish‘s Gherkin variant more programmatic for test cases.

Resources

Logo

Pros

Gherkin language extensions empower testers to write better tests.
The website, docs, and logo are on point.
Feature files and step definitions come out very clean.

Cons

It’s a standalone framework with limited extensions.
BDD purists may not like the additions to the Gherkin syntax.

lettuce

lettuce is another vegetable-themed Python BDD framework that’s been around for years. However, the website and the code haven’t been updated for a while.

Resources

The lettuce project on GitHub
lettuce.it

Logo

Pros

Its code is simpler.
It’s tried and true.

Cons

It lacks the feature richness of the other frameworks.
It doesn’t appear to have much active, ongoing support.

freshen

freshen was one of the first BDD test frameworks for Python. It was a plugin for nose. However, both freshen and nose are no longer maintained, and their doc pages explicitly tell readers to use other frameworks.

My Recommendations

None of these frameworks are perfect, but some have clear advantages. Overall, my top recommendation is pytest-bdd because it benefits from the strengths of pytest. I believe pytest is one of the best test frameworks in any language because of its conciseness, fixtures, assertions, and plugins. The 2018 Python Developers Survey showed that pytest is, by far, the most popular Python test framework, too. Even though pytest-bdd doesn’t feel as polished as behave, I think some TLC from the open source community could fix that.

Here are other recommendations:

Use behave if you want a robust, clean experience with the largest community.
Use pytest-bdd if you need to integrate with other plugins, already have a bunch of pytest tests, or want to run tests in parallel.
Use radish if you want more programmatic control of testing at the Gherkin layer.
Don’t use lettuce or freshen.

What is BDD, and How Do We Practice It? (Webinar + Q&A)

On March 18, 2019, I gave a webinar entitled, “What is Behavior-Driven Development, and How Do We Practice It?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. It was both a pleasure and an honor to do this webinar with them. Paul is a top-notch test automation expert, and Beaufort Fairmont is doing really exciting things. Check out their two-day BDD training offering, as well as their blog and other webinars.

To see my webinar recording, register here.

During the webinar, attendees asked more questions than we could answer. I’m excited that so many people asked questions. My answers are below.

Questions about Process

How is BDD different from TDD (Test-Driven Development)?

BDD is an evolution of TDD. In TDD, developers (1) write unit tests and watch them fail, (2) develop the feature to make the tests pass, (3) refactor the code to make it stronger, and (4) repeat the cycle. In BDD, teams do this same loop with feature tests (a.k.a “acceptance” or “black-box” tests) as well as unit tests. Furthermore, BDD adds shift left practices like Example Mapping and Specification by Example so that teams know what they are doing and focus on developing the right things.

Check out Dan North’s article, Introducing BDD, for a more thorough answer.

Can BDD be used with manual testing?

Yes! BDD is not merely an automation tool – it is a set of pragmatic practices to help teams develop better software. Gherkin scenarios are first and foremost behavior specs that help a team’s collaboration and accountability. They function secondarily as test cases that can be executed either manually or with automation.

Can we use BDD with technical stories or backend features?

Yes! If you can describe it, then you can do it.

How many Gherkin scenarios should one story have?

There’s no hard rule, but I recommend no more than a handful of rules per story, and no more than a handful of examples per rule. If you do Example Mapping and feel overwhelmed by the number of cards for a story, then the story should probably be broken into smaller stories.

Should we do Example Mapping for every story? Spending 20-30 minutes for each story would take a long time.

Try doing Example Mapping on one or two stories to start. The first time is always rough, but as you iterate on it, you’ll get better as a team. Even though Example Mapping has an upfront time cost, it will save a lot of time later in the sprint because (a) acceptance criteria is clear, (b) tests are already written, and (c) everyone has a mutual understanding of the story. The team won’t suffer through the inefficiencies of miscommunication and poor planning. You may even want to replace planning meeting with Example Mapping meetings.

What metrics should we use with BDD?

All metrics are flawed, but some metrics are useful. All the standard testing and Agile metrics still apply: code coverage, story velocity, etc. Here are some additional metrics you may consider for BDD:

the percentage of stories that undergo Example Mapping before the sprint
the number of rules and examples that get “missed” during Example Mapping and need to be added later
the percentage of Gherkin scenarios that get automated in the sprint

If you choose to track metrics, make sure their feedback is used to improve team practices. For more info on metrics, please read my Quality Metrics 101 series.

What were the resources you recommended at the end of the webinar?

Questions about Tools

What test management tools should we use with BDD?

I’m sure there are BDD plugins for test management tools, but I don’t have any that I can personally recommend. To be honest, I try to stay away from large test management tools like HP ALM, qTest, VersionOne. When doing BDD, the Gherkin feature files themselves should be the single source of truth for feature-level tests, and they should be version-controlled in a repository. Don’t fall into the trap of slapping “Given-When-Then” keywords onto existing functional tests – that’s not BDD.

Does Jira support Example Mapping?

I have not personally used any Jira plugin for Example Mapping. It looks like there is an Easy Agile User Story Maps plugin that is similar to but slightly different from Example Mapping.

Are there other good tools for BDD and Example Mapping?

Cycle Automation provides a nice app with Gherkin steps out of the box, so you can automate tests without needing a programming language.
TeamUp Labs provides an online Example Mapping tool.
IDEs from JetBrains and Eclipse provide BDD plugins
Gherkin Syntax Highlighting in Notepad++
Gherkin Syntax Highlighting in Visual Studio Code
Gherkin Syntax Highlighting in Atom
Gherkin Syntax Highlighting in Chrome

What’s the difference between Gherkin, Cucumber, and SpecFlow?

Gherkin is the Given-When-Then spec language.
Cucumber is a company and its eponymous test framework that uses Gherkin.
SpecFlow is Cucumber for .NET.

Questions about Testing

Can BDD test frameworks be used for unit testing?

Yes, but I don’t recommend it. BDD frameworks shine for black-box feature testing. They’re a bit too verbose for code-level unit tests. Read BDD 101: Unit, Integration, and End-to-End Tests for more info.

Can BDD test frameworks be used for integration testing?

Yes! See BDD 101: Unit, Integration, and End-to-End Tests.

How long should Gherkin scenarios be?

Scenarios should be bite-sized. Each scenario should focus on one individual behavior. There’s no hard rule, but I recommend single-digit step counts. Read BDD 101: Writing Good Gherkin for more info.

What are “step definitions” in Cucumber?

Step definitions are the methods in the automation code that execute the steps. When a BDD framework runs a Gherkin scenario as a test, it “glues” each step to a step definition based on some sort of string matching.

How can we minimize duplicate code within a BDD test framework?

Know your steps. Always search for existing steps before writing new steps. Refactor existing steps whenever appropriate. Reuse steps when writing new scenarios. Do pair programming or mob programming when writing scenarios. Put scenarios through code reviews. Apply good coding practices – remember, test automation is software.

I write Gherkin scenarios, but I don’t write test automation code. What’s the best way to write Gherkin scenarios so that they can be automated?

Do pair programming with the automation engineers to write Gherkin scenarios together. Become familiar with existing steps by reading and searching feature files. Otherwise, the Gherkin steps you write in isolation might not be usable. Remember, BDD is a team effort!

The examples in the webinar were all fairly basic. Do you have any examples with more complex systems?

I have some example projects on GitHub in Python and Java with some basic unit, integration, and end-to-end tests, but I don’t have any large-scale examples that I can share publicly.

We wrote hundreds of SpecFlow tests without the other Amigos. Now, there are large test gaps, and many steps aren’t reusable. What should we do?

I’m sorry to hear that. It’s not an uncommon story. There are two paths: (1) refactoring or (2) starting over. Without really knowing the situation, I don’t think it’s my place to say which way is better. Here are some questions to help guide your decision:

What are your goals for testing and automation?
What’s your overall quality and testing strategy?
What parts of the code base are salvageable?
What parts of the code base should be removed?
If you started again from scratch, what would you do differently to make sure the same problems don’t reoccur?

I strongly recommend taking the Setting a Foundation for Successful Test Automation course from Test Automation University. (It’s free.) I also gave a talk about this very problem, Egad! How Do We Start Writing (Better) Tests?, at a few Python conferences.

We have a large BDD test suite with heavy coupling and slow execution times. The business amigos have also left the company. Should we try to fix what we have or just start over?

Sorry to hear that; same answer as before.

Final Questions

Why do you call yourself the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker.

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

Sprint Planning Sucks. Can It Be Fixed?

Warning: This article contains strong opinions that might not be suitable for all audiences. Reader discretion is advised.

It’s Monday morning. After an all-too-short weekend and rush hour traffic, you finally arrive at the office. You throw your bag down at your desk, run to the break room, and queue up for coffee. As the next pot is brewing, you check your phone. It’s 8:44am… now 8:45am, and DING! A meeting reminder appears:

Sprint Planning – 9am to 3pm.

What’s your visceral reaction?

I can’t tell you mine, because I won’t put profanity on my blog.

Real Talk

In the capital-A Agile Scrum process, sprint planning is the kick-off meeting for the next iteration. The whole team comes together to talk about features, size work items with points, and commit to deliverables for the next “sprint” (typically 2 weeks long). Idealistically, team members collaborate freely as they learn about product needs and give valued input.

Let’s have some real talk, though: sprint planning sucks. Maybe that’s a harsh word, but, if you’re reading this article, then it caught your attention. Personally, my sprint planning experiences have been lousy. Why? Am I just bellyaching, or are there some serious underlying problems?

Sprint planning is a huge time commitment. 9am to 3pm is not an exaggeration. Sprint planning meetings are typically half-day to full-day affairs. Most people can’t stay focused on one thing for that long. Plus, when a sprint is only two weeks long, one hour is a big chunk of time, let alone 3, or 6, or a whole day. The longer the meeting, the higher the opportunity cost, and the deeper the boredom.

Collaboration is a farce. Planning meetings typically devolve into one “leader” (like a scrum master, product owner, or manager) pulling teeth to get info for a pre-determined list of stories. Only two people, the leader and the story-owner, end up talking, while everyone else just stares at their laptops until it’s their turn. Discussions typically don’t follow any routine beyond, “What’s the acceptance criteria?” and, “Does this look right?” with an interloper occasionally chiming in. Each team member typically gets only a few minutes of value out of an hours-long ordeal. That’s an inefficient use of everyone’s time.

No real planning actually happens. These meetings ought to be called “guessing” meetings, instead. Story point sizes are literally made up. Do they measure time or complexity? No, they really just measure groupthink. Teams even play a game called planning poker that subliminally encourages bluffing. Then, point totals are used to guess how much work can be done during the sprint. When the guess turns out to be wrong at the end of the sprint (and it always does), the team berates itself in retro for letting points slip. Every. Time.

Does It Spark Joy?

I’ve long wondered to myself if sprint planning is a good concept just implemented poorly, or if it’s conceptually flawed at its root. I’m pretty sure it’s just flawed. The meetings don’t facilitate efficient collaboration relative to their time commitments, and estimates are based on poor models. Retros can’t fix that. And gut reactions don’t lie.

So, what should we do? Should we Konmari our planning meetings to see if they spark joy? Should we get rid of our ceremonies and start over? Is this an indictment of the whole Agile Scrum process? But then, how will we know what to do, and when things can get done?

I think we can evolve our Agile process with more effective practices than sprint planning. And I don’t think that evolution would be terribly drastic.

Behavior-Driven Planning

What we really want out of a planning meeting is planning, not pulling and not predicting. Planning is the time to figure out what will be done and how it will be done. The size of the work should be based on the size of the blueprint. Enter Example Mapping.

Example Mapping is a Behavior-Driven Development practice for clarifying and confirming stories. The process is straightforward:

Write the story on a yellow card.
Write each rule that the story must satisfy on a blue card.
Illustrate each rule with examples written on green cards.
Got stuck on a question? Write it on a red card and move on.

One story should take about 20-30 minutes to map. The whole team can participate, or the team can split up into small groups to divide-and-conquer. Rules become acceptance criteria, examples become test cases, and questions become spikes.

Here’s a good walkthrough of Example Mapping.

What about story size? That’s easy – count the cards. How many cards does a story have? That’s a rough size for the work to be done based on the blueprint, not bluffing. More cards = more complexity. It’s objective. No games. Frankly, it can’t be any worse that made-up point values.

This is real planning: a blueprint with a course of action.

So, rather than doing traditional sprint planning meetings, try doing Example Mapping sessions. Actually plan the stories, and use card counts for point sizes. Decisions about priority and commitments can happen between rounds of story mapping, too. The Scrum process can otherwise remain the same.

If you want to evolve further, you could eliminate the time boxes of sprints in favor of Kanban. Two-week work item boundaries can arbitrarily fall in the middle of progress, which is not only disruptive to workflow but can also encourage bad responses (like cramming to get things done or shaming for not being complete.) Kanban treats work items as a continuous flow of prioritized work fed to a team in bite-sized pieces. When a new story comes up, it can have its own Example Mapping “planning” meeting. Now, Kanban is not for everyone, but it is popular among post-Agile practitioners. What’s important is to find what works for your team.

Rant Over

I know I expressed strong, controversial opinions in this article. And I also recognize that I’m arguing against bad examples of Agile Scrum. Nevertheless, I believe my points are fair: planning itself is not a waste of time, but the way many teams plan their sprints uses time inefficiently and sets poor expectations. There are better ways to do planning – let’s give them a try!

Python Testing 101: pytest-bdd

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use pytest-bdd. Also, make sure that you are already familiar with the pytest framework.

Overview

pytest-bdd is a behavior-driven (BDD) test framework that is very similar to behave, Cucumber and SpecFlow. BDD frameworks are very different from more traditional frameworks like unittest and pytest. Test scenarios are written in Gherkin “.feature” files using plain language. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. This means that there is a separation of concerns between test cases and test code. Gherkin steps may also be reused by multiple scenarios.

pytest-bdd is very similar to other Python BDD frameworks like behave, radish, and lettuce. However, unlike the others, pytest-bdd is not a standalone framework: it is a plugin for pytest. Thus, all of pytest‘s features and plugins can be used with pytest-bdd. This is a huge advantage!

Installation

Use pip to install both pytest and pytest-bdd.

pip install pytest
pip install pytest-bdd

Project Structure

Project structure for pytest-bdd is actually pretty flexible (since it is based on pytest), but the following conventions are recommended:

All test code should appear under a test directory named “tests”.
Feature files should be placed in a test subdirectory named “features”.
Step definition modules should be placed in a test subdirectory named “step_defs”.
conftest.py files should be located together with step definition modules.

Other names and hierarchies may be used. For example, large test suites can have feature-specific directories of features and step defs. pytest should be able to discover tests anywhere under the test directory.

[project root directory]
|‐‐ [product code packages]
|-- [test directories]
|   |-- features
|   |   `-- *.feature
|   `-- step_defs
|       |-- __init__.py
|       |-- conftest.py
|       `-- test_*.py
`-- [pytest.ini|tox.ini|setup.cfg]

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

Example Code

An example project named behavior-driven-python located in GitHub shows how to write tests using pytest-bdd. This section will explain how the Web tests are designed.

The top layer for pytest-bdd tests is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step definitions are written in Python test modules, as shown below:

import pytest

from pytest_bdd import scenarios, given, when, then, parsers
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Constants

DUCKDUCKGO_HOME = 'https://duckduckgo.com/'

# Scenarios

scenarios('../features/web.feature')

# Fixtures

@pytest.fixture
def browser():
    b = webdriver.Firefox()
    b.implicitly_wait(10)
    yield b
    b.quit()

# Given Steps

@given('the DuckDuckGo home page is displayed')
def ddg_home(browser):
    browser.get(DUCKDUCKGO_HOME)

# When Steps

@when(parsers.parse('the user searches for "{phrase}"'))
def search_phrase(browser, phrase):
    search_input = browser.find_element_by_id('search_form_input_homepage')
    search_input.send_keys(phrase + Keys.RETURN)

# Then Steps

@then(parsers.parse('results are shown for "{phrase}"'))
def search_results(browser, phrase):
    # Check search result list
    # (A more comprehensive test would check results for matching phrases)
    # (Check the list before the search phrase for correct implicit waiting)
    links_div = browser.find_element_by_id('links')
    assert len(links_div.find_elements_by_xpath('//div')) > 0
    # Check search phrase
    search_input = browser.find_element_by_id('search_form_input')
    assert search_input.get_attribute('value') == phrase

Notice how each Given/When/Then step has a function with an appropriate decorator. Arguments, such as the search “phrase,” may also be passed from step to function. pytest-bdd provides a few argument parsers out of the box and also lets programmers implement their own. (By default, strings are compared using equality.) One function can be decorated for many steps, too.

pytest fixtures may also be used by step functions. The code above uses a fixture to initialize the Firefox WebDriver before each scenario and then quit it after each scenario. Fixtures follow all the same rules, including scope. Any step function can use a fixture by declaring it as an argument. Furthermore, any “@given” step function that returns a value can also be used as a fixture. Please read the official docs for more info about fixtures with pytest-bdd.

One important, easily-overlooked detail is that scenarios must be explicitly declared in test modules. Unlike other BDD frameworks that treat feature files as the main scripts, pytest-bdd treats the “test_*.py” module as the main scripts (because that’s what pytest does). Scenarios may be specified explicitly using scenario decorators, or all scenarios in a list of feature files may be included implicitly using the “scenarios” shortcut function shown above.

To share steps across multiple feature files, add them to the “conftest.py” file instead of the test modules. Since scenarios must be declared within a test module, they can only use step functions available within the same module or in “conftest.py”. As a best practice, put commonly shared steps in “conftest.py” and feature-specific steps in the test module. The same recommendation also applies for hooks.

Scenario outlines require special implementation on the Python side to run successfully. Unfortunately, steps used by scenario outlines need unique step decorators and extra converting. Please read the official docs or the example project to see examples.

Test Launch

pytest-bdd can leverage the full power of pytest. Tests can be run in full or filtered by tag. Below are example commands using the example project:

# run all tests
pytest

# filter tests by test module
# note: feature files cannot be run directly
pytest tests/step_defs/test_unit_basic.py
pytest tests/step_defs/test_unit_outlines.py
pytest tests/step_defs/test_unit_service.py
pytest tests/step_defs/test_unit_web.py

# filter tests by tags
# running by tag is typically better than running by path
pytest -k "unit"
pytest -k "service"
pytest -k "web"
pytest -k "add or remove"
pytest -k "unit and not outline"

# print JUnit report
pytest -junitxml=/path/for/output

pytest-bdd tests can be executed and filtered together with regular pytest tests. Tests can all be located within the same directory. Tags work just like pytest.mark. As a warning, marks must be explicitly added to “pytest.ini” starting with pytest 5.0.

All other pytest plugins should work, too. For example:

Run tests in parallel with pytest-xdist
Generate code coverage reports with pytest-cov
Integrate with popular frameworks using pytest-django, pytest-flask, or other similar plugins

Pros and Cons

Just like for other BDD frameworks, pytest-bdd is best suited for black-box testing because it forces the developer to write test cases in plain, descriptive language. In my opinion, it is arguably the best BDD framework currently available for Python because it rests on the strength and extendability of pytest. It also has PyCharm support (in the Professional Edition). However, it can be more cumbersome to use than behave due to the extra code needed for declaring scenarios, implementing scenario outlines, and sharing steps. Nevertheless, I would still recommend pytest-bdd over behave for most users because it is more powerful – pytest is just awesome!

Behavior-Driven Blasphemy

This is my 100th post on Automation Panda! I’m thrilled to see how much this blog has grown and how many people it has helped. For such a monumental occasion, I have chosen to voice a rather controversial opinion about test automation.

Behavior-driven development seems to be the software testing buzzword of the decade. What started as a refinement of test-driven development by developers in Europe and the UK quickly became the big process fad of the 2010’s. The Cucumber project (now 10 years old) developed or inspired Gherkin-based test automation frameworks in all the major programming languages. Companies started requiring Given-When-Then format for acceptance criteria and test scenarios. Three Amigos meetings became standard calendar fixtures during sprints. Organizations that once undertook “Agile transformations” now have similar initiatives for BDD. For better or worse, BDD exists and cannot be ignored.

The dogmatic benefits of BDD are better collaboration and automation. However, leaders frequently insist that Gherkin-style test frameworks add value only when paired with practices like Example Mapping. “BDD is a process, not a tool,” is a common mantra. “Otherwise, the Gherkin just gets in the way.” Although I wholeheartedly agree that behavior-driven practices add significant value to the development process, I nevertheless espouse a rather blasphemous opinion:

BDD test automation frameworks are better than traditional frameworks for black box functional testing even when BDD processes are not followed.

What Exactly Are You Saying?

My claim is that behavior-driven test frameworks like Cucumber, SpecFlow, and behave are significantly better than traditional xUnit-style frameworks for testing live features. For example, I would rather use SpecFlow than NUnit for testing a Web app with Selenium WebDriver, whether or not the other two Amigos are with me. The resulting automation code has better structure, readability, and reusability.

I’m not saying that teams shouldn’t do BDD practices, and I’m not saying that the Three Amigos should be separated. Collaboration is key to success, and BDD really helps. Example Mapping is one of the most useful practices a development team can do. I’m also not saying that BDD frameworks should be used for all testing purposes – they are poorly suited for unit testing and for performance testing.

Objection!

I find myself very lonely in this opinion. BDD leaders repeatedly insist that BDD is not about testing and automation:

BDD is not about Testing (slides by Dan North)
The world’s most misunderstood collaboration tool by Aslak Hellesøy
BDD is – BDD is not by Augusto Evangelisti
BDD Tool Cucumber is Not a Testing Tool by Jan Stenberg
3 misconceptions about BDD from ThoughtWorks

The most outspoken BDDers (mostly coalescing around the Cucumber community) have largely moved their focus to the collaboration benefits, almost forsaking the automation benefits. (This may not necessarily be true, but it appears that way based on the literature and materials floating on the Web.) That outlook is somewhat disingenuous because the main tools supporting BDD are, in fact, test frameworks.

BDD also has outspoken opponents – it’s love or hate. I’ve personally spoken with several engineers who despise Gherkin-based frameworks. “I can see how it would be valuable when a whole team embraces behavior-driven practices,” many have told me, “but otherwise, the Gherkin layer just gets in the way of automation.” I’ve heard it called “plaster” and “garbage.” Engineers just want to code their tests. And code should always be readable, right?

Testing is an inherently opinionated space. People can never seem to agree on things.

The Bigger Picture

Test automation must be developed regardless of any specific development practices, and its architecture must stand firmly in its own right. Unfortunately, both sides miss the bigger picture:

The best solution for test automation is a domain-specific language.

A domain-specific language (DSL) is a programming language with a purpose. It is designed to handle very specific needs, rather than general-purpose programming. For example:

SQL is a DSL for database queries.
XPath is a DSL for finding elements in an XML document.
YAML is a DSL for object serialization.

Gherkin is also a DSL – for behavior specification.

Domain-specific languages naturally suit test automation due to the clear difference between test cases and test code. Test cases are procedures that exercise product behavior. Anyone can write a test case. They are dictated or explained in plain language. Test code, however, is the software implementation of test cases. Test code handles function calls, logging, exceptions, and all those other little programming details that help run tests. A test automation DSL separates those concerns: test cases are written in a special language, and the interpreter handles repetitive, low-level details. Some type of extensions can handle product-specific interactions. The purpose of a language is to effectively express intention – and the intention is to test the product.

To truly achieve an optimal solution, however, the DSL and its interpreter must be treated as part of the automation software, just like the test cases and extensions. Remember, a language’s interpreter is just another piece of software. The interpreter is part of the separation of concerns and the single responsibility principle. Concerns that would typically be handled by classes and functions in traditional test code should be moved to the interpreter. For example, the interpreter should automatically log every test case step, rather that forcing the author to write explicit logging statements.

When I worked at NetApp years ago, I implemented a DSL to test platform-level features of our operating system. I called it DS – short for “Design Steps” (from HP ALM) (but also not without an affinity for the Nintendo DS). NetApp’s entire test automation code was developed in Perl at the time, so I implemented the DS interpreter in Perl to reuse existing libraries. DS test cases were typically only a dozen lines long each, and DS expressions could call specially-written Perl modules directly for complete extendability. During the first big release using DS, my team saved countless hours of automation development time as compared to the previous release while delivering a higher number of tests. I also did this before I had ever heard of BDD.

Unfortunately, most teams have neither the time to develop their own testing DSL nor the understanding of compiler theory to build it right. And if they were given such a language, they typically limit themselves to the provided implementation instead of taking ownership to extend the language for their needs.

The original Nintendo DS. Fun times!

Who Truly Misunderstands Gherkin?

Enter Gherkin: the world’s first major general-purpose, off-the-shelf language for test automation. It is general enough to cover any case through its plain language steps, yet specific enough to standardize tests. Users don’t need to be compiler theory experts – they just make up their own step names and provide the definition code to execute them. Early BDD projects like JBehave and Cucumber packaged an interpreter as a test framework and delivered it to a testing world still stuck on JUnit. The need for a testing DSL was there, whether or not the BDD folks meant to serve it.

Cucumber-ites frequently bemoan that their framework is misunderstood by the masses. They shudder to see teams using their framework purely for test automation. However, Cucumber effectively lowered the entry barrier for teams to make their own testing DSLs. Kodak did the same thing for film: they made it cheap and standard so anyone could be a photographer. Not everyone who uses a BDD framework misunderstands its purpose: some (like me) just see an alternative value proposition than what is preached by orthodox BDD practitioners. Gherkin fills a need that nobody knew. Its popularity validates that claim.

Benefits Apart from Process

Using a BDD framework adds much value to testing and development even without BDD processes. Below are just a handful of benefits:

Focus first on good scenarios. Gherkin forces authors to think before they code.
Faster automation development. Gherkin steps are reusable and parametrizable.
Stronger structure. Engineers know where to put things in the framework.
Test understandability. Anyone can read scenarios because they are written in plain language. Business people can help. New people can pick it up fast.
Test sharing. Feature files can be shared apart from test code, which can be helpful for business partners.
Test similarity. Tests all look the same. Team members can more easily help each other.
Clearer failures. When a scenario fails, reports show exactly what step failed.
Simpler bug reports. Use scenario steps as instructions to reproduce the failure.
2-phase test reviews. Review Gherkin first and then test code second to make sure the test cases are good before implementing the wrong things.
BDD enablement. Using a BDD framework opens the door for a team to embrace better behavioral practices in the future.

I wrote about these advantages before:

Case Studies

I’m also not the only one who finds value in BDD test frameworks outside of the full BDD process. Below are five case studies.

radish

radish is a Python test framework inspired by Cucumber. Its DSL syntax is a superset of Gherkin that adds preconditions, loops, variables, and expressions. These language additions indicate a bias towards automation because they enable engineers to write tests more programmatically, albeit in a Gherkin-ese way.

Karate

Karate is a test framework with a full DSL based on Gherkin with steps specifically tailored to Web service calls. Although it is implemented in Java, testers do not need to do any Java programming to write complete tests cases from day one. Peter Thomas, the creator of Karate, unabashedly declares that Karate does not truly adhere to BDD but nevertheless uses Cucumber for its automation benefits. (Note: Karate is working to move completely off of Cucumber. See GitHub issue #444.)

REST Assured

REST Assured is a Java package for testing REST APIs. Unlike Karate, REST Assured provides a fluent syntax (and not a DSL) for writing service calls directly in Java code. The fluent syntax is based on Gherkin: given() a request spec is created, when() the call is made, then() verify the response. Although REST Assured is not a full testing framework, it nevertheless pulls inspiration from BDD frameworks for order and structure.

Cycle

Cycle is a BDD-focused test automation platform from Cycle Labs for testing Web, terminal, and desktop apps. Cycle is unique because it provides out-of-the-box steps for all types of supported testing so that no programming experience is required. Testers write feature files using Cycle 2.0’s slick new Electron app. Scenarios are written in CycleScript, a Gherkin-ese language with additions like variables and sub-scenario calls. Steps tend to be imperative, but that’s the tradeoff for not requiring lower-level programming.

Hexawise

Hexawise is a combinatorial testing tool designed to maximize coverage with minimal test counts by smartly joining feature variations. It helps testers write better tests with less redundancy and fewer gaps. Although Hexawise has historically assisted manual testers, it also can generate Gherkin feature files for test variations.

Not all cucumbers are the same. Above is a sea cucumber.

Good Enough?

Gherkin-based test frameworks are not perfect, but they do provide good structure. They gained popularity outside of the pure BDD movement because they genuinely added value to testing and automation. Like any other tool, teams will use them in both good and bad ways. (Trust me, I’ve seen scary Gherkin.)

It’s interesting to see how groups outside the Cucumber diaspora are attempting to solve the limitations of pure Gherkin. Each case study above showed a unique path. Clearly, the test automation problem has not yet been completely solved, but current BDD frameworks are the best off-the-shelf solutions we have until a new software testing movement comes along.