automation

Python Testing 101: Introduction

Python is an amazing programming language. Loved by beginners and experts alike, it is consistently ranked as one of the most in-demand languages today. At PyData Carolinas 2016, Josh Howes, a senior data science manager at MaxPoint at the time, described Python like this (in rough paraphrase):

Python is a magical tool that easily lets you solve the world’s toughest problems.

I first touched Python back in high school more than a decade ago, but I really started using it and loving it in recent years for test automation. This 101 series will teach you how to do testing in Python. This introductory post will give basic orientation, and each subsequent post will focus on a different Python test framework in depth.

Why Use Python for Testing?

As mentioned in another post, The Best Programming Language for Test Automation, Python is concise, elegant, and readable – the precise attributes needed to effectively turn test cases into test scripts. It has richly-supported test packages to deftly handle both white-box and black-box testing. It is also command-line-friendly. Engineers who have never used Python tend to learn it quickly.

The following examples illustrate ways to use Python for test automation:

  • A developer embedding quick checks into function docstrings.
  • A developer writing unit tests for a module or package.
  • A tester writing integration tests for REST APIs.
  • A tester writing end-to-end web tests using Selenium.
  • A data scientist verifying functions in a Jupyter notebook.
  • The Three Amigos writing Given-When-Then scenarios for BDD testing.

Remember, Python can be used for any black-box testing, even if the software product under test isn’t written in Python!
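
For instance, the first idea above – quick checks embedded in a function docstring – could look like this minimal doctest sketch (the function here is made up purely for illustration):

def multiply(a, b):
    """Multiply two numbers.

    >>> multiply(3, 4)
    12
    >>> multiply(0, 99)
    0
    """
    return a * b

if __name__ == "__main__":
    # Running the module directly executes the docstring examples as checks
    import doctest
    doctest.testmod()

Running the module prints nothing when every example passes and reports any example whose output does not match.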

Python Version

Choosing the right Python installation is itself no small decision. For an in-depth analysis, please refer to Which Version of Python Should I Use? TL;DR:

  1. For white-box testing, use the matching Python version.
  2. For black-box testing, use CPython version 3 if not otherwise constrained.

Unless otherwise stated, this 101 series uses CPython 3.

Picking a Framework

There are so many Python test frameworks that choosing one may seem daunting – just look at the Python wiki, The Hitchhiker’s Guide to Python, and pythontesting.net. Despite choice overload, there are a few important things to consider:

  1. Consider the type of testing. Basic unit tests could be handled by unittest or even doctest, but higher-level testing would do better with other frameworks like pytest. BDD testing would require behave, lettuce, or radish. (A minimal pytest sketch follows this list.)
  2. Consider the supported Python version. Python 2 and 3 are two different languages, with Python 2’s end-of-life slated for 2020. Different frameworks have different levels of version support, which could become especially problematic for white-box testing. Furthermore, some may have different features between Python versions.
  3. Consider support and development. Typically, it is best to choose mature, actively-developed frameworks for future sustainability. For example, the once-popular nose is now deprecated.
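
To make that first consideration concrete, here is a minimal sketch of a pytest-style unit test – a plain function with a plain assert. The function under test is defined inline only to keep the example self-contained:

# test_reverse.py – run with: pytest test_reverse.py

def reverse(text):
    # Trivial function under test, defined inline for illustration
    return text[::-1]

def test_reverse_simple_word():
    # pytest discovers test_* functions and reports rich failure output for plain asserts
    assert reverse("panda") == "adnap"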

Future posts in this series will document many frameworks in detail to empower you, as the reader, to pick the best one for your needs.

Virtual Environments

A virtual environment (VE) is like a local Python installation with a specific package set. Tools like venv (Python 3.3+), virtualenv (Python 2 and 3), and Conda (Python 2 and 3; for data scientists) make it easy to create virtual environments from the command line. Pipenv goes a step further by combining VE management with simple-yet-sophisticated package management. Creating at least one separate VE for each Python project is typically a good practice. VEs are extremely useful for test automation because:

  1. VEs allow engineers to maintain multiple Python environments simultaneously.
    • Engineers can develop and test packages for both versions of Python.
    • Engineers can separate projects that rely on different package versions.
  2. VEs allow users to install Python packages locally without changing global installations.
    • Users may not have permissions to install packages globally.
    • Global changes may disrupt other dependent Python software.
  3. VEs can import and export package lists for easy reconstruction.

VEs become especially valuable in continuous integration and deployment because they can easily provide Python consistency. For example, a Jenkins build job can create a VE, install dependencies from PyPI in the VE, run Python tests, and safely tear down. Once the product under test is ready to be deployed, the same VE configuration can be used.
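
As a rough sketch of that flow (the requirements file and test directory names are illustrative assumptions), the shell steps on Linux or macOS might look like this:

> python3 -m venv venv
> source venv/bin/activate
> pip install -r requirements.txt
> python -m pytest tests/
> deactivate

The environment’s package list can also be exported with pip freeze so that the exact same set of dependencies can be reconstructed later.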

Recommended IDEs

Any serious test automation work needs an equally serious IDE. My favorite is JetBrains PyCharm. I really like its slick interface and intuitive nature, and it provides out-of-the-box support for a number of Python test frameworks. PyCharm may be downloaded as a standalone IDE or as a plugin for JetBrains IntelliJ IDEA. The Community Edition is free and meets most automation needs, while the Professional Edition requires a license. PyDev is a nice alternative for those who prefer Eclipse, and Eric satisfies the purists as a Python IDE written in Python. While all three have plugin frameworks, PyCharm and PyDev seem to have the advantage in popularity and support. There’s also the classic IDLE, but its use is strongly discouraged nowadays due to bugs and better options.

Lightweight text editors can make small edits easy and fast. Visual Studio Code is a recent favorite. Notepad++ is always a winner on Windows. Atom is a newer, cross-platform editor developed by GitHub that’s gaining popularity. Of course, UNIX platforms typically provide vim or emacs.

Framework Survey

If this series is for you, then install an IDE, set up a virtual environment, and let’s roll! The next posts will each introduce a popular Python test framework. Each post should be used as an introduction for getting started or as a quick reference. Please refer to official framework documentation for full details – it would be imprudent for this blog to unnecessarily duplicate information.

The outline for each post will be:

  1. Overview
  2. Installation
  3. Project Structure
  4. Example Code
  5. Test Launch
  6. Pros and Cons

 

Cucumber-JVM Global Hook Workarounds

Almost all BDD automation frameworks have some sort of hooks that run before and after scenarios. However, not all frameworks have global hooks that run once at the beginning or end of a suite of scenarios – and Cucumber-JVM is one of these unlucky few. Cucumber-JVM GitHub Issue #515, which seeks to add @BeforeAll and @AfterAll hooks, has been open and active since 2013, but it is unclear whether the issue will ever be resolved. Thankfully, there are some workarounds to effect the same behavior as global hooks.

Workaround #1: Don’t Do It

From a purist’s perspective, each scenario (or test) should be completely independent, meaning it should not share parts with any other tests. Independence provides the following benefits:

  • Safety between tests
  • Consistency across tests
  • The ability to run any tests individually, in any order, or in parallel
  • More sensible, understandable tests

If not handled properly, global hooks can be dangerous because they make tests interdependent. Changes or failures in one test may cascade into others. Global test data would waste memory for tests that don’t use it. Furthermore, the fact that Issue #515 has been open for years indicates the difficulty of properly implementing global hooks.

However, the main cost of independence is runtime. Independent tests often repeat similar setup and cleanup routines. Even a few extra seconds per test can add up tremendously. Google Guava, for example, has over 286,000 tests – adding one second to each test would amount to nearly 80 hours! Performance becomes especially critical for continuous integration, in which wasted time means either delivery delays or coverage gaps. Certain operations like preparing a database or fetching authentication tokens may be pragmatic candidates for global hooks.

The best strategy is to use global hooks only when necessary for time-intensive setup that can be shared safely. Any shared test data should be immutable. Always question the need for global hooks. Most tests probably won’t need them.

Workaround #2: Static Variables

A basic hack for global hooks is actually provided in Issue #515. A static Boolean flag can indicate whether the @Before hook has already run, because static state is not “reset” when a new scenario re-instantiates the step definition classes. The runtime shutdown hook will be called once all tests are done and the program exits. (Note that a static flag cannot be used the same way in an @After hook – there is no way to know in advance which scenario will be the last one to run.) The example from the issue is shamelessly copied below, with the shutdown-hook thread filled in for completeness:

public class GlobalHooks {
    private static boolean dunit = false;

    // Thread registered as a JVM shutdown hook to run the global cleanup once
    // (not shown in the original issue; sketched here so the example compiles)
    private static final Thread afterAllThread = new Thread(() -> {
        // do the afterAll stuff...
    });

    @Before
    public void beforeAll() {
        if (!dunit) {
            Runtime.getRuntime().addShutdownHook(afterAllThread);
            // do the beforeAll stuff...
            dunit = true;
        }
    }
}

Workaround #3: Singleton Caching

The basic hack is useful for simple setup and cleanup routines, but it becomes inelegant when objects must be shared by scenarios. Rather than polluting the class with static members, a singleton can cache test data between scenarios, and global setup logic may be put into the singleton’s constructor. Furthermore, if the singleton uses lazy initialization, then @Before hooks may not be needed at all. A “lazy” singleton will not be instantiated until the first time its getInstance method is called, meaning it will be skipped entirely if no scenario needs it. This is a huge advantage when selectively running scenarios by name, tag, or feature. (Please refer to the previous post, Static or Singleton, for a deeper explanation of the singleton pattern.)

Consider scenarios that must generate authentication tokens (like OAuth) for API testing. A singleton “token holder” could cache tokens for usernames, rather than doing the authorization dance for every scenario. The snippet below shows how such a singleton could be called within a @When step definition with no @Before method.

public class ExampleSteps {
    ...
    @When("^some API is called$")
    public void whenSomeApiIsCalled() {
        // Get the token from the singleton cache lazily
        String token = TokenHolder.getInstance().getToken("user", "pass");
        // Use the token to call some API (method not shown)
        callSomeApi(token);
    }
    ...
}

And the singleton class could be defined like this:

public class TokenHolder {
    private static volatile TokenHolder instance = null;
    private HashMap<String, String> tokens;

    private TokenHolder() {
        tokens = new HashMap<String, String>();
    }

    public static TokenHolder getInstance() {
        // Lazy and thread-safe
        if (instance == null) {
            synchronized(TokenHolder.class) {
                if (instance == null) {
                    instance = new TokenHolder();
                }
            }
        }

        return instance;
    }
    
    public String getToken(String username, String password) {
        // This check could be extended to handle token expiration
        if (!tokens.containsKey(username)) {
            // Request a fresh authentication token (method not shown)
            String token = requestToken(username, password);
            // Cache the token for later
            tokens.put(username, token);
        }
        
        return tokens.get(username);
    }
    
    ...
}

Workaround #4: JUnit Class Annotations

Another workaround mentioned in Issue #515 and elsewhere is to use JUnit’s @BeforeClass and @AfterClass annotations in the runner class, like this:

@RunWith(Cucumber.class)
@Cucumber.Options(format = {
    "html:target/cucumber-html-report",
    "json-pretty:target/cucumber-json-report.json"})
public class RunCukesTest {

    @BeforeClass
    public static void setup() {
        System.out.println("Ran the before");
    }

    @AfterClass
    public static void teardown() {
        System.out.println("Ran the after");
    }
}

While @BeforeClass and @AfterClass may look like the cleanest solution at first, they are not very practical to use. They work only when Cucumber-JVM is set to use the JUnit runner. Other runners, like TestNG, the command line runner, and special IDE runners, won’t pick up these hooks. Their methods must also be static and would need static variables or singletons to share data anyway. Therefore, I personally discourage using these annotations in Cucumber-JVM.

What About Dependency Injection?

Dependency injection is a marvelous technique. As defined by Wikipedia:

In software engineering, dependency injection is a technique whereby one object supplies the dependencies of another object. A dependency is an object that can be used (a service). An injection is the passing of a dependency to a dependent object (a client) that would use it. The service is made part of the client’s state. Passing the service to the client, rather than allowing a client to build or find the service, is the fundamental requirement of the pattern.

Dependency injection can be a powerful alternative to singletons because DI provides finer control over the scope of objects. However, Cucumber-JVM’s dependency injection cannot be applied with global hooks because dependency objects, like step definition objects, are constructed and destroyed for each scenario.

Comparison Table

Ultimately, the best approach for global hooks in Cucumber-JVM is the one that best fits the tests’ needs. Below is a table to make workaround comparisons easier.

Workaround | Pros | Cons
Don’t Do It | Scenarios are completely independent. No complicated or risky workarounds. | Repeated setup and cleanup procedures may add significant execution time.
Static Variables | Simple yet effective implementation. | May need many static variables to share test data.
Singleton Caching | Abstracts test data and setup procedures. Easily handles lazy initialization and evaluation. May not need a @Before hook. | More complicated design.
JUnit Class Annotations | Clean look for basic setup and cleanup routines. | May be used only with the JUnit runner. Requires static variables or singletons to share test data anyway.

12 Awesome Benefits of BDD

What can BDD do for you? Why adopt a new process with a new framework? Because it’s worth it! The main benefits of BDD are better collaboration and automation. This article expands those two into a dozen awesome benefits. (If you read the BDD 101 series, then these points should look familiar.)

#1: Inclusion

BDD is meant to be collaborative. Everyone from the customer to the tester should be able to easily engage in product development. And anyone can write behavior scenarios because they are written in plain language. Scenarios are:

  • Requirements for product owners
  • Acceptance criteria for developers
  • Test cases for testers
  • Scripts for automators
  • Descriptions for other stakeholders

Essentially, BDD is an enhancement of The Three Amigos.

#2: Clarity

Scenarios focus on the expected behaviors of the product. Each scenario focuses on one specific thing. Behaviors are described in plain language, and any ambiguity can be clarified with a simple conversation or Example Mapping. There’s no unreadable code or obscure technical jargon, and there’s no game of telephone. Clarity ensures the customer gets what the customer wants.

#3: Streamlining

BDD is designed to speed up the development process. Everyone involved in development relies upon the same scenarios. Scenarios are requirements, acceptance criteria, test cases, and test scripts all in one – there is no need to write any other artifact. The modular nature of Gherkin syntax expedites test automation development. Furthermore, scenarios can be used as steps to reproduce failures for defect reports.

#4: Shift Left

“Shift left” is a buzzword for testing early in the development process. Testing earlier means fewer bugs later. In BDD, test case definition inherently becomes part of the requirements phase (for waterfall) or grooming (for Agile). As soon as behavior scenarios are written, testing and automation can theoretically begin.

#5: Artifacts

Scenarios form a collection of self-documenting test cases as a result of the BDD process. This ever-growing collection forms a perfect regression test suite. Scenarios can be run manually or with automation. Any tests not automated can be added to a backlog to automate in the future.

#6: Automation

BDD frameworks make it easy to turn scenarios into automated tests. The steps are already given by the scenarios – the automation engineer simply needs to write a method/function to perform each step’s operations.

#7: Test-Driven

BDD is an evolution of TDD. Writing scenarios from the beginning enforces quality-first and test-first mindsets. BDD automation can run scenarios that fail until the feature is implemented, at which point the tests pass.

#8: Code Reuse

Given-When-Then steps can be reused between scenarios. The underlying implementation for each step does not change. Automation code becomes very modular.

#9: Parameterization

Scenario steps can be parameterized to be even more reusable. For example, a step to click a button can take in its ID. Parameterization can help a team adopt a common, reusable set of steps, and it inspires healthier discussion when writing scenarios.
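
For example, a parameterized step definition in Python’s behave could be sketched like this (the step phrase and the page-object call are hypothetical):

from behave import when

@when('the user clicks the "{button_id}" button')
def step_impl(context, button_id):
    # One step definition handles a click on any button named in the Gherkin step
    context.page.click_button(button_id)

A scenario could then reuse the same step with any value, such as: When the user clicks the "submit" button.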

#10: Variation

Scenario outlines make it easy to run the same scenario with different combinations of inputs. This is a simple but powerful way to expand test coverage without code duplication, which is the bane of test automation.

#11: Momentum

BDD has a snowball effect: scenarios become easier and faster to write and automate as more step definitions are added. Scenarios typically share common steps. Sometimes, new scenarios need nothing more than different step parameters or just one new line.

#12: Adaptability

BDD scenarios are easy to update as the product changes. Plain language is easy to edit. Modular design makes changes to automation code safer. Scenarios can also be filtered by tag name to decide what runs and what doesn’t.

BDD 101: Frameworks

Every major programming language has a BDD automation framework. Some even have multiple choices. Building upon the structural basics from the previous post, this post provides a survey of the major frameworks available today. Since I cannot possibly cover every BDD framework in depth in this 101 series, my goal is to empower you, the reader, to pick the best framework for your needs. Each framework has support documentation online justifying its unique goodness and detailing how to use it, and I would prefer not to duplicate documentation. Use this post primarily as a reference. (Check the Automation Panda BDD page for the full table of contents.)

Major Frameworks

Most BDD frameworks are Cucumber versions, JBehave derivatives inspired by Dan North, or non-Gherkin spec runners. Some put behavior scenarios into separate files, while others put them directly into the source code.

C# and Microsoft .NET

SpecFlow, created by Gáspár Nagy, is arguably the most popular BDD framework for Microsoft .NET languages. Its tagline is “Cucumber for .NET,” and it is fully compliant with Gherkin. SpecFlow also has polished, well-designed hooks, context injection, and parallel execution (especially with test thread affinity). The basic package is free and open source, but SpecFlow also sells licenses for SpecFlow+ extensions. The free version requires a unit test runner like MsTest, NUnit, or xUnit.net in order to run scenarios. This makes SpecFlow flexible, but it can also feel jury-rigged and inelegant. The licensed version provides a slick runner named SpecFlow+ Runner (which is BDD-friendly) and a Microsoft Excel integration tool named SpecFlow+ Excel. Microsoft Visual Studio has extensions for SpecFlow to make development easier.

There are plenty of other BDD frameworks for C# and .NET, too. xBehave.net is an alternative that pairs nicely with xUnit.net. A major difference of xBehave.net is that scenario steps are written directly in the code, instead of in separate text (feature) files. LightBDD bills itself as being more lightweight than other frameworks and basically does some tricks with partial classes to make the code more readable. NSpec is similar to RSpec and Mocha and uses lambda expressions heavily. Concordion offers some interesting ways to write specs, too. NBehave is a JBehave descendant, but the project appears to be dead without any updates since 2014.

Java and JVM Languages

The main Java rivalry is between Cucumber-JVM and JBehave. Cucumber-JVM is the official Cucumber version for Java and other JVM languages (Groovy, Scala, Clojure, etc.). It is fully compliant with Gherkin and generates beautiful reports. The Cucumber-JVM driver can be customized, as well. JBehave is one of the first and foremost BDD frameworks available. It was originally developed by Dan North, the “father of BDD.” However, JBehave is missing key Gherkin features like backgrounds, doc strings, and tags. It was also a pure-Java implementation before Cucumber-JVM existed. Both frameworks are widely used, have plugins for major IDEs, and distribute Maven packages. This popular but older article compares the two in slight favor of JBehave, but I think Cucumber-JVM is better, given its features and support.

The Automation Panda article Cucumber-JVM for Java is a thorough guide to the Cucumber-JVM framework.

Java also has a number of other BDD frameworks. JGiven uses a fluent API to spell out scenarios, and pretty HTML reports print the scenarios with the results. It is fairly clean and concise. Spock and JDave are spec frameworks, but JDave has been inactive for years. Scalatest for Scala also has spec-oriented features. Concordion also provides a Java implementation.

JavaScript

Almost all JavaScript BDD frameworks run on Node.js. Jasmine and Mocha are two of the most popular general-purpose JS test frameworks. They differ in that Jasmine has many features included (like assertions and spies) that Mocha does not. This makes Jasmine easier to get started (good for beginners) but makes Mocha more customizable (good for power users). Both claim to be behavior-driven because they structure tests using “describe” and “it-should” phrases in the code, but they do not have the advantage of separate, reusable steps like Gherkin. Personally, I consider Jasmine and Mocha to be behavior-inspired but not fully behavior-driven.

Other BDD frameworks are more true to form. Cucumber provides Cucumber.js for Gherkin-compliant happiness. Yadda is Gherkin-like but with a more flexible syntax. Vows provides a different way to approach behavior using more formalized phrase partitions for a unique form of reusability. The Cucumber blog argues that Cucumber.js is best due to its focus on good communication through plain language steps, whereas other JavaScript BDD frameworks are more code-y. (Keep in mind, though, that Cucumber would naturally boast of its own framework.) Other comparisons are posted here, here, here, and here.

PHP

The two major BDD frameworks for PHP are Behat and Codeception. Behat is the official Cucumber version for PHP, and as such is seen as the more “pure” BDD framework. Codeception is more programmer-focused and can handle other styles of testing. There are plenty of articles comparing the two – here, here, and here (although the last one seems out of date). Both seem like good choices, but Codeception seems more flexible.

Python

Python has a plethora of test frameworks, and many are BDD. behave and lettuce are probably the two most popular players. Feature comparison is analogous to Cucumber-JVM versus JBehave, respectively: behave is practically Gherkin compliant, while lettuce lacks a few language elements. Both have plugins for major IDEs. pytest-bdd is on the rise because it integrates with all the wonderful features of pytest. radish is another framework that extends the Gherkin language to include scenario loops, scenario preconditions, and variables. All these frameworks put scenarios into separate feature files. They all also implement step definitions as functions instead of classes, which not only makes steps feel simpler and more independent, but also avoids unnecessary object construction.

Other Python frameworks exist as well. pyspecs is a spec-oriented framework. Freshen was a BDD plugin for Nose, but both Freshen and Nose are discontinued projects.

Ruby

Cucumber, the gold standard for BDD frameworks, was first implemented in Ruby. Cucumber maintains the official Gherkin language standard, and all Cucumber versions are inspired by the original Ruby version. Spinach bills itself as an enhancement to Cucumber by encapsulating steps better. RSpec is a spec-oriented framework that does not use Gherkin.

Which One is Best?

There is no right answer – the best BDD framework is the one that best fits your needs. However, there are a few points to consider when weighing your options:

  • What programming language should I use for test automation?
  • Is it a popular framework that many others use?
  • Is the framework actively supported?
  • Is the spec language compliant with Gherkin?
  • What type of testing will you do with the framework?
  • What are the limitations as compared to other frameworks?

Frameworks that separate scenario text from implementation code are best for shift-left testing. Frameworks that put scenario text directly into the source code are better for white box testing, but they may look confusing to less experienced programmers.

Personally, my favorites are SpecFlow and pytest-bdd. At LexisNexis, I used SpecFlow and Cucumber-JVM. For Python, I used behave at MaxPoint, but I have since fallen in love with pytest-bdd since it piggybacks on the wonderfulness of pytest. (I can’t wait for this open ticket to add pytest-bdd support in PyCharm.) For skill transferability, I recommend Gherkin compliance, as well.

Reference Table

The table below categorizes BDD frameworks by language and type for quick reference. It also includes frameworks in languages not described above. Recommended frameworks are denoted with an asterisk (*). Inactive projects are denoted with an X (x).

Language            | Framework           | Type
C                   | Catch               | In-line Spec
C++                 | Igloo               | In-line Spec
C# and .NET         | Concordion          | In-line Spec
C# and .NET         | LightBDD            | In-line Gherkin
C# and .NET         | NBehave x           | Separated semi-Gherkin
C# and .NET         | NSpec               | In-line Spec
C# and .NET         | SpecFlow *          | Separated Gherkin
C# and .NET         | xBehave.net         | In-line Gherkin
Golang              | Ginkgo              | In-line Spec
Java and JVM        | Cucumber-JVM *      | Separated Gherkin
Java and JVM        | JBehave             | Separated semi-Gherkin
Java and JVM        | JDave x             | In-line Spec
Java and JVM        | JGiven *            | In-line Gherkin
Java and JVM        | Scalatest           | In-line Spec
Java and JVM        | Spock               | In-line Spec
JavaScript          | Cucumber.js *       | Separated Gherkin
JavaScript          | Yadda               | Separated semi-Gherkin
JavaScript          | Jasmine             | In-line Spec
JavaScript          | Mocha               | In-line Spec
JavaScript          | Vows                | In-line Spec
Perl                | Test::BDD::Cucumber | Separated Gherkin
PHP                 | Behat               | Separated Gherkin
PHP                 | Codeception *       | Separated or In-line
Python              | behave *            | Separated Gherkin
Python              | freshen x           | Separated Gherkin
Python              | lettuce             | Separated semi-Gherkin
Python              | pyspecs             | In-line Spec
Python              | pytest-bdd *        | Separated semi-Gherkin
Python              | radish              | Separated Gherkin-plus
Ruby                | Cucumber *          | Separated Gherkin
Ruby                | RSpec               | In-line Spec
Ruby                | Spinach             | Separated Gherkin
Swift / Objective C | Quick               | In-line Spec

 

[4/22/2018] Update: I updated info for C# and Python frameworks.

BDD 101: Automation

Better automation is one of BDD’s hallmark benefits. In fact, the main goal of BDD could be summarized as rapidly turning conceptualized behavior into automatically tested behavior. While the process and the Gherkin are universal, the underlying automation could be built using one of many frameworks.

This post explains how BDD automation frameworks work. It focuses on the general structure of the typical framework – it is not a tutorial on how to use any specific framework. However, I wrote short examples for each piece using Python’s behave framework, since learning is easier with examples. I chose to use Python here simply for its conciseness. (Check the Automation Panda BDD page for the full table of contents.)

Framework Parts

Every BDD automation framework has five major pieces:

#1: Feature Files

Gherkin feature files are very much part of the automation. They act like test scripts – each scenario is essentially a test case. Previous posts covered Gherkin in depth.

Here is an example feature file named google_search.feature:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  # This scenario should look familiar
  @automated @google-search @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

#2: Step Definitions

A step definition is a code block that implements the logic to execute a step. It is typically a method or function with the English-y step phrase as an annotation. Step definitions can take in arguments, doc strings, and step tables. They may also make assertions to pass or fail a scenario. In most frameworks, data can be passed between steps using some sort of context object. When a scenario is executed, the driver matches each scenario step phrase to its step definition. (Most frameworks use regular expressions for phrase matching.) Thus, every step in a feature file needs a step definition.

The step definitions would be written in a Python source file like this:

from behave import *

@given('a web browser is on the Google page')
def step_impl(context):
  context.google_page.load()

@when('the search phrase "{phrase}" is entered')
def step_impl(context, phrase):
  context.google_page.search(phrase)

@then('results for "{phrase}" are shown')
def step_impl(context, phrase):
  assert context.google_page.has_results(phrase)

#3: Hooks

Certain automation logic cannot be handled by step definitions. For example, scenarios may need special setup and cleanup operations. Most BDD frameworks provide hooks that can insert calls before or after Gherkin sections, typically filterable using tags. Hooks are similar in concept to aspect-oriented programming.

In behave, hooks are written in a Python source file named environment.py:

import page_objects
from selenium import webdriver

def before_all(context):
  context.browser = webdriver.Chrome()

def before_scenario(context, scenario):
  context.google_page = page_objects.GooglePage(context.browser)

def after_all(context):
  context.browser.quit()

#4: Support Code

Support code (a.k.a. libraries or packages) refers to any code called by step definitions and hooks. Support code could be dependency packages downloaded using managers like Maven (Java), NuGet (.NET), or pip (Python). For example, Selenium WebDriver is a well-known package for web browser automation. Support code could also be components to assist automation, such as page objects or other design patterns. As the cliché goes, “Don’t reinvent the wheel.” Step definitions and hooks should not contain all of the logic for running the actions – they should reuse common code as much as possible.

A Python page object class from the page_objects.py module could look like this:

class GooglePage(object):
  """A page object for the Google home page"""
  
  def __init__(self, browser):
    self.browser = browser
  
  def load(self):
    # put code here
    pass
  
  def search(self, phrase):
    # put code here
    pass
  
  def has_results(self, phrase):
    # put code here
    return False

#5: Driver

Every automation framework has a driver that runs tests, and BDD frameworks are no different. The driver executes each scenario in a feature file independently. Whenever a failure happens, the driver reports the failure and aborts the scenario. Drivers typically have discovery mechanisms for selecting scenarios to run based on tag names or file paths.

The behave driver can be launched from the command line like this:

> behave google_search.feature --tags @panda

Automation Advantages

Even if a team does not apply behavior-driven practices to its full development process, BDD test frameworks still have some significant advantages over non-BDD test frameworks. First of all, steps make BDD automation very modular and thus reusable. Each step is an independent action, much like how each scenario is an independent behavior. Once a step definition is written, it may be reused by any number of scenarios. This is crucial, since most behaviors for a feature share common actions. And all steps are inherently self-documenting, since they are written in plain language. There is a natural connection between high-level behavior and low-level implementation.

Test execution also has advantages. Tags make it very easy to select tests to run, especially from the command line. Failures are very informative as well. The driver pinpoints precisely which step failed for which scenario. And since behaviors are isolated, a failure for one scenario is less likely to affect other test scenarios than would be the case for procedure-driven tests.

All of this is explained more thoroughly in the Automation Panda article, ‑‑BDD; Automation without Collaboration.

What About Test Data?

Test data is a huge concern for any automation framework. Simple test data values may be supplied directly in Gherkin as step arguments or table values, but larger test data sets require other strategies. Support code can be used to handle test data. Read BDD 101: Test Data for more information.

Available Frameworks

There are many BDD frameworks out there. The next post will introduce a few major frameworks for popular languages.

BDD 101: Writing Good Gherkin

So, you and your team have decided to make test automation a priority. You plan to use behavior-driven development to shift left with testing. You read the BDD 101 Series up through the previous post. You picked a good language for test automation. You even peeked at Cucumber-JVM or another BDD framework on your own. That’s great! Big steps! And now, you are ready to write your first Gherkin feature file. You fire up Atom with a Gherkin plugin or Notepad++ with a Gherkin UDL, you type “Given” on the first line, and…

Writer’s block.  How am I supposed to write my Gherkin steps?

Good Gherkin feature files are not easy to write at first. Writing is definitely an art. With some basic pointers, and a bit of practice, Gherkin becomes easier. This post will cover how to write top-notch feature files. (Check the Automation Panda BDD page for the full table of contents.)

The Golden Gherkin Rule: Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Proper Behavior

The biggest mistake BDD beginners make is writing Gherkin without a behavior-driven mindset. They often write feature files as if they are writing “traditional” procedure-driven functional tests: step-by-step instructions with actions and expected results. HP ALM, qTest, AccelaTest, and many other test repository tools store tests in this format. These procedure-driven tests are often imperative and trace a path through the system that covers multiple behaviors. As a result, they may be unnecessarily long, which can delay failure investigation, increase maintenance costs, and create confusion.

For example, let’s consider a test that searches for images of pandas on Google. Below would be a reasonable test procedure:

  1. Open a web browser.
    1. Web browser opens successfully.
  2. Navigate to https://www.google.com/.
    1. The web page loads successfully and the Google image is visible.
  3. Enter “panda” in the search bar.
    1. Links related to “panda” are shown on the results page.
  4. Click on the “Images” link at the top of the results page.
    1. Images related to “panda” are shown on the results page.

I’ve seen many newbies translate a test like this into Gherkin like the following:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

This scenario is terribly wrong. All that happened was that the author put BDD buzzwords in front of each step of the traditional test. This is not behavior-driven, it is still procedure-driven.

The first two steps are purely setup: they just go to Google, and they are strongly imperative. Since they don’t focus on the desired behavior, they can be reduced to one declarative step: “Given a web browser is at the Google home page.” This new step is friendlier to read.

After the Given step, there are two When-Then pairs. This is syntactically incorrect: Given-When-Then steps must appear in order and cannot repeat. A Given may not follow a When or Then, and a When may not follow a Then. The reason is simple: any single When-Then pair denotes an individual behavior. This makes it easy to see how, in the test above, there are actually two behaviors covered: (1) searching from the search bar, and (2) performing an image search. In Gherkin, one scenario covers one behavior. Thus, there should be two scenarios instead of one. Any time you want to write more than one When-Then pair, write separate scenarios instead. (Note: Some BDD frameworks may allow disordered steps, but it would nevertheless be anti-behavioral.)

This splitting technique also reveals unnecessary behavior coverage. For instance, the first behavior to search from the search bar may be covered in another feature file. I once saw a scenario with about 30 When-Then pairs, and many were duplicate behaviors.

Do not be tempted to arbitrarily reassign step types to make scenarios follow strict Given-When-Then ordering. Respect the integrity of the step types: Givens set up initial state, Whens perform an action, and Thens verify outcomes. In the example above, the first Then step could have been turned into a When step, but that would be incorrect because it makes an assertion. Step types are meant to be guide rails for writing good behavior scenarios.

The correct feature file would look something like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

The second behavior arguably needs the first behavior to run first because the second needs to start at the search result page. However, since that is merely setup for the behavior of image searching and is not part of it, the Given step in the second scenario can basically declare (declaratively) that the “panda” search must already be done. Of course, this means that the “panda” search would be run redundantly at test time, but the separation of scenarios guarantees behavior-level independence.

The Cardinal Rule of BDD: One Scenario, One Behavior!

Remember, behavior scenarios are more than tests – they also represent requirements and acceptance criteria. Good Gherkin comes from good behavior.

(For deeper information about the Cardinal Rule of BDD and multiple When-Then pairs per scenario, please refer to my article, Are Gherkin Scenarios with Multiple When-Then Pairs Okay?)

Phrasing Steps

How you write a step matters. If you write a step poorly, it cannot easily be reused. Thankfully, some basic rules maintain consistent phrasing and maximum reusability.

Write all steps in third-person point of view. If first-person and third-person steps mix, scenarios become confusing. I even dedicated a whole blog post entirely to this point: Should Gherkin Steps Use First-Person or Third-Person? TL;DR: just use third-person at all times.

Write steps as a subject-predicate action phrase. It may be tempting to leave parts of speech out of a step line for brevity, especially when using Ands and Buts, but partial phrases make steps ambiguous and more likely to be reused improperly. Consider the following example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google search result page elements
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then the results page shows links related to "panda"
    And image links for "panda"
    And video links for "panda"

The final two And steps lack the subject-predicate phrase format. Are the links meant to be subjects, meaning that they perform some action? Or, are they meant to be direct objects, meaning that they receive some action? Are they meant to be on the results page or not? What if someone else wrote a scenario for a different page that also had image and video links – could they reuse these steps? Writing steps without a clear subject and predicate is not only poor English but poor communication.

Also, use appropriate tense and phrasing for each type of step. For simplicity, use present tense for all step types. Rather than take a time warp back to middle school English class, let’s illustrate tense with a bad example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Simple Google search
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then links related to "panda" will be shown on the results page

The Given step above uses present tense, but its subject is misleading. It indicates an action when it says, “Given the user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. A better phrasing would be, “Given the Google home page is displayed.” It establishes a starting point for the scenario. Use present tense with an appropriate subject to indicate a state rather than an action.

The When step above uses past tense when it says, “The user entered.” This indicates that an action has already happened. However, When steps should indicate that an action is presently happening. Plus, past tense here conflicts with the tenses used in the other steps.

The Then step above uses future tense when it says, “The results will be shown.” Future tense seems practical for Then steps because it indicates what the result should be after the current action is taken. However, future tense reinforces a procedure-driven approach because it treats the scenario as a time sequence. A behavior, on the other hand, is a present-tense aspect of the product or feature. Thus, it is better to write Then steps in the present tense.

The corrected example looks like this:

Feature: Google Searching

  Scenario: Simple Google search
    Given the Google home page is displayed
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

And note, all steps are written in third-person. Read Should Gherkin Steps use Past, Present, or Future Tense? to learn more.

Good Titles

Good titles are just as important as good steps. The title is like the face of a scenario – it’s the first thing people read. It must communicate in one concise line what the behavior is. Titles are often logged by the automation framework as well. Specific pointers for writing good scenario titles are given in my article, Good Gherkin Scenario Titles.

Choices, Choices

Another common misconception for beginners is thinking that Gherkin has an “Or” step for conditional or combinatorial logic. People may presume that Gherkin has “Or” because it has “And”, or perhaps programmers want to treat Gherkin like a structured language. However, Gherkin does not have an “Or” step. When automated, every step is executed sequentially.

Below is a bad example based on a classic Super Mario video game, showing how people might want to use “Or”:

# BAD EXAMPLE! Do not copy.
Feature: SNES Mario Controls

  Scenario: Mario jumps
    Given a level is started
    When the player pushes the "A" button
    Or the player pushes the "B" button
    Then Mario jumps straight up

Clearly, the author’s intent is to say that Mario should jump when the player pushes either of two buttons. The author wants to cover multiple variations of the same behavior. In order to do this the right way, use Scenario Outline sections to cover multiple variations of the same behavior, as shown below:

Feature: SNES Mario Controls

  Scenario Outline: Mario jumps
    Given a level is started
    When the player pushes the "<letter>" button
    Then Mario jumps straight up
    
    Examples: Buttons
      | letter |
      | A      |
      | B      |

The Known Unknowns

Test data can be difficult to handle. Sometimes, it may be possible to seed data in the system and write tests to reference it, but other times, it may not. Google search is the prime example: the result list will change over time as both Google and the Internet change. To handle the known unknowns, write scenarios defensively so that changes in the underlying data do not cause test runs to fail. Furthermore, to be truly behavior-driven, think about data not as test data but as examples of behavior.

Consider the following example from the previous post:

Feature: Google Searching
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

This scenario uses a step table to explicitly name results that should appear for a search. The step with the table would be implemented to iterate over the table entries and verify each appeared in the result list. However, what if Panda Express were to go out of business and thus no longer be ranked as high in the results? (Let’s hope not.) The test run would then fail, not because the search feature is broken, but because a hard-coded variation became invalid. It would be better to write a step that more intelligently verified that each returned result somehow related to the search phrase, like this: “And links related to ‘panda’ are shown on the results page.” The step definition implementation could use regular expression parsing to verify the presence of “panda” in each result link.

Another nice feature of Gherkin is that step definitions can hide data in the automation when it doesn’t need to be exposed. Step definitions may also pass data to future steps in the automation. For example, consider another Google search scenario:

Feature: Google Searching

  Scenario: Search result linking
    Given Google search results for "panda" are shown
    When the user clicks the first result link
    Then the page for the chosen result link is displayed

Notice how the When step does not explicitly name the value of the result link – it simply says to click the first one. The value of the first link may change over time, but there will always be a first link. The Then step must know something about the chosen link in order to successfully verify the outcome, but it can simply reference it as “the chosen result link”. Behind the scenes, in the step definitions, the When step can store the value of the chosen link in a variable and pass the variable forward to the Then step.
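
In a framework like behave, the shared context object makes this straightforward. Below is a minimal sketch of the idea (the page-object and browser calls are hypothetical):

from behave import when, then

@when('the user clicks the first result link')
def step_impl(context):
    # Remember which link was clicked so a later step can reference it
    context.chosen_link = context.results_page.get_first_link_text()
    context.results_page.click_first_link()

@then('the page for the chosen result link is displayed')
def step_impl(context):
    # Verify the outcome using the value stored by the When step
    assert context.chosen_link in context.browser.title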

Handling Test Data

Some types of test data should be handled directly within the Gherkin, but other types should not. Remember that BDD is specification by example – scenarios should be descriptive of the behaviors they cover, and any data written into the Gherkin should support that descriptive nature. Read Handling Test Data in BDD for comprehensive information on handling test data.

Less is More

Scenarios should be short and sweet. I typically recommend that scenarios should have a single-digit step count (<10). Long scenarios are hard to understand, and they are often indicative of poor practices. One such problem is writing imperative steps instead of declarative steps. I have touched on this topic before, but I want to thoroughly explain it here.

Imperative steps state the mechanics of how an action should happen. They are very procedure-driven. For example, consider the following When steps for entering a Google search:

  1. When the user scrolls the mouse to the search bar
  2. And the user clicks the search bar
  3. And the user types the letter “p”
  4. And the user types the letter “a”
  5. And the user types the letter “n”
  6. And the user types the letter “d”
  7. And the user types the letter “a”
  8. And the user types the ENTER key

Now, the granularity of actions may seem like overkill, but it illustrates the point that imperative steps focus very much on how actions are taken. Thus, they often need many steps to fully accomplish the intended behavior. Furthermore, the intended behavior is not always as self-documented as with declarative steps.

Declarative steps state what action should happen without providing all of the information for how it will happen. They are behavior-driven because they express action at a higher level. All of the imperative steps in the example above could be written in one line: “When the user enters ‘panda’ at the search bar.” The scrolling and keystroking are implied, and they will ultimately be handled by the automation in the step definition. When trying to reduce step count, ask yourself if your steps can be written more declaratively.

Another reason for lengthy scenarios is scenario outline abuse. Scenario outlines make it all too easy to add unnecessary rows and columns to their Examples tables. Unnecessary rows waste test execution time. Extra columns indicate complexity. Both should be avoided. Below are questions to ask yourself when facing an oversized scenario outline:

  • Does each row represent an equivalence class of variations?
    • For example, searching for “elephant” in addition to “panda” does not add much test value.
  • Does every combination of inputs need to be covered?
    • N columns with M inputs each generates M^N possible combinations.
    • Consider making each input appear only once, regardless of combination.
  • Do any columns represent separate behaviors?
    • This may be true if columns are never referenced together in the same step.
    • If so, consider splitting apart the scenario outline by column.
  • Does the feature file reader need to explicitly know all of the data?
    • Consider hiding some of the data in step definitions.
    • Some data may be derivable from other data.

These questions are meant to be sanity checks, not hard-and-fast rules. The main point is that scenario outlines should focus on one behavior and use only the necessary variations.

Style and Structure

While style often takes a backseat during code review, it is a factor that differentiates good feature files from great feature files. In a truly behavior-driven team, non-technical stakeholders will rely upon feature files just as much as the engineers. Good writing style improves communication, and good communication skills are more than just resume fluff.

Below are a number of tidbits for good style and structure:

  1. Focus a feature on customer needs.
  2. Limit one feature per feature file. This makes it easy to find features.
  3. Limit the number of scenarios per feature. Nobody wants a thousand-line feature file. A good measure is a dozen scenarios per feature.
  4. Limit the number of steps per scenario to less than ten.
  5. Limit the character length of each step. Common limits are 80-120 characters.
  6. Use proper spelling.
  7. Use proper grammar.
  8. Capitalize Gherkin keywords.
  9. Capitalize the first word in titles.
  10. Do not capitalize words in the step phrases unless they are proper nouns.
  11. Do not use punctuation (specifically periods and commas) at the end of step phrases.
  12. Use single spaces between words.
  13. Indent the content beneath every section header.
  14. Separate features and scenarios by two blank lines.
  15. Separate examples tables by 1 blank line.
  16. Do not separate steps within a scenario by blank lines.
  17. Space table delimiter pipes (“|”) evenly.
  18. Adopt a standard set of tag names. Avoid duplicates.
  19. Write all tag names in lowercase, and use hyphens (“-“) to separate words.
  20. Limit the length of tag names.

Without these rules, you might end up with something like this:

# BAD EXAMPLE! Do not copy.

 Feature: Google Searching
     @AUTOMATE @Automated @automation @Sprint32GoogleSearchFeature
 Scenario outline: GOOGLE STUFF
Given a Web Browser is on the Google page,
 when The seach phrase "<phrase>" Enter,

 Then  "<phrase>" shown.
and The relatedd   results include "<related>".
Examples: animals
 | phrase | related |
| panda | Panda Express        |
| elephant    | elephant Man  |

Don’t do this. It looks horrible. Please, take pride in your profession. While the automation code may look hairy in parts, Gherkin files should look elegant.

Gherkinize Those Behaviors!

With these best practices, you can write Gherkin feature files like a pro. Don’t be afraid to try: nobody does things perfectly the first time. As a beginner, I broke many of the guidelines I put in this post, but I learned as I went. Don’t give up if you get stuck. Always remember the Golden Gherkin Rule and the Cardinal Rule of BDD!

This is the last of three posts in the series focused exclusively on Gherkin. The next post will address how to adopt behavior-driven practices into the Agile software development process.

BDD 101: Gherkin By Example

Gherkin is learned best by example. Whereas the previous post in this series focused on Gherkin syntax and semantics, this post will walk through a set of examples that show how to use all of the language parts. The examples cover basic Google searching, which is easy to explain and accessible to all. You can find other good example references from Cucumber and Behat. (Check the Automation Panda BDD page for the full table of contents.)

As a disclaimer, this post will focus entirely upon feature file examples and not upon automation through step definitions. Writing good Gherkin scenarios must come before implementing step definitions. Automation will be covered in future posts. (Note that these examples could easily be automated using Selenium.)

A Simple Feature File

Let’s start with the example from the previous post:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

This is a complete feature file. It starts with a required Feature section and a description. The description is optional, and it may have as many or as few lines as desired. The description will not affect automation at all – think of it as a comment. As an Agile best practice, it should include the user story for the features under test. This feature file then has one Scenario section with a title and one each of Given, When, and Then steps in order. It could have more scenarios, but for simplicity, this example has only one. Each scenario will be run independently of the other scenarios – the output of one scenario has no bearing on the next! The indents and blank lines also make the feature file easy to read.

Notice how concise yet descriptive the scenario is. Any non-technical person can easily understand how Google searches should behave from reading this scenario. “Search for pandas? Get pandas!” The feature’s behavior is clear to the developer, the tester, and the product owner. Thus, this one feature file can be shared by all stakeholders and can dispel misunderstandings.

Another thing to notice is the ability to parameterize steps. Steps should be written for reusability. A step hard-coded to search for pandas is not very reusable, but a step parameterized to search for any phrase is. Parameterization is handled at the level of the step definitions in the automation code, but by convention, it is a best practice to write parameterized values in double-quotes. This makes the parameters easy to identify.

Additional Steps

Not all behaviors can be fully described using only three steps. Thankfully, scenarios can have any number of steps using And and But. Let’s extend the previous example:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the related results include "Panda Express"
    But the related results do not include "pandemonium"

Now, there are three Then steps to verify the outcome. And and But steps can be attached to any type of step. They are interchangeable and do not have any unique meaning – they exist simply to make scenarios more readable. For example, the scenario above could have been written as Given-When-Then-Then-Then, but Given-When-Then-And-But makes more sense. Furthermore, And and But do not represent any sort of conditional logic. Gherkin steps are entirely sequential and do not branch based on if/else conditions.

Doc Strings

In-line parameters are not the only way to pass inputs to a step. Doc strings can pass larger pieces of text as inputs like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the result page displays the text
      """
      Scientific name: Ailuropoda melanoleuca
      Conservation status: Endangered (Population decreasing)
      """

Doc strings are delimited by three double-quotes ("""). They may fit onto one line, or they may be multiple lines long. The step definition receives the doc string input as a plain old string. Gherkin doc strings are reminiscent of Python docstrings in format.
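
With behave, for example, the attached doc string arrives in the step definition as context.text. The sketch below is illustrative only: context.browser is an assumed Selenium WebDriver, and checking each line against the page source is a simplification.

from behave import then

@then('the result page displays the text')
def step_result_page_displays_text(context):
    # context.text holds the entire doc string, newlines included
    page_source = context.browser.page_source  # assumed Selenium WebDriver
    for line in context.text.splitlines():
        assert line.strip() in page_source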

Step Tables

Tables are a valuable way to provide data with concise syntax. In Gherkin, a table can be passed into a step as an input. The example above can be rewritten to use a table for related results like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

Step tables are delimited by the pipe symbol “|”. They may have as many rows or columns as desired. The  first row contains column names and is not treated as input data. The table is passed into the step definition as a data structure native to the language used for automation (such as an array). Step tables may be attached to any step, but they will be connected to that step only. For good formatting, remember to indent the step table and to space the delimiters evenly.
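
In behave, for instance, the step table would be passed to the step definition as context.table, and cell values can be looked up by column name. The page-source check below is a simplification for illustration, and context.browser is again an assumed Selenium WebDriver.

from behave import then

@then('the following related results are shown')
def step_related_results_shown(context):
    page_source = context.browser.page_source  # assumed Selenium WebDriver
    for row in context.table:  # iteration skips the header row
        assert row["related"] in page_source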

The Background Section

Sometimes, scenarios in a feature file may share common setup steps. Rather than duplicate these steps, they can be put into a Background section:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Background:
    Given a web browser is on the Google page

  Scenario: Simple Google search for pandas
    When the search phrase "panda" is entered
    Then results for "panda" are shown

  Scenario: Simple Google search for elephants
    When the search phrase "elephant" is entered
    Then results for "elephant" are shown

Since each scenario is independent, the steps in the Background section will run before each scenario is run, not once for the whole set. The Background section does not have a title. It can have any type or number of steps, but as a best practice, it should be limited to Given steps.

Scenario Outlines

Scenario outlines bring even more reusability to Gherkin. Notice in the example above that the two scenarios are identical apart from their search terms. They could be combined with a Scenario Outline section:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario Outline: Simple Google searches
    Given a web browser is on the Google page
    When the search phrase "<phrase>" is entered
    Then results for "<phrase>" are shown
    And the related results include "<related>"
    
    Examples: Animals
      | phrase   | related       |
      | panda    | Panda Express |
      | elephant | Elephant Man  |

Scenario outlines are parameterized using Examples tables. Each Examples table has a title and uses the same format as a step table. Each row in the table represents one test instance for that particular combination of parameters. In the example above, there would be two tests for this Scenario Outline. The table values are substituted into the steps above wherever the column name is surrounded by the “<” “>” symbols.

A Scenario Outline section may have multiple Examples tables. This may make it easier to separate combinations. For example, tables could be added for “Planets” and “Food”. Each Examples table is connected to the Scenario Outline section immediately preceding it. A feature file can have any number of Scenario Outline sections, but make sure to write them well. (See Are Multiple Scenario Outlines in a Feature File Okay?)

Be careful not to confuse step tables with Examples tables! This is a common mistake for Gherkin beginners. Step tables provide input data structures, whereas Examples tables provide input parameterization.

Tags

Tags are a great way to classify scenarios. They can be used to selectively run tests based on tag name, and they can be used to apply before-and-after wrappers around scenarios. Most BDD frameworks support tags. Any scenario can be given tags like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  @automated @google @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

Tags start with the “@” symbol. Tag names are case-sensitive and whitespace-separated. As a best practice, they should be lowercase and use hyphens (“-”) between separate words. Tags must be put on the line before a Scenario or Scenario Outline section begins. Any number of tags may be used.
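
As a rough behave-based sketch, tagged scenarios could be filtered from the command line (for example, behave --tags=@google), and hooks in environment.py can read a scenario’s tags to apply before-and-after behavior. Tag names arrive without the “@” prefix. The skip-unless-automated policy below is purely illustrative.

# environment.py
def before_scenario(context, scenario):
    # scenario.tags lists tag names without the "@" prefix
    if "automated" not in scenario.tags:
        scenario.skip("only @automated scenarios run in this illustrative setup")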

Comments

Comments allow the author to add additional information to a feature file. In Gherkin, comments must use a whole line, and each line must start with a hashtag “#”. Comment lines may appear anywhere and are ignored by the automation framework. For example:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  # Test ID: 12345
  # Author: Andy
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

Since Gherkin is very self-documenting, it is a best practice to limit the use of comments in favor of more descriptive steps and titles.

Writing Good Gherkin

This post merely shows how to use the Gherkin syntax. The next post will cover how to write good Gherkin feature files.

BDD 101: The Gherkin Language

As mentioned in the previous post, behavior scenarios are the cornerstone of BDD. Each scenario is the formalized specification of a single behavior of a product or feature. Scenarios are both the requirements for the feature as well as the test cases. This post will show how to write behavior scenarios in Gherkin feature files. (Check the Automation Panda BDD page for the full table of contents.)

Introducing Gherkin

Gherkin is the domain-specific language for writing behavior scenarios. It is a simple programming language, and its “code” is written into feature files (text files with a “.feature” extension). The official Gherkin language standard is maintained by Cucumber, one of the most prevalent BDD automation frameworks. Most other BDD frameworks use Gherkin, but some may not conform 100% to Cucumber’s language standards.

Gherkin scenarios are meant to be short and to sound like plain English. Each scenario has the following structure:

  1. Given some initial state
  2. When an action is taken
  3. Then verify an outcome

A simple feature file example is shown below, with keywords in bold:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

As you can see, it reads intuitively. Even non-technical people can understand it.

The Feature section has a title and a description, which are both used only for documentation purposes. When the feature is tied to an Agile user story, it is good practice to put the user story in the description. The Feature section has one or more Scenario sections, each with a unique title.

Each scenario is essentially a test case. The Given-When-Then format concisely frames the behavior under test. Each Given, When, or Then line is called a step. Steps must appear in the order of Given->When->Then and are executed sequentially. The Given step sets up the expected state before the main actions take place (like loading the Google home page). The When step contains the actions for exercising the behavior under test (running a Google search), and the Then step verifies that the behavior was successful (seeing the results page). The English-y phrase following the step keyword is a description of what the step will do, written by the test author. This description is linked to a step definition (a method/function that implements the operations for the step) in the automation code base using string or regular expression matching. (Feature files apart from step definitions are basically manual test case procedures.) Good steps are declarative in that they state what should happen at a high level, and not imperative because they shouldn’t focus on direct, low-level instructions.
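
To make the linkage concrete, here is a minimal behave-style sketch of a step definition for the Given step above. The decorator text matches the step text in the feature file. The choice of Chrome and the direct driver creation are illustrative assumptions, not part of the original scenario.

from behave import given
from selenium import webdriver

@given('a web browser is on the Google page')
def step_browser_on_google_page(context):
    context.browser = webdriver.Chrome()  # illustrative driver choice
    context.browser.get("https://www.google.com")

When the scenario runs, the framework matches the Given step text to this function and executes it.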

Gherkin Keywords

Every programming language has its keywords, and Gherkin is no different. The table below explains how each keyword is used in the official Gherkin language. Note that some BDD frameworks may not be fully compliant. Cucumber provides a decent Gherkin language reference for its implementation.

Feature
  • section denoting product or feature under test
  • contains a one-line title
  • contains extra lines for description
  • description should include the user story
  • may have one Background section
  • may have multiple Scenario and Scenario Outline sections
  • should be one Feature per feature file
Scenario
  • section for a specific behavior scenario
  • contains a one-line title
  • contains multiple Given, When, and Then steps
  • each type of step is optional
  • step order matters
  • each scenario runs independently
Given
  • step to define the preconditions (initial state or context)
  • should put the product under test into the desired state
  • may be parameterized
When
  • step to define the action to be performed
  • may be parameterized
Then
  • step to define the expected result from the action taken by When
  • may be parameterized
And
  • an additional step added to a Given, When, or Then
  • used instead of repeating Given, When, or Then
  • example: Given-Given-When-Then = Given-And-When-Then
  • associated with the immediately preceding step
  • order matters
But
  • functions the same as And, but might be easier to read
  • interchangeable with And
Background
  • a section of Given and And statements to run before each scenario
  • does not have a title or description
  • only one Background for each Feature section
Scenario Outline
  • a templated scenario section
  • uses “<” and “>” to identify parameter names
  • followed by Examples tables that provide parameter values
  • may have more than one Examples table
  • parameters are substituted when the tests run
Examples
  • a section to provide a table of parameter values for a Scenario Outline
  • each table row represents a combination of values to test together
  • may have any positive number of rows
|
  • table delimiter used for Examples tables and step tables
  • use the escape sequence “\|” to use pipe characters as text within a column
“””
  • doc string delimiter for passing large text into a step
  • doc strings may be multi-line
@
  • prefix for a tag, such as @automated
  • tags may be placed before Feature or Scenario sections
  • tags are used to filter scenarios
#
  • prefix for a comment line
  • comments are not read by the Gherkin parser

The next post will walk through several Gherkin examples to show how to write good scenarios.

BDD 101: Introducing BDD

Series Overview

BDD 101 is a blog series to teach the basics of behavior-driven development. It is both a “getting started” guide for BDD beginners, as well as a best-practice reference for pros. I wrote this series for anyone involved in the daily duties of software development: developers, testers, scrum masters, product owners, and managers. The content in this series comes from my experiences using BDD for many projects. It focuses on Gherkin-based specification, and test automation will be a major theme. If this series is for you, then let’s dive in!

The BDD 101 table of contents is given on the Automation Panda BDD page. Note that some articles in the series were posted months apart and will not all appear together using the “previous” and “next” article arrows.

The Big BDD Picture: The main goals of BDD are collaboration and automation.

What is a Behavior?

A behavior is how a product or feature operates. It is defined as a scenario of inputs, actions, and outcomes. A product or feature exhibits countless behaviors. Identifying behaviors individually brings clarity and simplicity. It also helps explain how behaviors are related. Below are examples of behaviors:

  • Logging into a web page
  • Clicking links on a navigation bar
  • Submitting forms
  • Making successful service calls
  • Receiving expected errors

Separating individual behaviors makes it easy to define a system without unnecessary repetition. For example, there may be multiple ways to navigate to the same page.

Nav Behaviors: searching from a text field and searching directly from URL parameters both lead to the same results page.

What is BDD?

Behavior-Driven Development (BDD) is a test-centric software development process that grew out of Test-Driven Development (TDD). It has been around since roughly the mid-2000s. BDD focuses on clearly identifying the desired behavior of a feature from the very start. Behaviors are identified using specification by example: behavior specs are written to illustrate the desired behavior with realistic examples, rather than being written with abstract, generic jargon. They serve as both the product’s requirements/acceptance criteria (before development) and its test cases (after development). Gherkin is one of the most popular languages for writing formal behavior specifications – it captures behaviors as “Given-When-Then” scenarios. With the help of automation tools, scenarios can easily be turned into automated test cases. Anybody from engineers to product owners can write BDD scenarios, since they are just English phrases. BDD keeps developers focused on delivering precisely what the product owner wants. It also expedites testing. As such, BDD pairs nicely with Agile Software Development.

Quick Points

  • BDD is specification by example.
    • When someone says “BDD”, immediately think of “Given-When-Then”.
  • BDD focuses on behavior first.
    • Behavior scenarios are the cornerstone of BDD.
  • BDD is a refinement of the Agile process, not an overhaul.
    • It formalizes acceptance criteria and test coverage.
  • BDD is a paradigm shift.
    • Behaviors become the team’s main focus.

The Origins of BDD

The following quote comes from an article entitled Introducing BDD, written by Dan North (the “Father of BDD”) in March 2006:

I had a problem. While using and teaching agile practices like test-driven development (TDD) on projects in different environments, I kept coming across the same confusion and misunderstandings. Programmers wanted to know where to start, what to test and what not to test, how much to test in one go, what to call their tests, and how to understand why a test fails.

The deeper I got into TDD, the more I felt that my own journey had been less of a wax-on, wax-off process of gradual mastery than a series of blind alleys. I remember thinking “If only someone had told me that!” far more often than I thought “Wow, a door has opened.” I decided it must be possible to present TDD in a way that gets straight to the good stuff and avoids all the pitfalls.

My response is behaviour-driven development (BDD). It has evolved out of established agile practices and is designed to make them more accessible and effective for teams new to agile software delivery. Over time, BDD has grown to encompass the wider picture of agile analysis and automated acceptance testing.

12 Awesome Benefits

BDD improves the development process in a dozen ways:

  • Inclusion: Anyone can write BDD scenarios, because they are written in plain English. Think of The Three Amigos.
  • Clarity: Scenarios focus specifically on the expected behavior of the product under development, resulting in less ambiguity for what to develop.
  • Streamlining: Requirements = acceptance criteria = test cases. Modular syntax expedites automation as well.
  • Shift-Left: Test case definition inherently becomes part of grooming.
  • Artifacts: Scenarios form a collection of test cases. Any tests not automated can be added to a known automation backlog.
  • Automation: BDD frameworks make it easy to turn scenarios into automated tests.
  • Test-Driven: Most BDD frameworks can run scenarios to fail until the feature is implemented.
  • Code Reuse: “Given-When-Then” steps can be reused between scenarios.
  • Parameterization: Steps can be parameterized. For example, a step to click a button can take in its ID.
  • Variation: Using parameters, example tables make it easy to run the same scenario with different combinations of inputs.
  • Momentum: Scenarios become easier and faster to write and automate as more step definitions are added.
  • Adaptability: Scenarios are easy to rewrite as the products and features change.

Testing Recommendations

Since BDD focuses on actual feature behavior, behavior specs are best for higher-level, functional, black box tests. For example, BDD is great for testing APIs and web UIs. Gherkin excels for acceptance testing. However, behavior specs would be overkill for unit tests, and they are also not a good choice for performance tests that focus on metrics rather than pass/fail results. Read more about this in the article BDD 101: Unit, Integration, and End-to-End Tests.

Next Steps

Lost yet? Don’t worry – this first post presented a lot of information. Things will make much more sense after learning how to write Gherkin test scenarios, which will be covered in the next post in this 101 series.

Why is Automation Full of Duplicate Code?

Copypasta
noun

A block of text that is duplicated repeatedly via "copy-paste", often causing annoyance or frustration.

One of the biggest problems (if not the biggest problem) I have seen in test automation is copypasta – the unnecessary duplication of code. It happens at all layers of testing. It happens in any type of project. It happens at companies big and small. And the consequences are stark: test development slows down, mistakes become more common, and maintenance becomes a nightmare. Although duplicate code can happen in any software project, it is especially prevalent in test automation. The reasons may or may not surprise you, but the solutions are clear.

4 Reasons Why Duplicate Code Pervades Automation

#1: Test cases are repetitive. For any given product, tests will share many of the same steps. For example, web app tests must all navigate to a start page first, or API tests might cover a few variations of one call. Testing mechanics such as input parameters, setup/cleanup, logging, and assertions happen frequently. Put all of that together into test suites that have tens, hundreds, thousands, or even more test cases. It’s simply the nature of testing.

#2: Automation frameworks reinforce repetition. Most frameworks structure test cases as a class with methods (like JUnit) or as a collection of functions (like pytest), in which each method or function represents one test. Inherently, this basic structure is a good thing for making tests independent. However, lazy programmers may abuse the structure. Often, they put all test code inside these test methods, instead of extracting repetitive logic into helper methods or design patterns. Then, it becomes easier to simply duplicate an entire test case method and change a few things, rather than to implement a better overall design.

#3: Test code takes a backseat to product code. Business needs drive software development in the industry, and since test code is not part of the product delivered to customers, it is often deemed to be less important. Not as much devotion is given to developing good test code. Many best practices are abandoned for expediency.

#4: Testers often have weaker development skills. This is not a condemnation of testers, nor a universal labeling, but rather a distinction between disciplines: developers are developers because they are good at making software, and testers are testers because they are good at exercising software and finding bugs. Of course, I know plenty of testers who do indeed have strong dev skills. However, I also know solid testers who have limited programming experience. When automation responsibility falls upon testers with limited dev skills, poor development practices happen, and code duplication is typically rampant.

How to Avoid Duplicate Code in Automation

Code duplication is code cancer.  -Andy

There are a number of ways to slay the copypasta monster. The first line of defense is to check yourself before you wreck yourself. Always question yourself when you copy-paste blocks of code. Why did you do that? What are you changing in the pasted copy? Should you abstract that logic into a method or a class that can be reused? Can you parameterize it? Override your Ctrl-C, Ctrl-V keyboard shortcut if necessary.

Be a good programmer. Develop packages for reusable actions. Things like assertions, logging, setup, and cleanup should be shared by all test cases. In that shared code, keep action calls short. Long method names with too many parameters inhibit usability. Remember that automated test cases should be self-documenting so that they read like test procedures. Whenever possible, make repetitive actions happen automatically. For example, make library methods do internal logging, and use test framework setup/cleanup routines. For another example, I once wrote code to automatically reconnect SSH sessions whenever they dropped. These auto-actions allow test case code to focus less on the low-level mechanics and more on the high-level features under test.
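
For example, in pytest, repeated browser setup and cleanup could be pulled into a shared fixture so each test stays focused on the feature under test. This is only a sketch: the Chrome driver choice and the hard-coded URL are illustrative assumptions.

import pytest
from selenium import webdriver

@pytest.fixture
def browser():
    driver = webdriver.Chrome()  # shared setup runs before each test
    driver.implicitly_wait(10)
    yield driver
    driver.quit()  # shared cleanup runs after each test

def test_google_search(browser):
    browser.get("https://www.google.com")
    assert "Google" in browser.title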

Finally, be a team player. Use the same development practices for test code as for product code. Automation is a product, and its customers are the team. Use coding standards, design patterns, and revision control. Most importantly, reinforce good practices through code review. Use the review process as a constructive way to learn new tricks and even to mentor less experienced team members. Also, divide testing roles between test formulation, test case automation, and test framework development. “QA” (quality assurance) is a wide discipline, and not everyone is equally skilled. Let people do what they do best. There is strength in diversity and in teamwork.