Gherkin

Python Testing 101: behave

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use the behave framework.

Overview

behave is a behavior-driven (BDD) test framework that is very similar to Cucumber, Cucumber-JVM, and SpecFlow. BDD frameworks are unique in that test cases are not written in raw programming code but rather in plain specification language that is then “glued” to code. The “behavior specs” help to define what the behavior is, and steps can be reused by multiple test cases (or “scenarios”). This is very different from more traditional frameworks like unittest and pytest. Although behave is not an official Cucumber variant, it still uses the Gherkin language (“Given-When-Then”) for behavior specification.

Test scenarios are written in Gherkin “.feature” files. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. The behave framework essentially runs feature files like test scripts. Hooks (in “environment.py”) and fixtures can also insert helper logic for test execution.

behave is officially supported for Python 2, but it seems to run just fine using Python 3.

Installation

Use pip to install the behave module.

pip install behave

Project Structure

Since behave is an opinionated framework, it has a very opinionated project structure. All code must be located under a directory named “features”. Gherkin feature files and the “environment.py” file for hooks must appear under “features”, and step definition modules must appear under “features/steps”. Configuration files can store common execution settings and even override the path to the “features” directory.

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

[project root directory]
|‐‐ [product code packages]
|-- features
|   |-- environment.py
|   |-- *.feature
|   `-- steps
|       `-- *_steps.py
`-- [behave.ini|.behaverc|tox.ini|setup.cfg]

Example Code

An example project named behavior-driven-python located in GitHub shows how to write tests using behave. This section will explain how the Web tests are designed.

The top layer in a behave project is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step defs can use different types of step matchers and can also take parametrized inputs:

from behave import *
from selenium.webdriver.common.keys import Keys

DUCKDUCKGO_HOME = 'https://duckduckgo.com/'

@given('the DuckDuckGo home page is displayed')
def step_impl(context):
  context.browser.get(DUCKDUCKGO_HOME)

@when('the user searches for "{phrase}"')
def step_impl(context, phrase):
  search_input = context.browser.find_element_by_name('q')
  search_input.send_keys(phrase + Keys.RETURN)

@then('results are shown for "{phrase}"')
def step_impl(context, phrase):
  links_div = context.browser.find_element_by_id('links')
  assert len(links_div.find_elements_by_xpath('//div')) > 0
  search_input = context.browser.find_element_by_name('q')
  assert search_input.get_attribute('value') == phrase

The “environment.py” file can specify hooks to execute additional logic before and after steps, scenarios, features, and even the whole test suite. Hooks should handle automation concerns that should not be exposed through Gherkin. For example, Selenium WebDriver setup and cleanup should be handled by hooks instead of step definitions because after hooks always get run despite failures, while steps after an abortive failure will not get run.

from selenium import webdriver

def before_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser = webdriver.Firefox()
    context.browser.implicitly_wait(10)

def after_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser.quit()

Test Launch

behave boasts a powerful command line with many options. Below are common use case examples when running tests from the project root directory:

# Run all scenarios in the project
behave

# Run all scenarios in a specific feature file
behave features/web.feature

# Filter tests by tag
behave --tags-help
behave --tags @duckduckgo
behave --tags ~@unit
behave --tags @basket --tags @add,@remove

# Write a JUnit report (useful for Jenkins and other CI tools)
behave --junit

# Don't print skipped scenarios
behave -k

Pros and Cons

Like all BDD test frameworks, behave is opinionated. It works best for black box testing due to its behavior focus. Web testing would be a great use case because user interactions can easily be described using plain language. Reusable steps also foster a snowball effect for automation development. However, behave would not be good for unit testing or low-level integration testing – the verbosity would become more of a hindrance than a helper.

My recommendation is to use behave for black box testing if the team has bought into BDD. I would also strongly consider pytest-bdd as an alternative BDD framework because it leverages all the goodness of pytest.

5 Things I Love About SpecFlow

SpecFlow, a.k.a. “Cucumber for .NET,” is a leading BDD test automation framework for .NET. Created by Gáspár Nagy and maintained as a free, open source project on GitHub by TechTalk, SpecFlow presently has almost 3 million total NuGet downloads. I’ve used it myself at a few companies, and, I must say as an automationeer, it’s awesome! SpecFlow shares a lot in common with other Cucumber frameworks like Cucumber-JVM, but it is not a knockoff – it excels in many ways. Below are five features I love about SpecFlow.

#1: Declarative Specification by Example

SpecFlow is a behavior-driven test framework. Test cases are written as Given-When-Then scenarios in Gherkin “.feature” files. For example, imagine testing a cucumber basket:

Feature: Cucumber Basket
  As a gardener,
  I want to carry many cucumbers in a basket,
  So that I don’t drop them all.
  
  @cucumber-basket
  Scenario: Fill an empty basket with cucumbers
    Given the basket is empty
    When "10" cucumbers are added to the basket
    Then the basket is full

Notice a few things:

  • It is declarative in that steps indicate what should be done at a high level.
  • It is concise in that a full test case is only a few lines long.
  • It is meaningful in that the coverage and purpose of the test are intuitively obvious.
  • It is focused in that the scenario covers only one main behavior.

Gherkin makes it easy to specify behaviors by example. That way, everybody can understand what is happening. C# code will implement each step in lower layers. Even if your team doesn’t do the full-blown BDD process, using a BDD framework like SpecFlow is still great for test automation. Test code naturally abstracts into separate layers, and steps are reusable, too!

#2: Context is King

Safely sharing data (e.g., “context”) between steps is a big challenge in BDD test frameworks. Using static variables is a simple yet terrible solution – any class can access them, but they create collisions for parallel test runs. SpecFlow provides much better patterns for sharing context.

Context injection is SpecFlow’s simple yet powerful mechanism for inversion of control (using BoDi). Any POCOs can be injected into any step definition class, either using default values or using a specific initialization, by declaring the POCO as a step def constructor argument. Those instances will also be shared instances, meaning steps across different classes can share the same objects! For example, steps for Web tests will all need a reference to the scenario’s one WebDriver instance. The context-injected objects are also created fresh for each scenario to protect test case independence.

Another powerful context mechanism is ScenarioContext. Every scenario has a unique context: title, tags, feature, and errors. Arbitrary objects can also be stored in the context object like a Dictionary, which is a simple way to pass data between steps without constructor-level context injection. Step definition classes can access the current scenario context using the static ScenarioContext.Current variable, but a better, thread-safe pattern is to make all step def classes extend the Steps class and simply reference the ScenarioContext instance variable.

#3: Hooks for Any Occasion

Hooks are special methods that insert extra logic at critical points of execution. For example, WebDriver cleanup should happen after a Web test scenario completes, no matter the result. If the cleanup routine were put into a Then step, then it would not be executed if the scenario had a failure in a When step. Hooks are reminiscent of Aspect-Oriented Programming.

Most BDD frameworks have some sort of hooks, but SpecFlow stands out for its hook richness. Hooks can be applied before and after steps, scenario blocks, scenarios, features, and even around the whole test run. (Cucumber-JVM, by contrast, does not support global hooks.) Hooks can be selectively applied using tags, and they can be assigned an order if a project has multiple hooks of the same type. Hook methods will also be picked up from any step definition class. SpecFlow hooks are just awesome!

#4: Thorough Outline Templating

Scenario Outlines are a standard part of Gherkin syntax. They’re very useful for templating scenarios with multiple input combinations. Consider the cucumber basket again:

Feature: Cucumber Basket
  
  Scenario Outline: Add cucumbers to the basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are added to the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | total |
      | 1       | 2    | 3     |
      | 5       | 3    | 8     |

All BDD frameworks can parametrize step inputs (shown in double quotes). However, SpecFlow can also parametrize the non-input parts of a step!

Feature: Cucumber Basket
  
  Scenario Outline: Use the cucumber basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are <handled-with> the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | handled-with | total |
      | 1       | 2    | added to     | 3     |
      | 5       | 3    | removed from | 2     |

The step definitions for the add and remove steps are separate. The step text for the action is parametrized, even though it is not a step input:

[When(@"""(\d+)"" cucumbers are added to the basket")]
public void WhenCucumbersAreAddedToTheBasket(int count) { /* */ }

[When(@"""(\d+)"" cucumbers are removed from the basket")]
public void WhenCucumbersAreRemovedFromTheBasket(int count) { /* */ }

That’s cool!

#5: Test Thread Affinity

SpecFlow can use any unit test runner (like MsTest, NUnit, and xUnit.net), but TechTalk provides the official SpecFlow+ Runner for a licensed fee. I’m not associated with TechTalk in any way, but the SpecFlow+ Runner is worth the cost for enterprise-level projects. It has a friendly command line, a profile file to customize execution, parallel execution, and nice integrations.

The major differentiator, in my opinion, is its test thread affinity feature. When running tests in parallel, the major challenge is avoiding collisions. Test thread affinity is a simple yet powerful way to control which tests run on which threads. For example, consider testing a website with user accounts. No two tests should use the same user at the same time, for fear of collision. Scenarios can be tagged for different users, and each thread can have the affinity to run scenarios for a unique user. Some sort of parallel isolation management like test thread affinity is absolutely necessary for test automation at scale. Given that the SpecFlow+ Runner can handle up to 64 threads (according to TechTalk), massive scale-up is possible.

But Wait, There’s More!

SpecFlow is an all-around great test automation framework, whether or not your team is doing full BDD. Feel free to add comments below about other features you love (or *gasp* hate) about SpecFlow!

 

BDD Example Mapping

The two major goals of Behavior-Driven Development are better collaboration and automation. Even when the Three Amigos actually get together, collaboration can be tough. Where do we start? What scenarios should we write? What examples should be included?

Well, the Cucumber folks have a practice called “Example Mapping” to make it easier. All you need is a pack of index cards and a big table!

  1. Write the story under discussion on a yellow at the top of the table.
  2. Write a rule for each known acceptance criteria on a blue card under the story.
  3. Write each example for a rule on a green card.
  4. Write each open question on a red card on the side to discuss later.

Keep writing cards until the team is satisfied with the story. This process provides clear, fast feedback for stories. A team can quickly see if a story is too big or needs further refinement. Engineers can easily turn example cards into Gherkin scenarios.

Rather than duplicate documentation here, please read Matt Wynne’s seminal post on the practice, Introducing Example Mapping.

Also, watch this webinar recording from Cucumber about Example Mapping:

Are Gherkin Scenarios with Multiple When-Then Pairs Okay?

Don’t know about Behavior-Driven Development or Gherkin? Start here!

Writing Gherkin is easy, but writing good Gherkin is hard. My post BDD 101: Writing Good Gherkin covers many aspects of good behavior specification, including titles, phrasing, and data. One of the major points I make anytime I discuss good Gherkin is what I call the “Cardinal Rule of BDD.”

The Cardinal Rule of BDDOne Scenario, One Behavior!

A behavior scenario specification should focus on one individual behavior. This is the essence of the BDD mindset – a product’s features can be specified in terms of its behaviors, and the specs should be written as examples of those behaviors in action. Identifying individual behaviors brings clarity to design, development, and testing. Combining behaviors into a single scenario causes ambiguity, miscommunication, and test gaps. Test failure triage also becomes more difficult and time consuming because the root causes for failures are less clear – the culprit could be one of multiple behaviors. There is also a high risk of duplication when scenarios repeat the same sequence of steps instead of isolating behaviors.

One of the dead giveaways to violations of the Cardinal Rule of BDD is when a Gherkin scenario has multiple When-Then pairs, like this:

Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

A When-Then pair denotes a unique behavior. In this example, the behaviors of performing a search and changing the search to images could and should clearly be separated into two scenarios, like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

Despite being so central to BDD philosophy, the Cardinal Rule is the one thing people always try to sidestep. Nobody ever doubts the usefulness of step parameters or the need for good grammar, but people frequently show me scenarios with multiple When-Then pairs and basically ask for an exception from the rule. My gut reaction is always, “NO! Rules don’t change.”

However…

I must first admit that the Cardinal Rule of BDD is “opinionated” – it is the way that I have found BDD to work best for collaboration and automation. Adherence forces people to adopt a behavior-driven mindset, and strictness keeps feature and test quality high. Other experts are more permissive of multiple When-Then pairs, though. Most examples I could find from leading sources such as The Cucumber Book exhibit strict Given-When-Then order for Gherkin scenarios, but other sources such as the online JBehave documentation show scenarios with multiple When-Then pairs boldly on the front page.

I must also begrudgingly admit that there are times when it is simply more convenient for a single scenario to have multiple behaviors (and thus multiple When-Then pairs). This is by no means a best practice but rather a pragmatic alternative for specification dilemmas. (See Purist vs. Pragmatist.) Below are situations in which multiple When-Then pairs may be acceptable.

Lengthy End-to-End Scenarios

End-to-end tests verify execution paths through a live system with all of its parts. Web UI tests frequently fall into this category: Selenium WebDriver interacts with a page in a browser, which then triggers calls to a backend service layer or database. Despite the name, end-to-end tests may still focus on one individual behavior. The example scenarios above, though short, technically count as end-to-end tests.

However, many people use the term “end-to-end” to refer to tests that cover sequences of behaviors. Such a scenario could violate the Cardinal Rule of BDD if it is not handled carefully. My article BDD 101: Unit, Integration, and End-to-End Tests gives strategies for handling lengthy end-to-end scenarios. One strategy is to simply turn a blind eye to multiple When-Then pairs. Ideally, each behavior would already have its own individual scenario, but then a new scenario would explicitly combine the behaviors together to get that full, end-to-end path. The new scenario would be easy to write because the steps could be reused. This isn’t the only strategy, so please be sure to consider the others before writing the tests.

Audits

Software system audits frequently require lengthy end-to-end scenarios. They are quite common in highly-regulated domains. For example, a bank may need to prove that a loan is prepared correctly or that a transaction puts money into the right accounts. Auditors typically require tests to run through entire system paths (e.g., multiple behaviors) using the same records, such as one loan application or one payment. Auditees must not only provide test results for past runs but must also repeat tests on demand. Separating each individual behavior into its own scenario makes each test independent, so during test execution, there will be no guaranteed order and no shared test data, and auditors would not have the end-to-end verification that they require. The simplest way to give the auditors what they need is to write one lengthy scenario with multiple When-Then pairs.

Service Calls

Service call testing is another case for which multiple When-Then pairs may be pragmatically justified. REST, SOAP, and WSDL are examples of service call types. Service layer development is more engineering-centric than business-centric, but many teams nevertheless choose to test service calls with Gherkin-based frameworks like Cucumber. Due to the programmatic nature of services, Gherkin scenarios for service calls tend to be quite imperative: specify a request, make the call, and verify parts of the response. This isn’t so bad for independent service calls, but it becomes a problematic when one request needs another call’s response.

One solution is the classic “pure” scenario split: put any necessary setup, including initial requests to get required response parts, into custom Given steps. This abides by the Cardinal Rule and avoids duplicate When-Then pairs. But, it introduces an unsavory form of code duplication. Many service calls end up being written twice: once as a Gherkin scenario for testing, and once in the underlying automation code to be called by Given steps. This violates the DRY principle.

The alternative “pragmatic” solution is to write scenarios that specify multiple service calls in the Gherkin steps. The Karate project advocates this approach, as shown in their “Hello World” example:

Take Caution!

There may be other cases when When-Then repetition is useful. Feel free to leave suggestions in the comments below. My examples are meant to be descriptive, not prescriptive. Another aspect to consider is that allowing multiple When-Then pairs per scenario indicates that a team sees more value in BDD’s test framework than in its collaborative spec process. (Refer to ‑‑BDD; Automation without Collaboration and BDD‑‑; Collaboration without Automation.)

Ultimately, you must decide what practices are best for your project. The main reason I uphold the Cardinal Rule of BDD so strongly is that it makes for good specs and good tests. I’ve seen engineers write extremely long, intensive test procedures (and I mean, dozens of duplicate behaviors per test) that are alright for manual testing but do not transition well into automation because they are too fragile and they don’t yield useful information upon failure. The Cardinal Rule is a way to break out of the procedure-driven mindset, and banning multiple When-Then pairs per Gherkin scenario is an effective rule for enforcing it.

Good Gherkin Scenario Titles

Don’t know about Behavior-Driven Development or Gherkin? Start here!

The Golden Gherkin Rule states:

Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Part of writing good Gherkin (or any other specification-by-example language) includes writing good behavior scenario titles. The title is the face of the scenario: it summarizes what the behavior is all about. Good titles make collaboration and test triage a breeze, whereas bad titles make it tougher. But what makes a title “good”? Below are some helpful pointers.

One-Liners

Good titles should be short one-liners. One simple statement should be sufficient to concisely capture the intended behavior. Anything longer likely means that either the author doesn’t truly understand the behavior in focus, or that the scenario does not focus on one main behavior. Extra comments may be added to supplement the scenario’s description if necessary to avoid lengthy titles. Also, most BDD test automation frameworks will print scenario titles to logs for traceability.

Bad Example Good Example
The user can log into the app, navigate to the profile page, and see their full name, address, phone number, email, and username The profile page displays the user’s personal info

Conjunction Disjunction

Watch out for conjunction words like “and,” “or,” and “but.” Conjunctions typically imply that more than one thing will be done, which for scenario titles implies that more than one behavior will be covered. Or, it indicates that a Scenario Outline may be appropriate Don’t break the Cardinal Rule of BDD! Keep each scenario focused on one main behavior.

Avoid other conjunctions like “because,” “since,” and “so” as well. Phrases starting with those words often give an explanation for why the scenario exists. However, for conciseness, scenario titles should focus on what the behavior is. The why can either be deduced from the steps or made plain with comments.

Bad Example Good Example
The user can request an insurance quote from the big “Get-A-Quote” button on the home page or from the “Insurance Policies” page Two Scenarios: The user requests an insurance quote from the “Get-A-Quote” button on the home page / The user requests an insurance quote from the “Insurance Policies” page

-OR-

Scenario Outline: The user requests an insurance quote

The last five search phrases are saved so that the user can rerun them from the history page The history page saves the last five search phrases

Avoid Assertion Language

Don’t use the words “verify,” “assert,” or “should” in scenario titles. They put the scenario’s emphasis on the assertion rather than the behavior. Assertions are merely a facet of behavior testing – they verify that something exists or that two values are equal. Behavior scenarios, however, are full software specifications. BDD is a development practice for making better software products – it’s not just a test tool. Don’t reduce the behavior-driven mindset to a test-only mindset.

Furthermore, leading every scenario title with “verify” or “assert” becomes very repetitive. The words just don’t enhance the meaningfulness of the title. They also thwart alphabetical order.

Bad Example Good Example
Verify the user can change their address on the profile page Profile page address change
Assert that a stock quote is displayed in green text when its value is higher than its previous closing value A stock quote has green text when its value is higher than its previous closing value
The goodbye page should be displayed after a successful logout Logout displays the goodbye page

 

Do you have any more suggestions? Put them in the comments below!

Pipe Character Escape for Gherkin Tables

For the first time today, I had to write a Gherkin behavior scenario in which table text needed to use the pipe character “|”. I wanted a generic step that would find and click web page links by name, and one of the link names had the pipe in it! The first version of the step I wrote looked like this:

When the user follows the links:
  | link              |
  | Category          |
  | Sub-Category      |
  | Index|Description |

Naturally, this step didn’t parse – the “|” was parsed as a table delimiter instead of the intended link text. I could have rewritten the step to search for partial link text, or I could have done a key-value lookup, but I wanted to keep the step simple and direct.

The solution was simple: escape the pipe character “|” with a backslash character “\”. Easy! Thanks, StackOverflow! The updated table looks like this:

When the user follows the links:
 | link               |
 | Category           |
 | Sub-Category       |
 | Index\|Description |

“\|” works for both step tables and scenario outline example tables. It looks like it is fairly standard for test frameworks that use Gherkin. I verified that Cucumber-JVM and SpecFlow support it, and it looks like Cucumber for Ruby does as well. It looks like behave will support it in 1.2.6.

After learning this trick, I updated the BDD 101: The Gherkin Language page.

Note that backslash escape sequences won’t work for quotes in Gherkin steps. Quotes in steps are merely conventions and not part of the Gherkin language standard.

Gherkin Syntax Highlighting in Chrome

Google Chrome is one of the most popular web browsers around. Recently, I discovered that Chrome can edit and display Gherkin feature files. The Chrome Web Store has two useful extensions for Gherkin: Tidy Gherkin and Pretty Gherkin, both developed by Martin Roddam. Together, these two extensions provide a convenient, lightweight way to handle feature files.

Tidy Gherkin

Tidy Gherkin is a Chrome app for editing and formatting feature files. Once it is installed, it can be reached from the Chrome Apps page (chrome://apps/). The editor appears in a separate window. Gherkin text is automatically colored as it is typed. The bottom preview pane automatically formats each line, and clicking the “TIDY!” button in the upper-left corner will format the user-entered text area as well. Feature files can be saved and opened like a regular text editor. Templates for Feature, Scenario, and Scenario Outline sections may be inserted, as well as tables, rows, and columns.

Another really nice feature of Tidy Gherkin is that the preview pane automatically generates step definition stubs for Java, Ruby, and JavaScript! The step def code is compatible with the Cucumber test frameworks. (The Java code uses the traditional step def format, not the Java 8 lambdas.) This feature is useful if you aren’t already using an IDE for automation development.

Tidy Gherkin has pros and cons when compared to other editors like Notepad++ and Atom. The main advantages are automatic formatting and step definition generation – features typically seen only in IDEs. It’s also convenient for users who already use Chrome, and it’s cross-platform. However, it lacks richer text editing features offered by other editors, it’s not extendable, and the step def gen feature may not be useful to all users. It also requires a bit of navigation to open files, whereas other editors may be a simple right-click away. Overall, Tidy Gherkin is nevertheless a nifty, niche editor.

This slideshow requires JavaScript.

Pretty Gherkin

Pretty Gherkin is a Chrome extension for viewing Gherkin feature files through the browser with syntax highlighting. After installing it, make sure to enable the “Allow access to the file URLs” option on the Chrome Extensions page (chrome://extensions/). Then, whenever Chrome opens a feature file, it should display pretty text. For example, try the GoogleSearch.feature file from my Cucumber-JVM example project, cucumber-jvm-java-example. Unfortunately, though, I could not get Chrome to display local feature files – every time I would try to open one, Chrome would simply download it. Nevertheless, Pretty Gherkin seems to work for online SCM sites like GitHub and BitBucket.

Since Pretty Gherkin is simply a display tool, it can’t really be compared to other editors. I’d recommend Pretty Gherkin to Chrome users who often read feature files from online code repositories.

This slideshow requires JavaScript.

 

Be sure to check out other Gherkin editors, too!