BDD

What is BDD, and How Do We Practice It? (Webinar + Q&A)

On March 18, 2019, I gave a webinar entitled, “What is Behavior-Driven Development, and How Do We Practice It?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. It was both a pleasure and an honor to do this webinar with them. Paul is a top-notch test automation expert, and Beaufort Fairmont is doing really exciting things. Check out their two-day BDD training offering, as well as their blog and other webinars.

To see my webinar recording, register here.

During the webinar, attendees asked more questions than we could answer. I’m excited that so many people asked questions. My answers are below.

Questions about Process

How is BDD different from TDD (Test-Driven Development)?

BDD is an evolution of TDD. In TDD, developers (1) write unit tests and watch them fail, (2) develop the feature to make the tests pass, (3) refactor the code to make it stronger, and (4) repeat the cycle. In BDD, teams do this same loop with feature tests (a.k.a “acceptance” or “black-box” tests) as well as unit tests. Furthermore, BDD adds shift left practices like Example Mapping and Specification by Example so that teams know what they are doing and focus on developing the right things.

Check out Dan North’s article, Introducing BDD, for a more thorough answer.

Can BDD be used with manual testing?

Yes! BDD is not merely an automation tool – it is a set of pragmatic practices to help teams develop better software. Gherkin scenarios are first and foremost behavior specs that help a team’s collaboration and accountability. They function secondarily as test cases that can be executed either manually or with automation.

Can we use BDD with technical stories or backend features?

Yes! If you can describe it, then you can do it.

How many Gherkin scenarios should one story have?

There’s no hard rule, but I recommend no more than a handful of rules per story, and no more than a handful of examples per rule. If you do Example Mapping and feel overwhelmed by the number of cards for a story, then the story should probably be broken into smaller stories.

Should we do Example Mapping for every story? Spending 20-30 minutes for each story would take a long time.

Try doing Example Mapping on one or two stories to start. The first time is always rough, but as you iterate on it, you’ll get better as a team. Even though Example Mapping has an upfront time cost, it will save a lot of time later in the sprint because (a) acceptance criteria is clear, (b) tests are already written, and (c) everyone has a mutual understanding of the story. The team won’t suffer through the inefficiencies of miscommunication and poor planning. You may even want to replace planning meeting with Example Mapping meetings.

What metrics should we use with BDD?

All metrics are flawed, but some metrics are useful. All the standard testing and Agile metrics still apply: code coverage, story velocity, etc. Here are some additional metrics you may consider for BDD:

the percentage of stories that undergo Example Mapping before the sprint
the number of rules and examples that get “missed” during Example Mapping and need to be added later
the percentage of Gherkin scenarios that get automated in the sprint

If you choose to track metrics, make sure their feedback is used to improve team practices. For more info on metrics, please read my Quality Metrics 101 series.

What were the resources you recommended at the end of the webinar?

Questions about Tools

What test management tools should we use with BDD?

I’m sure there are BDD plugins for test management tools, but I don’t have any that I can personally recommend. To be honest, I try to stay away from large test management tools like HP ALM, qTest, VersionOne. When doing BDD, the Gherkin feature files themselves should be the single source of truth for feature-level tests, and they should be version-controlled in a repository. Don’t fall into the trap of slapping “Given-When-Then” keywords onto existing functional tests – that’s not BDD.

Does Jira support Example Mapping?

I have not personally used any Jira plugin for Example Mapping. It looks like there is an Easy Agile User Story Maps plugin that is similar to but slightly different from Example Mapping.

Are there other good tools for BDD and Example Mapping?

Cycle Automation provides a nice app with Gherkin steps out of the box, so you can automate tests without needing a programming language.
TeamUp Labs provides an online Example Mapping tool.
IDEs from JetBrains and Eclipse provide BDD plugins
Gherkin Syntax Highlighting in Notepad++
Gherkin Syntax Highlighting in Visual Studio Code
Gherkin Syntax Highlighting in Atom
Gherkin Syntax Highlighting in Chrome

What’s the difference between Gherkin, Cucumber, and SpecFlow?

Gherkin is the Given-When-Then spec language.
Cucumber is a company and its eponymous test framework that uses Gherkin.
SpecFlow is Cucumber for .NET.

Questions about Testing

Can BDD test frameworks be used for unit testing?

Yes, but I don’t recommend it. BDD frameworks shine for black-box feature testing. They’re a bit too verbose for code-level unit tests. Read BDD 101: Unit, Integration, and End-to-End Tests for more info.

Can BDD test frameworks be used for integration testing?

Yes! See BDD 101: Unit, Integration, and End-to-End Tests.

How long should Gherkin scenarios be?

Scenarios should be bite-sized. Each scenario should focus on one individual behavior. There’s no hard rule, but I recommend single-digit step counts. Read BDD 101: Writing Good Gherkin for more info.

What are “step definitions” in Cucumber?

Step definitions are the methods in the automation code that execute the steps. When a BDD framework runs a Gherkin scenario as a test, it “glues” each step to a step definition based on some sort of string matching.

How can we minimize duplicate code within a BDD test framework?

Know your steps. Always search for existing steps before writing new steps. Refactor existing steps whenever appropriate. Reuse steps when writing new scenarios. Do pair programming or mob programming when writing scenarios. Put scenarios through code reviews. Apply good coding practices – remember, test automation is software.

I write Gherkin scenarios, but I don’t write test automation code. What’s the best way to write Gherkin scenarios so that they can be automated?

Do pair programming with the automation engineers to write Gherkin scenarios together. Become familiar with existing steps by reading and searching feature files. Otherwise, the Gherkin steps you write in isolation might not be usable. Remember, BDD is a team effort!

The examples in the webinar were all fairly basic. Do you have any examples with more complex systems?

I have some example projects on GitHub in Python and Java with some basic unit, integration, and end-to-end tests, but I don’t have any large-scale examples that I can share publicly.

We wrote hundreds of SpecFlow tests without the other Amigos. Now, there are large test gaps, and many steps aren’t reusable. What should we do?

I’m sorry to hear that. It’s not an uncommon story. There are two paths: (1) refactoring or (2) starting over. Without really knowing the situation, I don’t think it’s my place to say which way is better. Here are some questions to help guide your decision:

What are your goals for testing and automation?
What’s your overall quality and testing strategy?
What parts of the code base are salvageable?
What parts of the code base should be removed?
If you started again from scratch, what would you do differently to make sure the same problems don’t reoccur?

I strongly recommend taking the Setting a Foundation for Successful Test Automation course from Test Automation University. (It’s free.) I also gave a talk about this very problem, Egad! How Do We Start Writing (Better) Tests?, at a few Python conferences.

We have a large BDD test suite with heavy coupling and slow execution times. The business amigos have also left the company. Should we try to fix what we have or just start over?

Sorry to hear that; same answer as before.

Final Questions

Why do you call yourself the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker.

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

Sprint Planning Sucks. Can It Be Fixed?

Warning: This article contains strong opinions that might not be suitable for all audiences. Reader discretion is advised.

It’s Monday morning. After an all-too-short weekend and rush hour traffic, you finally arrive at the office. You throw your bag down at your desk, run to the break room, and queue up for coffee. As the next pot is brewing, you check your phone. It’s 8:44am… now 8:45am, and DING! A meeting reminder appears:

Sprint Planning – 9am to 3pm.

What’s your visceral reaction?

I can’t tell you mine, because I won’t put profanity on my blog.

Real Talk

In the capital-A Agile Scrum process, sprint planning is the kick-off meeting for the next iteration. The whole team comes together to talk about features, size work items with points, and commit to deliverables for the next “sprint” (typically 2 weeks long). Idealistically, team members collaborate freely as they learn about product needs and give valued input.

Let’s have some real talk, though: sprint planning sucks. Maybe that’s a harsh word, but, if you’re reading this article, then it caught your attention. Personally, my sprint planning experiences have been lousy. Why? Am I just bellyaching, or are there some serious underlying problems?

Sprint planning is a huge time commitment. 9am to 3pm is not an exaggeration. Sprint planning meetings are typically half-day to full-day affairs. Most people can’t stay focused on one thing for that long. Plus, when a sprint is only two weeks long, one hour is a big chunk of time, let alone 3, or 6, or a whole day. The longer the meeting, the higher the opportunity cost, and the deeper the boredom.

Collaboration is a farce. Planning meetings typically devolve into one “leader” (like a scrum master, product owner, or manager) pulling teeth to get info for a pre-determined list of stories. Only two people, the leader and the story-owner, end up talking, while everyone else just stares at their laptops until it’s their turn. Discussions typically don’t follow any routine beyond, “What’s the acceptance criteria?” and, “Does this look right?” with an interloper occasionally chiming in. Each team member typically gets only a few minutes of value out of an hours-long ordeal. That’s an inefficient use of everyone’s time.

No real planning actually happens. These meetings ought to be called “guessing” meetings, instead. Story point sizes are literally made up. Do they measure time or complexity? No, they really just measure groupthink. Teams even play a game called planning poker that subliminally encourages bluffing. Then, point totals are used to guess how much work can be done during the sprint. When the guess turns out to be wrong at the end of the sprint (and it always does), the team berates itself in retro for letting points slip. Every. Time.

Does It Spark Joy?

I’ve long wondered to myself if sprint planning is a good concept just implemented poorly, or if it’s conceptually flawed at its root. I’m pretty sure it’s just flawed. The meetings don’t facilitate efficient collaboration relative to their time commitments, and estimates are based on poor models. Retros can’t fix that. And gut reactions don’t lie.

So, what should we do? Should we Konmari our planning meetings to see if they spark joy? Should we get rid of our ceremonies and start over? Is this an indictment of the whole Agile Scrum process? But then, how will we know what to do, and when things can get done?

I think we can evolve our Agile process with more effective practices than sprint planning. And I don’t think that evolution would be terribly drastic.

Behavior-Driven Planning

What we really want out of a planning meeting is planning, not pulling and not predicting. Planning is the time to figure out what will be done and how it will be done. The size of the work should be based on the size of the blueprint. Enter Example Mapping.

Example Mapping is a Behavior-Driven Development practice for clarifying and confirming stories. The process is straightforward:

Write the story on a yellow card.
Write each rule that the story must satisfy on a blue card.
Illustrate each rule with examples written on green cards.
Got stuck on a question? Write it on a red card and move on.

One story should take about 20-30 minutes to map. The whole team can participate, or the team can split up into small groups to divide-and-conquer. Rules become acceptance criteria, examples become test cases, and questions become spikes.

Here’s a good walkthrough of Example Mapping.

What about story size? That’s easy – count the cards. How many cards does a story have? That’s a rough size for the work to be done based on the blueprint, not bluffing. More cards = more complexity. It’s objective. No games. Frankly, it can’t be any worse that made-up point values.

This is real planning: a blueprint with a course of action.

So, rather than doing traditional sprint planning meetings, try doing Example Mapping sessions. Actually plan the stories, and use card counts for point sizes. Decisions about priority and commitments can happen between rounds of story mapping, too. The Scrum process can otherwise remain the same.

If you want to evolve further, you could eliminate the time boxes of sprints in favor of Kanban. Two-week work item boundaries can arbitrarily fall in the middle of progress, which is not only disruptive to workflow but can also encourage bad responses (like cramming to get things done or shaming for not being complete.) Kanban treats work items as a continuous flow of prioritized work fed to a team in bite-sized pieces. When a new story comes up, it can have its own Example Mapping “planning” meeting. Now, Kanban is not for everyone, but it is popular among post-Agile practitioners. What’s important is to find what works for your team.

Rant Over

I know I expressed strong, controversial opinions in this article. And I also recognize that I’m arguing against bad examples of Agile Scrum. Nevertheless, I believe my points are fair: planning itself is not a waste of time, but the way many teams plan their sprints uses time inefficiently and sets poor expectations. There are better ways to do planning – let’s give them a try!

Python Testing 101: pytest-bdd

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use pytest-bdd. Also, make sure that you are already familiar with the pytest framework.

Overview

pytest-bdd is a behavior-driven (BDD) test framework that is very similar to behave, Cucumber and SpecFlow. BDD frameworks are very different from more traditional frameworks like unittest and pytest. Test scenarios are written in Gherkin “.feature” files using plain language. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. This means that there is a separation of concerns between test cases and test code. Gherkin steps may also be reused by multiple scenarios.

pytest-bdd is very similar to other Python BDD frameworks like behave, radish, and lettuce. However, unlike the others, pytest-bdd is not a standalone framework: it is a plugin for pytest. Thus, all of pytest‘s features and plugins can be used with pytest-bdd. This is a huge advantage!

Installation

Use pip to install both pytest and pytest-bdd.

pip install pytest
pip install pytest-bdd

Project Structure

Project structure for pytest-bdd is actually pretty flexible (since it is based on pytest), but the following conventions are recommended:

All test code should appear under a test directory named “tests”.
Feature files should be placed in a test subdirectory named “features”.
Step definition modules should be placed in a test subdirectory named “step_defs”.
conftest.py files should be located together with step definition modules.

Other names and hierarchies may be used. For example, large test suites can have feature-specific directories of features and step defs. pytest should be able to discover tests anywhere under the test directory.

[project root directory]
|‐‐ [product code packages]
|-- [test directories]
|   |-- features
|   |   `-- *.feature
|   `-- step_defs
|       |-- __init__.py
|       |-- conftest.py
|       `-- test_*.py
`-- [pytest.ini|tox.ini|setup.cfg]

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

Example Code

An example project named behavior-driven-python located in GitHub shows how to write tests using pytest-bdd. This section will explain how the Web tests are designed.

The top layer for pytest-bdd tests is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step definitions are written in Python test modules, as shown below:

import pytest

from pytest_bdd import scenarios, given, when, then, parsers
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Constants

DUCKDUCKGO_HOME = 'https://duckduckgo.com/'

# Scenarios

scenarios('../features/web.feature')

# Fixtures

@pytest.fixture
def browser():
    b = webdriver.Firefox()
    b.implicitly_wait(10)
    yield b
    b.quit()

# Given Steps

@given('the DuckDuckGo home page is displayed')
def ddg_home(browser):
    browser.get(DUCKDUCKGO_HOME)

# When Steps

@when(parsers.parse('the user searches for "{phrase}"'))
def search_phrase(browser, phrase):
    search_input = browser.find_element_by_id('search_form_input_homepage')
    search_input.send_keys(phrase + Keys.RETURN)

# Then Steps

@then(parsers.parse('results are shown for "{phrase}"'))
def search_results(browser, phrase):
    # Check search result list
    # (A more comprehensive test would check results for matching phrases)
    # (Check the list before the search phrase for correct implicit waiting)
    links_div = browser.find_element_by_id('links')
    assert len(links_div.find_elements_by_xpath('//div')) > 0
    # Check search phrase
    search_input = browser.find_element_by_id('search_form_input')
    assert search_input.get_attribute('value') == phrase

Notice how each Given/When/Then step has a function with an appropriate decorator. Arguments, such as the search “phrase,” may also be passed from step to function. pytest-bdd provides a few argument parsers out of the box and also lets programmers implement their own. (By default, strings are compared using equality.) One function can be decorated for many steps, too.

pytest fixtures may also be used by step functions. The code above uses a fixture to initialize the Firefox WebDriver before each scenario and then quit it after each scenario. Fixtures follow all the same rules, including scope. Any step function can use a fixture by declaring it as an argument. Furthermore, any “@given” step function that returns a value can also be used as a fixture. Please read the official docs for more info about fixtures with pytest-bdd.

One important, easily-overlooked detail is that scenarios must be explicitly declared in test modules. Unlike other BDD frameworks that treat feature files as the main scripts, pytest-bdd treats the “test_*.py” module as the main scripts (because that’s what pytest does). Scenarios may be specified explicitly using scenario decorators, or all scenarios in a list of feature files may be included implicitly using the “scenarios” shortcut function shown above.

To share steps across multiple feature files, add them to the “conftest.py” file instead of the test modules. Since scenarios must be declared within a test module, they can only use step functions available within the same module or in “conftest.py”. As a best practice, put commonly shared steps in “conftest.py” and feature-specific steps in the test module. The same recommendation also applies for hooks.

Scenario outlines require special implementation on the Python side to run successfully. Unfortunately, steps used by scenario outlines need unique step decorators and extra converting. Please read the official docs or the example project to see examples.

Test Launch

pytest-bdd can leverage the full power of pytest. Tests can be run in full or filtered by tag. Below are example commands using the example project:

# run all tests
pytest

# filter tests by test module
# note: feature files cannot be run directly
pytest tests/step_defs/test_unit_basic.py
pytest tests/step_defs/test_unit_outlines.py
pytest tests/step_defs/test_unit_service.py
pytest tests/step_defs/test_unit_web.py

# filter tests by tags
# running by tag is typically better than running by path
pytest -k "unit"
pytest -k "service"
pytest -k "web"
pytest -k "add or remove"
pytest -k "unit and not outline"

# print JUnit report
pytest -junitxml=/path/for/output

pytest-bdd tests can be executed and filtered together with regular pytest tests. Tests can all be located within the same directory. Tags work just like pytest.mark. As a warning, marks must be explicitly added to “pytest.ini” starting with pytest 5.0.

All other pytest plugins should work, too. For example:

Run tests in parallel with pytest-xdist
Generate code coverage reports with pytest-cov
Integrate with popular frameworks using pytest-django, pytest-flask, or other similar plugins

Pros and Cons

Just like for other BDD frameworks, pytest-bdd is best suited for black-box testing because it forces the developer to write test cases in plain, descriptive language. In my opinion, it is arguably the best BDD framework currently available for Python because it rests on the strength and extendability of pytest. It also has PyCharm support (in the Professional Edition). However, it can be more cumbersome to use than behave due to the extra code needed for declaring scenarios, implementing scenario outlines, and sharing steps. Nevertheless, I would still recommend pytest-bdd over behave for most users because it is more powerful – pytest is just awesome!

Behavior-Driven Blasphemy

This is my 100th post on Automation Panda! I’m thrilled to see how much this blog has grown and how many people it has helped. For such a monumental occasion, I have chosen to voice a rather controversial opinion about test automation.

Behavior-driven development seems to be the software testing buzzword of the decade. What started as a refinement of test-driven development by developers in Europe and the UK quickly became the big process fad of the 2010’s. The Cucumber project (now 10 years old) developed or inspired Gherkin-based test automation frameworks in all the major programming languages. Companies started requiring Given-When-Then format for acceptance criteria and test scenarios. Three Amigos meetings became standard calendar fixtures during sprints. Organizations that once undertook “Agile transformations” now have similar initiatives for BDD. For better or worse, BDD exists and cannot be ignored.

The dogmatic benefits of BDD are better collaboration and automation. However, leaders frequently insist that Gherkin-style test frameworks add value only when paired with practices like Example Mapping. “BDD is a process, not a tool,” is a common mantra. “Otherwise, the Gherkin just gets in the way.” Although I wholeheartedly agree that behavior-driven practices add significant value to the development process, I nevertheless espouse a rather blasphemous opinion:

BDD test automation frameworks are better than traditional frameworks for black box functional testing even when BDD processes are not followed.

What Exactly Are You Saying?

My claim is that behavior-driven test frameworks like Cucumber, SpecFlow, and behave are significantly better than traditional xUnit-style frameworks for testing live features. For example, I would rather use SpecFlow than NUnit for testing a Web app with Selenium WebDriver, whether or not the other two Amigos are with me. The resulting automation code has better structure, readability, and reusability.

I’m not saying that teams shouldn’t do BDD practices, and I’m not saying that the Three Amigos should be separated. Collaboration is key to success, and BDD really helps. Example Mapping is one of the most useful practices a development team can do. I’m also not saying that BDD frameworks should be used for all testing purposes – they are poorly suited for unit testing and for performance testing.

Objection!

I find myself very lonely in this opinion. BDD leaders repeatedly insist that BDD is not about testing and automation:

BDD is not about Testing (slides by Dan North)
The world’s most misunderstood collaboration tool by Aslak Hellesøy
BDD is – BDD is not by Augusto Evangelisti
BDD Tool Cucumber is Not a Testing Tool by Jan Stenberg
3 misconceptions about BDD from ThoughtWorks

The most outspoken BDDers (mostly coalescing around the Cucumber community) have largely moved their focus to the collaboration benefits, almost forsaking the automation benefits. (This may not necessarily be true, but it appears that way based on the literature and materials floating on the Web.) That outlook is somewhat disingenuous because the main tools supporting BDD are, in fact, test frameworks.

BDD also has outspoken opponents – it’s love or hate. I’ve personally spoken with several engineers who despise Gherkin-based frameworks. “I can see how it would be valuable when a whole team embraces behavior-driven practices,” many have told me, “but otherwise, the Gherkin layer just gets in the way of automation.” I’ve heard it called “plaster” and “garbage.” Engineers just want to code their tests. And code should always be readable, right?

Testing is an inherently opinionated space. People can never seem to agree on things.

The Bigger Picture

Test automation must be developed regardless of any specific development practices, and its architecture must stand firmly in its own right. Unfortunately, both sides miss the bigger picture:

The best solution for test automation is a domain-specific language.

A domain-specific language (DSL) is a programming language with a purpose. It is designed to handle very specific needs, rather than general-purpose programming. For example:

SQL is a DSL for database queries.
XPath is a DSL for finding elements in an XML document.
YAML is a DSL for object serialization.

Gherkin is also a DSL – for behavior specification.

Domain-specific languages naturally suit test automation due to the clear difference between test cases and test code. Test cases are procedures that exercise product behavior. Anyone can write a test case. They are dictated or explained in plain language. Test code, however, is the software implementation of test cases. Test code handles function calls, logging, exceptions, and all those other little programming details that help run tests. A test automation DSL separates those concerns: test cases are written in a special language, and the interpreter handles repetitive, low-level details. Some type of extensions can handle product-specific interactions. The purpose of a language is to effectively express intention – and the intention is to test the product.

To truly achieve an optimal solution, however, the DSL and its interpreter must be treated as part of the automation software, just like the test cases and extensions. Remember, a language’s interpreter is just another piece of software. The interpreter is part of the separation of concerns and the single responsibility principle. Concerns that would typically be handled by classes and functions in traditional test code should be moved to the interpreter. For example, the interpreter should automatically log every test case step, rather that forcing the author to write explicit logging statements.

When I worked at NetApp years ago, I implemented a DSL to test platform-level features of our operating system. I called it DS – short for “Design Steps” (from HP ALM) (but also not without an affinity for the Nintendo DS). NetApp’s entire test automation code was developed in Perl at the time, so I implemented the DS interpreter in Perl to reuse existing libraries. DS test cases were typically only a dozen lines long each, and DS expressions could call specially-written Perl modules directly for complete extendability. During the first big release using DS, my team saved countless hours of automation development time as compared to the previous release while delivering a higher number of tests. I also did this before I had ever heard of BDD.

Unfortunately, most teams have neither the time to develop their own testing DSL nor the understanding of compiler theory to build it right. And if they were given such a language, they typically limit themselves to the provided implementation instead of taking ownership to extend the language for their needs.

The original Nintendo DS. Fun times!

Who Truly Misunderstands Gherkin?

Enter Gherkin: the world’s first major general-purpose, off-the-shelf language for test automation. It is general enough to cover any case through its plain language steps, yet specific enough to standardize tests. Users don’t need to be compiler theory experts – they just make up their own step names and provide the definition code to execute them. Early BDD projects like JBehave and Cucumber packaged an interpreter as a test framework and delivered it to a testing world still stuck on JUnit. The need for a testing DSL was there, whether or not the BDD folks meant to serve it.

Cucumber-ites frequently bemoan that their framework is misunderstood by the masses. They shudder to see teams using their framework purely for test automation. However, Cucumber effectively lowered the entry barrier for teams to make their own testing DSLs. Kodak did the same thing for film: they made it cheap and standard so anyone could be a photographer. Not everyone who uses a BDD framework misunderstands its purpose: some (like me) just see an alternative value proposition than what is preached by orthodox BDD practitioners. Gherkin fills a need that nobody knew. Its popularity validates that claim.

Benefits Apart from Process

Using a BDD framework adds much value to testing and development even without BDD processes. Below are just a handful of benefits:

Focus first on good scenarios. Gherkin forces authors to think before they code.
Faster automation development. Gherkin steps are reusable and parametrizable.
Stronger structure. Engineers know where to put things in the framework.
Test understandability. Anyone can read scenarios because they are written in plain language. Business people can help. New people can pick it up fast.
Test sharing. Feature files can be shared apart from test code, which can be helpful for business partners.
Test similarity. Tests all look the same. Team members can more easily help each other.
Clearer failures. When a scenario fails, reports show exactly what step failed.
Simpler bug reports. Use scenario steps as instructions to reproduce the failure.
2-phase test reviews. Review Gherkin first and then test code second to make sure the test cases are good before implementing the wrong things.
BDD enablement. Using a BDD framework opens the door for a team to embrace better behavioral practices in the future.

I wrote about these advantages before:

Case Studies

I’m also not the only one who finds value in BDD test frameworks outside of the full BDD process. Below are five case studies.

radish

radish is a Python test framework inspired by Cucumber. Its DSL syntax is a superset of Gherkin that adds preconditions, loops, variables, and expressions. These language additions indicate a bias towards automation because they enable engineers to write tests more programmatically, albeit in a Gherkin-ese way.

Karate

Karate is a test framework with a full DSL based on Gherkin with steps specifically tailored to Web service calls. Although it is implemented in Java, testers do not need to do any Java programming to write complete tests cases from day one. Peter Thomas, the creator of Karate, unabashedly declares that Karate does not truly adhere to BDD but nevertheless uses Cucumber for its automation benefits. (Note: Karate is working to move completely off of Cucumber. See GitHub issue #444.)

REST Assured

REST Assured is a Java package for testing REST APIs. Unlike Karate, REST Assured provides a fluent syntax (and not a DSL) for writing service calls directly in Java code. The fluent syntax is based on Gherkin: given() a request spec is created, when() the call is made, then() verify the response. Although REST Assured is not a full testing framework, it nevertheless pulls inspiration from BDD frameworks for order and structure.

Cycle

Cycle is a BDD-focused test automation platform from Cycle Labs for testing Web, terminal, and desktop apps. Cycle is unique because it provides out-of-the-box steps for all types of supported testing so that no programming experience is required. Testers write feature files using Cycle 2.0’s slick new Electron app. Scenarios are written in CycleScript, a Gherkin-ese language with additions like variables and sub-scenario calls. Steps tend to be imperative, but that’s the tradeoff for not requiring lower-level programming.

Hexawise

Hexawise is a combinatorial testing tool designed to maximize coverage with minimal test counts by smartly joining feature variations. It helps testers write better tests with less redundancy and fewer gaps. Although Hexawise has historically assisted manual testers, it also can generate Gherkin feature files for test variations.

Not all cucumbers are the same. Above is a sea cucumber.

Good Enough?

Gherkin-based test frameworks are not perfect, but they do provide good structure. They gained popularity outside of the pure BDD movement because they genuinely added value to testing and automation. Like any other tool, teams will use them in both good and bad ways. (Trust me, I’ve seen scary Gherkin.)

It’s interesting to see how groups outside the Cucumber diaspora are attempting to solve the limitations of pure Gherkin. Each case study above showed a unique path. Clearly, the test automation problem has not yet been completely solved, but current BDD frameworks are the best off-the-shelf solutions we have until a new software testing movement comes along.

Gherkin Syntax Highlighting in Visual Studio Code

Visual Studio Code is an incredible code editor that’s on the rise. It offers the power of an IDE with the speed and simplicity of a lightweight text editor, similar to Sublime, Atom, and Notepad++. If you’re a BDD addict, then VS Code is a great choice for writing Gherkin features, too! There are a number of extensions for Gherkin. Which one is the best? Below is my recommendation.

TL;DR

Install both:

Extension #1

VS Code has a few free extensions to support Gherkin. The first one I tried was Cucumber (Gherkin) Full Support. This one had the highest number of installs. When I started writing feature files, it provided snippets for each section and syntax colors. The documentation said it could also provide step suggestions (meaning, I type “Given” and it shows me all available Given steps) and navigation to step definition code, but since it looked like it only worked for JavaScript, I didn’t try it myself. that left me with no step suggestions. The indentation looked off, too. Not perfect. I wanted a better extension.

This slideshow requires JavaScript.

Extension #2

The second one I tried was Snippets and Syntax Highlight for Gherkin (Cucumber). It provides colorful syntax highlighting and a few three-letter snippets for Gherkin keywords. When I typed “fea”, a full template for a Feature section appeared with user story stubs (“In order to ___, As a ___, I want ___”). Nice! Good practice. The “sce” snippet did the same thing for the Scenario section with Given, When, and Then steps. Each section was indented nicely, too. The only downside was the lack of a snippet for Examples tables. Nevertheless, tables were still highlighted. But again, no step suggestions.

This slideshow requires JavaScript.

Extension #3

The third extension I tried was Feature Syntax Highlight and Snippets (Cucumber). It was very similar to the previous extension, but it used different colors. The snippet shortcuts were also not as intuitive – they used the letter “f” for feature followed by the first letter of the section. For example, “ff” was a Feature section, and “fs” was a Scenario section. Unfortunately, this extension did not provide step suggestions. Comments and example table rows did not get highlighted, either. Personally, I preferred the previous extension’s color scheme.

This slideshow requires JavaScript.

Extension #4

The fourth extension I tried was Gherkin step autocomplete. This one promised step suggestions! However, I had some trouble setting it up. When I enabled the extension by itself, feature files did not show any syntax highlighting, and the steps had no suggestions. What? Lame. What the README doesn’t say is that it relies on a separate extension for feature file support. So, I enabled extension #2 together with this one. Then, I had to move my feature file into a project-root-level directory named “features.” (This path could be customized in the extension’s settings, but “features” is the default.) And, voila! I got pretty colors and step suggestions.

This slideshow requires JavaScript.

But Wait, There’s More!

There were even more extensions for Gherkin. I was happy with #2 and #4, so I didn’t try others. The others also didn’t have as many installations. If anyone finds goodness out of others, please post in the comments!

Why Choose BDD Over Other Test Frameworks?

People are heavily opinionated about Behavior-Driven Development. I frequently hear opponents say things like this:

Why would I use a BDD test framework instead of a traditional test framework like JUnit, NUnit, or pytest? The extra layer of plain language Gherkin steps gets in the way of the automation code. I can directly write code for those steps instead. BDD frameworks require lots of extra work that just doesn’t seem to add value. My team isn’t doing behavior-driven development practices, anyway.

I can sympathize with these sentiments, especially for those who have participated in projects where BDD was done poorly. Even if a team isn’t doing full behavior-driven development practices, I still assert that BDD test automation frameworks are better than traditional test frameworks for most feature testing (above-unit, black box). Here are reasons why.

Separation of Test Cases from Test Code

Test cases and test code are separate concerns. I should be able to design, discuss, and digest a test case without ever touching code. We describe features in plain language, and so we should also describe tests in plain language. Step definitions are nothing more than the automation behind the test case steps. Traditional test frameworks simply don’t have this separation of concerns, even if test methods are loaded with comments.

Guide Rails

BDD frameworks enforce good structure and layers for automation. There are designated places for test cases, step definitions, and support classes. The framework encourages good practices. Traditional test framework, however, are much more free-form. Programmers can do scary and stupid things with test classes. Functionally, a traditional test framework can still be structured well with layers and support classes, but it’s not required. Based on my experiences seeing less experienced automationeers shoving everything into Frankenstein’ed test methods, I much prefer to have the guide rails of a BDD framework.

Inherent Reusability

Steps are the building blocks of test cases, and test cases almost always have the same steps. BDD frameworks identify the step as a unique concern. One step with its definition can be used by any scenario, and steps can be parametrized for flexibility. This creates a “snowball” effect once enough steps have been developed: new tests may not require any new automation code! Traditional test frameworks simply don’t have this mechanism. It could be implemented by calling functions and classes outside of test classes, but not all automationeers are disciplined to do so, and everyone who does it will do it differently.

Aspect-Oriented Controls

Good frameworks handle cross-cutting concerns automatically. Things like logging, dependency injection, and cleanup shouldn’t interfere with test cases. BDD frameworks provide hooks to insert extra logic for these concerns around steps, scenarios, features, and even the whole test suite. Hooks can squeeze into steps because the framework is structured around steps. For example, hooks can automatically log steps to Extent Reports, instead of forcing programmers to write explicit logging calls in each test method.

The HOOKS, me bucko!

Easier Reviews

Nothing ruins your day like an illegible code review on features you don’t know. You are responsible for providing valuable feedback, but you can’t figure out what’s going on in the short amount of time you can dedicate to the review. Good Gherkin, however, makes it easy. A reviewer can review the test case apart from any code first to make sure it is a good test case. At this level, the reviewer could even be a non-technical person like a product owner. Then, the reviewer can either send the test case back with suggestions or, if the test case passes muster, dig deeper into the automation code.

Easier Onboarding

It can be hard to onboard new team members. They have so much to learn about the product, the code base, and the team practices. If tests are written using a BDD framework, then newbies can learn the features simply by reading the behavior specs. New automationeers likewise can rely on existing steps both for reuse and for examples as they develop new tests.

Other Reasons?

I’m sure there are other benefits to BDD frameworks, but these are the big ones for me. It’s an opinionated thing. Feel free to add comments below!

Behavior-Driven Python

This year at PyCon 2018, I gave a talk entitled “Behavior-Driven Python” in which I showed how to use the behave framework for test automation. I then wrote a companion article on Opensource.com. Here’s the link:

What is behavior-driven Python?

In case you missed it, the video recording for the talk is here:

Are Multiple Scenario Outlines in a Feature File Okay?

Recently, someone asked me:

In Gherkin, is it good or bad practice to have multiple Scenario Outlines with Examples tables in one feature file?

The short answer is yes, it is perfectly fine to have multiple Scenario Outlines within one feature file.

However, the unspoken concern with this question is the potential size of the feature file. If one Feature has multiple Scenario Outlines with large feature tables, then the feature file could become unreadable. Remember, Gherkin is a specification language, not a programming language. A feature file should look more like a meaningful behavior example than a giant wall of text or a low-level test script. Make sure to follow good Gherkin guidelines:

Follow the Golden Gherkin Rule: Treat other readers as you would want to be treated.
Follow the Cardinal Rule of BDD: One scenario, one behavior.
Write declarative steps, not imperative ones.
Try to limit the number of steps in each scenario to single digits.
Use only a few rows and columns per example table.

Use, but don’t abuse, the templating facet of Scenario Outlines!

Python Testing 101: behave

Warning: If you are new to BDD, then I strongly recommend reading the BDD 101 series before trying to use the behave framework.

Overview

behave is a behavior-driven (BDD) test framework that is very similar to Cucumber, Cucumber-JVM, and SpecFlow. BDD frameworks are unique in that test cases are not written in raw programming code but rather in plain specification language that is then “glued” to code. The “behavior specs” help to define what the behavior is, and steps can be reused by multiple test cases (or “scenarios”). This is very different from more traditional frameworks like unittest and pytest. Although behave is not an official Cucumber variant, it still uses the Gherkin language (“Given-When-Then”) for behavior specification.

Test scenarios are written in Gherkin “.feature” files. Each Given, When, and Then step is “glued” to a step definition – a Python function decorated by a matching string in a step definition module. The behave framework essentially runs feature files like test scripts. Hooks (in “environment.py”) and fixtures can also insert helper logic for test execution.

behave is officially supported for Python 2, but it seems to run just fine using Python 3.

Installation

Use pip to install the behave module.

pip install behave

Project Structure

Since behave is an opinionated framework, it has a very opinionated project structure. All code must be located under a directory named “features”. Gherkin feature files and the “environment.py” file for hooks must appear under “features”, and step definition modules must appear under “features/steps”. Configuration files can store common execution settings and even override the path to the “features” directory.

Note: Step definition module names do not need to be the same as feature file names. Any step definition can be used by any feature file within the same project.

[project root directory]
|‐‐ [product code packages]
|-- features
|   |-- environment.py
|   |-- *.feature
|   `-- steps
|       `-- *_steps.py
`-- [behave.ini|.behaverc|tox.ini|setup.cfg]

Example Code

An example project named behavior-driven-python located in GitHub shows how to write tests using behave. This section will explain how the Web tests are designed.

The top layer in a behave project is the set of Gherkin feature files. Notice how the scenario below is concise, focused, meaningful, and declarative:

@web @duckduckgo
Feature: DuckDuckGo Web Browsing
  As a web surfer,
  I want to find information online,
  So I can learn new things and get tasks done.

  # The "@" annotations are tags
  # One feature can have multiple scenarios
  # The lines immediately after the feature title are just comments

  Scenario: Basic DuckDuckGo Search
    Given the DuckDuckGo home page is displayed
    When the user searches for "panda"
    Then results are shown for "panda"

Each scenario step is “glued” to a decorated Python function called a step definition. Step defs can use different types of step matchers and can also take parametrized inputs:

from behave import *
from selenium.webdriver.common.keys import Keys

DUCKDUCKGO_HOME = 'https://duckduckgo.com/'

@given('the DuckDuckGo home page is displayed')
def step_impl(context):
  context.browser.get(DUCKDUCKGO_HOME)

@when('the user searches for "{phrase}"')
def step_impl(context, phrase):
  search_input = context.browser.find_element_by_name('q')
  search_input.send_keys(phrase + Keys.RETURN)

@then('results are shown for "{phrase}"')
def step_impl(context, phrase):
  links_div = context.browser.find_element_by_id('links')
  assert len(links_div.find_elements_by_xpath('//div')) > 0
  search_input = context.browser.find_element_by_name('q')
  assert search_input.get_attribute('value') == phrase

The “environment.py” file can specify hooks to execute additional logic before and after steps, scenarios, features, and even the whole test suite. Hooks should handle automation concerns that should not be exposed through Gherkin. For example, Selenium WebDriver setup and cleanup should be handled by hooks instead of step definitions because after hooks always get run despite failures, while steps after an abortive failure will not get run.

from selenium import webdriver

def before_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser = webdriver.Firefox()
    context.browser.implicitly_wait(10)

def after_scenario(context, scenario):
  if 'web' in context.tags:
    context.browser.quit()

Test Launch

behave boasts a powerful command line with many options. Below are common use case examples when running tests from the project root directory:

# Run all scenarios in the project
behave

# Run all scenarios in a specific feature file
behave features/web.feature

# Filter tests by tag
behave --tags-help
behave --tags @duckduckgo
behave --tags ~@unit
behave --tags @basket --tags @add,@remove

# Write a JUnit report (useful for Jenkins and other CI tools)
behave --junit

# Don't print skipped scenarios
behave -k

Pros and Cons

Like all BDD test frameworks, behave is opinionated. It works best for black box testing due to its behavior focus. Web testing would be a great use case because user interactions can easily be described using plain language. Reusable steps also foster a snowball effect for automation development. However, behave would not be good for unit testing or low-level integration testing – the verbosity would become more of a hindrance than a helper.

My recommendation is to use behave for black box testing if the team has bought into BDD. I would also strongly consider pytest-bdd as an alternative BDD framework because it leverages all the goodness of pytest.

5 Things I Love About SpecFlow

SpecFlow, a.k.a. “Cucumber for .NET,” is a leading BDD test automation framework for .NET. Created by Gáspár Nagy and maintained as a free, open source project on GitHub by TechTalk, SpecFlow presently has almost 3 million total NuGet downloads. I’ve used it myself at a few companies, and, I must say as an automationeer, it’s awesome! SpecFlow shares a lot in common with other Cucumber frameworks like Cucumber-JVM, but it is not a knockoff – it excels in many ways. Below are five features I love about SpecFlow.

#1: Declarative Specification by Example

SpecFlow is a behavior-driven test framework. Test cases are written as Given-When-Then scenarios in Gherkin “.feature” files. For example, imagine testing a cucumber basket:

Feature: Cucumber Basket
  As a gardener,
  I want to carry many cucumbers in a basket,
  So that I don’t drop them all.
  
  @cucumber-basket
  Scenario: Fill an empty basket with cucumbers
    Given the basket is empty
    When "10" cucumbers are added to the basket
    Then the basket is full

Notice a few things:

It is declarative in that steps indicate what should be done at a high level.
It is concise in that a full test case is only a few lines long.
It is meaningful in that the coverage and purpose of the test are intuitively obvious.
It is focused in that the scenario covers only one main behavior.

Gherkin makes it easy to specify behaviors by example. That way, everybody can understand what is happening. C# code will implement each step in lower layers. Even if your team doesn’t do the full-blown BDD process, using a BDD framework like SpecFlow is still great for test automation. Test code naturally abstracts into separate layers, and steps are reusable, too!

#2: Context is King

Safely sharing data (e.g., “context”) between steps is a big challenge in BDD test frameworks. Using static variables is a simple yet terrible solution – any class can access them, but they create collisions for parallel test runs. SpecFlow provides much better patterns for sharing context.

Context injection is SpecFlow’s simple yet powerful mechanism for inversion of control (using BoDi). Any POCOs can be injected into any step definition class, either using default values or using a specific initialization, by declaring the POCO as a step def constructor argument. Those instances will also be shared instances, meaning steps across different classes can share the same objects! For example, steps for Web tests will all need a reference to the scenario’s one WebDriver instance. The context-injected objects are also created fresh for each scenario to protect test case independence.

Another powerful context mechanism is ScenarioContext. Every scenario has a unique context: title, tags, feature, and errors. Arbitrary objects can also be stored in the context object like a Dictionary, which is a simple way to pass data between steps without constructor-level context injection. Step definition classes can access the current scenario context using the static ScenarioContext.Current variable, but a better, thread-safe pattern is to make all step def classes extend the Steps class and simply reference the ScenarioContext instance variable.

#3: Hooks for Any Occasion

Hooks are special methods that insert extra logic at critical points of execution. For example, WebDriver cleanup should happen after a Web test scenario completes, no matter the result. If the cleanup routine were put into a Then step, then it would not be executed if the scenario had a failure in a When step. Hooks are reminiscent of Aspect-Oriented Programming.

Most BDD frameworks have some sort of hooks, but SpecFlow stands out for its hook richness. Hooks can be applied before and after steps, scenario blocks, scenarios, features, and even around the whole test run. (Cucumber-JVM, by contrast, does not support global hooks.) Hooks can be selectively applied using tags, and they can be assigned an order if a project has multiple hooks of the same type. Hook methods will also be picked up from any step definition class. SpecFlow hooks are just awesome!

#4: Thorough Outline Templating

Scenario Outlines are a standard part of Gherkin syntax. They’re very useful for templating scenarios with multiple input combinations. Consider the cucumber basket again:

Feature: Cucumber Basket
  
  Scenario Outline: Add cucumbers to the basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are added to the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | total |
      | 1       | 2    | 3     |
      | 5       | 3    | 8     |

All BDD frameworks can parametrize step inputs (shown in double quotes). However, SpecFlow can also parametrize the non-input parts of a step!

Feature: Cucumber Basket
  
  Scenario Outline: Use the cucumber basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are <handled-with> the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | handled-with | total |
      | 1       | 2    | added to     | 3     |
      | 5       | 3    | removed from | 2     |

The step definitions for the add and remove steps are separate. The step text for the action is parametrized, even though it is not a step input:

[When(@"""(\d+)"" cucumbers are added to the basket")]
public void WhenCucumbersAreAddedToTheBasket(int count) { /* */ }

[When(@"""(\d+)"" cucumbers are removed from the basket")]
public void WhenCucumbersAreRemovedFromTheBasket(int count) { /* */ }

That’s cool!

#5: Test Thread Affinity

SpecFlow can use any unit test runner (like MsTest, NUnit, and xUnit.net), but TechTalk provides the official SpecFlow+ Runner for a licensed fee. I’m not associated with TechTalk in any way, but the SpecFlow+ Runner is worth the cost for enterprise-level projects. It has a friendly command line, a profile file to customize execution, parallel execution, and nice integrations.

The major differentiator, in my opinion, is its test thread affinity feature. When running tests in parallel, the major challenge is avoiding collisions. Test thread affinity is a simple yet powerful way to control which tests run on which threads. For example, consider testing a website with user accounts. No two tests should use the same user at the same time, for fear of collision. Scenarios can be tagged for different users, and each thread can have the affinity to run scenarios for a unique user. Some sort of parallel isolation management like test thread affinity is absolutely necessary for test automation at scale. Given that the SpecFlow+ Runner can handle up to 64 threads (according to TechTalk), massive scale-up is possible.

But Wait, There’s More!

SpecFlow is an all-around great test automation framework, whether or not your team is doing full BDD. Feel free to add comments below about other features you love (or *gasp* hate) about SpecFlow!