BDD Example Mapping

The two major goals of Behavior-Driven Development are better collaboration and automation. Even when the Three Amigos actually get together, collaboration can be tough. Where do we start? What scenarios should we write? What examples should be included?

Well, the Cucumber folks have a practice called “Example Mapping” to make it easier. All you need is a pack of index cards and a big table!

  1. Write the story under discussion on a yellow card at the top of the table.
  2. Write a rule for each known acceptance criterion on a blue card under the story.
  3. Write each example for a rule on a green card.
  4. Write each open question on a red card on the side to discuss later.

Keep writing cards until the team is satisfied with the story. This process provides clear, fast feedback for stories. A team can quickly see if a story is too big or needs further refinement. Engineers can easily turn example cards into Gherkin scenarios.

Rather than duplicate documentation here, please read Matt Wynne’s seminal post on the practice, Introducing Example Mapping.

Cucumber also recorded a webinar about Example Mapping that is worth watching.

Are Gherkin Scenarios with Multiple When-Then Pairs Okay?

Don’t know about Behavior-Driven Development or Gherkin? Start here!

Writing Gherkin is easy, but writing good Gherkin is hard. My post BDD 101: Writing Good Gherkin covers many aspects of good behavior specification, including titles, phrasing, and data. One of the major points I make anytime I discuss good Gherkin is what I call the “Cardinal Rule of BDD.”

The Cardinal Rule of BDD: One Scenario, One Behavior!

A behavior scenario specification should focus on one individual behavior. This is the essence of the BDD mindset – a product’s features can be specified in terms of its behaviors, and the specs should be written as examples of those behaviors in action. Identifying individual behaviors brings clarity to design, development, and testing. Combining behaviors into a single scenario causes ambiguity, miscommunication, and test gaps. Test failure triage also becomes more difficult and time consuming because the root causes for failures are less clear – the culprit could be one of multiple behaviors. There is also a high risk of duplication when scenarios repeat the same sequence of steps instead of isolating behaviors.

One of the dead giveaways to violations of the Cardinal Rule of BDD is when a Gherkin scenario has multiple When-Then pairs, like this:

Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to ""
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

A When-Then pair denotes a unique behavior. In this example, the behaviors of performing a search and changing the search to images could and should clearly be separated into two scenarios, like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page
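
The giveaway described above can be checked mechanically. Here is a minimal sketch in Python, assuming scenarios made of plain step lines only (no tables, doc strings, or outlines). The function name `count_when_then_pairs` is my own invention for illustration, not part of Cucumber or any Gherkin library.

```python
def count_when_then_pairs(scenario_text):
    """Count When-Then pairs in the steps of a single scenario.

    Hypothetical helper for illustration only: it looks at the leading
    keyword of each line and ignores tables, doc strings, and outlines.
    """
    pairs = 0
    current = None  # effective step type: "Given", "When", or "Then"
    for line in scenario_text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0]
        if keyword in ("And", "But"):
            continue  # And/But inherit the type of the previous step
        if keyword not in ("Given", "When", "Then"):
            continue  # not a step line (e.g. "Scenario:")
        if keyword == "Then" and current == "When":
            pairs += 1  # a When block followed by a Then block is one pair
        current = keyword
    return pairs


combined = """
Scenario: Google Image search shows pictures
  Given the user opens a web browser
  When the user enters "panda" into the search bar
  Then links related to "panda" are shown on the results page
  When the user clicks on the "Images" link at the top of the results page
  Then images related to "panda" are shown on the results page
"""

split = """
Scenario: Image search
  Given Google search results for "panda" are shown
  When the user clicks on the "Images" link at the top of the results page
  Then images related to "panda" are shown on the results page
"""

print(count_when_then_pairs(combined))  # 2
print(count_when_then_pairs(split))     # 1
```

A count above 1 would be a prompt for a conversation about splitting the scenario, not an automatic rejection.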

Despite being so central to BDD philosophy, the Cardinal Rule is the one thing people always try to sidestep. Nobody ever doubts the usefulness of step parameters or the need for good grammar, but people frequently show me scenarios with multiple When-Then pairs and basically ask for an exception from the rule. My gut reaction is always, “NO! Rules don’t change.”


I must first admit that the Cardinal Rule of BDD is “opinionated” – it is the way that I have found BDD to work best for collaboration and automation. Adherence forces people to adopt a behavior-driven mindset, and strictness keeps feature and test quality high. Other experts are more permissive of multiple When-Then pairs, though. Most examples I could find from leading sources such as The Cucumber Book exhibit strict Given-When-Then order for Gherkin scenarios, but other sources such as the online JBehave documentation show scenarios with multiple When-Then pairs boldly on the front page.

I must also begrudgingly admit that there are times when it is simply more convenient for a single scenario to have multiple behaviors (and thus multiple When-Then pairs). This is by no means a best practice but rather a pragmatic alternative for specification dilemmas. (See Purist vs. Pragmatist.) Below are situations in which multiple When-Then pairs may be acceptable.

Lengthy End-to-End Scenarios

End-to-end tests verify execution paths through a live system with all of its parts. Web UI tests frequently fall into this category: Selenium WebDriver interacts with a page in a browser, which then triggers calls to a backend service layer or database. Despite the name, end-to-end tests may still focus on one individual behavior. The example scenarios above, though short, technically count as end-to-end tests.

However, many people use the term “end-to-end” to refer to tests that cover sequences of behaviors. Such a scenario could violate the Cardinal Rule of BDD if it is not handled carefully. My article BDD 101: Unit, Integration, and End-to-End Tests gives strategies for handling lengthy end-to-end scenarios. One strategy is to simply turn a blind eye to multiple When-Then pairs. Ideally, each behavior would already have its own individual scenario, but then a new scenario would explicitly combine the behaviors together to get that full, end-to-end path. The new scenario would be easy to write because the steps could be reused. This isn’t the only strategy, so please be sure to consider the others before writing the tests.


Software system audits frequently require lengthy end-to-end scenarios. They are quite common in highly regulated domains. For example, a bank may need to prove that a loan is prepared correctly or that a transaction puts money into the right accounts. Auditors typically require tests to run through entire system paths (i.e., multiple behaviors) using the same records, such as one loan application or one payment. Auditees must not only provide test results for past runs but must also repeat tests on demand. Separating each individual behavior into its own scenario makes each test independent, so during test execution, there will be no guaranteed order and no shared test data, and auditors would not have the end-to-end verification that they require. The simplest way to give the auditors what they need is to write one lengthy scenario with multiple When-Then pairs.

Service Calls

Service call testing is another case for which multiple When-Then pairs may be pragmatically justified. REST and SOAP are examples of service call types (WSDL is the description language commonly used for SOAP services). Service layer development is more engineering-centric than business-centric, but many teams nevertheless choose to test service calls with Gherkin-based frameworks like Cucumber. Due to the programmatic nature of services, Gherkin scenarios for service calls tend to be quite imperative: specify a request, make the call, and verify parts of the response. This isn’t so bad for independent service calls, but it becomes problematic when one request needs another call’s response.

One solution is the classic “pure” scenario split: put any necessary setup, including initial requests to get required response parts, into custom Given steps. This abides by the Cardinal Rule and avoids duplicate When-Then pairs. But, it introduces an unsavory form of code duplication. Many service calls end up being written twice: once as a Gherkin scenario for testing, and once in the underlying automation code to be called by Given steps. This violates the DRY principle.

The alternative “pragmatic” solution is to write scenarios that specify multiple service calls in the Gherkin steps. The Karate project advocates this approach, as its “Hello World” example shows.

Take Caution!

There may be other cases when When-Then repetition is useful. Feel free to leave suggestions in the comments below. My examples are meant to be descriptive, not prescriptive. Another aspect to consider is that allowing multiple When-Then pairs per scenario indicates that a team sees more value in BDD’s test framework than in its collaborative spec process. (Refer to ‑‑BDD; Automation without Collaboration and BDD‑‑; Collaboration without Automation.)

Ultimately, you must decide what practices are best for your project. The main reason I uphold the Cardinal Rule of BDD so strongly is that it makes for good specs and good tests. I’ve seen engineers write extremely long, intensive test procedures (and I mean, dozens of duplicate behaviors per test) that are alright for manual testing but do not transition well into automation because they are too fragile and they don’t yield useful information upon failure. The Cardinal Rule is a way to break out of the procedure-driven mindset, and banning multiple When-Then pairs per Gherkin scenario is an effective rule for enforcing it.

Good Gherkin Scenario Titles

The Golden Gherkin Rule states:

Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Part of writing good Gherkin (or any other specification-by-example language) includes writing good behavior scenario titles. The title is the face of the scenario: it summarizes what the behavior is all about. Good titles make collaboration and test triage a breeze, whereas bad titles make it tougher. But what makes a title “good”? Below are some helpful pointers.


Good titles should be short one-liners. One simple statement should be sufficient to concisely capture the intended behavior. Anything longer likely means that either the author doesn’t truly understand the behavior in focus, or that the scenario does not focus on one main behavior. Extra comments may be added to supplement the scenario’s description if necessary to avoid lengthy titles. Also, most BDD test automation frameworks will print scenario titles to logs for traceability.

Bad Example: The user can log into the app, navigate to the profile page, and see their full name, address, phone number, email, and username
Good Example: The profile page displays the user’s personal info

Conjunction Disjunction

Watch out for conjunction words like “and,” “or,” and “but.” Conjunctions typically imply that more than one thing will be done, which for scenario titles implies that more than one behavior will be covered. Alternatively, a conjunction may indicate that a Scenario Outline is appropriate. Don’t break the Cardinal Rule of BDD! Keep each scenario focused on one main behavior.

Avoid other conjunctions like “because,” “since,” and “so” as well. Phrases starting with those words often give an explanation for why the scenario exists. However, for conciseness, scenario titles should focus on what the behavior is. The why can either be deduced from the steps or made plain with comments.

Bad Example: The user can request an insurance quote from the big “Get-A-Quote” button on the home page or from the “Insurance Policies” page
Good Example (two Scenarios): The user requests an insurance quote from the “Get-A-Quote” button on the home page / The user requests an insurance quote from the “Insurance Policies” page
Good Example (Scenario Outline): The user requests an insurance quote

Bad Example: The last five search phrases are saved so that the user can rerun them from the history page
Good Example: The history page saves the last five search phrases

Avoid Assertion Language

Don’t use the words “verify,” “assert,” or “should” in scenario titles. They put the scenario’s emphasis on the assertion rather than the behavior. Assertions are merely a facet of behavior testing – they verify that something exists or that two values are equal. Behavior scenarios, however, are full software specifications. BDD is a development practice for making better software products – it’s not just a test tool. Don’t reduce the behavior-driven mindset to a test-only mindset.

Furthermore, leading every scenario title with “verify” or “assert” becomes very repetitive. The words just don’t enhance the meaningfulness of the title. They also thwart alphabetical order.

Bad Example: Verify the user can change their address on the profile page
Good Example: Profile page address change

Bad Example: Assert that a stock quote is displayed in green text when its value is higher than its previous closing value
Good Example: A stock quote has green text when its value is higher than its previous closing value

Bad Example: The goodbye page should be displayed after a successful logout
Good Example: Logout displays the goodbye page
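
The word lists from this section and the previous one are easy to turn into a rough title check. Below is a minimal sketch in Python. The function name `title_warnings` and the warning labels are hypothetical, and the word sets contain only the words named in this post.

```python
import re

# Words called out in this post; extend the sets as a team sees fit.
CONJUNCTIONS = {"and", "or", "but", "because", "since", "so"}
ASSERTION_WORDS = {"verify", "assert", "should"}


def title_warnings(title):
    """Return a list of warnings for a scenario title (hypothetical helper)."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    warnings = []
    conjunctions = sorted(CONJUNCTIONS & words)
    if conjunctions:
        warnings.append("conjunction: " + ", ".join(conjunctions))
    assertions = sorted(ASSERTION_WORDS & words)
    if assertions:
        warnings.append("assertion language: " + ", ".join(assertions))
    return warnings


print(title_warnings("Verify the user can change their address on the profile page"))
# ['assertion language: verify']
print(title_warnings("Profile page address change"))
# []
```

A check like this only flags words. A flagged title still needs a human to decide whether the scenario should be split or turned into a Scenario Outline.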


Do you have any more suggestions? Put them in the comments below!

Pipe Character Escape for Gherkin Tables

For the first time today, I had to write a Gherkin behavior scenario in which table text needed to use the pipe character “|”. I wanted a generic step that would find and click web page links by name, and one of the link names had the pipe in it! The first version of the step I wrote looked like this:

When the user follows the links:
  | link              |
  | Category          |
  | Sub-Category      |
  | Index|Description |

Naturally, this step didn’t parse – the “|” was parsed as a table delimiter instead of the intended link text. I could have rewritten the step to search for partial link text, or I could have done a key-value lookup, but I wanted to keep the step simple and direct.

The solution was simple: escape the pipe character “|” with a backslash character “\”. Easy! Thanks, StackOverflow! The updated table looks like this:

When the user follows the links:
 | link               |
 | Category           |
 | Sub-Category       |
 | Index\|Description |

“\|” works for both step tables and scenario outline example tables, and it appears to be fairly standard across test frameworks that use Gherkin. I verified that Cucumber-JVM and SpecFlow support it, and Cucumber for Ruby seems to as well. It looks like behave will support it in 1.2.6.
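
To make the escape rule concrete, here is a minimal sketch in Python of how a table row could be split into cells while honoring the escaped pipe. The function `split_table_row` is hypothetical and written only for illustration. It is not how any particular framework implements its parser, and real Gherkin parsers handle more cases than this.

```python
def split_table_row(row):
    """Split one Gherkin table row into cells, honoring the backslash-pipe escape.

    Hypothetical helper for illustration; real Gherkin parsers handle
    more cases than this sketch does.
    """
    cells = []
    current = []
    i = 0
    text = row.strip()
    while i < len(text):
        char = text[i]
        if char == "\\" and i + 1 < len(text) and text[i + 1] == "|":
            current.append("|")  # escaped pipe: keep it as literal cell text
            i += 2
            continue
        if char == "|":
            cells.append("".join(current).strip())  # unescaped pipe ends a cell
            current = []
        else:
            current.append(char)
        i += 1
    # The text before the first pipe is empty, so drop it. Anything after
    # the last pipe is never appended to `cells` and is therefore ignored.
    return cells[1:]


print(split_table_row(r"| Index\|Description |"))  # ['Index|Description']
print(split_table_row("| Index|Description |"))    # ['Index', 'Description']
```

The second call shows why my first version of the step failed: without the backslash, the link name is split into two cells.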

After learning this trick, I updated the BDD 101: The Gherkin Language page.

Note that backslash escape sequences won’t work for quotes in Gherkin steps. Quotes in steps are merely conventions and not part of the Gherkin language standard.

YAML Comments in Gherkin Feature Files

In Gherkin-based BDD test frameworks, feature files hold behavior scenarios with Given-When-Then steps. Features and scenarios may be categorized by tags for hooks and filtering, and additional comment lines may be added anywhere. However, Gherkin itself may not be sufficient to capture all desired test metadata. Tags are great for simple classification but crude for larger information. And comments are meaningful only to the reader.

Fraser Scott (zeroXten) came up with a nifty idea for improving Gherkin information while working on the OWASP Cloud Security project: write YAML comments in feature files to provide more formal documentation. As stated on the project home page, “The OWASP Cloud Security project aims to help people secure their products and services running in the cloud by providing a set of easy to use threat and control BDD stories that pool together the expertise and experience of the development, operations and security communities.” It’s a pretty cool idea – use Gherkin to model attacks for both education and automation. The team is writing YAML comments at the top of feature files to provide custom information in a clean, readable format that could also be easily parsed by other tools. Below is an example feature file I copied from the project, with YAML comments at the top:

At first, I wasn’t too thrilled by the thought of YAML comments in feature files. Gherkin should provide all specification needs, and tag classification is often needed for automation. However, the YAML comments are quite clean, and for this project, they appear to document aspects of the scenarios that shouldn’t be buried in Gherkin (such as confirmation status and reference links). YAML is a very sensible format for formalized comments, too.

Take this idea as food for thought: YAML comments can be an effective way to add metadata to Gherkin feature files. Just make sure to capture all behavior specification using Gherkin and to still use tags for automation.
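
Part of the appeal is that other tools can read the comments. Here is a minimal sketch in Python of pulling the leading comment block out of a feature file. The function name `leading_comment_block`, the sample feature text, and its keys (`status`, `references`) are all made up for illustration and are not copied from the OWASP project.

```python
def leading_comment_block(feature_text):
    """Return the text of the comment lines that precede the first
    non-comment line (hypothetical helper for illustration)."""
    lines = []
    for line in feature_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Drop the "#" and at most one following space so that any
            # YAML indentation inside the comment is preserved.
            body = stripped[1:]
            lines.append(body[1:] if body.startswith(" ") else body)
        elif stripped:
            break  # first non-comment, non-blank line ends the header
    return "\n".join(lines)


# Invented sample, NOT taken from the OWASP Cloud Security project.
feature = """\
# status: confirmed
# references:
#   - https://example.com/hypothetical-reference
@hypothetical_tag
Feature: Hypothetical feature
"""

print(leading_comment_block(feature))
```

The returned string could then be handed to a YAML library (for example, PyYAML's `yaml.safe_load`) to get structured data. I left that step out to keep the sketch free of third-party dependencies.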

The Airing of Grievances: BDD

Behavior-Driven Development – one of my favorite blog topics. When done right, it’s a wonderful way to foster better collaboration and automation. When it’s not… well, let’s just say I got a lot of problems with bad BDD practices, and now you’re gonna hear about it!


Treating BDD as a Tool and Not as a Process

BDD is a process – it is a set of tools and practices designed to help teams deliver better software. BDD is not just a test automation framework; the framework is just one of the tools that support BDD. Heck, the word “development” is in the name!

Complaining that Gherkin is Too Technical

Really? Really!? Gherkin is basically just plain language with some buzzwords mixed in! It is specifically designed for non-technical people to handle it! It is not a full-fledged programming language – it is essentially a simple format for behavior specification that automation frameworks can easily parse. The steps are meant to be read like plain English (or any other spoken language) so that better collaboration can happen. If Gherkin is “too technical” for you, then I hate to know what isn’t.

No Buy-In from All Roles

The three major roles on an Agile team, a.k.a. the “Three Amigos,” are biz, dev, and test (regardless of fancy names or assignments). For BDD to work well, all three role types must embrace it. Otherwise, collaboration will suffer. BDD is not just a QA thing; it’s for everyone. Biz gets better features in shorter time because requirements were communicated better. Dev wastes less time figuring out what biz wants and gets tests faster. Test can start automating right away since test scenarios are defined from the start in Gherkin. Everybody wins if everybody contributes.

No Three Amigos Meetings

Three Amigos meetings are like dietary fiber supplements: they help a team stay regular with collaboration, or else development gets constipated as engineers start building crap instead of the intended behaviors. Then the crap gets blocked up as the team must rework it, meaning it could be another sprint before there’s a healthy flush of new features. Open conversations in regularly scheduled Three Amigos meetings would have avoided the whole obstruction.

Forcing QA to Write All Behavior Scenarios

BDD is not just a QA thing – it is for all roles. Pigeonholing the responsibility of writing behavior scenarios onto QA is not only unfair, it is anti-collaborative. The whole reason for writing scenarios in plain language with Gherkin is to let everyone contribute to feature behavior. Scenarios are primarily about capturing behavior, not writing tests. If tests were the main focus, then engineers could just write test cases using traditional automation frameworks directly in general purpose programming languages like Java or Python. BDD offers the benefits of process efficiency and shifting left when the whole team helps to write behavior scenarios.


Bad Gherkin

Only you can prevent bad Gherkin. Or I can – via rejected code reviews.

Typos, Poor Grammar, and Inconsistent Formatting

Gherkin needs to be readable. Steps with typos, poor grammar, and inconsistent formatting will still run fine for test automation, but they make it tough to understand the behaviors they describe. Sometimes, they can even make the meaning ambiguous.

No Double-Quotes Around Step Parameters

How do you know if something is a step parameter? “Double quotes” make it easy. However, Gherkin does not enforce double quotes around parameters. It is merely a programmer’s convention, but it’s a really helpful convention indeed.

No Tags

Tags make it super easy to filter scenarios at runtime. No tags? Good luck remembering long paths and names at runtime, or running related scenarios across different feature files together.

More Than 120 Characters per Line

Any longer is too much to comprehend. Either write the step more concisely, or split it apart. Plus, the line may go off the edge of the screen.

More Than 10 Steps per Scenario

Again, any longer is too much to comprehend. Scenarios should be short and sweet – they should concisely describe behavior. Too many steps means the scenario is too imperative or covers more than one behavior.

Multiple Behaviors per Scenario

Scenarios should not have multiple personality disorder: one scenario, one behavior. Don’t break the Cardinal Rule of BDD! So many people break this rule when they first start BDD because they are locked into procedure-driven thinking. Then, when tests fail, nobody knows exactly what behavior is the culprit. One scenario, one behavior.

Out-of-Order Step Types

Givens, Whens, and Thens each serve a specific, ordered purpose: Given some initial state, When actions are taken, Then verify an expected outcome. Jumbling them up ruins their meaning. Furthermore, duplicate When-Then pairs indicate multiple behaviors per scenario. And don’t just reassign step types to skirt the strict-ordering rule. Do it right – put integrity into the steps!

Gigantic Tables

Have you ever seen an Examples table with 13 columns? Or maybe 517 rows? I have. The horror, the horror! Tables that big make scenarios lose any semblance of specification-by-example. Make sure table rows and columns are actually needed. Use key-value lookups if the data is too gritty.

Being Imperative Rather Than Declarative

Given I’m logged into the app, when I click here, and I click there, and I type P, and I type L, and I type E, and I type A, and I type S, and I type E, and I type D, and I type O, and I type N, and I type T, and I type W, and I type R, and I type I, and I type T, and I type E, and I type S, and I type C, and I type E, and I type N, and I type A, and I type R, and I type I, and I type O, and I type S, and I type L, and I type I, and I type K, and I type E, and I type T, and I type H, and I type I, and I type S, then go directly to jail, and do not pass GO, and do not collect $200. Steps should focus more on what than how.

Prefixing Existing Test Procedure Steps with Gherkin Buzzwords

Let’s just take our existing test procedures from a tool like HP QualityCenter or ALM and put the words “Given,” “When,” and “Then” in front of every step. Ta-da! We’re now doing BDD! …WRONG!! I kid you not, I have seen this happen. These people clearly never took BDD 101. It hurts to see.


Unorganized Step Definitions

Programmers like to throw their step definition methods anywhere. Add ’em to an unrelated existing class? Create a whole new class for only two new steps? Mix up the types? Who cares! Don’t bother to alphabetize them, either. Well, that’s how tech debt happens. That’s how duplicate steps get written, because originals can’t be found. Imagine a library without the Dewey Decimal System – that’s what an unorganized step def collection will be.

Putting Cleanup Code in Then Steps

Cleanup code belongs in After hooks, where it will be run no matter what fails during the scenario. Writing Then steps to do cleanup not only breaks step type integrity, but the cleanup code also will not run if a previous step aborts!
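
To see the difference, here is a minimal sketch in Python of a toy scenario runner. Everything in it (`run_scenario`, the step functions, the log messages) is hypothetical and only mimics what a real framework's After hooks do.

```python
def run_scenario(steps, after_hooks):
    """Run steps in order; always run After hooks, even if a step fails.

    Toy stand-in for a test framework, for illustration only.
    """
    log = []
    try:
        for step in steps:
            step(log)
    except Exception as error:
        log.append(f"failed: {error}")  # remaining steps are skipped
    finally:
        for hook in after_hooks:
            hook(log)  # hooks run whether or not a step failed
    return log


def create_record(log):
    log.append("created record")


def broken_when(log):
    raise RuntimeError("service unavailable")


def then_cleanup(log):
    log.append("deleted record in Then step")


def after_cleanup(log):
    log.append("deleted record in After hook")


# Cleanup written as a Then step: it never runs, so the record leaks.
print(run_scenario([create_record, broken_when, then_cleanup], []))
# ['created record', 'failed: service unavailable']

# Cleanup written as an After hook: it runs despite the failure.
print(run_scenario([create_record, broken_when], [after_cleanup]))
# ['created record', 'failed: service unavailable', 'deleted record in After hook']
```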

Catching and Burying All Exceptions

Here’s something I see all the time in automation code (and not just for BDD):

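// FYI - This is Java, but the same thing can happen in any language
@When("^do something$")
public void doSomething() throws Throwable {
  try {
    // the whole body of the step goes here
  }
  catch (Exception e) {
    // log the exception, then carry on as if nothing happened
  }
}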

The entirety of a step (or even a whole test) is surrounded by a try-catch that catches every exception. THIS STEP CAN NEVER REGISTER A FAILURE! Even if there was a failed assertion or, worse, an exception that ought to abort the test, it will get caught and buried with not much more than a slight whimper in the log. In this case, the test will carry forth to the next step, which will probably not work, either. I’ve seen projects with this sort of exception handling around every single step definition. In modern test frameworks, the framework will catch all exceptions at the highest level, register the test as failed, and move on safely to the next test. There is no need to catch any exception, unless the test can be recovered.
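
The same trap is easy to reproduce in Python, where `AssertionError` is a subclass of `Exception` and so gets swallowed along with everything else. Here is a minimal sketch. `run_step` is a hypothetical stand-in for a test framework, and the step functions are invented for illustration.

```python
def failing_check():
    """Simulate a failed assertion inside a step."""
    raise AssertionError("expected 3 but got 2")


def buried_step():
    """Anti-pattern: the step swallows every failure."""
    try:
        failing_check()
    except Exception as error:
        print(f"ignored: {error}")  # a whimper in the log, nothing more


def honest_step():
    """Preferred: let the failure propagate to the framework."""
    failing_check()


def run_step(step):
    """Stand-in for a framework: report whether the step passed or failed."""
    try:
        step()
    except Exception as error:
        return f"FAILED: {error}"
    return "PASSED"


print(run_step(buried_step))  # prints the "ignored" line, then PASSED
print(run_step(honest_step))  # FAILED: expected 3 but got 2
```

The buried step reports a pass even though its check failed, which is exactly the kind of false green that makes a test suite untrustworthy.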

Changing Steps Without Testing Affected Scenarios

Sharing steps is a wonderful thing, but changing steps without testing all scenarios that use them is a terrible thing. I’ve seen people change step text or step def code and test only their new scenarios. Meanwhile, in the continuous integration environment, a dozen other tests using those steps started failing. (Hell, I’ve seen people push code that doesn’t even compile, but that’s another grievance.)

Multiple Names for the Same Step

Just because you can do something doesn’t mean you should. Different names for the same step may be useful for readability, but please keep the name variants limited.

No Dependency Injection

Dependency injection is the best way to share objects in an automation framework. (Singletons work well, too, but DI allows more careful control of scope.) Many frameworks like Cucumber-JVM even integrate with existing DI frameworks like PicoContainer and Spring. DON’T MAKE NON-CONSTANT VARIABLES GLOBAL! DON’T BLINDLY MAKE THINGS “STATIC” JUST TO SHARE THEM! Globals (or “statics” in Java/C# like languages) are dangerous: they can be easily misused, they are a nightmare to trace, and they can break multithreaded execution. Just use the appropriate design pattern: dependency injection.
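
Here is a minimal sketch of the idea in Python, using plain constructor injection instead of a real DI framework. All of the names (`FakeBrowser`, `SearchSteps`, `ResultSteps`, `build_scenario_objects`) and the example.com URL are hypothetical.

```python
class FakeBrowser:
    """Hypothetical stand-in for a shared resource such as a WebDriver."""

    def __init__(self):
        self.visited = []

    def open(self, url):
        self.visited.append(url)


class SearchSteps:
    """Step class that receives its dependency instead of using a global."""

    def __init__(self, browser):
        self.browser = browser

    def open_home_page(self):
        self.browser.open("https://example.com/")


class ResultSteps:
    """A second step class that shares the same injected browser."""

    def __init__(self, browser):
        self.browser = browser

    def pages_opened(self):
        return len(self.browser.visited)


def build_scenario_objects():
    """Stand-in for a DI container: one fresh browser per scenario,
    shared by every step class in that scenario."""
    browser = FakeBrowser()
    return SearchSteps(browser), ResultSteps(browser)


first_search, first_results = build_scenario_objects()
second_search, second_results = build_scenario_objects()
first_search.open_home_page()
print(first_results.pages_opened())   # 1
print(second_results.pages_opened())  # 0
```

Both step classes in the first scenario see the same browser, while the second scenario gets its own. That scoping is what a global or static variable cannot give you.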

Gherkin Syntax Highlighting in Chrome

Google Chrome is one of the most popular web browsers around. Recently, I discovered that Chrome can edit and display Gherkin feature files. The Chrome Web Store has two useful extensions for Gherkin: Tidy Gherkin and Pretty Gherkin, both developed by Martin Roddam. Together, these two extensions provide a convenient, lightweight way to handle feature files.

Tidy Gherkin

Tidy Gherkin is a Chrome app for editing and formatting feature files. Once it is installed, it can be reached from the Chrome Apps page (chrome://apps/). The editor appears in a separate window. Gherkin text is automatically colored as it is typed. The bottom preview pane automatically formats each line, and clicking the “TIDY!” button in the upper-left corner will format the user-entered text area as well. Feature files can be saved and opened like a regular text editor. Templates for Feature, Scenario, and Scenario Outline sections may be inserted, as well as tables, rows, and columns.

Another really nice feature of Tidy Gherkin is that the preview pane automatically generates step definition stubs for Java, Ruby, and JavaScript! The step def code is compatible with the Cucumber test frameworks. (The Java code uses the traditional step def format, not the Java 8 lambdas.) This feature is useful if you aren’t already using an IDE for automation development.

Tidy Gherkin has pros and cons when compared to other editors like Notepad++ and Atom. The main advantages are automatic formatting and step definition generation – features typically seen only in IDEs. It’s also convenient for users who already use Chrome, and it’s cross-platform. However, it lacks richer text editing features offered by other editors, it’s not extendable, and the step def gen feature may not be useful to all users. It also requires a bit of navigation to open files, whereas other editors may be a simple right-click away. Overall, Tidy Gherkin is nevertheless a nifty, niche editor.

Pretty Gherkin

Pretty Gherkin is a Chrome extension for viewing Gherkin feature files through the browser with syntax highlighting. After installing it, make sure to enable the “Allow access to the file URLs” option on the Chrome Extensions page (chrome://extensions/). Then, whenever Chrome opens a feature file, it should display pretty text. For example, try the GoogleSearch.feature file from my Cucumber-JVM example project, cucumber-jvm-java-example. Unfortunately, though, I could not get Chrome to display local feature files – every time I would try to open one, Chrome would simply download it. Nevertheless, Pretty Gherkin seems to work for online SCM sites like GitHub and BitBucket.

Since Pretty Gherkin is simply a display tool, it can’t really be compared to other editors. I’d recommend Pretty Gherkin to Chrome users who often read feature files from online code repositories.


Be sure to check out other Gherkin editors, too!