Are Gherkin Scenarios with Multiple When-Then Pairs Okay?

Don’t know about Behavior-Driven Development or Gherkin? Start here!

Writing Gherkin is easy, but writing good Gherkin is hard. My post BDD 101: Writing Good Gherkin covers many aspects of good behavior specification, including titles, phrasing, and data. One of the major points I make anytime I discuss good Gherkin is what I call the “Cardinal Rule of BDD.”

The Cardinal Rule of BDD: One Scenario, One Behavior!

A behavior scenario specification should focus on one individual behavior. This is the essence of the BDD mindset – a product’s features can be specified in terms of its behaviors, and the specs should be written as examples of those behaviors in action. Identifying individual behaviors brings clarity to design, development, and testing. Combining behaviors into a single scenario causes ambiguity, miscommunication, and test gaps. Test failure triage also becomes more difficult and time consuming because the root causes for failures are less clear – the culprit could be one of multiple behaviors. There is also a high risk of duplication when scenarios repeat the same sequence of steps instead of isolating behaviors.

A dead giveaway for a violation of the Cardinal Rule of BDD is a Gherkin scenario with multiple When-Then pairs, like this:

Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

A When-Then pair denotes a unique behavior. In this example, the behaviors of performing a search and changing the search to images could and should clearly be separated into two scenarios, like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page
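As a rough sketch of how the Given step in the second scenario might be automated, the setup could jump straight to a results page rather than replay the search bar interaction, since that behavior already has its own scenario. Everything named here is an assumption for illustration: the FakeBrowser class stands in for a real WebDriver, the function names are hypothetical, and the results URL with a "q" query parameter is assumed rather than taken from any framework.

```python
from urllib.parse import urlencode

# Assumed address of the results page; "q" carries the search phrase.
SEARCH_URL = "https://www.google.com/search"


class FakeBrowser:
    """Stand-in for a real WebDriver; it only records the pages it visits."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


def search_results_url(phrase):
    """Build a direct link to the results page for a search phrase."""
    return f"{SEARCH_URL}?{urlencode({'q': phrase})}"


def given_search_results_are_shown(browser, phrase):
    """Hypothetical body for: Given Google search results for "<phrase>" are shown.

    Navigates directly to the results page instead of typing into the
    search bar, because the search bar behavior is covered elsewhere.
    """
    browser.get(search_results_url(phrase))


browser = FakeBrowser()
given_search_results_are_shown(browser, "giant panda")
print(browser.visited[-1])
```

In a real project the function would be bound to the step text by the test framework, and the browser would be a live session. The point of the sketch is only that setup steps are free to take a shorter path than the behavior under test.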

Despite being so central to BDD philosophy, the Cardinal Rule is the one thing people always try to sidestep. Nobody ever doubts the usefulness of step parameters or the need for good grammar, but people frequently show me scenarios with multiple When-Then pairs and basically ask for an exception from the rule. My gut reaction is always, “NO! Rules don’t change.”

However…

I must first admit that the Cardinal Rule of BDD is “opinionated” – it is the way that I have found BDD to work best for collaboration and automation. Adherence forces people to adopt a behavior-driven mindset, and strictness keeps feature and test quality high. Other experts are more permissive of multiple When-Then pairs, though. Most examples I could find from leading sources such as The Cucumber Book exhibit strict Given-When-Then order for Gherkin scenarios, but other sources such as the online JBehave documentation show scenarios with multiple When-Then pairs boldly on the front page.

I must also begrudgingly admit that there are times when it is simply more convenient for a single scenario to have multiple behaviors (and thus multiple When-Then pairs). This is by no means a best practice but rather a pragmatic alternative for specification dilemmas. (See Purist vs. Pragmatist.) Below are situations in which multiple When-Then pairs may be acceptable.

Lengthy End-to-End Scenarios

End-to-end tests verify execution paths through a live system with all of its parts. Web UI tests frequently fall into this category: Selenium WebDriver interacts with a page in a browser, which then triggers calls to a backend service layer or database. Despite the name, end-to-end tests may still focus on one individual behavior. The example scenarios above, though short, technically count as end-to-end tests.

However, many people use the term “end-to-end” to refer to tests that cover sequences of behaviors. Such a scenario could violate the Cardinal Rule of BDD if it is not handled carefully. My article BDD 101: Unit, Integration, and End-to-End Tests gives strategies for handling lengthy end-to-end scenarios. One strategy is to simply turn a blind eye to multiple When-Then pairs. Ideally, each behavior would already have its own individual scenario, but then a new scenario would explicitly combine the behaviors together to get that full, end-to-end path. The new scenario would be easy to write because the steps could be reused. This isn’t the only strategy, so please be sure to consider the others before writing the tests.
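To illustrate why step reuse makes such a combined scenario cheap to write, here is a toy sketch of a step registry. It is not how Cucumber or any other framework is implemented; the step decorator, the run function, and the state dictionary are all hypothetical, and the "pages" are plain strings rather than a real browser. The end-to-end path is just a longer list of step lines that match step definitions written for the single-behavior scenarios.

```python
import re

STEPS = []  # (compiled pattern, function) pairs, in registration order


def step(pattern):
    """Register a function as the implementation of any line matching pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'the user enters "(.+)" into the search bar')
def enter_phrase(state, phrase):
    state["phrase"] = phrase
    state["page"] = f"links for {phrase}"


@step(r'links related to "(.+)" are shown on the results page')
def check_links(state, phrase):
    assert state["page"] == f"links for {phrase}"


@step(r'the user clicks on the "Images" link')
def click_images(state):
    state["page"] = f"images for {state['phrase']}"


@step(r'images related to "(.+)" are shown on the results page')
def check_images(state, phrase):
    assert state["page"] == f"images for {phrase}"


def run(step_lines):
    """Run each line against the first registered pattern that matches it."""
    state = {}
    for line in step_lines:
        for pattern, func in STEPS:
            match = pattern.search(line)
            if match:
                func(state, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {line}")
    return state


# The end-to-end path needs no new step definitions, only a longer step list.
end_to_end = [
    'When the user enters "panda" into the search bar',
    'Then links related to "panda" are shown on the results page',
    'When the user clicks on the "Images" link at the top of the results page',
    'Then images related to "panda" are shown on the results page',
]
final_state = run(end_to_end)
print(final_state["page"])
```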

Audits

Software system audits frequently require lengthy end-to-end scenarios. They are quite common in highly regulated domains. For example, a bank may need to prove that a loan is prepared correctly or that a transaction puts money into the right accounts. Auditors typically require tests to run through entire system paths (i.e., multiple behaviors) using the same records, such as one loan application or one payment. Auditees must not only provide test results for past runs but must also repeat tests on demand. Separating each individual behavior into its own scenario makes each test independent, so during test execution, there will be no guaranteed order and no shared test data, and auditors would not have the end-to-end verification that they require. The simplest way to give the auditors what they need is to write one lengthy scenario with multiple When-Then pairs.

Service Calls

Service call testing is another case for which multiple When-Then pairs may be pragmatically justified. REST and SOAP are examples of service call types, and SOAP services are commonly described with WSDL. Service layer development is more engineering-centric than business-centric, but many teams nevertheless choose to test service calls with Gherkin-based frameworks like Cucumber. Due to the programmatic nature of services, Gherkin scenarios for service calls tend to be quite imperative: specify a request, make the call, and verify parts of the response. This isn’t so bad for independent service calls, but it becomes problematic when one request needs another call’s response.

One solution is the classic “pure” scenario split: put any necessary setup, including initial requests to get required response parts, into custom Given steps. This abides by the Cardinal Rule and avoids multiple When-Then pairs. However, it introduces an unsavory form of duplication: many service calls end up being written twice, once as a Gherkin scenario for testing and once in the underlying automation code to be called by Given steps. This violates the DRY principle.
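To make the trade-off concrete, here is a minimal sketch of what the “pure” split might look like in step code. It assumes a made-up cats service, faked in memory so the example runs on its own; FakeCatService, create_cat, the step function names, and the context dictionary are all hypothetical. The request logic lives in one shared helper, yet creating a cat is still expressed twice: once as a When step under test and once as a Given step used for setup.

```python
import itertools


class FakeCatService:
    """In-memory stand-in for a hypothetical cats service."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._cats = {}

    def post(self, body):
        cat = {"id": next(self._ids), **body}
        self._cats[cat["id"]] = cat
        return 201, cat

    def get(self, cat_id):
        cat = self._cats.get(cat_id)
        return (200, cat) if cat else (404, None)


def create_cat(service, name):
    """Shared helper: the one place that knows how to create a cat."""
    return service.post({"name": name})


# Scenario 1 tests creation, e.g. When a cat named "Billie" is created
def when_cat_is_created(context, name):
    context["status"], context["response"] = create_cat(context["service"], name)


# Scenario 2 tests retrieval; creation is only setup,
# e.g. Given a cat named "Billie" exists
def given_cat_exists(context, name):
    status, cat = create_cat(context["service"], name)
    assert status == 201, "setup failed before the behavior under test ran"
    context["cat_id"] = cat["id"]


def when_cat_is_retrieved(context):
    context["status"], context["response"] = context["service"].get(context["cat_id"])


context = {"service": FakeCatService()}
given_cat_exists(context, "Billie")
when_cat_is_retrieved(context)
print(context["status"], context["response"])
```

Sharing the helper keeps the request details in one function, but the creation call still has to be maintained both as a tested behavior and as setup plumbing.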

The alternative “pragmatic” solution is to write scenarios that specify multiple service calls in the Gherkin steps. The Karate project advocates this approach, as shown in their “Hello World” example:


Feature: karate 'hello world' example
Scenario: create and retrieve a cat
Given url 'http://myhost.com/v1/cats'
And request { name: 'Billie' }
When method post
Then status 201
And match response == { id: '#notnull', name: 'Billie' }
Given path response.id
When method get
Then status 200

Take Caution!

There may be other cases when When-Then repetition is useful. Feel free to leave suggestions in the comments below. My examples are meant to be descriptive, not prescriptive. Another aspect to consider is that allowing multiple When-Then pairs per scenario indicates that a team sees more value in BDD’s test framework than in its collaborative spec process. (Refer to ‑‑BDD; Automation without Collaboration and BDD‑‑; Collaboration without Automation.)

Ultimately, you must decide what practices are best for your project. The main reason I uphold the Cardinal Rule of BDD so strongly is that it makes for good specs and good tests. I’ve seen engineers write extremely long, intensive test procedures (and I mean, dozens of duplicate behaviors per test) that are alright for manual testing but do not transition well into automation because they are too fragile and they don’t yield useful information upon failure. The Cardinal Rule is a way to break out of the procedure-driven mindset, and banning multiple When-Then pairs per Gherkin scenario is an effective rule for enforcing it.

15 comments

  1. Hello Andy!

    While idealistic, the cardinal rule does not really hold up for complex scenarios and client expectations. The gherkin will have to break somewhere in favor of code reuse. “Purist” gherkin, from experience, rarely translates into reusable code, and reusable code is more valued by business stakeholders. Building new features from existing steps should be an advantage of gherkin, and this will be difficult to achieve if the steps are bound to one behavior. I prefer to write gherkin in such a way that each step is an independent unit and can be reused anytime without dependency. Hence, multiple When’s and Then’s can be called by any scenario that needs them. As the library of steps builds up, it becomes easier to build new features/scenarios down the line because you have many pieces now – like Legos.

    By no means do I claim this to be the best approach, but I can attest to the code efficiency/reusability and time saved with the “Lego Model” when time is a luxury we usually don’t have. Given that gherkin is step-based, steps are methods, and methods by nature are independent units, it can be a noteworthy point of view. 🙂

    Thank you for this article. I look forward to reading more.


    1. The Cardinal Rule is not in conflict with reusable, independent steps. At LexisNexis, my team has over a thousand end-to-end scenarios that each focus on one behavior and never duplicate When-Then pairs. These scenarios also reuse steps widely (the “Lego Model”). A lot of times, we write new Given steps that internally call pre-existing steps to establish a new behavior for a new scenario. Separate Given steps also make it possible to “short-cut” directly to individual behaviors.


      1. I like your company name. 🙂

        It can be difficult trying to call pre-existing step methods in a new Given method if these previous steps take parameters or even DataTables. I would be interested to see an actual code implementation for that. I imagine it can be convoluted. It is actually my main gripe when writing these purist gherkins. If there is a way around that I would like to know.


  2. Hi Andy,
    wouldn’t the Given-step “Google search results for ‘panda’ are shown” in the second scenario “Image Search” make the first scenario “Search from the search bar” somewhat redundant? I mean, I can see no way the second scenario would succeed with the first one failing (unless I assume that a search engine can “show search results for a search term” but not “show links related to search term”).
    After all, each scenario should be able to run in isolation, so wouldn’t this Given-step’s step definition be some kind of setup method that actually does what the first scenario does?

    Or did I get something wrong?


    1. Those two scenarios are written to cover separate but related behaviors. Indeed, the second behavior will not work if the first one failed. However, the first one is not redundant specification because it covers a behavior that is unique and separate.

      Nevertheless, when the scenarios are run as automated tests, there might be some duplicate execution. The step “Google search results for ‘panda’ are shown” essentially re-runs the first scenario as setup for the second. Although I don’t like that duplication, I find test case independence and separation of behaviors to be the greater concern. Plus, the setup step could do some optimization not available to the first scenario. The first scenario must exercise the behavior like a regular user by navigating to the Google home page, typing in a search phrase, and clicking the search button. The setup to the second scenario could optimize by using a direct search URL with the “q” parameter. That’s okay because the “regular” search was already tested.


      1. Thanks a lot and yes, I definitely did not consider the idea of a “shortcut” in the setup step of the second scenario.

        I also noticed another aspect I did not consider: If only the second scenario fails, I instantly know that only the image search is faulty.
        This sure is a big advantage if different teams are responsible for text search and image search, as I can assign different scenarios to different teams.


  3. Hi Andy. We are assessing adopting BDD in our organisation.

    One consideration is that we have to maintain assets on our applications, which traditionally means use cases.

    I’ve been looking at whether we can do a simple port from use case to scenario, on the basis:
    – Given = precondition
    – When = user step
    – Then = system step

    This implies multiple When-Then pairs, as we have in our use cases. It means that the ‘behaviour’ = goal – take out cash, change PIN, etc.

    I’ve created an example below. Let me know what you think.

    Dom

    Feature: Use ATM

    Scenario: Validate ID

    Given that I have an account
    And I’ve been issued with a card and PIN
    When I go to use an ATM
    Then the ATM will ask me for my PIN

    When I give it my PIN
    Then the ATM will validate my ID

    Scenario: Take out cash

    Given that the ATM has validated my ID
    When I ask to take out cash
    Then the ATM will ask me how much cash I want to take out

    When I say how much cash I want to take out
    Then the ATM will check that I’ve got enough money in my account
    And spit out the card
    And ask me to take back the card

    When I take the card
    Then the ATM will dispense the cash
    And ask me to take the cash


    1. Hi Dom,

      Here’s my question back to you: What are your goals in adopting BDD? That question extends into: What problems are you trying to solve? Are you looking to improve collaboration through behavior-driven practices? How important is test automation? How many people and projects will be affected?

      I’ll be direct. Based on what you shared, it looks like your org would simply be adding BDD buzzwords to existing practices and scenarios. I’m skeptical of the value in rewriting existing use cases using Given-When-Then if no further thought is given to the behaviors they intend to cover. My assumption is that your org wants to try BDD because it is having problems with collaboration and/or automation with existing practices. If that’s true, then masking old ways with new words won’t solve the problems.

      If the intention for adopting BDD is purely for collaboration, then rewriting existing use cases is probably nothing but a time tax. It would make more sense to me (not knowing more of your org’s situation) to write new scenarios using Gherkin and leave the old scenarios in place. Then, it would be easier to be guided by behavior-driven thought.

      If the intention is to automate these use cases using a BDD test framework, then I would be extremely hesitant to write the scenarios with multiple When-Then pairs for the reasons mentioned in the article. Automated tests become more fragile and their results become less conclusive with each additional step. New test automation will be a lot of work if you don’t have a framework already in place, so it would probably be better to do it right (writing better behavior scenarios) than to do it fast (just translating existing stuff to get it running).

      I hope this helps! Please feel free to share more info.

      Andy


  4. Is this right, like the Karate alternative?

    Feature: Schedule an appointment
    Scenario: I want to create an appointment

    Given user opens login form
    And system says the email is not registered
    When user clicks register button
    Then should popup a register form

    Given user see the register form
    When fill all the field
    And press the register button
    Then user should see a successful modal
    Examples:
    | required |
    | email |
    | name |

