BDD 101: Writing Good Gherkin

So, you and your team have decided to make test automation a priority. You plan to use behavior-driven development to shift left with testing. You read the BDD 101 Series up through the previous post. You picked a good language for test automation. You even peeked at Cucumber-JVM or another BDD framework on your own. That’s great! Big steps! And now, you are ready to write your first Gherkin feature file. You fire up Atom with a Gherkin plugin or Notepad++ with a Gherkin UDL, you type “Given” on the first line, and…

Writer’s block. How am I supposed to write my Gherkin steps?

Good Gherkin feature files are not easy to write at first. Writing is definitely an art. With some basic pointers, and a bit of practice, Gherkin becomes easier. This post will cover how to write top-notch feature files. (Check the Automation Panda BDD page for the full table of contents.)

The Golden Gherkin Rule: Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Proper Behavior

The biggest mistake BDD beginners make is writing Gherkin without a behavior-driven mindset. They often write feature files as if they are writing “traditional” procedure-driven functional tests: step-by-step instructions with actions and expected results. HP ALM, qTest, AccelaTest, and many other test repository tools store tests in this format. These procedure-driven tests are often imperative and trace a path through the system that covers multiple behaviors. As a result, they may be unnecessarily long, which can delay failure investigation, increase maintenance costs, and create confusion.

For example, let’s consider a test that searches for images of pandas on Google. Below would be a reasonable test procedure:

  1. Open a web browser.
    1. Web browser opens successfully.
  2. Navigate to https://www.google.com/.
    1. The web page loads successfully and the Google image is visible.
  3. Enter “panda” in the search bar.
    1. Links related to “panda” are shown on the results page.
  4. Click on the “Images” link at the top of the results page.
    1. Images related to “panda” are shown on the results page.

I’ve seen many newbies translate a test like this into Gherkin like the following:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

This scenario is terribly wrong. All that happened was that the author put BDD buzzwords in front of each step of the traditional test. This is not behavior-driven; it is still procedure-driven.

The first two steps are purely setup: they just go to Google, and they are strongly imperative. Since they don’t focus on the desired behavior, they can be reduced to one declarative step: “Given a web browser is at the Google home page.” This new step is friendlier to read.

After the Given step, there are two When-Then pairs. This is syntactically incorrect: Given-When-Then steps must appear in order and cannot repeat. A Given may not follow a When or Then, and a When may not follow a Then. The reason is simple: any single When-Then pair denotes an individual behavior. This makes it easy to see how, in the test above, there are actually two behaviors covered: (1) searching from the search bar, and (2) performing an image search. In Gherkin, one scenario covers one behavior. Thus, there should be two scenarios instead of one. Any time you want to write more than one When-Then pair, write separate scenarios instead. (Note: Some BDD frameworks may allow disordered steps, but it would nevertheless be anti-behavioral.)
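
To make the ordering rule concrete, here is a minimal Python sketch that checks the keyword order of a single scenario. It assumes each step is a plain string that starts with its keyword, and it treats And and But as continuations of the step type before them. The function follows_given_when_then is my own hypothetical helper, not part of any BDD framework.

```python
# Hypothetical checker, not part of any BDD framework.
PHASES = {"Given": 0, "When": 1, "Then": 2}


def follows_given_when_then(steps: list[str]) -> bool:
    """Return True if the steps never move backward in Given -> When -> Then order."""
    current = -1  # no step type seen yet
    for step in steps:
        keyword = step.split(maxsplit=1)[0]
        if keyword in ("And", "But"):
            if current < 0:
                return False  # And/But cannot open a scenario
            continue  # And/But inherit the previous step's type
        if keyword not in PHASES:
            raise ValueError(f"unknown step keyword: {keyword!r}")
        if PHASES[keyword] < current:
            return False  # e.g., a When after a Then, or a Given after a When
        current = PHASES[keyword]
    return True


bad = [
    "Given the user opens a web browser",
    'And the user navigates to "https://www.google.com/"',
    'When the user enters "panda" into the search bar',
    'Then links related to "panda" are shown on the results page',
    'When the user clicks on the "Images" link at the top of the results page',
    'Then images related to "panda" are shown on the results page',
]
print(follows_given_when_then(bad))  # False: the second When follows a Then
```

Note that the check only looks at keywords. It cannot tell whether a step was given the right type in the first place.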

This splitting technique also reveals unnecessary behavior coverage. For instance, the first behavior to search from the search bar may be covered in another feature file. I once saw a scenario with about 30 When-Then pairs, and many were duplicate behaviors.

Do not be tempted to arbitrarily reassign step types to make scenarios follow strict Given-When-Then ordering. Respect the integrity of the step types: Givens set up initial state, Whens perform an action, and Thens verify outcomes. In the example above, the first Then step could have been turned into a When step, but that would be incorrect because it makes an assertion. Step types are meant to be guide rails for writing good behavior scenarios.

The correct feature file would look something like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

The second behavior arguably needs the first behavior to run first, because the second needs to start at the search result page. However, since that is merely setup for the behavior of image searching and is not part of it, the Given step in the second scenario can simply declare that the “panda” search is already done. Of course, this means that the “panda” search would be run redundantly at test time, but the separation of scenarios guarantees behavior-level independence.
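
In the automation, that declarative Given step can simply reuse the same code as the first scenario. Below is a minimal Python sketch of the idea. FakeBrowser and the step function names are hypothetical stand-ins for real browser automation and step definitions, not any framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser driver."""
    url: str = ""
    history: list[str] = field(default_factory=list)

    def visit(self, url: str) -> None:
        self.url = url
        self.history.append(url)


def given_browser_at_google_home(browser: FakeBrowser) -> None:
    browser.visit("https://www.google.com/")


def when_user_enters_search(browser: FakeBrowser, phrase: str) -> None:
    # Simulated search: jump straight to the results URL.
    browser.visit(f"https://www.google.com/search?q={phrase}")


def given_search_results_shown(browser: FakeBrowser, phrase: str) -> None:
    # Declarative setup for the image-search scenario: it repeats the
    # first scenario's actions instead of depending on that scenario.
    given_browser_at_google_home(browser)
    when_user_enters_search(browser, phrase)


browser = FakeBrowser()  # each scenario gets its own fresh browser
given_search_results_shown(browser, "panda")
print(browser.url)  # https://www.google.com/search?q=panda
```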

The Cardinal Rule of BDD: One Scenario, One Behavior!

Remember, behavior scenarios are more than tests – they also represent requirements and acceptance criteria. Good Gherkin comes from good behavior.

(For deeper information about the Cardinal Rule of BDD and multiple When-Then pairs per scenario, please refer to my article, Are Gherkin Scenarios with Multiple When-Then Pairs Okay?)

Phrasing Steps

How you write a step matters. If you write a step poorly, it cannot easily be reused. Thankfully, some basic rules maintain consistent phrasing and maximum reusability.

Write all steps in third-person point of view. If first-person and third-person steps mix, scenarios become confusing. I even dedicated a whole blog post to this point: Should Gherkin Steps Use First-Person or Third-Person? TL;DR: just use third-person at all times.

Write steps as a subject-predicate action phrase. It may be tempting to leave parts of speech out of a step line for brevity, especially when using Ands and Buts, but partial phrases make steps ambiguous and more likely to be reused improperly. Consider the following example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google search result page elements
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then the results page shows links related to "panda"
    And image links for "panda"
    And video links for "panda"

The final two And steps lack the subject-predicate phrase format. Are the links meant to be subjects, meaning that they perform some action? Or, are they meant to be direct objects, meaning that they receive some action? Are they meant to be on the results page or not? What if someone else wrote a scenario for a different page that also had image and video links – could they reuse these steps? Writing steps without a clear subject and predicate is not only poor English but poor communication.
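
Reuse comes from how step definitions match step text: Cucumber-style frameworks bind each step line to code through a pattern. The Python sketch below is a simplified, hypothetical registry (the step decorator and run_step are my own names) showing why a complete subject-predicate phrase can be matched and reused, while a fragment like “image links for ‘panda’” matches nothing.

```python
import re
from typing import Callable, Optional

# Hypothetical registry of (compiled pattern, handler) pairs.
STEP_DEFINITIONS = []


def step(pattern: str) -> Callable:
    """Register a handler for step text that fully matches the regex pattern."""
    def register(func: Callable) -> Callable:
        STEP_DEFINITIONS.append((re.compile(pattern), func))
        return func
    return register


@step(r'the (\w+) page shows (\w+) links related to "([^"]*)"')
def page_shows_links(page: str, kind: str, phrase: str) -> str:
    return f"check {kind} links for {phrase!r} on the {page} page"


def run_step(text: str) -> Optional[str]:
    """Run the first matching step definition, or return None if none match."""
    for pattern, handler in STEP_DEFINITIONS:
        match = pattern.fullmatch(text)
        if match:
            return handler(*match.groups())
    return None


print(run_step('the results page shows image links related to "panda"'))
print(run_step('image links for "panda"'))  # None: a fragment matches nothing
```

The same definition also serves a scenario about video links on a different page, which is exactly the reuse the fragment steps give up.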

Also, use appropriate tense and phrasing for each type of step. For simplicity, use present tense for all step types. Rather than take a time warp back to middle school English class, let’s illustrate tense with a bad example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Simple Google search
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then links related to "panda" will be shown on the results page

The Given step above uses present tense, but its subject is misleading. It indicates an action when it says, “Given the user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. A better phrasing would be, “Given the Google home page is displayed.” It establishes a starting point for the scenario. Use present tense with an appropriate subject to indicate a state rather than an action.

The When step above uses past tense when it says, “The user entered.” This indicates that an action has already happened. However, When steps should indicate that an action is presently happening. Plus, past tense here conflicts with the tenses used in the other steps.

The Then step above uses future tense when it says, “Links related to ‘panda’ will be shown.” Future tense seems practical for Then steps because it indicates what the result should be after the current action is taken. However, future tense reinforces a procedure-driven approach because it treats the scenario as a time sequence. A behavior, on the other hand, is a present-tense aspect of the product or feature. Thus, it is better to write Then steps in the present tense.

The corrected example looks like this:

Feature: Google Searching

  Scenario: Simple Google search
    Given the Google home page is displayed
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

Note that all steps are written in third person. Read Should Gherkin Steps use Past, Present, or Future Tense? to learn more.

Good Titles

Good titles are just as important as good steps. The title is like the face of a scenario – it’s the first thing people read. It must communicate in one concise line what the behavior is. Titles are often logged by the automation framework as well. Specific pointers for writing good scenario titles are given in my article, Good Gherkin Scenario Titles.

Choices, Choices

Another common misconception for beginners is thinking that Gherkin has an “Or” step for conditional or combinatorial logic. People may presume that Gherkin has “Or” because it has “And”, or perhaps programmers want to treat Gherkin like a structured language. However, Gherkin does not have an “Or” step. When automated, every step is executed sequentially.

Below is a bad example based on a classic Super Mario video game, showing how people might want to use “Or”:

# BAD EXAMPLE! Do not copy.
Feature: SNES Mario Controls

  Scenario: Mario jumps
    Given a level is started
    When the player pushes the "A" button
    Or the player pushes the "B" button
    Then Mario jumps straight up

Clearly, the author’s intent is to say that Mario should jump when the player pushes either of two buttons. The author wants to cover multiple variations of the same behavior. To do this the right way, use a Scenario Outline, as shown below:

Feature: SNES Mario Controls

  Scenario Outline: Mario jumps
    Given a level is started
    When the player pushes the "<letter>" button
    Then Mario jumps straight up
    
    Examples: Buttons
      | letter |
      | A      |
      | B      |
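
Conceptually, a framework runs a Scenario Outline once per Examples row, replacing each <placeholder> with that row’s value. The Python sketch below models that expansion in a simplified way. expand_outline is a hypothetical helper, not how any particular framework is implemented.

```python
import re

PLACEHOLDER = re.compile(r"<([^<>]+)>")


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Simplified, hypothetical model: one concrete scenario per Examples row."""
    scenarios = []
    for row in examples:
        # Replace every <name> in each step with the row's value for that column.
        concrete = [PLACEHOLDER.sub(lambda m: row[m.group(1)], s) for s in steps]
        scenarios.append(concrete)
    return scenarios


outline = [
    "Given a level is started",
    'When the player pushes the "<letter>" button',
    "Then Mario jumps straight up",
]
for scenario in expand_outline(outline, [{"letter": "A"}, {"letter": "B"}]):
    print(scenario[1])
# When the player pushes the "A" button
# When the player pushes the "B" button
```

Each expanded scenario still follows Given-When-Then order, so there is no need for an “Or” step.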

The Known Unknowns

Test data can be difficult to handle. Sometimes, it may be possible to seed data in the system and write tests to reference it, but other times, it may not. Google search is the prime example: the result list will change over time as both Google and the Internet change. To handle the known unknowns, write scenarios defensively so that changes in the underlying data do not cause test runs to fail. Furthermore, to be truly behavior-driven, think about data not as test data but as examples of behavior.

Consider the following example from the previous post:

Feature: Google Searching
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

This scenario uses a step table to explicitly name results that should appear for a search. The step with the table would be implemented to iterate over the table entries and verify that each appears in the result list. However, what if Panda Express were to go out of business and thus no longer be ranked as high in the results? (Let’s hope not.) The test run would then fail, not because the search feature is broken, but because a hard-coded variation became invalid. It would be better to write a step that more intelligently verifies that each returned result somehow relates to the search phrase, like this: “And links related to ‘panda’ are shown on the results page.” The step definition implementation could use regular expression matching to verify the presence of “panda” in each result link.
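
Here is a minimal Python sketch of that more defensive check. It assumes the step definition has already collected the result link titles as strings (how it scrapes them is out of scope), and all_results_related is a hypothetical name.

```python
import re


def all_results_related(result_titles: list[str], phrase: str) -> bool:
    """Hypothetical helper: True if there are results and every one mentions the phrase."""
    # re.escape keeps characters like "+" or "." in the phrase literal.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return bool(result_titles) and all(pattern.search(t) for t in result_titles)


titles = ["Panda Express", "Giant panda - Wikipedia", "Panda videos"]
print(all_results_related(titles, "panda"))  # True
```

Unlike the step table, this check keeps passing when the specific results change, as long as they stay related to the phrase. Requiring every single result to mention the phrase is an assumption; a real suite might need a looser rule.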

Another nice feature of Gherkin is that step definitions can hide data in the automation when it doesn’t need to be exposed. Step definitions may also pass data to future steps in the automation. For example, consider another Google search scenario:

Feature: Google Searching

  Scenario: Search result linking
    Given Google search results for "panda" are shown
    When the user clicks the first result link
    Then the page for the chosen result link is displayed

Notice how the When step does not explicitly name the value of the result link – it simply says to click the first one. The value of the first link may change over time, but there will always be a first link. The Then step must know something about the chosen link in order to successfully verify the outcome, but it can simply reference it as “the chosen result link”. Behind the scenes, in the step definitions, the When step can store the value of the chosen link in a variable and pass the variable forward to the Then step.
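
A minimal Python sketch of that hand-off, assuming the framework gives each scenario a fresh context object shared by its step definitions. ScenarioContext and the step functions are hypothetical; real frameworks provide their own mechanism, such as dependency injection in Cucumber-JVM or the context object in behave.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioContext:
    """Hypothetical per-scenario state shared by the step definitions."""
    results: list[str] = field(default_factory=list)  # filled by the Given step
    current_url: str = ""
    chosen_link: str = ""


def when_user_clicks_first_result(ctx: ScenarioContext) -> None:
    # Store whichever link happens to be first; its value may change over time.
    ctx.chosen_link = ctx.results[0]
    ctx.current_url = ctx.chosen_link  # simulated navigation


def then_chosen_result_page_is_displayed(ctx: ScenarioContext) -> None:
    # The Then step reads the stored value instead of a hard-coded URL.
    assert ctx.current_url == ctx.chosen_link, "chosen result page not displayed"


ctx = ScenarioContext(results=["https://example.com/pandas", "https://example.org/bears"])
when_user_clicks_first_result(ctx)
then_chosen_result_page_is_displayed(ctx)
```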

Handling Test Data

Some types of test data should be handled directly within the Gherkin, but other types should not. Remember that BDD is specification by example – scenarios should be descriptive of the behaviors they cover, and any data written into the Gherkin should support that descriptive nature. Read Handling Test Data in BDD for comprehensive information on handling test data.

Less is More

Scenarios should be short and sweet. I typically recommend that scenarios should have a single-digit step count (<10). Long scenarios are hard to understand, and they are often indicative of poor practices. One such problem is writing imperative steps instead of declarative steps. I have touched on this topic before, but I want to thoroughly explain it here.

Imperative steps state the mechanics of how an action should happen. They are very procedure-driven. For example, consider the following When steps for entering a Google search:

  1. When the user scrolls the mouse to the search bar
  2. And the user clicks the search bar
  3. And the user types the letter “p”
  4. And the user types the letter “a”
  5. And the user types the letter “n”
  6. And the user types the letter “d”
  7. And the user types the letter “a”
  8. And the user types the ENTER key

Now, the granularity of actions may seem like overkill, but it illustrates the point that imperative steps focus very much on how actions are taken. Thus, they often need many steps to fully accomplish the intended behavior. Furthermore, the intended behavior is not always as self-documented as with declarative steps.

Declarative steps state what action should happen without providing all of the information for how it will happen. They are behavior-driven because they express action at a higher level. All of the imperative steps in the example above could be written in one line: “When the user enters ‘panda’ into the search bar.” The scrolling and keystrokes are implied, and they will ultimately be handled by the automation in the step definition. When trying to reduce step count, ask yourself if your steps can be written more declaratively.
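
To make that concrete, here is a minimal Python sketch of a declarative step definition that performs the imperative mechanics internally. RecordingDriver is a hypothetical stand-in that only records actions; a real suite would call a browser automation library instead.

```python
class RecordingDriver:
    """Hypothetical driver that records low-level actions instead of driving a browser."""

    def __init__(self) -> None:
        self.actions: list[str] = []

    def click(self, element: str) -> None:
        self.actions.append(f"click {element}")

    def press(self, key: str) -> None:
        self.actions.append(f"press {key}")


def when_user_enters_search(driver: RecordingDriver, phrase: str) -> None:
    """One declarative Gherkin step; the keystrokes live here, not in the feature file."""
    driver.click("search bar")
    for letter in phrase:
        driver.press(letter)
    driver.press("ENTER")


driver = RecordingDriver()
when_user_enters_search(driver, "panda")
print(len(driver.actions))  # 7: one click, five letters, ENTER
```

The feature file stays at one readable line, while the step definition absorbs the keystroke-level detail.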

Another reason for lengthy scenarios is scenario outline abuse. Scenario outlines make it all too easy to add unnecessary rows and columns to their Examples tables. Unnecessary rows waste test execution time. Extra columns indicate complexity. Both should be avoided. Below are questions to ask yourself when facing an oversized scenario outline:

  • Does each row represent an equivalence class of variations?
    • For example, searching for “elephant” in addition to “panda” does not add much test value.
  • Does every combination of inputs need to be covered?
    • N columns with M inputs each generate M^N possible combinations.
    • Consider making each input appear only once, regardless of combination.
  • Do any columns represent separate behaviors?
    • This may be true if columns are never referenced together in the same step.
    • If so, consider splitting apart the scenario outline by column.
  • Does the feature file reader need to explicitly know all of the data?
    • Consider hiding some of the data in step definitions.
    • Some data may be derivable from other data.
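
The arithmetic behind the combinations question is easy to check. This Python sketch uses itertools.product for full coverage and a simple hypothetical alternative, each_input_once, which cycles through each column so every value appears in at least one row. The column names and values are made up for illustration.

```python
from itertools import product


def full_combinations(columns: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Every combination: N columns with M values each yields M**N rows."""
    return list(product(*columns.values()))


def each_input_once(columns: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Hypothetical alternative: one row per value of the longest column."""
    longest = max(len(values) for values in columns.values())
    return [
        tuple(values[i % len(values)] for values in columns.values())
        for i in range(longest)
    ]


columns = {
    "browser": ["chrome", "firefox", "safari"],
    "locale": ["en", "de", "fr"],
    "account": ["guest", "member", "admin"],
}
print(len(full_combinations(columns)))  # 27, i.e. 3**3
print(len(each_input_once(columns)))  # 3
```

Whether the reduced set is enough depends on whether the inputs actually interact; that is the judgment call the questions above are meant to prompt.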

These questions are meant to be sanity checks, not hard-and-fast rules. The main point is that scenario outlines should focus on one behavior and use only the necessary variations.

Style and Structure

While style often takes a backseat during code review, it is a factor that differentiates good feature files from great feature files. In a truly behavior-driven team, non-technical stakeholders will rely upon feature files just as much as the engineers. Good writing style improves communication, and good communication skills are more than just resume fluff.

Below are a number of tidbits for good style and structure:

  1. Focus a feature on customer needs.
  2. Limit one feature per feature file. This makes it easy to find features.
  3. Limit the number of scenarios per feature. Nobody wants a thousand-line feature file. A good measure is a dozen scenarios per feature.
  4. Limit the number of steps per scenario to fewer than ten.
  5. Limit the character length of each step. Common limits are 80-120 characters.
  6. Use proper spelling.
  7. Use proper grammar.
  8. Capitalize Gherkin keywords.
  9. Capitalize the first word in titles.
  10. Do not capitalize words in the step phrases unless they are proper nouns.
  11. Do not use punctuation (specifically periods and commas) at the end of step phrases.
  12. Use single spaces between words.
  13. Indent the content beneath every section header.
  14. Separate features and scenarios by two blank lines.
  15. Separate examples tables by one blank line.
  16. Do not separate steps within a scenario by blank lines.
  17. Space table delimiter pipes (“|”) evenly.
  18. Adopt a standard set of tag names. Avoid duplicates.
  19. Write all tag names in lowercase, and use hyphens (“-”) to separate words.
  20. Limit the length of tag names.
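
Several of these rules are mechanical enough to check automatically. The Python sketch below covers only three of them (step length, trailing punctuation on steps, and lowercase hyphenated tags). lint_line is a hypothetical helper, and the 120-character limit is just the upper end of the range above.

```python
import re

STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")
TAG = re.compile(r"@[a-z0-9]+(?:-[a-z0-9]+)*")


def lint_line(line: str, max_step_length: int = 120) -> list[str]:
    """Hypothetical helper: return style problems found on one feature-file line."""
    problems = []
    text = line.strip()
    if text.startswith(STEP_KEYWORDS):
        if len(text) > max_step_length:
            problems.append("step is too long")
        if text.endswith((".", ",")):
            problems.append("step ends with punctuation")
    elif text.startswith("@"):
        for tag in text.split():
            if not TAG.fullmatch(tag):
                problems.append(f"tag {tag} is not lowercase and hyphenated")
    return problems


print(lint_line("Given a Web Browser is on the Google page,"))
# ['step ends with punctuation']
print(lint_line("@AUTOMATE @automated"))
# ['tag @AUTOMATE is not lowercase and hyphenated']
```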

Without these rules, you might end up with something like this:

# BAD EXAMPLE! Do not copy.

 Feature: Google Searching
     @AUTOMATE @Automated @automation @Sprint32GoogleSearchFeature
 Scenario outline: GOOGLE STUFF
Given a Web Browser is on the Google page,
 when The seach phrase "<phrase>" Enter,

 Then  "<phrase>" shown.
and The relatedd   results include "<related>".
Examples: animals
 | phrase | related |
| panda | Panda Express        |
| elephant    | elephant Man  |

Don’t do this. It looks horrible. Please, take pride in your profession. While the automation code may look hairy in parts, Gherkin files should look elegant.

Gherkinize Those Behaviors!

With these best practices, you can write Gherkin feature files like a pro. Don’t be afraid to try: nobody does things perfectly the first time. As a beginner, I broke many of the guidelines I put in this post, but I learned as I went. Don’t give up if you get stuck. Always remember the Golden Gherkin Rule and the Cardinal Rule of BDD!

This is the last of three posts in the series focused exclusively on Gherkin. The next post will address how to adopt behavior-driven practices into the Agile software development process.

181 comments

  1. Hi Andy,

    Loved reading this; just a few questions, if I may.

    We tend to have some discussions about the AC being too detailed. The development team don’t want to have to read rows and rows of scenarios to find out what they are actually developing, but testing wants to make sure they are testing something from start to finish. Do you have any suggestions?

  2. I feel like your examples are a bit forced and only applicable in unit test scenarios.
    For example, in your first example, where you separated the scenario into two, why is the first one needed? Obviously, the second one’s Given should call all the methods and do the same as the first one… It’s okay if that function is already tested, but 100% coverage is mostly applicable to unit tests.
    If you have long, performance-heavy steps in your test, like login with authentication, then separating tests without limit will make your automation take too long to be worth it… Splitting one test that runs for about 5 minutes into 10 tests that run for 30 minutes (multiplying the slow steps) isn’t worth it from an automation standpoint. In this case, your dream of one When-Then pair sounds good at the unit test level, but above that, where the performance of the software heavily affects testing time, unnecessarily multiplying steps is no longer practical.

    1. Be careful in how you define terms. I would not define the tests you mention as “unit tests” – please see my “Testing” page.

      The reason to separate the behaviors is for understanding them, communicating them, and covering them. Yes, there is a bit of a performance hit, but running tests in parallel with an optimized framework makes it not so painful, and the test results are MUCH easier to triage and explain.

      If you are doing time-heavy testing, then you should find ways to optimize. You may also want to read the later article in this series about “lengthy end-to-end tests”.

      1. In that exact scenario, global hooks would be the tool of optimization. (log in before, log out after). It’s always good to have good models to follow, but sometimes you have to break them. Just don’t stray too far

        For the java version, these are the current options:

        Cucumber-JVM Global Hook Workarounds

  3. Andy,

    How would you write Gherkin test cases for, uh… visual checks? Would that be appropriate?

    For example:

    Given that the user is on a page that displays a table
    When the user views the tableheaders
    Then the tableheaders are labeled as follows:

    Column 1
    Column 2
    Column 3

    ————

    Is this a valid/sensible thing to do with Gherkin?

  4. Thank you, Sir, for such a useful article. It’s hard to find this kind of quality content about testing and BDD on the Internet.

  5. Great article, Andy. Any thoughts on balancing happy path vs. exception path at different levels of Fowler’s Testing Pyramid? I have seen organisations testing (more than I would care for) at the UI automation level, which, combined with the Golden Rule (One Scenario One Test), has led to excessive test run times.

    1. Do what makes sense. Focus on ROI. Test as close to the code as possible. The higher in the pyramid, the more focus on happy paths and the less focus on exception paths.

  6. Great post again, thanks!

    I have the following question: do feature files support imports or variable declarations? Let’s say that the value in my example table is not simply “A” or “B” but, for example, an SQL query. Is that possible?

    1. You could put a SQL query into a step or example table, but that’s not recommended. Gherkin is meant to be a descriptive high-level spec language, not a programming language. It would be better to put SQL queries in the automation code.

  7. Hi Andy

    Great article, I am wondering about how one should try to keep things DRY with Gherkin. I feel like all my scenarios and feature files for that matter should be able to run independently. Without relying on the preceding scenario or feature for example. How does one go about this? Take this trivial example, Say a user is signing up for an account and we want to check that they can’t use an already taken email. So they enter, name, address, email (already taken). We validate the email taken error message is shown. Our next scenario, we shouldn’t just pick up where the last scenario finished, so we should redo all the steps of the previous scenario, so it runs independently, but change the email and confirm the registration works. Is this how it should be done, or are there better ways to maintain DRY without tight coupling of scenarios?

    1. Try this:

      Scenario: One
        Given start
        When A
        And B
        And C
        Then point one

      Scenario: Two
        Given point one is already reached
        When D
        Then point two

      “Given point one is already reached” can cover A, B, and C in the automated step definition. Yes, the automation will repeat certain operations, but test case independence is worth it.

      1. What if in Scenario one I did things that alter the state of page and for scenario two I want it to be altered in different way, can I just say that in Given, or I need different scenario about it.
        E.G.
        Scenario 01
        Given start
        When the user selects checkbox A
        Then point one

        Scenario 02
        Given point one is already reached (this indicates that A checkbox is selected) which I don’t want. I want only B checkbox to be checked, so can I write
        Given B checkbox is selected (from a manual point of view it’s fine, I guess, but what about automation? There is a loophole between scenarios: you can’t rerun scenario 01 to reach scenario 02’s Given state. Do you need a different scenario in between?) Consider that this is a simple example, and the loophole between scenarios can be even bigger if things are more complicated.
        So I guess the main question is, Can Given state be reached by automation even if we don’t have exact route for it written in different scenarios ?

      2. Remember, each scenario is independent of each other. The second scenario has no dependency on the first. You can write your Given steps to establish precisely the state you want.

  8. I’d love to get your opinion on something around the use of Gherkin.

    Probably best to set the scene first (as I know a number of rules have been broken but it’s what I’ve inherited so I’m being forced to work with what I have currently).

    I’ve been dropped into a 3-year-old attempt at test automation on a monolithic application that is currently being refactored/decoupled into a microservices architecture.

    There is a reliance on test automation to highlight if any changes are breaking parts of the system, due to the high level of coupling in the application itself it’s entirely possible that some change in one area can break something totally unrelated elsewhere.

    So every night there is a full run of the entire suite of UI Regression tests that they have. So there is a load of UNIT tests and around 89 UI Scenarios that take around 5 hours to run.

    The initial run of UI Test automation that was built ran fully sequentially and every test relied on the tests before it to have run successfully or the whole house of cards came falling down.

    Done extensive work to isolate all these tests with the exception of a common setup suite that must run first.

    They rely on an hour-long setup suite that creates the test environment from scratch by using the UI. At the end of this, we checkpoint the database, and this checkpoint is restored between each scenario. We are on Windows 7 (moving to 10 soon), so there is no real containerisation and no cloud service at the moment (although MS Azure is in the pipeline), and all tests run against the same database, so no real parallelisation is available to us at the moment.

    So now at least after the setup has completed once and we have a database backup file that we can restore from then any of the other tests can be run in isolation and don’t rely on anything else.

    However, we now have totally isolated scenarios running through the UI which often repeat a number of steps (which we have in the background).

    I’m totally onboard with the idea of this background being repeated for each test to allow them to remain totally isolated in principle.

    However the fact that everything runs through the UI at the moment and these backgrounds repeat cause the feature file time at points to exceed 20 minutes.

    What’s your thoughts on swapping these backgrounds to a Scenario: which is run once at the start of the feature file to set up the state for the following scenarios then removing the checkpoint restores until the feature file is complete.

    I know this sounds like bad practice but I’m trying to wrestle the idea of good practice with acceptable turnaround times so the test packs actually have some use to developers (who aren’t feasibly going to run a 1.5 hour pack to prove their changes).

    And until we start leveraging API level testing, cloud, containers, parallelisation then I can’t see an alternative.

    So in short, would you advocate breaking the rules of good gherkin in the short term to achieve good turnaround times provided this is captured and addressed when viable alternative present themselves to maintain (or improve) turnaround while bringing the approach back in line with good gherkin practices.

    (Sorry for the length of this).

    1. This is not a Gherkin problem. This is a test data problem.

      You need to manage “ready state.” Please read: https://automationpanda.com/2017/08/05/handling-test-data-in-bdd/

      Depending upon the core framework, you should be able to use hooks to handle ready state at the appropriate level. Since you are not doing full BDD, bend the framework to do what you need it to do. Please read: https://automationpanda.com/2018/09/04/behavior-driven-blasphemy/

      If you do parallel testing, make sure to avoid collisions with any shared systems or data. Please read: https://automationpanda.com/2018/01/21/to-infinity-and-beyond-a-guide-to-parallel-testing/

  9. Hi, first of all, thank you very much for your excellent blog. My question is about edge cases. I work with Ruby on Rails and I do a lot of automation tests with a tool called RSpec. When I develop a form for user login, for example, I have to write a test that validates this behavior and shows a message to the user if any field is blank. This is an edge case. What level of granularity for testing edge cases should I adopt when working with Cucumber? Can you give an example or a link referencing this topic? Thank you in advance.

    1. That’s up to you. The Cucumber framework will let you write scenarios however you want to write them. If you believe the edge case is worth covering, then I suggest trying to write a Gherkin scenario for it and see how it turns out. Edge cases are behaviors, too!

  10. Hi Andy,

    How can I write a smoke test that navigates through multiple pages in Cucumber BDD format using the suggested Given…When…Then format?
    Consider the scenario of purchasing something from Amazon, where the user starts with a product search, selects a product from the search results, enters payment and delivery details, and gets the purchase confirmation. As you can see, there are at least four actions to perform before getting to the expected result of ‘successful confirmation’.
    Can you please let me know how to achieve this using a single Given…When…Then format?

  11. How do you handle scenarios that require complex setup, where the specific state of the entities can’t be summarized in a short Given? For example, consider a banking application where we want to know if multiple transactions across multiple accounts behave appropriately, given that each of those transactions has a unique state that affects the outcome of the test, and given that an API call mock returns the correct output for the transactions in their particular states.

    1. There are a few ways to handle that situation.

      One: generalize. Do you really need to specify all that detail in Gherkin? Or can you say something more generic like, “Given multiple accounts have such-and-such transactions completed”?

      Two: list out all steps. Your scenarios might be longer, but if you need that level of specificity, then do it.

      Three: don’t use Gherkin. API mocks aren’t something that should be specified in Gherkin. They’re an automation detail. Either go to point #1 or don’t use Gherkin for that type of scenario.
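
      To picture point One, here is a hypothetical Python sketch (plain code, not tied to any BDD framework): the Gherkin step stays general, such as “Given multiple accounts have completed transactions”, while its step definition builds the detailed account state and configures the API mock. `Account`, `given_accounts_with_completed_transactions`, and `fetch_statuses` are invented names for illustration.

```python
from dataclasses import dataclass, field
from unittest.mock import Mock


@dataclass
class Account:
    # Hypothetical account model used only for this sketch.
    account_id: str
    transactions: list = field(default_factory=list)


def given_accounts_with_completed_transactions(count, per_account):
    """Backs a general step like 'Given multiple accounts have completed transactions'.

    The detailed state (IDs, amounts, statuses) lives here, not in the Gherkin text.
    """
    accounts = []
    for i in range(count):
        account = Account(account_id=f"ACC-{i}")
        for j in range(per_account):
            account.transactions.append(
                {"id": f"TX-{i}-{j}", "amount": 100 + j, "status": "completed"}
            )
        accounts.append(account)

    # The API mock is an automation detail, so it is configured in code too.
    api = Mock()
    api.fetch_statuses.return_value = {
        tx["id"]: tx["status"] for a in accounts for tx in a.transactions
    }
    return accounts, api


accounts, api = given_accounts_with_completed_transactions(count=2, per_account=3)
statuses = api.fetch_statuses()
print(len(statuses))  # 6
```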

  12. It took me a while to read all the comments, but I really enjoyed the article. It proved to be very helpful to me, and I am sure to all the commenters here! It’s always nice when you can not only be informed, but also entertained!

  13. I’m unsure as to what people mean when they say scenarios must be ‘independent’.

    Surely the Given step is another way of expressing dependency on a previous action(?)

    A very simple example is below. The second scenario is dependent on the first scenario having happened. Is this the wrong way to do it?

    Scenario: Newsletter sign-up with a valid email address
    Given I am on the newsletter sign-up page
    And I have entered a valid email address
    When I hit the submit button
    Then I should see a confirmation message
    And I should receive a welcome email

    Scenario: Navigating back to the home page from the Newsletter sign-up page
    Given I have submitted a valid email address
    And I can see the confirmation message
    When I navigate to the home page
    Then I end up on the home page

    1. Test case “independence” means that one test case does not require another test case to run first. An independent test case is entirely self-contained. Its own setup is sufficient to complete itself.

      In the two scenarios you shared, “Given I have submitted a valid email address” is not a dependency. It is a setup step. It says, “Start the test here”. The second scenario could run even if the first scenario didn’t run. (Of course, this Given step would probably repeat many of the same actions, and any failure discovered by the first test would likely cause the second test to fail.) Test case independence addresses dependencies in terms of setup, not in terms of coverage.
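
      A hypothetical Python sketch of that distinction, assuming an invented in-memory `NewsletterApp` in place of the real system: each scenario creates its own fresh state, and the second scenario’s Given step performs the sign-up itself as setup instead of relying on the first scenario having run.

```python
class NewsletterApp:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self.page = "home"
        self.subscribers = []
        self.message = None

    def open_signup_page(self):
        self.page = "signup"

    def submit_email(self, email):
        # Only accept addresses that look valid (deliberately simplistic).
        if "@" in email:
            self.subscribers.append(email)
            self.message = "Thanks for signing up!"

    def go_home(self):
        self.page = "home"


def scenario_sign_up():
    app = NewsletterApp()  # fresh state: no dependency on other scenarios
    app.open_signup_page()
    app.submit_email("panda@example.com")
    assert app.message == "Thanks for signing up!"


def scenario_navigate_home_after_sign_up():
    app = NewsletterApp()  # its own setup, not leftovers from scenario_sign_up
    # Given I have submitted a valid email address (setup, repeated here)
    app.open_signup_page()
    app.submit_email("panda@example.com")
    assert app.message is not None
    # When I navigate to the home page
    app.go_home()
    # Then I end up on the home page
    assert app.page == "home"


# Either scenario can run alone or in any order.
scenario_navigate_home_after_sign_up()
scenario_sign_up()
```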

  14. Most of the examples I see are very short scenarios, with 4-5 keywords used per test case, but I am trying to adapt this to an automotive hardware testing environment, where test cases need more steps, and I am wondering whether BDD is really suitable for this application.

    Perhaps I’m not fully understanding how to use BDD. My keywords are used to build a test script that is executed in a separate environment. Any tips in this regard would be really helpful.

    1. If you can describe it, then you can test it. Try writing out your scenarios the way you think they should be written, and then ask yourself some questions:

      Does each scenario cover one unique, independent behavior?
      Do scenarios look like walls of text?
      Do scenarios read more like they are accomplishing a goal or more like a bunch of clicks and typing?
      Do any scenarios have repeated When-Then pairs?

      Unfortunately, I don’t know the specifics about your domain or environments, so I can’t provide much more advice without more info. Feel free to email me directly through the contact page.

  15. Hi Andy,

    Do you know if Cucumber has something “like” @AfterStep but not @After? The reason I’m asking is that when @After methods are consolidated in Multiple Reports in cucumber cukes, Hooks are not part of it. So if we put softassert.assertAll() in @After, it is included in Hooks, meaning Multiple Reports can’t see it, and thus the Scenario is considered PASS.

    Thanks,
    AC

    1. Hi AC! I’m not sure about the context of your question. Why are you putting softassert.assertAll() in a hook, and not inside a step?

      Cucumber-JVM has both @AfterStep and @After hooks. What do you mean when you say “like” @AfterStep but not @After?

      1. Thanks for replying!

        We normally make our softAssert.assertTrue calls (for example) per step, but to assert them all (the assertAll method), we put it in the @After hook so that it checks the soft assertions from all the different steps. But with it in @After, our Multiple Reports (cucumber cukes) output only includes the Given, When, and Then steps and ignores the Hooks, so the scenario is considered passed. That’s why we are looking for something that runs after every step but is not @After, so that the scenario is still reported as failed.

      2. Using soft assertions across multiple steps is not a best practice.

        If one step needs to make a few assertions inside itself, that’s okay – soft assertions can be used internally. However, when an individual step completes, then it should give a clear result of PASS or FAIL. Its assertion results should not be punted forward to a future step or “after” hook. Otherwise, a test report would show that step as passing with future steps as failing. Triaging that failure would be more difficult because the report would be unintuitive about the location of the failure.

        The struggle you have right now is due to the fact that you are fighting the design of the Cucumber framework. Trying to make soft assertions across steps is hacky, and therefore it doesn’t work as you would like. I strongly recommend reconsidering this approach. If you nevertheless insist on soft assertions across steps, then you will need to share the soft assertions object between steps using Cucumber’s dependency injection pattern and make the assertion calls directly in the steps. That’s a bit more complicated, and it also forces steps to be “chained” together, which limits their future reusability.
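
        A minimal sketch of the recommended shape, written in Python for brevity: soft checks are collected inside a single step, and that same step calls `assert_all()` before returning, so the report marks this step (not a later hook) as the failure. `SoftAssertions` and the step function are hand-rolled, hypothetical stand-ins, not TestNG’s `SoftAssert` or any Cucumber API.

```python
class SoftAssertions:
    """Hand-rolled collector of soft checks (hypothetical, for illustration)."""

    def __init__(self):
        self.failures = []

    def check(self, condition, message):
        # Record a failure instead of raising immediately.
        if not condition:
            self.failures.append(message)

    def assert_all(self):
        # Raise once, listing every failed check.
        if self.failures:
            raise AssertionError("; ".join(self.failures))


def then_the_order_summary_is_correct(summary):
    # Soft assertions are scoped to this one step...
    soft = SoftAssertions()
    soft.check(summary.get("items") == 3, "expected 3 items")
    soft.check(summary.get("total") == 42.0, "expected total 42.0")
    soft.check(summary.get("status") == "confirmed", "expected status 'confirmed'")
    # ...and resolved before the step ends, so the step itself passes or fails.
    soft.assert_all()


then_the_order_summary_is_correct({"items": 3, "total": 42.0, "status": "confirmed"})
```

        Because nothing is carried over to an @After hook, no shared soft-assertion object has to be injected across steps.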

  16. Hi.

    I really appreciate your blog. Thank you and congratulations.

    Please point me in the right direction.

    I use RSpec (for unit tests) with Cucumber (for e2e tests) in my project.

    I have decided that Cucumber must be used for business rules and RSpec must be used for low-level tests (with many different scenarios).

    With that in mind, I got stuck on a conceptual question.

    My project is a knife store site, and a new business rule has emerged: An order is not valid if it has fewer than 6 knives from at least 2 different models.

    I put these validations in the model with their unit tests (RSpec), and everything is fine.

    The question is: should I put this test in Cucumber too (since it is a business rule test)? If yes, should I remove the unit test or keep this duplication (Cucumber and RSpec tests)?

    thanks in advance

    (sorry for my English)

    1. Hi Nonnis! This is a great question. Let me ask a question back: Is the risk that this feature has a regression significantly covered by the unit test? If the answer is yes, then there is no need to create an end-to-end test for it as well. Unit tests are much less costly to maintain and execute than end-to-end tests. If risk can be acceptably mitigated by a test at the unit level, then do not increase the burden of ownership with a duplicative end-to-end test.

      The only reason to add an end-to-end test for this feature would be if the feature includes extra behavior that can’t be tested at the unit test level but should be tested. For example, a unit test might not be able to cover whether the invalid order triggers a system alert or brings up a new page.
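
      For illustration, a unit-level check of the knife-order rule might look like the hypothetical Python sketch below (the commenter uses RSpec; the idea carries over). The thread states the rule both as “less than 6” and “more than 6”, so this sketch assumes an order needs at least 6 knives in total from at least 2 distinct models; `MIN_KNIVES`, `MIN_MODELS`, and `order_is_valid` are invented names.

```python
from collections import Counter

MIN_KNIVES = 6  # assumed threshold ("at least 6"); the thread is ambiguous here
MIN_MODELS = 2


def order_is_valid(line_items):
    """line_items: list of (model, quantity) pairs from a hypothetical order model."""
    quantities = Counter()
    for model, quantity in line_items:
        quantities[model] += quantity
    total = sum(quantities.values())
    # Count only models that actually have knives in the order.
    models = sum(1 for q in quantities.values() if q > 0)
    return total >= MIN_KNIVES and models >= MIN_MODELS


# Unit-level examples: cheap to run, and they exercise the rule directly.
assert order_is_valid([("Chef", 6), ("Santoku", 7)])
assert not order_is_valid([("Chef", 13)])                # only one model
assert not order_is_valid([("Chef", 2), ("Paring", 3)])  # only 5 knives
```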

      1. Hi Andy, thank you very much for your response.

        Yes, the unit test covers this feature very well.

        My question is related to the use of Example Mapping.

        I use Example Mapping to discover and specify requirements.

        I have this rule/example:

        ———————————————————-
        RULE: An order must have more than 6 knives from at least 2 different models

        SCENARIO: User requests a valid Order

        Given that the user is on the New Order page
        And the user selects 6 knives of
        And the user selects 7 knives of
        When the user submits the Order
        Then the order is created
        And the user sees a success message
        ———————————————————-

        The unit test that I wrote asserts this rule: “6 knives from at least 2 different models”.

        But I wonder whether it is wrong to write this Gherkin and not use it with Cucumber (since I’m using a unit test).

        I’m also testing 20 other scenarios (rule/example) related to the creation of an Order (the Order must have an address, a credit card, a valid user, etc.).

        With unit tests, these 20 scenarios are easier to write and maintain.

        Maybe the main question is: Should I write a Cucumber test for each rule/example discovered in the Example Mapping session? Or, how do I decide whether a rule/example should be tested only with a unit test?

        Thanks in advance, man!

      2. It’s okay to skip automating some scenarios. In a behavior-driven development process, “discovery” leads to definition, implementation, and testing. Example Mapping is an activity to help discover behaviors. It is a useful activity all by itself. Using the cards that are discovered, teams can then “define” behaviors more clearly using Gherkin. If appropriate, teams can then automate tests for the scenarios that are defined. However, the team may also decide that black box tests aren’t needed for some scenarios because they’re covered by unit tests or because automating them is not deemed worthwhile. Doing Example Mapping and writing Gherkin scenarios were still beneficial because they facilitated good collaboration and helped the team make an informed decision about implementations (product and test).

    1. Background sections are simply a set of steps that are executed before each scenario in the feature file. I recommend including no more than a few steps in a Background section. Try to keep Background steps as Given steps, as well.
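
      As a rough, hypothetical Python sketch of what that means at run time: the Background steps behave as if they were prepended to every scenario in the feature, so they run once per scenario rather than once per feature. The `expand_feature` helper and the example steps are invented for illustration; a real framework parses them from the feature file.

```python
def expand_feature(background, scenarios):
    """Return the steps each scenario actually runs: Background first, then its own."""
    return {name: list(background) + list(steps) for name, steps in scenarios.items()}


background = ["Given the user is logged in"]
scenarios = {
    "Add item to cart": [
        "When the user adds a panda plush to the cart",
        "Then the cart contains 1 item",
    ],
    "Empty the cart": [
        "Given the cart contains 1 item",
        "When the user empties the cart",
        "Then the cart is empty",
    ],
}

# Each scenario gets its own copy of the Background steps.
for name, steps in expand_feature(background, scenarios).items():
    print(name)
    for step in steps:
        print("  " + step)
```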

  17. Hi, thanks for the detailed tutorial.
    We generally pass test data either hard-coded or as parameters, using Examples and Data Tables in the feature file.
    For longer forms, we pass the data from an Excel file or properties file.
    For a small form, is it good to pass data from an Excel file or properties file, or does that spoil the data-driven testing concept?
    When should we use each way of parameterizing test data in a feature file, and what are the pros and cons of each?

    1. Gherkin is useful for Behavior-Driven Development. It is not necessarily good for data-driven testing. If you need to crank dozens (to maybe even hundreds or thousands) of rows of data into test cases as inputs a la data-driven testing, then Gherkin probably isn’t the right tool. Gherkin is good for identifying a few equivalence classes of inputs that exemplify desired behaviors.

      1. That’s what I thought, but the context confused me.

        We’ve got “The Given step above indicates an action when it says, “The user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. Using present or present perfect tense indicates a state rather than an action.”

        So, “The user navigates” is part of a bad example, and it indicates an action. In order to indicate a state, we should use present tense.

        So, if we’re bringing in grammar, what can we say, grammatically, about “the user navigates” to disqualify it?

      2. I reread the segment, and I see how it was confusing. I edited it:

        The Given step above indicates an action when it says, “Given the user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. A better phrasing would be, “Given the Google home page is displayed.” It establishes a starting point for the scenario. Use present or present perfect tense to indicate a state rather than an action.

  18. Hi, very interesting reading!

    What makes that terribly wrong scenario a procedure-driven and not a behavior-driven? The granularity of the steps or the way it describes the steps?

  19. Thank you for the truly awesome and well-written article on BDD.

    I have one question, if you can help.

    The Cardinal Rule of BDD: One Scenario, One Behavior!

    I guess it means truly one When statement, but my question is: is it fine to have multiple Then steps, using Then plus And, like this:

    Given a
    and b
    When c
    Then . .
    and .. .
    and . . .
    and

    Most of my scenarios need multiple And steps, so I wonder if that is okay, since most examples in your blog post have only one Then statement.

    1. Yes, I use the “And” step frequently. My caution is simply to keep scenarios concise. If you write more than a dozen lines for a scenario, then perhaps the scenario is doing too much, or perhaps the steps are too granular.

  20. Is it good practice to pass parameters as a string, in the case of JSON parameters, as in the example below?

    Given user wants to validate the json

    """
    {
    "name": "Name",
    "id": 25544
    }
    """
    “””

    1. This looks very imperative to me. I would not recommend writing steps like this. I probably wouldn’t use pytest-bdd to explicitly verify JSON values (like those from API tests).
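
      If the real goal is to verify specific JSON values, a plain test without Gherkin can keep those details in code. A hypothetical Python sketch using only the standard library `json` module; `verify_item_payload` is an invented name, and the payload mirrors the fields in the question.

```python
import json


def verify_item_payload(raw):
    """Check fields of an API response body; raises AssertionError on mismatch."""
    payload = json.loads(raw)
    assert payload["name"] == "Name", f"unexpected name: {payload['name']!r}"
    assert payload["id"] == 25544, f"unexpected id: {payload['id']!r}"
    return payload


response_body = '{"name": "Name", "id": 25544}'
print(verify_item_payload(response_body))  # {'name': 'Name', 'id': 25544}
```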

  21. Andy, this article is great! You helped me a lot to organize my thinking, as I’m trying to introduce Gherkin for product specifications, and there’s so much noise on the web, mainly focused on automated testing and not so much on writing well.
    I have a question about how I should specify a config, its values, and its defaults. For example, following your article, I wrote the following:
    Scenario Outline: Move the Panning motor from the smartphone application
    Given That the smartphone application is used
    When touch on a icon
    And there’s no other movement detection executing in the camera
    Then the camera move to the left or to the right
    Examples:
    | keys |
    | left arrow |
    | right arrow |
    But step_degrees is a conversion value, like how many steps I should move for a given angle. I’m thinking of using something like , and setting the team’s expectation that this is an internal rule for the default value.
    What do you think? Is there something I’m missing?

  22. Love your Gherkin articles. My dev team introduced BDD to me. I’m happy to have stumbled across your site when researching ‘scenarios’. Can’t wait to take our iteration planning up a notch. -Thanks!

    Just had a question… wondering if these two statements are contradictory:

    BDD 101: Writing Good Gherkin
    “…The second behavior arguably needs the first behavior to run first because the second needs to start at the search result page. However, since that is merely setup for the behavior of image searching and is not part of it, the Given step in the second scenario can basically declare (declaratively) that the “panda” search must already be done.”

    BDD 101: Gherkin by Example
    “Each scenario will be run independently of the other scenarios – the output of one scenario has no bearing on the next!”

    1. Behaviors cannot always be completely independent. For example, a search must already be done before an image search can be requested. However, scenarios or “test cases” can and should be independent in the sense that they should be written as independent specifications and executed such that one test does not impact another.

  23. Hey Andy, first of all, I really loved this article! I’ve been looking for something like this for a few days now, since I’m kinda new to BDD. There is only one thing that I can’t really understand. I want to add BDD to my existing framework, which currently performs a simple task, but all of the steps/scenarios depend on the previous one.

    For example, if I have a test that opens a browser, goes to Amazon, orders something, and makes assertions, should I make separate scenarios for each of those steps, since each step happens on a different page? If yes, how can I keep the WebDriver open and keep its state as it was in the previous step/scenario?

    I did some research on Google on this topic, and I found out that there is no “@BeforeAll” annotation that executes only once before all scenarios.

    Thanks a lot in advance!
    Keep up the good work.

    1. How should you structure scenarios? That’s up to you. If you can describe it, then you can do it. I recommend focusing on individual, independent behaviors. For example, there are multiple ways to add items to a shopping cart, so each one should probably have its own scenario. Larger workflows are more susceptible to confusion, obfuscation, and/or automation failures.

      Gherkin itself does not provide a way to run something one time before all scenarios. It does provide a Background section to run a set of steps before each Scenario in a Feature. Different BDD test frameworks usually provide the equivalent of a once-before-all hook. For example, SpecFlow has a hook named “BeforeTestRun”.
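
      A hypothetical Python sketch of the difference between a once-before-all hook and per-scenario setup: an expensive resource (a `FakeBrowser` standing in for a WebDriver session) is created once for the whole run, while each scenario still starts from clean state so the scenarios stay independent. `run_scenarios` and `FakeBrowser` are invented for illustration; real frameworks expose this through hooks such as SpecFlow’s `BeforeTestRun`.

```python
class FakeBrowser:
    """Hypothetical stand-in for an expensive resource like a WebDriver session."""

    instances = 0

    def __init__(self):
        FakeBrowser.instances += 1
        self.cookies = {}

    def reset(self):
        self.cookies.clear()


def run_scenarios(scenarios):
    browser = FakeBrowser()  # once-before-all: one browser for the whole run
    results = {}
    for name, scenario in scenarios.items():
        browser.reset()  # before-each: clean state for every scenario
        results[name] = scenario(browser)
    return results


def search_scenario(browser):
    browser.cookies["last_search"] = "panda"
    return "passed"


def cart_scenario(browser):
    # Does not see the search scenario's cookie, thanks to the per-scenario reset.
    return "passed" if "last_search" not in browser.cookies else "failed"


print(run_scenarios({"search": search_scenario, "cart": cart_scenario}))
# {'search': 'passed', 'cart': 'passed'}
print(FakeBrowser.instances)  # 1
```

      Sharing the browser saves startup time, but the reset keeps one scenario’s leftovers from becoming another scenario’s hidden dependency.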

  24. BDD is not a great way to do regression testing. The small test scenarios recommended in this article do not add a lot of value in the end, because they always end up passing while the product is still left with a lot of bugs in production.

    I am surprised to see that automation experts still recommend BDD even after many years of practice, when it is basically ineffective for the time spent automating all those small scenarios.
