In BDD, What Should Be A Feature?

How do I decide what a feature should be? And should I define a feature first before writing behavior specs, or should I start with behaviors and see how they fit together into features?

Features, scenarios, and behaviors are all common BDD terms that should be carefully defined:

  • A behavior is an operation with inputs, actions, and expected outcomes.
  • A scenario is the specification of a behavior using formal steps and examples.
  • A feature is a desired product functionality often involving multiple behaviors.

Don’t try to over-think the definition of “feature.” Some features are small, while other features are large. The main distinction between a feature and a scenario or behavior is that features are what customers expect to receive. Small features may cover only a few or even only one behavior, while large features may cover several.

The Gherkin language has Feature and Scenario sections. In this sense, a Feature is simply a collection of related Scenarios. They align roughly to the more general meanings of the terms.

Don’t over-think features with Agile, either. Some teams define a feature as a collection of user stories. Other teams say that one user story is a feature. In terms of Gherkin, don’t presume that one user story must have exactly one feature file with one Feature section. A user story could have zero-to-many feature files to cover its behaviors. Do whatever is appropriate.

Features should be determined by customer needs. They should solve problems the customers have. For example, perhaps the customer needs a better way to process orders through their online store. That’s where features should start – as business needs. Behaviors should then naturally come as part of grooming and refinement efforts. Thus, in most cases, features should be identified first before individual behaviors.

Nevertheless, there may be times during development that scenario-to-feature realignment should be done. It may be more convenient to create a new feature file for related behaviors. Or, a new feature may be “discovered” out of particularly useful behaviors. This is more the exception than the norm.

BDD 101: Manual Testing

Behavior-driven development takes an automation-first philosophy: behavior specs should become automated tests. However, BDD can also accommodate manual testing. Manual testing has a place and a purpose, even in BDD. Remember, behavior scenarios are first and foremost behavior specifications, and they provide value beyond testing and automation. Any behavior scenario could be run as a manual test. The main questions, then, are (1) when is manual testing appropriate and (2) how should it be handled.

When is Manual Testing Appropriate?

Automation is not a silver bullet – it doesn’t satisfy all testing needs. Scenarios should be written for all behaviors, but they likely shouldn’t be automated under the following circumstances:

  • The return-on-investment to automate the scenarios is too low.
  • The scenarios won’t be included in regression or continuous integration.
  • The behaviors are temporary (ex: hotfixes).
  • The automation itself would be too complex or too fragile.
  • The nature of the feature is non-functional (ex: performance, UX, etc.).
  • The team is still learning BDD and is not yet ready to automate all scenarios.

Manual testing is also appropriate for exploratory testing, in which engineers rely upon experience rather than explicit test procedures to “explore” the product under test for bugs and quality concerns. It complements automation because both testing styles serve different purposes. However, behavior scenarios themselves are incompatible with exploratory testing. The point of exploring is for engineers to go “unscripted” – without formal test plans – to find problems only a user would catch. Rather than writing scenarios, the appropriate way to approach behavior-driven exploratory testing is more holistic: testers should assume the role of a user and exercise the product under test as a collection of interacting behaviors. If exploring uncovers any glaring behavior gaps, then new behavior scenarios should be added to the catalog.

How Should Manual Testing Be Handled?

Manual testing fits into BDD in much the same way as automated testing because both formats share the same process for behavior specification. Where the two ways diverge is in how the tests are run. There are a few special considerations to make when writing scenarios that won’t be automated.


Both manual and automated behavior scenarios should be stored in the same repository. The natural way to organize behaviors is by feature, regardless of how the tests will be run. All scenarios should also be managed by some form of version control.

Furthermore, all scenarios should be co-located for document-generation tools like Pickles. Doc tools make it easy to expose behavior specs and steps to everyone. They make it easier for the Three Amigos to collaborate. Non-technical people are not likely to dig into programming projects.


Scenarios must be classified as manual or automated. When BDD frameworks run tests, they need a way to exclude tests that are not automated. Otherwise, test reports would be full of errors! In Gherkin, scenarios should be classified using tags. For example, scenarios could be tagged as either “@manual” or “@automated”. A third tag, “@automatable”, could be used to distinguish scenarios that are not yet automated but are targeted for automation.

Some BDD frameworks have nifty features for tags. In Cucumber-JVM, tags can be set as runner class options for convenience. This means that tag options could be set to “~@manual” to avoid manual tests. In SpecFlow, any scenario with the special “@ignore” tag will automatically be skipped. Nevertheless, I strongly recommend using custom tags to denote manual tests, since there are many reasons why a test may be ignored (such as known bugs).
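For a Python framework such as behave, a comparable effect can be sketched with a hook. The snippet below is only a minimal sketch under the "@manual" tag convention described above; the hook body and the skip message are illustrative assumptions, while `effective_tags` and `skip()` are attributes behave provides on its scenario objects.

```python
# Hypothetical behave hook: skip any scenario tagged @manual.
def before_scenario(context, scenario):
    # effective_tags also includes tags inherited from the Feature
    if "manual" in scenario.effective_tags:
        scenario.skip("Marked @manual: run by hand, not by the automation")
```

Keeping the skip logic tied to a custom tag, rather than a generic ignore tag, preserves the reason a scenario was not run.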

Extra Comments

The conciseness of behavior scenarios is problematic for manual testing because steps don’t provide all the information a tester may need. For example, test data may not be written explicitly in the spec. The best way to add extra information to a scenario is to add comments. Gherkin allows any number of lines for comments and description. Comments provide extra information to the reader but are ignored by the automation.

It may be tempting to simply write new Gherkin steps to handle the extra information for manual testing. However, this is not a good approach. Principles of good Gherkin should be used for all scenarios, regardless of whether or not the scenarios will be automated. High-quality specification should be maintained for consistency, for documentation tools, and for potential future automation.

An Example

Below is a feature that shows how to write behavior scenarios for manual tests:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    # The Google home page URL is:
    # Make sure the images shown include pandas eating bamboo
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

It’s not really different from any other behavior scenarios.


As stated in the beginning, BDD should be automation-first. Don’t use the content of this article to justify avoiding automation. Rather, use the techniques outlined here for manual testing only as needed.


BDD 101: Automation

Better automation is one of BDD’s hallmark benefits. In fact, the main goal of BDD could be summarized as rapidly turning conceptualized behavior into automatically tested behavior. While the process and the Gherkin are universal, the underlying automation could be built using one of many frameworks.

This post explains how BDD automation frameworks work. It focuses on the general structure of the typical framework – it is not a tutorial on how to use any specific framework. However, I wrote short examples for each piece using Python’s behave framework, since learning is easier with examples. I chose to use Python here simply for its conciseness.

Framework Parts

Every BDD automation framework has five major pieces:

#1: Feature Files

Gherkin feature files are very much part of the automation. They act like test scripts – each scenario is essentially a test case. Previous posts covered Gherkin in depth.

Here is an example feature file named google_search.feature:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  # This scenario should look familiar
  @automated @google-search @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

#2: Step Definitions

A step definition is a code block that implements the logic to execute a step. It is typically a method or function with the English-y step phrase as an annotation. Step definitions can take in arguments, doc strings, and step tables. They may also make assertions to pass or fail a scenario. In most frameworks, data can be passed between steps using some sort of context object. When a scenario is executed, the driver matches each scenario step phrase to its step definition. (Most frameworks use regular expressions for phrase matching.) Thus, every step in a feature file needs a step definition.

The step definitions would be written in a Python source file like this:

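from behave import given, when, then

@given('a web browser is on the Google page')
def step_impl(context):
  # Load the Google home page through the page object
  context.google_page.load()

@when('the search phrase "{phrase}" is entered')
def step_impl(context, phrase):
  # Submit the search phrase captured from the step text
  context.google_page.search(phrase)

@then('results for "{phrase}" are shown')
def step_impl(context, phrase):
  # Fail the scenario if no results relate to the phrase
  assert context.google_page.has_results(phrase)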
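
The phrase matching described above can be pictured with a toy registry. This is only a sketch of the general idea, not how behave or any other framework is actually implemented; the names `STEP_REGISTRY`, `step`, and `run_step` are hypothetical.

```python
import re

# Hypothetical registry of (compiled pattern, function) pairs.
STEP_REGISTRY = []

def step(pattern):
    """Register a function as the definition for steps matching `pattern`."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator

def run_step(context, phrase):
    """Find the first definition whose pattern matches the whole phrase."""
    for regex, func in STEP_REGISTRY:
        match = regex.fullmatch(phrase)
        if match:
            # Captured groups become the step's arguments
            return func(context, *match.groups())
    raise LookupError(f"No step definition matches: {phrase!r}")

@step(r'the search phrase "(.+)" is entered')
def enter_phrase(context, phrase):
    context["searched"] = phrase

context = {}
run_step(context, 'the search phrase "panda" is entered')
print(context["searched"])  # panda
```

A step with no matching pattern raises an error here, which mirrors why every step in a feature file needs a step definition.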

#3: Hooks

Certain automation logic cannot be handled by step definitions. For example, scenarios may need special setup and cleanup operations. Most BDD frameworks provide hooks that can insert calls before or after Gherkin sections, typically filterable using tags. Hooks are similar in concept to aspect-oriented programming.

In behave, hooks are written in a Python source file named

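import page_objects
from selenium import webdriver

def before_all(context):
  # Start one browser for the whole test run
  context.browser = webdriver.Chrome()

def before_scenario(context, scenario):
  # Give each scenario a fresh page object
  context.google_page = page_objects.GooglePage(context.browser)

def after_all(context):
  # Close the browser when the run finishes
  context.browser.quit()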

#4: Shared Code

Shared code (a.k.a. libraries or packages) refers to any code called by step definitions and hooks. Shared code could be dependency packages downloaded using managers like Maven (Java), NuGet (.NET), or pip (Python). For example, Selenium is a well-known package for web browser automation. Shared code could also be components to assist automation, such as page objects or other design patterns. As the cliché goes, “Don’t reinvent the wheel.” Step definitions and hooks should not contain all of the logic for running the actions – they should reuse common code as much as possible.

A Python page object class from the module could look like this:

class GooglePage(object):
  """A page object for the Google home page"""
  def __init__(self, browser):
    self.browser = browser
  def load(self):
    # put code here
    pass
  def search(self, phrase):
    # put code here
    pass
  def has_results(self, phrase):
    # put code here
    return False

#5: Driver

Every automation framework has a driver that runs tests, and BDD frameworks are no different. The driver executes each scenario in a feature file independently. Whenever a failure happens, the driver reports the failure and aborts the scenario. Drivers typically have discovery mechanisms for selecting scenarios to run based on tag names or file paths.
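
The driver behavior described above (independent scenarios, abort on first failure, report the failing step) can be pictured with a few lines of plain Python. This is a simplified sketch with hypothetical names (`run_scenario`, `run_all`); real drivers do considerably more.

```python
def run_scenario(name, steps):
    """Run (phrase, callable) steps in order; stop at the first failure."""
    for phrase, action in steps:
        try:
            action()
        except AssertionError as error:
            # Report which step failed, then abort this scenario
            return f"FAILED {name} at step: {phrase} ({error})"
    return f"PASSED {name}"

def run_all(scenarios):
    # Each scenario runs independently; one failure does not stop the others.
    return [run_scenario(name, steps) for name, steps in scenarios]

def no_results():
    assert False, "no results found"

reports = run_all([
    ("Simple Google search", [
        ("Given a web browser is on the Google page", lambda: None),
        ('Then results for "panda" are shown', no_results),
    ]),
    ("Image search", [
        ('Given Google search results for "panda" are shown', lambda: None),
    ]),
])
for line in reports:
    print(line)
```

The first scenario fails at its Then step, yet the second scenario still runs and passes.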

The behave driver can be launched from the command line like this:

> behave --tags @panda

Automation Advantages

Even if a team does not apply behavior-driven practices to its full development process, BDD test frameworks still have some significant advantages over non-BDD test frameworks. First of all, steps make BDD automation very modular and thus reusable. Each step is an independent action, much like how each scenario is an independent behavior. Once a step definition is written, it may be reused by any number of scenarios. This is crucial, since most behaviors for a feature share common actions. And all steps are inherently self-documenting, since they are written in plain language. There is a natural connection between high-level behavior and low-level implementation.

Test execution also has advantages. Tags make it very easy to select tests to run, especially from the command line. Failures are very informative as well. The driver pinpoints precisely which step failed for which scenario. And since behaviors are isolated, a failure for one scenario is less likely to affect other test scenarios than would be the case for procedure-driven tests.
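
Tag-based selection is simple to picture in code. The sketch below is an illustration only: the `Scenario` class and `select_by_tag` function are hypothetical, and the "~" prefix for exclusion is borrowed from the Cucumber-JVM style tag option rather than from any particular driver's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    name: str
    tags: set = field(default_factory=set)

def select_by_tag(scenarios, tag):
    """Keep scenarios carrying `tag`; a leading '~' means 'exclude'."""
    if tag.startswith("~"):
        unwanted = tag[1:].lstrip("@")
        return [s for s in scenarios if unwanted not in s.tags]
    wanted = tag.lstrip("@")
    return [s for s in scenarios if wanted in s.tags]

catalog = [
    Scenario("Simple Google search", {"automated", "google-search", "panda"}),
    Scenario("Image search", {"manual", "panda"}),
]
print([s.name for s in select_by_tag(catalog, "@panda")])
print([s.name for s in select_by_tag(catalog, "~@manual")])
```

The first call selects both scenarios, while the second leaves out the one tagged as manual.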

Available Frameworks

There are many BDD frameworks out there. The next post will introduce a few major frameworks for popular languages.

BDD 101: Behavior-Driven Agile

Previous posts in this 101 series have focused heavily upon Gherkin. They may have given the impression that Gherkin is merely a testing language, and that BDD is a test framework. Wrong: BDD is a full development process! Its practices complement Agile software development by bringing clearer communication and shift-left testing. As such, BDD is a refinement, not an overhaul, of the Agile process. This post explains how to add behavior-driven practices to the Agile process.

Common Agile Problems

User stories can sometimes seem like a game of telephone: the product owner says one thing, the developer makes another thing, and the tester writes a bad test. When the test fails, the tester goes back to the developer for clarification, who in turn goes back to the product owner. Hopefully, the misunderstanding is corrected before demo day, but time is nevertheless lost and resources are burned. Acceptance criteria for a user story should clarify how things should be, but often they are poorly stated or entirely missing.

Another Agile problem, especially in Scrum, is incomplete testing. Development work is often treated like a pipeline: design -> implement -> review -> test -> automate -> DONE. And stories have deadlines. When coding runs late, testers may not get testing done, let alone test automation. Add a game of telephone, and in-sprint testing can become perpetually impeded.

BDD to the Rescue

BDD solves both of these Agile problems beautifully through process efficiency. Let me break this down from a behavior-oriented perspective:

  • Acceptance criteria specify feature behavior.
  • Test cases validate feature behavior.
  • Gherkin feature files document feature behavior.

Therefore, when written in Gherkin, acceptance criteria are test cases! The Gherkin feature file is the formal specification of both the acceptance criteria and the test cases for a user story. One artifact covers both things!

The Behavior-Driven Three Amigos

“The Three Amigos” refers to the three primary roles engaged in producing software: business, development, and testing. Each role brings its own perspective to the product, and good software results when all can collaborate well. A common Agile practice is to hold meetings with the Three Amigos as part of grooming or planning.

The BDD process is an enhanced implementation of The Three Amigos. All stakeholders can participate in behavior-driven development because Gherkin is like plain English. Feature files mean different things to different people:

  • They are requirements for product owners.
  • They are acceptance criteria for developers.
  • They are test cases for testers.
  • They are scripts for automators.
  • They are descriptions for other stakeholders.

Thus, BDD fosters healthy collaboration because feature files are pertinent to all stakeholders (or “amigos”). Feature files are like receipts – they’re a “proof of purchase” for the team. They document precisely what will be delivered.

Behavior-Driven Sprints

To see why this is a big deal, see what happens in a behavior-driven sprint:

  1. Feature files begin at grooming. As the team prepares user stories in the backlog, acceptance criteria are written in Gherkin. Since Gherkin is easy to read, even non-technical people (namely product owners) can contribute when The Three Amigos meet.
  2. During planning, all stakeholders get a good understanding of how a feature should behave. Better conversations can happen. Clarifications can be written directly into feature files.
  3. When the sprint starts, feature files are already written. Developers know what must be developed. Testers know what must be tested. There’s no ambiguity.
  4. Test automation can begin immediately because the scenario steps are already written. Some step definitions may already exist, too. New step definitions are typically understandable enough to implement even before the developer commits the product code.
  5. Manual testers know from the start which tests will be automated and which must be run manually. This enables them to make better test plans. It also frees them to do more exploratory testing, which is better for finding bugs.

Overall, Gherkinized acceptance criteria streamline development and improve quality. The team is empowered to shift left. On-time story completion and in-sprint automation become the norm.

(Arguably, these benefits would happen in Kanban as well as in Scrum.)

New Rules

In order to reap the benefits of BDD, the Agile process needs a few new rules. First, formalize all acceptance criteria as Gherkin feature files. In retrospect, it should seem odd that the user story itself is formalized (“As a ___, I want ___, so that ___”) if the acceptance criteria are not (“Given-When-Then”). Writing feature files adds more work to grooming, but it enables the collaboration and shift-left testing.

Second, never commit to completing a user story that doesn’t have Gherkinized acceptance criteria. Don’t become sloppy out of expediency. Use the planning meeting as an accountability measure.

Third, include test automation in the definition of done. Stories should not be accepted without their tests completed and automated. Automation in the present guarantees regression coverage in the future, which in turn allows teams to respond to change safely and quickly.

More Agility through Automation

BDD truly improves the Agile process by fixing its shortcomings. The next step is to learn BDD automation, which will be covered in the next post.


BDD 101: Writing Good Gherkin

So, you and your team have decided to make test automation a priority. You plan to use behavior-driven development to shift left with testing. You read my “BDD 101 Series” up through the previous post. You even peeked at Cucumber or another BDD framework on your own. That’s great! Big steps! And now, you are ready to write your first Gherkin feature file. You fire up Notepad++ with a Gherkin UDL, you type “Given” on the first line, and…

Writer’s block. How am I supposed to write my Gherkin steps?

Good Gherkin feature files are not easy to write at first. Writing is definitely an art. With some basic pointers, and a bit of practice, Gherkin becomes easier. This post will cover how to write top-notch feature files.

Proper Behavior

The biggest mistake BDD beginners make is writing Gherkin without a behavior-driven mindset. They often write feature files as if they are writing “traditional” procedure-driven functional tests: step-by-step instructions with actions and expected results. HP ALM, qTest, and many other test repository tools store tests in this format. These procedure-driven tests are often imperative and trace a path through the system that covers multiple behaviors. As a result, they may be unnecessarily long, which can delay failure investigation, increase maintenance costs, and create confusion.

For example, let’s consider a test that searches for images of pandas on Google. Below would be a reasonable test procedure:

  1. Open a web browser.
    1. Web browser opens successfully.
  2. Navigate to
    1. The web page loads successfully and the Google image is visible.
  3. Enter “panda” in the search bar.
    1. Links related to “panda” are shown on the results page.
  4. Click on the “Images” link at the top of the results page.
    1. Images related to “panda” are shown on the results page.

I’ve seen many newbies translate a test like this into Gherkin like the following:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to ""
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

This scenario is terribly wrong. All that happened was that the author put BDD buzzwords in front of each step of the traditional test. This is not behavior-driven; it is still procedure-driven.

The first two steps are purely setup: they just go to Google, and they are strongly imperative. Since they don’t focus on the desired behavior, they can be reduced to one declarative step: “Given a web browser is at the Google home page.” This new step is friendlier to read.

After the Given step, there are two When-Then pairs. This is syntactically incorrect: Given-When-Then steps must appear in order and cannot repeat. A Given may not follow a When or Then, and a When may not follow a Then. The reason is simple: any single When-Then pair denotes an individual behavior. This makes it easy to see how, in the test above, there are actually two behaviors covered: (1) searching from the search bar, and (2) performing an image search. In Gherkin, one scenario covers one behavior. Thus, there should be two scenarios instead of one. Any time you want to write more than one When-Then pair, write separate scenarios instead. (Note: Some BDD frameworks may allow disordered steps, but it would nevertheless be anti-behavioral.)

This splitting technique also reveals unnecessary behavior coverage. For instance, the first behavior to search from the search bar may be covered in another feature file. I once saw a scenario with about 30 When-Then pairs, and many were duplicate behaviors.

Do not be tempted to arbitrarily reassign step types to make scenarios follow strict Given-When-Then ordering. Respect the integrity of the step types: Givens set up initial state, Whens perform an action, and Thens verify outcomes. In the example above, the first Then step could have been turned into a When step, but that would be incorrect because it makes an assertion. Step types are meant to be guide rails for writing good behavior scenarios.

The correct feature file would look something like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

The second behavior arguably depends on the first because it must start at the search results page. However, since that search is merely setup for image searching and is not part of the behavior itself, the Given step in the second scenario can simply declare that the “panda” search has already been done. Of course, this means that the “panda” search would be run redundantly at test time, but the separation of scenarios guarantees behavior-level independence.

The Cardinal Rule of BDD: One Scenario, One Behavior!

Remember, behavior scenarios are more than tests – they also represent requirements and acceptance criteria. Good Gherkin comes from good behavior.
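
The ordering rule from this section (a Given may not follow a When or Then, and a When may not follow a Then) is mechanical enough to check in code. The function below is a hypothetical sketch for illustration, not part of any BDD framework; it assumes each step is a non-empty line that starts with its keyword.

```python
def check_step_order(steps):
    """Return a list of ordering problems found in a scenario's steps.

    And/But steps inherit the type of the step before them.
    """
    rank = {"Given": 0, "When": 1, "Then": 2}
    problems = []
    current = None  # type of the most recent Given/When/Then
    for line in steps:
        keyword = line.split()[0]
        if keyword in ("And", "But"):
            if current is None:
                problems.append(f"'{line}' has no preceding step type")
            continue
        if keyword not in rank:
            problems.append(f"'{line}' does not start with a Gherkin keyword")
            continue
        if current is not None and rank[keyword] < rank[current]:
            problems.append(f"'{keyword}' may not follow '{current}': {line}")
        current = keyword
    return problems

bad = [
    "Given the user opens a web browser",
    "When the user enters \"panda\" into the search bar",
    "Then links related to \"panda\" are shown on the results page",
    "When the user clicks on the \"Images\" link at the top of the results page",
    "Then images related to \"panda\" are shown on the results page",
]
print(check_step_order(bad))
```

For the procedure-driven scenario above, the check flags the second When, which is exactly where the scenario should be split in two.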

Phrasing Steps

How you write a step matters. If you write a step poorly, it cannot easily be reused. Thankfully, some basic rules maintain consistent phrasing and maximum reusability.

Write all steps in third-person point of view. If first-person and third-person steps mix, scenarios become confusing. I even dedicated a whole blog post entirely to this point: Should Gherkin Steps Use First-Person or Third-Person? TL;DR: just use third-person at all times.

Write steps as a subject-predicate action phrase. It may be tempting to leave parts of speech out of a step line for brevity, especially when using Ands and Buts, but partial phrases make steps ambiguous and more likely to be reused improperly. For example, consider the following:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google search result page elements
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then the results page shows links related to "panda"
    And image links for "panda"
    And video links for "panda"

The final two And steps lack the subject-predicate phrase format. Are the links meant to be subjects, meaning that they perform some action? Or, are they meant to be direct objects, meaning that they receive some action? Are they meant to be on the results page or not? What if someone else wrote a scenario for a different page that also had image and video links – could they reuse these steps? Writing steps without a clear subject and predicate is not only poor English but poor communication.

Also, use appropriate tense for each type of step. Givens should always use present perfect tense, and Whens and Thens should always use present tense. Rather than take a time warp back to middle school English class, let’s illustrate tense with a bad example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Simple Google search
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then links related to "panda" will be shown on the results page

The Given step above indicates an action when it says, “The user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. Using present perfect tense indicates a state rather than an action.

The When step above uses past tense when it says, “The user entered.” This indicates that an action has already happened. However, When steps should indicate that an action is presently happening. Plus, past tense here conflicts with the tenses used in the other steps.

The Then step above uses future tense when it says, “The results will be shown.” Future tense seems practical for Then steps because it indicates what the result should be after the current action is taken. However, future tense reinforces a procedure-driven approach because it treats the scenario as a time sequence. A behavior, on the other hand, is a present-tense aspect of the product or feature. Thus, it is better to write Then steps in the present tense.

The corrected example looks like this:

Feature: Google Searching

  Scenario: Simple Google search
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

And note, all steps are written in third-person.

Choices, Choices

Another common misconception for beginners is thinking that Gherkin has an “Or” step for conditional or combinatorial logic. People may presume that Gherkin has “Or” because it has “And”, or perhaps programmers want to treat Gherkin like a structured language. However, Gherkin does not have an “Or” step. When automated, every step is executed sequentially.

Below is a bad example based on a classic Super Mario video game, showing how people might want to use “Or”:

# BAD EXAMPLE! Do not copy.
Feature: SNES Mario Controls

  Scenario: Mario jumps
    Given a level is started
    When the player pushes the "A" button
    Or the player pushes the "B" button
    Then Mario jumps straight up

Clearly, the author’s intent is to say that Mario should jump when the player pushes either of two buttons. The author wants to cover multiple variations of the same behavior. In order to do this the right way, use Scenario Outline sections to cover multiple variations of the same behavior, as shown below:

Feature: SNES Mario Controls

  Scenario Outline: Mario jumps
    Given a level is started
    When the player pushes the "<letter>" button
    Then Mario jumps straight up
    Examples: Buttons
      | letter |
      | A      |
      | B      |
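
One way to picture what a Scenario Outline does is as a template that gets expanded once per example row. The sketch below is a simplified illustration with a hypothetical `expand_outline` function; real frameworks handle this expansion internally.

```python
import re

def expand_outline(steps, examples):
    """Yield one concrete list of steps per example row.

    `examples` is a list of dicts mapping column names to values.
    """
    for row in examples:
        yield [
            # Replace each <name> placeholder with the row's value
            re.sub(r"<(\w+)>", lambda m: str(row[m.group(1)]), step)
            for step in steps
        ]

outline = [
    'Given a level is started',
    'When the player pushes the "<letter>" button',
    'Then Mario jumps straight up',
]
for scenario in expand_outline(outline, [{"letter": "A"}, {"letter": "B"}]):
    print(scenario[1])
```

Each row produces its own independent scenario, so every step still executes sequentially with no conditional logic.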

The Known Unknowns

Test data can be difficult to handle. Sometimes, it may be possible to seed data in the system and write tests to reference it, but other times, it may not. Google search is the prime example: the result list will change over time as both Google and the Internet change. To handle the known unknowns, write scenarios defensively so that changes in the underlying data do not cause test runs to fail. Furthermore, to be truly behavior-driven, think about data not as test data but as examples of behavior.

Consider the following example from the previous post:

Feature: Google Searching
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

This scenario uses a step table to explicitly name results that should appear for a search. The step with the table would be implemented to iterate over the table entries and verify each appeared in the result list. However, what if Panda Express were to go out of business and thus no longer be ranked as high in the results? (Let’s hope not.) The test run would then fail, not because the search feature is broken, but because a hard-coded variation became invalid. It would be better to write a step that more intelligently verified that each returned result somehow related to the search phrase, like this: “And links related to ‘panda’ are shown on the results page.” The step definition implementation could use regular expression parsing to verify the presence of “panda” in each result link.
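
The defensive check suggested above might look like the following. It is a minimal sketch: the function name `all_links_relate_to` is hypothetical, and it assumes the step definition has already collected the result link texts into a list.

```python
import re

def all_links_relate_to(phrase, links):
    """True if every link text contains the phrase, ignoring case."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    # An empty result list should not count as a pass
    return bool(links) and all(pattern.search(text) for text in links)

print(all_links_relate_to("panda", ["Panda Express", "giant panda", "panda videos"]))
print(all_links_relate_to("panda", ["giant panda", "bamboo forests"]))
```

Because the check depends only on the search phrase, it keeps passing when specific results change rank or disappear.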

Another nice feature of Gherkin is that step definitions can hide data in the automation when it doesn’t need to be exposed. Step definitions may also pass data to future steps in the automation. For example, consider another Google search scenario:

Feature: Google Searching

  Scenario: Search result linking
    Given Google search results for "panda" are shown
    When the user clicks the first result link
    Then the page for the chosen result link is displayed

Notice how the When step does not explicitly name the value of the result link – it simply says to click the first one. The value of the first link may change over time, but there will always be a first link. The Then step must know something about the chosen link in order to successfully verify the outcome, but it can simply reference it as “the chosen result link”. Behind the scenes, in the step definitions, the When step can store the value of the chosen link in a variable and pass the variable forward to the Then step.
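
The hand-off between the When and Then steps can be sketched as follows. Everything here is hypothetical and for illustration only: `FakeResultsPage` stands in for a real page object, the URL is a placeholder, and in behave the two functions would carry `@when` and `@then` decorators and receive the framework's own context object.

```python
from types import SimpleNamespace

class FakeResultsPage:
    """Hypothetical stand-in for a real results page object."""
    def __init__(self, links):
        self.links = links
        self.current_url = None

    def click_first_result(self):
        self.current_url = self.links[0]
        return self.links[0]

def when_user_clicks_first_result(context):
    # Store the chosen link so a later step can refer to it
    context.chosen_link = context.results_page.click_first_result()

def then_chosen_page_is_displayed(context):
    # Verify against the stored value, not a hard-coded one
    assert context.results_page.current_url == context.chosen_link

context = SimpleNamespace(
    results_page=FakeResultsPage(["https://example.com/panda"])
)
when_user_clicks_first_result(context)
then_chosen_page_is_displayed(context)
```

Neither step names the link in the Gherkin, so the scenario stays valid no matter which result happens to be first.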

Handling Test Data

Some types of test data should be handled directly within the Gherkin, but other types should not. Remember that BDD is specification by example – scenarios should be descriptive of the behaviors they cover, and any data written into the Gherkin should support that descriptive nature. Read Handling Test Data in BDD for comprehensive information on handling test data.

Less is More

Scenarios should be short and sweet. I typically recommend that scenarios should have a single-digit step count (<10). Long scenarios are hard to understand, and they are often indicative of poor practices. One such problem is writing imperative steps instead of declarative steps. I have touched on this topic before, but I want to thoroughly explain it here.

Imperative steps state the mechanics of how an action should happen. They are very procedure-driven. For example, consider the following When steps for entering a Google search:

  1. When the user scrolls the mouse to the search bar
  2. And the user clicks the search bar
  3. And the user types the letter “p”
  4. And the user types the letter “a”
  5. And the user types the letter “n”
  6. And the user types the letter “d”
  7. And the user types the letter “a”
  8. And the user types the ENTER key

Now, the granularity of actions may seem like overkill, but it illustrates the point that imperative steps focus very much on how actions are taken. Thus, they often need many steps to fully accomplish the intended behavior. Furthermore, the intended behavior is not always as self-documented as with declarative steps.

Declarative steps state what action should happen without providing all of the information for how it will happen. They are behavior-driven because they express action at a higher level. All of the imperative steps in the example above could be written in one line: “When the user enters ‘panda’ at the search bar.” The scrolling and keystroking are implied, and they will ultimately be handled by the automation in the step definition. When trying to reduce step count, ask yourself if your steps can be written more declaratively.
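To make the contrast concrete, here is a small Python sketch of how one declarative step might absorb the imperative mechanics. FakeSearchPage and its methods are hypothetical; a real step definition would drive an actual browser.

```python
class FakeSearchPage:
    """Hypothetical page object that records low-level actions."""

    def __init__(self):
        self.actions = []

    def click_search_bar(self):
        self.actions.append("click search bar")

    def type_key(self, key):
        self.actions.append(f"type {key}")


def when_user_enters_phrase(page, phrase):
    # One declarative step; the mechanics live here, not in the Gherkin
    page.click_search_bar()
    for letter in phrase:
        page.type_key(letter)
    page.type_key("ENTER")


page = FakeSearchPage()
when_user_enters_phrase(page, "panda")
print(len(page.actions))  # prints 7: one click, five letters, ENTER
```

The feature file needs only one line, while the step definition still performs every low-level action.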

Another reason for lengthy scenarios is scenario outline abuse. Scenario outlines make it all too easy to add unnecessary rows and columns to their Examples tables. Unnecessary rows waste test execution time. Extra columns indicate complexity. Both should be avoided. Below are questions to ask yourself when facing an oversized scenario outline:

  • Does each row represent an equivalence class of variations?
    • For example, searching for “elephant” in addition to “panda” does not add much test value.
  • Does every combination of inputs need to be covered?
    • N columns with M inputs each generate M^N possible combinations.
    • Consider making each input appear only once, regardless of combination.
  • Do any columns represent separate behaviors?
    • This may be true if columns are never referenced together in the same step.
    • If so, consider splitting apart the scenario outline by column.
  • Does the feature file reader need to explicitly know all of the data?
    • Consider hiding some of the data in step definitions.
    • Some data may be derivable from other data.

These questions are meant to be sanity checks, not hard-and-fast rules. The main point is that scenario outlines should focus on one behavior and use only the necessary variations.
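To get a feel for how quickly combinations grow, here is a quick Python sketch. The column names and values are invented for illustration; the point is only the row counts.

```python
from itertools import product

# Hypothetical inputs: N = 3 columns with M = 2 values each
columns = {
    "phrase": ["panda", "elephant"],
    "browser": ["Chrome", "Firefox"],
    "language": ["en", "fr"],
}

# Every combination: M ** N = 2 ** 3 = 8 rows
all_rows = list(product(*columns.values()))

# Each input appears only once: M = 2 rows
minimal_rows = list(zip(*columns.values()))

print(len(all_rows), len(minimal_rows))  # prints: 8 2
```

Whether two rows are enough depends on the behavior under test, but the gap between the two counts widens fast as columns are added.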

Style and Structure

While style often takes a backseat during code review, it is a factor that differentiates good feature files from great feature files. In a truly behavior-driven team, non-technical stakeholders will rely upon feature files just as much as the engineers. Good writing style improves communication, and good communication skills are more than just resume fluff.

Below are a number of tidbits for good style and structure:

  1. Limit one feature per feature file. This makes it easy to find features.
  2. Limit the number of scenarios per feature. Nobody wants a thousand-line feature file. A good measure is a dozen scenarios per feature.
  3. Limit the number of steps per scenario to fewer than ten.
  4. Limit the character length of each step. Common limits are 80-120 characters.
  5. Use proper spelling.
  6. Use proper grammar.
  7. Capitalize Gherkin keywords.
  8. Capitalize the first word in titles.
  9. Do not capitalize words in the step phrases unless they are proper nouns.
  10. Do not use punctuation (specifically periods and commas) at the end of step phrases.
  11. Use single spaces between words.
  12. Indent the content beneath every section header.
  13. Separate features and scenarios by two blank lines.
  14. Separate examples tables by one blank line.
  15. Do not separate steps within a scenario by blank lines.
  16. Space table delimiter pipes (“|”) evenly.
  17. Adopt a standard set of tag names. Avoid duplicates.
  18. Write all tag names in lowercase, and use hyphens (“-”) to separate words.
  19. Limit the length of tag names.
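
A few of these rules are mechanical enough to check automatically. The Python sketch below is one hypothetical way to do it for tag names, step length, spacing, and trailing punctuation. The function name, the 120-character limit, and the exact tag pattern are my own assumptions, not part of any existing linter.

```python
import re

MAX_STEP_LENGTH = 120  # assumed limit, the upper end of the 80-120 range
STEP_KEYWORDS = ("Given ", "When ", "Then ", "And ", "But ")
# Lowercase words separated by single hyphens, such as @google-search
TAG_PATTERN = re.compile(r"@[a-z0-9]+(-[a-z0-9]+)*")


def style_problems(line):
    """Return a list of style problems found in one feature file line."""
    problems = []
    stripped = line.strip()
    if stripped.startswith("@"):
        for tag in stripped.split():
            if not TAG_PATTERN.fullmatch(tag):
                problems.append(f"bad tag name: {tag}")
    elif stripped.startswith(STEP_KEYWORDS):
        if len(stripped) > MAX_STEP_LENGTH:
            problems.append("step too long")
        if "  " in stripped:
            problems.append("multiple spaces between words")
        if stripped.endswith((".", ",")):
            problems.append("trailing punctuation")
    return problems


print(style_problems("@automated @google-search"))  # prints []
```

A check like this only covers the easy rules. Spelling, grammar, and sensible titles still need a human reviewer.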

Without these rules, you might end up with something like this:

# BAD EXAMPLE! Do not copy.

 Feature: Google Searching
     @AUTOMATE @Automated @automation @Sprint32GoogleSearchFeature
 Scenario outline: GOOGLE STUFF
Given a Web Browser is on the Google page,
 when The seach phrase "<phrase>" Enter,

 Then  "<phrase>" shown.
and The relatedd   results include "<related>".
Examples: animals
 | phrase | related |
| panda | Panda Express        |
| elephant    | elephant Man  |

Don’t do this. It looks horrible. Please, take pride in your profession. While the automation code may look hairy in parts, Gherkin files should look elegant.

Gherkinize Those Behaviors!

With these best practices, you can write Gherkin feature files like a pro. Don’t be afraid to try: nobody does things perfectly the first time. As a beginner, I broke many of the guidelines I put in this post, but I learned as I went. Don’t give up if you get stuck.

This is the last of three posts in the series focused exclusively on Gherkin. The next post will address how to adopt behavior-driven practices into the Agile software development process.

BDD 101: Gherkin By Example

Gherkin is learned best by example. Whereas the previous post in this series focused on Gherkin syntax and semantics, this post will walk through a set of examples that show how to use all of the language parts. The examples cover basic Google searching, which is easy to explain and accessible to all. You can find other good example references from Cucumber and Behat.

As a disclaimer, this post will focus entirely upon feature file examples and not upon automation through step definitions. Writing good Gherkin scenarios must come before implementing step definitions. Automation will be covered in future posts. (Note that these examples could easily be automated using Selenium.)

A Simple Feature File

Let’s start with the example from the previous post:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

This is a complete feature file. It starts with a required Feature section and a description. The description is optional, and it may have as many or as few lines as desired. The description will not affect automation at all – think of it as a comment. As an Agile best practice, it should include the user story for the features under test. This feature file then has one Scenario section with a title and one each of Given, When, and Then steps in order. It could have more scenarios, but for simplicity, this example has only one. Each scenario will be run independently of the other scenarios – the output of one scenario has no bearing on the next! The indents and blank lines also make the feature file easy to read.

Notice how concise yet descriptive the scenario is. Any non-technical person can easily understand how Google searches should behave from reading this scenario. “Search for pandas? Get pandas!” The feature’s behavior is clear to the developer, the tester, and the product owner. Thus, this one feature file can be shared by all stakeholders and can dispel misunderstandings.

Another thing to notice is the ability to parameterize steps. Steps should be written for reusability. A step hard-coded to search for pandas is not very reusable, but a step parameterized to search for any phrase is. Parameterization is handled at the level of the step definitions in the automation code, but by convention, it is a best practice to write parameterized values in double-quotes. This makes the parameters easy to identify.
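
Although automation is out of scope for this post, a tiny Python sketch may help show why the double-quotes are handy. The pattern and function name below are hypothetical. Each BDD framework has its own way of binding step text to step definitions, so this shows only the general idea of capturing a quoted parameter.

```python
import re

# Hypothetical pattern for a parameterized step
SEARCH_STEP = re.compile(r'the search phrase "(?P<phrase>[^"]+)" is entered')


def extract_phrase(step_text):
    """Return the quoted search phrase, or None if the step does not match."""
    match = SEARCH_STEP.fullmatch(step_text)
    return match.group("phrase") if match else None


print(extract_phrase('the search phrase "panda" is entered'))     # prints panda
print(extract_phrase('the search phrase "elephant" is entered'))  # prints elephant
```

One pattern serves every search phrase, which is exactly the reusability that a hard-coded step lacks.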

Additional Steps

Not all behaviors can be fully described using only three steps. Thankfully, scenarios can have any number of steps using And and But. Let’s extend the previous example:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the related results include "Panda Express"
    But the related results do not include "pandemonium"

Now, there are three Then steps to verify the outcome. And and But steps can be attached to any type of step. They are interchangeable and do not have any unique meaning – they exist simply to make scenarios more readable. For example, the scenario above could have been written as Given-When-Then-Then-Then, but Given-When-Then-And-But makes more sense. Furthermore, And and But do not represent any sort of conditional logic. Gherkin steps are entirely sequential and do not branch based on if/else conditions.

Doc Strings

In-line parameters are not the only way to pass inputs to a step. Doc strings can pass larger pieces of text as inputs like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the result page displays the text
      """
      Scientific name: Ailuropoda melanoleuca
      Conservation status: Endangered (Population decreasing)
      """

Doc strings are delimited by three double-quotes ("""). They may fit onto one line, or they may be multiple lines long. The step definition receives the doc string input as a plain old string. Gherkin doc strings are reminiscent of Python docstrings in format.

Step Tables

Tables are a valuable way to provide data with concise syntax. In Gherkin, a table can be passed into a step as an input. The example above can be rewritten to use a table for related results like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

Step tables are delimited by the pipe symbol “|”. They may have as many rows or columns as desired. The first row contains column names and is not treated as input data. The table is passed into the step definition as a data structure native to the language used for automation (such as an array). Step tables may be attached to any step, but they will be connected to that step only. For good formatting, remember to indent the step table and to space the delimiters evenly.
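
To picture what that data structure might look like, here is a hand-rolled Python sketch that turns the table lines into a list of dictionaries keyed by column name. Real frameworks do this parsing themselves and expose their own table types, so the function below is purely illustrative.

```python
def parse_step_table(lines):
    """Convert pipe-delimited table lines into a list of row dictionaries."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in lines
    ]
    # The first row holds the column names, not input data
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


table = parse_step_table([
    "| related       |",
    "| Panda Express |",
    "| giant panda   |",
    "| panda videos  |",
])
related = [row["related"] for row in table]
print(related)  # prints ['Panda Express', 'giant panda', 'panda videos']
```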

The Background Section

Sometimes, scenarios in a feature file may share common setup steps. Rather than duplicate these steps, they can be put into a Background section:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Background:
    Given a web browser is on the Google page

  Scenario: Simple Google search for pandas
    When the search phrase "panda" is entered
    Then results for "panda" are shown

  Scenario: Simple Google search for elephants
    When the search phrase "elephant" is entered
    Then results for "elephant" are shown

Since each scenario is independent, the steps in the Background section will run before each scenario is run, not once for the whole set. The Background section does not have a title. It can have any type or number of steps, but as a best practice, it should be limited to Given steps.
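
The run order can be pictured with a short Python sketch. The function name is hypothetical, and it only builds the list of steps each scenario would execute rather than running anything.

```python
def plan_feature_run(background_steps, scenarios):
    """Return the steps each scenario would execute, background first."""
    plan = {}
    for title, steps in scenarios.items():
        # The Background steps are repeated for every scenario
        plan[title] = list(background_steps) + list(steps)
    return plan


background = ["Given a web browser is on the Google page"]
scenarios = {
    "Simple Google search for pandas": [
        'When the search phrase "panda" is entered',
        'Then results for "panda" are shown',
    ],
    "Simple Google search for elephants": [
        'When the search phrase "elephant" is entered',
        'Then results for "elephant" are shown',
    ],
}
plan = plan_feature_run(background, scenarios)
```

Both entries in the plan start with the same Given step, which is why expensive setup in a Background section is paid once per scenario.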

Scenario Outlines

Scenario outlines bring even more reusability to Gherkin. Notice in the example above that the two scenarios are identical apart from their search terms. They could be combined with a Scenario Outline section:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario Outline: Simple Google searches
    Given a web browser is on the Google page
    When the search phrase "<phrase>" is entered
    Then results for "<phrase>" are shown
    And the related results include "<related>"
    Examples: Animals
      | phrase   | related       |
      | panda    | Panda Express |
      | elephant | Elephant Man  |

Scenario outlines are parameterized using Examples tables. Each Examples table has a title and uses the same format as a step table. Each row in the table represents one test instance for that particular combination of parameters. In the example above, there would be two tests for this Scenario Outline. The table values are substituted into the steps above wherever the column name is surrounded by the “<” “>” symbols.
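
The substitution itself is simple enough to sketch in a few lines of Python. This is only an illustration of the idea with a made-up function name; the framework performs the real expansion when the tests run.

```python
import re


def expand_outline(steps, examples):
    """Yield one concrete list of steps per Examples row."""
    for row in examples:
        # Replace each <name> placeholder with the row's value for that column
        yield [
            re.sub(r"<(\w+)>", lambda m: row[m.group(1)], step)
            for step in steps
        ]


steps = [
    'When the search phrase "<phrase>" is entered',
    'And the related results include "<related>"',
]
examples = [
    {"phrase": "panda", "related": "Panda Express"},
    {"phrase": "elephant", "related": "Elephant Man"},
]
tests = list(expand_outline(steps, examples))
print(len(tests))  # prints 2, one test per row
```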

A Scenario Outline section may have multiple Examples tables. This may make it easier to separate combinations. For example, tables could be added for “Planets” and “Food”. Each Examples table is connected to the Scenario Outline section immediately preceding it.

Be careful not to confuse step tables with Examples tables! This is a common mistake for Gherkin beginners. Step tables provide input data structures, whereas Examples tables provide input parameterization.


Tags

Tags are a great way to classify scenarios. They can be used to selectively run tests based on tag name, and they can be used to apply before-and-after wrappers around scenarios. Most BDD frameworks support tags. Any scenario can be given tags like this:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  @automated @google @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

Tags start with the “@” symbol. Tag names are case-sensitive and whitespace-separated. As a best practice, they should be lowercase and use hyphens (“-”) between separate words. Tags must be put on the line before a Scenario or Scenario Outline section begins. Any number of tags may be used.
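
Selective running boils down to filtering scenarios by tag. The Python sketch below shows the idea with a hypothetical function and a made-up second scenario; actual frameworks expose tag filtering through their own command line options.

```python
def select_scenarios(scenarios, wanted_tag):
    """Return the titles of scenarios that carry the wanted tag."""
    return [title for title, tags in scenarios.items() if wanted_tag in tags]


# The second scenario and its tags are invented for illustration
scenarios = {
    "Simple Google search": {"@automated", "@google", "@panda"},
    "Google image search": {"@google", "@manual"},
}
print(select_scenarios(scenarios, "@automated"))  # prints ['Simple Google search']
```

Because tag names are case-sensitive, "@Automated" would select nothing here, which is one more reason to keep tags lowercase.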


Comments

Comments allow the author to add information to a feature file. In Gherkin, comments must use a whole line, and each line must start with a hash sign “#”. Comment lines may appear anywhere and are ignored by the automation framework. For example:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  # Test ID: 12345
  # Author: Andy
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

Since Gherkin is very self-documenting, it is a best practice to limit the use of comments in favor of more descriptive steps and titles.

Writing Good Gherkin

This post merely shows how to use the Gherkin syntax. The next post will cover how to write good Gherkin feature files.

BDD 101: The Gherkin Language

As mentioned in the previous post, behavior scenarios are the cornerstone of BDD. Each scenario is the formalized specification of a single behavior of a product or feature. Scenarios are both the requirements for the feature as well as the test cases. This post will show how to write behavior scenarios in Gherkin feature files.

Introducing Gherkin

Gherkin is the domain-specific language for writing behavior scenarios. It is a simple programming language, and its “code” is written into feature files (text files with a “.feature” extension). The official Gherkin language standard is maintained by Cucumber, one of the most prevalent BDD automation frameworks. Most other BDD frameworks use Gherkin, but some may not conform 100% to Cucumber’s language standards.

Gherkin scenarios are meant to be short and to sound like plain English. Each scenario has the following structure:

  1. Given some initial state
  2. When an action is taken
  3. Then verify an outcome

A simple feature file example is shown below:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

As you can see, it reads intuitively. Even non-technical people can understand it.

The Feature section has a title and a description, which are both used only for documentation purposes. When the feature is tied to an Agile user story, it is good practice to put the user story in the description. The Feature section has one or more Scenario sections, each with a unique title.

Each scenario is essentially a test case. The Given-When-Then format concisely frames the behavior under test. Each Given, When, or Then line is called a step. Steps must appear in the order of Given->When->Then and are executed sequentially. The Given step sets up the expected state before the main actions take place (like loading the Google home page). The When step contains the actions for exercising the behavior under test (running a Google search), and the Then step verifies that the behavior was successful (seeing the results page). The English-y phrase following the step keyword is a description of what the step will do, written by the test author. This description is linked to a step definition (a method/function that implements the operations for the step) in the automation code base using string or regular expression matching. (Feature files apart from step definitions are basically manual test case procedures.) Good steps are declarative in that they state what should happen at a high level, and not imperative because they shouldn’t focus on direct, low-level instructions.

Gherkin Keywords

Every programming language has its keywords, and Gherkin is no different. The table below explains how each keyword is used in the official Gherkin language. Note that some BDD frameworks may not be fully compliant. Cucumber provides a decent Gherkin language reference for its implementation.

Keyword Purpose
Feature
  • section denoting product or feature under test
  • contains a one-line title
  • contains extra lines for description
  • description should include the user story
  • may have one Background section
  • may have multiple Scenario and Scenario Outline sections
  • should be one Feature per feature file
Scenario
  • section for a specific behavior scenario
  • contains a one-line title
  • contains multiple Given, When, and Then steps
  • each type of step is optional
  • step order matters
  • each scenario runs independently
Given
  • step to define the preconditions (initial state or context)
  • should put the product under test into the desired state
  • may be parameterized
When
  • step to define the action to be performed
  • may be parameterized
Then
  • step to define the expected result from the action taken by When
  • may be parameterized
And
  • an additional step added to a Given, When, or Then
  • used instead of repeating Given, When, or Then
  • example: Given-Given-When-Then = Given-And-When-Then
  • associated with the immediately preceding step
  • order matters
But
  • functions the same as And, but might be easier to read
  • interchangeable with And
Background
  • a section of Given and And statements to run before each scenario
  • does not have a title or description
  • only one Background for each Feature section
Scenario Outline
  • a templated scenario section
  • uses “<” and “>” to identify parameter names
  • followed by Examples tables that provide parameter values
  • may have more than one Examples table
  • parameters are substituted when the tests run
Examples
  • a section to provide a table of parameter values for a Scenario Outline
  • each table row represents a combination of values to test together
  • may have any positive number of rows
|
  • table delimiter used for Examples tables and step tables
"""
  • doc string delimiter for passing large text into a step
  • doc strings may be multi-line
@
  • prefix for a tag
  • tags may be placed before Feature or Scenario sections
  • tags are used to filter scenarios
#
  • prefix for a comment line
  • comments are not read by the Gherkin parser

The next post will walk through several Gherkin examples to show how to write good scenarios.