YAML Comments in Gherkin Feature Files

In Gherkin-based BDD test frameworks, feature files hold behavior scenarios with Given-When-Then steps. Features and scenarios may be categorized by tags for hooks and filtering, and additional comment lines may be added anywhere. However, Gherkin itself may not be sufficient enough to capture all desired test metadata. Tags are great for simple classification but crude for larger information. And comments are meaningful only to the reader.

Fraser Scott (zeroXten) came up with a nifty idea for improving Gherkin information while working on the OWASP Cloud Security project: write YAML comments in feature files to provide more formal documentation. As stated on the project home page, “The OWASP Cloud Security project aims to help people secure their products and services running in the cloud by providing a set of easy to use threat and control BDD stories that pool together the expertise and experience of the development, operations and security communities.” It’s a pretty cool idea – use Gherkin to model attacks for both education and automation. The team is writing YAML comments at the top of feature files to provide custom information in a clean, readable format that could also be easily parsed by other tools. Below is an example feature file I copied from the project, with YAML comments at the top:

At first, I wasn’t too thrilled by the thought of YAML comments in feature files. Gherkin should provide all specification needs, and tag classification is often needed for automation. However, the YAML comments are quite clean, and for this project, they appear to document aspects of the scenarios that shouldn’t be buried in Gherkin (such as confirmation status and reference links). YAML is a very sensible format for formalized comments, too.

Take this idea as food for thought: YAML comments can be an effective way to add metadata to Gherkin feature files. Just make sure to capture all behavior specification using Gherkin and to still use tags for automation.

The Airing of Grievances: BDD

Behavior-Driven Development – one of my favorite blog topics. When done right, it’s a wonderful way to foster better collaboration and automation. When it’s not… well, let’s just say I got a lot of problems with bad BDD practices, and now you’re gonna hear about it!


Treating BDD as a Tool and Not as a Process

BDD is a process – it is a set of tools and practices designed to help teams deliver better software. BDD is not just a test automation framework; the framework is just one of the tools that support BDD. Heck, the word “development” is in the name!

Complaining that Gherkin is Too Technical

Really? Really!? Gherkin is basically just plain language with some buzzwords mixed in! It is specifically designed for non-technical people to handle it! It is not a full-fledged programming language – it is essentially a simple format for behavior specification that automation frameworks can easily parse. The steps are meant to be read like plain English (or any other spoken language) so that better collaboration can happen. If Gherkin is “too technical” for you, then I hate to know what isn’t.

No Buy-In from All Roles

The three major roles on an Agile team, a.k.a the “Three Amigos,” are biz, dev, and test (regardless of fancy names or assignments). For BDD to work well, all three role types must embrace it. Otherwise, collaboration will suffer. BDD is not just a QA thing, it’s for everyone. Biz gets better features in shorter time because requirements were communicated better. Dev wastes less time figuring out what biz wants and gets tests faster. Test can start automating right away since test scenarios are defined from the start in Gherkin. Everybody wins if everybody contributes.

No Three Amigos Meetings

Three Amigos meetings are like dietary fiber supplements: they help a team stay regular with collaboration, or else development gets constipated as engineers start building crap instead of the intended behaviors. Then the crap gets blocked up as the team must rework it, meaning it could be another sprint before there’s a healthy flush of new features. Open conversations in regularly scheduled Three Amigos meetings would have avoided the whole obstruction.

Forcing QA to Write All Behavior Scenarios

BDD is not just QA thing – it is for all roles. Pigeonholing the responsibility of writing behavior scenarios onto QA is not only unfair, it is anti-collaborative. The whole reason for writing scenarios in plain language with Gherkin is to let everyone contribute to feature behavior. Scenarios are primarily about capturing behavior, not writing tests. If tests were the main focus, then engineers could just write test cases using traditional automation frameworks directly in general purpose programming languages like Java or Python. BDD offers the benefits of process efficiency and shifting left when the whole team helps to write behavior scenarios.


Bad Gherkin

Only you can prevent bad Gherkin. Or I can – via rejected code reviews.

Typos, Poor Grammar, and Inconsistent Formatting

Gherkin needs to be readable. Steps with typos, poor grammar, and inconsistent formatting will still run fine for test automation, but they make it tough to understand the behaviors they describe. Sometimes, they can even make the meaning ambiguous.

No Double-Quotes Around Step Parameters

How do you know if something is a step parameter? “Double quotes” make it easy. However, Gherkin does not enforce double quotes around parameters. It is merely by programmer’s convention, but it’s a really helpful convention indeed.

No Tags

Tags make it super easy to filter scenarios at runtime. No tags? Good luck remembering long paths and names at runtime, or running related scenarios across different feature files together.

More Than 120 Characters per Line

Any longer is too much to comprehend. Either write the step more concisely, or split it apart. Plus, the line may go off the edge of the screen.

More Than 10 Steps per Scenario

Again, any longer is too much to comprehend. Scenarios should be short and sweet – they should concisely describe behavior. Too many steps means the scenario is too imperative or covers more than one behavior.

Multiple Behaviors per Scenario

Scenarios should not have multiple personality disorder: one scenario, one behavior. Don’t break the Cardinal Rule of BDD! So many people break this rule when they first start BDD because they are locked into procedure-driven thinking. Then, when tests fail, nobody knows exactly what behavior is the culprit. One scenario, one behavior.

Out-of-Order Step Types

Givens, Whens, and Thens each serve a specific, ordered purpose: Given some initial state, When actions are taken, Then verify an expected outcome. Jumbling them up ruins their meaning. Furthermore, duplicate When-Then pairs indicate multiple behaviors per scenario. And don’t just reassign step types to skirt the strict-ordering rule. Do it right – put integrity into the steps!

Gigantic Tables

Have you ever seen an Examples table with 13 columns? Or maybe 517 rows? I have. The horror, the horror! Tables that big make scenarios lose any semblance of specification-by-example. Make sure table rows and columns are actually needed. Use key-value lookups if the data is too gritty.

Being Imperative Rather Than Declarative

Given I’m logged into the app, when I click here, and I click there, and I type P, and I type L, and I type E, and I type A, and I type S, and I type E, and I type D, and I type O, and I type N, and I type T, and I type W, and I type R, and I type I, and I type T, and I type E, and I type S, and I type C, and I type E, and I type N, and I type A, and I type R, and I type I, and I type O, and I type S, and I type L, and I type I, and I type K, and I type E, and I type T, and I type H, and I type I, and I type S, then go directly to jail, and do not pass GO, and do not collect $200. Steps should focus more on what than how.

Prefixing Existing Test Procedure Steps with Gherkin Buzzwords

Let’s just take our existing test procedures from a tool like HP QualityCenter or ALM and put the words “Given,” “When,” and “Then” in front of every step. Ta-da! We’re now doing BDD! …WRONG!! I kid you not, I have see this happen. These people clearly never took BDD 101. It hurts to see.


Unorganized Step Definitions

Programmers like to throw their step definition methods anywhere. Add ’em to an unrelated existing class? Create a whole new class for only two new steps? Mix up the types? Who cares! Don’t bother to alphabetize them, either. Well, that’s how tech debt happens. That’s how duplicate steps get written, because originals can’t be found. Imagine a library without the Dewey Decimal System – that’s what an unorganized step def collection will be.

Putting Cleanup Code in Then Steps

Cleanup code belongs in After hooks, where it will be run no matter what fails during the scenario. Writing Then steps to do cleanup not only breaks step type integrity, the cleanup code will not run if a previous step aborts!

Catching and Burying All Exceptions

Here’s something I see all the time in automation code (and not just for BDD):

// FYI - This is Java, but the same thing can happen in any language
@When("^do something$")
public void doSomething() throws Throwable {
  try {
  catch (Exception e) {

The entirety of a step (or even a whole test) is surrounded by a try-catch that catches every exception. THIS STEP CAN NEVER REGISTER A FAILURE! Even if there was a failed assertion or, worse, an exception that ought to abort the test, it will get caught and buried with not much more than a slight whimper in the log. In this case, the test will carry forth to the next step, which will probably not work, either. I’ve seen projects with this sort of exception handling around every single step definition. In modern test frameworks, the framework will catch all exceptions at the highest level, register the test as failed, and move on safely to the next test. There is no need to catch any exception, unless the test can be recovered.

Changing Steps Without Testing Affected Scenarios

Sharing steps is a wonderful thing, but changing steps without testing all scenarios that use them is a terrible thing. I’ve seen people change step text or step def code and test only their new scenarios. Meanwhile, in the continuous integration environment, a dozen other tests using those steps started failing. (Hell, I’ve seen people push code that doesn’t even compile, but that’s another grievance.)

Multiple Names for the Same Step

Just because you can do something doesn’t mean you should. Different names for the same step may be useful for readability, but please keep the name variants limited.

No Dependency Injection

Dependency injection is the best way to share objects in an automation framework. (Singletons work well, too, but DI allows more careful control of scope.) Many frameworks like Cucumber-JVM even integrate with existing DI frameworks like PicoContainer and Spring. DON’T MAKE NON-CONSTANT VARIABLES GLOBAL! DON’T BLINDLY MAKE THINGS “STATIC” JUST TO SHARE THEM! Globals (or “statics” in Java/C# like languages) are dangerous: they can be easily misused, they are a nightmare to trace, and they can break multithreaded execution. Just use the appropriate design pattern: dependency injection.

Gherkin Syntax Highlighting in Chrome

Google Chrome is one of the most popular web browsers around. Recently, I discovered that Chrome can edit and display Gherkin feature files. The Chrome Web Store has two useful extensions for Gherkin: Tidy Gherkin and Pretty Gherkin, both developed by Martin Roddam. Together, these two extensions provide a convenient, lightweight way to handle feature files.

Tidy Gherkin

Tidy Gherkin is a Chrome app for editing and formatting feature files. Once it is installed, it can be reached from the Chrome Apps page (chrome://apps/). The editor appears in a separate window. Gherkin text is automatically colored as it is typed. The bottom preview pane automatically formats each line, and clicking the “TIDY!” button in the upper-left corner will format the user-entered text area as well. Feature files can be saved and opened like a regular text editor. Templates for Feature, Scenario, and Scenario Outline sections may be inserted, as well as tables, rows, and columns.

Another really nice feature of Tidy Gherkin is that the preview pane automatically generates step definition stubs for Java, Ruby, and JavaScript! The step def code is compatible with the Cucumber test frameworks. (The Java code uses the traditional step def format, not the Java 8 lambdas.) This feature is useful if you aren’t already using an IDE for automation development.

Tidy Gherkin has pros and cons when compared to other editors like Notepad++ and Atom. The main advantages are automatic formatting and step definition generation – features typically seen only in IDEs. It’s also convenient for users who already use Chrome, and it’s cross-platform. However, it lacks richer text editing features offered by other editors, it’s not extendable, and the step def gen feature may not be useful to all users. It also requires a bit of navigation to open files, whereas other editors may be a simple right-click away. Overall, Tidy Gherkin is nevertheless a nifty, niche editor.

This slideshow requires JavaScript.

Pretty Gherkin

Pretty Gherkin is a Chrome extension for viewing Gherkin feature files through the browser with syntax highlighting. After installing it, make sure to enable the “Allow access to the file URLs” option on the Chrome Extensions page (chrome://extensions/). Then, whenever Chrome opens a feature file, it should display pretty text. For example, try the GoogleSearch.feature file from my Cucumber-JVM example project, cucumber-jvm-java-example. Unfortunately, though, I could not get Chrome to display local feature files – every time I would try to open one, Chrome would simply download it. Nevertheless, Pretty Gherkin seems to work for online SCM sites like GitHub and BitBucket.

Since Pretty Gherkin is simply a display tool, it can’t really be compared to other editors. I’d recommend Pretty Gherkin to Chrome users who often read feature files from online code repositories.

This slideshow requires JavaScript.


Be sure to check out other Gherkin editors, too!

Cucumber-JVM for Java

This post is a concise-yet-comprehensive overview of Cucumber-JVM for Java. It is an introduction, a primer, a guide, and a reference. If you are new to BDD, please learn about it before using Cucumber-JVM.



Cucumber is an open-source software test automation framework for behavior-driven development. It uses a business-readable, domain-specific language called Gherkin for specifying feature behaviors that become tests. The Cucumber project started in 2008 when Aslak Hellesøy released the first version of the Cucumber framework for Ruby.

Cucumber-JVM is the official port for JVM languages, such as Java, Groovy, Scala, Clojure, and Gosu. Every Gherkin step is “glued” to a step definition method that executes the step. The English text of a step is glued using annotations and regular expressions. Cucumber-JVM integrates nicely with other testing packages. Anything that can be done with Java or other JVM languages can be handled by Cucumber-JVM. Cucumber-JVM is ideal for black-box, above-unit, functional tests. This guide focuses on Java, though the concepts apply for all JVM languages.

Example Projects

Github contains two Cucumber-JVM example projects for this guide:

The projects use Java, Apache Maven, Selenium WebDriver, and AssertJ. The README files include practice exercises as well.

Prerequisite Skills

To be successful with Cucumber-JVM for Java, the following skills are required:

Prerequisite Tools

Test machines must have the Java Development Kit (JDK) installed to build and run Cucumber-JVM tests. They should also have the desired build tool installed (such as Apache Maven). The build tool should automatically install Cucumber-JVM packages through dependency management.

An IDE such as JetBrains IntelliJ IDEA (with the Cucumber for Java plugin) or Eclipse (with the Cucumber JVM Eclipse Plugin) is recommended for Cucumber-JVM test automation development. Software configuration management (SCM) with a tool like Git is also strongly recommended.


Cucumber-JVM 2.0 was released in August 2017 and should be used for new Cucumber-JVM projects. Releases may be found under Maven Group ID io.cucumber. Older Cucumber-JVM 1.x versions may be found under Maven Group ID info.cukes.

Build Management

Apache Maven is the preferred build management tool for Cucumber-JVM projects. All Cucumber-JVM packages are available from the Maven Central Repository. Maven can automatically run Cucumber-JVM tests as part of the build process. Projects using Cucumber-JVM should follow Maven’s Standard Directory Layout. The examples use Maven. Gradle may also be used, but it requires extra setup.

Every Maven project has a POM file for configuration. The POM should contain appropriate Cucumber-JVM dependencies. There is a separate package for each JVM language, dependency injection framework, and underlying unit test runner. Since Cucumber-JVM is a test framework, its dependencies should use test scope. Below is a typical list of Java dependencies, though others may be required. Check io.cucumber on the Maven site for the latest packages and versions.


Project Structure

Cucumber-JVM test automation has the same layered approach as other BDD frameworks:

BDD Automation Layers.png

The higher layers focus more on specification, while the lower layers focus more on implementation. Gherkin feature files and step definition classes are BDD-specific.

Cucumber-JVM tests may be included in the same project as product code or in a separate project. Either way, projects using Cucumber-JVM should follow Maven’s Standard Directory Layout: test code should be located under src/test.

Cucumber-JVM Example Project

Screenshot of the example project from IntelliJ IDEA’s Project view.

Gherkin Feature Files

Gherkin feature files are text files that contain Gherkin behavior scenarios. They use the “.feature” extension. In a Maven project, they belong under src/test/resources, since they are not Java source files. They should also be organized into a sensible package hierarchy. Refer to other BDD pages for writing good Gherkin.

Gherkin Feature File

A feature file from the example projects, opened in IntelliJ IDEA.

Step Definition Classes

Step definition classes are Java classes containing methods that implement Gherkin steps. Step def classes are like regular Java classes: they have variables, constructors, and methods. Steps are “glued” to methods using regular expressions. Feature file scenarios can use steps from any step definition class in the project. In a Maven project, step defs belong in packages under src/test/java, and their class names should end in “Steps”.

The Basics

Below is a step definition class from the cucumber-jvm-java-example project, which uses the traditional method annotation style for step defs as part of the cucumber-java package. Each method should throw Throwable so that exceptions are raised up to the Cucumber-JVM framework.

package com.automationpanda.example.stepdefs;

import com.automationpanda.example.pages.GooglePage;
import org.openqa.selenium.WebDriver;

import static org.assertj.core.api.Assertions.assertThat;

public class GoogleSearchSteps {

  private WebDriver driver;
  private GooglePage googlePage;

  @Before(value = "@web", order = 1)
  public void initWebDriver() throws Throwable {
    driver = new ChromeDriver();

  @Before(value = "@google", order = 10)
  public void initGooglePage() throws Throwable {
    googlePage = new GooglePage(driver);

  @Given("^a web browser is on the Google page$")
  public void aWebBrowserIsOnTheGooglePage() throws Throwable {

  @When("^the search phrase \"([^\"]*)\" is entered$")
  public void theSearchPhraseIsEntered(String phrase) throws Throwable {

  @Then("^results for \"([^\"]*)\" are shown$")
  public void resultsForAreShown(String phrase) throws Throwable {

  @After(value = "@web")
  public void disposeWebDriver() throws Throwable {

Alternatively, in Java 8, step definitions may be written using lambda expressions. As shown in the cucumber-jvm-java8-example project, lambda-style step defs are more concise and may be defined dynamically. The cucumber-java8 package is required:

package com.automationpanda.example.stepdefs;

import com.automationpanda.example.pages.GooglePage;
import cucumber.api.Scenario;
import cucumber.api.java8.En;
import org.openqa.selenium.WebDriver;

import static org.assertj.core.api.Assertions.assertThat;

public class GoogleSearchSteps implements En {

  private WebDriver driver;
  private GooglePage googlePage;

  // Warning: Make sure the timeouts for hooks using a web driver are zero

  public GoogleSearchSteps() {
    Before(new String[]{"@web"}, 0, 1, (Scenario scenario) -> {
      driver = new ChromeDriver();
    Before(new String[]{"@google"}, 0, 10, (Scenario scenario) -> {
      googlePage = new GooglePage(driver);
    Given("^a web browser is on the Google page$", () -> {
    When("^the search phrase \"([^\"]*)\" is entered$", (String phrase) -> {
    Then("^results for \"([^\"]*)\" are shown$", (String phrase) -> {
    After(new String[]{"@web"}, (Scenario scenario) -> {

Either way, steps from any feature file are glued to step definition methods/lambdas from any class at runtime:

Step Def Glue

Gluing a Gherkin step to its Java definition using regular expressions. IDEs have features to automatically generate definition stubs for steps.

For best practice, class inheritance should also be avoided – step bindings in superclasses will trigger DuplicateStepDefinitionException exceptions at runtime, and any step definition concern handled by inheritance can be handled better with other design patterns. Class constructors should be used primarily for dependency injection, while setup operations should instead be handled in Before hooks.


Scenarios sometimes need automation-centric setup and cleanup routines that should not be specified in Gherkin. For example, web tests must first initialize a Selenium WebDriver instance. Step definition classes can have Before and After hooks that run before and after a scenario. They are analogous to setup and teardown methods from other test frameworks like JUnit. Hooks may optionally specify tags for the scenarios to which they apply, as well as an order number. They are similar to Aspect-Oriented Programming. After hooks will run even if a scenario has an exception or abortive assertion – use them for cleanup routines instead of Gherkin steps to guarantee cleanup runs.

The code snippet below shows Before and After hooks from the traditional-style example project. The order given to the Before hooks guarantees the web driver is initialized before the page object is created.

  @Before(value = "@web", order = 1)
  public void initWebDriver() throws Throwable {
    driver = new ChromeDriver();

  @Before(value = "@google", order = 10)
  public void initGooglePage() throws Throwable {
    googlePage = new GooglePage(driver);

  @After(value = "@web")
  public void disposeWebDriver() throws Throwable {

Before and After hooks surround scenarios only. Cucumber-JVM does not provide hooks to surround the whole test suite. This protects test case independence but makes global setup and cleanup challenging. The best workaround is to use the singleton pattern with lazy initialization. The solution is documented in Cucumber-JVM Global Hook Workarounds.

Dependency Injection

Cucumber-JVM supports dependency injection (DI) as a way to share objects between step definition classes. For example, steps in different classes may need to share the same web driver instance. Cucumber-JVM supports many DI modules, and each has its own dependency package. As a warning, do not use static variables for sharing objects between step definition classes – static variables can break test independence and parallelization.

PicoContainer is the simplest DI framework and is recommended for most needs. Dependency injection hinges upon step definition class constructors. Without DI, step def constructors must not have parameters. With DI, PicoContainer will automatically construct each object in a step def constructor signature and pass them in when the step def object is constructed. Furthermore, the same object is injected into all step def classes that have its type as a constructor parameter. Objects that require constructor parameters should use a holder or caching class to provide the necessary arguments. Note that dependency-injected objects are created fresh for each scenario.

Below is a trivial example for how to apply dependency injection using PicoContainer to initialize the web driver in the example projects. (A more advanced example would read browser type from a config file and set the web driver accordingly.)

public class WebDriverHolder {
  private WebDriver driver;
  public WebDriver getDriver() {
    return driver;
  public void initWebDriver() {
    driver = new ChromeDriver();

public class GoogleSearchSteps {
  private WebDriverHolder holder;
  public GoogleSearchSteps(WebDriverHolder holder) {
    this.holder = holder;
  public void initWebDriver() throws Throwable {
    if (holder.getDriver() == null)

Automation Support Classes

Automation support classes are extra classes outside of the Cucumber-JVM framework itself that are needed for test automation. They could come from the same test project, a separate but proprietary package, or an open-source package. Regardless of the source, they should fold into build management. They can integrate seamlessly with Cucumber-JVM. Step definitions should be very short because the bulk of automation work should be handled by support classes for maximum code reusability.

Popular open-source Java packages for test automation support are:

Page objects, file readers, and data processors also count as support classes.

Configuration Files

Configuration files are extra files outside of the Cucumber-JVM framework that provide environment-specific data to the tests, such as URLs, usernames, passwords, logging/reporting settings, and database connections. They should be saved in standard formats like CSV, XML, JSON, or Java Properties, and they should be read into memory once at the start of the test suite using global hook workarounds. The automation code should look for files at predetermined locations or using paths passed in as environment variables or properties.

Not all test automation projects need config files, but many do. Never hard-code config data into the automation code. Avoid non-text-based formats like Microsoft Excel so that version control can easily do diffs, and avoid non-standard formats that require custom parsers because they require extra development and maintenance time.

Running Tests

Cucumber-JVM tests may be run in a number of ways.

Using JUnit or TestNG

The cucumber-junit and cucumber-testng packages enable JUnit and TestNG respectively to run Cucumber-JVM tests. They require test runner classes that provide CucumberOptions for how to run the tests. A project may have more than one runner class. The example projects use the JUnit runner like this:

package com.automationpanda.example.runners;

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import org.junit.runner.RunWith;

  plugin = {"pretty", "html:target/cucumber", "junit:target/cucumber.xml"},
  features = "src/test/resources/com/automationpanda/example/features",
  glue = {"com.automationpanda.example.stepdefs"})
public class PandaCucumberTest {

JUnit and TestNG runners can also be picked up by build management tools. For example, Maven will automatically run any runner classes named * during the test phase and * during the verify phase. Be sure to include the clean option to delete old test results. Avoid duplicate test runs by making sure runner classes do not cover the same tests – use tags to avoid duplicate coverage.

Using the Command Line Runner

Cucumber-JVM provides a CLI runner that can run feature files directly from the command line. To use it, invoke:

java cucumber.api.cli.Main

Run with “–help” to see all available options.

Using IDEs

Both JetBrains IntelliJ IDEA (with the Cucumber for Java plugin) and Eclipse (with the Cucumber JVM Eclipse Plugin) are great IDEs for Cucumber-JVM test development. They provide features for linking steps to definitions, generating definition stubs, and running tests with various options.

Cucumber Options

Cucumber options may be specified either in a runner class or from the command line as a Java system property. Set options from the command line using “-Dcucumber.options” – it will work for any java or mvn command. To see all available options, set the options to “–help”, or check the official Cucumber-JVM doc page.

The most useful option is probably the tags option. Selecting tags to run dynamically at runtime, rather than statically in runner classes, is very useful. In Cucumber-JVM 2.0, tag expressions use a basic English Boolean language:

@automated and @web
@web or @service
not @manual
(@web or @service) and (not @wip)

Older version of Cucumber-JVM used a more complicated syntax with tildes and commas.


In BDD, What Should Be A Feature?

How do I decide what a feature should be? And should I define a feature first before writing behavior specs, or should I start with behaviors and see how they fit together into features?

Features, scenarios, and behaviors are all common BDD terms that should be carefully defined:

  • behavior is an operation with inputs, actions, and expected outcomes.
  • A scenario is the specification of a behavior using formal steps and examples.
  • feature is a desired product functionality often involving multiple behaviors.

Don’t try to over-think the definition of “feature.” Some features are small, while other features are large. The main distinction between a feature and a scenario or behavior is that features are what customers expect to receive. Small features may cover only a few or even only one behavior, while large features may cover several.

The Gherkin language has Feature and Scenario sections. In this sense, a Feature is simply a collection of related Scenarios. They align roughly to the more general meanings of the terms.

Don’t over-think features with Agile, either. Some teams define a feature as a collection of user stories. Other teams say that one user story is a feature. In terms of Gherkin, don’t presume that one user story must have exactly one feature file with one Feature section. A user story could have zero-to-many feature files to cover its behaviors. Do whatever is appropriate.

Features should be determined by customer needs. They should solve problems the customers have. For example, perhaps the customer needs a better way to process orders through their online store. That’s where features should start – as business needs. Behaviors should then naturally come as part of grooming and refinement efforts. Thus, in most cases, features should be identified first before individual behaviors.

Nevertheless, there may be times during development that scenario-to-feature realignment should be done. It may be more convenient to create a new feature file for related behaviors. Or, a new feature may be “discovered” out of particularly useful behaviors. This is more the exception than the norm.

BDD 101: Unit, Integration, and End-to-End Tests

There are many types of software tests. BDD practices can be incorporated into all aspects of testing, but BDD frameworks are not meant to handle all test types. Behavior scenarios are inherently functional tests – they verify that the product under test works correctly. While instrumentation for performance metrics could be added, BDD frameworks are not intended for performance testing. This post focuses on how BDD automation works into the Testing Pyramid. Please read BDD 101: Manual Testing for manual test considerations. (Check the Automation Panda BDD page for the full table of contents.)

The Testing Pyramid

The Testing Pyramid is a functional test development approach that divides tests into three layers: unit, integration, and end-to-end.

  • Unit tests are white-box tests that verify individual “units” of code, such as functions, methods, and classes. They should be written in the same language as the product under test, and they should be stored in the same repository. They often run as part of the build to indicate immediate success or failure.
  • Integration tests are black-box tests that verify integration points between system components work correctly. The product under test should be active and deployed to a test environment. Service tests are often integration-level tests.
  • End-to-end tests are black-box tests that test execution paths through a system. They could be seen as multi-step integration tests. Web UI tests are often end-to-end-level tests.

Below is a visual representation of the Testing Pyramid:

The Testing Pyramid

The Testing Pyramid

From bottom to top, the tests increase in complexity: unit tests are the simplest and run very fast, while end-to-end require lots of setup, logic, and execution time. Ideally, there should be more tests at the bottom and fewer tests at the top. Test coverage is easier to implement and isolate at lower levels, so fewer high-investment, more-fragile tests need to be written at the top. Pushing tests down the pyramid can also mean wider coverage with less execution time. Different layers of testing mitigate risk at their optimal returns-on-investment.

Behavior-Driven Unit Testing

BDD test frameworks are not meant for writing unit tests. Unit tests are meant to be low-level, program-y tests for individual functions and methods. Writing Gherkin for unit tests is doable, but it is overkill. It is much better to use established unit test frameworks like JUnit, NUnit, and pytest.

Nevertheless, behavior-driven practices still apply to unit tests. Each unit test should focus on one main thing: a single call, an individual variation, a specific input combo; a behavior. Furthermore, in the software process, feature-level behavior specs draw a clear dividing line between unit and above-unit tests. The developer of a feature is often responsible for its unit tests, while a separate engineer is responsible for integration and end-to-end tests for accountability. Behavior specs carry a gentleman’s agreement that unit tests will be completed separately.

Integration and End-to-End Testing

BDD test frameworks shine at the integration and end-to-end testing levels. Behavior specs expressively and concisely capture test case intent. Steps can be written at either integration or end-to-end levels. Service tests can be written as behavior specs like in Karate. End-to-end tests are essentially multi-step integrations tests. Note how a seemingly basic web interaction is truly a large end-to-end test:

Given a user is logged into the social media site
When the user writes a new post
Then the user's home feed displays the new post
And the all friends' home feeds display the new post

Making a simple social media post involves web UI interaction, backend service calls, and database updates all in real time. That’s a full pathway through the system. The automated step definitions may choose to cover these layers implicitly or explicitly, but they are nevertheless covered.

Lengthy End-to-End Tests

Terms often mean different things to different people. When many people say “end-to-end tests,” what they really mean are lengthy procedure-driven tests: tests that cover multiple behaviors in sequence. That makes BDD purists shudder because it goes against the cardinal rule of BDD: one scenario, one behavior. BDD frameworks can certainly handle lengthy end-to-end tests, but careful considerations should be taken for if and how it should be done.

There are five main ways to handle lengthy end-to-end scenarios in BDD:

  1. Don’t bother. If BDD is done right, then every individual behavior would already be comprehensively covered by scenarios. Each scenario should cover all equivalence classes of inputs and outputs. Thus, lengthy end-to-end scenarios would primarily be duplicate test coverage. Rather than waste the development effort, skip lengthy end-to-end scenario automation as a small test risk, and compensate with manual and exploratory testing.
  2. Combine existing scenarios into new ones. Each When-Then pair represents an individual behavior. Steps from existing scenarios could be smashed together with very little refactoring. This violates good Gherkin rules and could result in very lengthy scenarios, but it would be the most pragmatic way to reuse steps for large end-to-end scenarios. Most BDD frameworks don’t enforce step type order, and if they do, steps could be re-typed to work. (This approach is the most pragmatic but least pure.)
  3. Embed assertions in Given and When steps. This strategy avoids duplicate When-Then pairs and ensures validations are still performed. Each step along the way is validated for correctness with explicit Gherkin text. However, it may require a number of new steps.
  4. Treat the sequence of behaviors as a unique, separate behavior. This is the best way to think about lengthy end-to-end scenarios because it reinforces behavior-driven thinking. A lengthy scenario adds value only if it can be justified as a uniquely separate behavior. The scenario should then be written to highlight this uniqueness. Otherwise, it’s not a scenario worth having. These scenarios will often be very declarative and high-level.
  5. Ditch the BDD framework and write them purely in the automation programming. Gherkin is meant for collaboration about behaviors, while lengthy end-to-end tests are meant exclusively for intense QA work. Biz roles will write behavior specs but will never write end-to-end tests. Forcing behavior specification on lengthy end-to-end scenarios can inhibit their development. A better practice could be coexistence: acceptance tests could be written with Gherkin, while lengthy end-to-end tests could be written in raw programming. Automation for both test sets could still nevertheless share the same automation code base – they could share the same support modules and even step definition methods.

Pick the approach that best meets the team’s needs.

Gherkin Syntax Highlighting in Atom

Atom, “a hackable editor for the 21st Century,” is a really great text editor for both quick edits and serious programming. Atom is free, open-source, and developed by GitHub. It can support a host of languages out-of-the-box, with plugins for even more. What makes Atom really nice compared to Notepad++ is that Atom is cross-platform: it runs on Linux, macOS, and Windows. Another bonus point over Notepad++ is the in-editor Project tree view for directories. Atom also has Atom IDE for advanced development support. Even though Atom is feature-rich, its response time is pretty fast. It’s a solid text editor choice for both technical and non-technical users.

One of my first blog posts on Automation Panda was Gherkin Syntax Highlighting in Notepad++. It continues to be one of my post popular posts, too. However, Notepad++ doesn’t help feature file authors who use macOS or Linux. Thankfully, Atom has a decent plugin for Gherkin. In fact, it has a number of Gherkin plugins available.

Atom Intall Plugin

On macOS, Settings are available under File -> Preferences… and on the Install tab.

I installed the first package, language-gherkin, and I was very pleased with the syntax highlighting. I also tried the internationalized package below it in the list, but the colors were not as nice (call me picky). It looked like other packages could do autocomplete and table formatting as well.

Atom Gherkin


Atom is just another great option for writing Gherkin feature files.