BDD

Who Should Lead BDD?

Behavior-driven development offers great benefits: better communication, easier test automation, and higher code quality. There are many ways for a team to start doing BDD, and naturally, someone needs to stand up and lead the effort. In my experience, adopting BDD is its own process. An evangelist converts team leaders, training sessions are given, and Gherkinized acceptance criteria start being automated. However, not everyone will embrace the changes, especially those across different role types. And big changes take time. Rome wasn’t built in a day, and neither will be a mature, effective BDD process.

This post covers three possible ways to lead BDD adoption, each from one of the Three Amigos roles. Any role can lead the charge, but each will have its unique struggles. These possibilities are advisory but not necessarily prescriptive. If you want to move your team into BDD, use these three approaches as guidelines for crafting a plan that best meets your needs. And, of course, the advice in Winning Support for BDD pertains to all approaches. Furthermore, as you read these approaches, put yourselves in the shoes of roles other than your own, so you can better understand the struggles each role faces.

Note: The approaches below presume that the underlying software development process is Agile Scrum. Nevertheless, they may be tweaked and applied to other processes, like Waterfall or Kanban.

The Starting Point

The starting point for all three approaches below is a “traditional” Agile sprint – one that is not (yet) behavior-driven. Product owners write user stories, developers implement the solutions, and testers test the deliverables. The diagram below shows the the main flow of sprint work in this type of sprint, and it will serve as the basis for illustrating BDD adoption:

Traditional Sprint

The overall flow of a “traditional,” non-behavior-driven Agile sprint. Ceremonies like planning, review, and retrospective should still happen, but the are left out of this diagram to put emphasis on parts affected by BDD.

QA-Led BDD

Circumstances

The most common approach I’ve seen is QA-led BDD adoption, because testers arguably have the most to gain. It is most applicable when the Three Amigos roles (biz, dev, and test) are well-defined and separate. The impetus for QA to lead BDD adoption could be that developers deliver code too late to adequately test and automate within a sprint, or it could be that the QA team is struggling to scale their test automation development. There may also be resistance to BDD from biz and dev roles.

Steps

The sensible path for QA is to start all the way to the right and progressively shift left. This means that the starting point would be test automation. Start by building a solid automation code base. Pick a well-supported BDD framework like Cucumber, SpecFlow, or behave, and start adding scenarios and step definitions. Select scenarios for core product features rather than the latest sprint stories, so that the code base will be populated with the most basic, useful steps. Once the automation code reaches a “critical mass” for step reusability, QA can then proactively classify new test scenarios as automated or manual. Automated tests become easier and easier to write, giving QA more time to be exploratory with manual testing. Ideally, all manual testing would become exploratory.

Then, it’s time to start shifting left. At this point, all Gherkin steps would be in the automation code only, so set up a tool like Pickles to expose the steps to all team members as living documentation. QA should then schedule Three Amigos meetings with biz and dev to proactively discuss user story expectations. In those meetings, QA should start demonstrating how to write acceptance criteria in Gherkin, which then expedites testing. A big win would be if a QA engineer could write a new scenario using only pre-existing, pre-automated steps and then run it successfully on the spot.

Once biz and dev folks are convinced of BDD’s benefits, encourage them to participate in writing Gherkin. When they get comfortable, encourage product owners to write acceptance criteria in Gherkin when they write user stories, and hold Three Amigos meetings before sprint planning as part of grooming. Convince them that for them to help write Gherkin scenarios is a process efficiency for the whole team.

 

This slideshow requires JavaScript.

Struggles

Shifting left is never easy, especially when team members are hardened into their roles. That’s why QA must write both really good test automation and really good Gherkin scenarios. Success should speak for itself once QA delivers good automation fast. Furthermore, QA must be clear that BDD is not merely a test tool, it’s a process that requires a paradigm shift. Otherwise, BDD could be easily pigeonholed to be a “QA thing.”

Dev-Led BDD

Circumstances

There are a few reasons that could push developers to lead BDD. On some Agile teams, there’s no distinction between dev and QA roles: all team members are software engineers responsible for both developing and testing the software. Or, developers may not be satisfied with the testing effort. Maybe too many bugs are escaping the sprint, or maybe automation isn’t getting done in time. Or, perhaps the product owner is not happy with the deliverables and putting pressure on the team to do better. Whatever the circumstance, developers are more than capable of winning with BDD.

Steps

The best way for developers to start is to set up Three Amigos meetings, to stop the game of telephone between biz and test. In those meetings, start translating acceptance criteria into Gherkin. Then, start helping out with test automation – that may mean anything from offering advice to QA to building the framework from scratch. Then, start pushing left and right to get biz and test on board with BDD.

 

This slideshow requires JavaScript.

Struggles

It may be difficult for developers to work on test automation because they may lack either the expertise or the time to devote to good test automation. Automation is a specialized discipline, and it takes time and diligence to build up expertise to do it right. I’ve seen very skilled developers haughtily build very shabby automation frameworks.

Developers must also be careful to not be too technical, or else biz and test roles may reject BDD for being too complicated or beyond their abilities. Furthermore, some teams may be resistant to developing test automation. For example, automation work may be “starved” for points because it is underestimated or similarly starved for time because it is deemed lower in priority to other work.

Biz-Led BDD

Circumstances

BDD is designed to bring technical and business roles together into healthier collaboration, and biz folks can certainly lead BDD adoption as successfully as more technical folks can. Major reasons for biz to take the lead could be if development is perpetually running behind schedule, if deliverables don’t meet the original requirements, or if software bugs are rampant.

Steps

For biz roles, “shift left” could be better called “pull left.” Start by writing solid user stories and Gherkin acceptance criteria. Focus on good Gherkin that is readable and reusable. Then, introduce BDD as a refinement to the Agile process, highlighting its benefits. Initiate Three Amigos meetings to make sure that you are communicating the right things to dev and test. Once collaboration is going well, suggest BDD automation as a way to expedite dev and test work. If acceptance criteria are all Gherkinized, then developing BDD automation would be a natural extension.

 

This slideshow requires JavaScript.

Struggles

In my experience, biz roles (specifically product owners) tend to be the most hesitant about BDD. They often see writing Gherkin as a burdensome requirement rather than a way to help their team. Or, they may fear that BDD is “too technical” for them. It may also be difficult for them to pitch BDD automation to the team. To be successful, biz roles need to step outside their comfort zone to win supporters from dev and test.

Process paradigm shifts can be hard, especially on teams that are already overwhelmed with work. Some people just don’t like change. Process and automation change can also be a big challenge if QA is outsourced (which is common).

Side-By-Side Comparison

Here’s the TL;DR:

Role Circumstances Steps Struggles
QA
  • Code is delivered too late to test and automate
  • Automation development is not scaling
  1. Build a solid BDD automation framework
  2. Demonstrate automation success
  3. Set up Three Amigos meetings during the sprint
  4. Start writing Gherkin scenarios with biz and dev as part of grooming
  • Showing that BDD is a whole development process, not just a QA thing
  • Getting the team to truly shift left
Dev
  • No separation of dev and QA roles
  • Too many bugs are escaping the sprint
  • Pressure from biz to do better
  1. Initiate better collaboration through Three Amigos and Gherkin
  2. Push right by helping QA with testing and automation
  3. Push left by helping biz write better acceptance criteria
  • Humbly learning good automation practices
  • Dedicating time for automation and more meetings
Biz
  • Missed deadlines
  • Deliverables not matching expectations
  • Too many bugs
  1. Write acceptance criteria in good Gherkin
  2. Set up Three Amigos meetings to review Gherkin
  3. Pitch BDD automation
  • Learning semi-technical things
  • Pushing all the way to automation

Conclusion

These are just three general approaches intended to show how BDD is for everyone. If you have other approaches, please describe them in the comment section below! Whatever the approach, make sure to demonstrate that BDD helps everyone, or else people may feel forced into corners and reject BDD for bad reasons. And remember, software quality is not just QA’s responsibility; it is everyone’s responsibility.

Winning Support for BDD

Adopting behavior-driven development practices can greatly improve software quality and productivity, but like any big change, it will have opponents along with supporters. I’ve met resistance from all roles: testers, developers, product owners, and managers. And some people can be stubborn. As with any proposal, the best way to win support is not just to tell the benefits but to demonstrate them. Below are five major ways to demonstrate the benefits of BDD.

Make it a Refinement, not an Overhaul

I remember talking with a scrum master one time about challenges his team faced with testing and automation. The user stories his team wrote were a mess: they may or may not have had acceptance criteria, and the product owner would often ask for features to be scrapped or redone after a sprint or two. The team basically gave up on automated testing due to feature flux. Naturally, I proposed BDD to him, suggesting that it could help drive better features through formalization. However, this scrum master balked at the idea: “My team is stretched so thin right now, there’s no way we can overhaul our process right now.”

Clearly, the team had a serious problem, but they weren’t willing to try any solution deemed too “big.” The scrum master’s perception was that BDD would be a disruptive change that would hurt them more than help them. In cases like this, it is best to present BDD as a refinement of Agile, and not an overhaul of it. Agile says user stories should have acceptance criteria; BDD says acceptance criteria should be formalized. Agile says that the definition of done should include test automation; BDD says automation is a natural extension of the acceptance criteria. There’s nothing in Agile that BDD undoes, and there are shortcomings in Agile that BDD solves.

Write Good Gherkin

There is a big difference between Gherkin and good Gherkin. Anyone can add BDD buzzwords to existing test procedures, but effective BDD needs a paradigm shift. Unfortunately, bad Gherkin can ruin many of the benefits BDD can bring. For example, imperative steps will frustrate product owners, and mixed point-of-view will confuse testers. Nothing will ever be truly perfect, but it is important to strive for good Gherkin from the start, especially when the first behavior scenarios will often be used as examples for future scenarios.

Start the Automation Snowball

BDD and automation go together like peas and carrots. Not only can test automation shift left (since Gherkin scenarios are both acceptance criteria and tests), but steps can be implemented once and reused by any scenarios. When the first BDD scenarios are written, obviously all steps are new steps. As sprints pass, though, many common steps will likely be reused. I’ve even written new scenarios without adding any new steps!

Test automation is often the last thing to be done for a story, if it’s even reached at all. The inherent step reusability helps BDD automation get done sooner. It may take a while to build up useful, reusable steps in the code base, but they will cause an “automation snowball” once they are there. Imagine telling your team that the test automation is already done once a scenario is written in Gherkin!

Take Baby Steps

Rome wasn’t built in a day, and neither will a mature BDD process be. People take time to adjust to new paradigms. Start out slow, and do it right. Train the team how to write good Gherkin. Try a few stories one sprint, rather than taking on the whole backlog. For a product-owner-led approach, start with Gherkinizing acceptance criteria for a sprint or two before attempting any automation. Alternatively, for a test-led approach, work on the automation framework first, and then start to shift the scenario writing left to the developers and then to the product owners once the snowball gets bigger.

It’s okay if things aren’t perfect at first. Learn the lessons and iterate for improvement. Take baby steps!

Highlight how Everyone Wins

BDD is truly a win/win for everyone. It’s not a way to shuffle responsibilities or push around busywork, it’s a way to make a team more interdependent upon each other. Each role in the Three Amigos is empowered to do the right things, with support from each other in lock-step. Consider how BDD process changes help each role work together better:

Role New Responsibility Interdependent Benefit
Product Owner Learn to express requirements in a more formalized, slightly techy way Better assurance that features will be what they actually want, be working correctly, and be protected against future regressions
Developer Contribute more to grooming and test planning Less likely to develop the wrong thing or to be “held up” by testing
Tester Build and learn a new automation framework Automation will snowball, allowing them to meet sprint commitments and focus extra time on exploratory testing
Everyone Another meeting or two Better communication and fewer problems

 

Nobody on an Agile team can rightly say, “BDD isn’t useful to me.” Software quality is everyone’s responsibility, and BDD is a great way to improve it.

A Cucumber Recipe

Scenario: Dinnertime
Given I have cucumbers
When I prepare my favorite cucumber dish
Then everyone at the dinner table is happy

We all know “Cucumber” is the buzzword for BDD, but the vast majority of the population knows it only as a vegetable. My wife and I prepare a special Chinese cucumber dish about once a week that’s fresh, tasty, and nutritious. It is very easy to prepare, since it requires no cooking. It may be served chilled or at room temperature, which makes it great for warm weather. As a change of pace for my blog, I’d like to share my cucumber recipe. C’mon, a panda’s gotta eat!

Ingredients

I strongly recommend against substitutions. This recipe is best with authentic ingredients. Specifically, do not substitute the cucumber variety – “regular” cucumbers won’t work. If you cannot find the required ingredients at a regular grocery store, Asian supermarkets will carry them.

52

These are Persian cucumbers. They are about 6 inches long and have a 1-1.5 in diameter.

53

These are all the ingredients that you need!

Instructions

Prep time: 15 minutes, tops.

The best way to prepare this dish is to use a Chinese cleaver (also called the “Chinese chef knife”). After washing the cucumbers, chop off the ends. Then, one at a time, smash the cucumber with the flat side of the knife. Literally swing your arm as if hammering a nail. The weight of the blade will smash the cucumber into about four strips lengthwise. Beware that seeds and guts may fly out. Then, chop the pieces into one-inch segments. (If you don’t have a Chinese cleaver, you can smash the cucumbers with a rolling pin. Other knives simply don’t have the weight to break up the cucumbers effectively.)

54

A cucumber, smashed and chopped with the Chinese cleaver.

Combine the cucumber pieces and all of the other ingredients (except the sesame seeds) into a flat-bottomed dish or wide bowl. Stir everything together and let rest. A flat bottom allows the cucumbers to soak up the sauce; regular bowls will leave the cucumbers on top somewhat flavorless. Top with sesame seeds for garnish. This dish may be served after a few minutes of resting, or it may be chilled and served later. (Please note that the texture of the cucumbers is not as good after more than a day in the refrigerator.)

55

The finished dish, ready to eat! Sesame seeds and garlic powder are sprinkled on top. Also, notice how the glass container spreads the cucumbers out so that they can soak in the sauce.

Fun Facts

 

Bon appetit!

Cucumber-JVM Global Hook Workarounds

Almost all BDD automation frameworks have some sort of hooks that run before and after scenarios. However, not all frameworks have global hooks that run once at the beginning or end of a suite of scenarios – and Cucumber-JVM is one of these unlucky few. Cucumber-JVM GitHub Issue #515, which seeks to add @BeforeAll and @AfterAll hooks, has been open and active since 2013, but it looks unclear if the issue will ever be resolved. Thankfully, there are some workarounds to effect the same behavior as global hooks.

Workaround #1: Don’t Do It

From a purist’s perspective, each scenario (or test) should be completely independent, meaning it should not share parts with any other tests. Independence provides the following benefits:

  • Safety between tests
  • Consistency across tests
  • The ability to run any tests individually, in any order, or in parallel
  • More sensible, understandable tests

If not handled properly, global hooks can be dangerous because they make tests interdependent. Changes or failures in one test may cascade into others. Global test data would waste memory for tests that don’t use it. Furthermore, the fact that Issue #515 has been open for years indicates the difficulty of properly implementing global hooks.

However, the main cost of independence is runtime. Independent tests often repeat similar setup and cleanup routines. Even a few extra seconds per test can add up tremendously. Google Guava, for example, has over 286,000 tests – adding one second to each test would amount to nearly 80 hours! Performance becomes especially critical for continuous integration, in which wasted time means either delivery delays or coverage gaps. Certain operations like preparing a database or fetching authentication tokens may be pragmatic candidates for global hooks.

The best strategy is to use global hooks only when necessary for time-intensive setup that can be shared safely. Any shared test data should be immutable. Always question the need for global hooks. Most tests probably won’t need them.

Workaround #2: Static Variables

A basic hack for global hooks is actually provided in Issue #515. A static Boolean flag can indicate when the @Before hook has run more than once because it isn’t “reset” when a new scenario re-instantiates the step definition classes. The runtime shutdown hook will be called once all tests are done and the program exits. (Note that a static flag cannot be used in an @After hook due to the halting problem.) The example from the issue is shamelessly copied below:

public class GlobalHooks {
    private static boolean dunit = false;

    @Before
    public void beforeAll() {
        if(!dunit) {
            Runtime.getRuntime().addShutdownHook(afterAllThread);
            // do the beforeAll stuff...
            dunit = true;
        }
    }
}

Workaround #3: Singleton Caching

The basic hack is useful for simple setup and cleanup routines, but it becomes inelegant when objects must be shared by scenarios. Rather than polluting the class with static members, a singleton can cache test data between scenarios, and global setup logic may be put into the singleton’s constructor. Furthermore, if the singleton uses lazy initialization, then @Before hooks may not be needed at all. A “lazy” singleton will not be instantiated until the first time its getInstance method is called, meaning it will be skipped if the scenarios do not need them. This is a huge advantage when selectively running scenarios by name, tag, or feature. (Please refer to the previous post, Static or Singleton, for a deeper explanation of the singleton pattern.)

Consider scenarios that must generate authentication tokens (like OAuth) for API testing. A singleton “token holder” could cache tokens for usernames, rather than doing the authorization dance for every scenario. The snippet below shows how such a singleton could be called within a @When step definition with no @Before method.

public class ExampleSteps {
    ...
    @When("^some API is called$")
    public void whenSomeApiIsCalled() {
        // Get the token from the singleton cache lazily
        String token = TokenHolder.getInstance().getToken("user", "pass");
        // Use the token to call some API (method not shown)
        callSomeApi(token);
    }
    ...
}

And the singleton class could be defined like this:

public class TokenHolder {
    private static volatile TokenHolder instance = null;
    private HashMap<String, String> tokens;

    private TokenHolder() {
        tokens = new HashMap<String, String>();
    }

    public static TokenHolder getInstance() {
        // Lazy and thread-safe
        if (instance == null) {
            synchronized(TokenHolder.class) {
                if (instance == null) {
                    instance = new TokenHolder();
                }
            }
        }

        return instance;
    }
    
    public String getToken(String username, String password) {
        // This check could be extended to handle token expiration
        if (!tokens.containsKey(username)) {
            // Request a fresh authentication token (method not shown)
            String token = requestToken(username, password);
            // Cache the token for later
            tokens.put(username, token);
        }
        
        return tokens.get(username);
    }
    
    ...
}

Workaround #4: JUnit Class Annotations

Another workaround mentioned in Issue #515 and elsewhere is to use JUnit‘s @BeforeClass and @AfterClass annotations in the runner class, like this:

@RunWith(Cucumber.class)
@Cucumber.Options(format = {
    "html:target/cucumber-html-report",
    "json-pretty:target/cucumber-json-report.json"})
public class RunCukesTest {

    @BeforeClass
    public static void setup() {
        System.out.println("Ran the before");
    }

    @AfterClass
    public static void teardown() {
        System.out.println("Ran the after");
    }
}

While @BeforeClass and @AfterClass may look like the cleanest solution at first, they are not very practical to use. They work only when Cucumber-JVM is set to use the JUnit runner. Other runners, like TestNG, the command line runner, and special IDE runners, won’t pick up these hooks. Their methods must also be are static and would need static variables or singletons to share data anyway. Therefore, I personally discourage using these annotations in Cucumber-JVM.

What About Dependency Injection?

Dependency injection is a marvelous technique. As defined by Wikipedia:

In software engineering, dependency injection is a technique whereby one object supplies the dependencies of another object. A dependency is an object that can be used (a service). An injection is the passing of a dependency to a dependent object (a client) that would use it. The service is made part of the client’s state. Passing the service to the client, rather than allowing a client to build or find the service, is the fundamental requirement of the pattern.

Dependency injection can be a powerful alternative to singletons because DI provides finer control over the scope of objects. However, Cucumber-JVM’s dependency injection cannot be applied with global hooks because dependency objects, like step definition objects, are constructed and destroyed for each scenario.

Comparison Table

Ultimately, the best approach for global hooks in Cucumber-JVM is the one that best fits the tests’ needs. Below is a table to make workaround comparisons easier.

Workaround Pros Cons
Don’t Do It Scenarios are completely independent. No complicated or risky workarounds. Repeated setup and cleanup procedures may add significant execution time.
Static Variables Simple yet effective implementation. May need many static variables to share test data.
Singleton Caching Abstracts test data and setup procedures. Easily handles lazy initialization and evaluation. May not need a @Before hook. More complicated design.
JUnit Class Annotations Clean look for basic setup and cleanup routines. May be used only with the JUnit runner. Requires static variables or singletons to share test data anyway.

The Behavior-Driven Three Amigos

Recently, my manager said to me, “Andy, your BDD documentation looks great, but could you please mention The Three Amigos?” A brief flash of panic came over me – I had never heard of “The Three Amigos” before. My immediate thought was the 1986 film of the same name or the Disney film The Three Caballeros. After a little research, I knew exactly who they were; I just didn’t know they had a name.

Who are “The Three Amigos”?

The Three Amigos” refers to a meeting of the minds of the three primary roles involved in producing software:

  1. Business – Often named the “business analyst” (BA) or “product owner” (PO), the business role provides what problem must be solved. They provide requirements for the solution. Typically, the business role is non-technical.
  2. Development – The developer role provides how the solution to the problem will be implemented. They build the software and must be very technical.
  3. Testing – The testing role, sometimes named “quality assurance” (QA), verifies that the delivered software product works correctly. They also try to find defects. The tester role must be somewhat technical.

During software development, The Three Amigos should meet regularly to discuss how the product will be developed. It is a shift left practice to avoid misunderstandings (like a game of telephone), thus improving quality and avoiding missed deadlines. The discussions should include only the individuals who will actually work on the specific deliverable, not the whole team.

While The Three Amigos seems most popular in Agile, it can be applied to any software development process. Some (here and here) advocate regularly scheduled formal meetings. Others (here and here) interpret it as an attitude instead of a process, in which the roles continuously collaborate. Regardless of implementation, The Three Amigos need to touch base before development begins.

Applying BDD

The Three Amigos fits perfectly into behavior-driven development, especially as part of BDD with Agile. Behavior scenarios are meant to foster collaboration between technical and non-technical roles because they are simple, high-level, and written in plain language. Given-When-Then provides a common format for discussion.

Ideally, when The Three Amigos meet during grooming and planning, they would formalize acceptance criteria as Gherkin features. Those feature files are then used directly by the developer for direction and the tester for automation. They act like a receipt of purchase for the business role – the feature file says, “This is what you ordered.”

A great technique for Three Amigos collaboration is Example Mapping – it efficiently identifies rules for acceptance criteria, behavior examples, and open questions. Examples can easily be turned into Gherkin scenarios either during or after the meeting.

Since BDD relies on feature files as artifacts, The Behavior-Driven Three Amigos must be more than just an attitude. The point of the collaboration is to produce feature files early for process efficiency. Less formal meetings could quickly devolve into all-talk-no-action.

Don’t Presume Anything

Don’t presume that the three roles will naturally collaborate on their own. I’ve seen teams in which the testers don’t participate in planning. I’ve also seen organizations in which automation engineers don’t help to write the test cases that they need to automate! Developers often abdicate responsibility for testing considerations, because “that’s QA’s job.” And, specifically for BDD, I’ve noticed that product owners resist writing acceptance criteria in Gherkin because they think it is too technical and beyond their role.

The Three Amigos exists as a named practice because collaboration between roles does not always happen. It is an accountability measure. Remember, the ultimate purpose for The Three Amigos is higher quality in both the product and the process. Nobody wants more meetings on their calendar, but everyone can agree that quality is necessary.

12 Awesome Benefits of BDD

What can BDD do for you? Why adopt a new process with a new framework? Because it’s worth it! The main benefits of BDD are better collaboration and automation. This article expands those two into a dozen awesome benefits. (If you read the BDD 101 series, then these points should look familiar.)

#1: Inclusion

BDD is meant to be collaborative. Everyone from the customer to the tester should be able to easily engage in product development. And anyone can write behavior scenarios because they are written in plain language. Scenarios are:

  • Requirements for product owners
  • Acceptance criteria for developers
  • Test cases for testers
  • Scripts for automators
  • Description for other stakeholders

Essentially, BDD is an enhancement of The Three Amigos.

#2: Clarity

Scenarios focus on the expected behaviors of the product. Each scenario focuses on one specific thing. Behaviors are described in plain language, and any ambiguity can be clarified with a simple conversation or Example Mapping. There’s no unreadable code or obscure technical jargon, and there’s no game of telephone. Clarity ensures the customer gets what the customer wants.

#3: Streamlining

BDD is designed to speed up the development process. Everyone involved in development relies upon the same scenarios. Scenarios are requirements, acceptance criteria, test cases, and test scripts all in one – there is no need to write any other artifact. The modular nature of Gherkin syntax expedites test automation development. Furthermore, scenarios can be used as steps to reproduce failures for defect reports.

#4: Shift Left

Shift left” is a buzzword for testing early in the development process. Testing earlier means fewer bugs later. In BDD, test case definition inherently becomes part of the requirements phase (for waterfall) or grooming (for Agile). As soon as behavior scenarios are written, testing and automation can theoretically begin.

#5: Artifacts

Scenarios form a collection of self-documenting test cases as a result of the BDD process. This ever-growing collection forms a perfect regression test suite. Scenarios can be run manually or with automation. Any tests not automated can be added to a backlog to automate in the future.

#6: Automation

BDD frameworks make it easy to turn scenarios into automated tests. The steps are already given by the scenarios – the automation engineer simply needs to write a method/function to perform each step’s operations.

#7: Test-Driven

BDD is an evolution of TDD. Writing scenarios from the beginning enforces quality-first and test-first mindsets. BDD automation can run scenarios to fail until the feature is implemented and causes tests to pass.

#8: Code Reuse

Given-When-Then steps can be reused between scenarios. The underlying implementation for each step does not change. Automation code becomes very modular.

#9: Parameterization

Scenario steps can be parameterized to be even more reusable. For example, a step to click a button can take in its ID. Parameterization can help a team adopt a common, reusable set of steps, and it inspires healthier discussion when writing scenarios.

#10: Variation

Scenario outlines make it easy to run the same scenario with different combinations of inputs. This is a simple but powerful way to expand test coverage without code duplication, which is the bane of test automation.

#11: Momentum

BDD has a snowball effect: scenarios become easier and faster to write and automate as more step definitions are added. Scenarios typically share common steps. Sometimes, new scenarios need nothing more than different step parameters or just one new line.

#12: Adaptability

BDD scenarios are easy to update as the product changes. Plain language is easy to edit. Modular design makes changes to automation code safer. Scenarios can also be filtered by tag name to decide what runs and what doesn’t.

BDD 101: Frameworks

Every major programming language has a BDD automation framework. Some even have multiple choices. Building upon the structural basics from the previous post, this post provides a survey of the major frameworks available today. Since I cannot possibly cover every BDD framework in depth in this 101 series, my goal is to empower you, the reader, to pick the best framework for your needs. Each framework has support documentation online justifying its unique goodness and detailing how to use it, and I would prefer not to duplicate documentation. Use this post primarily as a reference. (Check the Automation Panda BDD page for the full table of contents.)

Major Frameworks

Most BDD frameworks are Cucumber versions, JBehave derivatives inspired by Dan North, or non-Gherkin spec runners. Some put behavior scenarios into separate files, while others put them directly into the source code.

C# and Microsoft .NET

SpecFlow, created by Gáspár Nagy, is arguably the most popular BDD framework for Microsoft .NET languages. Its tagline is “Cucumber for .NET” – thus fully compliant with Gherkin. SpecFlow also has polished, well-designed hookscontext injection, and parallel execution (especially with test thread affinity). The basic package is free and open source, but SpecFlow also sells licenses for SpecFlow+ extensions. The free version requires a unit test runner like MsTest, NUnit, or xUnit.net in order to run scenarios. This makes SpecFlow flexible but also feels jury-rigged and inelegant. The licensed version provides a slick runner named SpecFlow+ Runner (which is BDD-friendly) and a Microsoft Excel integration tool named SpecFlow+ Excel. Microsoft Visual Studio has extensions for SpecFlow to make development easier.

There are plenty of other BDD frameworks for C# and .NET, too. xBehave.net is an alternative that pairs nicely with xUnit.net. A major difference of xBehave.net is that scenario steps are written directly in the code, instead of in separate text (feature) files. LightBDD bills itself as being more lightweight than other frameworks and basically does some tricks with partial classes to make the code more readable. NSpec is similar to RSpec and Mocha and uses lambda expressions heavily. Concordion offers some interesting ways to write specs, too. NBehave is a JBehave descendant, but the project appears to be dead without any updates since 2014.

Java and JVM Languages

The main Java rivalry is between Cucumber-JVM and JBehave. Cucumber-JVM is the official Cucumber version for Java and other JVM languages (Groovy, Scala, Clojure, etc.). It is fully compliant with Gherkin and generates beautiful reports. The Cucumber-JVM driver can be customized, as well. JBehave is one of the first and foremost BDD frameworks available. It was originally developed by Dan North, the “father of BDD.” However, JBehave is missing key Gherkin features like backgrounds, doc strings, and tags. It was also a pure-Java implementation before Cucumber-JVM existed. Both frameworks are widely used, have plugins for major IDEs, and distribute Maven packages. This popular but older article compares the two in slight favor of JBehave, but I think Cucumber-JVM is better, given its features and support.

The Automation panda article Cucumber-JVM for Java is a thorough guide for the Cucumber-JVM framework.

Java also has a number of other BDD frameworks. JGiven uses a fluent API to spell out scenarios, and pretty HTML reports print the scenarios with the results. It is fairly clean and concise. Spock and JDave are spec frameworks, but JDave has been inactive for years. Scalatest for Scala also has spec-oriented features. Concordion also provides a Java implementation.

JavaScript

Almost all JavaScript BDD frameworks run on Node.js. Jasmine and Mocha are two of the most popular general-purpose JS test frameworks. They differ in that Jasmine has many features included (like assertions and spies) that Mocha does not. This makes Jasmine easier to get started (good for beginners) but makes Mocha more customizable (good for power users). Both claim to be behavior-driven because they structure tests using “describe” and “it-should” phrases in the code, but they do not have the advantage of separate, reusable steps like Gherkin. Personally, I consider Jasmine and Mocha to be behavior-inspired but not fully behavior-driven.

Other BDD frameworks are more true to form. Cucumber provides Cucumber.js for Gherkin-compliant happiness. Yadda is Gherkin-like but with a more flexible syntax. Vows provides a different way to approach behavior using more formalized phrase partitions for a unique form of reusability. The Cucumber blog argues that Cucumber.js is best due to its focus on good communication through plain language steps, whereas other JavaScript BDD frameworks are more code-y. (Keep in mind, though, that Cucumber would naturally boast of its own framework.) Other comparisons are posted here, here, here, and here.

PHP

The two major BDD frameworks for PHP are Behat and Codeception. Behat is the official Cucumber version for PHP, and as such is seen as the more “pure” BDD framework. Codeception is more programmer-focused and can handle other styles of testing. There are plenty of articles comparing the two – here, here, and here (although the last one seems out of date). Both seem like good choices, but Codeception seems more flexible.

Python

Python has a plethora of test frameworks, and many are BDD. behave and lettuce are probably the two most popular players. Feature comparison is analogous to Cucumber-JVM versus JBehave, respectively: behave is practically Gherkin compliant, while lettuce lacks a few language elements. Both have plugins for major IDEs. pytest-bdd is on the rise because it integrates with all the wonderful features of pytestradish is another framework that extends the Gherkin language to include scenario loops, scenario preconditions, and variables. All these frameworks put scenarios into separate feature files. They all also implement step definitions as functions instead of classes, which not only makes steps feel simpler and more independent, but also avoids unnecessary object construction.

Other Python frameworks exist as well. pyspecs is a spec-oriented framework. Freshen was a BDD plugin for Nose, but both Freshen and Nose are discontinued projects.

Ruby

Cucumber, the gold standard for BDD frameworks, was first implemented in Ruby. Cucumber maintains the official Gherkin language standard, and all Cucumber versions are inspired by the original Ruby version. Spinach bills itself as an enhancement to Cucumber by encapsulating steps better. RSpec is a spec-oriented framework that does not use Gherkin.

Which One is Best?

There is no right answer – the best BDD framework is the one that best fits your needs. However, there are a few points to consider when weighing your options:

  • What programming language should I use for test automation?
  • Is it a popular framework that many others use?
  • Is the framework actively supported?
  • Is the spec language compliant with Gherkin?
  • What type of testing will you do with the framework?
  • What are the limitations as compared to other frameworks?

Frameworks that separate scenario text from implementation code are best for shift-left testing. Frameworks that put scenario text directly into the source code are better for white box testing, but they may look confusing to less experienced programmers.

Personally, my favorites are SpecFlow and pytest-bdd. At LexisNexis, I used SpecFlow and Cucumber-JVM. For Python, I used behave at MaxPoint, but I have since fallen in love with pytest-bdd since it piggybacks on the wonderfulness of pytest. (I can’t wait for this open ticket to add pytest-bdd support in PyCharm.) For skill transferability, I recommend Gherkin compliance, as well.

Reference Table

The table below categorizes BDD frameworks by language and type for quick reference. It also includes frameworks in languages not described above. Recommended frameworks are denoted with an asterisk (*). Inactive projects are denoted with an X (x).

Language Framework Type
C Catch In-line Spec
C++ Igloo In-line Spec
C# and .NET Concordion
LightBDD
NBehave x
NSpec
SpecFlow *
xBehave.net
In-line Spec
In-line Gherkin
Separated semi-Gherkin
In-line Spec
Separated Gherkin
In-line Gherkin
Golang Ginkgo In-line Spec
Java and JVM Cucumber-JVM *
JBehave
JDave x
JGiven *
Scalatest
Spock
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Gherkin
In-line Spec
In-line Spec
JavaScript Cucumber.js *
Yadda
Jasmine
Mocha
Vows
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Spec
In-line Spec
Perl Test::BDD::Cucumber Separated Gherkin
PHP Behat
Codeception *
Separated Gherkin
Separated or In-line
Python behave *
freshen x
lettuce
pyspecs
pytest-bdd *
radish
Separated Gherkin
Separated Gherkin
Separated semi-Gherkin
In-line Spec
Separated semi-Gherkin
Separated Gherkin-plus
Ruby Cucumber *
RSpec
Spinach
Separated Gherkin
In-line Spec
Separated Gherkin
Swift / Objective C Quick In-line Spec

 

[4/22/2018] Update: I updated info for C# and Python frameworks.

BDD 101: Automation

Better automation is one of BDD’s hallmark benefits. In fact, the main goal of BDD could be summarized as rapidly turning conceptualized behavior into automatically tested behavior. While the process and the Gherkin are universal, the underlying automation could be built using one of many frameworks.

This post explains how BDD automation frameworks work. It focuses on the general structure of the typical framework – it is not a tutorial on how to use any specific framework. However, I wrote short examples for each piece using Python’s behave framework, since learning is easier with examples. I chose to use Python here simply for its conciseness. (Check the Automation Panda BDD page for the full table of contents.)

Framework Parts

Every BDD automation framework has five major pieces:

#1: Feature Files

Gherkin feature files are very much part of the automation. They act like test scripts – each scenario is essentially a test case. Previous posts covered Gherkin in depth.

Here is an example feature file named google_search.feature:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  # This scenario should look familiar
  @automated @google-search @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

#2: Step Definitions

step definition is a code block that implements the logic to execute a step. It is typically a method or function with the English-y step phrase as an annotation. Step definitions can take in arguments, doc strings, and step tables. They may also make assertions to pass or fail a scenario. In most frameworks, data can be passed between steps using some sort of context object. When a scenario is executed, the driver matches each scenario step phrase to its step definition. (Most frameworks use regular expressions for phrase matching.) Thus, every step in a feature file needs a step definition.

The step definitions would be written in a Python source file like this:

from behave import *

@given('a web browser is on the Google page')
def step_impl(context):
  context.google_page.load();

@when('the search phrase "{phrase}" is entered')
def step_impl(context, phrase):
  context.google_page.search(phrase)

@then('the results for "{phrase}" are shown')
def step_impl(context, phrase):
  assert context.google_page.has_results(phrase)

#3: Hooks

Certain automation logic cannot be handled by step definitions. For example, scenarios may need special setup and cleanup operations. Most BDD frameworks provide hooks that can insert calls before or after Gherkin sections, typically filterable using tags. Hooks are similar in concept to aspect-oriented programming.

In behave, hooks are written in a Python source file named environment.py:

import page_objects
from selenium import webdriver

def before_all(context):
  context.browser = webdriver.Chrome()

def before_scenario(context):
  context.google_page = page_objects.GooglePage(context.browser)

def after_all(context):
  context.browser.quit()

#4: Support Code

Support code (a.k.a libraries or packages) refers to any code called by step definitions and hooks. Support code could be dependency packages downloaded using managers like Maven (Java), NuGet (.NET), or PyPI (Python). For example, Selenium WebDriver is a well-known package for web browser automation. Support code could also be components to assist automation, such as page objects or other design patterns. As the cliché goes, “Don’t reinvent the wheel.” Step definitions and hooks should not contain all of the logic for running the actions – they should reuse common code as much as possible.

A Python page object class from the page_objects.py module could look like this:

class GooglePage(object):
  """A page object for the Google home page"""
  
  def __init__(self, browser):
    self.browser = browser
  
  def load():
    # put code here
    pass
  
  def search(phrase):
    # put code here
    pass
  
  def has_results(phrase):
    # put code here
    return False

#5: Driver

Every automation framework has a driver that runs tests, and BDD frameworks are no different. The driver executes each scenario in a feature file independently. Whenever a failure happens, the driver reports the failure and aborts the scenario. Drivers typically have discovery mechanisms for selecting scenarios to run based on tag names or file paths.

The behave driver can be launched from the command line like this:

> behave google_search.py --tags @panda

Automation Advantages

Even if a team does not apply behavior-driven practices to its full development process, BDD test frameworks still have some significant advantages over non-BDD test frameworks. First of all, steps make BDD automation very modular and thus reusable. Each step is an independent action, much like how each scenario is an independent behavior. Once a step definition is written, it may be reused by any number of scenarios. This is crucial, since most behaviors for a feature share common actions. And all steps are inherently self-documenting, since they are written in plain language. There is a natural connection between high-level behavior and low-level implementation.

Test execution also has advantages. Tags make it very easy to select tests to run, especially from the command line. Failures are very informative as well. The driver pinpoints precisely which step failed for which scenario. And since behaviors are isolated, a failure for one scenario is less likely to affect other test scenarios than would be the case for procedure-driven tests.

All of this is explained more thoroughly in the Automation Panda article, ‑‑BDD; Automation without Collaboration.

What About Test Data?

Test data is a huge concern for any automation framework. Simple test data values may be supplied directly in Gherkin as step arguments or table values, but larger test data sets require other strategies. Support code can be used to handle test data. Read BDD 101: Test Data for more information.

Available Frameworks

There are many BDD frameworks out there. The next post will introduce a few major frameworks for popular languages.

BDD 101: Behavior-Driven Agile

Previous posts in this 101 series have focused heavily upon Gherkin. They may have given the impression that Gherkin is merely a testing language, and that BDD is a test framework. Wrong: BDD is a full development process! Its practices complement Agile software development by bringing clearer communication and shift left testing. As such, BDD is a refinement, not an overhaul, of the Agile process. This post explains how to add behavior-driven practices to the Agile process. (Check the Automation Panda BDD page for the full table of contents.)

Common Agile Problems

User stories can sometimes seem like a game of telephone: the product owner says one thing, the developer makes another thing, and the tester writes a bad test. When the test fails, the tester goes back to the developer for clarification, who in turn goes back to the product owner. Hopefully, the misunderstanding is corrected before demo day, but time is nevertheless lost and resources are burned. Acceptance criteria for a user story should clarify how things should be, but often they are poorly stated or entirely missing.

Another Agile problem, especially in Scrum, is incomplete testing. Development work is often treated like a pipeline: design -> implement -> review -> test -> automate -> DONE. And stories have deadlines. When coding runs late, testers may not get testing done, let alone test automation. Add a game of telephone, and in-sprint testing can become perpetually impeded.

BDD to the Rescue

BDD solves both of these Agile problems beautifully through process efficiency. Let me break this down from a behavior-oriented perspective:

  • Acceptance criteria specify feature behavior.
  • Test cases validate feature behavior.
  • Gherkin feature files document feature behavior.

Therefore, when written in Gherkin, acceptance criteria are test cases! The Gherkin feature file is the formal specification of both the acceptance criteria and the test cases for a user story. One artifact covers both things!

The Behavior-Driven Three Amigos

The Three Amigos” refers to the three primary roles engaged in producing software: business, development, and testing. Each role brings its own perspective to the product, and good software results when all can collaborate well. A common Agile practice is to hold meetings with the Three Amigos as part of grooming or planning.

The BDD process is an enhanced implementation of The Three Amigos. All stakeholders can participate in behavior-driven development because Gherkin is like plain English. Feature files mean different things to different people:

  • They are requirements for product owners.
  • They are acceptance criteria for developers.
  • They are test cases for testers.
  • They are scripts for automators.
  • They are descriptions for other stakeholders.

Thus, BDD fosters healthy collaboration because feature files are pertinent to all stakeholders (or “amigos”). Features files are like receipts – they’re a “proof of purchase” for the team. They document precisely what will be delivered.

Behavior-Driven Sprints

To see why this is a big deal, see what happens in a behavior-driven sprint:

  1. Feature files begin at grooming. As the team prepares user stories in the backlog, acceptance criteria are written in Gherkin. Since Gherkin is easy to read, even non-technical people (namely product owners) can contribute when The Three Amigos meet. Techniques like Example Mapping can make this collaboration easy and efficient.
  2. During planning, all stakeholders get a good understanding of how a feature should behave. Better conversations can happen. Clarifications can be written directly into feature files.
  3. When the sprint starts, feature files are already written. Developers know what must be developed. Testers know what must be tested. There’s no ambiguity.
  4. Test automation can begin immediately because the scenario steps are already written. Some step definitions may already exist, too. New step definitions are typically understandable enough to implement even before the developer commits the product code.
  5. Manual testers know from the start which tests will be automated and which must be run manually. This enables them to make better test plans. It also frees them to do more exploratory testing, which is better for finding bugs.

Overall, Gherkinized acceptance criteria streamline development and improve quality. The team is empowered to shift left. On-time story completion and in-sprint automation become the norm.

(Arguably, these benefits would happen in Kanban as well as in Scrum.)

New Rules

In order to reap the benefits of BDD, the Agile process needs a few new rules. First, formalize all acceptance criteria as Gherkin feature files. In retrospect, it should seem odd that the user story itself is formalized (“As a ___, I want ___, so that ___”) if the acceptance criteria is not (“Given-When-Then”). Writing feature files adds more work to grooming, but it enables the collaboration and shift left testing.

Second, never commit to completing a user story that doesn’t have Gherkinized acceptance criteria. Don’t become sloppy out of expediency. Use the planning meeting as an accountability measure.

Third, include test automation in the definition of done. Stories should not be accepted without their tests completed and automated. Automation in the present guarantees regression coverage in the future, which in turn allows teams to respond to change safely and quickly.

Advanced Topics

Check out these other articles for further learning:

More Agility through Automation

BDD truly improves the Agile process by fixing its shortcomings. The next step is to learn BDD automation, which will be covered in the next post. Until then, I’ll leave this gem here:

1iqmat

BDD 101: Writing Good Gherkin

So, you and your team have decided to make test automation a priority. You plan to use behavior-driven development to shift left with testing. You read the BDD 101 Series up through the previous post. You picked a good language for test automation. You even peeked at Cucumber-JVM or another BDD framework on your own. That’s great! Big steps! And now, you are ready to write your first Gherkin feature file.  You fire open Atom with a Gherkin plugin or Notepad++ with a Gherkin UDL, you type “Given” on the first line, and…

Writer’s block.  How am I supposed to write my Gherkin steps?

Good Gherkin feature files are not easy to write at first. Writing is definitely an art. With some basic pointers, and a bit of practice, Gherkin becomes easier. This post will cover how to write top-notch feature files. (Check the Automation Panda BDD page for the full table of contents.)

The Golden Gherkin Rule: Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Proper Behavior

The biggest mistake BDD beginners make is writing Gherkin without a behavior-driven mindset. They often write feature files as if they are writing “traditional” procedure-driven functional tests: step-by-step instructions with actions and expected results. HP ALM, qTest, AccelaTest, and many other test repository tools store tests in this format. These procedure-driven tests are often imperative and trace a path through the system that covers multiple behaviors. As a result, they may be unnecessarily long, which can delay failure investigation, increase maintenance costs, and create confusion.

For example, let’s consider a test that searches for images of pandas on Google. Below would be a reasonable test procedure:

  1. Open a web browser.
    1. Web browser opens successfully.
  2. Navigate to https://www.google.com/.
    1. The web page loads successfully and the Google image is visible.
  3. Enter “panda” in the search bar.
    1. Links related to “panda” are shown on the results page.
  4. Click on the “Images” link at the top of the results page.
    1. Images related to “panda” are shown on the results page.

I’ve seen many newbies translate a test like this into Gherkin like the following:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

This scenario is terribly wrong. All that happened was that the author put BDD buzzwords in front of each step of the traditional test. This is not behavior-driven, it is still procedure-driven.

The first two steps are purely setup: they just go to Google, and they are strongly imperative. Since they don’t focus on the desired behavior, they can be reduced to one declarative step: “Given a web browser is at the Google home page.” This new step is friendlier to read.

After the Given step, there are two When-Then pairs. This is syntactically incorrect: Given-When-Then steps must appear in order and cannot repeat. A Given may not follow a When or Then, and a When may not follow a Then. The reason is simple: any single When-Then pair denotes an individual behavior. This makes it easy to see how, in the test above, there are actually two behaviors covered: (1) searching from the search bar, and (2) performing an image search. In Gherkin, one scenario covers one behavior. Thus, there should be two scenarios instead of one. Any time you want to write more than one When-Then pair, write separate scenarios instead. (Note: Some BDD frameworks may allow disordered steps, but it would nevertheless be anti-behavioral.)

This splitting technique also reveals unnecessary behavior coverage. For instance, the first behavior to search from the search bar may be covered in another feature file. I once saw a scenario with about 30 When-Then pairs, and many were duplicate behaviors.

Do not be tempted to arbitrarily reassign step types to make scenarios follow strict Given-When-Then ordering. Respect the integrity of the step types: Givens set up initial state, Whens perform an action, and Thens verify outcomes. In the example above, the first Then step could have been turned into a When step, but that would be incorrect because it makes an assertion. Step types are meant to be guide rails for writing good behavior scenarios.

The correct feature file would look something like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

The second behavior arguably needs the first behavior to run first because the second needs to start at the search result page. However, since that is merely setup for the behavior of image searching and is not part of it, the Given step in the second scenario can basically declare (declaratively) that the “panda” search must already be done. Of course, this means that the “panda” search would be run redundantly at test time, but the separation of scenarios guarantees behavior-level independence.

The Cardinal Rule of BDD: One Scenario, One Behavior!

Remember, behavior scenarios are more than tests – they also represent requirements and acceptance criteria. Good Gherkin comes from good behavior.

(For deeper information about the Cardinal Rule of BDD and multiple When-Then pairs per scenario, please refer to my article, Are Gherkin Scenarios with Multiple When-Then Pairs Okay?)

Phrasing Steps

How you write a step matters. If you write a step poorly, it cannot easily be reused. Thankfully, some basic rules maintain consistent phrasing and maximum reusability.

Write all steps in third-person point of view. If first-person and third-person steps mix, scenarios become confusing. I even dedicated a whole blog post entirely to this point: Should Gherkin Steps Use First-Person or Third-Person? TL;DR: just use third-person at all times.

Write steps as a subject-predicate action phrase. It may tempting to leave parts of speech out of a step line for brevity, especially when using Ands and Buts, but partial phrases make steps ambiguous and more likely to be reused improperly. For example, consider the following example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Google search result page elements
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then the results page shows links related to "panda"
    And image links for "panda"
    And video links for "panda"

The final two And steps lack the subject-predicate phrase format. Are the links meant to be subjects, meaning that they perform some action? Or, are they meant to be direct objects, meaning that they receive some action? Are they meant to be on the results page or not? What if someone else wrote a scenario for a different page that also had image and video links – could they reuse these steps? Writing steps without a clear subject and predicate is not only poor English but poor communication.

Also, use appropriate tense and phrasing for each type of step. For simplicity, use present tense for all step types. Rather than take a time warp back to middle school English class, let’s illustrate tense with a bad example:

# BAD EXAMPLE! Do not copy.
Feature: Google Searching

  Scenario: Simple Google search
    Given the user navigates to the Google home page
    When the user entered "panda" at the search bar
    Then links related to "panda" will be shown on the results page

The Given step above uses present tense, but its subject is misleading. It indicates an action when it says, “Given the user navigates.” Actions imply the exercise of behavior. However, Given steps are meant to establish an initial state, not exercise a behavior. This may seem like a trivial nuance, but it can confuse feature file authors who may not be able to tell if a step is a Given or When. A better phrasing would be, “Given the Google home page is displayed.” It establishes a starting point for the scenario. Use present tense with an appropriate subject to indicate a state rather than an action.

The When step above uses past tense when it says, “The user entered.” This indicates that an action has already happened. However, When steps should indicate that an action is presently happening. Plus, past tense here conflicts with the tenses used in the other steps.

The Then step above uses future tense when it says, “The results will be shown.” Future tense seems practical for Then steps because it indicates what the result should be after the current action is taken. However, future tense reinforces a procedure-driven approach because it treats the scenario as a time sequence. A behavior, on the other hand, is a present-tense aspect of the product or feature. Thus, it is better to write Then steps in the present tense.

The corrected example looks like this:

Feature: Google Searching

  Scenario: Simple Google search
    Given the Google home page is displayed
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

And note, all steps are written in third-person. Read Should Gherkin Steps use Past, Present, or Future Tense? to learn more.

Good Titles

Good titles are just as important as good steps. The title is like the face of a scenario – it’s the first thing people read. It must communicate in one concise line what the behavior is. Titles are often logged by the automation framework as well. Specific pointers for writing good scenario titles are given in my article, Good Gherkin Scenario Titles.

Choices, Choices

Another common misconception for beginners is thinking that Gherkin has an “Or” step for conditional or combinatorial logic. People may presume that Gherkin has “Or” because it has “And”, or perhaps programmers want to treat Gherkin like a structured language. However, Gherkin does not have an “Or” step. When automated, every step is executed sequentially.

Below is a bad example based on a classic Super Mario video game, showing how people might want to use “Or”:

# BAD EXAMPLE! Do not copy.
Feature: SNES Mario Controls

  Scenario: Mario jumps
    Given a level is started
    When the player pushes the "A" button
    Or the player pushes the "B" button
    Then Mario jumps straight up

Clearly, the author’s intent is to say that Mario should jump when the player pushes either of two buttons. The author wants to cover multiple variations of the same behavior. In order to do this the right way, use Scenario Outline sections to cover multiple variations of the same behavior, as shown below:

Feature: SNES Mario Controls

  Scenario Outline: Mario jumps
    Given a level is started
    When the player pushes the "<letter>" button
    Then Mario jumps straight up
    
    Examples: Buttons
      | letter |
      | A      |
      | B      |

The Known Unknowns

Test data can be difficult to handle. Sometimes, it may be possible to seed data in the system and write tests to reference it, but other times, it may not. Google search is the prime example: the result list will change over time as both Google and the Internet change. To handle the known unknowns, write scenarios defensively so that changes in the underlying data do not cause test runs to fail. Furthermore, to be truly behavior-driven, think about data not as test data but as examples of behavior.

Consider the following example from the previous post:

Feature: Google Searching
  
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown
    And the following related results are shown
      | related       |
      | Panda Express |
      | giant panda   |
      | panda videos  |

This scenario uses a step table to explicitly name results that should appear for a search. The step with the table would be implemented to iterate over the table entries and verify each appeared in the result list. However, what if Panda Express were to go out of business and thus no longer be ranked as high in the results? (Let’s hope not.) The test run would then fail, not because the search feature is broken, but because a hard-coded variation became invalid. It would be better to write a step that more intelligently verified that each returned result somehow related to the search phrase, like this: “And links related to ‘panda’ are shown on the results page.” The step definition implementation could use regular expression parsing to verify the presence of “panda” in each result link.

Another nice feature of Gherkin is that step definitions can hide data in the automation when it doesn’t need to be exposed. Step definitions may also pass data to future steps in the automation. For example, consider another Google search scenario:

Feature: Google Searching

  Scenario: Search result linking
    Given Google search results for "panda" are shown
    When the user clicks the first result link
    Then the page for the chosen result link is displayed

Notice how the When step does not explicitly name the value of the result link – it simply says to click the first one. The value of the first link may change over time, but there will always be a first link. The Then step must know something about the chosen link in order to successfully verify the outcome, but it can simply reference it as “the chosen result link”. Behind the scenes, in the step definitions, the When step can store the value of the chosen link in a variable and pass the variable forward to the Then step.

Handling Test Data

Some types of test data should be handled directly within the Gherkin, but other types should not. Remember that BDD is specification by example – scenarios should be descriptive of the behaviors they cover, and any data written into the Gherkin should support that descriptive nature. Read Handling Test Data in BDD for comprehensive information on handling test data.

Less is More

Scenarios should be short and sweet. I typically recommend that scenarios should have a single-digit step count (<10). Long scenarios are hard to understand, and they are often indicative of poor practices. One such problem is writing imperative steps instead of declarative steps. I have touched on this topic before, but I want to thoroughly explain it here.

Imperative steps state the mechanics of how an action should happen. They are very procedure-driven. For example, consider the following When steps for entering a Google search:

  1. When the user scrolls the mouse to the search bar
  2. And the user clicks the search bar
  3. And the user types the letter “p”
  4. And the user types the letter “a”
  5. And the user types the letter “n”
  6. And the user types the letter “d”
  7. And the user types the letter “a”
  8. And the user types the ENTER key

Now, the granularity of actions may seem like overkill, but it illustrates the point that imperative steps focus very much on how actions are taken. Thus, they often need many steps to fully accomplish the intended behavior. Furthermore, the intended behavior is not always as self-documented as with declarative steps.

Declarative steps state what action should happen without providing all of the information for how it will happen. They are behavior-driven because they express action at a higher level. All of the imperative steps in the example above could be written in one line: “When the user enters ‘panda’ at the search bar.” The scrolling and keystroking is implied, and it will ultimately be handled by the automation in the step definition. When trying to reduce step count, ask yourself if your steps can be written more declaratively.

Another reason for lengthy scenarios is scenario outline abuse. Scenario outlines make it all too easy to add unnecessary rows and columns to their Examples tables. Unnecessary rows waste test execution time. Extra columns indicate complexity. Both should be avoided. Below are questions to ask yourself when facing an oversized scenario outline:

  • Does each row represent an equivalence class of variations?
    • For example, searching for “elephant” in addition to “panda” does not add much test value.
  • Does every combination of inputs need to be covered?
    • N columns with M inputs each generates MN possible combinations.
    • Consider making each input appear only once, regardless of combination.
  • Do any columns represent separate behaviors?
    • This may be true if columns are never referenced together in the same step.
    • If so, consider splitting apart the scenario outline by column.
  • Does the feature file reader need to explicitly know all of the data?
    • Consider hiding some of the data in step definitions.
    • Some data may be derivable from other data.

These questions are meant to be sanity checks, not hard-and-fast rules. The main point is that scenario outlines should focus on one behavior and use only the necessary variations.

Style and Structure

While style often takes a backseat during code review, it is a factor that differentiates good feature files from great feature files. In a truly behavior-driven team, non-technical stakeholders will rely upon feature files just as much as the engineers. Good writing style improves communication, and good communication skills are more than just resume fluff.

Below are a number of tidbits for good style and structure:

  1. Focus a feature on customer needs.
  2. Limit one feature per feature file. This makes it easy to find features.
  3. Limit the number of scenarios per feature. Nobody wants a thousand-line feature file. A good measure is a dozen scenarios per feature.
  4. Limit the number of steps per scenario to less than ten.
  5. Limit the character length of each step. Common limits are 80-120 characters.
  6. Use proper spelling.
  7. Use proper grammar.
  8. Capitalize Gherkin keywords.
  9. Capitalize the first word in titles.
  10. Do not capitalize words in the step phrases unless they are proper nouns.
  11. Do not use punctuation (specifically periods and commas) at the end of step phrases.
  12. Use single spaces between words.
  13. Indent the content beneath every section header.
  14. Separate features and scenarios by two blank lines.
  15. Separate examples tables by 1 blank line.
  16. Do not separate steps within a scenario by blank lines.
  17. Space table delimiter pipes (“|”) evenly.
  18. Adopt a standard set of tag names. Avoid duplicates.
  19. Write all tag names in lowercase, and use hyphens (“-“) to separate words.
  20. Limit the length of tag names.

Without these rules, you might end up with something like this:

# BAD EXAMPLE! Do not copy.

 Feature: Google Searching
     @AUTOMATE @Automated @automation @Sprint32GoogleSearchFeature
 Scenario outline: GOOGLE STUFF
Given a Web Browser is on the Google page,
 when The seach phrase "<phrase>" Enter,

 Then  "<phrase>" shown.
and The relatedd   results include "<related>".
Examples: animals
 | phrase | related |
| panda | Panda Express        |
| elephant    | elephant Man  |

Don’t do this. It looks horrible. Please, take pride in your profession. While the automation code may look hairy in parts, Gherkin files should look elegant.

Gherkinize Those Behaviors!

With these best practices, you can write Gherkin feature files like a pro. Don’t be afraid to try: nobody does things perfectly the first time. As a beginner, I broke many of the guidelines I put in this post, but I learned as I went. Don’t give up if you get stuck. Always remember the Golden Gherkin Rule and the Cardinal Rule of BDD!

This is the last of three posts in the series focused exclusively on Gherkin. The next post will address how to adopt behavior-driven practices into the Agile software development process.