BDD

BDD 101: Manual Testing

Behavior-driven development takes an automation-first philosophy: behavior specs should become automated tests. However, BDD can also accommodate manual testing. Manual testing has a place and a purpose, even in BDD. Remember, behavior scenarios are first and foremost behavior specifications, and they provide value beyond testing and automation. Any behavior scenario could be run as a manual test. The main questions, then, are (1) when is manual testing appropriate and (2) how should it be handled.

(Check the Automation Panda BDD page for the full BDD 101 table of contents.)

When is Manual Testing Appropriate?

Automation is not a silver bullet – it doesn’t satisfy all testing needs. Scenarios should be written for all behaviors, but they likely shouldn’t be automated under the following circumstances:

  • The return-on-investment to automate the scenarios is too low.
  • The scenarios won’t be included in regression or continuous integration.
  • The behaviors are temporary (ex: hotfixes).
  • The automation itself would be too complex or too fragile.
  • The nature of the feature is non-functional (ex: performance, UX, etc.).
  • The team is still learning BDD and is not yet ready to automate all scenarios.

Manual testing is also appropriate for exploratory testing, in which engineers rely upon experience rather than explicit test procedures to “explore” the product under test for bugs and quality concerns. It complements automation because both testing styles serve different purposes. However, behavior scenarios themselves are incompatible with exploratory testing. The point of exploring is for engineers to go “unscripted” – without formal test plans – to find problems only a user would catch. Rather than writing scenarios, the appropriate way to approach behavior-driven exploratory testing is more holistic: testers should assume the role of a user and exercise the product under test as a collection of interacting behaviors. If exploring uncovers any glaring behavior gaps, then new behavior scenarios should be added to the catalog.

How Should Manual Testing Be Handled?

Manual testing fits into BDD in much the same way as automated testing because both formats share the same process for behavior specification. Where the two ways diverge is in how the tests are run. There are a few special considerations to make when writing scenarios that won’t be automated.

Repository

Both manual and automated behavior scenarios should be stored in the same repository. The natural way to organize behaviors is by feature, regardless of how the tests will be run. All scenarios should also be managed by some form of version control.

Furthermore, co-locating all scenarios supports document-generation tools like Pickles. Doc tools make it easy to expose behavior specs and steps to everyone, which makes it easier for the Three Amigos to collaborate, since non-technical people are not likely to dig into programming projects.

Tags

Scenarios must be classified as manual or automated. When BDD frameworks run tests, they need a way to exclude tests that are not automated. Otherwise, test reports would be full of errors! In Gherkin, scenarios should be classified using tags. For example, scenarios could be tagged as either “@manual” or “@automated”. A third tag, “@automatable”, could be used to distinguish scenarios that are not yet automated but are targeted for automation.

Some BDD frameworks have nifty features for tags. In Cucumber-JVM, tags can be set as runner class options for convenience. This means that tag options could be set to “~@manual” to avoid manual tests. In SpecFlow, any scenario with the special “@ignore” tag will automatically be skipped. Nevertheless, I strongly recommend using custom tags to denote manual tests, since there are many reasons why a test may be ignored (such as known bugs).
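
For illustration, a runner class that excludes manual tests might look like the sketch below. It assumes an older Cucumber-JVM version (1.x/2.x) in which the tags option takes an array and “~” means “not”; newer versions use tag expressions like “not @manual” instead. The class name is arbitrary.

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import org.junit.runner.RunWith;

@RunWith(Cucumber.class)
@CucumberOptions(tags = {"~@manual"})  // skip any scenario tagged as @manual
public class AutomatedScenarioRunner {
}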

Extra Comments

The conciseness of behavior scenarios is problematic for manual testing because steps don’t provide all the information a tester may need. For example, test data may not be written explicitly in the spec. The best way to add extra information to a scenario is to add comments. Gherkin allows any number of comment lines, each marked with a leading “#”, as well as free-form description text. Comments provide extra information to the reader but are ignored by the automation.

It may be tempting to simply write new Gherkin steps to handle the extra information for manual testing. However, this is not a good approach. Principles of good Gherkin should be used for all scenarios, regardless of whether or not the scenarios will be automated. High-quality specification should be maintained for consistency, for documentation tools, and for potential future automation.

An Example

Below is a feature that shows how to write behavior scenarios for manual tests:

Feature: Google Searching

  @automated
  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  @manual
  Scenario: Image search
    # The Google home page URL is: http://www.google.com/
    # Make sure the images shown include pandas eating bamboo
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

It’s not really different from any other behavior scenario.

 

As stated in the beginning, BDD should be automation-first. Don’t use the content of this article to justify avoiding automation. Rather, use the techniques outlined here for manual testing only as needed.

 

BDD 101: Test Data

How should test data be handled in a behavior-driven test framework? This is a common question I hear from teams working on BDD test automation. A better question to ask first is, What is test data? This article will explain different types of test data and provide best practices for handling each. The strategies covered here can be applied to any BDD test framework. (Check the Automation Panda BDD page for the full table of contents.)

Types of Test Data

Personally, I hate the phrase “test data” because its meaning is so ambiguous. For functional test automation, there are three primary types of test data:

  1. Test Case Values. These are the input and expected output values for test cases. For example, when testing calculator addition “1 + 2 = 3”, “1” and “2” would be input values, and “3” would be the expected output value. Input values are often parameterized for reusability, and output values are used in assertions.
  2. Configuration Data. Config data represents the system or environment in which the tests run. Changes in config data should allow the same test procedure to run in different environments without making any other changes to the automation code. For example, a calculator service with an addition endpoint may be available in three different environments: development, test, and production. Three sets of config data would be needed to specify URLs and authentication in each environment (the config data), but 1 + 2 should always equal 3 in any environment (the test case values).
  3. Ready State. Some tests require initial state to be ready within a system. “Ready” state could be user accounts, database tables, app settings, or even cluster data. If testing makes any changes, then the data must be reverted to the ready state.

Each type of test data has different techniques for handling it.

Test Case Values

There are 4 main ways to specify test case values in BDD frameworks, ranging from basic to complex.

In The Specs

The most basic way to specify test case values is directly within the behavior scenarios themselves! The Gherkin language makes it easy – test case values can be written into the plain language of a step, as step parameters, or in Examples tables. Consider the following example:

Scenario Outline: Simple Google searches
  Given a web browser is on the Google page
  When the search phrase "<phrase>" is entered
  Then results for "<phrase>" are shown
  
  Examples: Animals
    | phrase   |
    | panda    |
    | elephant |
    | rhino    |

The test case value used is the search phrase. The When and Then steps both have a parameter for this phrase, which will use three different values provided by the Examples table. It is perfectly suitable to put these test case values directly into the scenario because the values are small and descriptive.

Furthermore, notice how specific result values are not specified for the Then step. Values like “Panda Express” or “Elephant man” are not hard-coded. The step wording presumes that the step definition will have some sort of programmed mechanism for checking that result links relate to the search phrase (likely through regular expression matching).
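
As a rough sketch of such a mechanism, the Then step definition below checks that at least one result link relates to the search phrase. It assumes Selenium WebDriver, a driver field initialized elsewhere (such as in a Before hook), and a made-up locator for result links; the match here is a simple case-insensitive contains check, though a regular expression could be used instead.

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import cucumber.api.java.en.Then;

public class SearchResultSteps {

    private WebDriver driver;  // assumed to be initialized elsewhere

    @Then("^results for \"([^\"]*)\" are shown$")
    public void verifyResultsShown(String phrase) {
        // Hypothetical locator for result links; a real page would need its own
        List<WebElement> links = driver.findElements(By.cssSelector("a.result-link"));
        // Pass if any link text contains the search phrase (case-insensitive)
        boolean anyMatch = links.stream()
                .anyMatch(link -> link.getText().toLowerCase().contains(phrase.toLowerCase()));
        if (!anyMatch) {
            throw new AssertionError("No result links matched the phrase: " + phrase);
        }
    }
}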

Key-Value Lookup

Direct specification is great for small sets of simple values, but one size does not fit all needs. Key-value lookups are appropriate when test data is lengthier. For example, I’ve often seen steps like this:

Given the user navigates to "http://www.somewebsite.com/long/path/to/the/profile/page"

URLs, hexadecimal numbers, XML blocks, and comma-separated lists are all the usual suspects. While it is not incorrect to put these values directly into a step parameter, something like this would be more readable:

Given the user navigates to the "profile" page

Or even:

Given the user navigates to their profile page

The automation would store URLs in a lookup table so that these new steps could easily fetch the URL for the profile page by name. These steps are also more declarative than imperative and better resist changes in the underlying environment.
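
As a minimal sketch, the step definition below backs the new step with a simple lookup table. The page names, URLs, and class name are hypothetical, and the table could just as easily be loaded from a data or config file.

import java.util.HashMap;
import java.util.Map;

import cucumber.api.java.en.Given;

public class NavigationSteps {

    // Hypothetical lookup table mapping page names to full URLs
    private static final Map<String, String> PAGE_URLS = new HashMap<>();
    static {
        PAGE_URLS.put("profile", "http://www.somewebsite.com/long/path/to/the/profile/page");
        PAGE_URLS.put("home", "http://www.somewebsite.com/");
    }

    @Given("^the user navigates to the \"([^\"]*)\" page$")
    public void navigateToPage(String pageName) {
        // Fetch the URL by name and drive the browser to it (navigation not shown)
        String url = PAGE_URLS.get(pageName);
    }
}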

Another way to use key-value lookup is to refer to a set of values by one name. Consider the following scenario for entering an address:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user sets the street address to "<street>"
  And the user sets the second address line to "<second>"  
  And the user sets the city to "<city>"
  And the user sets the state to "<state>"
  And the user sets the zipcode to "<zipcode>"
  And the user sets the country to "<country>"
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | street | second | city | state | zipcode | country |
    ...

An address has a lot of fields. Specifying each in the scenario makes it very imperative and long. Furthermore, if the scenario is an outline, the Examples table can easily extend far to the right, off the page. This, again, is not readable. This scenario would be better written like this:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user enters the "<address-type>" address
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | address-type |
    | basic        |
    | two-line     |
    | foreign      |

Rather than specifying all the values for different addresses, this scenario names the classifications of addresses. The step definition can be written to link the name of the address class to the desired values.
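
A hedged sketch of that step definition is shown below. The address classes and field values are hypothetical, and the data could also live in an external file (see the next section).

import java.util.HashMap;
import java.util.Map;

import cucumber.api.java.en.When;

public class AddressSteps {

    // Hypothetical lookup linking each address class name to a full set of field values
    private static final Map<String, Map<String, String>> ADDRESSES = new HashMap<>();
    static {
        Map<String, String> basic = new HashMap<>();
        basic.put("street", "123 Main St");
        basic.put("city", "Raleigh");
        basic.put("state", "NC");
        basic.put("zipcode", "27601");
        basic.put("country", "USA");
        ADDRESSES.put("basic", basic);
        // "two-line" and "foreign" entries would be added the same way
    }

    @When("^the user enters the \"([^\"]*)\" address$")
    public void enterAddress(String addressType) {
        // Look up the desired values by class name; typing them into the page is not shown
        Map<String, String> fields = ADDRESSES.get(addressType);
    }
}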

Data Files

Sometimes, test case values should be stored in data files apart from the specs or the automation code. Reasons could be:

  • The data is simply too large to reasonably write into Gherkin or into code.
  • The data files may be generated by another tool or process.
  • The values are different between environments or other circumstances.
  • The values must be selected or switched at runtime (without re-compiling code).
  • The files themselves are used as payloads (ex: REST request bodies or file upload).

Scenario steps can refer to data files using the key-value lookup mechanisms described above. Lightweight, text-based file formats like CSV, XML, or JSON work the best. They can be parsed easily and efficiently, and changes to them can easily be diff’ed. Microsoft Excel files are not recommended because they have extra bloat and cannot be easily diff’ed line-by-line. Custom text file formats are also not recommended because custom parsing is an extra automation asset requiring unnecessary development and maintenance. Personally, I like using JSON because its syntax is concise and its parsing tools seem to be the simplest in most programming languages.
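
As a sketch of the data file approach, a step definition could load values from a JSON file at runtime. The example below assumes the Jackson library is on the classpath; the file path, file structure, and step wording are hypothetical.

import java.io.File;
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import cucumber.api.java.en.Given;

public class DataFileSteps {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Given("^the \"([^\"]*)\" address is loaded from the data file$")
    public void loadAddressFromFile(String addressType) throws IOException {
        // Hypothetical JSON data file checked into the test repository
        JsonNode root = MAPPER.readTree(new File("testdata/addresses.json"));
        // Pick out the named set of values; sharing them with later steps is not shown
        JsonNode address = root.get(addressType);
    }
}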

External Sources

An external dependency exists when the data for test case values exists outside of the automation code base. For example, test case values could reside in a database instead of a CSV file, or they could be fetched from a REST service instead of a JSON file. This would be appropriate if the data is too large to manage as a set of files or if the data is constantly changing.

As a word of caution, external sources should be used only if absolutely necessary:

  1. External sources introduce an additional point-of-failure. If that database or service goes down, then the test automation cannot run.
  2. External sources degrade performance. It is slower to get data from a network connection than from a local machine.
  3. Test case values are harder to audit. When they are in the specs, the code, or data files, history is tracked by version control, and any changes are easy to identify in code reviews.
  4. Test case values may be unpredictable. The automation code base does not control the values. Bad values can fail tests.

External sources can be very useful, if not necessary, for performance / stress / load / limits testing, but they are not necessary for the vast majority of functional testing. It may be convenient to mock external sources with either a mocking framework like Mockito or a dummy service.

Configuration Data

Config data pertain to the test environments, not the test cases. Test automation should never contain hard-coded values for config data like URLs, usernames, or passwords. Rather, test automation should read config data when it launches tests and make references to the required values. This should be done in Before hooks and not in Gherkin steps. In this way, automated tests can run on any configuration, such as different test environments before being released to production.

Config data can be stored in data files or accessed through some other dependency. (Read the previous section for pros and cons of those approaches.) The config to use should be somehow dynamically selectable when tests run. For example, the path to the config file to use could be provided as a command line argument to the test launch command.
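
One hedged way to do this in a Java-based framework is to read the config file path from a system property and load the file once, lazily. The property name, default path, and file format below are assumptions.

import java.io.File;
import java.io.IOException;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TestConfig {

    private static JsonNode config = null;

    // Load the config file named by a system property once per test run
    public static synchronized JsonNode get() throws IOException {
        if (config == null) {
            // Selected at launch time, e.g.: mvn test -Dtest.config=config/test-env.json
            String path = System.getProperty("test.config", "config/default.json");
            config = new ObjectMapper().readTree(new File(path));
        }
        return config;
    }
}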

Config data can be used to select test values to use at runtime. For example, different environments may need different test value data files. Conversely, scenario tagging can control what parts of config data should be used. For example, a tag could specify a username to use for the scenario, and a Before hook could use that username to fetch the right password from the config data.

For efficiency, only the necessary config data should be accessed or read into memory. In many cases, fetching the config data should also be done once globally, rather than before each test case.

Ready State

All scenarios have a starting point, and often, that starting point involves data. Setup operations must bring the system into the ready state, and cleanup operations must return the system to the ready state. Test data should leave no trace – temporary files should be deleted and records should be reverted. Otherwise, disk space may run out or duplicate records may fail tests. Maintaining the ready state between tests is necessary for true test independence.

During the Test Run

Simple setup and cleanup operations may be done directly within the automation. For example, when testing CRUD operations, records must be created before they can be retrieved, updated, or deleted. Setup would create a record, and cleanup would guarantee the record’s deletion. If the setup is appropriate to mention as part of the behavior, then it should be written as Given steps. This is true of CRUD operations: “Given a record has been created, When it is deleted, …”. If multiple scenarios share this same setup, then those Given steps should be put into a Background section.

However, sometimes setup details are not pertinent to the behavior at hand. For example, perhaps fresh authentication tokens must be generated for those CRUD calls. Those operations should be handled in Before hooks. The automation will take care of it, while the Gherkin steps can focus exclusively on the behavior.

No matter what, After hooks must do cleanup. It is incorrect to write final Then steps to do cleanup. Then steps should verify outcomes, not take more actions. Plus, the final Then steps will not be run if the test has a failure and aborts!
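
The sketch below pulls these ideas together for the CRUD example, assuming Cucumber-JVM hook annotations; the record and token helper methods are hypothetical. Because the Given step and the hooks live in the same class, they share the same instance (and fields) within a scenario.

import cucumber.api.java.After;
import cucumber.api.java.Before;
import cucumber.api.java.en.Given;

public class RecordSteps {

    private String authToken;
    private String recordId;

    @Before
    public void generateFreshToken() {
        // Setup detail that is not part of the behavior itself, so it lives in a hook
        authToken = requestToken();
    }

    @Given("^a record has been created$")
    public void createRecord() {
        // Setup that is pertinent to the behavior, so it is a Given step
        recordId = createRecordViaApi(authToken);
    }

    @After
    public void deleteRecord() {
        // Cleanup always runs here, even if the scenario fails partway through
        if (recordId != null) {
            deleteRecordViaApi(authToken, recordId);
        }
    }

    // Hypothetical helpers (implementations not shown)
    private String requestToken() { return "token"; }
    private String createRecordViaApi(String token) { return "id"; }
    private void deleteRecordViaApi(String token, String id) { }
}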

External Preparation

Some data simply takes too long to set up fresh for each test launch. Consider complicated user accounts or machine learning data: these are things that can be created outside of the test automation. The automation can simply presume that they exist as a precondition. These types of data require tool automation to prepare. Tool automation could involve a set of scripts to load a database, make a bunch of service calls, or navigate through a web portal to update settings. Automating this type of setup outside of the test automation enables engineers to more easily replicate it across different environments. Then, tests can run in much less time because the data is already there.

However, this external preparation must be carefully maintained. If any damage is done to the data, then test case independence is lost. For example, deleting a user account without replacing it means that subsequent test runs cannot log in! Along with setup tools, it is important to create maintenance tools to audit the data and make repairs or updates.

Advice for Any Approach

Use the minimal amount of test data necessary to test the functionality of the product under test. More test data requires more time to develop and manage. As a corollary, use the simplest approach that can pragmatically handle the test data. Avoid external dependencies as much as possible.

To minimize test data, remember that BDD is specification by example: scenarios should use descriptive values. Furthermore, variations should be reduced to input equivalence classes. For example, in the first scenario example on this page, it would probably be sufficient to test only one of those three animals, because the other two animals would not exhibit any different searching behavior.

Finally, be cautioned against randomization in test data. Functional tests are meant to be deterministic – they must always pass or fail consistently, or else test results will not be reliable. (Not only could this drive a tester crazy, but it would also break a continuous integration system.) Using equivalence classes is the better way to cover different types of inputs. Use a unique number counting mechanism whenever values must be unique.
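
A minimal sketch of such a counting mechanism is shown below: an atomic counter combined with a fixed prefix yields values that are unique within a test run without being random. A run-level prefix (such as a build number) could make values unique across runs as well.

import java.util.concurrent.atomic.AtomicLong;

public class UniqueValues {

    private static final AtomicLong COUNTER = new AtomicLong(0);

    // Returns deterministic values like "testuser-1", "testuser-2", ...
    public static String next(String prefix) {
        return prefix + "-" + COUNTER.incrementAndGet();
    }
}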

For handling unpredictable test data, check out Unpredictable Test Data.

BDD‑‑; Collaboration without Automation

In the previous post, I described the tradeoffs of using a BDD test automation framework without the full BDD process. But, what about the opposite? What if a team wants to adopt BDD practices without a test framework to support it? Again, behavior-driven practices are beneficial apart from automation, but not without shortcomings.

The Power of Process

BDD should be a refinement, not an overhaul, of Agile software development. All of the problems BDD solves are simply aspects of the development process that must be solved anyway. BDD simply provides formal practices for solving them uniformly. Consider how BDD addresses the following problems:

  • Problem: Biz, dev, and test roles are siloed and do not talk together much.
    Solution: BDD brings these three roles together in Three Amigos meetings.
  • Problem: Acceptance criteria are missing or poorly defined, wasting in-sprint time.
    Solution: Acceptance criteria are formalized as specifications using Gherkin.
  • Problem: Product features are hard to explain.
    Solution: Scenarios describe individual behaviors in plain language.
  • Problem: Team members have open questions or conflicting views about behaviors.
    Solution: Example Mapping efficiently unifies a team’s understanding and identifies areas for further refinement.
  • Problem: Edge cases are overlooked during testing.
    Solution: Well-defined behavior scenarios capture specifications by example early in development.

All of these problems can be solved through better, behavior-driven practices, and none of them pertain to test automation.

Spec-Less Automation

BDD process improvements don’t necessarily need a BDD framework for test automation. Any test framework could still automate scenario steps. The major difference is that there would be no mechanism to translate Gherkin lines into method/function calls: The automation engineer would simply need to program test cases the “good old-fashioned way.” It would not be much different from translating any other procedure-driven test cases into code.

The weakness of this approach is that specifications are not strongly linked to the test automation. The end-to-end development process is less efficient because behavior scenarios must essentially be rewritten into automation code, rather than becoming part of the automation code. There is also a higher risk that automated test cases won’t cover the actual intention of the test steps. Review and maintenance are more difficult because engineers must always cross-examine the automation code with the Gherkin to make sure they align. All of these problems make it harder to shift left with QA work.

The lack of a behavior-driven test framework is also a double-edged sword for Gherkin steps. On one hand, steps do not need to be scrutinized as strongly in review, since automation code does not directly depend upon them. It is not critical to reuse steps word-for-word or to worry about parameterization. However, sloppy steps can lead to miscommunication and will make adopting a BDD test framework in the future very difficult.

Better Than Nothing

Just like for automation without collaboration, using BDD practices without using a BDD test framework does improve the development process. There aren’t really any disadvantages because the process problems must be solved anyway. A “BDD‑‑;” situation (that’s a postfix decrement, to denote that automation did not follow collaboration) isn’t ideal, but at least it’s better than nothing.

‑‑BDD; Automation without Collaboration

Does it make sense to use a BDD test automation framework on a team that does not follow a Behavior-Driven Development process? I’ve faced this question a few times recently. Although some BDD benefits will be missing, the answer is still yes: BDD test automation frameworks are useful even apart from a full BDD process. This article covers strengths and weaknesses to explain why.

Strengths

BDD test frameworks force tests to be behavior-driven, not procedure-driven. Behavior-driven tests focus on individual behaviors, making them concise and comprehensible. Irrelevant details are removed from test cases. Imperative details are specified only when necessary. Test reports are more descriptive, and test results are more meaningful. Tests written without a behavior-driven framework are more likely to become long, unnecessarily complicated, and fragile.

BDD test frameworks also provide inherent structure with steps. Steps are the basic building blocks of test cases, regardless of the type of test automation framework used. While almost all run-of-the-mill test frameworks (like JUnit, xUnit.net, or pytest) provide structure to write separate, independent test cases (usually as methods or functions), they lack structure to write separate test case steps. Typically, programmers end up writing test case logic directly into the test methods/functions, or they write ad hoc helper methods/functions/classes to get the job done. This approach often lacks consistency (especially when multiple engineers contribute to the automation code), and thus reusability suffers and duplication creeps in. Gherkin steps are like guide rails for test cases.

Gherkin steps provide easy reusability for rapid development. In a mature automation code base, new test cases can be written using a few short lines of pre-existing steps. And pre-existing steps can be trusted to work because they’ve been tested before. Parametrized steps enable even greater reuse.

Gherkin steps are self-documenting because they are written in plain English. This makes tests easier in many ways:

  • to write, because it provides an outline for the test in plain language
  • to review, because others less familiar with the feature can quickly understand concise scenarios
  • to maintain, because problems can be pinpointed
  • to explain, because non-technical people can’t read code

Much like any other test frameworks, BDD frameworks integrate with other testing packages and design patterns. For example, it is common to use a BDD framework with Selenium WebDriver and the Page Object Model to do Web UI testing. Other common packages for needs like logging, assertions, and REST API calls also work well with BDD frameworks.

Finally, BDD test frameworks open the door to shifting left. They can be the starting point for QA-led BDD. Demonstrating the value in behavior-driven automation can open interest in Three Amigos collaboration, which can then lead to more process improvements and better software quality.

Weaknesses

BDD test frameworks require extra development overhead at first. They aren’t as simple to use as unit-like test frameworks. It also takes a lot of practice to write good Gherkin. I’ve talked with engineers (typically developers) who see the feature file layer as unnecessary “plaster” over test cases. Without full team collaboration and cooperation, the justification for BDD diminishes.

Strict behavior independence may also make execution time less efficient. While steps may be reused, common setup operations must be run for each test. CRUD operations illustrate this point well. In a BDD framework, each operation (create, retrieve, update, delete) would be covered by a separate test scenario. However, the operations are interdependent: a test must create a thing before it can delete the thing. Thus, the delete scenario will borrow some logic from the create scenario. A procedure-driven test could more efficiently stack steps into one test case like this: create, retrieve, update, retrieve, delete, retrieve. Assertions would be interleaved with operations. This one test case would cover multiple behaviors, but it would save execution time by avoiding repeated creations for setup and deletions for cleanups. Many times, people have even asked me if there is a way to sequence Gherkin scenarios together to achieve the same effect! (This is not possible, and it would violate test independence.)

If BDD frameworks are used without a BDD process, then BDD could become pigeonholed as a “QA thing,” forever banished to the realm of the far right (the opposite of shift left, not the political spectrum). This could raise barriers to collaboration if not handled properly.

Furthermore, the lack of the full BDD means that many BDD benefits will go missing. Miscommunications could still easily happen because biz and dev would not be involved in defining behavior scenarios. Delivery deadlines could still be missed because testing and automation cannot readily shift left. Out of the 12 major benefits of BDD, the first 4 would be lost.

Conclusion

Overall, I think the advantages of BDD test automation frameworks outweigh the disadvantages for most above-unit functional testing needs, regardless of whether or not a team uses a full BDD process. Ideally, a team would embrace full-BDD, but that’s not always reality. A “‑‑BDD;” situation (that’s a prefix decrement, to note that collaboration was missing before automation) can still be seen as a glass half-full.

Who Should Lead BDD?

Behavior-driven development offers great benefits: better communication, easier test automation, and higher code quality. There are many ways for a team to start doing BDD, and naturally, someone needs to stand up and lead the effort. In my experience, adopting BDD is its own process. An evangelist converts team leaders, training sessions are given, and Gherkinized acceptance criteria start being automated. However, not everyone will embrace the changes, especially those across different role types. And big changes take time. Rome wasn’t built in a day, and neither will be a mature, effective BDD process.

This post covers three possible ways to lead BDD adoption, each from one of the Three Amigos roles. Any role can lead the charge, but each will have its unique struggles. These possibilities are advisory but not necessarily prescriptive. If you want to move your team into BDD, use these three approaches as guidelines for crafting a plan that best meets your needs. And, of course, the advice in Winning Support for BDD pertains to all approaches. Furthermore, as you read these approaches, put yourselves in the shoes of roles other than your own, so you can better understand the struggles each role faces.

Note: The approaches below presume that the underlying software development process is Agile Scrum. Nevertheless, they may be tweaked and applied to other processes, like Waterfall or Kanban.

The Starting Point

The starting point for all three approaches below is a “traditional” Agile sprint – one that is not (yet) behavior-driven. Product owners write user stories, developers implement the solutions, and testers test the deliverables. The diagram below shows the main flow of sprint work in this type of sprint, and it will serve as the basis for illustrating BDD adoption:

Traditional Sprint

The overall flow of a “traditional,” non-behavior-driven Agile sprint. Ceremonies like planning, review, and retrospective should still happen, but they are left out of this diagram to put emphasis on parts affected by BDD.

QA-Led BDD

Circumstances

The most common approach I’ve seen is QA-led BDD adoption, because testers arguably have the most to gain. It is most applicable when the Three Amigos roles (biz, dev, and test) are well-defined and separate. The impetus for QA to lead BDD adoption could be that developers deliver code too late to adequately test and automate within a sprint, or it could be that the QA team is struggling to scale their test automation development. There may also be resistance to BDD from biz and dev roles.

Steps

The sensible path for QA is to start all the way to the right and progressively shift left. This means that the starting point would be test automation. Start by building a solid automation code base. Pick a well-supported BDD framework like Cucumber, SpecFlow, or behave, and start adding scenarios and step definitions. Select scenarios for core product features rather than the latest sprint stories, so that the code base will be populated with the most basic, useful steps. Once the automation code reaches a “critical mass” for step reusability, QA can then proactively classify new test scenarios as automated or manual. Automated tests become easier and easier to write, giving QA more time to be exploratory with manual testing. Ideally, all manual testing would become exploratory.

Then, it’s time to start shifting left. At this point, all Gherkin steps would be in the automation code only, so set up a tool like Pickles to expose the steps to all team members as living documentation. QA should then schedule Three Amigos meetings with biz and dev to proactively discuss user story expectations. In those meetings, QA should start demonstrating how to write acceptance criteria in Gherkin, which then expedites testing. A big win would be if a QA engineer could write a new scenario using only pre-existing, pre-automated steps and then run it successfully on the spot.

Once biz and dev folks are convinced of BDD’s benefits, encourage them to participate in writing Gherkin. When they get comfortable, encourage product owners to write acceptance criteria in Gherkin when they write user stories, and hold Three Amigos meetings before sprint planning as part of grooming. Convince them that helping to write Gherkin scenarios is a process efficiency for the whole team.

 


Struggles

Shifting left is never easy, especially when team members are hardened into their roles. That’s why QA must write both really good test automation and really good Gherkin scenarios. Success should speak for itself once QA delivers good automation fast. Furthermore, QA must be clear that BDD is not merely a test tool; it’s a process that requires a paradigm shift. Otherwise, BDD could easily be pigeonholed as a “QA thing.”

Dev-Led BDD

Circumstances

There are a few reasons that could push developers to lead BDD. On some Agile teams, there’s no distinction between dev and QA roles: all team members are software engineers responsible for both developing and testing the software. Or, developers may not be satisfied with the testing effort. Maybe too many bugs are escaping the sprint, or maybe automation isn’t getting done in time. Or, perhaps the product owner is not happy with the deliverables and putting pressure on the team to do better. Whatever the circumstance, developers are more than capable of winning with BDD.

Steps

The best way for developers to start is to set up Three Amigos meetings, to stop the game of telephone between biz and test. In those meetings, start translating acceptance criteria into Gherkin. Then, start helping out with test automation – that may mean anything from offering advice to QA to building the framework from scratch. Then, start pushing left and right to get biz and test on board with BDD.

 


Struggles

It may be difficult for developers to work on test automation because they may lack either the expertise or the time to devote to good test automation. Automation is a specialized discipline, and it takes time and diligence to build up expertise to do it right. I’ve seen very skilled developers haughtily build very shabby automation frameworks.

Developers must also be careful not to be too technical, or else biz and test roles may reject BDD for being too complicated or beyond their abilities. Furthermore, some teams may be resistant to developing test automation. For example, automation work may be “starved” for points because it is underestimated, or similarly starved for time because it is deemed lower in priority than other work.

Biz-Led BDD

Circumstances

BDD is designed to bring technical and business roles together into healthier collaboration, and biz folks can certainly lead BDD adoption as successfully as more technical folks can. Major reasons for biz to take the lead could be if development is perpetually running behind schedule, if deliverables don’t meet the original requirements, or if software bugs are rampant.

Steps

For biz roles, “shift left” could be better called “pull left.” Start by writing solid user stories and Gherkin acceptance criteria. Focus on good Gherkin that is readable and reusable. Then, introduce BDD as a refinement to the Agile process, highlighting its benefits. Initiate Three Amigos meetings to make sure that you are communicating the right things to dev and test. Once collaboration is going well, suggest BDD automation as a way to expedite dev and test work. If acceptance criteria are all Gherkinized, then developing BDD automation would be a natural extension.

 


Struggles

In my experience, biz roles (specifically product owners) tend to be the most hesitant about BDD. They often see writing Gherkin as a burdensome requirement rather than a way to help their team. Or, they may fear that BDD is “too technical” for them. It may also be difficult for them to pitch BDD automation to the team. To be successful, biz roles need to step outside their comfort zone to win supporters from dev and test.

Process paradigm shifts can be hard, especially on teams that are already overwhelmed with work. Some people just don’t like change. Process and automation change can also be a big challenge if QA is outsourced (which is common).

Side-By-Side Comparison

Here’s the TL;DR:

QA

  Circumstances:
    • Code is delivered too late to test and automate
    • Automation development is not scaling
  Steps:
    1. Build a solid BDD automation framework
    2. Demonstrate automation success
    3. Set up Three Amigos meetings during the sprint
    4. Start writing Gherkin scenarios with biz and dev as part of grooming
  Struggles:
    • Showing that BDD is a whole development process, not just a QA thing
    • Getting the team to truly shift left

Dev

  Circumstances:
    • No separation of dev and QA roles
    • Too many bugs are escaping the sprint
    • Pressure from biz to do better
  Steps:
    1. Initiate better collaboration through Three Amigos and Gherkin
    2. Push right by helping QA with testing and automation
    3. Push left by helping biz write better acceptance criteria
  Struggles:
    • Humbly learning good automation practices
    • Dedicating time for automation and more meetings

Biz

  Circumstances:
    • Missed deadlines
    • Deliverables not matching expectations
    • Too many bugs
  Steps:
    1. Write acceptance criteria in good Gherkin
    2. Set up Three Amigos meetings to review Gherkin
    3. Pitch BDD automation
  Struggles:
    • Learning semi-technical things
    • Pushing all the way to automation

Conclusion

These are just three general approaches intended to show how BDD is for everyone. If you have other approaches, please describe them in the comment section below! Whatever the approach, make sure to demonstrate that BDD helps everyone, or else people may feel forced into corners and reject BDD for bad reasons. And remember, software quality is not just QA’s responsibility; it is everyone’s responsibility.

Winning Support for BDD

Adopting behavior-driven development practices can greatly improve software quality and productivity, but like any big change, it will have opponents along with supporters. I’ve met resistance from all roles: testers, developers, product owners, and managers. And some people can be stubborn. As with any proposal, the best way to win support is not just to tell the benefits but to demonstrate them. Below are five major ways to demonstrate the benefits of BDD.

Make it a Refinement, not an Overhaul

I remember talking with a scrum master one time about challenges his team faced with testing and automation. The user stories his team wrote were a mess: they may or may not have had acceptance criteria, and the product owner would often ask for features to be scrapped or redone after a sprint or two. The team basically gave up on automated testing due to feature flux. Naturally, I proposed BDD to him, suggesting that it could help drive better features through formalization. However, this scrum master balked at the idea: “My team is stretched so thin right now, there’s no way we can overhaul our process right now.”

Clearly, the team had a serious problem, but they weren’t willing to try any solution deemed too “big.” The scrum master’s perception was that BDD would be a disruptive change that would hurt them more than help them. In cases like this, it is best to present BDD as a refinement of Agile, and not an overhaul of it. Agile says user stories should have acceptance criteria; BDD says acceptance criteria should be formalized. Agile says that the definition of done should include test automation; BDD says automation is a natural extension of the acceptance criteria. There’s nothing in Agile that BDD undoes, and there are shortcomings in Agile that BDD solves.

Write Good Gherkin

There is a big difference between Gherkin and good Gherkin. Anyone can add BDD buzzwords to existing test procedures, but effective BDD needs a paradigm shift. Unfortunately, bad Gherkin can ruin many of the benefits BDD can bring. For example, imperative steps will frustrate product owners, and mixed point-of-view will confuse testers. Nothing will ever be truly perfect, but it is important to strive for good Gherkin from the start, especially since the first behavior scenarios will often be used as examples for future scenarios.

Start the Automation Snowball

BDD and automation go together like peas and carrots. Not only can test automation shift left (since Gherkin scenarios are both acceptance criteria and tests), but steps can be implemented once and reused by any scenarios. When the first BDD scenarios are written, obviously all steps are new steps. As sprints pass, though, many common steps will likely be reused. I’ve even written new scenarios without adding any new steps!

Test automation is often the last thing to be done for a story, if it’s even reached at all. The inherent step reusability helps BDD automation get done sooner. It may take a while to build up useful, reusable steps in the code base, but they will cause an “automation snowball” once they are there. Imagine telling your team that the test automation is already done once a scenario is written in Gherkin!

Take Baby Steps

Rome wasn’t built in a day, and neither will a mature BDD process be. People take time to adjust to new paradigms. Start out slow, and do it right. Train the team how to write good Gherkin. Try a few stories one sprint, rather than taking on the whole backlog. For a product-owner-led approach, start with Gherkinizing acceptance criteria for a sprint or two before attempting any automation. Alternatively, for a test-led approach, work on the automation framework first, and then start to shift the scenario writing left to the developers and then to the product owners once the snowball gets bigger.

It’s okay if things aren’t perfect at first. Learn the lessons and iterate for improvement. Take baby steps!

Highlight how Everyone Wins

BDD is truly a win/win for everyone. It’s not a way to shuffle responsibilities or push around busywork; it’s a way to make team members more interdependent. Each role in the Three Amigos is empowered to do the right things, with support from each other in lock-step. Consider how BDD process changes help each role work together better:

  • Product Owner
    New responsibility: Learn to express requirements in a more formalized, slightly techy way.
    Interdependent benefit: Better assurance that features will be what they actually want, be working correctly, and be protected against future regressions.
  • Developer
    New responsibility: Contribute more to grooming and test planning.
    Interdependent benefit: Less likely to develop the wrong thing or to be “held up” by testing.
  • Tester
    New responsibility: Build and learn a new automation framework.
    Interdependent benefit: Automation will snowball, allowing testers to meet sprint commitments and focus extra time on exploratory testing.
  • Everyone
    New responsibility: Another meeting or two.
    Interdependent benefit: Better communication and fewer problems.

 

Nobody on an Agile team can rightly say, “BDD isn’t useful to me.” Software quality is everyone’s responsibility, and BDD is a great way to improve it.

Cucumber-JVM Global Hook Workarounds

Almost all BDD automation frameworks have some sort of hooks that run before and after scenarios. However, not all frameworks have global hooks that run once at the beginning or end of a suite of scenarios – and Cucumber-JVM is one of these unlucky few. Cucumber-JVM GitHub Issue #515, which seeks to add @BeforeAll and @AfterAll hooks, has been open and active since 2013, but it is unclear whether it will ever be resolved. Thankfully, there are some workarounds to achieve the same behavior as global hooks.

Workaround #1: Don’t Do It

From a purist’s perspective, each scenario (or test) should be completely independent, meaning it should not share parts with any other tests. Independence provides the following benefits:

  • Safety between tests
  • Consistency across tests
  • The ability to run any tests individually, in any order, or in parallel
  • More sensible, understandable tests

If not handled properly, global hooks can be dangerous because they make tests interdependent. Changes or failures in one test may cascade into others. Global test data would waste memory for tests that don’t use it. Furthermore, the fact that Issue #515 has been open for years indicates the difficulty of properly implementing global hooks.

However, the main cost of independence is runtime. Independent tests often repeat similar setup and cleanup routines. Even a few extra seconds per test can add up tremendously. Google Guava, for example, has over 286,000 tests – adding one second to each test would amount to nearly 80 hours! Performance becomes especially critical for continuous integration, in which wasted time means either delivery delays or coverage gaps. Certain operations like preparing a database or fetching authentication tokens may be pragmatic candidates for global hooks.

The best strategy is to use global hooks only when necessary for time-intensive setup that can be shared safely. Any shared test data should be immutable. Always question the need for global hooks. Most tests probably won’t need them.

Workaround #2: Static Variables

A basic hack for global hooks is actually provided in Issue #515. A static Boolean flag can indicate whether the @Before hook has already run, since static state is not “reset” when a new scenario re-instantiates the step definition classes. A runtime shutdown hook covers the “after all” side, because it will be called once all tests are done and the program exits. (Note that a static flag cannot be used in an @After hook due to the halting problem: there is no general way to know which scenario will be the last to run.) The example from the issue is shamelessly copied below:

public class GlobalHooks {
    private static boolean dunit = false;

    // Thread registered as a JVM shutdown hook; it acts as the "after all" hook
    private static final Thread afterAllThread = new Thread(() -> {
        // do the afterAll stuff...
    });

    @Before
    public void beforeAll() {
        if (!dunit) {
            // Run one-time setup and register the shutdown hook on the first scenario only
            Runtime.getRuntime().addShutdownHook(afterAllThread);
            // do the beforeAll stuff...
            dunit = true;
        }
    }
}

Workaround #3: Singleton Caching

The basic hack is useful for simple setup and cleanup routines, but it becomes inelegant when objects must be shared by scenarios. Rather than polluting the class with static members, a singleton can cache test data between scenarios, and global setup logic may be put into the singleton’s constructor. Furthermore, if the singleton uses lazy initialization, then @Before hooks may not be needed at all. A “lazy” singleton will not be instantiated until the first time its getInstance method is called, meaning it will be skipped entirely if the scenarios do not need it. This is a huge advantage when selectively running scenarios by name, tag, or feature. (Please refer to the previous post, Static or Singleton, for a deeper explanation of the singleton pattern.)

Consider scenarios that must generate authentication tokens (like OAuth) for API testing. A singleton “token holder” could cache tokens for usernames, rather than doing the authorization dance for every scenario. The snippet below shows how such a singleton could be called within a @When step definition with no @Before method.

public class ExampleSteps {
    ...
    @When("^some API is called$")
    public void whenSomeApiIsCalled() {
        // Get the token from the singleton cache lazily
        String token = TokenHolder.getInstance().getToken("user", "pass");
        // Use the token to call some API (method not shown)
        callSomeApi(token);
    }
    ...
}

And the singleton class could be defined like this:

public class TokenHolder {
    private static volatile TokenHolder instance = null;
    private HashMap<String, String> tokens;

    private TokenHolder() {
        tokens = new HashMap<String, String>();
    }

    public static TokenHolder getInstance() {
        // Lazy and thread-safe
        if (instance == null) {
            synchronized(TokenHolder.class) {
                if (instance == null) {
                    instance = new TokenHolder();
                }
            }
        }

        return instance;
    }
    
    public String getToken(String username, String password) {
        // This check could be extended to handle token expiration
        if (!tokens.containsKey(username)) {
            // Request a fresh authentication token (method not shown)
            String token = requestToken(username, password);
            // Cache the token for later
            tokens.put(username, token);
        }
        
        return tokens.get(username);
    }
    
    ...
}

Workaround #4: JUnit Class Annotations

Another workaround mentioned in Issue #515 and elsewhere is to use JUnit’s @BeforeClass and @AfterClass annotations in the runner class, like this:

@RunWith(Cucumber.class)
@Cucumber.Options(format = {
    "html:target/cucumber-html-report",
    "json-pretty:target/cucumber-json-report.json"})
public class RunCukesTest {

    @BeforeClass
    public static void setup() {
        System.out.println("Ran the before");
    }

    @AfterClass
    public static void teardown() {
        System.out.println("Ran the after");
    }
}

While @BeforeClass and @AfterClass may look like the cleanest solution at first, they are not very practical to use. They work only when Cucumber-JVM is set to use the JUnit runner. Other runners, like TestNG, the command line runner, and special IDE runners, won’t pick up these hooks. Their methods must also be static and would need static variables or singletons to share data anyway. Therefore, I personally discourage using these annotations in Cucumber-JVM.

What About Dependency Injection?

Dependency injection is a marvelous technique. As defined by Wikipedia:

In software engineering, dependency injection is a technique whereby one object supplies the dependencies of another object. A dependency is an object that can be used (a service). An injection is the passing of a dependency to a dependent object (a client) that would use it. The service is made part of the client’s state. Passing the service to the client, rather than allowing a client to build or find the service, is the fundamental requirement of the pattern.

Dependency injection can be a powerful alternative to singletons because DI provides finer control over the scope of objects. However, Cucumber-JVM’s dependency injection cannot be applied with global hooks because dependency objects, like step definition objects, are constructed and destroyed for each scenario.

Comparison Table

Ultimately, the best approach for global hooks in Cucumber-JVM is the one that best fits the tests’ needs. Below is a table to make workaround comparisons easier.

  • Don’t Do It
    Pros: Scenarios are completely independent. No complicated or risky workarounds.
    Cons: Repeated setup and cleanup procedures may add significant execution time.
  • Static Variables
    Pros: Simple yet effective implementation.
    Cons: May need many static variables to share test data.
  • Singleton Caching
    Pros: Abstracts test data and setup procedures. Easily handles lazy initialization and evaluation. May not need a @Before hook.
    Cons: More complicated design.
  • JUnit Class Annotations
    Pros: Clean look for basic setup and cleanup routines.
    Cons: May be used only with the JUnit runner. Requires static variables or singletons to share test data anyway.

The Behavior-Driven Three Amigos

Recently, my manager said to me, “Andy, your BDD documentation looks great, but could you please mention The Three Amigos?” A brief flash of panic came over me – I had never heard of “The Three Amigos” before. My immediate thought was the 1986 film of the same name or the Disney film The Three Caballeros. After a little research, I knew exactly who they were; I just didn’t know they had a name.

Who are “The Three Amigos”?

“The Three Amigos” refers to a meeting of the minds of the three primary roles involved in producing software:

  1. Business – Often named the “business analyst” (BA) or “product owner” (PO), the business role provides what problem must be solved. They provide requirements for the solution. Typically, the business role is non-technical.
  2. Development – The developer role provides how the solution to the problem will be implemented. They build the software and must be very technical.
  3. Testing – The testing role, sometimes named “quality assurance” (QA), verifies that the delivered software product works correctly. They also try to find defects. The tester role must be somewhat technical.

During software development, The Three Amigos should meet regularly to discuss how the product will be developed. It is a shift left practice to avoid misunderstandings (like a game of telephone), thus improving quality and avoiding missed deadlines. The discussions should include only the individuals who will actually work on the specific deliverable, not the whole team.

While The Three Amigos seems most popular in Agile, it can be applied to any software development process. Some (here and here) advocate regularly scheduled formal meetings. Others (here and here) interpret it as an attitude instead of a process, in which the roles continuously collaborate. Regardless of implementation, The Three Amigos need to touch base before development begins.

Applying BDD

The Three Amigos fits perfectly into behavior-driven development, especially as part of BDD with Agile. Behavior scenarios are meant to foster collaboration between technical and non-technical roles because they are simple, high-level, and written in plain language. Given-When-Then provides a common format for discussion.

Ideally, when The Three Amigos meet during grooming and planning, they would formalize acceptance criteria as Gherkin features. Those feature files are then used directly by the developer for direction and the tester for automation. They act like a receipt of purchase for the business role – the feature file says, “This is what you ordered.”

A great technique for Three Amigos collaboration is Example Mapping – it efficiently identifies rules for acceptance criteria, behavior examples, and open questions. Examples can easily be turned into Gherkin scenarios either during or after the meeting.

Since BDD relies on feature files as artifacts, The Behavior-Driven Three Amigos must be more than just an attitude. The point of the collaboration is to produce feature files early for process efficiency. Less formal meetings could quickly devolve into all-talk-no-action.

Don’t Presume Anything

Don’t presume that the three roles will naturally collaborate on their own. I’ve seen teams in which the testers don’t participate in planning. I’ve also seen organizations in which automation engineers don’t help to write the test cases that they need to automate! Developers often abdicate responsibility for testing considerations, because “that’s QA’s job.” And, specifically for BDD, I’ve noticed that product owners resist writing acceptance criteria in Gherkin because they think it is too technical and beyond their role.

The Three Amigos exists as a named practice because collaboration between roles does not always happen. It is an accountability measure. Remember, the ultimate purpose for The Three Amigos is higher quality in both the product and the process. Nobody wants more meetings on their calendar, but everyone can agree that quality is necessary.

12 Awesome Benefits of BDD

What can BDD do for you? Why adopt a new process with a new framework? Because it’s worth it! The main benefits of BDD are better collaboration and automation. This article expands those two into a dozen awesome benefits. (If you read the BDD 101 series, then these points should look familiar.)

#1: Inclusion

BDD is meant to be collaborative. Everyone from the customer to the tester should be able to easily engage in product development. And anyone can write behavior scenarios because they are written in plain language. Scenarios are:

  • Requirements for product owners
  • Acceptance criteria for developers
  • Test cases for testers
  • Scripts for automators
  • Description for other stakeholders

Essentially, BDD is an enhancement of The Three Amigos.

#2: Clarity

Scenarios focus on the expected behaviors of the product. Each scenario focuses on one specific thing. Behaviors are described in plain language, and any ambiguity can be clarified with a simple conversation or Example Mapping. There’s no unreadable code or obscure technical jargon, and there’s no game of telephone. Clarity ensures the customer gets what the customer wants.

#3: Streamlining

BDD is designed to speed up the development process. Everyone involved in development relies upon the same scenarios. Scenarios are requirements, acceptance criteria, test cases, and test scripts all in one – there is no need to write any other artifact. The modular nature of Gherkin syntax expedites test automation development. Furthermore, scenarios can be used as steps to reproduce failures for defect reports.

#4: Shift Left

“Shift left” is a buzzword for testing early in the development process. Testing earlier means fewer bugs later. In BDD, test case definition inherently becomes part of the requirements phase (for waterfall) or grooming (for Agile). As soon as behavior scenarios are written, testing and automation can theoretically begin.

#5: Artifacts

Scenarios form a collection of self-documenting test cases as a result of the BDD process. This ever-growing collection forms a perfect regression test suite. Scenarios can be run manually or with automation. Any tests not automated can be added to a backlog to automate in the future.

#6: Automation

BDD frameworks make it easy to turn scenarios into automated tests. The steps are already given by the scenarios – the automation engineer simply needs to write a method/function to perform each step’s operations.
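
As a rough sketch of what this looks like in Python with behave, here is a hypothetical login scenario and its step definitions (the feature text, page navigation, and browser helper are all invented for illustration):

# Hypothetical feature file (login.feature), shown as a comment:
#
#   Scenario: Successful login
#     Given the login page is displayed
#     When the user logs in with valid credentials
#     Then the home page is displayed
#
# Each Gherkin step is bound to one Python function by a decorator.

from behave import given, when, then

@given('the login page is displayed')
def step_login_page_displayed(context):
    context.browser.goto('/login')   # assumed browser helper on the context

@when('the user logs in with valid credentials')
def step_log_in(context):
    context.browser.log_in('pat', 'secret123')

@then('the home page is displayed')
def step_home_page_displayed(context):
    assert context.browser.current_page() == 'home'

Once these functions exist, any new scenario built from the same steps needs no additional automation code.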

#7: Test-Driven

BDD is an evolution of TDD. Writing scenarios from the beginning enforces quality-first and test-first mindsets. BDD automation can run scenarios as failing tests until the feature is implemented and makes them pass.

#8: Code Reuse

Given-When-Then steps can be reused between scenarios. The underlying implementation for each step does not change. Automation code becomes very modular.

#9: Parameterization

Scenario steps can be parameterized to be even more reusable. For example, a step to click a button can take in its ID. Parameterization can help a team adopt a common, reusable set of steps, and it inspires healthier discussion when writing scenarios.
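
As a minimal sketch with behave (the step wording and the browser helper are hypothetical), one parameterized definition can back every button-click step:

from behave import when

# behave parses the button ID out of the step text and passes it in
# as an argument, so one definition covers every "click" step.
@when('the user clicks the "{button_id}" button')
def step_click_button(context, button_id):
    context.browser.click(button_id)   # assumed helper, for illustration

# The same definition now handles many different steps, for example:
#   When the user clicks the "submit" button
#   When the user clicks the "cancel" button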

#10: Variation

Scenario outlines make it easy to run the same scenario with different combinations of inputs. This is a simple but powerful way to expand test coverage without code duplication, which is the bane of test automation.
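
For example, a scenario outline could look like the commented feature text below; with pytest-bdd, a single scenarios() call generates one test per Examples row (the file name, scenario, and data are hypothetical):

# Hypothetical feature file (features/login.feature):
#
#   Scenario Outline: Login with invalid credentials
#     Given the login page is displayed
#     When the user logs in as "<username>" with password "<password>"
#     Then an error message is displayed
#
#     Examples:
#       | username | password  |
#       | pat      | wrong     |
#       | unknown  | secret123 |

from pytest_bdd import scenarios

# Binds every scenario in the file; each Examples row becomes its own test,
# reusing the same step definitions without any duplicated code.
scenarios('features/login.feature')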

#11: Momentum

BDD has a snowball effect: scenarios become easier and faster to write and automate as more step definitions are added. Scenarios typically share common steps. Sometimes, new scenarios need nothing more than different step parameters or just one new line.

#12: Adaptability

BDD scenarios are easy to update as the product changes. Plain language is easy to edit. Modular design makes changes to automation code safer. Scenarios can also be filtered by tag name to decide what runs and what doesn’t.
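
As a small illustration, pytest-bdd converts Gherkin tags into pytest markers, so tagged scenarios can be included or excluded at run time (the tags and feature path below are hypothetical):

# Hypothetical feature excerpt:
#
#   @smoke
#   Scenario: Successful login
#     ...
#
#   @slow
#   Scenario: Full report generation
#     ...

from pytest_bdd import scenarios

scenarios('features')   # bind all scenarios found under the features directory

# Because tags become markers, a command line like the following
# runs only the smoke scenarios:
#   pytest -m smoke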

BDD 101: Frameworks

Every major programming language has a BDD automation framework. Some even have multiple choices. Building upon the structural basics from the previous post, this post provides a survey of the major frameworks available today. Since I cannot possibly cover every BDD framework in depth in this 101 series, my goal is to empower you, the reader, to pick the best framework for your needs. Each framework has support documentation online justifying its unique goodness and detailing how to use it, and I would prefer not to duplicate documentation. Use this post primarily as a reference. (Check the Automation Panda BDD page for the full table of contents.)

Major Frameworks

Most BDD frameworks are Cucumber versions, JBehave derivatives inspired by Dan North, or non-Gherkin spec runners. Some put behavior scenarios into separate files, while others put them directly into the source code.

C# and Microsoft .NET

SpecFlow, created by Gáspár Nagy, is arguably the most popular BDD framework for Microsoft .NET languages. Its tagline is “Cucumber for .NET,” and it is fully Gherkin-compliant. SpecFlow also has polished, well-designed hooks, context injection, and parallel execution (especially with test thread affinity). The basic package is free and open source, but SpecFlow also sells licenses for SpecFlow+ extensions. The free version requires a unit test runner like MsTest, NUnit, or xUnit.net in order to run scenarios, which makes SpecFlow flexible but can also feel jury-rigged and inelegant. The licensed version provides a slick, BDD-friendly runner named SpecFlow+ Runner and a Microsoft Excel integration tool named SpecFlow+ Excel. Microsoft Visual Studio has extensions for SpecFlow to make development easier.

There are plenty of other BDD frameworks for C# and .NET, too. xBehave.net is an alternative that pairs nicely with xUnit.net. A major difference of xBehave.net is that scenario steps are written directly in the code, instead of in separate text (feature) files. LightBDD bills itself as being more lightweight than other frameworks and basically does some tricks with partial classes to make the code more readable. NSpec is similar to RSpec and Mocha and uses lambda expressions heavily. Concordion offers some interesting ways to write specs, too. NBehave is a JBehave descendant, but the project appears to be dead without any updates since 2014.

Java and JVM Languages

The main Java rivalry is between Cucumber-JVM and JBehave. Cucumber-JVM is the official Cucumber version for Java and other JVM languages (Groovy, Scala, Clojure, etc.). It is fully compliant with Gherkin and generates beautiful reports. The Cucumber-JVM driver can be customized, as well. JBehave is one of the first and foremost BDD frameworks available. It was originally developed by Dan North, the “father of BDD.” However, JBehave is missing key Gherkin features like backgrounds, doc strings, and tags. It was also a pure-Java implementation before Cucumber-JVM existed. Both frameworks are widely used, have plugins for major IDEs, and distribute Maven packages. This popular but older article compares the two in slight favor of JBehave, but I think Cucumber-JVM is better, given its features and support.

The Automation Panda article Cucumber-JVM for Java is a thorough guide for the Cucumber-JVM framework.

Java also has a number of other BDD frameworks. JGiven uses a fluent API to spell out scenarios, and pretty HTML reports print the scenarios with the results. It is fairly clean and concise. Spock and JDave are spec frameworks, but JDave has been inactive for years. Scalatest for Scala also has spec-oriented features. Concordion also provides a Java implementation.

JavaScript

Almost all JavaScript BDD frameworks run on Node.js. Jasmine and Mocha are two of the most popular general-purpose JS test frameworks. They differ in that Jasmine has many features included (like assertions and spies) that Mocha does not. This makes Jasmine easier to get started with (good for beginners) but makes Mocha more customizable (good for power users). Both claim to be behavior-driven because they structure tests using “describe” and “it-should” phrases in the code, but they do not have the advantage of separate, reusable steps like Gherkin. Personally, I consider Jasmine and Mocha to be behavior-inspired but not fully behavior-driven.

Other BDD frameworks are more true to form. Cucumber provides Cucumber.js for Gherkin-compliant happiness. Yadda is Gherkin-like but with a more flexible syntax. Vows provides a different way to approach behavior using more formalized phrase partitions for a unique form of reusability. The Cucumber blog argues that Cucumber.js is best due to its focus on good communication through plain language steps, whereas other JavaScript BDD frameworks are more code-y. (Keep in mind, though, that Cucumber would naturally boast of its own framework.) Other comparisons are posted here, here, here, and here.

PHP

The two major BDD frameworks for PHP are Behat and Codeception. Behat is the official Cucumber version for PHP, and as such is seen as the more “pure” BDD framework. Codeception is more programmer-focused and can handle other styles of testing. There are plenty of articles comparing the two – here, here, and here (although the last one seems out of date). Both seem like good choices, but Codeception seems more flexible.

Python

Python has a plethora of test frameworks, and many support BDD. behave and lettuce are probably the two most popular players. The feature comparison is analogous to Cucumber-JVM versus JBehave: behave is practically Gherkin-compliant, while lettuce lacks a few language elements. Both have plugins for major IDEs. pytest-bdd is on the rise because it integrates with all the wonderful features of pytest. radish is another framework that extends the Gherkin language to include scenario loops, scenario preconditions, and variables. All of these frameworks put scenarios into separate feature files. They also implement step definitions as functions instead of classes, which not only makes steps feel simpler and more independent but also avoids unnecessary object construction.
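
For instance, a pytest-bdd test module might look like the sketch below (the feature file, step wording, and browser fixture are all hypothetical):

from pytest_bdd import scenarios, given, when, then, parsers

# One line binds every scenario in the (hypothetical) feature file.
scenarios('features/search.feature')

# Step definitions are plain functions, and ordinary pytest fixtures
# (like the assumed 'browser' fixture) are injected as arguments.
@given('the search page is displayed')
def search_page(browser):
    browser.goto('/search')

@when(parsers.parse('the user searches for "{phrase}"'))
def search_for(browser, phrase):
    browser.search(phrase)

@then('results are displayed')
def results_shown(browser):
    assert browser.result_count() > 0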

Other Python frameworks exist as well. pyspecs is a spec-oriented framework. Freshen was a BDD plugin for Nose, but both Freshen and Nose are discontinued projects.

Ruby

Cucumber, the gold standard for BDD frameworks, was first implemented in Ruby. Cucumber maintains the official Gherkin language standard, and all Cucumber versions are inspired by the original Ruby version. Spinach bills itself as an enhancement to Cucumber by encapsulating steps better. RSpec is a spec-oriented framework that does not use Gherkin.

Which One is Best?

There is no right answer – the best BDD framework is the one that best fits your needs. However, there are a few points to consider when weighing your options:

  • What programming language should you use for test automation?
  • Is it a popular framework that many others use?
  • Is the framework actively supported?
  • Is the spec language compliant with Gherkin?
  • What type of testing will you do with the framework?
  • What are the limitations as compared to other frameworks?

Frameworks that separate scenario text from implementation code are best for shift-left testing. Frameworks that put scenario text directly into the source code are better for white box testing, but they may look confusing to less experienced programmers.

Personally, my favorites are SpecFlow and pytest-bdd. At LexisNexis, I used SpecFlow and Cucumber-JVM. For Python, I used behave at MaxPoint, but I have since fallen in love with pytest-bdd because it piggybacks on the wonderfulness of pytest. (I can’t wait for this open ticket to add pytest-bdd support in PyCharm.) For skill transferability, I recommend Gherkin compliance, as well.

Reference Table

The table below categorizes BDD frameworks by language and type for quick reference. It also includes frameworks in languages not described above. Recommended frameworks are denoted with an asterisk (*). Inactive projects are denoted with an X (x).

Language             | Framework            | Type
---------------------|----------------------|-----------------------
C                    | Catch                | In-line Spec
C++                  | Igloo                | In-line Spec
C# and .NET          | Concordion           | In-line Spec
C# and .NET          | LightBDD             | In-line Gherkin
C# and .NET          | NBehave x            | Separated semi-Gherkin
C# and .NET          | NSpec                | In-line Spec
C# and .NET          | SpecFlow *           | Separated Gherkin
C# and .NET          | xBehave.net          | In-line Gherkin
Golang               | Ginkgo               | In-line Spec
Java and JVM         | Cucumber-JVM *       | Separated Gherkin
Java and JVM         | JBehave              | Separated semi-Gherkin
Java and JVM         | JDave x              | In-line Spec
Java and JVM         | JGiven *             | In-line Gherkin
Java and JVM         | Scalatest            | In-line Spec
Java and JVM         | Spock                | In-line Spec
JavaScript           | Cucumber.js *        | Separated Gherkin
JavaScript           | Yadda                | Separated semi-Gherkin
JavaScript           | Jasmine              | In-line Spec
JavaScript           | Mocha                | In-line Spec
JavaScript           | Vows                 | In-line Spec
Perl                 | Test::BDD::Cucumber  | Separated Gherkin
PHP                  | Behat                | Separated Gherkin
PHP                  | Codeception *        | Separated or In-line
Python               | behave *             | Separated Gherkin
Python               | freshen x            | Separated Gherkin
Python               | lettuce              | Separated semi-Gherkin
Python               | pyspecs              | In-line Spec
Python               | pytest-bdd *         | Separated semi-Gherkin
Python               | radish               | Separated Gherkin-plus
Ruby                 | Cucumber *           | Separated Gherkin
Ruby                 | RSpec                | In-line Spec
Ruby                 | Spinach              | Separated Gherkin
Swift / Objective C  | Quick                | In-line Spec

[4/22/2018] Update: I updated info for C# and Python frameworks.