SpecFlow

Solving: How to write good UI interaction tests? #GivenWhenThenWithStyle

Writing good Gherkin is a passion of mine. Good Gherkin means good behavior specification, which results in better features, better tests, and ultimately better software. To help folks improve their Gherkin skills, Gojko Adzic and SpecFlow are running a series of #GivenWhenThenWithStyle challenges. I love reading each new challenge, and in this article, I provide my answer to one of them.

The Challenge

Challenge 20 states:

This week, we’re looking into one of the most common pain points with Given-When-Then: writing automated tests that interact with a user interface. People new to behaviour driven development often misunderstand what kind of behaviour the specifications should describe, and they write detailed user interactions in Given-When-Then scenarios. This leads to feature files that are very easy to write, but almost impossible to understand and maintain.

Here’s a typical example:

Scenario: Signed-in users get larger capacity
 
Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "Login"
And the user enters "testuser123" into the "username" field
And the user enters "$Pass123" into the "password" field
And the user clicks on "Sign in"
And the page reloads
Then the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "spreadsheet formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "1mb-sheet.xlsx" 
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"

A common way to avoid such issues is to rewrite the specification to avoid the user interface completely. We’ve looked into that option several times in this article series. However, that solution only applies if the risk we’re testing is not in the user interface, but somewhere below. To make this challenge more interesting, let’s say that we actually want to include the user interface in the test, since the risk is in the UI interactions.

Indeed, most behavior-driven practitioners would generally recommend against phrasing steps using language specific to the user interface. However, there are times when testing a user interface itself is valid. For example, I work at PrecisionLender, a Q2 Company, and our main web app is very heavy on the front end. It has many, many interconnected fields for pricing commercial lending opportunities. My team has quite a few tests to cover UI-centric behaviors, such as verifying that entering a new interest rate triggers recalculation for summary amounts. If the target behavior is a piece of UI functionality, and the risk it bears warrants test coverage, then so be it.

Let’s break down the example scenario given above to see how to write Gherkin with style for user interface tests.

Understanding Behavior

Behavior is behavior. If you can describe it, then you can do it. Everything exhibits behavior, from the source code itself to the API, UIs, and full end-to-end workflows. Gherkin scenarios should use verbiage that reflects the context of the target behavior. Thus, the example above uses words like “click,” “select,” and “open.” Since the scenario explicitly covers a user interface, I think it is okay to use these words here. What bothers me, however, are two apparent code smells:

  1. The wall of text
  2. Out-of-order step types

The first issue is the wall of text this scenario presents. Walls of text are hard to read because they present too much information at once. The reader must take time to read through the whole chunk. Many readers simply read the first few lines and then skip the remainder. The example scenario has 27 Given-When-Then steps. Typically, I recommend Gherkin scenarios to have single-digit line length. A scenario with less than 10 steps is easier to understand and less likely to include unnecessary information. Longer scenarios are not necessarily “wrong,” but their longer lengths indicate that, perhaps, these scenarios could be rewritten more concisely.

The second issue in the example scenario is that step types are out of order. Given-When-Then is a formula for success. Gherkin steps should follow strict Given → When → Then ordering because this ordering demarcates individual behaviors. Each Gherkin scenario should cover one individual behavior so that the target behavior is easier to understand, easier to communicate, and easier to investigate whenever the scenario fails during testing. When scenarios break the order of steps, such as Given → Then → Given → Then in the example scenario, it shows that either the scenario covers multiple behaviors or that the author did not bring a behavior-driven understanding to the scenario.

The rules of good behavior don’t disappear when the type of target behavior changes. We should still write Gherkin with best practices in mind, even if our scenarios cover user interfaces.

Breaking Down Scenarios

If I were to rewrite the example scenario, I would start by isolating individual behaviors. Let’s look at the first half of the original example:

Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Here, I see four distinct behaviors covered:

  1. Clicking “Upload Files” reloads the page.
  2. Clicking “Spreadsheet Formats” displays new buttons.
  3. Uploading a spreadsheet file makes the filename appear on the page.
  4. Attempting to upload a spreadsheet file that is 1MB or larger fails.

If I wanted to purely retain the same coverage, then I would rewrite these behavior specs using the following scenarios:

Feature: Example site
 
 
Scenario: Choose to upload files
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
Then the page displays the "Spreadsheet Formats" link
 
 
Scenario: Choose to upload spreadsheets
 
Given the Example site is ready to upload files
When the user clicks the "Spreadsheet Formats" link
Then the page displays the "XLS" and "XLSX" buttons
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Now, each scenario covers each individual behavior. The first scenario starts with the Example site in a “blank” state: “Given the Example site is displayed”. The second scenario inherently depends upon the outcome of the first scenario. Rather than repeat all the steps from the first scenario, I wrote a new starting step to establish the initial state more declaratively: “Given the Example site is ready to upload files”. This step’s definition method may need to rerun the same operations as the first scenario, but it guarantees independence between scenarios. (The step could also optimize the operations, but that should be a topic for another challenge.) Likewise, the third and fourth scenarios have a Given step to establish the state they need: “Given the Example site is ready to upload spreadsheet files.” Both scenarios can share the same Given step because they have the same starting point. All three of these new steps are descriptive more than prescriptive. They declaratively establish an initial state, and they leave the details to the automation code in the step definition methods to determine precisely how that state is established. This technique makes it easy for Gherkin scenarios to be individually clear and independently executable.

I also added my own writing style to these scenarios. First, I wrote concise, declarative titles for each scenario. The titles dictate interaction over mechanics. For example, the first scenario’s title uses the word “choose” rather than “click” because, from the user’s perspective, they are “choosing” an action to take. The user will just happen to mechanically “click” a link in the process of making their choice. The titles also provide a level of example. Note that the third and fourth scenarios spell out the target file sizes. For brevity, I typically write scenario titles using active voice: “Choose this,” “Upload that,” or “Do something.” I try to avoid including verification language in titles unless it is necessary to distinguish behaviors.

Another stylistic element of mine was to remove explicit details about the environment. Instead of hard coding the website URL, I gave the site a proper name: “Example site.” I also removed the mention of Chrome as the browser. These details are environment-specific, and they should not be specified in Gherkin. In theory, this site could have multiple instances (like an alpha or a beta), and it should probably run in any major browser (like Firefox and Edge). Environmental characteristics should be specified as inputs to the automation code instead.I also refined some of the language used in the When and Then steps. When I must write steps for mechanical actions like clicks, I like to specify element types for target elements. For example, “When the user clicks the “Upload Files” link” specifies a link by a parameterized name. Saying the element is a link helps provides context to the reader about the user interface. I wrote other steps that specify a button, too. These steps also specified the element name as a parameter so that the step definition method could possibly perform the same interaction for different elements. Keep in mind, however, that these linguistic changes are neither “required” nor “perfect.” They make sense in the immediate context of this feature. While automating step definitions or writing more scenarios, I may revisit the verbiage and do some refactoring.

Determining Value for Each Behavior

The four new scenarios I wrote each covers an independent, individual behavior of the fictitious Example site’s user interface. They are thorough in their level of coverage for these small behaviors. However, not all behaviors may be equally important to cover. Some behaviors are simply more important than others, and thus some tests are more valuable than others. I won’t go into deep detail about how to measure risk and determine value for different tests in this article, but I will offer some suggestions regarding these example scenarios.

First and foremost, you as the tester must determine what is worth testing. These scenarios aptly specify behavior, and they will likely be very useful for collaborating with the Three Amigos, but not every scenario needs to be automated for testing. You as the tester must decide. You may decide that all four of these example scenarios are valuable and should be added to the automated test suite. That’s a fine decision. However, you may instead decide that certain user interface mechanics are not worth explicitly testing. That’s also a fine decision.

In my opinion, the first two scenarios could be candidates for the chopping block:

  1. Choose to upload files
  2. Choose to upload spreadsheets

Even though these are existing behaviors in the Example site, they are tiny. The tests simply verify that a user clicks makes certain links or buttons appear. It would be nice to verify them, but test execution time is finite, and user interface tests are notoriously slow compared to other tests. Consider the Rule of 1’s: typically, by orders of magnitude, a unit test takes about 1 millisecond, a service API test takes about 1 second, and a web UI test takes about 1 minute. Furthermore, these behaviors are implicitly exercised by the other scenarios, even if they don’t have explicit assertions.

One way to condense the scenarios could be like this:

Feature: Example site
 
 
Background:
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
And the user clicks the "Spreadsheet Formats" link
And the user clicks the "XLSX" button
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
When the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
When the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 

This new feature file eliminates the first two scenarios and uses a Background section to cover the setup steps. It also eliminates the need for special Given steps in each scenario to set unique starting points. Implicitly, if the “Upload Files” or “Spreadsheet Formats” links fail to display the expected elements, then those steps would fail.

Again, this modification is not necessarily the “best” way or the “right” way to cover the desired behaviors, but it is a reasonably good way to do so. However, I would assert that both the 4-scenario feature file and the 2-scenario feature file are much better approaches than the original example scenario.

More Gherkin

What I showed in my answer to this Gherkin challenge is how I would handle UI-centric behaviors. I try to keep my Gherkin scenarios concise and focused on individual, independent behaviors. Try using these style techniques to rewrite the second half of Gojko’s original scenario. Feel free to drop your Gherkin in the comments below. I look forward to seeing how y’all write #GivenWhenThenWithStyle!

SpecFlow’s Online Gherkin Editor

Finding a good Gherkin editor is difficult. Some editors like Visual Studio Code and similar IDEs work great for engineers but aren’t suitable for product owners and non-programmer Amigos who want to contribute. Other editors like Notepad++ and Atom are lighter in weight but still require extensions and a little expertise. Fancy BDD tools like CucumberStudio and Cucumber for Jira provide Gherkin editors together with a bunch of other nifty features, but they require paid licenses.

For years, I’ve wanted a lightweight Gherkin editor that’s easy to use and accessible to all. Now, one finally exists: the Online Gherkin Editor by SpecFlow!

SpecFlow is the most popular BDD test automation framework for .NET. It’s also my favorite BDD framework. Over the past few years, I’ve built two large-scale test automation solutions with SpecFlow.

The Online Gherkin Editor by SpecFlow is just an editor on a web page. When you first load the page, the editor has example scenarios for you to reference. You can type your own Gherkin into the text area, and the editor highlights it for you. The editor provides line numbers and visual scrolling, too. My language is English, but if you happen to speak German, French, Spanish, or Dutch, then you can change the language setting via a dropdown. Once you’re done writing your Gherkin, you can clear it, copy it to the clipboard, or download it as a feature file using icons in the top-right corner. Be warned, though, that this editor won’t save your Gherkin in the cloud.

If you want to give this new editor a try, here’s the link: https://specflow.org/gherkin-editor/

You can also read SpecFlow’s official announcement here: https://specflow.org/blog/introducing-the-specflow-online-gherkin-editor/

Thanks, SpecFlow! Happy “Gherk-ing”!

What is BDD, and How Do We Practice It? (Webinar + Q&A)

On March 18, 2019, I gave a webinar entitled, “What is Behavior-Driven Development, and How Do We Practice It?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. It was both a pleasure and an honor to do this webinar with them. Paul is a top-notch test automation expert, and Beaufort Fairmont is doing really exciting things. Check out their two-day BDD training offering, as well as their blog and other webinars.

To see my webinar recording, register here.

During the webinar, attendees asked more questions than we could answer. I’m excited that so many people asked questions. My answers are below.

Questions about Process

How is BDD different from TDD (Test-Driven Development)?

BDD is an evolution of TDD. In TDD, developers (1) write unit tests and watch them fail, (2) develop the feature to make the tests pass, (3) refactor the code to make it stronger, and (4) repeat the cycle. In BDD, teams do this same loop with feature tests (a.k.a “acceptance” or “black-box” tests) as well as unit tests. Furthermore, BDD adds shift left practices like Example Mapping and Specification by Example so that teams know what they are doing and focus on developing the right things.

Check out Dan North’s article, Introducing BDD, for a more thorough answer.

Can BDD be used with manual testing?

Yes! BDD is not merely an automation tool – it is a set of pragmatic practices to help teams develop better software. Gherkin scenarios are first and foremost behavior specs that help a team’s collaboration and accountability. They function secondarily as test cases that can be executed either manually or with automation.

Can we use BDD with technical stories or backend features?

Yes! If you can describe it, then you can do it.

How many Gherkin scenarios should one story have?

There’s no hard rule, but I recommend no more than a handful of rules per story, and no more than a handful of examples per rule. If you do Example Mapping and feel overwhelmed by the number of cards for a story, then the story should probably be broken into smaller stories.

Should we do Example Mapping for every story? Spending 20-30 minutes for each story would take a long time.

Try doing Example Mapping on one or two stories to start. The first time is always rough, but as you iterate on it, you’ll get better as a team. Even though Example Mapping has an upfront time cost, it will save a lot of time later in the sprint because (a) acceptance criteria is clear, (b) tests are already written, and (c) everyone has a mutual understanding of the story. The team won’t suffer through the inefficiencies of miscommunication and poor planning. You may even want to replace planning meeting with Example Mapping meetings.

What metrics should we use with BDD?

All metrics are flawed, but some metrics are useful. All the standard testing and Agile metrics still apply: code coverage, story velocity, etc. Here are some additional metrics you may consider for BDD:

  • the percentage of stories that undergo Example Mapping before the sprint
  • the number of rules and examples that get “missed” during Example Mapping and need to be added later
  • the percentage of Gherkin scenarios that get automated in the sprint

If you choose to track metrics, make sure their feedback is used to improve team practices. For more info on metrics, please read my Quality Metrics 101 series.

What were the resources you recommended at the end of the webinar?

Questions about Tools

What test management tools should we use with BDD?

I’m sure there are BDD plugins for test management tools, but I don’t have any that I can personally recommend. To be honest, I try to stay away from large test management tools like HP ALM, qTest, VersionOne. When doing BDD, the Gherkin feature files themselves should be the single source of truth for feature-level tests, and they should be version-controlled in a repository. Don’t fall into the trap of slapping “Given-When-Then” keywords onto existing functional tests – that’s not BDD.

Does Jira support Example Mapping?

I have not personally used any Jira plugin for Example Mapping. It looks like there is an Easy Agile User Story Maps plugin that is similar to but slightly different from Example Mapping.

Are there other good tools for BDD and Example Mapping?

What’s the difference between Gherkin, Cucumber, and SpecFlow?

  • Gherkin is the Given-When-Then spec language.
  • Cucumber is a company and its eponymous test framework that uses Gherkin.
  • SpecFlow is Cucumber for .NET.

Questions about Testing

Can BDD test frameworks be used for unit testing?

Yes, but I don’t recommend it. BDD frameworks shine for black-box feature testing. They’re a bit too verbose for code-level unit tests. Read BDD 101: Unit, Integration, and End-to-End Tests for more info.

Can BDD test frameworks be used for integration testing?

Yes! See BDD 101: Unit, Integration, and End-to-End Tests.

How long should Gherkin scenarios be?

Scenarios should be bite-sized. Each scenario should focus on one individual behavior. There’s no hard rule, but I recommend single-digit step counts. Read BDD 101: Writing Good Gherkin for more info.

What are “step definitions” in Cucumber?

Step definitions are the methods in the automation code that execute the steps. When a BDD framework runs a Gherkin scenario as a test, it “glues” each step to a step definition based on some sort of string matching.

How can we minimize duplicate code within a BDD test framework?

Know your steps. Always search for existing steps before writing new steps. Refactor existing steps whenever appropriate. Reuse steps when writing new scenarios. Do pair programming or mob programming when writing scenarios. Put scenarios through code reviews. Apply good coding practices – remember, test automation is software.

I write Gherkin scenarios, but I don’t write test automation code. What’s the best way to write Gherkin scenarios so that they can be automated?

Do pair programming with the automation engineers to write Gherkin scenarios together. Become familiar with existing steps by reading and searching feature files. Otherwise, the Gherkin steps you write in isolation might not be usable. Remember, BDD is a team effort!

The examples in the webinar were all fairly basic. Do you have any examples with more complex systems?

I have some example projects on GitHub in Python and Java with some basic unit, integration, and end-to-end tests, but I don’t have any large-scale examples that I can share publicly.

We wrote hundreds of SpecFlow tests without the other Amigos. Now, there are large test gaps, and many steps aren’t reusable. What should we do?

I’m sorry to hear that. It’s not an uncommon story. There are two paths: (1) refactoring or (2) starting over. Without really knowing the situation, I don’t think it’s my place to say which way is better. Here are some questions to help guide your decision:

  • What are your goals for testing and automation?
  • What’s your overall quality and testing strategy?
  • What parts of the code base are salvageable?
  • What parts of the code base should be removed?
  • If you started again from scratch, what would you do differently to make sure the same problems don’t reoccur?

I strongly recommend taking the Setting a Foundation for Successful Test Automation course from Test Automation University. (It’s free.) I also gave a talk about this very problem, Egad! How Do We Start Writing (Better) Tests?, at a few Python conferences.

We have a large BDD test suite with heavy coupling and slow execution times. The business amigos have also left the company. Should we try to fix what we have or just start over?

Sorry to hear that; same answer as before.

Final Questions

Why do you call yourself the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker.

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

Behavior-Driven Blasphemy

This is my 100th post on Automation Panda! I’m thrilled to see how much this blog has grown and how many people it has helped. For such a monumental occasion, I have chosen to voice a rather controversial opinion about test automation.

Behavior-driven development seems to be the software testing buzzword of the decade. What started as a refinement of test-driven development by developers in Europe and the UK quickly became the big process fad of the 2010’s. The Cucumber project (now 10 years old) developed or inspired Gherkin-based test automation frameworks in all the major programming languages. Companies started requiring Given-When-Then format for acceptance criteria and test scenarios. Three Amigos meetings became standard calendar fixtures during sprints. Organizations that once undertook “Agile transformations” now have similar initiatives for BDD. For better or worse, BDD exists and cannot be ignored.

The dogmatic benefits of BDD are better collaboration and automation. However, leaders frequently insist that Gherkin-style test frameworks add value only when paired with practices like Example Mapping. “BDD is a process, not a tool,” is a common mantra. “Otherwise, the Gherkin just gets in the way.” Although I wholeheartedly agree that behavior-driven practices add significant value to the development process, I nevertheless espouse a rather blasphemous opinion:

BDD test automation frameworks are better than traditional frameworks for black box functional testing even when BDD processes are not followed.

What Exactly Are You Saying?

My claim is that behavior-driven test frameworks like Cucumber, SpecFlow, and behave are significantly better than traditional xUnit-style frameworks for testing live features. For example, I would rather use SpecFlow than NUnit for testing a Web app with Selenium WebDriver, whether or not the other two Amigos are with me. The resulting automation code has better structure, readability, and reusability.

I’m not saying that teams shouldn’t do BDD practices, and I’m not saying that the Three Amigos should be separated. Collaboration is key to success, and BDD really helps. Example Mapping is one of the most useful practices a development team can do. I’m also not saying that BDD frameworks should be used for all testing purposes – they are poorly suited for unit testing and for performance testing.

Objection!

I find myself very lonely in this opinion. BDD leaders repeatedly insist that BDD is not about testing and automation:

The most outspoken BDDers (mostly coalescing around the Cucumber community) have largely moved their focus to the collaboration benefits, almost forsaking the automation benefits. (This may not necessarily be true, but it appears that way based on the literature and materials floating on the Web.) That outlook is somewhat disingenuous because the main tools supporting BDD are, in fact, test frameworks.

BDD also has outspoken opponents – it’s love or hate. I’ve personally spoken with several engineers who despise Gherkin-based frameworks. “I can see how it would be valuable when a whole team embraces behavior-driven practices,” many have told me, “but otherwise, the Gherkin layer just gets in the way of automation.” I’ve heard it called “plaster” and “garbage.” Engineers just want to code their tests. And code should always be readable, right?

hqdefault

Testing is an inherently opinionated space. People can never seem to agree on things.

The Bigger Picture

Test automation must be developed regardless of any specific development practices, and its architecture must stand firmly in its own right. Unfortunately, both sides miss the bigger picture:

The best solution for test automation is a domain-specific language.

A domain-specific language (DSL) is a programming language with a purpose. It is designed to handle very specific needs, rather than general-purpose programming. For example:

  • SQL is a DSL for database queries.
  • XPath is a DSL for finding elements in an XML document.
  • YAML is a DSL for object serialization.

Gherkin is also a DSL – for behavior specification.

Domain-specific languages naturally suit test automation due to the clear difference between test cases and test code. Test cases are procedures that exercise product behavior. Anyone can write a test case. They are dictated or explained in plain language. Test code, however, is the software implementation of test cases. Test code handles function calls, logging, exceptions, and all those other little programming details that help run tests. A test automation DSL separates those concerns: test cases are written in a special language, and the interpreter handles repetitive, low-level details. Some type of extensions can handle product-specific interactions. The purpose of a language is to effectively express intention – and the intention is to test the product.

To truly achieve an optimal solution, however, the DSL and its interpreter must be treated as part of the automation software, just like the test cases and extensions. Remember, a language’s interpreter is just another piece of software. The interpreter is part of the separation of concerns and the single responsibility principle. Concerns that would typically be handled by classes and functions in traditional test code should be moved to the interpreter. For example, the interpreter should automatically log every test case step, rather that forcing the author to write explicit logging statements.

When I worked at NetApp years ago, I implemented a DSL to test platform-level features of our operating system. I called it DS – short for “Design Steps” (from HP ALM) (but also not without an affinity for the Nintendo DS). NetApp’s entire test automation code was developed in Perl at the time, so I implemented the DS interpreter in Perl to reuse existing libraries. DS test cases were typically only a dozen lines long each, and DS expressions could call specially-written Perl modules directly for complete extendability. During the first big release using DS, my team saved countless hours of automation development time as compared to the previous release while delivering a higher number of tests. I also did this before I had ever heard of BDD.

Unfortunately, most teams have neither the time to develop their own testing DSL nor the understanding of compiler theory to build it right. And if they were given such a language, they typically limit themselves to the provided implementation instead of taking ownership to extend the language for their needs.

nintendo-ds-1

The original Nintendo DS. Fun times!

Who Truly Misunderstands Gherkin?

Enter Gherkin: the world’s first major general-purpose, off-the-shelf language for test automation. It is general enough to cover any case through its plain language steps, yet specific enough to standardize tests. Users don’t need to be compiler theory experts – they just make up their own step names and provide the definition code to execute them. Early BDD projects like JBehave and Cucumber packaged an interpreter as a test framework and delivered it to a testing world still stuck on JUnit. The need for a testing DSL was there, whether or not the BDD folks meant to serve it.

Cucumber-ites frequently bemoan that their framework is misunderstood by the masses. They shudder to see teams using their framework purely for test automation. However, Cucumber effectively lowered the entry barrier for teams to make their own testing DSLs. Kodak did the same thing for film: they made it cheap and standard so anyone could be a photographer. Not everyone who uses a BDD framework misunderstands its purpose: some (like me) just see an alternative value proposition than what is preached by orthodox BDD practitioners. Gherkin fills a need that nobody knew. Its popularity validates that claim.

Benefits Apart from Process

Using a BDD framework adds much value to testing and development even without BDD processes. Below are just a handful of benefits:

  1. Focus first on good scenarios. Gherkin forces authors to think before they code.
  2. Faster automation development. Gherkin steps are reusable and parametrizable.
  3. Stronger structure. Engineers know where to put things in the framework.
  4. Test understandability. Anyone can read scenarios because they are written in plain language. Business people can help. New people can pick it up fast.
  5. Test sharing. Feature files can be shared apart from test code, which can be helpful for business partners.
  6. Test similarity. Tests all look the same. Team members can more easily help each other.
  7. Clearer failures. When a scenario fails, reports show exactly what step failed.
  8. Simpler bug reports. Use scenario steps as instructions to reproduce the failure.
  9. 2-phase test reviews. Review Gherkin first and then test code second to make sure the test cases are good before implementing the wrong things.
  10. BDD enablement. Using a BDD framework opens the door for a team to embrace better behavioral practices in the future.

I wrote about these advantages before:

Case Studies

I’m also not the only one who finds value in BDD test frameworks outside of the full BDD process. Below are five case studies.

radish

radish is a Python test framework inspired by Cucumber. Its DSL syntax is a superset of Gherkin that adds preconditions, loops, variables, and expressions. These language additions indicate a bias towards automation because they enable engineers to write tests more programmatically, albeit in a Gherkin-ese way.

Karate

Karate is a test framework with a full DSL based on Gherkin with steps specifically tailored to Web service calls. Although it is implemented in Java, testers do not need to do any Java programming to write complete tests cases from day one. Peter Thomas, the creator of Karate, unabashedly declares that Karate does not truly adhere to BDD but nevertheless uses Cucumber for its automation benefits. (Note: Karate is working to move completely off of Cucumber. See GitHub issue #444.)

REST Assured

REST Assured is a Java package for testing REST APIs. Unlike Karate, REST Assured provides a fluent syntax (and not a DSL) for writing service calls directly in Java code. The fluent syntax is based on Gherkin: given() a request spec is created, when() the call is made, then() verify the response. Although REST Assured is not a full testing framework, it nevertheless pulls inspiration from BDD frameworks for order and structure.

Cycle

Cycle is a BDD-focused solution from Tryon Solutions for testing Web, terminal, and desktop apps. Cycle is unique because it provides out-of-the-box steps for all types of supported testing so that no programming experience is required. Testers write feature files using Cycle 2.0’s slick new Electron app. Scenarios are written in CycleScript, a Gherkin-ese language with additions like variables and sub-scenario calls. Steps tend to be imperative, but that’s the tradeoff for not requiring lower-level programming.

Hexawise

Hexawise is a combinatorial testing tool designed to maximize coverage with minimal test counts by smartly joining feature variations. It helps testers write better tests with less redundancy and fewer gaps. Although Hexawise has historically assisted manual testers, it also can generate Gherkin feature files for test variations.

mexican-coast-dried-sea-cucumber

Not all cucumbers are the same. Above is a sea cucumber.

Good Enough?

Gherkin-based test frameworks are not perfect, but they do provide good structure. They gained popularity outside of the pure BDD movement because they genuinely added value to testing and automation. Like any other tool, teams will use them in both good and bad ways. (Trust me, I’ve seen scary Gherkin.)

It’s interesting to see how groups outside the Cucumber diaspora are attempting to solve the limitations of pure Gherkin. Each case study above showed a unique path. Clearly, the test automation problem has not yet been completely solved, but current BDD frameworks are the best off-the-shelf solutions we have until a new software testing movement comes along.

5 Things I Love About SpecFlow

SpecFlow, a.k.a. “Cucumber for .NET,” is a leading BDD test automation framework for .NET. Created by Gáspár Nagy and maintained as a free, open source project on GitHub by TechTalk, SpecFlow presently has almost 3 million total NuGet downloads. I’ve used it myself at a few companies, and, I must say as an automationeer, it’s awesome! SpecFlow shares a lot in common with other Cucumber frameworks like Cucumber-JVM, but it is not a knockoff – it excels in many ways. Below are five features I love about SpecFlow.

#1: Declarative Specification by Example

SpecFlow is a behavior-driven test framework. Test cases are written as Given-When-Then scenarios in Gherkin “.feature” files. For example, imagine testing a cucumber basket:

Feature: Cucumber Basket
  As a gardener,
  I want to carry many cucumbers in a basket,
  So that I don’t drop them all.
  
  @cucumber-basket
  Scenario: Fill an empty basket with cucumbers
    Given the basket is empty
    When "10" cucumbers are added to the basket
    Then the basket is full

Notice a few things:

  • It is declarative in that steps indicate what should be done at a high level.
  • It is concise in that a full test case is only a few lines long.
  • It is meaningful in that the coverage and purpose of the test are intuitively obvious.
  • It is focused in that the scenario covers only one main behavior.

Gherkin makes it easy to specify behaviors by example. That way, everybody can understand what is happening. C# code will implement each step in lower layers. Even if your team doesn’t do the full-blown BDD process, using a BDD framework like SpecFlow is still great for test automation. Test code naturally abstracts into separate layers, and steps are reusable, too!

#2: Context is King

Safely sharing data (e.g., “context”) between steps is a big challenge in BDD test frameworks. Using static variables is a simple yet terrible solution – any class can access them, but they create collisions for parallel test runs. SpecFlow provides much better patterns for sharing context.

Context injection is SpecFlow’s simple yet powerful mechanism for inversion of control (using BoDi). Any POCOs can be injected into any step definition class, either using default values or using a specific initialization, by declaring the POCO as a step def constructor argument. Those instances will also be shared instances, meaning steps across different classes can share the same objects! For example, steps for Web tests will all need a reference to the scenario’s one WebDriver instance. The context-injected objects are also created fresh for each scenario to protect test case independence.

Another powerful context mechanism is ScenarioContext. Every scenario has a unique context: title, tags, feature, and errors. Arbitrary objects can also be stored in the context object like a Dictionary, which is a simple way to pass data between steps without constructor-level context injection. Step definition classes can access the current scenario context using the static ScenarioContext.Current variable, but a better, thread-safe pattern is to make all step def classes extend the Steps class and simply reference the ScenarioContext instance variable.

#3: Hooks for Any Occasion

Hooks are special methods that insert extra logic at critical points of execution. For example, WebDriver cleanup should happen after a Web test scenario completes, no matter the result. If the cleanup routine were put into a Then step, then it would not be executed if the scenario had a failure in a When step. Hooks are reminiscent of Aspect-Oriented Programming.

Most BDD frameworks have some sort of hooks, but SpecFlow stands out for its hook richness. Hooks can be applied before and after steps, scenario blocks, scenarios, features, and even around the whole test run. (Cucumber-JVM, by contrast, does not support global hooks.) Hooks can be selectively applied using tags, and they can be assigned an order if a project has multiple hooks of the same type. Hook methods will also be picked up from any step definition class. SpecFlow hooks are just awesome!

#4: Thorough Outline Templating

Scenario Outlines are a standard part of Gherkin syntax. They’re very useful for templating scenarios with multiple input combinations. Consider the cucumber basket again:

Feature: Cucumber Basket
  
  Scenario Outline: Add cucumbers to the basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are added to the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | total |
      | 1       | 2    | 3     |
      | 5       | 3    | 8     |

All BDD frameworks can parametrize step inputs (shown in double quotes). However, SpecFlow can also parametrize the non-input parts of a step!

Feature: Cucumber Basket
  
  Scenario Outline: Use the cucumber basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are <handled-with> the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | handled-with | total |
      | 1       | 2    | added to     | 3     |
      | 5       | 3    | removed from | 2     |

The step definitions for the add and remove steps are separate. The step text for the action is parametrized, even though it is not a step input:

[When(@"""(\d+)"" cucumbers are added to the basket")]
public void WhenCucumbersAreAddedToTheBasket(int count) { /* */ }

[When(@"""(\d+)"" cucumbers are removed from the basket")]
public void WhenCucumbersAreRemovedFromTheBasket(int count) { /* */ }

That’s cool!

#5: Test Thread Affinity

SpecFlow can use any unit test runner (like MsTest, NUnit, and xUnit.net), but TechTalk provides the official SpecFlow+ Runner for a licensed fee. I’m not associated with TechTalk in any way, but the SpecFlow+ Runner is worth the cost for enterprise-level projects. It has a friendly command line, a profile file to customize execution, parallel execution, and nice integrations.

The major differentiator, in my opinion, is its test thread affinity feature. When running tests in parallel, the major challenge is avoiding collisions. Test thread affinity is a simple yet powerful way to control which tests run on which threads. For example, consider testing a website with user accounts. No two tests should use the same user at the same time, for fear of collision. Scenarios can be tagged for different users, and each thread can have the affinity to run scenarios for a unique user. Some sort of parallel isolation management like test thread affinity is absolutely necessary for test automation at scale. Given that the SpecFlow+ Runner can handle up to 64 threads (according to TechTalk), massive scale-up is possible.

But Wait, There’s More!

SpecFlow is an all-around great test automation framework, whether or not your team is doing full BDD. Feel free to add comments below about other features you love (or *gasp* hate) about SpecFlow!

 

Pipe Character Escape for Gherkin Tables

For the first time today, I had to write a Gherkin behavior scenario in which table text needed to use the pipe character “|”. I wanted a generic step that would find and click web page links by name, and one of the link names had the pipe in it! The first version of the step I wrote looked like this:

When the user follows the links:
  | link              |
  | Category          |
  | Sub-Category      |
  | Index|Description |

Naturally, this step didn’t parse – the “|” was parsed as a table delimiter instead of the intended link text. I could have rewritten the step to search for partial link text, or I could have done a key-value lookup, but I wanted to keep the step simple and direct.

The solution was simple: escape the pipe character “|” with a backslash character “\”. Easy! Thanks, StackOverflow! The updated table looks like this:

When the user follows the links:
 | link               |
 | Category           |
 | Sub-Category       |
 | Index\|Description |

“\|” works for both step tables and scenario outline example tables. It looks like it is fairly standard for test frameworks that use Gherkin. I verified that Cucumber-JVM and SpecFlow support it, and it looks like Cucumber for Ruby does as well. It looks like behave will support it in 1.2.6.

After learning this trick, I updated the BDD 101: The Gherkin Language page.

Note that backslash escape sequences won’t work for quotes in Gherkin steps. Quotes in steps are merely conventions and not part of the Gherkin language standard.

BDD 101: Manual Testing

Behavior-driven development takes an automation-first philosophy: behavior specs should become automated tests. However, BDD can also accommodate manual testing. Manual testing has a place and a purpose, even in BDD. Remember, behavior scenarios are first and foremost behavior specifications, and they provide value beyond testing and automation. Any behavior scenario could be run as a manual test. The main questions, then, are (1) when is manual testing appropriate and (2) how should it be handled.

(Check the Automation Panda BDD page for the full BDD 101 table of contents.)

When is Manual Testing Appropriate?

Automation is not a silver bullet – it doesn’t satisfy all testing needs. Scenarios should be written for all behaviors, but they likely shouldn’t be automated under the following circumstances:

  • The return-on-investment to automate the scenarios is too low.
  • The scenarios won’t be included in regression or continuous integration.
  • The behaviors are temporary (ex: hotfixes).
  • The automation itself would be too complex or too fragile.
  • The nature of the feature is non-functional (ex: performance, UX, etc.).
  • The team is still learning BDD and is not yet ready to automate all scenarios.

Manual testing is also appropriate for exploratory testing, in which engineers rely upon experience rather than explicit test procedures to “explore” the product under test for bugs and quality concerns. It complements automation because both testing styles serve different purposes. However, behavior scenarios themselves are incompatible with exploratory testing. The point of exploring is for engineers to go “unscripted” – without formal test plans – to find problems only a user would catch. Rather than writing scenarios, the appropriate way to approach behavior-driven exploratory testing is more holistic: testers should assume the role of a user and exercise the product under test as a collection of interacting behaviors. If exploring uncovers any glaring behavior gaps, then new behavior scenarios should be added to the catalog.

How Should Manual Testing Be Handled?

Manual testing fits into BDD in much the same way as automated testing because both formats share the same process for behavior specification. Where the two ways diverge is in how the tests are run. There are a few special considerations to make when writing scenarios that won’t be automated.

Repository

Both manual and automated behavior scenarios should be stored in the same repository. The natural way to organize behaviors is by feature, regardless of how the tests will be run. All scenarios should also be managed by some form of version control.

Furthermore, all scenarios should be co-located for document-generation tools like Pickles. Doc tools make it easy to expose behavior specs and steps to everyone. They make it easier for the Three Amigos to collaborate. Non-technical people are not likely to dig into programming projects.

Tags

Scenarios must be classified as manual or automated. When BDD frameworks run tests, they need a way to exclude tests that are not automated. Otherwise, test reports would be full of errors! In Gherkin, scenarios should be classified using tags. For example, scenarios could be tagged as either “@manual” or “@automated”. A third tag, “@automatable”, could be used to distinguish scenarios that are not yet automated but are targeted for automation.

Some BDD frameworks have nifty features for tags. In Cucumber-JVM, tags can be set as runner class options for convenience. This means that tag options could be set to “~@manual” to avoid manual tests. In SpecFlow, any scenario with the special “@ignore” tag will automatically be skipped. Nevertheless, I strongly recommend using custom tags to denote manual tests, since there are many reasons why a test may be ignored (such as known bugs).

Extra Comments

The conciseness of behavior scenarios is problematic for manual testing because steps don’t provide all the information a tester may need. For example, test data may not be written explicitly in the spec. The best way to add extra information to a scenario is to add comments. Gherkin allows any number of lines for comments and description. Comments provide extra information to the reader but are ignored by the automation.

It may be tempting to simply write new Gherkin steps to handle the extra information for manual testing. However, this is not a good approach. Principles of good Gherkin should be used for all scenarios, regardless of whether or not the scenarios will be automated. High-quality specification should be maintained for consistency, for documentation tools, and for potential future automation.

An Example

Below is a feature that shows how to write behavior scenarios for manual tests:

Feature: Google Searching

  @automated
  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  @manual
  Scenario: Image search
    # The Google home page URL is: http://www.google.com/
    # Make sure the images shown include pandas eating bamboo
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

It’s not really different from any other behavior scenarios.

 

As stated in the beginning, BDD should be automation-first. Don’t use the content of this article to justify avoiding automation. Rather, use the techniques outlined here for manual testing only as needed.

 

BDD 101: Frameworks

Every major programming language has a BDD automation framework. Some even have multiple choices. Building upon the structural basics from the previous post, this post provides a survey of the major frameworks available today. Since I cannot possibly cover every BDD framework in depth in this 101 series, my goal is to empower you, the reader, to pick the best framework for your needs. Each framework has support documentation online justifying its unique goodness and detailing how to use it, and I would prefer not to duplicate documentation. Use this post primarily as a reference. (Check the Automation Panda BDD page for the full table of contents.)

Major Frameworks

Most BDD frameworks are Cucumber versions, JBehave derivatives inspired by Dan North, or non-Gherkin spec runners. Some put behavior scenarios into separate files, while others put them directly into the source code.

C# and Microsoft .NET

SpecFlow, created by Gáspár Nagy, is arguably the most popular BDD framework for Microsoft .NET languages. Its tagline is “Cucumber for .NET” – thus fully compliant with Gherkin. SpecFlow also has polished, well-designed hookscontext injection, and parallel execution (especially with test thread affinity). The basic package is free and open source, but SpecFlow also sells licenses for SpecFlow+ extensions. The free version requires a unit test runner like MsTest, NUnit, or xUnit.net in order to run scenarios. This makes SpecFlow flexible but also feels jury-rigged and inelegant. The licensed version provides a slick runner named SpecFlow+ Runner (which is BDD-friendly) and a Microsoft Excel integration tool named SpecFlow+ Excel. Microsoft Visual Studio has extensions for SpecFlow to make development easier.

There are plenty of other BDD frameworks for C# and .NET, too. xBehave.net is an alternative that pairs nicely with xUnit.net. A major difference of xBehave.net is that scenario steps are written directly in the code, instead of in separate text (feature) files. LightBDD bills itself as being more lightweight than other frameworks and basically does some tricks with partial classes to make the code more readable. NSpec is similar to RSpec and Mocha and uses lambda expressions heavily. Concordion offers some interesting ways to write specs, too. NBehave is a JBehave descendant, but the project appears to be dead without any updates since 2014.

Java and JVM Languages

The main Java rivalry is between Cucumber-JVM and JBehave. Cucumber-JVM is the official Cucumber version for Java and other JVM languages (Groovy, Scala, Clojure, etc.). It is fully compliant with Gherkin and generates beautiful reports. The Cucumber-JVM driver can be customized, as well. JBehave is one of the first and foremost BDD frameworks available. It was originally developed by Dan North, the “father of BDD.” However, JBehave is missing key Gherkin features like backgrounds, doc strings, and tags. It was also a pure-Java implementation before Cucumber-JVM existed. Both frameworks are widely used, have plugins for major IDEs, and distribute Maven packages. This popular but older article compares the two in slight favor of JBehave, but I think Cucumber-JVM is better, given its features and support.

The Automation panda article Cucumber-JVM for Java is a thorough guide for the Cucumber-JVM framework.

Java also has a number of other BDD frameworks. JGiven uses a fluent API to spell out scenarios, and pretty HTML reports print the scenarios with the results. It is fairly clean and concise. Spock and JDave are spec frameworks, but JDave has been inactive for years. Scalatest for Scala also has spec-oriented features. Concordion also provides a Java implementation.

JavaScript

Almost all JavaScript BDD frameworks run on Node.js. Jasmine and Mocha are two of the most popular general-purpose JS test frameworks. They differ in that Jasmine has many features included (like assertions and spies) that Mocha does not. This makes Jasmine easier to get started (good for beginners) but makes Mocha more customizable (good for power users). Both claim to be behavior-driven because they structure tests using “describe” and “it-should” phrases in the code, but they do not have the advantage of separate, reusable steps like Gherkin. Personally, I consider Jasmine and Mocha to be behavior-inspired but not fully behavior-driven.

Other BDD frameworks are more true to form. Cucumber provides Cucumber.js for Gherkin-compliant happiness. Yadda is Gherkin-like but with a more flexible syntax. Vows provides a different way to approach behavior using more formalized phrase partitions for a unique form of reusability. The Cucumber blog argues that Cucumber.js is best due to its focus on good communication through plain language steps, whereas other JavaScript BDD frameworks are more code-y. (Keep in mind, though, that Cucumber would naturally boast of its own framework.) Other comparisons are posted here, here, here, and here.

PHP

The two major BDD frameworks for PHP are Behat and Codeception. Behat is the official Cucumber version for PHP, and as such is seen as the more “pure” BDD framework. Codeception is more programmer-focused and can handle other styles of testing. There are plenty of articles comparing the two – here, here, and here (although the last one seems out of date). Both seem like good choices, but Codeception seems more flexible.

Python

Python has a plethora of test frameworks, and many are BDD. behave and lettuce are probably the two most popular players. Feature comparison is analogous to Cucumber-JVM versus JBehave, respectively: behave is practically Gherkin compliant, while lettuce lacks a few language elements. Both have plugins for major IDEs. pytest-bdd is on the rise because it integrates with all the wonderful features of pytestradish is another framework that extends the Gherkin language to include scenario loops, scenario preconditions, and variables. All these frameworks put scenarios into separate feature files. They all also implement step definitions as functions instead of classes, which not only makes steps feel simpler and more independent, but also avoids unnecessary object construction.

Other Python frameworks exist as well. pyspecs is a spec-oriented framework. Freshen was a BDD plugin for Nose, but both Freshen and Nose are discontinued projects.

Ruby

Cucumber, the gold standard for BDD frameworks, was first implemented in Ruby. Cucumber maintains the official Gherkin language standard, and all Cucumber versions are inspired by the original Ruby version. Spinach bills itself as an enhancement to Cucumber by encapsulating steps better. RSpec is a spec-oriented framework that does not use Gherkin.

Which One is Best?

There is no right answer – the best BDD framework is the one that best fits your needs. However, there are a few points to consider when weighing your options:

  • What programming language should I use for test automation?
  • Is it a popular framework that many others use?
  • Is the framework actively supported?
  • Is the spec language compliant with Gherkin?
  • What type of testing will you do with the framework?
  • What are the limitations as compared to other frameworks?

Frameworks that separate scenario text from implementation code are best for shift-left testing. Frameworks that put scenario text directly into the source code are better for white box testing, but they may look confusing to less experienced programmers.

Personally, my favorites are SpecFlow and pytest-bdd. At LexisNexis, I used SpecFlow and Cucumber-JVM. For Python, I used behave at MaxPoint, but I have since fallen in love with pytest-bdd since it piggybacks on the wonderfulness of pytest. (I can’t wait for this open ticket to add pytest-bdd support in PyCharm.) For skill transferability, I recommend Gherkin compliance, as well.

Reference Table

The table below categorizes BDD frameworks by language and type for quick reference. It also includes frameworks in languages not described above. Recommended frameworks are denoted with an asterisk (*). Inactive projects are denoted with an X (x).

Language Framework Type
C Catch In-line Spec
C++ Igloo In-line Spec
C# and .NET Concordion
LightBDD
NBehave x
NSpec
SpecFlow *
xBehave.net
In-line Spec
In-line Gherkin
Separated semi-Gherkin
In-line Spec
Separated Gherkin
In-line Gherkin
Golang Ginkgo In-line Spec
Java and JVM Cucumber-JVM *
JBehave
JDave x
JGiven *
Scalatest
Spock
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Gherkin
In-line Spec
In-line Spec
JavaScript Cucumber.js *
Yadda
Jasmine
Mocha
Vows
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Spec
In-line Spec
Perl Test::BDD::Cucumber Separated Gherkin
PHP Behat
Codeception *
Separated Gherkin
Separated or In-line
Python behave *
freshen x
lettuce
pyspecs
pytest-bdd *
radish
Separated Gherkin
Separated Gherkin
Separated semi-Gherkin
In-line Spec
Separated semi-Gherkin
Separated Gherkin-plus
Ruby Cucumber *
RSpec
Spinach
Separated Gherkin
In-line Spec
Separated Gherkin
Swift / Objective C Quick In-line Spec

 

[4/22/2018] Update: I updated info for C# and Python frameworks.