Gherkin

How Q2 uses BDD with SpecFlow for testing PrecisionLender

This case study was written by Andrew Knight, Lead Software Engineer in Test for Q2’s PrecisionLender product, in collaboration with Q2 and Tricentis. It explains the PrecisionLender team’s continuous testing journey and how SpecFlow served as a cornerstone for success.

What is PrecisionLender?

PrecisionLender is a web application that empowers commercial bankers with in-the-moment insights that help them structure and price commercial deals. Andi®, PrecisionLender’s intelligent virtual analyst, delivers these hyper-focused recommendations in real time, allowing relationship managers to make data-driven decisions while pricing their commercial deals. PrecisionLender is owned and developed by Q2, a financial experience software company dedicated to providing digital banking and lending solutions to banks, credit unions, alternative finance, and fintech companies in the U.S. and internationally.

The PrecisionLender Opportunity Screen
(Picture taken from the PrecisionLender Support Center)

The starting point

The PrecisionLender team had a robust Continuous Integration (CI) delivery pipeline with strong unit test coverage, but they lacked end-to-end feature coverage. Developers would fill this gap by manually inspecting their changes in a shared development environment. However, as the PrecisionLender app grew, manual checks could not cover all possible integrations. The team knew they needed continuous automated testing to provide a safety net for development to remain lean and efficient. In April 2018, they hired Andrew Knight as their first Software Engineer in Test (SET) – a new role for the company – to lead the effort.

Automating tests with SpecFlow

The PrecisionLender team developed the Boa test solution – a project for automating end-to-end tests at scale. Boa would become PrecisionLender’s internal platform for test automation development. The name “Boa” is a loose acronym for “Behavior-Oriented Automation.”

The team chose SpecFlow to be the core framework for Boa tests. Since the PrecisionLender app’s backend is developed using .NET, SpecFlow was a natural fit. SpecFlow’s Gherkin syntax made tests readable and understandable, even to product owners and product support specialists who do not code.

The SpecFlow framework integrates with tools like Selenium WebDriver for testing Web UIs and RestSharp for testing REST APIs, enabling Boa tests to exercise the app’s vital pathways end to end. SpecFlow’s dependency injection mechanisms are solid yet simple, and the online docs are thorough. Plus, SpecFlow is an open-source project, so anyone can look at its code to learn how things work, open requests for new features, and even offer code contributions.
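To illustrate, here is a minimal sketch of a SpecFlow step definition class that uses context injection to receive a shared WebDriver. The class, step text, and element names are hypothetical, not actual Boa code, and the WebDriver is assumed to be registered with SpecFlow’s container elsewhere (for example, in a setup hook):

// A minimal sketch of a SpecFlow step definition class; names are hypothetical.
// SpecFlow's built-in context injection supplies shared objects, like a
// WebDriver, through constructor parameters.
using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class SearchSteps
{
    private readonly IWebDriver _driver;

    // SpecFlow resolves this constructor argument from its DI container.
    public SearchSteps(IWebDriver driver)
    {
        _driver = driver;
    }

    [When(@"the user searches for ""(.*)""")]
    public void WhenTheUserSearchesFor(string phrase)
    {
        var searchInput = _driver.FindElement(By.Name("q"));
        searchInput.SendKeys(phrase);
        searchInput.Submit();
    }
}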

An example Boa test, written in Gherkin using SpecFlow.

Executing tests with SpecFlow+ Runner

Writing good tests was only part of the challenge. The PrecisionLender team needed to execute Boa tests continuously to provide fast feedback on changes to the app. The team chose to run Boa tests using SpecFlow+ Runner, which is tailored for SpecFlow tests. The team uses SpecFlow+ Runner to launch tests in parallel in TeamCity any time a developer deploys a code change to internal pre-production environments. The entire test suite also runs every night against multiple product configurations. SpecFlow+ Runner produces a helpful test report with everything needed to triage test failures: pass-and-fail tallies overall and per feature, a visual execution timeline, and full system logs. If engineers need to investigate certain failures more closely, they can use SpecFlow tags and SpecFlow+ Runner profiles to selectively filter tests for reruns. SpecFlow+ Runner’s multiple features help the team expedite test execution and investigation.

The SpecFlow+ Runner report for a dozen smoke tests.
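Tag-based filtering is lightweight: a tag is a single annotation line above a scenario in Gherkin, and a SpecFlow+ Runner profile can then include or exclude scenarios by tag. Here is a hypothetical tagged scenario:

@smoke @search
Scenario: Simple Google search
    Given the Google home page is displayed
    When the user searches for "panda"
    Then the results page shows links related to "panda"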

Sharing features with SpecFlow+ LivingDoc

Good test cases are more than just verification procedures – they are behavior specifications. They define how features should work. Instead of keeping testing work siloed by role, the PrecisionLender team wanted to share Boa tests as behavior specs with all stakeholders to foster greater collaboration and understanding around features. The team also wanted to share Boa tests with specific customers without sharing the entire automation code.

SpecFlow+ LivingDoc enabled the PrecisionLender team to turn Gherkin feature files into living documentation. Whereas the SpecFlow+ Runner report focuses on automation execution, the SpecFlow+ LivingDoc report focuses on behavior specification apart from coding and automation details. LivingDoc displays Gherkin scenarios in a readable, searchable way that both internal folks and customers can consume. It can also optionally include high-level pass-and-fail results for each scenario, providing just enough information to be helpful and not overwhelming. LivingDoc has also helped PrecisionLender’s engineers identify and eliminate unused step definitions within the automation code. PrecisionLender benefits greatly from complementary reports from SpecFlow+ Runner and SpecFlow+ LivingDoc.

The SpecFlow+ LivingDoc report for a dozen smoke tests with their pass-and-fail results.

Improving interactions with Boa Constrictor

The Boa test solution initially used the Page Object Model to model interactions with the PrecisionLender app. However, as the PrecisionLender team automated more and more Boa tests, it became apparent that page objects did not scale well. Many page object classes had duplicative methods, making automation code messy. Some methods also did not include appropriate waiting mechanisms, introducing flaky failures.

PrecisionLender’s SETs developed Boa Constrictor, a .NET implementation of the Screenplay Pattern, to make better interactions for better automation. In Screenplay, actors use abilities to perform interactions. For example, an ability could be using Selenium WebDriver, and an interaction could be clicking an element. The Screenplay Pattern can be seen as a refactoring of the Page Object Model that minimizes duplicate code through a better separation of concerns. Individual interactions can be hardened for robustness, eliminating flaky hotspots. The Boa test solution now exclusively uses Boa Constrictor for interactions.
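To show the shape of the pattern, here is a simplified C# sketch of Screenplay’s core pieces. This is illustrative only – not Boa Constrictor’s actual API:

// A simplified sketch of the Screenplay Pattern's core pieces.
// Illustrative only; Boa Constrictor's real API differs in its details.
using System;
using System.Collections.Generic;
using OpenQA.Selenium;

public interface IAbility { }

public interface IInteraction
{
    void PerformAs(Actor actor);
}

public class Actor
{
    private readonly Dictionary<Type, IAbility> _abilities = new();

    // The actor gains an ability, such as the ability to use a WebDriver.
    public void Can(IAbility ability) => _abilities[ability.GetType()] = ability;

    public TAbility Using<TAbility>() where TAbility : IAbility =>
        (TAbility)_abilities[typeof(TAbility)];

    // The actor performs an interaction, such as clicking an element.
    public void AttemptsTo(IInteraction interaction) => interaction.PerformAs(this);
}

// An ability: this actor can browse the web with a given WebDriver.
public class BrowseTheWeb : IAbility
{
    public IWebDriver Driver { get; }
    public BrowseTheWeb(IWebDriver driver) => Driver = driver;
}

// An interaction: click the element found by a locator.
// Each interaction can be hardened once (e.g., with explicit waits)
// and reused everywhere, which is how flaky hotspots get eliminated.
public class Click : IInteraction
{
    private readonly By _locator;
    public Click(By locator) => _locator = locator;

    public void PerformAs(Actor actor)
    {
        var driver = actor.Using<BrowseTheWeb>().Driver;
        driver.FindElement(_locator).Click();
    }
}

// Usage: actor.Can(new BrowseTheWeb(driver));
//        actor.AttemptsTo(new Click(By.Id("submit")));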

In October 2020, Q2 released Boa Constrictor as an open-source project so that anyone can use it. It is fully compatible with SpecFlow and other .NET test frameworks, and it provides rich interactions for Selenium WebDriver and RestSharp out of the box.

Boa Constrictor, the .NET Screenplay Pattern.

Scaling massively with Selenium Grid

When the PrecisionLender team first started automating Boa tests, they ran tests one at a time. That soon became too slow since the average Boa test took 20 to 50 seconds to complete. The team then started running up to 3 tests in parallel on one machine, but that also was not fast enough. They turned to Selenium Grid, a tool for running WebDriver sessions remotely across multiple machines.

PrecisionLender built a set of internal Selenium Grid instances using Microsoft Azure virtual machines to run Boa tests at high scale. As of July 2021, PrecisionLender has over 1800 unique Boa tests that run across four distinct product configurations. Whenever TeamCity detects a code change, it triggers a “continuous” Boa test suite of over 1000 tests that run 50 at a time using Google Chrome on Selenium Grid, completing execution in about 10 minutes. TeamCity launches the full test suite every night against all product configurations with 64-100 parallel tests on Selenium Grid. In total, Continuous Integration currently runs up to 10K Boa tests daily against the PrecisionLender app with SpecFlow+ Runner and Selenium Grid.
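From the automation side, pointing a test at Selenium Grid instead of a local browser is a small change in WebDriver setup. A minimal C# sketch, with a placeholder hub URL:

// Connect to a Selenium Grid hub instead of launching a local browser.
// The hub URL is a placeholder for an internal Grid instance.
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

var options = new ChromeOptions();
IWebDriver driver = new RemoteWebDriver(
    new Uri("http://grid-hub.internal:4444/wd/hub"),
    options);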

The Boa test solution architecture, including Continuous Integration through TeamCity and parallel testing with SpecFlow+ Runner and Selenium Grid.

Shifting left with BDD

Better testing and automation practices eventually inspired better development practices. Product owners would create user stories, but developers would struggle to understand requirements and business purposes fully. PrecisionLender’s SETs started bringing together the Three Amigos – business, development, and testing roles – to discuss product behaviors proactively while creating user stories. They introduced Behavior-Driven Development (BDD) activities like Example Mapping to explore behaviors together. Then, well-defined stories could be easily connected to SpecFlow tests written in Gherkin following Specification by Example (SBE). Teams repeatedly saved time by thinking before coding and specifying before testing. They built higher quality into features from the beginning, and they stopped half-baked stories with unjustified value propositions before development effort was wasted on them. Developers who participated in these behavior-driven practices were also more likely to automate Boa tests on their own. Furthermore, one of PrecisionLender’s developers loved BDD practices so much that he joined the team of SETs! Through Gherkin, SpecFlow provided a foundation that enabled quality work to shift left.

Challenges along the way

Achieving true continuous testing had its challenges along the way. Intermittent failure was the most significant issue PrecisionLender faced at scale. With so many tests, environments, and infrastructural pieces, arbitrary failures were statistically unavoidable. The PrecisionLender team took a two-pronged approach to handle intermittent failures: (1) eliminate race conditions in automation using good interactions with Boa Constrictor, and (2) use SpecFlow+ Runner to automatically retry failed tests to determine if failures were consistent or intermittent. These two approaches reduced the frequency of flaky failures and helped engineers quickly resolve any remaining issues. As a result, Boa tests enjoy well above a 99% success rate, and most failures are due to actual bugs.

PrecisionLender app performance at scale was a second big challenge. Running up to 100 tests in parallel turned functional tests into de facto load tests. Testing at scale repeatedly uncovered performance bottlenecks in the app. Performance issues caused widespread test failures that were difficult to diagnose because they appeared intermittently. Still, the visual timeline and timestamps in the SpecFlow+ Runner report helped the team identify periods of failure that could be crosschecked against backend logs, metrics, and database queries. Developers resolved many performance issues and significantly improved the app’s response times and load capacity.

Training team members to develop solid test automation was the third challenge. At the start of the journey, test automation, Gherkin, and BDD were all new to PrecisionLender. The PrecisionLender SETs took active steps to train others on how to develop good tests and good automation through group workshops, Three Amigos meetings, and one-on-one mentoring sessions. They shared resources like the Automation Panda blog on how to write good tests and good Gherkin. The investment in education paid off: many developers have joined the SETs in writing readable, reliable Boa tests that run continuously.

Benefits to the business

Developing a continuous testing solution brought many incredible benefits to PrecisionLender. First, the quality of the PrecisionLender app improved because continuous testing provided fast feedback on failures that developers could quickly fix. Instead of relying on manual spot checks, the team could trust the comprehensive safety net of Boa tests to catch bugs. Many issues would be caught within an hour of a developer making a code commit, and the longest feedback cycle would be only one business day for the full nightly test suites to run. Boa tests catch failures before customers ever experience them. The continuous nature of testing enables PrecisionLender to publish new releases every two weeks.

Second, the high reliability of the Boa test solution means that the PrecisionLender team can trust test results. When a test passes, the behavior is working. When a test fails, there is a real bug. Reliability also means that engineers spend less time on automation maintenance and more time on more valuable activities, like developing new features and adding new tests. Quality is present in both the product code and the test code.

Third, continuous testing boosts customer confidence in PrecisionLender. Customers trust the software quality because they know that PrecisionLender thoroughly tests every release. The PrecisionLender team also shares SpecFlow+ LivingDoc reports with specific clients to prove quality.

A bright future

PrecisionLender’s continuous testing journey is not over. Since the PrecisionLender team hired its first SET, it has hired three more, in addition to a testing manager, to grow quality improvement efforts. Multiple development teams have written their own Boa tests, and they plan to write more tests independently. SpecFlow’s tools have been indispensable in helping the PrecisionLender team achieve successful quality assurance. As PrecisionLender welcomes more customers, the Boa solution will be ready to scale with more tests, more configurations, and more executions.

Should Gherkin Steps use Past, Present, or Future Tense?

Gherkin’s Given-When-Then syntax is a great structure for specifying behaviors. However, while writing Gherkin may seem easy, writing good Gherkin can be a challenge. One aspect to consider is the tense used for Gherkin steps. Should Gherkin steps use past, present, or future tense?

One approach is to use present tense for all steps, like this:

Scenario: Simple Google search
    Given the Google home page is displayed
    When the user searches for "panda"
    Then the results page shows links related to "panda"

Notice the tense of each verb:

  1. the home page is – present
  2. the user searches – present
  3. the results page shows – present

Present tense is the simplest verb tense to use. It is the least “wordy” tense, and it makes the scenario feel active.

An alternative approach is to use past-present-future tense for Given-When-Then steps respectively, like this:

Scenario: Simple Google search
    Given the Google home page was displayed
    When the user searches for "panda"
    Then the results page will show links related to "panda"

Notice the different verb tenses in this scenario:

  1. the home page was – past
  2. the user searches – present
  3. the results page will show – future

Scenarios exercise behavior. Writing When steps using present tense centers the scenario’s main actions in the present. Since Given steps must happen before the main actions, they would be written using past tense. Likewise, since Then steps represent expected outcomes after the main actions, they would be written using future tense.

Both of these approaches – using all present tense or using past-present-future in order – are good. Personally, I prefer to write all steps using present tense. It’s easier to explain to others, and it frames the full scenario in the moment. However, I don’t think other tense combinations are good. For example, writing all steps using past tense or future tense would seem weird, and writing steps in order of future-present-past tense would be illogical. Scenarios should be centered in the present because they should timelessly represent the behaviors they cover.

Want to learn more? Check out my other BDD articles, especially Writing Good Gherkin.

Solving: How to write good UI interaction tests? #GivenWhenThenWithStyle

Writing good Gherkin is a passion of mine. Good Gherkin means good behavior specification, which results in better features, better tests, and ultimately better software. To help folks improve their Gherkin skills, Gojko Adzic and SpecFlow are running a series of #GivenWhenThenWithStyle challenges. I love reading each new challenge, and in this article, I provide my answer to one of them.

The Challenge

Challenge 20 states:

This week, we’re looking into one of the most common pain points with Given-When-Then: writing automated tests that interact with a user interface. People new to behaviour driven development often misunderstand what kind of behaviour the specifications should describe, and they write detailed user interactions in Given-When-Then scenarios. This leads to feature files that are very easy to write, but almost impossible to understand and maintain.

Here’s a typical example:

Scenario: Signed-in users get larger capacity
 
Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "Login"
And the user enters "testuser123" into the "username" field
And the user enters "$Pass123" into the "password" field
And the user clicks on "Sign in"
And the page reloads
Then the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "spreadsheet formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "1mb-sheet.xlsx" 
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"

A common way to avoid such issues is to rewrite the specification to avoid the user interface completely. We’ve looked into that option several times in this article series. However, that solution only applies if the risk we’re testing is not in the user interface, but somewhere below. To make this challenge more interesting, let’s say that we actually want to include the user interface in the test, since the risk is in the UI interactions.

Indeed, most behavior-driven practitioners would generally recommend against phrasing steps using language specific to the user interface. However, there are times when testing a user interface itself is valid. For example, I work at PrecisionLender, a Q2 Company, and our main web app is very heavy on the front end. It has many, many interconnected fields for pricing commercial lending opportunities. My team has quite a few tests to cover UI-centric behaviors, such as verifying that entering a new interest rate triggers recalculation for summary amounts. If the target behavior is a piece of UI functionality, and the risk it bears warrants test coverage, then so be it.

Let’s break down the example scenario given above to see how to write Gherkin with style for user interface tests.

Understanding Behavior

Behavior is behavior. If you can describe it, then you can do it. Everything exhibits behavior, from the source code itself to the API, UIs, and full end-to-end workflows. Gherkin scenarios should use verbiage that reflects the context of the target behavior. Thus, the example above uses words like “click,” “select,” and “open.” Since the scenario explicitly covers a user interface, I think it is okay to use these words here. What bothers me, however, are two apparent code smells:

  1. The wall of text
  2. Out-of-order step types

The first issue is the wall of text this scenario presents. Walls of text are hard to read because they present too much information at once. The reader must take time to read through the whole chunk. Many readers simply read the first few lines and skip the remainder. The example scenario has 27 Given-When-Then steps. Typically, I recommend that Gherkin scenarios have a single-digit step count. A scenario with fewer than 10 steps is easier to understand and less likely to include unnecessary information. Longer scenarios are not necessarily “wrong,” but their length indicates that, perhaps, they could be rewritten more concisely.

The second issue in the example scenario is that step types are out of order. Given-When-Then is a formula for success. Gherkin steps should follow strict Given → When → Then ordering because this ordering demarcates individual behaviors. Each Gherkin scenario should cover one individual behavior so that the target behavior is easier to understand, easier to communicate, and easier to investigate whenever the scenario fails during testing. When scenarios break the order of steps, such as Given → Then → Given → Then in the example scenario, it shows that either the scenario covers multiple behaviors or that the author did not bring a behavior-driven understanding to the scenario.

The rules of good behavior don’t disappear when the type of target behavior changes. We should still write Gherkin with best practices in mind, even if our scenarios cover user interfaces.

Breaking Down Scenarios

If I were to rewrite the example scenario, I would start by isolating individual behaviors. Let’s look at the first half of the original example:

Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Here, I see four distinct behaviors covered:

  1. Clicking “Upload Files” reloads the page.
  2. Clicking “Spreadsheet Formats” displays new buttons.
  3. Uploading a spreadsheet file makes the filename appear on the page.
  4. Attempting to upload a spreadsheet file that is 1MB or larger fails.

If I wanted purely to retain the same coverage, then I would rewrite these behavior specs using the following scenarios:

Feature: Example site
 
 
Scenario: Choose to upload files
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
Then the page displays the "Spreadsheet Formats" link
 
 
Scenario: Choose to upload spreadsheets
 
Given the Example site is ready to upload files
When the user clicks the "Spreadsheet Formats" link
Then the page displays the "XLS" and "XLSX" buttons
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Now, each scenario covers one individual behavior. The first scenario starts with the Example site in a “blank” state: “Given the Example site is displayed”. The second scenario inherently depends upon the outcome of the first scenario. Rather than repeat all the steps from the first scenario, I wrote a new starting step to establish the initial state more declaratively: “Given the Example site is ready to upload files”. This step’s definition method may need to rerun the same operations as the first scenario, but it guarantees independence between scenarios. (The step could also optimize the operations, but that should be a topic for another challenge.)

Likewise, the third and fourth scenarios have a Given step to establish the state they need: “Given the Example site is ready to upload spreadsheet files.” Both scenarios can share the same Given step because they have the same starting point.

All three of these new steps are descriptive more than prescriptive. They declaratively establish an initial state, and they leave the details of precisely how that state is established to the automation code in the step definition methods. This technique makes it easy for Gherkin scenarios to be individually clear and independently executable.

I also added my own writing style to these scenarios. First, I wrote concise, declarative titles for each scenario. The titles dictate interaction over mechanics. For example, the first scenario’s title uses the word “choose” rather than “click” because, from the user’s perspective, they are “choosing” an action to take; the user just happens to mechanically “click” a link in the process of making that choice. The titles also preserve a level of example: note that the third and fourth scenarios spell out the target file sizes. For brevity, I typically write scenario titles using active voice: “Choose this,” “Upload that,” or “Do something.” I try to avoid including verification language in titles unless it is necessary to distinguish behaviors.

Another stylistic element of mine was to remove explicit details about the environment. Instead of hard coding the website URL, I gave the site a proper name: “Example site.” I also removed the mention of Chrome as the browser. These details are environment-specific, and they should not be specified in Gherkin. In theory, this site could have multiple instances (like an alpha or a beta), and it should probably run in any major browser (like Firefox and Edge). Environmental characteristics should be specified as inputs to the automation code instead.

I also refined some of the language used in the When and Then steps. When I must write steps for mechanical actions like clicks, I like to specify element types for target elements. For example, “When the user clicks the “Upload Files” link” specifies a link by a parameterized name. Saying the element is a link helps provide context to the reader about the user interface. I wrote other steps that specify a button, too. These steps also specified the element name as a parameter so that the step definition method could possibly perform the same interaction for different elements. Keep in mind, however, that these linguistic changes are neither “required” nor “perfect.” They make sense in the immediate context of this feature. While automating step definitions or writing more scenarios, I may revisit the verbiage and do some refactoring.
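As a rough sketch of how steps like these might be automated (the names are hypothetical, not code from this challenge), the element name becomes a step parameter while the element type is baked into the step’s wording:

// Hypothetical step definitions for the rewritten scenarios. The element
// name arrives as a parameter, while the element type ("link" or "button")
// is part of the step text itself.
using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class UploadSteps
{
    private readonly IWebDriver _driver;

    public UploadSteps(IWebDriver driver) => _driver = driver;

    [When(@"the user clicks the ""(.*)"" link")]
    public void WhenTheUserClicksTheLink(string linkText) =>
        _driver.FindElement(By.LinkText(linkText)).Click();

    [When(@"the user clicks the ""(.*)"" button")]
    public void WhenTheUserClicksTheButton(string label) =>
        _driver.FindElement(By.XPath($"//button[text()='{label}']")).Click();
}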

Determining Value for Each Behavior

The four new scenarios I wrote each cover an independent, individual behavior of the fictitious Example site’s user interface. They are thorough in their level of coverage for these small behaviors. However, not all behaviors may be equally important to cover. Some behaviors are simply more important than others, and thus some tests are more valuable than others. I won’t go into deep detail about how to measure risk and determine value for different tests in this article, but I will offer some suggestions regarding these example scenarios.

First and foremost, you as the tester must determine what is worth testing. These scenarios aptly specify behavior, and they will likely be very useful for collaborating with the Three Amigos, but not every scenario needs to be automated for testing. You as the tester must decide. You may decide that all four of these example scenarios are valuable and should be added to the automated test suite. That’s a fine decision. However, you may instead decide that certain user interface mechanics are not worth explicitly testing. That’s also a fine decision.

In my opinion, the first two scenarios could be candidates for the chopping block:

  1. Choose to upload files
  2. Choose to upload spreadsheets

Even though these are existing behaviors in the Example site, they are tiny. The tests simply verify that certain clicks make certain links or buttons appear. It would be nice to verify them, but test execution time is finite, and user interface tests are notoriously slow compared to other tests. Consider the Rule of 1’s: typically, by orders of magnitude, a unit test takes about 1 millisecond, a service API test takes about 1 second, and a web UI test takes about 1 minute. Furthermore, these behaviors are implicitly exercised by the other scenarios, even if they don’t have explicit assertions.

One way to condense the scenarios could be like this:

Feature: Example site
 
 
Background:
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
And the user clicks the "Spreadsheet Formats" link
And the user clicks the "XLSX" button
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
When the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
When the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 

This new feature file eliminates the first two scenarios and uses a Background section to cover the setup steps. It also eliminates the need for special Given steps in each scenario to set unique starting points. Implicitly, if the “Upload Files” or “Spreadsheet Formats” links fail to display the expected elements, then those steps would fail.

Again, this modification is not necessarily the “best” way or the “right” way to cover the desired behaviors, but it is a reasonably good way to do so. However, I would assert that both the 4-scenario feature file and the 2-scenario feature file are much better approaches than the original example scenario.

More Gherkin

What I showed in my answer to this Gherkin challenge is how I would handle UI-centric behaviors. I try to keep my Gherkin scenarios concise and focused on individual, independent behaviors. Try using these style techniques to rewrite the second half of Gojko’s original scenario. Feel free to drop your Gherkin in the comments below. I look forward to seeing how y’all write #GivenWhenThenWithStyle!

Using Domain-Specific Languages for Security Testing

I love programming languages. They have fascinated me ever since I first learned to program my TI-83 Plus calculator in ninth grade, many years ago. When I studied computer science in college, I learned how parsers, interpreters, and compilers work. During my internships at IBM, I worked on a language named Enterprise Generation Language as both a tester and a developer. At NetApp, I even developed my own language named DS for test automation. Languages are so much fun to learn, build, and extend.

Today, even though I do not actively work on compilers, I still do some pretty interesting things with languages and testing. I strongly advocate for Behavior-Driven Development and its domain-specific language (DSL) Gherkin. In fact, as I wrote in my article Behavior-Driven Blasphemy, I support using Gherkin-based BDD test frameworks for test automation even if a team is not also doing BDD’s collaborative activities. Why? Gherkin is the world’s first major off-the-shelf DSL for test automation, and it doesn’t require the average tester to know the complexities of compiler theory. DSLs like Gherkin can make tests easier to read, faster to write, and more reliable to run. They provide a healthy separation of concerns between test cases and test code. After working on successful large-scale test automation projects with C# and SpecFlow, I don’t think I could go back to traditional test frameworks.

I’m not the only one who thinks this way. Dinis Cruz, CTO and CISO at Glasswall, tweeted about one of my articles after reading it, and he then invited me to speak about using DSLs for testing at the Open Security Summit in 2021.

Now, I’m not a “security guy” at all, but I do know a thing or two about DSLs and testing. So, I gladly accepted the invitation to speak! I delivered my talk, “Using DSLs for Security Testing,” virtually on Thursday, January 14, 2021 at 10am US Eastern, and I uploaded my slides to GitHub at AndyLPK247/using-dsls-for-security-testing. A recording of the talk is available on YouTube.

This talk was not meant to be a technical demo or tutorial. Instead, it was meant to be a “think big” proposal. The main question I raised was, “How can we use DSLs for security testing?” I used my own story to illustrate the value languages deliver, particularly for testing. My call to action breaks that question down into three parts:

  1. Can DSLs make security testing easier to do and thereby more widely practiced?
  2. Is Gherkin good enough for security testing, or do we need to make a DSL specific to security?
  3. Would it be possible to write a set of “standard” or “universal” security tests using a DSL that anyone could either run directly or use as a template?

My goal for this talk was to spark a conversation about DSLs and security testing. Immediately after my talk, Luis Saiz shared two projects he’s working on regarding DSLs and security: SUSTO and Mist. Dinis also invited me back for a session at the Open Security Summit Mini Summit in February to have a follow-up roundtable discussion for my talk. I can’t wait to explore this idea further. It’s an exciting new space for me.

If this topic sparks your interest, be sure to watch my talk recording, and then join us live in February 2021 for the next Open Security Summit event. Virtual sessions are free to join. Many thanks again to Dinis and the whole team behind the Open Security Summit for inviting me to speak and organizing the events.

SpecFlow’s Online Gherkin Editor

Finding a good Gherkin editor is difficult. Some editors like Visual Studio Code and similar IDEs work great for engineers but aren’t suitable for product owners and non-programmer Amigos who want to contribute. Other editors like Notepad++ and Atom are lighter in weight but still require extensions and a little expertise. Fancy BDD tools like CucumberStudio and Cucumber for Jira provide Gherkin editors together with a bunch of other nifty features, but they require paid licenses.

For years, I’ve wanted a lightweight Gherkin editor that’s easy to use and accessible to all. Now, one finally exists: the Online Gherkin Editor by SpecFlow!

SpecFlow is the most popular BDD test automation framework for .NET. It’s also my favorite BDD framework. Over the past few years, I’ve built two large-scale test automation solutions with SpecFlow.

The Online Gherkin Editor by SpecFlow is just an editor on a web page. When you first load the page, the editor has example scenarios for you to reference. You can type your own Gherkin into the text area, and the editor highlights it for you. The editor provides line numbers and visual scrolling, too. My language is English, but if you happen to speak German, French, Spanish, or Dutch, then you can change the language setting via a dropdown. Once you’re done writing your Gherkin, you can clear it, copy it to the clipboard, or download it as a feature file using icons in the top-right corner. Be warned, though, that this editor won’t save your Gherkin in the cloud.

If you want to give this new editor a try, here’s the link: https://specflow.org/gherkin-editor/

You can also read SpecFlow’s official announcement here: https://specflow.org/blog/introducing-the-specflow-online-gherkin-editor/

Thanks, SpecFlow! Happy “Gherk-ing”!

4 Rules for Writing Good Gherkin

In Behavior-Driven Development, Gherkin is a specification language for defining behaviors. Gherkin scenarios can easily be automated into test cases. There are many points to writing good Gherkin, but to keep things simple, I like to focus on four major rules:

  1. Gherkin’s Golden Rule
  2. The Cardinal Rule of BDD
  3. The Unique Example Rule
  4. The Good Grammar Rule

Check out my TechBeacon article to learn about these rules in depth!

How Do We Write Good Gherkin as Part of BDD? (Webinar + Q&A)

On July 23, 2019, I gave a webinar entitled, “How Do We Write Good Gherkin as Part of BDD?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. This webinar was the follow-up to a previous webinar, What Is BDD, and How Do We Practice It? It was an honor to partner with Paul again to go further into BDD practices. (If you want to learn more about BDD, check out Beaufort Fairmont’s two-day BDD training offering, as well as their blog and other webinars.)

To see my webinar recording, register here. Definitely watch the previous webinar first.

Just like last time, attendees asked several great questions that we simply could not answer live. I categorized all questions we received and answered them below. Please note that some questions might be rephrased or combined with others.

Questions about BDD

What is BDD?

Behavior-Driven Development! Read more here.

In a typical Agile development process, who should write feature files?

The Three Amigos! Product owners, developers, and testers should all come together to figure out behaviors. I recommend doing Example Mapping to formulate behaviors before writing Gherkin scenarios. The green example cards should be turned into feature files. The specific person who writes the feature files is up to team preference. It could be a collaborative effort, or it could be divided-and-conquered. Any one of the Three Amigos can do it.

How can we apply BDD to SAFe (Scaled Agile Framework) teams?

BDD practices like Three Amigos meetings, Example Mapping, Behavior Specification with Gherkin, and Behavior Implementation can become part of any process. All of these practices happen at the level of the development teams. Teams could even share Gherkin steps and test frameworks wherever sharing makes sense. Check out BDD 101: Behavior-Driven Agile.

What advice can you give to teams that use BDD test frameworks solely as an automation tool and not part of a greater BDD process?

Do the best with what you’ve got. Try to show how other BDD practices can pragmatically improve your team’s development and delivery work.

Questions about Gherkin Syntax

What is the difference between a scenario and a scenario outline?

A scenario is a procedure of Given-When-Then steps that covers one example for one behavior. If there are any parameters for steps, then a scenario has exactly one combination of possible inputs. A scenario outline is a Given-When-Then procedure that can have multiple examples of one behavior provided as a table of input combos. Each input row will run the same steps once, just with different parameter inputs. See BDD 101: Gherkin by Example to see examples.
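For a quick illustration here, this hypothetical outline runs the same search steps once per input row:

Scenario Outline: Simple web search
    Given the search engine home page is displayed
    When the user searches for "<phrase>"
    Then the results page shows links related to "<phrase>"

    Examples:
      | phrase     |
      | panda      |
      | polar bear |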

What do you think about long tables in scenarios?

Long tables in Gherkin usually look terrible. They’re hard to read, and they create a wall of text. They may also include unnecessary variations. Stick to the Unique Example rule.

Are Given steps mandatory, or can scenarios start directly with When steps?

None of the step types are mandatory. It is valid to write a scenario that skips the Given and has only When-Then steps. It is also valid to write scenarios that are Given-Then or Given-When. In fact, it is syntactically valid to put steps in any order. However, I strongly recommend keeping Given-When-Then step order to properly frame behaviors.

Are quotation marks required for parameters?

No, quotation marks are not required for parameters, but they are a popular convention, and one that I recommend. Quotes make parameters easy to identify.

Questions about Gherkin Scenarios

How do we make sure each scenario focuses on an individual, independent behavior?

Do Example Mapping first as a team. Write scenarios together, or review them with others. Ask, “What makes this behavior unique?” Make sure to use strict Given-When-Then step order when defining the behavior. Rethink the scenario if it is more than 10 lines long. Look out for unnecessary complication.

What does it mean for a scenario to be “chronological”?

Scenario steps should be written as if they were on a timeline. Each step will be executed after the previous one, so its description must start where the previous one ended. Remember, steps will be automated as if they were scripts.

How do we write a very low-level scenario without having a wall of text?

Don’t write low-level scenarios! Gherkin is best for feature testing, not unit testing. Steps should focus on intention and business value. Instead of writing “type, type, click, wait,” write “log into the app.” If you absolutely must write a low-level scenario, remember that the same principles apply. Be intuitively descriptive. Focus on individual behaviors. Keep scenarios concise.

If all scenarios in a feature file have only one user, is it okay to use first-person perspective instead of third-person?

In my opinion, no. I favor third-person perspective universally. Trying to limit usage to one feature file won’t work because any step can be used by any feature file within a test project. The entire solution must be either first-person or third-person. There’s no middle ground.

Can we write Gherkin scenarios with personas?

Yes! Personas can make scenarios more meaningful and understandable. Make sure to define the personas well – they could be described under the Feature section or in a separate text file.

How do we write Gherkin scenarios that need to validate lots of information on a page?

Pick the most important pieces of information to check. You could write separate Then steps for each assertion, or you could push small-but-similar validations down to the automation level to avoid Gherkin clutter.

How do we write Gherkin scenarios for validating Web UI fields?

Typically, I treat each field validation as an independent behavior, and thus I write separate scenarios to check each field. If the scenario steps simply enter a textual value and verify a specific message, then I might make a Scenario Outline with example rows for each equivalence class of inputs.

How do we write Gherkin scenarios that have multiple inputs and setup steps? (Example: an API with ten parameters)

Gherkin allows multiple steps of the same type to be written using “And” and “But” keywords. It’s not a problem to have “Given-And-And” or “When-And-And”. If you discover that different scenarios repeat the same setup steps, then I recommend either moving those common steps to a Background section or writing a new step that covers multiple calls (for conciseness).
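As a hypothetical illustration of stacked setup steps:

Scenario: Call an API that needs multiple setup steps
    Given the API service is running
    And the request has a valid auth token
    And the request body is loaded from a template
    When the user sends the POST request
    Then the response status is 200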

One example from the webinar showed searching for shoes and adding them to a shopping cart as part of one scenario. Aren’t those two different behaviors?

Here’s the scenario in question:

Scenario: Add shoes to the shopping cart
  Given the ShoeStore home page is displayed
  When the shopper searches for "red pumps"
  And the shopper adds the first result to the cart
  Then the cart has one pair of "red pumps"

We could have split this scenario into two. I just chose to define the behavior this way. This scenario is a bit more end-to-end because it covers a basic but typical workflow. It may also have leveraged existing steps, which expedites automation development. Overall, the scenario is still concise, chronological, and intuitively understandable. Remember, there is an art as well as a science to writing good Gherkin.

Questions about Automation

Do scenarios need to be independent of each other?

Yes, unequivocally. Tests that are not independent could interfere with each other and cause unexpected failures. Independence also reinforces singular behavioral focus.

How do we start a scenario “in media res” without it depending on other tests?

At the Gherkin level, write Given steps that define a new starting point for the behavior. For example, many teams develop Web apps. It’s common to think that the starting point for all tests is login. However, the starting point can be a few pages after login.

At the automation level, it may be useful to implement the Given steps by calling other steps. For example, if a Given step should start at a user’s profile page, then perhaps it could internally call the login step and the click-the-profile-link step. Test steps may repetitively do the same operations for different tests, but test case independence will be preserved, and unique failures will be reported.
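Here is a rough sketch of that idea in SpecFlow-style C#; the step text and helper classes are hypothetical:

// A rough sketch: a Given step that internally performs the same operations
// as other steps. LoginOperations and its methods are hypothetical helpers.
using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class ProfileSteps
{
    private readonly IWebDriver _driver;

    public ProfileSteps(IWebDriver driver) => _driver = driver;

    [Given(@"the user is on their profile page")]
    public void GivenTheUserIsOnTheirProfilePage()
    {
        // Reuse shared operations instead of chaining scenarios together.
        LoginOperations.LogIn(_driver);           // hypothetical helper
        LoginOperations.GoToProfilePage(_driver); // hypothetical helper
    }
}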

What is the best way to handle preconditions like logging into a Web app?

The simplest way to handle preconditions is to write Given steps. If those Given steps are shared by all scenarios in a feature file, then move them to a Background section. Automation hooks can also perform common setup and cleanup actions, depending upon the test framework. Personally, I prefer to use hooks to do automatic login rather than repeat Given steps for many scenarios.
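A minimal sketch of such a hook (the login helper is hypothetical, and the WebDriver is assumed to be registered with SpecFlow’s container elsewhere):

// A minimal sketch of a setup hook; LoginOperations is a hypothetical helper.
using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class SetupHooks
{
    private readonly IWebDriver _driver;

    public SetupHooks(IWebDriver driver) => _driver = driver;

    // Runs before every scenario, so scenarios need no explicit login steps.
    // A tag filter like [BeforeScenario("web")] could limit its scope.
    [BeforeScenario]
    public void LogIntoApp()
    {
        LoginOperations.LogIn(_driver);
    }
}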

Is it better to set up and tear down new test objects for each test case, or is it better to use shared, pre-created objects?

That depends upon the object. Most objects like WebDrivers and page objects should have scenario scope, meaning they are created fresh for each scenario and then torn down when the scenario ends. The only time an object should be shared across scenarios is if it is immutable or very expensive to create. For example, configuration data could be read in once before all tests and then injected immutably into each scenario. The safe position is always to use fresh objects; justify why sharing is needed before trying it.

I want to use Serenity for BDD and testing. Should I use Cucumber-like Gherkin feature files, or should I use Serenity’s native methods?

That’s up to you and your team. Personally, I would still use Gherkin feature files with Serenity. I like to separate my test case from my test code. Everyone can read Gherkin feature files, but not everyone can read Java or JavaScript test methods.

If a company already has a large BDD test solution that is poorly implemented, would it be better to keep it going or try to change it?

This question can be applied to all software projects, not just BDD test solutions. The answer is situational. Personally, I favor doing things right, even if it means refactoring. Please read Our Test Automation Has Problems. Should We Start Over? for a thorough answer.

Final Questions

Why do you call yourself “Pandy” and the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker. The nickname “Pandy” came about in the Python community to distinguish me from other folks named “Andy.”

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

What is BDD, and How Do We Practice It? (Webinar + Q&A)

On March 18, 2019, I gave a webinar entitled, “What is Behavior-Driven Development, and How Do We Practice It?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. It was both a pleasure and an honor to do this webinar with them. Paul is a top-notch test automation expert, and Beaufort Fairmont is doing really exciting things. Check out their two-day BDD training offering, as well as their blog and other webinars.

To see my webinar recording, register here.

During the webinar, attendees asked more questions than we could answer. I’m excited that so many people asked questions. My answers are below.

Questions about Process

How is BDD different from TDD (Test-Driven Development)?

BDD is an evolution of TDD. In TDD, developers (1) write unit tests and watch them fail, (2) develop the feature to make the tests pass, (3) refactor the code to make it stronger, and (4) repeat the cycle. In BDD, teams do this same loop with feature tests (a.k.a. “acceptance” or “black-box” tests) as well as unit tests. Furthermore, BDD adds shift left practices like Example Mapping and Specification by Example so that teams know what they are doing and focus on developing the right things.

Check out Dan North’s article, Introducing BDD, for a more thorough answer.

Can BDD be used with manual testing?

Yes! BDD is not merely an automation tool – it is a set of pragmatic practices to help teams develop better software. Gherkin scenarios are first and foremost behavior specs that help a team’s collaboration and accountability. They function secondarily as test cases that can be executed either manually or with automation.

Can we use BDD with technical stories or backend features?

Yes! If you can describe it, then you can do it.

How many Gherkin scenarios should one story have?

There’s no hard rule, but I recommend no more than a handful of rules per story, and no more than a handful of examples per rule. If you do Example Mapping and feel overwhelmed by the number of cards for a story, then the story should probably be broken into smaller stories.

Should we do Example Mapping for every story? Spending 20-30 minutes for each story would take a long time.

Try doing Example Mapping on one or two stories to start. The first time is always rough, but as you iterate on it, you’ll get better as a team. Even though Example Mapping has an upfront time cost, it will save a lot of time later in the sprint because (a) acceptance criteria are clear, (b) tests are already written, and (c) everyone has a mutual understanding of the story. The team won’t suffer through the inefficiencies of miscommunication and poor planning. You may even want to replace planning meetings with Example Mapping meetings.

What metrics should we use with BDD?

All metrics are flawed, but some metrics are useful. All the standard testing and Agile metrics still apply: code coverage, story velocity, etc. Here are some additional metrics you may consider for BDD:

  • the percentage of stories that undergo Example Mapping before the sprint
  • the number of rules and examples that get “missed” during Example Mapping and need to be added later
  • the percentage of Gherkin scenarios that get automated in the sprint

If you choose to track metrics, make sure their feedback is used to improve team practices. For more info on metrics, please read my Quality Metrics 101 series.

Questions about Tools

What test management tools should we use with BDD?

I’m sure there are BDD plugins for test management tools, but I don’t have any that I can personally recommend. To be honest, I try to stay away from large test management tools like HP ALM, qTest, VersionOne. When doing BDD, the Gherkin feature files themselves should be the single source of truth for feature-level tests, and they should be version-controlled in a repository. Don’t fall into the trap of slapping “Given-When-Then” keywords onto existing functional tests – that’s not BDD.

Does Jira support Example Mapping?

I have not personally used any Jira plugin for Example Mapping. It looks like there is an Easy Agile User Story Maps plugin that is similar to but slightly different from Example Mapping.

What’s the difference between Gherkin, Cucumber, and SpecFlow?

  • Gherkin is the Given-When-Then spec language.
  • Cucumber is a company and its eponymous test framework that uses Gherkin.
  • SpecFlow is Cucumber for .NET.

Questions about Testing

Can BDD test frameworks be used for unit testing?

Yes, but I don’t recommend it. BDD frameworks shine for black-box feature testing. They’re a bit too verbose for code-level unit tests. Read BDD 101: Unit, Integration, and End-to-End Tests for more info.

Can BDD test frameworks be used for integration testing?

Yes! See BDD 101: Unit, Integration, and End-to-End Tests.

How long should Gherkin scenarios be?

Scenarios should be bite-sized. Each scenario should focus on one individual behavior. There’s no hard rule, but I recommend single-digit step counts. Read BDD 101: Writing Good Gherkin for more info.

What are “step definitions” in Cucumber?

Step definitions are the methods in the automation code that execute the steps. When a BDD framework runs a Gherkin scenario as a test, it “glues” each step to a step definition based on some sort of string matching.

How can we minimize duplicate code within a BDD test framework?

Know your steps. Always search for existing steps before writing new steps. Refactor existing steps whenever appropriate. Reuse steps when writing new scenarios. Do pair programming or mob programming when writing scenarios. Put scenarios through code reviews. Apply good coding practices – remember, test automation is software.

I write Gherkin scenarios, but I don’t write test automation code. What’s the best way to write Gherkin scenarios so that they can be automated?

Do pair programming with the automation engineers to write Gherkin scenarios together. Become familiar with existing steps by reading and searching feature files. Otherwise, the Gherkin steps you write in isolation might not be usable. Remember, BDD is a team effort!

The examples in the webinar were all fairly basic. Do you have any examples with more complex systems?

I have some example projects on GitHub in Python and Java with some basic unit, integration, and end-to-end tests, but I don’t have any large-scale examples that I can share publicly.

We wrote hundreds of SpecFlow tests without the other Amigos. Now, there are large test gaps, and many steps aren’t reusable. What should we do?

I’m sorry to hear that. It’s not an uncommon story. There are two paths: (1) refactoring or (2) starting over. Without really knowing the situation, I don’t think it’s my place to say which way is better. Here are some questions to help guide your decision:

  • What are your goals for testing and automation?
  • What’s your overall quality and testing strategy?
  • What parts of the code base are salvageable?
  • What parts of the code base should be removed?
  • If you started again from scratch, what would you do differently to make sure the same problems don’t reoccur?

I strongly recommend taking the Setting a Foundation for Successful Test Automation course from Test Automation University. (It’s free.) I also gave a talk about this very problem, Egad! How Do We Start Writing (Better) Tests?, at a few Python conferences.

We have a large BDD test suite with heavy coupling and slow execution times. The business amigos have also left the company. Should we try to fix what we have or just start over?

Sorry to hear that; same answer as before.

Final Questions

Why do you call yourself the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker.

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

Behavior-Driven Blasphemy

This is my 100th post on Automation Panda! I’m thrilled to see how much this blog has grown and how many people it has helped. For such a monumental occasion, I have chosen to voice a rather controversial opinion about test automation.

Behavior-driven development seems to be the software testing buzzword of the decade. What started as a refinement of test-driven development by developers in Europe and the UK quickly became the big process fad of the 2010s. The Cucumber project (now 10 years old) developed or inspired Gherkin-based test automation frameworks in all the major programming languages. Companies started requiring Given-When-Then format for acceptance criteria and test scenarios. Three Amigos meetings became standard calendar fixtures during sprints. Organizations that once undertook “Agile transformations” now have similar initiatives for BDD. For better or worse, BDD exists and cannot be ignored.

The dogmatic benefits of BDD are better collaboration and automation. However, leaders frequently insist that Gherkin-style test frameworks add value only when paired with practices like Example Mapping. “BDD is a process, not a tool,” is a common mantra. “Otherwise, the Gherkin just gets in the way.” Although I wholeheartedly agree that behavior-driven practices add significant value to the development process, I nevertheless espouse a rather blasphemous opinion:

BDD test automation frameworks are better than traditional frameworks for black box functional testing even when BDD processes are not followed.

What Exactly Are You Saying?

My claim is that behavior-driven test frameworks like Cucumber, SpecFlow, and behave are significantly better than traditional xUnit-style frameworks for testing live features. For example, I would rather use SpecFlow than NUnit for testing a Web app with Selenium WebDriver, whether or not the other two Amigos are with me. The resulting automation code has better structure, readability, and reusability.

I’m not saying that teams shouldn’t do BDD practices, and I’m not saying that the Three Amigos should be separated. Collaboration is key to success, and BDD really helps. Example Mapping is one of the most useful practices a development team can do. I’m also not saying that BDD frameworks should be used for all testing purposes – they are poorly suited for unit testing and for performance testing.

Objection!

I find myself very lonely in this opinion. BDD leaders repeatedly insist that BDD is not about testing and automation.

The most outspoken BDDers (mostly coalescing around the Cucumber community) have largely moved their focus to the collaboration benefits, almost forsaking the automation benefits. (This may not necessarily be true, but it appears that way based on the literature and materials floating on the Web.) That outlook is somewhat disingenuous because the main tools supporting BDD are, in fact, test frameworks.

BDD also has outspoken opponents – people seem to either love it or hate it. I’ve personally spoken with several engineers who despise Gherkin-based frameworks. “I can see how it would be valuable when a whole team embraces behavior-driven practices,” many have told me, “but otherwise, the Gherkin layer just gets in the way of automation.” I’ve heard it called “plaster” and “garbage.” Engineers just want to code their tests. And code should always be readable, right?

Testing is an inherently opinionated space. People can never seem to agree on things.

The Bigger Picture

Test automation must be developed regardless of any specific development practices, and its architecture must stand firmly in its own right. Unfortunately, both sides miss the bigger picture:

The best solution for test automation is a domain-specific language.

A domain-specific language (DSL) is a programming language with a purpose. It is designed to handle very specific needs, rather than general-purpose programming. For example:

  • SQL is a DSL for database queries.
  • XPath is a DSL for finding elements in an XML document.
  • YAML is a DSL for object serialization.

Gherkin is also a DSL – for behavior specification.

Domain-specific languages naturally suit test automation due to the clear difference between test cases and test code. Test cases are procedures that exercise product behavior. Anyone can write a test case because test cases can be dictated or explained in plain language. Test code, however, is the software implementation of test cases. Test code handles function calls, logging, exceptions, and all the other little programming details that help run tests. A test automation DSL separates those concerns: test cases are written in a special language, the interpreter handles repetitive, low-level details, and extensions handle product-specific interactions. The purpose of a language is to effectively express intention – and here the intention is to test the product.
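
As a sketch of that separation, the “test case” side might look like this in Gherkin (the step names are invented, and the matching test code would live out of sight in step definitions):

    # The test case expresses pure intent – no programming details.
    Feature: Product Search
      Scenario: Search from the home page
        Given a web browser is at the home page
        When the user searches for "gherkins"
        Then search results for "gherkins" are shown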

To truly achieve an optimal solution, however, the DSL and its interpreter must be treated as part of the automation software, just like the test cases and extensions. Remember, a language’s interpreter is just another piece of software. The interpreter takes part in the separation of concerns and upholds the single responsibility principle. Concerns that would typically be handled by classes and functions in traditional test code should be moved to the interpreter. For example, the interpreter should automatically log every test case step, rather than forcing the author to write explicit logging statements.

When I worked at NetApp years ago, I implemented a DSL to test platform-level features of our operating system. I called it DS – short for “Design Steps,” a term borrowed from HP ALM (and, admittedly, a nod to the Nintendo DS). NetApp’s entire test automation code base was developed in Perl at the time, so I implemented the DS interpreter in Perl to reuse existing libraries. DS test cases were typically only a dozen lines long each, and DS expressions could call specially written Perl modules directly for complete extensibility. During the first big release using DS, my team saved countless hours of automation development time compared to the previous release while delivering more tests. I also did all of this before I had ever heard of BDD.

Unfortunately, most teams have neither the time to develop their own testing DSL nor the understanding of compiler theory to build it right. And if they were given such a language, they typically limit themselves to the provided implementation instead of taking ownership to extend the language for their needs.

The original Nintendo DS. Fun times!

Who Truly Misunderstands Gherkin?

Enter Gherkin: the world’s first major general-purpose, off-the-shelf language for test automation. It is general enough to cover any case through its plain language steps, yet specific enough to standardize tests. Users don’t need to be compiler theory experts – they just make up their own step names and provide the definition code to execute them. Early BDD projects like JBehave and Cucumber packaged an interpreter as a test framework and delivered it to a testing world still stuck on JUnit. The need for a testing DSL was there, whether or not the BDD folks meant to serve it.

Cucumber-ites frequently bemoan that their framework is misunderstood by the masses. They shudder to see teams using their framework purely for test automation. However, Cucumber effectively lowered the entry barrier for teams to make their own testing DSLs. Kodak did the same thing for film: they made it cheap and standard so anyone could be a photographer. Not everyone who uses a BDD framework misunderstands its purpose: some (like me) simply see a value proposition different from the one preached by orthodox BDD practitioners. Gherkin filled a need that nobody knew existed, and its popularity validates that claim.

Benefits Apart from Process

Using a BDD framework adds much value to testing and development even without BDD processes. Below are just a handful of benefits:

  1. Focus first on good scenarios. Gherkin forces authors to think before they code.
  2. Faster automation development. Gherkin steps are reusable and parametrizable (see the sketch after this list).
  3. Stronger structure. Engineers know where to put things in the framework.
  4. Test understandability. Anyone can read scenarios because they are written in plain language. Business people can help. New people can pick it up fast.
  5. Test sharing. Feature files can be shared apart from test code, which can be helpful for business partners.
  6. Test similarity. Tests all look the same. Team members can more easily help each other.
  7. Clearer failures. When a scenario fails, reports show exactly what step failed.
  8. Simpler bug reports. Use scenario steps as instructions to reproduce the failure.
  9. 2-phase test reviews. Review the Gherkin first and the test code second to make sure the test cases are good before effort is spent implementing the wrong things.
  10. BDD enablement. Using a BDD framework opens the door for a team to embrace better behavioral practices in the future.
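
As a small illustration of benefits 2 and 3, here is a hedged sketch of a parametrized scenario outline (all names invented):

    Feature: Shopping Cart
      Scenario Outline: Add an item to the cart
        Given the shopper is viewing the "<item>" product page
        When the shopper adds <quantity> of the item to the cart
        Then the cart contains <quantity> of "<item>"

        Examples:
          | item    | quantity |
          | shoes   | 1        |
          | jackets | 3        |

The When and Then steps take parameters, so the same step definitions can serve every Examples row as well as any number of other scenarios.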

I have written about these advantages on this blog before.

Case Studies

I’m also not the only one who finds value in BDD test frameworks outside of the full BDD process. Below are five case studies.

radish

radish is a Python test framework inspired by Cucumber. Its DSL syntax is a superset of Gherkin that adds preconditions, loops, variables, and expressions. These language additions indicate a bias towards automation because they enable engineers to write tests more programmatically, albeit in a Gherkin-ese way.

Karate

Karate is a test framework with a full DSL based on Gherkin with steps specifically tailored to Web service calls. Although it is implemented in Java, testers do not need to do any Java programming to write complete test cases from day one. Peter Thomas, the creator of Karate, unabashedly declares that Karate does not truly adhere to BDD but nevertheless uses Cucumber for its automation benefits. (Note: Karate is working to move completely off of Cucumber. See GitHub issue #444.)
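
For flavor, here is a minimal Karate scenario in the style of its documentation (the endpoint is hypothetical):

    Feature: Users API

      Scenario: Fetch an existing user
        Given url 'https://api.example.com/users/1'
        When method get
        Then status 200
        And match response.id == 1

The url, method, status, and match steps are built into Karate, so no step definition code is needed.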

REST Assured

REST Assured is a Java package for testing REST APIs. Unlike Karate, REST Assured provides a fluent syntax (and not a DSL) for writing service calls directly in Java code. The fluent syntax is based on Gherkin: given() a request spec is created, when() the call is made, then() verify the response. Although REST Assured is not a full testing framework, it nevertheless pulls inspiration from BDD frameworks for order and structure.

Cycle

Cycle is a BDD-focused test automation platform from Cycle Labs for testing Web, terminal, and desktop apps. Cycle is unique because it provides out-of-the-box steps for all types of supported testing so that no programming experience is required. Testers write feature files using Cycle 2.0’s slick new Electron app. Scenarios are written in CycleScript, a Gherkin-ese language with additions like variables and sub-scenario calls. Steps tend to be imperative, but that’s the tradeoff for not requiring lower-level programming.

Hexawise

Hexawise is a combinatorial testing tool designed to maximize coverage with minimal test counts by smartly joining feature variations. It helps testers write better tests with less redundancy and fewer gaps. Although Hexawise has historically assisted manual testers, it also can generate Gherkin feature files for test variations.
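
The generated variations would presumably take a shape like this scenario outline (purely illustrative, not actual Hexawise output):

    Scenario Outline: Check out under varied configurations
      Given a "<browser>" browser on a "<platform>" device
      When the shopper pays with "<payment>"
      Then the order is placed successfully

      Examples:
        | browser | platform | payment     |
        | Chrome  | desktop  | credit card |
        | Firefox | mobile   | PayPal      |

The value of the combinatorial approach is that the Examples rows cover the pairings that matter without enumerating every possible combination.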

Not all cucumbers are the same. Above is a sea cucumber.

Good Enough?

Gherkin-based test frameworks are not perfect, but they do provide good structure. They gained popularity outside of the pure BDD movement because they genuinely added value to testing and automation. Like any other tool, teams will use them in both good and bad ways. (Trust me, I’ve seen scary Gherkin.)

It’s interesting to see how groups outside the Cucumber diaspora are attempting to solve the limitations of pure Gherkin. Each case study above showed a unique path. Clearly, the test automation problem has not yet been completely solved, but current BDD frameworks are the best off-the-shelf solutions we have until a new software testing movement comes along.

Gherkin Syntax Highlighting in Visual Studio Code

Visual Studio Code is an incredible code editor that’s on the rise. It offers the power of an IDE with the speed and simplicity of a lightweight text editor, similar to Sublime, Atom, and Notepad++. If you’re a BDD addict, then VS Code is a great choice for writing Gherkin features, too! There are a number of extensions for Gherkin. Which one is the best? Below is my recommendation.

TL;DR

Install both:

  • Snippets and Syntax Highlight for Gherkin (Cucumber) (extension #2 below)
  • Gherkin step autocomplete (extension #4 below)

Extension #1

VS Code has a few free extensions to support Gherkin. The first one I tried was Cucumber (Gherkin) Full Support. This one had the highest number of installs. When I started writing feature files, it provided snippets for each section and syntax colors. The documentation said it could also provide step suggestions (meaning, I type “Given” and it shows me all available Given steps) and navigation to step definition code, but those features appeared to work only for JavaScript projects, so I didn’t try them myself – which left me with no step suggestions. The indentation looked off, too. Not perfect. I wanted a better extension.

Extension #2

The second one I tried was Snippets and Syntax Highlight for Gherkin (Cucumber). It provides colorful syntax highlighting and a few three-letter snippets for Gherkin keywords. When I typed “fea”, a full template for a Feature section appeared with user story stubs (“In order to ___, As a ___, I want ___”). Nice! Good practice. The “sce” snippet did the same thing for the Scenario section with Given, When, and Then steps. Each section was indented nicely, too. The only downside was the lack of a snippet for Examples tables. Nevertheless, tables were still highlighted. But again, no step suggestions.
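
Based on that description, the “fea” and “sce” expansions together yield roughly this template (the placeholders are mine):

    Feature: <feature name>
      In order to <meet some goal>
      As a <type of user>
      I want <some behavior>

      Scenario: <scenario name>
        Given <precondition>
        When <action>
        Then <outcome>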

Extension #3

The third extension I tried was Feature Syntax Highlight and Snippets (Cucumber). It was very similar to the previous extension, but it used different colors. The snippet shortcuts were also not as intuitive – they used the letter “f” for feature followed by the first letter of the section. For example, “ff” was a Feature section, and “fs” was a Scenario section. Unfortunately, this extension did not provide step suggestions. Comments and example table rows did not get highlighted, either. Personally, I preferred the previous extension’s color scheme.

Extension #4

The fourth extension I tried was Gherkin step autocomplete. This one promised step suggestions! However, I had some trouble setting it up. When I enabled the extension by itself, feature files did not show any syntax highlighting, and the steps had no suggestions. What? Lame. What the README doesn’t say is that it relies on a separate extension for feature file support. So, I enabled extension #2 together with this one. Then, I had to move my feature file into a project-root-level directory named “features.” (This path could be customized in the extension’s settings, but “features” is the default.) And, voila! I got pretty colors and step suggestions.

But Wait, There’s More!

There were even more extensions for Gherkin. I was happy with #2 and #4, so I didn’t try the others. The others also didn’t have as many installations. If anyone finds goodness in the others, please post in the comments!