Writing good Gherkin is a passion of mine. Good Gherkin means good behavior specification, which results in better features, better tests, and ultimately better software. To help folks improve their Gherkin skills, Gojko Adzic and SpecFlow are running a series of #GivenWhenThenWithStyle challenges. I love reading each new challenge, and in this article, I provide my answer to one of them.
Challenge 20 states:
This week, we’re looking into one of the most common pain points with Given-When-Then: writing automated tests that interact with a user interface. People new to behaviour driven development often misunderstand what kind of behaviour the specifications should describe, and they write detailed user interactions in Given-When-Then scenarios. This leads to feature files that are very easy to write, but almost impossible to understand and maintain.
Here’s a typical example:
```gherkin
Scenario: Signed-in users get larger capacity
  Given a user opens https://www.example.com using Chrome
  And the user clicks on "Upload Files"
  And the page reloads
  And the user clicks on "Spreadsheet Formats"
  Then the buttons "XLS" and "XLSX" show
  And the user clicks on "XLSX"
  And the user selects "500kb-sheet.xlsx"
  Then the upload completes
  And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"
  And the user clicks on "XLSX"
  And the user selects "1mb-sheet.xlsx"
  Then the upload fails
  And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"
  And the user clicks on "Login"
  And the user enters "testuser123" into the "username" field
  And the user enters "$Pass123" into the "password" field
  And the user clicks on "Sign in"
  And the page reloads
  Then the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"
  And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"
  And the user clicks on "spreadsheet formats"
  Then the buttons "XLS" and "XLSX" show
  And the user clicks on "XLSX"
  And the user selects "1mb-sheet.xlsx"
  Then the upload completes
  And the table "Uploaded Files" contains a cell with "1mb-sheet.xlsx"
  And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"
```
A common way to avoid such issues is to rewrite the specification to avoid the user interface completely. We’ve looked into that option several times in this article series. However, that solution only applies if the risk we’re testing is not in the user interface, but somewhere below. To make this challenge more interesting, let’s say that we actually want to include the user interface in the test, since the risk is in the UI interactions.
Indeed, most behavior-driven practitioners would generally recommend against phrasing steps using language specific to the user interface. However, there are times when testing a user interface itself is valid. For example, I work at PrecisionLender, a Q2 Company, and our main web app is very heavy on the front end. It has many, many interconnected fields for pricing commercial lending opportunities. My team has quite a few tests to cover UI-centric behaviors, such as verifying that entering a new interest rate triggers recalculation for summary amounts. If the target behavior is a piece of UI functionality, and the risk it bears warrants test coverage, then so be it.
Let’s break down the example scenario given above to see how to write Gherkin with style for user interface tests.
Behavior is behavior. If you can describe it, then you can do it. Everything exhibits behavior, from the source code itself to the API, UIs, and full end-to-end workflows. Gherkin scenarios should use verbiage that reflects the context of the target behavior. Thus, the example above uses words like “click,” “select,” and “open.” Since the scenario explicitly covers a user interface, I think it is okay to use these words here. What bothers me, however, are two apparent code smells:
- The wall of text
- Out-of-order step types
The first issue is the wall of text this scenario presents. Walls of text are hard to read because they present too much information at once. The reader must take time to wade through the whole chunk, and many readers simply read the first few lines and skip the remainder. The example scenario has 27 Given-When-Then steps. Typically, I recommend that Gherkin scenarios have a single-digit step count. A scenario with fewer than 10 steps is easier to understand and less likely to include unnecessary information. Longer scenarios are not necessarily “wrong,” but their length suggests that, perhaps, they could be rewritten more concisely.
The second issue in the example scenario is that step types are out of order. Given-When-Then is a formula for success. Gherkin steps should follow strict Given → When → Then ordering because this ordering demarcates individual behaviors. Each Gherkin scenario should cover one individual behavior so that the target behavior is easier to understand, easier to communicate, and easier to investigate whenever the scenario fails during testing. When scenarios break the order of steps, such as Given → Then → Given → Then in the example scenario, it shows that either the scenario covers multiple behaviors or that the author did not bring a behavior-driven understanding to the scenario.
The rules of good behavior don’t disappear when the type of target behavior changes. We should still write Gherkin with best practices in mind, even if our scenarios cover user interfaces.
Breaking Down Scenarios
If I were to rewrite the example scenario, I would start by isolating individual behaviors. Let’s look at the first half of the original example:
```gherkin
Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"
```
Here, I see four distinct behaviors covered:
- Clicking “Upload Files” reloads the page.
- Clicking “Spreadsheet Formats” displays new buttons.
- Uploading a spreadsheet file makes the filename appear on the page.
- Attempting to upload a spreadsheet file that is 1MB or larger fails.
If I wanted to purely retain the same coverage, then I would rewrite these behavior specs using the following scenarios:
```gherkin
Feature: Example site

  Scenario: Choose to upload files
    Given the Example site is displayed
    When the user clicks the "Upload Files" link
    Then the page displays the "Spreadsheet Formats" link

  Scenario: Choose to upload spreadsheets
    Given the Example site is ready to upload files
    When the user clicks the "Spreadsheet Formats" link
    Then the page displays the "XLS" and "XLSX" buttons

  Scenario: Upload a spreadsheet file that is smaller than 1MB
    Given the Example site is ready to upload spreadsheet files
    When the user clicks the "XLSX" button
    And the user selects "500kb-sheet.xlsx" from the file upload dialog
    Then the upload completes
    And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"

  Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
    Given the Example site is ready to upload spreadsheet files
    When the user clicks the "XLSX" button
    And the user selects "1mb-sheet.xlsx" from the file upload dialog
    Then the upload fails
    And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"
```
Now, each scenario covers one individual behavior. The first scenario starts with the Example site in a “blank” state: “Given the Example site is displayed”. The second scenario inherently depends upon the outcome of the first scenario. Rather than repeat all the steps from the first scenario, I wrote a new starting step to establish the initial state more declaratively: “Given the Example site is ready to upload files”. This step’s definition method may need to rerun the same operations as the first scenario, but it guarantees independence between scenarios. (The step could also optimize those operations, but that should be a topic for another challenge.) Likewise, the third and fourth scenarios have a Given step to establish the state they need: “Given the Example site is ready to upload spreadsheet files.” Both scenarios can share the same Given step because they have the same starting point. All three of these new steps are descriptive more than prescriptive. They declaratively establish an initial state, and they leave it to the automation code in the step definition methods to determine precisely how that state is established. This technique makes it easy for Gherkin scenarios to be individually clear and independently executable.
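To make the idea concrete, here is a minimal sketch of how such a declarative Given step might be automated. The page-object names (`ExamplePage`, `open`, `click_link`) are hypothetical, and real projects would wire these functions to a BDD framework like behave or SpecFlow; the point is only the pattern: the Given step reruns the operations the earlier scenario performed, so each scenario stays independently executable.

```python
class ExamplePage:
    """Hypothetical page object that records UI actions for illustration."""

    def __init__(self):
        self.actions = []

    def open(self):
        # In a real test, this would launch a browser and navigate to the site.
        self.actions.append("open https://www.example.com")

    def click_link(self, name):
        self.actions.append(f'click link "{name}"')


def given_site_is_displayed(page):
    # Implements 'Given the Example site is displayed' - the blank starting state.
    page.open()


def given_site_ready_to_upload_files(page):
    # Implements 'Given the Example site is ready to upload files' by
    # rerunning the same operations the first scenario performed.
    given_site_is_displayed(page)
    page.click_link("Upload Files")


# Each scenario gets a fresh page, so no scenario depends on another's leftovers.
page = ExamplePage()
given_site_ready_to_upload_files(page)
print(page.actions)
```

Because the Gherkin step stays declarative, the automation code behind it is free to change how the state is reached (for example, by calling an API instead of clicking through the UI) without rewording any scenarios.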
I also added my own writing style to these scenarios. First, I wrote concise, declarative titles for each scenario. The titles emphasize intention over mechanics. For example, the first scenario’s title uses the word “choose” rather than “click” because, from the user’s perspective, they are “choosing” an action to take. The user will just happen to mechanically “click” a link in the process of making their choice. The titles also provide a level of example: note that the third and fourth scenarios spell out the target file sizes. For brevity, I typically write scenario titles using active voice: “Choose this,” “Upload that,” or “Do something.” I try to avoid including verification language in titles unless it is necessary to distinguish behaviors.
Another stylistic element of mine was to remove explicit details about the environment. Instead of hard-coding the website URL, I gave the site a proper name: “Example site.” I also removed the mention of Chrome as the browser. These details are environment-specific, and they should not be specified in Gherkin. In theory, this site could have multiple instances (like an alpha or a beta), and it should probably run in any major browser (like Firefox and Edge). Environmental characteristics should be specified as inputs to the automation code instead.

I also refined some of the language used in the When and Then steps. When I must write steps for mechanical actions like clicks, I like to specify element types for target elements. For example, “When the user clicks the ‘Upload Files’ link” specifies a link by a parameterized name. Saying the element is a link helps provide context to the reader about the user interface. I wrote other steps that specify a button, too. These steps also specify the element name as a parameter so that the step definition method could perform the same interaction for different elements. Keep in mind, however, that these linguistic changes are neither “required” nor “perfect.” They make sense in the immediate context of this feature. While automating step definitions or writing more scenarios, I may revisit the verbiage and do some refactoring.
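As a sketch of what such parameterization might look like behind the scenes, the snippet below matches click steps with a single pattern that captures both the element name and its type. The regex and the parsing helper are illustrative assumptions; real BDD frameworks (behave, SpecFlow, Cucumber) provide this step matching for you.

```python
import re

# One pattern serves every "click" step, capturing the element name
# and the element type (link or button) as parameters.
CLICK_STEP = re.compile(r'the user clicks the "([^"]+)" (link|button)')


def parse_click_step(step_text):
    """Return (element_name, element_type) from a click step, or None."""
    match = CLICK_STEP.fullmatch(step_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The same step definition handles different elements:
print(parse_click_step('the user clicks the "Upload Files" link'))  # ('Upload Files', 'link')
print(parse_click_step('the user clicks the "XLSX" button'))        # ('XLSX', 'button')
```

One step definition method can then dispatch on the captured element type to find and click the right kind of element, which keeps the feature files readable without multiplying near-duplicate steps.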
Determining Value for Each Behavior
Each of the four new scenarios I wrote covers an independent, individual behavior of the fictitious Example site’s user interface. They are thorough in their level of coverage for these small behaviors. However, not all behaviors may be equally important to cover. Some behaviors are simply more important than others, and thus some tests are more valuable than others. I won’t go into deep detail about how to measure risk and determine value for different tests in this article, but I will offer some suggestions regarding these example scenarios.
First and foremost, you as the tester must determine what is worth testing. These scenarios aptly specify behavior, and they will likely be very useful for collaborating with the Three Amigos, but not every scenario needs to be automated for testing. You as the tester must decide. You may decide that all four of these example scenarios are valuable and should be added to the automated test suite. That’s a fine decision. However, you may instead decide that certain user interface mechanics are not worth explicitly testing. That’s also a fine decision.
In my opinion, the first two scenarios could be candidates for the chopping block:
- Choose to upload files
- Choose to upload spreadsheets
Even though these are existing behaviors in the Example site, they are tiny. The tests simply verify that clicking certain links makes other links or buttons appear. It would be nice to verify them, but test execution time is finite, and user interface tests are notoriously slow compared to other tests. Consider the Rule of 1’s: typically, by orders of magnitude, a unit test takes about 1 millisecond, a service API test takes about 1 second, and a web UI test takes about 1 minute. Furthermore, these behaviors are implicitly exercised by the other scenarios, even if they don’t have explicit assertions.
One way to condense the scenarios could be like this:
```gherkin
Feature: Example site

  Background:
    Given the Example site is displayed
    When the user clicks the "Upload Files" link
    And the user clicks the "Spreadsheet Formats" link
    And the user clicks the "XLSX" button

  Scenario: Upload a spreadsheet file that is smaller than 1MB
    When the user selects "500kb-sheet.xlsx" from the file upload dialog
    Then the upload completes
    And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"

  Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
    When the user selects "1mb-sheet.xlsx" from the file upload dialog
    Then the upload fails
    And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"
```
This new feature file eliminates the first two scenarios and uses a Background section to cover the setup steps. It also eliminates the need for special Given steps in each scenario to set unique starting points. Implicitly, if the “Upload Files” or “Spreadsheet Formats” links fail to display the expected elements, then those steps would fail.
Again, this modification is not necessarily the “best” way or the “right” way to cover the desired behaviors, but it is a reasonably good way to do so. However, I would assert that both the 4-scenario feature file and the 2-scenario feature file are much better approaches than the original example scenario.
What I showed in my answer to this Gherkin challenge is how I would handle UI-centric behaviors. I try to keep my Gherkin scenarios concise and focused on individual, independent behaviors. Try using these style techniques to rewrite the second half of Gojko’s original scenario. Feel free to drop your Gherkin in the comments below. I look forward to seeing how y’all write #GivenWhenThenWithStyle!