Author: Andy Knight

I'm a software engineer who specializes in test automation.

What is BDD, and How Do We Practice It? (Webinar + Q&A)

On March 18, 2019, I gave a webinar entitled, “What is Behavior-Driven Development, and How Do We Practice It?” in collaboration with Paul Merrill and his company, Beaufort Fairmont. It was both a pleasure and an honor to do this webinar with them. Paul is a top-notch test automation expert, and Beaufort Fairmont is doing really exciting things. Check out their two-day BDD training offering, as well as their blog and other webinars.

To see my webinar recording, register here.

During the webinar, attendees asked more questions than we could answer. I’m excited that so many people asked questions. My answers are below.

Questions about Process

How is BDD different from TDD (Test-Driven Development)?

BDD is an evolution of TDD. In TDD, developers (1) write unit tests and watch them fail, (2) develop the feature to make the tests pass, (3) refactor the code to make it stronger, and (4) repeat the cycle. In BDD, teams follow this same loop with feature tests (a.k.a. “acceptance” or “black-box” tests) as well as unit tests. Furthermore, BDD adds shift-left practices like Example Mapping and Specification by Example so that teams know what they are doing and focus on developing the right things.
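
To make the TDD loop concrete, here’s a contrived C# sketch using xUnit (the class and method names are my own, purely for illustration):

using Xunit;

public class CalculatorTests
{
    // Step 1: Write the test first and watch it fail
    // (Calculator.Add does not exist yet, so this won't even compile).
    [Fact]
    public void Add_ReturnsTheSum()
    {
        Assert.Equal(5, Calculator.Add(2, 3));
    }
}

// Step 2: Write just enough production code to make the test pass.
public static class Calculator
{
    public static int Add(int a, int b) => a + b;
}

// Step 3: Refactor with the passing test as a safety net, then repeat.

BDD applies the same rhythm one level up, with Gherkin scenarios as the failing-then-passing tests.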

Check out Dan North’s article, Introducing BDD, for a more thorough answer.

Can BDD be used with manual testing?

Yes! BDD is not merely an automation tool – it is a set of pragmatic practices to help teams develop better software. Gherkin scenarios are first and foremost behavior specs that help a team’s collaboration and accountability. They function secondarily as test cases that can be executed either manually or with automation.

Can we use BDD with technical stories or backend features?

Yes! If you can describe it, then you can do it.

How many Gherkin scenarios should one story have?

There’s no hard rule, but I recommend no more than a handful of rules per story, and no more than a handful of examples per rule. If you do Example Mapping and feel overwhelmed by the number of cards for a story, then the story should probably be broken into smaller stories.

Should we do Example Mapping for every story? Spending 20-30 minutes for each story would take a long time.

Try doing Example Mapping on one or two stories to start. The first time is always rough, but as you iterate, you’ll get better as a team. Even though Example Mapping has an upfront time cost, it saves a lot of time later in the sprint because (a) acceptance criteria are clear, (b) tests are already written, and (c) everyone has a mutual understanding of the story. The team won’t suffer through the inefficiencies of miscommunication and poor planning. You may even want to replace planning meetings with Example Mapping meetings.

What metrics should we use with BDD?

All metrics are flawed, but some metrics are useful. All the standard testing and Agile metrics still apply: code coverage, story velocity, etc. Here are some additional metrics you may consider for BDD:

  • the percentage of stories that undergo Example Mapping before the sprint
  • the number of rules and examples that get “missed” during Example Mapping and need to be added later
  • the percentage of Gherkin scenarios that get automated in the sprint

If you choose to track metrics, make sure their feedback is used to improve team practices. For more info on metrics, please read my Quality Metrics 101 series.

What were the resources you recommended at the end of the webinar?

Questions about Tools

What test management tools should we use with BDD?

I’m sure there are BDD plugins for test management tools, but I don’t have any that I can personally recommend. To be honest, I try to stay away from large test management tools like HP ALM, qTest, or VersionOne. When doing BDD, the Gherkin feature files themselves should be the single source of truth for feature-level tests, and they should be version-controlled in a repository. Don’t fall into the trap of slapping “Given-When-Then” keywords onto existing functional tests – that’s not BDD.

Does Jira support Example Mapping?

I have not personally used any Jira plugin for Example Mapping. It looks like there is an Easy Agile User Story Maps plugin that is similar to but slightly different from Example Mapping.

Are there other good tools for BDD and Example Mapping?

What’s the difference between Gherkin, Cucumber, and SpecFlow?

  • Gherkin is the Given-When-Then spec language.
  • Cucumber is a company and its eponymous test framework that uses Gherkin.
  • SpecFlow is Cucumber for .NET.

Questions about Testing

Can BDD test frameworks be used for unit testing?

Yes, but I don’t recommend it. BDD frameworks shine for black-box feature testing. They’re a bit too verbose for code-level unit tests. Read BDD 101: Unit, Integration, and End-to-End Tests for more info.

Can BDD test frameworks be used for integration testing?

Yes! See BDD 101: Unit, Integration, and End-to-End Tests.

How long should Gherkin scenarios be?

Scenarios should be bite-sized. Each scenario should focus on one individual behavior. There’s no hard rule, but I recommend single-digit step counts. Read BDD 101: Writing Good Gherkin for more info.

What are “step definitions” in Cucumber?

Step definitions are the methods in the automation code that execute the steps. When a BDD framework runs a Gherkin scenario as a test, it “glues” each step to a step definition based on some sort of string matching.
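
For example, here’s a minimal step definition sketch in C# with SpecFlow (the class, step text, and locator are hypothetical; the attribute pattern is what glues the method to a matching Gherkin step):

using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class SearchSteps
{
    private readonly IWebDriver driver;

    // SpecFlow can inject shared objects like the driver via the constructor
    // (assuming the driver is registered with its context injection container).
    public SearchSteps(IWebDriver driver)
    {
        this.driver = driver;
    }

    // Glues to a Gherkin step such as: When the user searches for "panda"
    [When(@"the user searches for ""(.*)""")]
    public void WhenTheUserSearchesFor(string phrase)
    {
        driver.FindElement(By.Name("q")).SendKeys(phrase + Keys.Enter);
    }
}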

How can we minimize duplicate code within a BDD test framework?

Know your steps. Always search for existing steps before writing new steps. Refactor existing steps whenever appropriate. Reuse steps when writing new scenarios. Do pair programming or mob programming when writing scenarios. Put scenarios through code reviews. Apply good coding practices – remember, test automation is software.

I write Gherkin scenarios, but I don’t write test automation code. What’s the best way to write Gherkin scenarios so that they can be automated?

Do pair programming with the automation engineers to write Gherkin scenarios together. Become familiar with existing steps by reading and searching feature files. Otherwise, the Gherkin steps you write in isolation might not be usable. Remember, BDD is a team effort!

The examples in the webinar were all fairly basic. Do you have any examples with more complex systems?

I have some example projects on GitHub in Python and Java with some basic unit, integration, and end-to-end tests, but I don’t have any large-scale examples that I can share publicly.

We wrote hundreds of SpecFlow tests without the other Amigos. Now, there are large test gaps, and many steps aren’t reusable. What should we do?

I’m sorry to hear that. It’s not an uncommon story. There are two paths: (1) refactoring or (2) starting over. Without really knowing the situation, I don’t think it’s my place to say which way is better. Here are some questions to help guide your decision:

  • What are your goals for testing and automation?
  • What’s your overall quality and testing strategy?
  • What parts of the code base are salvageable?
  • What parts of the code base should be removed?
  • If you started again from scratch, what would you do differently to make sure the same problems don’t reoccur?

I strongly recommend taking the Setting a Foundation for Successful Test Automation course from Test Automation University. (It’s free.) I also gave a talk about this very problem, Egad! How Do We Start Writing (Better) Tests?, at a few Python conferences.

We have a large BDD test suite with heavy coupling and slow execution times. The business amigos have also left the company. Should we try to fix what we have or just start over?

Sorry to hear that; same answer as before.

Final Questions

Why do you call yourself the “Automation Panda”?

Pandas are awesome. Everybody loves them. And nobody forgets my moniker.

Where can I get team training in BDD?

Beaufort Fairmont provides a one- or two-day course in BDD and writing Gherkin. Sign up for more information here.

PyCaribbean 2019 Reflections

PyCaribbean was held in Santo Domingo, Dominican Republic from February 16-17, 2019. I was blessed with the opportunity to deliver not just one talk but two at the conference! Typically, I write lengthy chronological reflections of my conference experiences, but this time, I’m going to share my big takeaways.

The welcome banner. This conference was #lit!

Python has GREAT people.

Python is truly just as much about the community as it is about the language, and conferences are one of the best ways to become part of that community. Everyone at PyCaribbean was excited to learn, grow, and be inspired. Many awesome folks made a direct impact on me at the conference, and there were many others, too. I felt like I really got to connect with these folks, not just meet them in passing.

This was the room for my talk. Everyone was eager to learn about testing!

The Dominican Republic has GREAT people.

First of all, many thanks to Leonardo Jimenez for organizing the conference! Not only did he bring together an excellent program of speakers and events, but he also won the support of the local software community and even the government in the DR.

PyCaribbean really showed the best of the software world in the DR. Everyone there was hungry to learn and share. I had no idea how vibrant the software industry was becoming there, too. There’s a bright future ahead.

Don’t wait to make proposals to conferences.

I consider myself especially fortunate to have attended PyCaribbean because I almost didn’t get to go. I submitted my talk proposals one night on a whim after seeing the PyCaribbean Twitter handle appear on my feed. After submitting two talks, my third one got blocked with a message saying submissions had been closed. Had I delayed a few minutes, I would have been too late!

Join the PSF.

The Python Software Foundation (PSF) is a 501(c)(3) non-profit corporation that holds the intellectual property rights behind the Python programming language.

https://www.python.org/psf/

The PSF does awesome things for the community, such as running PyCon. Anyone can become a member, too! There are different membership classes for varying levels of involvement. Lorena Mesa, one of the PSF Directors, encouraged me to join during the conference. If you care about Python, then I encourage you to join as well!

Beach trips are fun.

The day after the conference ended, a bunch of us (mostly speakers) took a day trip to Be Live Collection Canoa at Bayahibe. This was my first time at an all-inclusive beach resort. The water was a clear light blue, and the sand was white. Mixed drinks and Presidente beers were unlimited – you could even order them from a bar in the swimming pool! The buffet lunch was also on point. Plus, the trip offered the perfect chance to get to know the others on a deeper level. I almost didn’t get to go, but thankfully Delta rearranged my flights due to weather delays and gave me an extra day in the DR. I needed that day at the beach to just be me, but relaxed. #WorthIt

Music brings people together.

One of the conference highlights was the electric violin performance on day 2. I don’t know the name of the musician, but he shredded it! He played “Wake Me Up” by Avicii, “Uptown Funk” by Bruno Mars, and “Corazon Espinado” by Santana. As I sat in the auditorium listening with the others, I just thought to myself, “This is really nice. This is a once-in-a-lifetime performance with incredible renditions of these three awesome songs.” Everyone else there seemed to agree with me.


Passionfruit is delicious.

I ate passionfruit for the first time in my life in the DR. It was delicious. The edible flesh of the fruit is basically a gooey collection of black seeds in a yellow mucus. It tastes tart but slightly sweet. I normally don’t eat breakfast, but I devoured about four halves on the morning I first discovered them. They had them at the beach resort buffet, too!

Where has passionfruit been all my life?

I need to stick up for myself.

During my trip, it was very obvious that I was a tourist – a white American who spoke no Spanish and wandered around just to look at things. Unfortunately, because of that, some people tried to take advantage of me. I was clearly overcharged for my souvenirs, even after attempting to haggle. A guard at Independence Park took my phone to take pictures of me and then demanded money. On my return flight, a guy sat in my seat on the plane and refused to yield it to me.

These experiences really frustrated me. I’ve always been somewhat shy in social circumstances, and that leaves me vulnerable to others who would take advantage of me. Reflecting on how I handled these situations has made me determined to be more assertive. I won’t become a jerk, but I don’t need to be afraid to stick up for myself. I should use my inner strength and discernment instead of folding.

The world is a fallen place.

One thing truly broke my heart during my trip: I’m 99% sure I witnessed prostitution on multiple occasions. I won’t go into details, but it was shocking to me. Call me naïve. Let’s work to make a better world where this sort of thing doesn’t need to happen.

I know so little.

My PyCaribbean trip was challenging but rewarding. It was my first time visiting the Caribbean and Latin America. Not only did I gain some software experience, but I also gained some life experience. I’m thankful I got to go and that all the details fell perfectly into place. Hopefully, I’ll get to return to learn even more!


Sprint Planning Sucks. Can It Be Fixed?

Warning: This article contains strong opinions that might not be suitable for all audiences. Reader discretion is advised.

It’s Monday morning. After an all-too-short weekend and rush hour traffic, you finally arrive at the office. You throw your bag down at your desk, run to the break room, and queue up for coffee. As the next pot is brewing, you check your phone. It’s 8:44am… now 8:45am, and DING! A meeting reminder appears:

Sprint Planning – 9am to 3pm.


What’s your visceral reaction?


I can’t tell you mine, because I won’t put profanity on my blog.

Real Talk

In the capital-A Agile Scrum process, sprint planning is the kick-off meeting for the next iteration. The whole team comes together to talk about features, size work items with points, and commit to deliverables for the next “sprint” (typically 2 weeks long). Ideally, team members collaborate freely as they learn about product needs and give valued input.

Let’s have some real talk, though: sprint planning sucks. Maybe that’s a harsh word, but, if you’re reading this article, then it caught your attention. Personally, my sprint planning experiences have been lousy. Why? Am I just bellyaching, or are there some serious underlying problems?

Sprint planning is a huge time commitment. 9am to 3pm is not an exaggeration. Sprint planning meetings are typically half-day to full-day affairs. Most people can’t stay focused on one thing for that long. Plus, when a sprint is only two weeks long, one hour is a big chunk of time, let alone three or six hours, or a whole day. The longer the meeting, the higher the opportunity cost, and the deeper the boredom.

Collaboration is a farce. Planning meetings typically devolve into one “leader” (like a scrum master, product owner, or manager) pulling teeth to get info for a pre-determined list of stories. Only two people, the leader and the story-owner, end up talking, while everyone else just stares at their laptops until it’s their turn. Discussions typically don’t follow any routine beyond, “What’s the acceptance criteria?” and, “Does this look right?” with an interloper occasionally chiming in. Each team member typically gets only a few minutes of value out of an hours-long ordeal. That’s an inefficient use of everyone’s time.

No real planning actually happens. These meetings ought to be called “guessing” meetings, instead. Story point sizes are literally made up. Do they measure time or complexity? No, they really just measure groupthink. Teams even play a game called planning poker that subliminally encourages bluffing. Then, point totals are used to guess how much work can be done during the sprint. When the guess turns out to be wrong at the end of the sprint (and it always does), the team berates itself in retro for letting points slip. Every. Time.

Does It Spark Joy?

I’ve long wondered to myself if sprint planning is a good concept just implemented poorly, or if it’s conceptually flawed at its root. I’m pretty sure it’s just flawed. The meetings don’t facilitate efficient collaboration relative to their time commitments, and estimates are based on poor models. Retros can’t fix that. And gut reactions don’t lie.

So, what should we do? Should we Konmari our planning meetings to see if they spark joy? Should we get rid of our ceremonies and start over? Is this an indictment of the whole Agile Scrum process? But then, how will we know what to do, and when things can get done?

I think we can evolve our Agile process with more effective practices than sprint planning. And I don’t think that evolution would be terribly drastic.

Behavior-Driven Planning

What we really want out of a planning meeting is planning, not pulling and not predicting. Planning is the time to figure out what will be done and how it will be done. The size of the work should be based on the size of the blueprint. Enter Example Mapping.

Example Mapping is a Behavior-Driven Development practice for clarifying and confirming stories. The process is straightforward:

  1. Write the story on a yellow card.
  2. Write each rule that the story must satisfy on a blue card.
  3. Illustrate each rule with examples written on green cards.
  4. Got stuck on a question? Write it on a red card and move on.

One story should take about 20-30 minutes to map. The whole team can participate, or the team can split up into small groups to divide-and-conquer. Rules become acceptance criteria, examples become test cases, and questions become spikes.
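
For instance, a hypothetical “forgotten password” story might map out like this:

  • Story (yellow): Reset a forgotten password
  • Rule (blue): Reset links expire after 24 hours
    • Example (green): a link clicked 23 hours after issue resets the password
    • Example (green): a link clicked 25 hours after issue shows an “expired” page
  • Question (red): Should a reset link become invalid after its first use?

A sprawling pile of blue and red cards would be an early warning sign that the story needs refinement or splitting.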

Here’s a good walkthrough of Example Mapping.

What about story size? That’s easy – count the cards. How many cards does a story have? That’s a rough size for the work to be done based on the blueprint, not bluffing. More cards = more complexity. It’s objective. No games. Frankly, it can’t be any worse than made-up point values.

This is real planning: a blueprint with a course of action.

So, rather than doing traditional sprint planning meetings, try doing Example Mapping sessions. Actually plan the stories, and use card counts for point sizes. Decisions about priority and commitments can happen between rounds of story mapping, too. The Scrum process can otherwise remain the same.

If you want to evolve further, you could eliminate the time boxes of sprints in favor of Kanban. Two-week work item boundaries can arbitrarily fall in the middle of progress, which is not only disruptive to workflow but can also encourage bad responses (like cramming to get things done or shaming for not being complete). Kanban treats work items as a continuous flow of prioritized work fed to a team in bite-sized pieces. When a new story comes up, it can have its own Example Mapping “planning” meeting. Now, Kanban is not for everyone, but it is popular among post-Agile practitioners. What’s important is to find what works for your team.

Rant Over

I know I expressed strong, controversial opinions in this article. And I also recognize that I’m arguing against bad examples of Agile Scrum. Nevertheless, I believe my points are fair: planning itself is not a waste of time, but the way many teams plan their sprints uses time inefficiently and sets poor expectations. There are better ways to do planning – let’s give them a try!

Web Element Locators for Test Automation

If you do any Web UI test automation (like with Selenium WebDriver), then you probably spend a large chunk of your test development time finding elements on a page, like buttons, inputs, and divs. Finding the right elements, however, can be challenging, especially when they lack unique IDs or class names. This guide will show you how to locate any Web element like a pro.

What are Web elements?

A Web element is an individual entity rendered on a Web page. Everything a user sees on a Web page (and even some things they don’t) is an element: title headers, OK buttons, input fields, text areas, and more. Elements are specified in HTML by tag name, attributes, and contents. They may also have child elements, such as a table containing rows. CSS may be applied to elements to style them with colors, sizes, position, etc. Programming languages typically access Web elements as nodes in the Document Object Model (DOM).
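
For example, this snippet of HTML specifies a search input element (the attribute values are made up for illustration): the tag name is “input”, and id, name, class, and type are its attributes.

<input id="search-input" name="q" class="searchbox" type="text" />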

What are Web element locators?

Web elements and locators are two different things. A Web element locator is an object that finds and returns Web elements on a page using a given query. In short, locators find elements.

Why are locators needed? As human users, we interact with Web pages visually: We look, scroll, click, and type through a browser. However, test automation interacts with Web pages programmatically: it needs a coded way to find and manipulate those same elements. Traditional automation won’t “look” at the page like a human* – it will search the DOM instead.

(*Newer automation technologies enable visual testing, which will be discussed later in this article.)

Selenium WebDriver separates the concerns of element location and interaction. WebDriver calls for these two concerns are frequently written back-to-back:

// WebDriver example: typing a search phrase at www.google.com
// This code is written in C#, but the calls are the same in any language

// First, element location
IWebElement searchField = driver.FindElement(By.Name("q"));

// Second, element interaction
searchField.SendKeys("panda");

WebDriver provides the following locator query types using “By”:

  • ID
  • Name
  • Class name
  • CSS selector
  • XPath
  • Link text and partial link text
  • Tag name

Which one is best? We’ll discuss that below.

Locators may also return multiple elements, or none at all! For example:

// Get the list of results from a Google search
// Using "FindElements" will return a list of all elements found in order
// Using "FindElement" would return the first element found (or throw an exception if no elements were found)
IList<IWebElement> results = driver.FindElements(By.CssSelector("div.r"));
results.Count.Should().BeGreaterThan(0);

Large test frameworks often use design patterns for structuring locators and interactions. The Page Object Model organizes locators and action methods together in classes by page or component. However, I strongly recommend the Screenplay Pattern over page objects because its pieces are more reusable and scalable. Whatever the pattern, locators are needed.
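
As a rough sketch, a page object for the Google search page might look like this in C# (the class shape is illustrative, not from any particular framework):

using OpenQA.Selenium;

// Page object: locators and interactions for one page, together in one class
public class GoogleSearchPage
{
    // Locator declared once, shared by any action method that needs it
    public static By SearchField => By.Name("q");

    private readonly IWebDriver driver;

    public GoogleSearchPage(IWebDriver driver)
    {
        this.driver = driver;
    }

    public void Search(string phrase)
    {
        driver.FindElement(SearchField).SendKeys(phrase + Keys.Enter);
    }
}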

How do I find elements?

Elements can be a hassle to find when writing locators for test automation. To simplify my workflow, I use Google Chrome’s Developer Tools side-by-side with my IDE. Why choose Chrome? It’s one of the most popular browsers, it’s well supported by Selenium WebDriver through ChromeDriver, and its built-in DevTools make DOM inspection easy.

To inspect any Web page in Chrome, simply right-click anywhere on the page and select “Inspect” from the context menu. Voila! DevTools will open. For finding Web elements, we want to use the Elements tab.

Visually pinpointing an element is easy. Click the “select” tool in the upper-left corner of the DevTools pane. (It looks like a square with a cursor on it.) The icon should turn blue.

Then, move the cursor to the desired element on the page. You will see each element highlighted in different colors as the mouse moves over them. The corresponding HTML source code in the Elements tab will be highlighted simultaneously, too. Nice! Click on the desired element to pin the highlighting so that it won’t disappear when you move the cursor elsewhere.

From here, you can check out the element’s tag, classes, attributes, contents, parents, and children.

How do I write good locators?

Finding the element is half the battle. Forming a unique locator query is the other half. If a locator is too broad, then it could return false positives. However, if a locator is too specific, then it could break whenever the DOM changes, and it could also be difficult for others to read. The best philosophy is this: Write the simplest locator query that uniquely identifies the target element(s).

My locator query type order-of-preference is:

  1. ID (if unique)
  2. Name (if unique)
  3. Class name
  4. CSS Selector
  5. XPath without text or indexing
  6. Link text / partial link text
  7. XPath with text and/or indexing

Unique IDs, names, and class names make locators super easy to write: queries are short and don’t need extra anchors. Always encourage developers on the team to use unique identifiers like class names for all elements. However, many elements do not have them, which means locators must fall back on more complicated CSS selectors and XPaths (*shiver*). Whenever this happens, here’s some advice (a WebDriver sketch follows the list):

  • Use parents as anchors if they have unique identifiers.
    • CSS selector example: “#some-list > li”
    • XPath example: “//ul[@id='some-list']/li”
  • Avoid XPaths that use text or indexing if possible.
    • Bad example: “//div[3]//span[text()='hello']”
    • Those tend to be the most brittle checks.
  • Use the “contains” function when checking for classes in XPath.
    • Example: “//div[contains(@class, 'some-class')]”
    • Elements frequently have more than one class.
    • “contains” will check a substring instead of the full class string.
    • However, be careful because “some-class2” would be matched!
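
Translated into WebDriver calls, the anchored queries above might look like this (C#, continuing the earlier examples and assuming the same driver setup):

// CSS selector anchored on a unique parent ID
IList<IWebElement> items =
    driver.FindElements(By.CssSelector("#some-list > li"));

// Equivalent XPath anchored on the same ID
IList<IWebElement> sameItems =
    driver.FindElements(By.XPath("//ul[@id='some-list']/li"));

// Class check with "contains" – beware substrings like "some-class2"!
IWebElement div =
    driver.FindElement(By.XPath("//div[contains(@class, 'some-class')]"));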

Always test locators, too. Syntax errors and false positives happen frequently. Chrome DevTools makes testing locators easy. Simply hit Ctrl-F on the Elements tab and then paste the locator query into the finder field. DevTools will highlight all the matching elements in order. Spiffy!

Sometimes, when I can’t figure out why a locator isn’t working for a test case, I’ll do the following:

  1. Run the test case with debugging from my IDE.
  2. Set a break point on the locator.
  3. Wait for the test case to stop at the break point.
  4. Enter DevTools on the active Chrome window.
  5. Check the DOM and test the locators on the live page.

What if my tests are flaky?

Web UI testing is roundly criticized for being “flaky” because tests often crash for unexpected reasons. However, much of the unreliability people blame on Web UI testing (and often on Selenium WebDriver itself) stems from the fact that all Web interactions inherently pose race conditions. The automation and the browser execute independently, so interactions must be synchronized with page state. Otherwise, WebDriver will throw exceptions for timeouts, stale elements, and elements not found. Many times, these issues happen intermittently, so they can be difficult to trace and resolve.

The best way to avoid race conditions is this: Always wait for an element to exist before interacting with it. This may seem basic, but it’s easy to overlook. Selenium WebDriver packages all offer some sort of WebDriverWait object that will force the driver to wait for a given condition to be true before proceeding. The easiest way to check if an element exists is to check if the list of elements returned by a FindElements (plural) call is non-empty. Adding another call for each interaction may feel burdensome, but design patterns within well-designed frameworks (like the Screenplay Pattern) can make these checks happen automatically.
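
Here’s what that existence check might look like in C# with Selenium’s WebDriverWait (from the Selenium.Support package; the timeout and locator are arbitrary):

using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Wait up to 15 seconds for at least one matching element to exist
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
wait.Until(d => d.FindElements(By.CssSelector("div.r")).Count > 0);

// Now the interaction is safe from "element not found" races
driver.FindElement(By.CssSelector("div.r")).Click();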

Another good practice is this: Always fetch fresh elements. Sometimes, automation will first get some elements and then use a second query to get more elements. Or, in the case of the Page Object Factory (which should never be used because, bluntly, its design is terrible), elements are fetched once when the page object is constructed and referenced thereafter. No matter which way, the longer a Web element object exists, the more prone it is to become stale and cause exceptions. I’ve seen elements turn stale inexplicably even when they still seem to be on the page, too. Always get an element in the moment when it is needed. That way, it can’t go stale!
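
In a page object, fetching fresh can be as simple as using a property that re-locates the element on every access instead of caching it in a field (a small C# sketch):

// Bad: fetched once (e.g., in a constructor) and cached; it can go stale:
//     this.searchField = driver.FindElement(By.Name("q"));

// Better: re-located fresh on every access
private IWebElement SearchField => driver.FindElement(By.Name("q"));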

Want some helpful tips for clicking tricky elements? Check out this article: Clicking Web Elements with Selenium WebDriver.

How can AI help Web UI testing?

Several new AI-based projects/products aim to improve automated Web UI testing over traditional methods:

  • Applitools extends Selenium WebDriver automation with checks for nontrivial visual differences.
  • Testim can automatically heal locators whenever they break, avoiding test flakiness due to front-end changes.
  • Mabl is an assistant that will learn and rerun tests that developers teach it without writing any code.
  • Test.ai runs common user tests like login, searching, and shopping on mobile apps based on what its AI has learned from several other apps.
  • Rainforest QA uses crowdsourcing plus AI to run manual tests specified by a team almost like they are automated.

Test Automation University also offers a free course on using AI for element selection: AI for Element Selection: Erasing the Pain of Fragile Test Scripts.

Many AI testing tools definitely add value, but keep in mind, under the hood, locators are still used somewhere.

Testing Web Services with Karate

Karate is a relatively new open source framework for testing Web services. Even though Karate is written in Java, its main value proposition is that testers don’t need to do any Java programming in order to write fully automated tests. Instead, testers use a Gherkin-like language with steps for making requests and validating responses. It’s like Cucumber with out-of-the-box Web API steps! There are a bunch of other nifty features, too.

This article is my quick-start guide for Karate. As a prerequisite, make sure you understand how Web services work (like REST APIs). Knowing BDD will also help.

My System

Since Karate is an open-source Java project, it can run almost anywhere. Here’s my system config:

  • macOS 10.13.6 (High Sierra)
  • Java 1.8.0_191
  • Apache Maven 3.6.0
  • Karate 0.9.0

Warning: I initially attempted to run my Karate project using Java 11.0.1, but I repeatedly hit SSL handshake exceptions despite trying many fixes. Downgrading to Java 8 fixed the errors. See Issue #617.

Project Setup

I created my Karate project using the Maven archetype since I am quite familiar with Maven. I named my project “firstchop”:

mvn archetype:generate \
  -DarchetypeGroupId=com.intuit.karate \
  -DarchetypeArtifactId=karate-archetype \
  -DarchetypeVersion=0.9.0 \
  -DgroupId=com.automationpanda \
  -DartifactId=firstchop

The directory layout was standard for Java Maven / Cucumber-JVM projects. (In fact, Karate was based on Cucumber-JVM until version 0.8.0.) Interestingly, the Karate docs recommend placing feature files under src/test/java instead of src/test/resources.

IDE

The Karate docs recommend using Eclipse or IntelliJ IDEA for developing Karate tests. Both IDEs offer support for JUnit and Cucumber, which Karate can leverage not only for editing but also for running tests. I’d probably use IntelliJ IDEA for serious testing.

However, for my initial exploration, I chose to use Visual Studio Code. Why?

  • It’s fast and easy.
  • It has good support for Java and Gherkin.
  • It has a file explorer and an integrated terminal.


Here’s what my Karate project looked like inside Visual Studio Code. Nice!

The Examples

The archetype project included example tests in the users.feature file. The first scenario from the file, copied below, tests getting users from the JSONPlaceholder REST API:

Feature: sample karate test script

Background:
* url 'https://jsonplaceholder.typicode.com'

Scenario: get all users and then get the first user by id

Given path 'users'
When method get
Then status 200

* def first = response[0]

Given path 'users', first.id
When method get
Then status 200

Anyone familiar with Cucumber will immediately recognize the Given/When/Then format for scenarios. However, Karate’s standard steps make the language more powerful than raw Gherkin:

  • Given steps build requests
  • When steps make request calls
  • Then steps validate responses
  • Catch-all steps (*) provide additional directives, like setting variables

All steps are quite concise. The Java implementation is essentially hidden from the tester (unless, of course, they want to plunge into the framework’s lower levels). Furthermore, this scenario shows how to use data from one response as the input for a second request.

Running Tests

Since I chose to use Maven and Visual Studio Code, the easiest way to run tests without any additional configuration was through the command line using “mvn test”. The example tests came with an ExamplesTest.java file that will run all feature files in the package when it is discovered during Maven’s “test” phase. The project defaulted to using JUnit 4, but JUnit 5 is also supported.

Running the tests will print many lines to the console. Every request and response will be printed. Below is the tail end of a successful run for users.feature:

$ mvn test
.
.
.
---------------------------------------------------------
feature: classpath:examples/users/users.feature
scenarios:  2 | passed:  2 | failed:  0 | time: 1.5232
---------------------------------------------------------
HTML report: (paste into browser to view) | Karate version: 0.9.0
file:/path/to/firstchop/target/surefire-reports/examples.users.users.html
---------------------------------------------------------

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.09 sec

Results :

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  5.635 s
[INFO] Finished at: 2018-12-09T23:32:29-05:00
[INFO] ------------------------------------------------------------------------

Karate can generate helpful test reports, too. The project generates JUnit reports by default, but other report formats like Cucumber reports are also possible.


The JUnit HTML report shows a step-by-step log for each scenario.


Full requests and responses are automatically logged, which is great for debugging.


Failures are visually easy to identify. Above, I hacked the example to deliberately fail.

My Scenarios

Seeing examples is great, but I wanted to write my own scenarios with a real Web service for some hands-on experience. So, I wrote a test for Recipe Puppy, a search engine that provides links to recipes for input ingredients:

Feature: Recipe Puppy

Background:
* url 'http://www.recipepuppy.com'

Scenario Outline: Get a recipe for <ingredient>

Given path 'api'
And params {i: '<ingredient>'}
When method get
Then status 200
And match response contains {results: '#array'}
And match response.results[*] contains
  """
  {
    title: '#string',
    href: '#string',
    ingredients: '#regex <ingredient>',
    thumbnail: '#string'
  }
  """

  Examples:
    | ingredient |
    | tomato     |
    | pepperoni  |
    | cheese     |

Here are a few things I learned while writing this new feature file:

  • Scenario Outlines are supported, but data-driven features are preferred.
  • JSON objects can be used as step arguments or as variables.
  • Matching syntax is simple but sophisticated.
  • Markers like “#array” support fuzzy matching when specific values are unknown.
  • Multi-line JSON expressions need block quotes.

This new scenario ran just as successfully as the examples!

Standalone Execution

Even though writing tests using Karate’s domain-specific language does not require Java development skills, setting up the full Karate project does. Thankfully, Karate provides a standalone JAR that can run feature files without any other dependencies or configuration. It simply takes in paths to feature files, runs them, and generates Cucumber reports. The standalone JAR would be a good option for testers who don’t have strong programming skills.
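
Invocation is a single command. Assuming the standalone JAR has been downloaded (the exact file name depends on the release, and the feature path here is illustrative), it looks roughly like this:

java -jar karate.jar src/test/java/recipepuppy/recipepuppy.feature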


This was the Cucumber report generated by running my Recipe Puppy test with the standalone JAR.

Other Features

Despite being a fairly young project (with a GitHub creation date of February 7, 2017), Karate is full of nifty features that I have yet to try:

  • Parallel test execution
  • A mock servlet
  • A UI for visually debugging scripts
  • Reading files of many types into variables
  • Calling a feature file from another feature file
  • Calling JavaScript code
  • Calling Java code
  • Using scenarios as Gatling performance tests

Project contributors are also experimenting to extend Karate to test browser, mobile, and desktop UIs using Selenium WebDriver.

Analysis

Overall, Karate is a great tool for testing Web services. It handles all of the programming implementation details so that testers can focus more on testing. Its syntax is concise, clear, and versatile. Its native support for JSON objects makes request and response handling feel natural. The GitHub docs are on-point. Anyone testing REST APIs should give it serious consideration.

Even though Karate uses Gherkin-like syntax, it is not truly a behavior-driven test framework. Instead, it simply leverages the strengths of Cucumber – readability, reusability, and tooling – to make automated Web service testing easier. I have long stated that one of the best solutions to test automation challenges is to create a domain-specific testing language that automatically handles low-level details. (See Behavior-Driven Blasphemy.) The Karate project validates my claim. Karate’s DSL acts much more like a testing language than a business-oriented specification language. Its Gherkin keywords simply provide structure and familiarity to its steps. And it’s perfectly okay that Karate isn’t “pure BDD” – just ask the project’s creator.

With that said, I would argue that a tester probably needs basic programming skills to be successful with Karate. Execution needs Java setup no matter what, and feature files are really just fancy test scripts. And Web service APIs are inherently code-y.

I look forward to seeing how the Karate project grows, especially if/when WebDriver-based steps are added to the language.


Frameworks: All-in-One or Piece-by-Piece?

Software frameworks are great because they apply the principle of Separation of Concerns. A framework’s tools and code handle a specific need in a standard way for developers to write other code more easily. For example:

  • Web frameworks support receiving requests and sending responses.
  • Test frameworks include test case structure, runners, and reporting mechanisms.
  • Logging frameworks control how messages are gathered and stored.
  • Dependency injection frameworks create and manage object instances.

Recently, a question hit me: How far should a framework go to separate concerns? Should a framework try to do everything all-in-one, or should it behave more like a library that focuses on doing one thing well?

Let’s look at Python Web frameworks as an example. Django, the “Web framework for perfectionists with deadlines,” provides everything a developer could want out of the box. Flask, on the other hand, is a “microframework” that prides itself on minimalism: any extras must be handled by extensions or other packages. The differences between the two become clear when comparing some of their features:

Feature                          | Django   | Flask
HTTP Requests and Routing       | Included | Werkzeug (bundled)
Templates                       | Included | Jinja2 (bundled)
Forms                           | Included | None (Flask-WTF)
Object-Relational Mapping (ORM) | Included | None (SQLAlchemy)
Security                        | Included | None (Flask-Security)
Localization                    | Included | None (Flask-Babel)
Admin Interface                 | Included | None

Clearly, Django is all-in-one, while Flask is piece-by-piece. To make a serious Flask app, developers must pull in many extra pieces. Many other framework rivalries show the same split:

  • JavaScript testing: Jasmine vs. Mocha
  • JavaScript development: Angular vs. React
  • Java BDD testing: Serenity vs. Cucumber-JVM

I think each approach has its merits. All-in-one frameworks are more convenient to use, especially for beginners. For developers who are new to a domain or just need to get something up fast, all-in-ones are the better choice. They come with all units already integrated together, and they often have good documentation. However, developing an all-in-one framework takes much more work because it covers multiple concerns. Developers may also feel shoehorned into the framework’s way of doing things. All-in-ones typically dictate what they believe to be the “best” solution.

Piece-by-piece frameworks require more expertise but offer greater flexibility. Developers can pick and choose the pieces they need, and they can change the packages used by the solution more easily. Found a better ORM? Not a problem. Need to localize the site in Chinese? Add it! Solutions can avoid excess weight and stay nimble for the future. The big challenge is successful integration. Furthermore, a library or framework for a singular concern tends to solve the concern in better ways simply because project contributors give it exclusive focus. The more I learn about a space, the more I lean towards a piece-by-piece approach.

As always, pick frameworks based on the needs at hand. For example, I like to use Django to make websites for my wife’s small businesses because the admin interface is just so convenient for her, even though I could get away with Flask. However, I’ll probably pick Mocha (piece-by-piece) over Jasmine (all-in-one) whenever I return to JavaScript testing.

PyGotham 2018 Reflections

As only New Yorkers know, if you can get through the twilight, you’ll live through the night. (Dorothy Parker)

PyGotham 2018 was my first PyGotham conference and my third conference of the year. I first heard about PyGotham when Jon Banafato and Dan Crosta, two of the organizers, invited me to submit proposals when we met at the PyCon 2018 Instagram dinner. I enjoyed the conference very much, but my experience was much different than the previous two conferences.

A Tourist Day in New York

PyGotham is hosted in New York City. I booked my flight a day before the conference started to make sure that I’d be on time. My wife also came along for the trip. That meant Thursday, October 4 was a tourist day! Our flight arrived at EWR at about 10:30am. We took a commuter train to Penn Station in Manhattan and walked a block to our hotel, Hotel Pennsylvania. Unfortunately, we couldn’t check into our room until 3pm, so we dropped our bags (for $15) and went along our way.

Our first stop was lunch at Xi’an Famous Foods. XFF is a fast casual restaurant chain for authentic Chinese food local to NYC. We ordered two noodle dishes (one liangpi “cold skin” noodles and the other the wide hand-pulled “biang biang” noodles), a pulled pork “burger,” and hawberry tea. Our meal was absolutely delicious.

We spent our afternoon at the National September 11 Memorial & Museum. I was just starting 8th grade when the terrorist attacks happened, but I remembered all the details clearly. The outdoor memorial – a sunken fountain for each tower’s square base – showed both emptiness and resolve. The museum was built into the underground foundations for the towers, essentially as a cave. The starkness of the concrete, the use of light and shadow, and the pacing of exhibits all highlighted the tragedy of 9/11/2001. Every American should visit the site if they are able.

We checked into our hotel before dinner that evening. And then we learned why Hotel Pennsylvania has only 2.7 stars on Google: it’s nasty. The hotel is 99 years old and looks like it. The carpets are filthy, the lights are dim, and most of the building smells damp. I chose to stay there because the conference was held there, but I don’t think I will be returning.

One plus side of Hotel PA is that it is right next to Koreatown. My wife and I ate dinner at Five Senses, a trendy Korean restaurant with a line nearly out the door. We then walked around a bit to see the hustle and bustle: we tried Gong Cha bubble tea, got breakfast for the next morning at H-Mart, and bought some face masks at Kosette Beauty Market. A Korean friend of ours was jealous!

The Conference

The conference itself lasted two days: Friday, October 5 and Saturday, October 6. About 600 people attended over the two days. By this time, I was well acquainted with Python conference formats. Each day opened with a keynote, followed by several talks in a few rooms with breaks in between. I reconnected with old friends like Trey Hunner and Gabriel Boorse and also made a few new ones.

The opening keynote was “Software Freedom and Ethical Technology” by Karen M. Sandler. Karen explained why free, open source software is foundational for the ethical use and development of technology. She used the example of her Implantable Cardioverter Defibrillator (ICD). Nearly all modern ICDs have wireless connectivity, which renders them vulnerable to hacking. Since the source code in this device and so many others in the Internet of things is closed, individuals are unable to make changes for customization or protection. Karen’s talk was definitely thought-provoking.

My favorite talk, by far, was Trey Hunner’s “Python Oddities Explained”. Trey dove deep into the way variables, data structures, and scope work in Python 3. This talk was particularly helpful for me because I work in several languages and sometimes assume how small details work.

My talk, “Egad! How Do We Start Writing (Better) Tests?”, explained how to build a functional test automation solution in Python from the ground up. It is the same talk I delivered at PyOhio 2018, though I hope it turned out better the second time!

Attendance was pretty good, too.


Conference room panorama for my talk

Video recordings for all talks can be found on the PyGotham 2018 YouTube channel.

PyGotham had a stronger business vibe than the other Python conferences I attended. Many attendees came as part of company groups. (I heard rumors that about a hundred came from Bloomberg.) Speakers more openly talked about their companies and specific projects. This all made sense considering the location being NYC.

A number of vendors also had tables with goodies to give away. Bank of America, Buzzfeed, Venmo, Linode, Clover, and Django Girls all showed up, along with others I cannot remember off the top of my head. I netted two t-shirts (in addition to two PyGotham t-shirts), a notebook, some silly putty from Linode, and a handful of stickers.

After Hours

On Friday night after the conference, my wife and I took the Subway to Brooklyn and walked the Brooklyn Bridge back to Manhattan. The view of the city skyline was spectacular. We then hopped the Subway up to the Upper West Side for drinks with my friend Evan. On the way back to the hotel, we stopped at a dim sum restaurant for a late night dinner.

We didn’t have time to explore New York on Saturday night because we had an overnight flight to catch: we were going to London, England for the week after the conference! We checked out of the hotel, caught the Long Island Rail Road to JFK, and then flew Norwegian Air across the pond.


Dim sum? Dim yum!

Takeaways

During PyGotham 2018, I spent a lot of time pondering software testing and automation. I like to use conferences as my time to disconnect from immediacy and think big thoughts.

One thought I formed was the nuance between testing and change detection. A friend of Gabriel’s posed an idea for REST API service testing. Rather than write a bunch of test cases that check responses, why not simply gather responses and diff them against previous responses? Request automation would still be needed, but response assertions could basically be skipped. Any changes would be identified and reported. This approach struck me as novel and useful but not comprehensive. Change detection will reveal when a code change results in a feature change, but it lacks the moral calculus of true testing because it cannot determine if the change is good or bad. It must yield the judgment to a higher authority, like a developer, a tester, or even an AI agent (which arguably further begs the question of goodness). Nevertheless, I think change detection can be very useful in conjunction with testing, either as a tool to assist manual testing or as an extension to an automation framework.

I also noticed the widening gap between companies in testing maturity. Some companies are on the bleeding edge with the latest technologies, tools, and frameworks. They have multi-stage pipelines with countless automated tests and several daily deployments. Other companies are stuck in the past, still suffering through “Agile transformations” and failing tests. There are now several vendors pitching advanced testing solutions, like Sauce Labs for cross-browser testing and Applitools for visual testing. I wonder how many of these vendors are too far ahead for companies with less mature testing to catch up.

For the first time at a conference, I felt serious self doubt. Am I behind the times as a Software Engineer in Test? Is my method of software testing already antiquated? Am I truly a developer, or do I just pretend? Is there a future for the SET role? Will my ideas add value to the field? Will I have time to pursue all of my professional and technical desires? Am I good enough to be here at PyGotham? I felt a bit lonely and withdrawn. My malaise did not sour my conference experience, but it did come unexpectedly. Nevertheless, I must use it as impetus to become a better me.

Final Thoughts

I’m thankful for the opportunity I had to speak at PyGotham 2018. I thank the conference organizers for accepting my talk proposal, as well as PrecisionLender for sponsoring my travel costs. I’d love to return for PyGotham 2019 if possible!