BDD

A Cool Panda

Is BDD Dying?

These days, folks keep asking me an uncomfortable question: “Is BDD dying?” Behavior-Driven Development has been around for about twenty years, but recently, it feels like the movement has stalled. Is BDD dead? Are we really at this point?

NO! Heck no. Not if I have anything to do with it. But I’ll be honest, we have some work to do to set things right. BDD needs to evolve. Let’s cover the story of what happened to BDD, what are the good things we should preserve from it, and how we build a better future on behavior-driven principles.

The Core Principles

Let’s start by defining Behavior-Driven Development. I’ve always defined BDD as a set of helpful practices to help you put your primary focus on the behaviors of the software that you are developing. Not on the code, not on the tools, but on the actual functionality of the product. Why? That’s what your users do! If we focus on behaviors first, then everything else falls into place.

Ultimately, behaviors matter more than code. Users don’t care if you wrote your app in JavaScript or Python. They don’t care if it’s running on Azure or AWS. They need your app to solve their problems. To get things done. To do their jobs. You can have the best code in the world with 100% unit test coverage and a perfect Agile process, but it doesn’t matter if the behaviors of the software product itself don’t deliver any meaningful business value.

The Panda is inspired to think about behaviors.

If we want to prioritize software behaviors, then we should ask ourselves a few important questions to shift our thinking:

  1. What if we tested behaviors together with the code? Both are important. It helps to have developers and testers on the same page.
  2. What if we described behaviors in plain language rather than in cryptic programming languages? Software is meant to be used by people. If we can’t explain how to use it in plain language, how will anybody ever be able to understand it enough to use it?
  3. What if we defined all these behaviors before ever touching any code? We could resolve many design issues before committing the time to code it. We would also have an agreement on what should be built that we could use to hold the team accountable.
  4. What if all the different roles on the team – business, development, and testing – all collaborated on these behavior specs? Multiple perspectives bring valuable insights into product development that ultimately contribute to higher quality and greater value.
  5. What if those behavior specs that the team writes together could be automated directly with special tooling? All those specs essentially become test cases. We could set up continuous feedback loops to tell us if we are building the right things and if they are working. The specs become Living Documentation.

And that’s how BDD was born! BDD is an orientation towards business value. It became a set of practices to help people focus first on behaviors and second on implementation details. People may love or hate the outcomes of the movement, but I don’t think anybody can rightfully disagree with the premise.

A Concise History

The first major BDD tools were test frameworks. They appeared in every major language by the early 2010s. Many of you have probably used one of these frameworks. The most popular one was Cucumber. My favorite was SpecFlow – a masterpiece of “automationeering.” Almost all other BDD test frameworks are derivatives of Cucumber.

A panda wearing ancient Chinese robes and holding a cucumber.

The early 2010s were truly the Golden Age of BDD, a time of flourishing and growth. Teams around the world started embracing it. Engineers integrated BDD frameworks with tools like Selenium WebDriver and Jenkins. Design patterns like the Screenplay Pattern arose. There was this incredible outpouring of new, exciting stuff!

BDD’s secret sauce was Gherkin. I’m sure almost everyone has seen Gherkin’s Given-When-Then steps. Its beauty was its simplicity. Folks could read Gherkin scenarios and immediately understand the intended behavior without worrying about all the implementation details.

Since all the BDD frameworks required testers to write their tests in Gherkin, the world came to see Gherkin as a domain-specific language for test automation. The frameworks naturally separated the “what” of the test from the “how.” Testers also realized how effective it was to write steps once and reuse them in any test.

Angry pandas fighting each other with cucumbers.

Alas, BDD’s golden age did not last forever. What ended it? War – primarily fought over Gherkin. Many folks like myself recognized how a programming language for test automation could make tests much easier and faster to automate. Others, however, hated writing an extra layer of steps above their test automation code. They felt like it was completely unnecessary, making tests slower to write and code less “clean.”

Meanwhile, many of the original leaders of the BDD movement felt horrified that teams would use frameworks like Cucumber without cross-role collaboration over behaviors first. They also saw how BDD became pigeonholed as a testing activity.

Many of these leaders reacted by focusing more on the collaboration side of BDD than the automation side. Good things came of this, such as the practice of Example Mapping. However, I believe the pendulum swung too hard the other way, almost to the point where many leaders practically shunned automation, discrediting the value in a language for testing apart from collaborative practices. Unfortunately, this meant that the “thought leaders” and the “trench workers” were heading in opposite directions.

A very sad, upset panda wearing ancient Chinese robes.

Regardless of disagreements, BDD became very popular throughout the software industry. By the end of the 2010s, testing tool vendors took notice. SmartBear bought Cucumber, and Tricentis bought SpecFlow. Initially, there were high hopes. The new owners started investing in building BDD tools beyond the classic test frameworks. BDD was having its moment. Personally, I loved SpecFlow’s LivingDoc generator.

What Happened?

Things have been rough for BDD the past few years.

Test tool vendors failed to commercialize BDD tools and ultimately gave up on them. I don’t know exactly what happened or why. All I know is that SmartBear gave Cucumber to the Open Source Collective, and Tricentis literally just shut down the entire SpecFlow project. Thankfully, Gaspar Nagy forked SpecFlow and rebranded it as Reqnroll. Nevertheless, seeing major test tool vendors move on from BDD tools sent a bleak signal about their future.

In that, we as an industry failed to successfully productize tools for collaboration. The world jumped on BDD because Cucumber-esque frameworks were easy to adopt. The world was less willing to adopt BDD’s collaborative techniques because they were merely processes, not products. Products are sticky; processes are not. Cucumber tests will still be running after we all retire.

The testing world also moved on. Exciting new tools like Cypress and Playwright came out. Then AI became big. Folks just aren’t talking about BDD as much.

Speaking of AI, LLMs made Gherkin feel old and clunky. Gherkin enables folks to write their specs in plain language, but the steps they write must be written perfectly and identically every time. In large projects, it becomes difficult to find the “right” step. Tools using LLMs let users write free-form steps and then figure out what the users meant. They are more user-friendly. They “do what I mean” rather than “do what I say.” 

And finally, the movement itself slowed down. There is still good work happening, but from my perspective, it is not as groundbreaking as the golden age. We haven’t seen many new things. The movement feels like it has stalled, especially post-COVID.

Deep Thoughts

I’ve thought about this a lot. A LOT. The principles of BDD are still valuable. In fact, I think they’re more applicable than ever. As a software industry, we continue to flounder through the same problems. I’ve also realized that the tooling doesn’t match the principles. As a movement, we haven’t productized aspects of the process in ways that help people improve their software development practices. We haven’t seen advancements to the same degree that we saw during BDD’s golden age of the Cucumber patch.

A panda deep sitting at a desk and writing in a notebook while deep in thought.

I’ve been asking myself two questions:

  1. What “Behavior” could be?
  2. How could we provide better tooling for it?

I think Artificial Intelligence technologies could help provide much better tooling. If used appropriately, it could remove many points of friction in the development process. AI would not replace product owners, developers, or testers but rather empower them… and perhaps even coach them into better practices. I can’t help but think about what BDD would have looked like if it had technology like LLMs from the start.

I’ve also come to realize that the “intelligence” itself is not artificial. It’s just regurgitated from existing sources. We can bake what we already know into it. The challenge then is not merely prompting it with the right queries. The challenge is delivering the insights it offers in a way that seamlessly becomes part of the greater development experience. So, rather than “artificial” intelligence, what we truly need is…

Automated Intelligence. We need insights to automatically come to us as we are developing software. We need coaching to help us stay oriented on behaviors. We need agents to take care of grindwork like organizing the artifacts of our process, running tests, and analyzing results. Intelligence doesn’t always need to come from heavyweight technology like LLMs, either. Sometimes, it can be as simple as an extra comment inserted into a spec or a rote question asked by a chatbot.

I think the notion of “automated” intelligence fits perfectly into the BDD process. I first learned about the process from Seb Rose and Gaspar Nagy in The BDD Books. There are three phases: Discovery → Formulation → Automation. This process frames software development into one seamless, cohesive process that focuses on behaviors through and through.

Behaviors: Discovery -> Formulation -> Automation in one Seamless Process

Let’s walk through these phases together.

A Renewed Process

Discovery

Discovery is the first phase. It is an activity of learning. In Discovery, teams are figuring out what the business-critical behaviors should be.

To be truly successful, teams need Cross-Role Collaboration to gain insights from all perspectives. This has commonly been referred to as “The Three Amigos” of business, development, and testing, but I prefer to think of it more broadly as cross-role collaboration to account for multiple people in those roles as well as people in other roles such as UI/UX design.

To be productive, teams engage in Structured Activities to make sure their meetings produce meaningful artifacts that move development forward. Structured activities include story mapping, example mapping, and question storming. Nobody likes to be part of a soul-crushing meeting with no agenda and no tangible outcome.

Furthermore, the artifacts like stories, rules, and examples help teams do real planning instead of guessing. Estimates for size are much more accurate when they are based on actual rules and examples.

I feel like Discovery has an untapped market. Any time I have led these kinds of sessions, every participant finds them to be shockingly beneficial. Yet every time I teach others how to run these sessions on their own, they always get stuck. Now, I am usually not an expert in the domain of the stories we end up exploring, but I do have coaching skills to ask the right questions. The process works magic; I just need to do a little smoke in mirrors as a faux magician.

That’s what I think tools for activities like Example Mapping really need. You can build the most beautiful diagramming tools, but if folks don’t have guidance on how to use them, then they’re worthless. Now, with some social engineering and a little bit of AI, we could build a virtual coach into the tool to help teams complete the structured activities. The coach could ask the team probing questions to get them thinking. It could be a “scribe” for the team, rendering cards on the board and moving them around as a team’s exploration progresses. It could challenge the team when they are moving too slowly or becoming distracted by a tangent. Overall, the coach should help the team discover the product’s most important behaviors as efficiently and painlessly as possible.

Your Discovery Coach, helping with Example Mapping

Formulation

Formulation is the second phase of the behavior process. It is an activity of defining. In Formulation, teams write carefully-phrased specifications for the behaviors they explored during the Discovery phase.

Plain-language definitions with concrete examples are vital for good formulation. If a team cannot explain how a behavior works in plain language, then how could they ever expect a user to understand it? It’s much easier to explain behaviors with real-world examples.

Scenarios also need structure. Gherkin is the go-to language for BDD because its Given-When-Then steps follow the Arrange-Act-Assert pattern. Given the system is ready, When an actor performs an interaction, Then the system produces a desired outcome. This pattern frames one behavior individually and independently. Each scenario covers one behavior, which makes it easier to write, easier to understand, and easier to automate.

Gherkin syntax itself is fine for Formulation, but behavior tooling could be supercharged with a Formulation Copilot. In the same way that a Discovery Coach could help teams push through structured activities, a Formulation Copilot could provide advice and insights to team members as they are writing their behavior specifications. It should also be able to start by turning the artifacts generated by Discovery such as Example Mapping cards into “starter specs.”

Your Formulation Copilot, guiding your Gherkin.

Automation

Automation is the third phase of the behavior process. It is an activity of verifying. In Automation, teams automate their specs using test frameworks.

They should run their tests in Fast Feedback Loops like Continuous Integration pipelines to get value from their tests for every single code change. Remember, Continuous Integration is the production environment for test automation. If tests are not running in CI, then they are practically useless. Fast feedback enables teams to learn about the quality of the software and make informed business decisions about it.

When tests run continuously, their specs and their results become Living Documentation for the product. The results reveal the actual quality of the system as an active health monitor. The specs pinpoint where quality issues are happening.

Carrying out a cohesive process from inspiration to implementation also makes behaviors traceable. The specs act as a receipt or a proof-of-purchase for what the team “bought” with their planning. They hold the team accountable to delivering the behaviors that were intended.

There is so much that an intelligent Automation Watchdog could do for the Automation phase. If AI delivers on all its promises, a watchdog could just automate all the plain-language specs written during the previous phase. For example with web testing, it could either generate code in, say, Playwright, or it could just go into the web browser and enact the interactions based on its natural language understanding of the steps. Perhaps that’s too ambitious. More realistically, the watchdog could yield better analysis of test results and manage some of the triaging grindwork.

Furthermore, an Automation Watchdog could help us shift test automation away from “failure is bad.” For the longest time, we as testers have automated tests that perform interactions and assert verification. Interactions plus verification. That’s it; that’s testing. So, we hard-code the interactions, and we hard-code the verifications. Unfortunately, one of the most common problems with test automation is that any change in the behaviors under test will inevitably require tests to change as well, or else the tests will break and fail and send red X’s everywhere. Psychologically, it spreads bad vibes to say a test is “failing” or “broken” when the automation technically was successful in detecting a change. Perhaps an Automation Watchdog could become the ultimate change-detector and steer teams towards healthier perspectives about failures whenever they inevitably occur.

Your Automation Watchdog, always running tests and providing feedback.

All Together

Discovery → Formulation → Automation. That is the full BDD process. I just outlined many ways we could reinvigorate these practices with better tooling. I want to top it off with one more idea: What if all this could be done within one app as one seamlessly integrated experience? One of the biggest pain points I’ve felt with BDD is the fact that feature files holding Gherkin scenarios can never truly be read and written by everyone on a team. Teams either stuff them into Jira for the product folks or commit them to Git repositories for developers and testers. There’s no possibility for a single source of truth because the tooling just is not there. It’s maddening. If we truly believe in these behavior-driven principles and practices, then as an industry and a community, we need to commit to them with better tooling.

I believe in these principles. I believe in these practices. I am thoroughly behavior-driven, and I do not believe I could approach software development any other way. It pains me to see how the BDD movement has stumbled when its tenets still have so much value to offer.

Time to Rebrand

Before I conclude, I want to propose one more big idea: I think it is time we rebrand BDD. I think it needs a new name. The current name has way too much baggage. Too many people hate it – or rather, they hate the terrible experiences they had under its impact. If we can be honest, “Behavior-Driven Development” just isn’t a good name anymore. Heck, anything that is “Something-Something-Driven Development” is a pretty lousy name.

I’ve thought to myself, what if we remove the word “driven”? Driving is for cars. We could shorten the name to “Behavior Development”. Still, that sounds clunky, and it doesn’t convey how folks need to focus more on the principles than the practices.

What I genuinely want folks to have is a Behavior Mindset when they approach software development. If folks actually put behaviors first and foremost in their mind, then everything else will fall into place. The specs, the code, and the practices all become artifacts of the process. We already have software development methodologies. Behavior orientation is really more of a mindset that complements existing paradigms.

Behavior Mindset!

So, enough with the three-letter acronyms. Enough with the baggage of the past. Let’s build software right. Let’s focus on the business value that matters.

We do Agile. We do DevOps. I think it’s time we also do Behavior.

Let’s build this future together!

Bulldog-Driven Development

Today is a special day in the Knight family. On November 5, 2021, my wife and I drove down to Waxhaw, North Carolina to adopt the cutest French Bulldog puppy in the state. We named her “Suki” – a Japanese name meaning “beloved.”

Suki was the first dog my wife and I had ever owned. She was only 7.5 weeks old when we picked her up, and she weighed only 3 pounds! I could scoop her up with one hand. When she first arrived home, she was so nervous that she tried to bury her face in her new doggy bed to hide. Now, three years later, Suki is a BEAUTIFUL 20-pound beefcake who has crisscrossed the country with us. We love her dearly. She is a gift from God.

I often joke about #BDD being “Bulldog-Driven” rather than “Behavior-Driven.” Remember, though, we are all driven by something. In my software work, I absolutely take a behavior-driven mindset, but I’m also driven by my Family, my Faith, and my Frenchie. Never lose sight of what drives you.

Photo: Suki with her portrait as “AstroSuki” – my speaker gift from QA or the Highway 2024!

Making Great Waves: 8 Software Testing Convictions

The Great Wave Off Kanagawa.

Katsushika Hokusai, 1830.

It is one of the most recognizable works of art in the world. It is so famous, it has an emoji: 🌊.

The Great Wave Off Kanagawa is a Japanese woodblock print. It is not a painting or a drawing but a print. In Japanese, the term for this type of art is ukiyo-e, which means “pictures of the floating world.” Ukiyo-e prints first appeared around the 1660s and did not decline in popularity until the Meiji Restoration two centuries later. While most artists focused on subjects of people, late masters like Hokusai captured perspectives of landscapes and nature. Here, in The Great Wave, we see a giant wave, full of energy and ferocity, crashing down onto three fast boats attempting to transport live fish to market. Its vibrant blue water and stark white peaks contrast against a yellowish-gray sky. In the distance is Mount Fuji, the highest mountain in Japan, yet it is dwarfed in perspective by the waves. In fact, the water spray from the waves appears to fall over Mount Fuji like snow. If you didn’t look closely, you might presume that Mount Fuji is just the crest of another wave.

The Great Wave is absolutely stunning. It is arguably Hokusai’s finest work. The colors and the lines reflect boldness. The claws of the wave impart vitality. The men on the boat show submission and possibly fear. The spray from the wave reveals delicacy and attention to detail. Personally, I love ukiyo-e prints like this. I travel the world to see them in person. The quality, creativity, and craftsmanship they exhibit inspire me to instill the highest quality possible into my own work.

As software quality professionals, there are several lessons we can learn from ukiyo-e masters like Hokusai. Testing is an art as much as it is engineering. We can take cues from these prolific artists in how we approach quality in our own work. In this article, I will share how we can make our own “Great Waves” using 8 software testing convictions inspired by ukiyo-e prints like The Great Wave. Let’s begin!

Conviction #1: Focus on behavior

Although we hold these Japanese woodblock prints today in high regard, they were seen as anything but fancy centuries ago in Japan. Ukiyo-e was “low” art for the common people, whereas paintings on silk scrolls were considered “high” art for the high classes.

Folks would buy these prints from local merchants for slightly more than the cost of a bowl of noodles – about $5 to $10 US dollars today – and they would use these prints to decorate their homes. By comparison, a print of The Great Wave sold at auction for $1.11 million in September 2020.

These prints weren’t very large, either. The Great Wave measures 10 inches tall by 15 inches wide, and most prints were of similar size. That made them convenient to buy at the market, carry them home, and display on the wall. To understand how the Japanese people treated these prints in their day, think about the decorations in your homes that you bought at stores like Home Goods and Target. You probably have some screen prints or posters on your walls.

Since the target consumer for ukiyo-e prints were ordinary people with working-class budgets, they needed to be affordable, popular, and recognizable. When Hokusai published The Great Wave, it wasn’t a standalone piece. It was the first print in a series named Thirty-six Views of Mount Fuji. Below are three other prints from that series. The central feature in each print is Mount Fuji, which would be instantly recognizable to any Japanese person. The various views would also be relatable.

Fine Wind, Clear Morning
Fine Wind, Clear Morning shows nice weather against the slopes of the mountain with a powerful contrast of colors.
Thunderstorm Beneath the Summit depicts Mount Fuji from a nearly identical profile, but with lightning striking the lower slopes of the mountain amidst a far darker palate.
Kajikazawa in Kai Province depicts two fisherman with Mount Fuji in the background.

The features of these prints made them valuable. Anyone could find a favorite print or two out of a series of 36. They made art accessible. They were inexpensive yet impressive. They were artsy yet accessible. Artists like Hokusai knew what people wanted, and they delivered the goods.

This isn’t any different from software development. Features add value for the users. For example, if you’re developing a banking app, folks better be able to log in securely and view their latest transactions. If those features are broken or unintuitive, folks might as well move their accounts to other banks! We, as the developers and testers, are like the ukiyo-e artists: we need to know what our customers need. We need to make products that they not only want, but they also enjoy.

Features add value. However, I would use a better word to describe this aspect of a product: behavior. Behavior is the way one acts or conducts oneself. In software, we define behaviors in terms of inputs and responses. For example, login is a behavior: you enter valid credentials, and you expect to gain access. You gave inputs, the app did something, and you got the result.

My conviction on software testing AND development is that if you focus on good software behaviors, then everything else falls into place. When you plan development work, you prioritize the most important behaviors. When you test the features, you cover the most important behaviors. When users get your new product, they gain value from those features, and hopefully you make that money, just like Hokusai did.

This is why I strongly believe in the value of Behavior-Driven Development, or BDD for short. As a set of pragmatic practices, BDD helps you and your team stay focused on the things that matter. BDD involves activities like Three Amigos collaboration, Example Mapping, and writing Gherkin. When you focus on behavior – not on shiny new tech, or story points, or some other distractions – you win big.

Conviction #2: Prioritize on risk

Ukiyo-e artists depicted more than just views of Mount Fuji. In fact, landscape scenes became popular only during the late period of woodblock printing – the 1830s to the 1860s. Before then, artists focused primarily on people: geisha, courtesans, sumo wrestlers, kabuki actors, and legendary figures. These were all characters from the “floating world,” a world of pleasure and hedonism apart from the dreary everyday life of feudal Japan.

Here is a renowned print of a kabuki actor by Sharaku, printed in 1794:

Kabuki Actor Ōtani Oniji III as Yakko Edobei in the Play The Colored Reins of a Loving Wife
Tōshūsai Sharaku, 1794

Sharaku was active only for one year, but he produced some of the most expressive portraits seen during ukiyo-e’s peak period. A yakko was a samurai’s henchman. In this portrait, we see Edobei ready for dirty deeds, with a stark grimace on his face and hands pulsing with anger.

Why would artists like Sharaku print faces like these? Because they would sell. Remember, ukiyo-e was not high-class art. It was a business. Artists would make a series of prints and sell them on the streets of Edo (now Tokyo). They needed to make prints that people wanted to buy. If they picked lousy or boring subjects, their prints wouldn’t sell. No soba noodles for them! So, what subjects did they choose? Celebrities. Actors. “Female beauties.” And some content that was not safe for work.

Artists prioritized their work based on business risk. They chose subjects that would be easy to sell. They pursued value. As testers, we should also prioritize test coverage based on risk.

I know there’s a popular slogan saying, “Test all the things!”, but that’s just impossible. It’s like saying, “Print all the pictures!” Modern apps are too complex to attempt any sort of “complete” or “100%” coverage. Instead, we should focus our testing efforts on the most important behaviors, the ones that would cause the most problems if they broke. Testing is ultimately a risk-mitigating activity. We do testing to de-risk problems that enter during development.

So, what does a risk-based testing strategy look like? Well, start by covering the most valuable behaviors. You can call them the MVBs. These are behaviors that are core to your app. If they break, then it’s game over. No soba noodles. For example, if you can’t log in, you’re done-zo. The MVBs should be tested before every release. They are non-negotiable test coverage. If your team doesn’t have enough resources to run these tests, then get more resources.

In addition to the MVBs, cover areas that were changed since the previous release. For example, if your banking app just added mobile deposits, then you should test mobile deposits. Things break where developers make changes. Also, look at testing different layers and aspects of the product. Not every test should be a web UI test. Add unit tests to pinpoint failures in the code. Add API tests to catch problems at the service layer. Consider aspects like security, accessibility, and visuals.

When planning these tests, try to keep them fast and atomic, covering individual behaviors instead of long workflows. Shorter tests are more reliable and give space for more coverage. And if you do have the resources for more coverage beyond the MVBs and areas of change, expand your coverage as resources permit. Keep adding coverage for the next most valuable behaviors until you either run out of time or the coverage isn’t worth the time.

Overall, ask yourself this when weighing risks: How painful would it be if a particular behavior failed? Would it ruin a user’s experience, or would they barely notice?

Conviction #3: Automate

The copy of The Great Wave shown at the top of this article is located at the Metropolitan Museum of Art in New York City. However, that’s not the only version. When ukiyo-e artists produced their prints, they kept printing copies until the woodblocks wore out! Remember, these weren’t precious paintings for the rich, they were posters for the commoners. One set of woodblocks could print thousands of impressions of popular designs for the masses. It’s estimated that there were five to eight thousand original impressions of The Great Wave, but nobody knows for sure. To this day, only a few hundred have survived. And much to my own frustration, museums that have copies do not put them on public display because the pieces are so fragile.

Here are different copies of The Great Wave from different museums:

Print production had to be efficient and smooth. Remember, this was a business. Publishers would make more money if they could print more impressions from the same set of woodblocks. They’d gain more renown if their prints maintained high quality throughout the lifetime of the blocks. And the faster they could get their prints to market, the sooner they could get paid and enjoy all the soba noodles.

What can we learn from this? Automate! That’s our third conviction.

What can we learn from this? Automate! Automation is a force multiplier. If Hokusai spent all his time manually laboring over one copy of The Great Wave, then we probably wouldn’t be talking about it today. But because woodblock printing was a whole process, he produced thousands of copies for everyone to enjoy. I wouldn’t call the woodblock printing process fully “automated” because it had several tedious steps with manual labor, but in Edo period Japan, it was about as automated as you could get.

Compare this to testing. If we run a test manually, we cover the target behavior one time. That’s it: lots of labor for one instance. However, if we automate that test, we can run it thousands of times. It can deliver value again and again. That’s the difference between a painting and a print.

So, how should we go about test automation? First, you should define your goals. What do you hope to achieve with automation? Do you want to speed up your testing cycles? Are you looking to widen your test coverage? Perhaps you want to empower Continuous Delivery through Continuous Testing? Carefully defining your goals from the start will help you make good decisions in your test automation strategy.

When you start automating tests, treat it like full software development. You aren’t just writing a bunch of scripts, you are developing a software system. Follow recommended practices. Use design patterns. Do code reviews. Fix bugs quickly. These principles apply whether you are using coded or codeless tools.

Another trap to avoid is delaying test automation. So many times, I’ve heard teams struggle to automate their tests because they schedule automation work as their lowest priority. They wish they could develop automation, but they just never have the time. Instead, they grind through testing their MVBs manually just to get the job done. My advice is flip that attitude right-side up. Automate first, not last. Instead of planning a few tests to automate if there’s time, plan to automate first and cover anything that couldn’t be automated with manual testing.

Furthermore, integrate automated tests into the team’s Continuous Integration system as soon as possible. Automated tests that aren’t running are dead to me. Get them running automatically in CI so they can deliver value. Running them nightly or even weekly can be a good start, as long as they run on a continuous cadence.

Finally, learn good practices. Test automation technologies are ever-evolving. It seems like new tools and frameworks hit the market all the time. If you’re new to automation or you want to catch up with the latest trends, then take time to learn. One of the best resources I can recommend is Test Automation University. TAU has about 70 courses on everything you can imagine, taught by the best instructors in the world, and it’s 100% FREE!

Now, you might be thinking, “Andy, come on, you know everything can’t be automated!” And that’s true. There are times when human intervention adds value. We see this in ukiyo-e prints, too. Here is Plum Garden at Kameido by Utagawa Hiroshige, Hokusai’s main rival. Notice the gradient colors of green and red in the background:

Plum Garden in Kameido
Plum Garden at Kameido
Utagawa Hiroshige, 1857

Printers added these gradients using a technique called bokashi, in which they would apply layers of ink to the woodblocks by hand. Sometimes, they would even paint layers directly on the prints. In these cases, the “automation” of the printing process was insufficient, and humans needed to manually intervene.

It’s always good to have humans test-drive software. Automation is great for functional verification, but it can’t validate user experience. Exploratory testing is an awesome complement to automated testing because it mitigates different risks.

Nevertheless, automation is able to do things it could never do before. As I said before, I work at Applitools, where we specialize in automated visual testing. Take a look at these two prints of Matsumoto Hoji’s Frog from Meika Gafu. Notice anything different between the two?

Two different versions of Matsumoto Hoji’s Frog.

If we use Visual AI to compare these two prints, it will quickly identify the main difference:

Applitools Visual AI identifying visual differences (highlighted in magenta) between two prints.

The signature block is in a different location! Small differences like small pixel offsets are ignored, while major differences are highlighted. If you apply this style of visual testing to your web and mobile apps, you could catch a ton of visual bugs before they cause problems for your users. Modern test automation can do some really cool tricks!

Conviction #4: Shift left and right

Mokuhanga, or woodblock printing, was a huge process with multiple steps. Artists like Hokusai and Hiroshige did not print their artwork themselves. In fact, printing required multiple roles to be successful: a publisher, an artist, a carver, and a printer.

  1. The publisher essentially ran the process. They commissioned, financed, and distributed prints. They would even collaborate with artists on print design to keep them up with the latest trends.
  2. The artist designed the patterns for the prints. They would sketch the patterns on washi paper and give instructions to the carver and printer on how to properly produce the prints.
  3. The carver would chisel the artist’s pattern into a set of wooden printing blocks. Each layer of ink would have its own block. Carvers typically used a smooth, hard wood like cherry.
  4. The printer used the artist’s patterns and carver’s woodblocks to actually make the prints. They would coat the blocks in appropriately-colored water-based inks and then press paper onto the blocks.

Quality had to be considered at every step in the process, not just at the end. If the artist was not clear about colors, then the printer might make a mistake. If the carver cut a groove too deep, then ink might not adhere to the paper as intended. If the printer misaligned a page during printing, then they’d need to throw it away – wasting time, supplies, and woodblock life – or risk tarnishing everyone’s reputation with a misprint. Hokusai was noted for his stringent quality standards for carvers and printers.

The words of W. Edwards Deming ring true:

Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. As Harold F. Dodge said, “You cannot inspect quality into a product.”

W. Edwards deming

This is just like software development. We can substitute the word “testing” for “inspection” in Deming’s quote. Testers don’t exclusively “own” quality. Every role – business, development, and testing – has a responsibility for high-caliber work. If a product owner doesn’t understand what the customer needs, or a developer skips code reviews, or if a tester neglects an important feature, then software quality will suffer.

How do we engage the whole team in quality work? Shift left and right.

Most testers are probably familiar with the term shift left. It means, start doing testing work earlier in the development process. Don’t wait until developers are “done” and throw their code “over the fence” to be tested. Run tests continuously during development. Automate tests in-sprint. Adopt test-driven and behavior-driven practices. Require unit tests. Add test implementation to the “Definition of Done.”

But what about shift right? This is a newer phase, but not necessarily a newer practice. Shift right means, continue to monitor software quality during and after releases. Build observability into apps. Monitor apps for bugs, failures, and poor performance. Do canary deployments to see how systems respond to updates. Perform chaos testing to see how resilient environments are to outages. Issue different UIs to user groups as part of A/B testing to find out what’s most effective. And feed everything you learn back into development a la “shift left.”

The DevOps Infinity Loop
(Source: https://www.atlassian.com/devops)

The famous DevOps infinity loop shows how “shift left” and “shift right” are really all part of the same flow. If you start in the middle where the paths cross, you can see arrows pointing leftward for feedback, planning, and building. Then, they push rightward with continuous integration, deployment, monitoring, and operations. We can (and should) take all the quality measures we said before as we spin through this loop perpetually. When we plan, we should build quality in with good design and feedback from the field. When we develop, we should do testing together with coding. As we deploy, automated safety checks should give thumbs-up or thumbs-down. Post-deployment, we continue to watch, learn, and adjust.

Conviction #5: Give fast feedback

The acronym CI/CD is ubiquitous in our industry, but I feel like it’s missing something important: “CT”, or Continuous Testing. CI and CD are great for pushing code fast, but without testing, they could be pushing garbage. Testing does not improve quality directly, but continuous revelation of quality helps teams find and resolve issues fast. It demands response. Continuous Testing keeps the DevOps infinity loop safe.

Fast feedback is critical. The sooner and faster teams discover problems, the less pain those problems will cause. Think about it: if a developer is notified that their code change caused a failure within a minute, they can immediately flip back to their code, which is probably still open in an editor. If they find out within an hour, they’ll still have their code fresh in their mind. Within a day, it’ll still be familiar. A week or more later? Fuggedaboutit! Heaven forbid the problem goes undetected until a customer hits it.

Continuous testing enables fast feedback. Automation enables continuous testing. Test automation that isn’t running continuously is worthless because it provides no feedback.

Japanese woodblock printers also relied on fast feedback. If they noticed anything wrong with the prints as they pressed them, they could scrap the misprint and move on. However, since they were meticulous about quality, misprints were rare. Nevertheless, each print was unique because each impression was done manually. The amount, placement, and hue of ink could vary slightly from print to print. Over time, the woodblocks themselves wore down, too.

Here, you can see differences in the title cartouche between different prints of The Great Wave:

Differences in the title cartouche between two prints of The Great Wave.
(Source: https://blog.britishmuseum.org/the-great-wave-spot-the-difference/)

On the left, the outline around the title is solid, whereas on the right, the outline has breaks. This is because the keyblock had very fine ridges for printing outlines, which suffered the most from wear and tear during repeated impressions. Furthermore, if you look very closely, you can see that the Japanese characters appear bolder on the right than the left. The printer must have used more ink or pressed the title harder for the impression on the right.

Printers would need to spot these issues quickly so they could either correct their action for future prints or warn the publisher that the woodblocks were wearing down. If the print was popular, the publisher could commission a carver to carve new woodblocks to keep production going.

Conviction #6: Go lean

As I’ve said many times now, woodblock printing was a business. Ukiyo-e was commercial art, and competition was fierce. By the 1840s, production peaked with about 250 different publishers. Artists like Hokusai and Hiroshige were rivals. While today we recognize famous prints like The Great Wave, countless other prints were also made.

Publishers competed in a rat race for the best talent and the best prints. They had to be savvy. They had to build good reputations. They needed to respond to market demands for subject material. For example, Kitagawa Utamaro was famous for prints of “female beauties.”

Two Beauties with Bamboo
Kitagawa Utamaro, 1795

Ukiyo-e artists also took inspiration from each other. If one artist made a popular design, then other artists would copy their style. Here is a print from Hiroshige’s series, Thirty-Six Views of Mount Fuji. That’s right, Hokusai’s biggest rival made his own series of 36 prints about Mount Fuji, and he also made his own version of The Great Wave. If you can’t beat ‘em, join ‘em!

The Sea off Satta in Suruga Province
The Sea off Satta in Suruga Province
Utagawa Hiroshige, 1858

Publishers also had to innovate. Oftentimes, after a print had been in production for a while, they would instruct the printer to change the color scheme. Here are two versions of Hokusai’s Kajikazawa in Kai Province, from Thirty-six Views of Mount Fuji:

The print on the left is an early impression. The only colors used were shades of blue. This was Hokusai’s original artistic intention. However, later prints, like the one on the right, added different colors to the palette. The fishermen now wear red coats. The land has a bokashi green-yellow gradient. The sky incorporates orange tones to contrast the blue. Publishers changed up the colors to squeeze more money out of existing designs without needing to pay artists for new work or carvers for new woodblocks.

However, sometimes when doing this, artistic quality was lost. Compare the fine detail in the land between these two prints. In the early impression, you can see dark blue shading used to pronounce the shadows on the side of the rocks, giving them height and depth, and making the fisherman appear high above the water. However, in the later impression, the green strip of land has almost no shading, making it appear flat and less prominent.

Ukiyo-e publishers would have completely agreed with today’s lean business model. Seek first and foremost to deliver value to your customers. Learn what they want. Try some designs, and if they fail, pivot to something else. When you find what works, get a full end-to-end process in place, and then continuously improve as you go. Respond quickly to changes.

Going lean is very important for software testing, too. Testing is engineering, and it has serious business value. At the same time, testing activities never seem to have as many resources as they should. Testers must be scrappy to deliver valuable quality feedback using the resources they have.

When I think about software testing going lean, I’m not implying that testers should skip tests or skimp on coverage. Rather, I’m saying that world-class systems and processes cannot be built overnight. The most important thing a team can do is build basic end-to-end feedback loops from the start, especially for test automation.

The Quality Feedback Loop

So many times, I’ve seen teams skew their test automation strategy entirely towards implementation. They spend weeks and weeks developing suites of automated tests before they set up any form of Continuous Testing. Instead of triggering tests as part of Continuous Integration, folks must manually push buttons or run commands to make them start. Other folks on the team see results sporadically, if ever. When testers open bug reports, developers might feel surprised.

I recommend teams set up Continuous Testing with feedback loops from the start. As soon as you automate your first test, move onto running it from CI and sending you notifications for results before automating your second test. Close the feedback loop. Start delivering results immediately. As you find hotspots, add more coverage. Talk with developers about the kinds of results they find most valuable. Then, grow your suite once you demonstrate its value. Increase the throughput. Turn those sidewalks into highways. Continue to iteratively improve upon the system as you go. Don’t waste time on tests that don’t matter or dashboards that nobody reads. Going lean means allocating your resources to the most valuable activities. What you’ll find is that success will snowball!

Conviction #7: Open up

Once you have a good thing going, whether it’s woodblock printing or software testing, how can you take it to the next level? Open up! Innovation stalls when you end up staring at your own belly button for too long. Outside influences inspire new creativity.

Ukiyo-e prints had a profound impact on Western art. After Japan opened up to the rest of the world in the mid-1800s, Europeans became fascinated by Japanese art, and European artists began incorporating Japanese styles and subjects into their work. This phenomenon became known as Japonisme. Here, Claude Monet, famous for his impressionist paintings, painted a picture of his wife wearing a kimono with fans adorning the wall behind her:

La Japonaise
Claude Monet, 1876

Vincent van Gogh in particular loved Japanese woodblock prints. He painted his own versions of different prints. Here, we see Hiroshige’s Plum Garden at Kameido side-by-side with Van Gogh’s Flowering Plum Orchard (after Hiroshige):

Van Gogh was drawn to the bold lines and vibrant colors of ukiyo-e prints. There is even speculation that The Great Wave inspired the design of The Starry Night, arguably Van Gogh’s most famous painting:

Notice how the shapes of the waves mirror the shapes of the swirls in the sky. Notice also how deep shades of blue contrast yellows in each. Ukiyo-e prints served as great inspiration for what became known as Modern art in the West.

Influence was also bidirectional. Not only did Japan influence the West, but the West influenced Japan! One thing common to all of the prints in Thirty-six Views of Mount Fuji is the extensive use of blue ink. Prussian blue pigment had recently come to Japan from Europe, and Hokusai’s publisher wanted to make extensive use of the new color to make the prints stand out. Indeed, they did. To this day, Hokusai is renowned for popularizing the deep shades of Prussian blue in ukiyo-e prints.

It’s important in any line of work to be open to new ideas. If Hokusai had not been willing to experiment with new pigments, then we wouldn’t have pieces like The Great Wave.

That’s why I’m a huge proponent of Open Testing. What if we open our tests like we open our source? There are so many great advantages to open source software: helping folks learn, helping folks develop better software, and helping folks become better maintainers. If we become more open in our testing, we can improve the quality of our testing work, and thus also the quality of the software products we are building. Open testing involves many things: building open source test frameworks, getting developers involved in testing, and even publicly sharing test cases and results.

Conviction #8: Show empathy

In this article, we’ve seen lots of great artwork, and we’ve learned lots of valuable lessons from it. I think ukiyo-e prints remain popular today because their subject matter focuses on the beauty of the world. Artists strived to make pieces of the “floating world” tangible for the common people.

Ukiyo-e prints revealed the supple humanity of the Japanese people, like in this print by Utagawa Kunisada:

Twilight Snowfall at Ueno
Utagawa Kunisada, 1850

They revealed the serene beauty of nature in harmony with civilization, like in these prints from Hiroshige’s One Hundred Famous Views of Edo:

Prints from One Hundred Famous Views of Edo
Utagawa Hiroshige, 1856-1858

Ukiyo-e prints also revealed ordinary people living out their lives, like this print from Hokusai’s Thirty-six Views of Mount Fuji:

Fuji View Field in Owari Province
Katsushika Hokusai, 1830

Art is compelling. And software, like art, is meant for people. Show empathy. Care about your customers. Remember, as a tester, you are advocating for your users. Try to help solve their problems. Do things that matter for them. Build things that actually bring them value. Be thoughtful, mindful, and humble. Don’t be a jerk.

The Golden Conviction

These eight convictions are things I’ve learned the hard way throughout my career:

  1. Focus on behavior
  2. Prioritize on risk
  3. Automate
  4. Shift left and right
  5. Give fast feedback
  6. Go lean
  7. Open up
  8. Show empathy

I live and breathe these convictions every day. Whether you are making woodblock prints or running test cases, these principles can help you do your best work.

If I could sum up these eight convictions in one line, it would be this: Be excellent in all things. If you test software, then you are both an artist and an engineer. You have a craft. Do it with excellence.

Open Testing: Opening tests like opening source

This article is based on a talk I gave on Open Testing at a few conferences: STARWEST 2021, TAU: The Homecoming, TSQA 2022, QA or the Highway 2022, and Conf42: SRE 2022.

I’m super excited to introduce a somewhat new idea to you and to our industry: Open Testing: What if we open our tests like we open our source? I’m not merely talking about creating open source test frameworks. I’m talking about opening the tests themselves. What if it became normal to share test cases and automated procedures? What if it became normal for companies to publicly share their test results? And what are the degrees of openness in testing for which we should strive as an industry?

I think that we – whether we are testers, developers, managers, or any other role in software – can greatly improve the quality of our work if we adopt principles of openness into our testing practices. To help me explain, I’d like to share how I learned about the main benefits of open source software, and then we can cross those benefits over into testing work.

So, let’s go way back in time to when I first encountered open source software.

My first encounter with open source code

I first started programming when I was in high school. At 13 years old, I was an incoming freshman at Parkville High School in their magnet school for math, science, and computer science in good old Baltimore, Maryland. (Fun fact: Parkville’s mascots were the Knights, which is my last name!) All students in the magnet program needed to have a TI-83 Plus graphing calculator. Now, mind you, this was back in the day before smart phones existed. Flip phones were the cool trend! The TI-83 Plus was cutting-edge handheld technology at that time. It was so advanced that when I first got it, it took me 5 minutes to figure out how to turn it off!

The TI-83 Plus

I quickly learned that the TI-83 Plus was just a mini-computer in disguise. Did you know that this thing has a full programming language built into it? TI-BASIC! Within the first two weeks of my freshman Intro to Computer Science class, our teacher taught us how to program math formulas: Slope. Circle circumference and area. The quadratic formula. You name it, I programmed it, even if it wasn’t a homework assignment. It felt awesome! It was more fun to me than playing video games, and believe me, I was a huge Nintendo fan.

There were two extra features of the TI-83 Plus that made it ideal for programming. First, it had a link cable for sharing programs. Two people could connect their calculators and copy programs from one to the other. Needless to say, with all my formulas, I became quite popular around test time. Second, anyone could open any program file on the calculator and read its code. The TI-BASIC source code could not be hidden. By design, it was “open source.”

This is how I learned my very first lesson about open source software: Open source helps me learn. Whenever I would copy programs from others, including games, I would open the program and read the code to see how it worked. Sometimes, I would make changes to improve it. More importantly, though, many times, I would learn something new that would help me write better programs. This is how I taught myself to code. All on this tiny screen. All through ripping open other people’s code and learning it. All because the code was open to me.

From the moment I wrote my first calculator program, I knew I wanted to become a software engineer. I had that spark.

My first open source library

Let’s fast-forward to college. I entered the Computer Science program at Rochester Institute of Technology – Go Tigers! By my freshman year in college, I had learned Java, C++, a little Python, and, of all things, COBOL. All the code in all my projects until that point had been written entirely by me. Sometimes, I would look at examples in books as a guide, but I’d never use other people’s code. In fact, if a professor caught you using copied code, then you’d fail that assignment and risk being expelled from the school.

Then, in my first software engineering course, we learned how to write unit tests using a library called JUnit. We downloaded JUnit from somewhere online – this was before Maven became big – and hooked it into our Java path. Then, we started writing test classes with test case methods, and somehow, it all ran magically in ways I couldn’t figure out at the time.

I was astounded that I could use software that I didn’t write myself in a project. Permission from a professor was one thing, but the fact that someone out there in the world was giving away good code for free just blew my mind. I saw the value in unit tests, and I immediately saw the value in a simple, free test framework like JUnit.

That’s when I learned my second lesson about open source software: Open source helps me become a better developer. I could have written my own test framework, but that would have taken me a lot of time. JUnit was ready to go and free to use. Plus, since several individuals had already spent years developing JUnit, it would have more features and fewer bugs than anything I could develop on my own for a college project. Using a package like JUnit helped me write and run my unit tests without needing to become an expert in test automation frameworks. I could build cool things without needing to build every single component.

That revelation felt empowering. Within a few years of taking that software engineering course, sites for hosting open source projects like GitHub became huge. Programming language package indexes like Maven, NuGet, PyPI, and NPM became development mainstays. The running joke within Python became that you could import anything! This was way better than swapping calculator games with link cables.

My first chance to give back

When I graduated college, I was zealous for open source software. I believed in it. I was an ardent supporter. But, I was mostly a consumer. As a Software Engineer in Test, I used many major test tools and frameworks: JUnit, TestNG, Cucumber, NUnit, xUnit.net, SpecFlow, pytest, Jasmine, Mocha, Selenium WebDriver, RestSharp, Rest Assured – the list goes on and on. As a Python developer, I used many modules and frameworks in the Python ecosystem like Django, Flask, and requests.

Then, I got the chance to give back: I launched an open source project called Boa Constrictor. Boa Constrictor is a .NET implementation of the Screenplay Pattern. It helps you make better interactions for better automation. Out of the box, it provides Web UI interactions using Selenium WebDriver and Rest API interactions using RestSharp, but you can use it to implement any interactions you want.

My company and I released Boa Constrictor publicly in October 2020. You can check out the boa-constrictor repository on GitHub. Originally, my team and I at Q2 developed all the code. We released it as an open source project hoping that it could help others in the industry. But then, something cool happened: folks in the industry helped us! We started receiving pull requests for new features. In fact, we even started using some new interactions developed by community members internally in our company’s test automation project. We also proudly participated in Hacktoberfest in 2020 and 2021.

Boa Constrictor: The .NET Screenplay Pattern

That’s when I learned my third lesson about open source software: Open source helps me become a better maintainer. Large projects need all the help they can get. Even a team of core maintainers can’t always handle all the work. However, when a project is open source, anyone who uses it can help out. Each little contribution can add value for the whole user base. Maintaining software then becomes easier, and the project can become more impactful.

Struggling with poor quality

As a Software Engineer in Test, I found myself caught between two worlds. In one world, I was a developer at heart who loved to write code to solve problems. In the other world, I was a software quality professional who tested software and advocated for improvements. These worlds came together primarily through test automation and continuous integration. Now that I’m a developer advocate, I still occupy this intersectionality with a greater responsibility for helping others.

However, throughout my entire career, I keep hitting one major problem: Software quality has a problem with quality. Let that sink in: software quality has a big problem with quality. I’ve worked on teams with titles ranging from “Software Quality Assurance” to “Test Engineering & Architecture,” and even an “Automation Center of Excellence.” Despite the titular focus on quality, every team has suffered from aspects of poor quality in workmanship.

Here are a few poignant examples:

  • Manual test case repositories are full of tests with redundant steps.
  • Test automation projects are riddled with duplicate code.
  • Setup and cleanup steps are copy-pasted endlessly, whether needed or not.
  • Automation code uses poor practices, such as global variables instead of dependency injection.
  • A 90% success rate is treated as a “good” day with “limited” flakiness.
  • Many tests cover silly, pointless, or unimportant things instead of valuable, meaningful behaviors.

How can we call ourselves quality professionals when our own work suffers from poor quality? Why are these kinds of problems so pervasive? I think they build up over time. Copy-pasting one procedure feels innocuous. One rogue variable won’t be noticed. One flaky test is no big deal. Once this starts happening, teams insularly keep repeating these practices until they make a mess. I don’t think giving teams more time to work on these problems will solve them, either, because more time does not interrupt inertia – it merely prolongs it.

The developer in me desperately wants to solve these problems. But how? I can do it in my own projects, but because my tests are sealed behind company doors, I can’t use it to show others how to do it at scale. Many of the articles and courses we have on how-to-do-X are full of toy examples, too.

Changing our quality culture

So, how do we get teams to break bad habits? I think our industry needs a culture change. If we could be more open with testing like we are open with source code, then perhaps we could bring many of the benefits we see from open source into testing:

  1. Helping people learn testing
  2. Helping people become better testers
  3. Helping people become better test maintainers

If we cultivate a culture of openness, then we could lead better practices by example. Furthermore, if we become transparent about our quality, it could bolster our users’ confidence in our products while simultaneously keeping us motivated to keep quality high.

There are multiple ways to start pursuing this idea of open testing. Not every possibility may be applicable for every circumstance, but my goal is to get y’all thinking about it. Hopefully, these ideas can inspire better practices for better quality.

Openness through internal collaboration

For a starting point of reference, let’s consider the least open context for testing. Imagine a team where testing work is entirely siloed by role. In this type of team, there is a harsh line between developers and testers. Only the testers ever see test cases, access test repositories, or touch automation. Test cases and test plans are essentially “closed” to non-testers due to access, readability, or even apathy. The only output from testers are failure percentages and bug reports. Results are based more on trust than on evidence.

This kind of team sounds pretty bleak. I hope this isn’t the kind of team you’re on, but maybe it is. Let’s see how openness can make things better.

The first step towards open testing is internal openness. Let’s break down some siloes. Testers don’t exclusively own quality. Not everyone needs to be a tester by title, but everyone on the team should become quality-conscious. In fact, any software development team has three main roles: Business, Development, and Testing. Business looks for what problems to solve, Development addresses how to implement solutions, and Testing provides feedback on the solution. These three roles together are known as “The Three Amigos” or “The Three Hats.”

Each role offers a valuable perspective with unique expertise. When the Three Amigos stay apart, features under development don’t have the benefit of multiple perspectives. They might have serious design flaws, they might be unreasonable to implement, or they might be difficult to test. Misunderstandings could also cause developers to build the wrong things or testers to write useless tests. However, when the Three Amigos get together, they can jointly contribute to the design of product features. Everyone can get on the same page. The team can build quality into the product from the start. They could do activities like Question Storming and Example Mapping to help them define behaviors.

The Three Amigos

As part of this collaboration, not everyone may end up writing tests, but everyone will be thinking about quality. Testing then becomes easier because expected behaviors are well-defined and well-understood. Testers get deeper insight into what is important to cover. When testers share results and open bugs, other team members are more receptive because the feedback is more meaningful and more valuable.

We practiced Three Amigos collaboration at my previous company, Q2. My friend Steve was a developer who saw the value in Example Mapping. Many times, he’d pick up poorly-defined user stories with conflicting information or missing acceptance criteria. Sometimes, he’d burn a whole sprint just trying to figure things out! Once he learned about Example Mapping, he started setting up half-hour sessions with the other two Amigos (one of whom was me) to better understand user stories from the start. He got into it. Thanks to proactive collaboration, he could develop the stories more smoothly. One time, I remember we stopped working on a story because we couldn’t justify its business value, which saved Steve two weeks of pointless work. The story didn’t end there: Steve became a Software Engineer in Test! He shifted left so hard that he shifted into a whole new role.

Openness through living specs

Another step towards open testing is living documentation through specification by example. Collaboration like we saw with the Three Amigos is great, but the value it provides can be fleeting if it is not written down. Teams need artifacts to record designs, examples, and eventually test cases.

One reason why I love Example Mapping is because it facilitates a team to spell out stories, rules, examples, and questions onto color-coded cards that they can keep for future refinement.

  1. Stories become work items.
  2. Rules become acceptance criteria.
  3. Examples become test cases.
  4. Questions become spikes or future stories.

During Example Mapping, folks typically write cards quickly. An example card describes a behavior to test, but it might not carefully design the scenario. It needs further refinement. Defining behaviors using a clear, concise format like Given-When-Then makes behaviors easy to understand and easy to test.

For example, let’s say we wanted to test a web search engine. The example could be to search for a phrase like”panda”. We could write this example as the following scenario:

  1. Given the search engine page is displayed
  2. When the user searches for the phrase “panda”
  3. Then the results page shows a list of links for “panda”

This special Given-When-Then format is known as the Gherkin language. Gherkin comes from Behavior-Driven Development tools like Cucumber, but it can be helpful for any type of testing. Gherkin defines testable behaviors in a concise way that follows the Arrange-Act-Assert pattern. You set things up, you interact with the feature, and you verify the outcomes.

Furthermore, Gherkin encourages Specification by Example. This scenario provides clear instructions on how to perform a search. It has real data, which is the search phrase “panda,” and clear results. Using real-world examples in specifications like this helps all Three Amigos understand the precise behavior.

Turning Example Mapping cards into Gherkin behavior specs

Behavior specifications are multifaceted artifacts:

  1. They are requirements that define how a feature should behave.
  2. They are acceptance criteria that must be met for a deliverable to be complete.
  3. They are test cases with clear instructions.
  4. They could become automated scripts with the right kind of test framework.
  5. They are living documentation for the product.

Living documentation is open and powerful. Anyone on the team or outside the team can read it to learn about the product. Refining ideas into example cards into behavior specs becomes a pipeline that delivers living doc as a byproduct of the software development lifecycle.

SpecFlow is one of the best frameworks that supports this type of openness with Specification by Example and Living Documentation. SpecFlow is a free and open-source test automation framework for .NET. In SpecFlow, you write your test cases as Gherkin scenarios, and you automate each Given-When-Then step using C# methods.

One of SpecFlow’s niftiest features, however, is SpecFlow+ LivingDoc. Most test frameworks focus exclusively on automation code. When a test is automated, then only a programmer can read it and understand it. Gherkin makes this easier because steps are written in plain language, but Gherkin scenarios are nevertheless stored in a code repository that’s inaccessible to many team members. SpecFlow+ LivingDoc breaks that pattern. It turns Gherkin scenarios into a searchable doc site accessible to all Three Amigos. It makes test cases and test automation much more open. LivingDoc also provides test results for each scenario. Green check marks indicate passing tests, while red X’s indicate failures.

Historically, testers use reports like this to provide feedback in-house to their managers and developers. Results indicate what works and what needs to be fixed. However, test results can be useful to more people than just internal team members. What if test results were shared with users and customers? I’m going to repeat that statement, because it might seem shocking: What if users and customers could see test results? 

Think about it. Open test results have very positive effects. Transparency with users builds trust. If users can see that things are tested and working, then they will gain confidence in the quality of the product. If they could peer into the living documentation, then they could learn how to use the product even better. On the flip side, transparency holds development teams accountable to keeping quality high, both in the product and in the testing. Open test results offer these benefits only if the results can be trusted. If tests are useless or failures are rampant, then public test results could actually hurt the ones developing the product.

A SpecFlow+ LivingDoc report

This type of radical transparency would require an enormous culture shift. It may not be appropriate for every company to create public dashboards with their test results, but it could be a strategic differentiator when used wisely. For example, when I worked at Q2, we shared LivingDoc reports with specific PrecisionLender customers after every two-week release. It built trust. Plus, since the LivingDoc report includes only high-level behavior specs with simple results, even a vice president could read it! We could share tests without sharing automation code. That was powerful.

Openness through open source

Let’s keep extending open testing outward. In addition to sharing test results and living documentation, folks can also share tools, frameworks, and other parts of their tests. This is where open testing truly is open source.

We already covered a bunch of open source projects for test automation. As an industry, we are truly blessed with so many incredible projects. Every single one of them represents a team of testers who not only solved a problem but decided to share their solution with the world. Each solution is abstract enough to apply to many circumstances but concrete enough to provide a helpful implementation. Collectively, the projects on this page have probably been downloaded more than a billion times, and that’s no joke. And if you want, you could read the open source code for any of them.

Popular open source test automation projects

Cool new projects appear all the time, too. One of my favorite projects that started in the past few years is Playwright, an awesome browser automation tool from Microsoft. Playwright makes end-to-end web testing easy, reliable, and fast. It provides cross-browser and cross-language support like Selenium, a concise syntax like Cypress, and a bunch of advanced features like automatic waiting, tracing, and code generation. Plus, Playwright is magnitudes faster than other automation tools. It took things that made Selenium, Cypress, and Puppeteer great, and it took them to the next level.

Openness through shared test suites

So far, all the ways of approaching open testing are things we could do today. Many of us are probably already doing these things, even if we didn’t think of them under the phrase “open testing.” But where can these ideas go in the future?

My mind goes back to one of the big problems with testing that I mentioned earlier: duplication. Opening up collaboration fixes some bad habits, and sharing components eliminates some duplication in the plumbing of test automation, but so many of our tests across the industry repeat the same kinds of steps and follow the same types of patterns.

For example, think about any time you’ve ordered something from an online store. It could be Amazon, Walmart, Target – whatever. Every single online store has a virtual shopping cart. Whenever you want to buy something, you add it to your cart. Then, when you’re done shopping, you proceed to pay for all the items in your cart. If you decide you don’t want something anymore, you remove it from the cart. Easy-peasy.

As I describe this type of shopping cart, I don’t need to show you screenshots from the store website to explain it. Y’all have done so much online shopping that you intuitively know how it works, regardless of the store. Heck, I recently ordered a bunch of parts for an old Volkswagen Beetle from a site named JBugs, and the shopping cart was the same.

If so many applications have the same parts, then why do we keep duplicating the same tests in different places? Think about it. Think about how many times different teams have written nearly identical shopping cart tests. Ouch. Think about how much time was wasted on that duplication of effort.

I think this is something where Artificial Intelligence and Machine Learning could help. What if we could develop machine learning models to learn common behaviors for apps and services? The learning agents would look for things like standard icons and typical workflows. We could essentially create test suites for things like login, search, shopping, and payment that could run successfully on most apps. These kinds of tests probably couldn’t cover everything in any given application, but they could cover basic, common behaviors. Maybe that could cover a quarter of all behaviors worth testing. Maybe a third? Maybe half? Every little bit helps!

AI and ML can help us achieve true Autonomous Testing

Now, imagine sharing those generic test suites publicly. In the same way developers have open source projects to help expedite their coding, and in the same way data scientists have open data sets to use for modeling, testers could have open test suites that they could pick up and run as applicable. Not test tools – but actual runnable tests that could run against any application. If these kinds of test suites prove to be valuable, then prominent ones could become universally-accepted bars of quality for software apps. For example, in the future, companies could download and execute tests that run on any system for the apps they’re developing in addition to the tests they develop in-house. I think that could be a really cool opportunity.

This type of testing – Autonomous Testing – is the future. Software developer and testers will use AI-backed tools to better learn, explore, and exercise app behaviors. These tools will make it easier than ever to automate scriptable tests.

How to start pursuing openness

As we have covered, open testing could take many forms:

  1. It could be openness in collaboration to build better quality from the start.
  2. It could be openness in specification by example and living documentation.
  3. It could be openness in sharing tests and their results with customers and users.
  4. It could be openness in sharing tools, frameworks, and platforms.
  5. It could be openness in building shared test sets for common application behaviors.

Some of these ideas might seem far-fetched or aspirational, but quite honestly, I think each of them could add lots of value to testing practices. I think every tester and every team should look at this list and ask themselves, “Could we try some of these things?” Perhaps your team could take baby steps with better collaboration or better specification. Perhaps your team has a cool project you built in-house that you could release as an open source project, like my old team and I did with Boa Constrictor. Perhaps there’s a startup idea in using machine learning for autonomous testing. Perhaps there are other ways to achieve open testing that aren’t listed here. Who knows? It could be cool!

We should also consider the flip side. Are there certain aspects of testing that should remain closed? My mind goes to security. Could fully open testing inadvertently reveal security vulnerabilities? Could lack of coverage in some areas welcome expedited exploitation? I don’t know, but I think we should consider possibilities like these.

If you want to pursue open testing, here are three questions to get you started:

  1. How is your testing today?
    1. In what ways is it already open?
    2. In what ways is it closed?
  2. How could your testing improve with incremental openness?
    1. We’re talking baby steps here – small improvements that you could easily achieve today.
    2. It could be as small as trying Example Mapping or joining a mob programming session.
  3. How could your testing improve with radical openness?
    1. Shoot the moon! Dream big! Get creative!
    2. In the world of software, anything is possible.

Conclusion

We should also remember that open testing isn’t a goal unto itself. It’s a means to an end, and that end is higher quality: quality in our practices, quality in our artifacts, and ultimately quality in the software we create. We shouldn’t seek openness in testing just because I’m spouting lots of buzzwords in this article. At the same time, we also shouldn’t brush off these ideas as too radical or idealistic. What we should do is seek ways for perpetual improvement. Remember that this whole idea of open testing came from the tried-and-true benefits of open source code.

How Q2 uses BDD with SpecFlow for testing PrecisionLender

This case study was written by Andrew Knight, Lead Software Engineer in Test for Q2’s PrecisionLender product, in collaboration with Q2 and Tricentis. It explains the PrecisionLender team’s continuous testing journey and how SpecFlow served as a cornerstone for success.

What is PrecisionLender?

PrecisionLender is a web application that empowers commercial bankers with in-the-moment insights that help them structure and price commercial deals. Andi®, PrecisionLender’s intelligent virtual analyst, delivers these hyper-focused recommendations in real-time, allowing relationship managers to make data-driven decisions while pricing their commercial deals. PrecisionLender is owned and developed by Q2, a financial experience software company dedicated to providing digital banking and lending solutions to banks, credit unions, alternative finance, and fintech companies in the U.S. and internationally.

The PrecisionLender Opportunity Screen
(Picture taken from the PrecisionLender Support Center)

The starting point

The PrecisionLender team had a robust Continuous Integration (CI) delivery pipeline with strong unit test coverage, but they lacked end-to-end feature coverage. Developers would fill this gap by manually inspecting their changes in a shared development environment. However, as the PrecisionLender app grew, manual checks could not cover all possible integrations. The team knew they needed continuous automated testing to provide a safety net for development to remain lean and efficient. In April 2018, they hired Andrew Knight as their first Software Engineer in Test (SET) – a new role for the company – to lead the effort.

Automating tests with SpecFlow

The PrecisionLender team developed the Boa test solution – a project for automating end-to-end tests at scale. Boa would become PrecisionLender’s internal platform for test automation development. The name “Boa” is a loose acronym for “Behavior-Oriented Automation.”

The team chose SpecFlow to be the core framework for Boa tests. Since the PrecisionLender app’s backend is developed using .NET, SpecFlow was a natural fit. SpecFlow’s Gherkin syntax made tests readable and understandable, even to product owners and product support specialists who do not code.

The SpecFlow framework integrates with tools like Selenium WebDriver for testing Web UIs and RestSharp for testing REST APIs to exercise vital pathways for thorough app coverage. SpecFlow’s dependency injection mechanisms are solid yet simple, and the online docs are thorough. Plus, SpecFlow is an open-source project, so anyone can look at its code to learn how things work, open requests for new features, and even offer code contributions.

An example Boa test, written in Gherkin using SpecFlow.

Executing tests with SpecFlow+ Runner

Writing good tests was only part of the challenge. The PrecisionLender team needed to execute Boa tests continuously to provide fast feedback on changes to the app. The team chose to run Boa tests using SpecFlow+ Runner, which is tailored for SpecFlow tests. The team uses SpecFlow+ Runner to launch tests in parallel in TeamCity any time a developer deploys a code change to internal pre-production environments. The entire test suite also runs every night against multiple product configurations. SpecFlow+ Runner produces a helpful test report with everything needed to triage test failures: pass-and-fail tallies overall and per feature, a visual execution timeline, and full system logs. If engineers need to investigate certain failures more closely, they can use SpecFlow tags and SpecFlow+ Runner profiles to selectively filter tests for reruns. SpecFlow+ Runner’s multiple features help the team expedite test execution and investigation.

The SpecFlow+ Runner report for a dozen smoke tests.

Sharing features with SpecFlow+ LivingDoc

Good test cases are more than just verification procedures – they are behavior specifications. They define how features should work. Instead of keeping testing work siloed by role, the PrecisionLender team wanted to share Boa tests as behavior specs with all stakeholders to foster greater collaboration and understanding around features. The team also wanted to share Boa tests with specific customers without sharing the entire automation code.

SpecFlow+ LivingDoc enabled the PrecisionLender team to turn Gherkin feature files into living documentation. Whereas the SpecFlow+ Runner report focuses on automation execution, the SpecFlow+ LivingDoc report focuses on behavior specification apart from coding and automation details. LivingDoc displays Gherkin scenarios in a readable, searchable way that both internal folks and customers can consume. It can also optionally include high-level pass-and-fail results for each scenario, providing just enough information to be helpful and not overwhelming. LivingDoc has also helped PrecisionLender’s engineers identify and eliminate unused step definitions within the automation code. PrecisionLender benefits greatly from complementary reports from SpecFlow+ Runner and SpecFlow+ LivingDoc.

The SpecFlow+ LivingDoc report for a dozen smoke tests with their pass-and-fail results.

Improving interactions with Boa Constrictor

The Boa test solution initially used the Page Object Model to model interactions with the PrecisionLender app. However, as the PrecisionLender team automated more and more Boa tests, it became apparent that page objects did not scale well. Many page object classes had duplicative methods, making automation code messy. Some methods also did not include appropriate waiting mechanisms, introducing flaky failures.

PrecisionLender’s SETs developed Boa Constrictor, a .NET implementation of the Screenplay Pattern, to make better interactions for better automation. In Screenplay, actors use abilities to perform interactions. For example, an ability could be using Selenium WebDriver, and an interaction could be clicking an element. The Screenplay Pattern can be seen as a refactoring of the Page Object Model that minimizes duplicate code through a better separation of concerns. Individual interactions can be hardened for robustness, eliminating flaky hotspots. The Boa test solution now exclusively uses Boa Constrictor for interactions.

In October 2020, Q2 released Boa Constrictor as an open-source project so that anyone can use it. It is fully compatible with SpecFlow and other .NET test frameworks, and it provides rich interactions for Selenium WebDriver and RestSharp out of the box.

Boa Constrictor, the .NET Screenplay Pattern.

Scaling massively with Selenium Grid

When the PrecisionLender team first started automating Boa tests, they ran tests one at a time. That soon became too slow since the average Boa test took 20 to 50 seconds to complete. The team then started running up to 3 tests in parallel on one machine, but that also was not fast enough. They turned to Selenium Grid, a tool for running WebDriver sessions remotely across multiple machines.

PrecisionLender built a set of internal Selenium Grid instances using Microsoft Azure virtual machines to run Boa tests at high scale. As of July 2021, PrecisionLender has over 1800 unique Boa tests that run across four distinct product configurations. Whenever TeamCity detects a code change, it triggers a “continuous” Boa test suite with over 1000 tests running 50 parallel tests using Google Chrome on Selenium Grid. It completes execution in about 10 minutes. TeamCity launches the full test suite every night against all product configurations with 64-100 parallel tests on Selenium Grid. Continuous Integration currently runs up to 10K Boa tests daily against the PrecisionLender app with SpecFlow+ Runner and Selenium Grid.

The Boa test solution architecture, including Continuous Integration through TeamCity and parallel testing with SpecFlow+ Runner and Selenium Grid.

Shifting left with BDD

Better testing and automation practices eventually inspired better development practices. Product owners would create user stories, but developers would struggle to understand requirements and business purposes fully. PrecisionLender’s SETs started bringing together the Three Amigos – business, development, and testing roles – to discuss product behaviors proactively while creating user stories. They introduced Behavior-Driven Development (BDD) activities like Example Mapping to explore behaviors together. Then, well-defined stories could be easily connected to SpecFlow tests written in Gherkin following Specification by Example (SBE). Teams repeatedly saved time by thinking before coding and specifying before testing. They built higher quality into features from the beginning, and they stopped before working on half-baked stories with unjustified value propositions. Developers who participated in these behavior-driven practices were also more likely to automate Boa tests on their own. Furthermore, one of PrecisionLender’s developers loved BDD practices so much that he joined the team of SETs! Through Gherkin, SpecFlow provided a foundation that enabled quality work to shift left.

Challenges along the way

Achieving true continuous testing had its challenges along the way. Intermittent failure was the most significant issue PrecisionLender faced at scale. With so many tests, environments, and infrastructural pieces, arbitrary failures were statistically unavoidable. The PrecisionLender team took a two-pronged approach to handle intermittent failures: (1) eliminate race conditions in automation using good interactions with Boa Constrictor, and (2) use SpecFlow+ Runner to automatically retry failed tests to determine if failures were consistent or intermittent. These two approaches reduced the frequency of flaky failures and helped engineers quickly resolve any remaining issues. As a result, Boa tests enjoy well above a 99% success rate, and most failures are due to actual bugs.

PrecisionLender app performance at scale was a second big challenge. Running up to 100 tests in parallel turned functional tests into de facto load tests. Testing at scale repeatedly uncovered performance bottlenecks in the app. Performance issues caused widespread test failures that were difficult to diagnose because they appeared intermittently. Still, the visual timeline and timestamps in the SpecFlow+ Runner report helped the team identify periods of failure that could be crosschecked against backend logs, metrics, and database queries. Developers resolved many performance issues and significantly boost the app’s response times and load capacity.

Training team members to develop solid test automation was the third challenge. At the start of the journey, test automation, Gherkin, and BDD were all new to PrecisionLender. The PrecisionLender SETs took active steps to train others on how to develop good tests and good automation through group workshops, Three Amigos meetings, and one-on-one mentoring sessions. They shared resources like the Automation Panda blog for how to write good tests and good Gherkin. The investment in education paid off: many developers have joined the SETs in writing readable, reliable Boa tests that run continuously.

Benefits to the business

Developing a continuous testing solution brought many incredible benefits to PrecisionLender. First, the quality of the PrecisionLender app improved because continuous testing provided fast feedback on failures that developers could quickly fix. Instead of relying on manual spot checks, the team could trust the comprehensive safety net of Boa tests to catch bugs. Many issues would be caught within an hour of a developer making a code commit, and the longest feedback cycle would be only one business day for the full nightly test suites to run. Boa tests catch failures before customers ever experience them. The continuous nature of testing enables PrecisionLender to publish new releases every two weeks.

Second, the high reliability of the Boa test solution means that the PrecisionLender team can trust test results. When a test passes, the behavior is working. When a test fails, there is a real bug. Reliability also means that engineers spend less time on automation maintenance and more time on more valuable activities, like developing new features and adding new tests. Quality is present in both the product code and the test code.

Third, continuous testing boosts customer confidence in PrecisionLender. Customers trust the software quality because they know that PrecisionLender thoroughly tests every release. The PrecisionLender team also shares SpecFlow+ LivingDoc reports with specific clients to prove quality.

A bright future

PrecisionLender’s continuous testing journey is not over. Since the PrecisionLender team hired its first SET, it has hired three more, in addition to a testing manager, to grow quality improvement efforts. Multiple development teams have written their own Boa tests, and they plan to write more tests independently. SpecFlow’s tools have been indispensable in helping the PrecisionLender team achieve successful quality assurance. As PrecisionLender welcomes more customers, the Boa solution will be ready to scale with more tests, more configurations, and more executions.

Should Gherkin Steps use Past, Present, or Future Tense?

Gherkin’s Given-When-Then syntax is a great structure for specifying behaviors. However, while writing Gherkin may seem easy, writing good Gherkin can be a challenge. One aspect to consider is the tense used for Gherkin steps. Should Gherkin steps use past, present, or future tense?

One approach is to use present tense for all steps, like this:

Scenario: Simple Google search
    Given the Google home page is displayed
    When the user searches for "panda"
    Then the results page shows links related to "panda"

Notice the tense of each verb:

  1. the home page is – present
  2. the user searches – present
  3. the results page shows – present

Present tense is the simplest verb tense to use. It is the least “wordy” tense, and it makes the scenario feel active.

An alternative approach is to use past-present-future tense for Given-When-Then steps respectively, like this:

Scenario: Simple Google search
    Given the Google home page was displayed
    When the user searches for "panda"
    Then the results page will show links related to "panda"

Notice the different verb tenses in this scenario:

  1. the home page was – past
  2. the user searches – present
  3. the result page will show – future

Scenarios exercise behavior. Writing When steps using present tense centers the scenario’s main actions in the present. Since Given steps must happen before the main actions, they would be written using past tense. Likewise, since Then steps represent expected outcomes after the main actions, they would be written using future tense.

Both of these approaches – using all present tense or using past-present-future in order – are good. Personally, I prefer to write all steps using present tense. It’s easier to explain to others, and it frames the full scenario in the moment. However, I don’t think other approaches are good. For example, writing all steps using past tense or future tense would seem weird, and writing steps in order of future-present-past tense would be illogical. Scenarios should be centered in the present because they should timelessly represent the behaviors they cover.

Want to learn more? Check out my other BDD articles, especially Writing Good Gherkin.

Solving: How to write good UI interaction tests? #GivenWhenThenWithStyle

Writing good Gherkin is a passion of mine. Good Gherkin means good behavior specification, which results in better features, better tests, and ultimately better software. To help folks improve their Gherkin skills, Gojko Adzic and SpecFlow are running a series of #GivenWhenThenWithStyle challenges. I love reading each new challenge, and in this article, I provide my answer to one of them.

The Challenge

Challenge 20 states:

This week, we’re looking into one of the most common pain points with Given-When-Then: writing automated tests that interact with a user interface. People new to behaviour driven development often misunderstand what kind of behaviour the specifications should describe, and they write detailed user interactions in Given-When-Then scenarios. This leads to feature files that are very easy to write, but almost impossible to understand and maintain.

Here’s a typical example:

Scenario: Signed-in users get larger capacity
 
Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "Login"
And the user enters "testuser123" into the "username" field
And the user enters "$Pass123" into the "password" field
And the user clicks on "Sign in"
And the page reloads
Then the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 
And the user clicks on "spreadsheet formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "1mb-sheet.xlsx" 
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx"

A common way to avoid such issues is to rewrite the specification to avoid the user interface completely. We’ve looked into that option several times in this article series. However, that solution only applies if the risk we’re testing is not in the user interface, but somewhere below. To make this challenge more interesting, let’s say that we actually want to include the user interface in the test, since the risk is in the UI interactions.

Indeed, most behavior-driven practitioners would generally recommend against phrasing steps using language specific to the user interface. However, there are times when testing a user interface itself is valid. For example, I work at PrecisionLender, a Q2 Company, and our main web app is very heavy on the front end. It has many, many interconnected fields for pricing commercial lending opportunities. My team has quite a few tests to cover UI-centric behaviors, such as verifying that entering a new interest rate triggers recalculation for summary amounts. If the target behavior is a piece of UI functionality, and the risk it bears warrants test coverage, then so be it.

Let’s break down the example scenario given above to see how to write Gherkin with style for user interface tests.

Understanding Behavior

Behavior is behavior. If you can describe it, then you can do it. Everything exhibits behavior, from the source code itself to the API, UIs, and full end-to-end workflows. Gherkin scenarios should use verbiage that reflects the context of the target behavior. Thus, the example above uses words like “click,” “select,” and “open.” Since the scenario explicitly covers a user interface, I think it is okay to use these words here. What bothers me, however, are two apparent code smells:

  1. The wall of text
  2. Out-of-order step types

The first issue is the wall of text this scenario presents. Walls of text are hard to read because they present too much information at once. The reader must take time to read through the whole chunk. Many readers simply read the first few lines and then skip the remainder. The example scenario has 27 Given-When-Then steps. Typically, I recommend Gherkin scenarios to have single-digit line length. A scenario with less than 10 steps is easier to understand and less likely to include unnecessary information. Longer scenarios are not necessarily “wrong,” but their longer lengths indicate that, perhaps, these scenarios could be rewritten more concisely.

The second issue in the example scenario is that step types are out of order. Given-When-Then is a formula for success. Gherkin steps should follow strict Given → When → Then ordering because this ordering demarcates individual behaviors. Each Gherkin scenario should cover one individual behavior so that the target behavior is easier to understand, easier to communicate, and easier to investigate whenever the scenario fails during testing. When scenarios break the order of steps, such as Given → Then → Given → Then in the example scenario, it shows that either the scenario covers multiple behaviors or that the author did not bring a behavior-driven understanding to the scenario.

The rules of good behavior don’t disappear when the type of target behavior changes. We should still write Gherkin with best practices in mind, even if our scenarios cover user interfaces.

Breaking Down Scenarios

If I were to rewrite the example scenario, I would start by isolating individual behaviors. Let’s look at the first half of the original example:

Given a user opens https://www.example.com using Chrome
And the user clicks on "Upload Files"
And the page reloads
And the user clicks on "Spreadsheet Formats"
Then the buttons "XLS" and "XLSX" show
And the user clicks on "XLSX"
And the user selects "500kb-sheet.xlsx"
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
And the user clicks on "XLSX"
And the user selects "1mb-sheet.xlsx"
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Here, I see four distinct behaviors covered:

  1. Clicking “Upload Files” reloads the page.
  2. Clicking “Spreadsheet Formats” displays new buttons.
  3. Uploading a spreadsheet file makes the filename appear on the page.
  4. Attempting to upload a spreadsheet file that is 1MB or larger fails.

If I wanted to purely retain the same coverage, then I would rewrite these behavior specs using the following scenarios:

Feature: Example site
 
 
Scenario: Choose to upload files
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
Then the page displays the "Spreadsheet Formats" link
 
 
Scenario: Choose to upload spreadsheets
 
Given the Example site is ready to upload files
When the user clicks the "Spreadsheet Formats" link
Then the page displays the "XLS" and "XLSX" buttons
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
Given the Example site is ready to upload spreadsheet files
When the user clicks the "XLSX" button
And the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx"

Now, each scenario covers each individual behavior. The first scenario starts with the Example site in a “blank” state: “Given the Example site is displayed”. The second scenario inherently depends upon the outcome of the first scenario. Rather than repeat all the steps from the first scenario, I wrote a new starting step to establish the initial state more declaratively: “Given the Example site is ready to upload files”. This step’s definition method may need to rerun the same operations as the first scenario, but it guarantees independence between scenarios. (The step could also optimize the operations, but that should be a topic for another challenge.) Likewise, the third and fourth scenarios have a Given step to establish the state they need: “Given the Example site is ready to upload spreadsheet files.” Both scenarios can share the same Given step because they have the same starting point. All three of these new steps are descriptive more than prescriptive. They declaratively establish an initial state, and they leave the details to the automation code in the step definition methods to determine precisely how that state is established. This technique makes it easy for Gherkin scenarios to be individually clear and independently executable.

I also added my own writing style to these scenarios. First, I wrote concise, declarative titles for each scenario. The titles dictate interaction over mechanics. For example, the first scenario’s title uses the word “choose” rather than “click” because, from the user’s perspective, they are “choosing” an action to take. The user will just happen to mechanically “click” a link in the process of making their choice. The titles also provide a level of example. Note that the third and fourth scenarios spell out the target file sizes. For brevity, I typically write scenario titles using active voice: “Choose this,” “Upload that,” or “Do something.” I try to avoid including verification language in titles unless it is necessary to distinguish behaviors.

Another stylistic element of mine was to remove explicit details about the environment. Instead of hard coding the website URL, I gave the site a proper name: “Example site.” I also removed the mention of Chrome as the browser. These details are environment-specific, and they should not be specified in Gherkin. In theory, this site could have multiple instances (like an alpha or a beta), and it should probably run in any major browser (like Firefox and Edge). Environmental characteristics should be specified as inputs to the automation code instead.I also refined some of the language used in the When and Then steps. When I must write steps for mechanical actions like clicks, I like to specify element types for target elements. For example, “When the user clicks the “Upload Files” link” specifies a link by a parameterized name. Saying the element is a link helps provides context to the reader about the user interface. I wrote other steps that specify a button, too. These steps also specified the element name as a parameter so that the step definition method could possibly perform the same interaction for different elements. Keep in mind, however, that these linguistic changes are neither “required” nor “perfect.” They make sense in the immediate context of this feature. While automating step definitions or writing more scenarios, I may revisit the verbiage and do some refactoring.

Determining Value for Each Behavior

The four new scenarios I wrote each covers an independent, individual behavior of the fictitious Example site’s user interface. They are thorough in their level of coverage for these small behaviors. However, not all behaviors may be equally important to cover. Some behaviors are simply more important than others, and thus some tests are more valuable than others. I won’t go into deep detail about how to measure risk and determine value for different tests in this article, but I will offer some suggestions regarding these example scenarios.

First and foremost, you as the tester must determine what is worth testing. These scenarios aptly specify behavior, and they will likely be very useful for collaborating with the Three Amigos, but not every scenario needs to be automated for testing. You as the tester must decide. You may decide that all four of these example scenarios are valuable and should be added to the automated test suite. That’s a fine decision. However, you may instead decide that certain user interface mechanics are not worth explicitly testing. That’s also a fine decision.

In my opinion, the first two scenarios could be candidates for the chopping block:

  1. Choose to upload files
  2. Choose to upload spreadsheets

Even though these are existing behaviors in the Example site, they are tiny. The tests simply verify that a user clicks makes certain links or buttons appear. It would be nice to verify them, but test execution time is finite, and user interface tests are notoriously slow compared to other tests. Consider the Rule of 1’s: typically, by orders of magnitude, a unit test takes about 1 millisecond, a service API test takes about 1 second, and a web UI test takes about 1 minute. Furthermore, these behaviors are implicitly exercised by the other scenarios, even if they don’t have explicit assertions.

One way to condense the scenarios could be like this:

Feature: Example site
 
 
Background:
 
Given the Example site is displayed
When the user clicks the "Upload Files" link
And the user clicks the "Spreadsheet Formats" link
And the user clicks the "XLSX" button
 
 
Scenario: Upload a spreadsheet file that is smaller than 1MB
 
When the user selects "500kb-sheet.xlsx" from the file upload dialog
Then the upload completes
And the table "Uploaded Files" contains a cell with "500kb-sheet.xlsx" 
 
 
Scenario: Upload a spreadsheet file that is larger than or equal to 1MB
 
When the user selects "1mb-sheet.xlsx" from the file upload dialog
Then the upload fails
And the table "Uploaded Files" does not contain a cell with "1mb-sheet.xlsx" 

This new feature file eliminates the first two scenarios and uses a Background section to cover the setup steps. It also eliminates the need for special Given steps in each scenario to set unique starting points. Implicitly, if the “Upload Files” or “Spreadsheet Formats” links fail to display the expected elements, then those steps would fail.

Again, this modification is not necessarily the “best” way or the “right” way to cover the desired behaviors, but it is a reasonably good way to do so. However, I would assert that both the 4-scenario feature file and the 2-scenario feature file are much better approaches than the original example scenario.

More Gherkin

What I showed in my answer to this Gherkin challenge is how I would handle UI-centric behaviors. I try to keep my Gherkin scenarios concise and focused on individual, independent behaviors. Try using these style techniques to rewrite the second half of Gojko’s original scenario. Feel free to drop your Gherkin in the comments below. I look forward to seeing how y’all write #GivenWhenThenWithStyle!

Using Domain-Specific Languages for Security Testing

I love programming languages. They have fascinated me ever since I first learned to program my TI-83 Plus calculator in ninth grade, many years ago. When I studied computer science in college, I learned how parsers, interpreters, and compilers work. During my internships at IBM, I worked on a language named Enterprise Generation Language as both a tester and a developer. At NetApp, I even developed my own language named DS for test automation. Languages are so much fun to learn, build, and extend.

Today, even though I do not actively work on compilers, I still do some pretty interesting things with languages and testing. I strongly advocate for Behavior-Driven Development and its domain-specific language (DSL) Gherkin. In fact, as I wrote in my article Behavior-Driven Blasphemy, I support using Gherkin-based BDD test frameworks for test automation even if a team is not also doing BDD’s collaborative activities. Why? Gherkin is the world’s first major off-the-shelf DSL for test automation, and it doesn’t require the average tester to know the complexities of compiler theory. DSLs like Gherkin can make tests easier to read, faster to write, and more reliable to run. They provide a healthy separation of concerns between test cases and test code. After working on successful large-scale test automation projects with C# and SpecFlow, I don’t think I could go back to traditional test frameworks.

I’m not the only one who thinks this way. Here’s a tweet from Dinis Cruz, CTO and CISO at Glasswall, after he read one of my articles:

Dinis then tweeted at me to invite me to speak about using DSLs for testing at the Open Security Summit in 2021:

Now, I’m not a “security guy” at all, but I do know a thing or two about DSLs and testing. So, I gladly accepted the invitation to speak! I delivered my talk, “Using DSLs for Security Testing” virtually on Thursday, January 14, 2021 at 10am US Eastern. I also uploaded my slides to GitHub at AndyLPK247/using-dsls-for-security-testing. Check out the YouTube recording here:

This talk was not meant to be a technical demo or tutorial. Instead, it was meant to be a “think big” proposal. The main question I raised was, “How can we use DSLs for security testing?” I used my own story to illustrate the value languages deliver, particularly for testing. My call to action breaks that question down into three parts:

  1. Can DSLs make security testing easier to do and thereby more widely practiced?
  2. Is Gherkin good enough for security testing, or do we need to make a DSL specific to security?
  3. Would it be possible to write a set of “standard” or “universal” security tests using a DSL that anyone could either run directly or use as a template?

My goal for this talk was to spark a conversation about DSLs and security testing. Immediately after my talk, Luis Saiz shared two projects he’s working on regarding DSLs and security: SUSTO and Mist. Dinis also invited me back for a session at the Open Source Summit Mini Summit in February to have a follow-up roundtable discussion for my talk. I can’t wait to explore this idea further. It’s an exciting new space for me.

If this topic sparks your interest, be sure to watch my talk recording, and then join us live in February 2021 for the next Open Source Summit event. Virtual sessions are free to join. Many thanks again to Dinis and the whole team behind Open Source Summit for inviting me to speak and organizing the events.

SpecFlow’s Online Gherkin Editor

Finding a good Gherkin editor is difficult. Some editors like Visual Studio Code and similar IDEs work great for engineers but aren’t suitable for product owners and non-programmer Amigos who want to contribute. Other editors like Notepad++ and Atom are lighter in weight but still require extensions and a little expertise. Fancy BDD tools like CucumberStudio and Cucumber for Jira provide Gherkin editors together with a bunch of other nifty features, but they require paid licenses.

For years, I’ve wanted a lightweight Gherkin editor that’s easy to use and accessible to all. Now, one finally exists: the Online Gherkin Editor by SpecFlow!

SpecFlow is the most popular BDD test automation framework for .NET. It’s also my favorite BDD framework. Over the past few years, I’ve built two large-scale test automation solutions with SpecFlow.

The Online Gherkin Editor by SpecFlow is just an editor on a web page. When you first load the page, the editor has example scenarios for you to reference. You can type your own Gherkin into the text area, and the editor highlights it for you. The editor provides line numbers and visual scrolling, too. My language is English, but if you happen to speak German, French, Spanish, or Dutch, then you can change the language setting via a dropdown. Once you’re done writing your Gherkin, you can clear it, copy it to the clipboard, or download it as a feature file using icons in the top-right corner. Be warned, though, that this editor won’t save your Gherkin in the cloud.

If you want to give this new editor a try, here’s the link: https://specflow.org/gherkin-editor/

You can also read SpecFlow’s official announcement here: https://specflow.org/blog/introducing-the-specflow-online-gherkin-editor/

Thanks, SpecFlow! Happy “Gherk-ing”!

Introducing Boa Constrictor: The .NET Screenplay Pattern

Today, I’m excited to announce the release of a new open source project for test automation: Boa Constrictor, the .NET Screenplay Pattern!

The Screenplay Pattern helps you make better interactions for better automation. The pattern can be summarized in one line: Actors use Abilities to perform Interactions.

  • Actors initiate Interactions. Every test has an Actor.
  • Abilities enable Actors to perform Interactions. They hold objects that Interactions need, like WebDrivers or REST API clients.
  • Interactions exercise behaviors under test. They could be clicks, requests, commands, and anything else.

This separation of concerns makes Screenplay code very reusable and scalable, much more so than traditional page objects. Check it out, here’s a C# script to test a search engine:

// Create the Actor
IActor actor = new Actor(logger: new ConsoleLogger());

// Add an Ability to use a WebDriver
actor.Can(BrowseTheWeb.With(new ChromeDriver()));

// Load the search engine
actor.AttemptsTo(Navigate.ToUrl(SearchPage.Url));

// Get the page's title
string title = actor.AsksFor(Title.OfPage());

// Search for something
actor.AttemptsTo(Search.For("panda"));

// Wait for results
actor.AttemptsTo(Wait.Until(
    Appearance.Of(ResultPage.ResultLinks),
    IsEqualTo.True()));

Boa Constrictor provides many interactions for Selenium WebDriver and RestSharp out of the box, like Navigate, Title, and Appearance shown above. It also lets you compose interactions together, like how Search is a composition of typing and clicking.

Over the past two years, my team and I at PrecisionLender, a Q2 Company, developed Boa Constrictor internally as the cornerstone of Boa, our comprehensive end-to-end test automation solution. We were inspired by Serenity BDD‘s Screenplay implementation. After battle-hardening Boa Constrictor with thousands of automated tests, we are releasing it publicly as an open source project. Our goal is to help everyone make better interactions for better test automation.

If you’d like to give Boa Constrictor a try, then start with the tutorial. You’ll implement that search engine test from above in full. Then, once you’re ready to use it for some serious test automation, add the Boa.Constrictor NuGet package to your .NET project and go!

You can view the full source code on GitHub at q2ebanking/boa-constrictor. Check out the repository for full information. In the coming weeks, we’ll be developing more content and code. Since Boa Constrictor is open source, we’d love for you to contribute to the project, too!