Which web testing tool should I use?

This article is based on my talk at PyCon US 2023. The web app under test and most of the example code are written in Python, but the information presented is applicable to any stack.

There are several great tools and frameworks for automating browser-based web UI testing these days. Personally, I gravitate towards open source projects that require coding skills to use, rather than low-code/no-code automation tools. The big three browser automation tools right now are Selenium, Cypress, and Playwright. There are other great tools, too, but these three seem to be the ones everyone is talking about the most.

It can be tough to pick the right tool for your needs. In this article, let’s compare and contrast these tools.

Choosing a web app to test

I developed a small web app named Bulldoggy, the reminders app. You can clone the repository and run it yourself. The repository URL is https://github.com/AutomationPanda/bulldoggy-reminders-app.

Bulldoggy is a full-stack Python app:

It uses FastAPI for APIs.
It uses Jinja templates for HTML and CSS files.
It uses HTMX for handling dynamic interactions without needing any explicit JavaScript.
It uses TinyDB to store data.
It uses Pydantic to model data.

If you want to run it locally, all you need is Python!

The app is pretty simple. When you first load it, it presents a standard login page. I actually used ChatGPT to help me write the HTML and CSS.

After logging in, you’ll see the reminders page.

The title card at the top has the app’s name, the logo, and a logout button. On the left, there is a card for reminder lists. Here, I have different lists for Chores and Projects. On the right, there is a card for all the reminders in the selected list. So, when I click the Chores list, I see reminders like “Buy groceries” and “Walk the dog.” I can click individual reminder rows to strike them out, indicating that they are complete. I can also add, edit, or delete reminders and lists through the buttons along the right sides of the cards.

Now that we have a web app to test, let’s learn how to use the big three web testing tools to automate tests for it.

Selenium

Selenium WebDriver is the classic and still the most popular browser automation tool. It’s the original. It carries that old-school style and swagger. Selenium manipulates the browser using the WebDriver protocol, a W3C Recommendation that all major browsers have adopted. The Selenium project is fully open source. It relies on open standards, and it is run by community volunteers according to open governance policies. Selenium WebDriver offers language bindings for Java, JavaScript, C#, and – my favorite language – Python.

Selenium WebDriver works with real, live browsers through a proxy server running on the same machine as the target browser. When test automation starts, it will launch the WebDriver executable for the proxy and then send commands through it via the WebDriver protocol.

To set up Selenium WebDriver, you need to install the WebDriver executables on your machine’s system path for the browsers you intend to test. Make sure the versions all match!

Then, you’ll need to add the appropriate Selenium package(s) to your test automation project. The names for the packages and the methods for installation are different for each language. For example, in Python, you’ll probably run pip install selenium.

In your project, you’ll need to construct a WebDriver instance. The best place to do that is in a setup method within a test framework. If you are using Python with pytest, that would go into a fixture like this:

We could hardcode the browser type we want to use as shown here in the example, or we could dynamically pick the browser type based on some sort of test inputs. We may also set options on the WebDriver instance, such as running it headless or setting an implicit wait. For cleanup after the yield command, we need to explicitly quit the browser.

Here’s what a login test would look like when using Selenium in Python:

The test function would receive the WebDriver instance through the browser fixture we just wrote. When I write tests, I follow the Arrange-Act-Assert pattern, and I like to write my test steps using Given-When-Then language in comments.

The first step is, “Given the login page is displayed.” Here, we call “browser dot get” with the full URL for the Bulldoggy app running on the local machine.

The second step is, “When the user logs into the app with valid credentials.” This actually requires three interactions: typing the username, typing the password, and clicking the login button. For each of these, the test must first call “browser dot find element” with a locator to get the element object. The test locates the username and password fields using CSS selectors based on input name, and it locates the login button using an XPath that searches for the text of the button. Once the elements are found, the test can call interactions on them like “send keys” and “click”.

Now, one thing to note is that these calls should probably use page objects or the Screenplay Pattern to make them reusable, but I chose to put raw Selenium code here to keep it basic.

The third step is, “Then the reminders page is displayed.” These lines perform assertions, but they need to wait for the reminders page to load before they can check any elements. The WebDriverWait object enables explicit waiting. With Selenium WebDriver, we need to handle waiting ourselves, or else tests will crash when they can’t find target elements. Improper waiting is the main cause of flakiness in tests. Furthermore, implicit and explicit waits don’t mix. We must choose one or the other. Personally, I’ve found that any test project beyond a small demo needs explicit waits to be maintainable and runnable.

Selenium is great because it works well, but it does have some pain points:

Like we just said, there is no automatic waiting. Folks often write flaky tests unintentionally because they don’t handle waiting properly. Therefore, it is strongly recommended to use a layer on top of raw Selenium like Pylenium, SeleniumBase, or a Screenplay implementation. Selenium isn’t a full test framework by itself – it is a browser automation tool that becomes part of a test framework.
Selenium setup can be annoying. We need to install matching WebDriver executables onto the system path for every browser we test, and we need to keep their versions in sync. It’s very common to discover that tests start failing one day because a browser automatically updated its version and no longer matched its WebDriver executable. Thankfully, a new part of the Selenium project named Selenium Manager now automatically handles the executables.
Selenium-based tests have a bad reputation for slowness. Usually, poor performance comes more from the apps under test than the tool itself, but Selenium setup and cleanup do cause a performance hit.

Cypress

Cypress is a modern frontend test framework with rich developer experience. Instead of using the WebDriver protocol, it manipulates the browser via in-browser JavaScript calls. The tests and the app operate in the same browser process. Cypress is an open source project, and the company behind it sells advanced features for it as a paid service. It can run tests on Chrome, Firefox, Edge, Electron, and WebKit (but not Safari). It also has built-in API testing support. Unfortunately, due to its design, Cypress tests must be written exclusively in JavaScript (or TypeScript).

Here’s the code for the Bulldoggy login test in Cypress in JavaScript:

The steps are pretty much the same as before. Instead of creating some sort of browser object, all Cypress calls go to its cy object. The syntax is very concise and readable. We could even fit in a few more assertions. Cypress also handles waiting automatically, which makes the code less prone to flakiness.

The rich developer experience comes alive when running Cypress tests. Cypress will open a browser window that will visually execute the test in front of us. Every step is traced so we can quickly pinpoint failures. Cypress is essentially a web app that tests web apps.

While Cypress is awesome, it is JavaScript-only, which stinks for folks who use other programming languages. For example, I’m a Pythonista at heart. Would I really want to test a full-stack Python web app like Bulldoggy with a browser automation tool that doesn’t have a Python language binding? Cypress is also trapped in the browser. It has some inherent limitations, like the fact that it can’t handle more than one open tab.

Playwright

Playwright is similar to Cypress in that it’s a modern, open source test framework that is developed and maintained by a company. Playwright manipulates the browser via debug protocols, which make it the fastest of the three tools we’ve discussed today. Playwright also takes a unique approach to browsers. Instead of testing full browsers like Chrome, Firefox, and Safari, it tests the corresponding browser engines: Chromium, Firefox (Gecko), and WebKit. Like Cypress, Playwright can also test APIs, and like Selenium, Playwright offers bindings for multiple popular languages, including Python.

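To set up Playwright, of course we need to install the dependency packages. Then, we need to install the browser engines. Thankfully, Playwright manages its browsers for us. All we need to do is run the appropriate “playwright install” command for the chosen language. In Python, for example, that means running pip install pytest-playwright (which also installs the playwright package) followed by playwright install.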

Playwright takes a unique approach to browser setup. Instead of launching a new browser instance for each test, it uses one browser instance for all tests in the suite. Each test then creates a unique browser context within the browser instance, which is like an incognito session within the browser. It is very fast to create and destroy – much faster than a full browser instance. One browser instance may simultaneously have multiple contexts. Each context keeps its own cookies and session storage, so contexts are independent of each other. Each context may also have multiple pages or tabs open at any given time. Contexts also enable scalable parallel execution. We could easily run tests in parallel with the same browser instance because each context is isolated.

Let’s see that Bulldoggy login test one more time, but this time with Playwright code in Python. Again, the code is pretty similar to what we saw before. The major differences between these browser automation tools are not so much the appearance of the code but rather how they work and perform:

With Playwright, all interactions happen with the “page” object. By default, Playwright will create:

One browser instance to be shared by all tests in a suite
One context for each test case
One page within the context for each test case

When we read this code, we see locators for finding elements and methods for acting upon found elements. Notice how, like Cypress, Playwright automatically handles waiting. Playwright also packs an extensive assertion library whose assertions retry until their conditions become true or a reasonable timeout expires.

Again, like we said for the Selenium example code, if this were a real-world project, we would probably want to use page objects or the Screenplay Pattern to handle interactions rather than raw calls.

Playwright has a lot more cool stuff, such as the code generator and the trace viewer. However, Playwright isn’t perfect, and it also has some pain points:

Playwright tests browser engines, not full browsers. For example, Chrome is not the same as Chromium. There might be small test gaps between the two. Your team might also need to test full browsers to satisfy compliance rules.
Playwright is still new. It is years younger than Selenium and Cypress, so its community is smaller. You probably won’t find as many Stack Overflow answers to help you as you would for the other tools. Features are also evolving rapidly, so brace yourself for changes.

Which one should you choose?

So, now that we have learned all about Selenium, Cypress, and Playwright, here’s the million-dollar question: Which one should we use? Well, the best web test tool to choose really depends on your needs. They are all great tools with pros and cons. I wanted to compare these tools head-to-head, so I created this table for quick reference:

In summary:

Selenium WebDriver is the classic tool that historically has appealed to testers. It supports all major browsers and several programming languages. It abides by open source, standards, and governance. However, it is a low-level browser automation tool, not a full test framework. Use it with a layer on top like Serenity, Boa Constrictor, or Pylenium.
Cypress is the darling test framework for frontend web developers. It is essentially a web app that tests web apps, and it executes tests in the same browser process as the app under test. It supports many browsers but must be coded exclusively in JavaScript. Nevertheless, its developer experience is top-notch.
Playwright is gaining popularity very quickly for its speed and innovative optimizations. It packs all the modern features of Cypress with the multilingual support of Selenium. Although it is newer than Cypress and Selenium, it’s growing fast in terms of features and user base.

If you want to know which one I would choose, come talk with me about it! You can also watch my PyCon US 2023 talk recording to see which one I would specifically choose for my personal Python projects.

Passing Test Inputs into pytest

Someone recently asked me this question:

I’m developing a pytest project to test an API. How can I pass environment information into my tests? I need to run tests against different environments like DEV, TEST, and PROD. Each environment has a different URL and a unique set of users.

This is a common problem for automated test suites, not just in Python or pytest. Any information a test needs about the environment under test is called configuration metadata. URLs and user accounts are common configuration metadata values. Tests need to know what site to hit and how to authenticate.

Using config files with an environment variable

There are many ways to handle inputs like this. I like to create JSON files to store the configuration metadata for each environment. So, something like this:

dev.json
test.json
prod.json

Each one could look like this:

{
  "base_url": "http://my.site.com/",
  "username": "pandy",
  "password": "DandyAndySugarCandy"
}

The structure of each file must be the same so that tests can treat them interchangeably.
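One way to enforce that is a quick consistency check across environments. A minimal sketch, assuming the configs have already been loaded into dictionaries; the function name and sample values are hypothetical, and it compares only top-level keys, not nested ones:

```python
from typing import Any, Dict, List, Set


def missing_keys(configs: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each config name to the top-level keys it lacks compared with the others.

    An empty result means every config has the same top-level structure.
    """
    all_keys: Set[str] = set()
    for data in configs.values():
        all_keys |= set(data)

    result: Dict[str, List[str]] = {}
    for name, data in configs.items():
        gaps = sorted(all_keys - set(data))
        if gaps:
            result[name] = gaps
    return result


dev = {"base_url": "http://dev.site.com/", "username": "pandy", "password": "secret"}
prod = {"base_url": "http://my.site.com/", "username": "pandy"}
print(missing_keys({"dev.json": dev, "prod.json": prod}))  # {'prod.json': ['password']}
```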

I like using JSON files because:

they are plain text files with a standard format
they are easy to diff
they store data hierarchically
Python’s standard json module turns them into dictionaries in 2 lines flat

Then, I create an environment variable to set the desired config file:

export TARGET_ENV=dev.json

In my pytest project, I write a fixture to get the config file path from this environment variable and then read that file as a dictionary:

import json
import os
import pytest

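@pytest.fixture(scope='session')
def target_env():
  # Read the config file path from the TARGET_ENV environment variable
  config_path = os.environ['TARGET_ENV']
  # Load the JSON config file into a dictionary
  with open(config_path) as config_file:
    config_data = json.load(config_file)
  return config_data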

I’ll put this fixture in a conftest.py file so all tests can share it. Since it uses session scope, pytest will execute it only once per test session and share the resulting dictionary with every test that requests it. Test functions can request it by name like this:

import requests

def test_api_get(target_env):
  url = target_env['base_url']
  creds = (target_env['username'], target_env['password'])
  response = requests.get(url, auth=creds)
  assert response.status_code == 200

Selecting the config file with a command line argument

If you don’t want to use environment variables to select the config file, you could instead create a custom pytest command line argument. Bas Dijkstra wrote an excellent article showing how to do this. Basically, you could add the following function to conftest.py to add the custom argument:

def pytest_addoption(parser):
  parser.addoption(
    '--target-env',
    action='store',
    default='dev.json',
    help='Path to the target environment config file')

Then, update the target_env fixture:

import json
import pytest

@pytest.fixture
def target_env(request):
  config_path = request.config.getoption('--target-env')
  with open(config_path) as config_file:
    config_data = json.load(config_file)
  return config_data

When running your tests, you would specify the config file path like this:

python -m pytest --target-env dev.json

Why bother with JSON files?

In theory, you could pass all inputs into your tests with pytest command line arguments or environment variables. You don’t need config files. However, I find that storing configuration metadata in files is much more convenient than setting a bunch of inputs each time I need to run my tests. In our example above, passing one value for the config file path is much easier than passing three different values for base URL, username, and password. Real-world test projects might need more inputs. Plus, configurations don’t change frequently, so it’s okay to save them in a file for repeated use. Just make sure to keep your config files safe if they have any secrets.

Validating inputs

Whenever reading inputs, it’s good practice to make sure their values are good. Otherwise, tests could crash! I like to add a few basic assertions as safety checks:

import json
import os
import pytest

@pytest.fixture
def target_env(request):
  config_path = request.config.getoption('--target-env')
  assert os.path.isfile(config_path)

  with open(config_path) as config_file:
    config_data = json.load(config_file)

  assert 'base_url' in config_data
  assert 'username' in config_data
  assert 'password' in config_data

  return config_data

Now, pytest will stop immediately if inputs are wrong.
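The assertions above stop at the first missing key. If you would rather see every problem in a single run, here is a sketch of a helper the fixture could call instead; the helper name and the REQUIRED_KEYS tuple are hypothetical:

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable

# Hypothetical: the keys every environment config file must define
REQUIRED_KEYS = ("base_url", "username", "password")


def load_and_validate(config_path: str, required: Iterable[str] = REQUIRED_KEYS) -> Dict[str, Any]:
    """Load a JSON config file, reporting every missing key in one error."""
    if not os.path.isfile(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")

    with open(config_path) as config_file:
        config_data = json.load(config_file)

    missing = [key for key in required if key not in config_data]
    if missing:
        raise ValueError(f"{config_path} is missing keys: {', '.join(missing)}")
    return config_data


# Usage: a config file that lacks "password"
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"base_url": "http://my.site.com/", "username": "pandy"}, tmp)

try:
    load_and_validate(tmp.name)
except ValueError as error:
    message = str(error)
    print(message)  # ends with "is missing keys: password"
```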

Environment Files Help You Store Variables

Note: For this article, I’m going to focus on environment variables for UNIX-based operating systems like macOS and Linux.

Environment variables are both a blessing and a curse. They let you easily pass data into processes like applications, scripts, and containers. I develop lots of test automation projects, and environment variables are one of the most common mechanisms for passing test inputs. For example, when I run a test suite against a web app, I might need to set inputs like this:

export BASE_URL="http://my.website.com/"
export USERNAME="pandy"
export PASSWORD="DandyAndySugarCandy"
export SECRET_API_KEY="1234567890abcdefghijklmnopqrstuvwxyz"

I can just run these commands directly in my terminal to set the variables I need. Unfortunately, any time I need to run my tests in another terminal session, I need to repeat the commands to set them again. That’s a big hassle, especially for secrets and long tokens. It would be nice to store these variables in a reusable way with my project.

Thankfully, there is a way: the environment file. You can create a file named .env and put all your “export” commands for setting variables in it. Basically, just copy those lines above into the .env file. Then, run the following command whenever you want to set those variables in your terminal:

source .env

You can verify the value of the variables using the “echo” command. Just remember to prefix variable names with “$”. For example:

echo $BASE_URL

The output should be:

http://my.website.com/
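Sourcing affects only the current shell session; a Python process started from that shell inherits the variables through os.environ. If a Python process needs to load the same file without sourcing it first, here is a minimal parser sketch. It handles only simple export KEY="value" lines like the ones above (no variable expansion or escape sequences), and the function name is hypothetical; libraries such as python-dotenv cover more cases:

```python
import os
import tempfile
from typing import Dict


def load_env_file(path: str, override: bool = False) -> Dict[str, str]:
    """Parse simple `export KEY="value"` lines and copy them into os.environ."""
    loaded: Dict[str, str] = {}
    with open(path) as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            # Drop a leading "export " so plain KEY=value lines also work
            if line.startswith("export "):
                line = line[len("export "):].strip()
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not an assignment, so ignore it
            key, value = key.strip(), value.strip()
            # Remove one pair of matching surrounding quotes, if present
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            loaded[key] = value
            # By default, don't clobber variables that are already set
            if override or key not in os.environ:
                os.environ[key] = value
    return loaded


# Usage: write a small sample .env file to a temporary location, then load it
sample = 'export BASE_URL="http://my.website.com/"\nexport USERNAME="pandy"\n'
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write(sample)

env = load_env_file(tmp.name)
print(env["BASE_URL"])  # http://my.website.com/
```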

I like to create a .env file in every project that needs environment variables. That way, I can easily keep track of all the variables the project needs in one place. I put the .env in the project’s root directory to make it easy to find. Any time I need to run the project, I run the “source” command without any worries.

If the project is stored in a Git repository, then I also add “.env” to the repository’s .gitignore file. That way, my variables won’t be committed to the repository. It’s rude to commit personal settings to a repository, and it’s dangerous and insecure to commit secrets. Many .gitignore templates already include a “.env” entry, too, since using environment files like this is a common practice.

If you really want to share your variables, here are a few options:

Commit them to the repository, but only if they contain no secrets.
Post them to a secrets sharing service (like LastPass).
Send them via an email or message.

Democratizing the Screenplay Pattern

I started Boa Constrictor back in 2018 because I loathed page objects. On a previous project, I saw page objects balloon to several thousand lines long with duplicative methods. Developing new tests became a nightmare, and about 10% of tests failed daily because they didn’t handle waiting properly.

So, while preparing a test strategy at a new company, I invested time in learning the Screenplay Pattern. To be honest, the pattern seemed a bit confusing at first, but I was willing to try anything other than page objects again. Eventually, it clicked for me: Actors use Abilities to perform Interactions. Boom! It was a clean separation of concerns.
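Boa Constrictor itself is written in C#, but the idea fits in a few lines of any language. Here is a minimal Python sketch of “Actors use Abilities to perform Interactions,” with Tasks that do something and Questions that return something. All class and method names are illustrative, loosely echoing Screenplay terminology rather than Boa Constrictor’s actual API, and the in-memory KeepReminders ability is a hypothetical stand-in for something like a browser or REST client:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Type, TypeVar

AbilityT = TypeVar("AbilityT")


class Actor:
    """An Actor holds Abilities and uses them to perform Interactions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._abilities: Dict[type, Any] = {}

    def can(self, ability: Any) -> "Actor":
        # Grant an Ability, stored by its type so Interactions can look it up
        self._abilities[type(ability)] = ability
        return self

    def using(self, ability_type: Type[AbilityT]) -> AbilityT:
        # Retrieve a previously granted Ability (raises KeyError if missing)
        return self._abilities[ability_type]

    def attempts_to(self, task: "Task") -> None:
        task.perform_as(self)

    def asks_for(self, question: "Question") -> Any:
        return question.request_as(self)


class Task(ABC):
    """An Interaction that does something and returns nothing."""

    @abstractmethod
    def perform_as(self, actor: Actor) -> None: ...


class Question(ABC):
    """An Interaction that returns an answer."""

    @abstractmethod
    def request_as(self, actor: Actor) -> Any: ...


class KeepReminders:
    """Toy Ability: an in-memory list standing in for a browser or REST client."""

    def __init__(self) -> None:
        self.items: List[str] = []


class AddReminder(Task):
    def __init__(self, text: str) -> None:
        self.text = text

    def perform_as(self, actor: Actor) -> None:
        actor.using(KeepReminders).items.append(self.text)


class ReminderCount(Question):
    def request_as(self, actor: Actor) -> int:
        return len(actor.using(KeepReminders).items)


actor = Actor("Pandy").can(KeepReminders())
actor.attempts_to(AddReminder("Buy groceries"))
actor.attempts_to(AddReminder("Walk the dog"))
print(actor.asks_for(ReminderCount()))  # 2
```

The Actor never knows how an Interaction works. Swapping the toy Ability for a browser-backed one would change only the Interactions, not the Actor.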

Unfortunately, the only major implementations I could find for the Screenplay Pattern at the time were Serenity BDD in Java and JavaScript. My company was a .NET shop. I looked for C# implementations, but I didn’t find anything that I trusted. So, I took matters into my own hands and implemented the Screenplay Pattern myself in .NET. Initially, I implemented Selenium WebDriver interactions. Later, my team and I added RestSharp interactions. We eventually released Boa Constrictor as an open source project in October 2020 as part of Hacktoberfest.

With Boa Constrictor, I personally sought to reinvigorate interest in the Screenplay Pattern. By bringing the Screenplay Pattern to .NET, we enabled folks outside of the Java and JavaScript communities to give it a try. With our rich docs, examples, and videos, we made it easy to onboard new users. And through conference talks and webinars, we popularized the concepts behind Screenplay, even for non-C# programmers. It’s been awesome to see so many other folks in the testing community start talking about the Screenplay Pattern in the past few years.

I also wanted to provide a standalone implementation of the Screenplay Pattern. Since the Screenplay Pattern is a design for automating interactions, it could and should integrate with any .NET test framework: SpecFlow, MSTest, NUnit, xUnit.net, and any others. With Boa Constrictor, we focused singularly on making interactions as excellent as possible, and we let other projects handle separate concerns. I did not want Boa Constrictor to be locked into any particular tool or system. In this sense, Boa Constrictor diverged from Serenity BDD – it was not meant to be a .NET version of Serenity, despite taking much inspiration from Serenity.

Furthermore, in the design and all the messaging for Boa Constrictor, I strived to make the Screenplay Pattern easy to understand. So many folks I knew gave up on Screenplay in the past because they thought it was too complicated. I wanted to break things down so that any automation developer could pick it up quickly. Hence, I formed the soundbite, “Actors use Abilities to perform Interactions,” to describe the pattern in one line. I also coined the project’s slogan, “Better Interactions for Better Automation,” to clearly communicate why Screenplay should be used over alternatives like raw calls or page objects.

So far, Boa Constrictor has succeeded modestly well in these goals. Now, the project is pursuing one more goal: democratizing the Screenplay Pattern.

At its heart, the Screenplay Pattern is a generic pattern for any kind of interactions. The core pattern should not favor any particular tool or package. Anyone should be able to implement interaction libraries using the tools (or “Abilities”) they want, and each of those libraries should be treated equally without preference. Recently, in our plans for Boa Constrictor 3, we announced that we want to create separate packages for the “core” pattern and for each library of interactions. We also announced plans to add new libraries for Playwright and Applitools. The existing libraries – Selenium WebDriver and RestSharp – need not be the only libraries. Boa Constrictor was never meant to be merely a WebDriver wrapper or a superior page object. It was meant to provide better interactions for any kind of test automation.

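In version 3.0.0, we successfully separated the Boa.Constrictor project into three new .NET projects and released a NuGet package for each:

Boa.Constrictor.Screenplay for the core pattern
Boa.Constrictor.Selenium for Selenium WebDriver interactions
Boa.Constrictor.RestSharp for RestSharp interactions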

This separation enables folks to pick the parts they need. If they only need Selenium WebDriver interactions, then they can use just the Boa.Constrictor.Selenium package. If they want to implement their own interactions and don’t need Selenium or RestSharp, then they can use the Boa.Constrictor.Screenplay package without being forced to take on those extra dependencies.

Furthermore, we continued to maintain the “classic” Boa.Constrictor package. Now, this package simply claims dependencies on the other three packages in order to preserve backwards compatibility for folks who used previous versions of Boa Constrictor. As part of the upgrade from 2.0.x to 3.0.x, we did change some namespaces (which are documented in the project changelog), but the rest of the code remained the same. We wanted the upgrade to be as straightforward as possible.

The core contributors and I will continue to implement our plans for Boa Constrictor 3 over the coming weeks. There’s a lot to do, and we will do our best to implement new code with thoughtfulness and quality. We will also strive to keep everything documented. Please be patient with us as development progresses. We also welcome your contributions, ideas, and feedback. Let’s make Boa Constrictor excellent together.

Plans for Boa Constrictor 3

Boa Constrictor is the .NET Screenplay Pattern. It helps you make better interactions for better test automation!

I originally created Boa Constrictor starting in 2018 as the cornerstone of PrecisionLender’s end-to-end test automation project. In October 2020, my team and I released it as an open source project hosted on GitHub. Since then, the Boa Constrictor NuGet package has been downloaded over 44K times, and my team and I have shared the project through multiple conference talks and webinars. It’s awesome to see the project really take off!

Unfortunately, Boa Constrictor has had very little development over the past year. The latest release was version 2.0.0 in November 2021. What happened? Well, first, I left Q2 (the company that acquired PrecisionLender) to join Applitools, so I personally was not working on Boa Constrictor as part of my day job. Second, Boa Constrictor didn’t need much development. The core Screenplay Pattern was well-established, and the interactions for Selenium WebDriver and RestSharp were battle-hardened. Even though we made no new releases for a year, the project remained alive and well. The team at Q2 still uses Boa Constrictor as part of thousands of test iterations per day!

The time has now come for new development. Today, I’m excited to announce our plans for the next phase of Boa Constrictor! In this article, I’ll share the vision that the core contributors and I have for the project – tentatively casting it as “version 3.” We will also share a rough timeline for development.

Separate interaction packages

Currently, the Boa.Constrictor NuGet package has three main parts:

The Screenplay Pattern’s core interfaces and classes
Interactions for Selenium WebDriver
Interactions for RestSharp

This structure is convenient for a test automation project that uses Selenium and RestSharp, but it forces projects that don’t use them to take on their dependencies. What if a project uses Playwright instead of Selenium, or RestAssured.NET instead of RestSharp? What if a project wants to make different kinds of interactions, like mobile interactions with Appium?

At its heart, the Screenplay Pattern is a generic pattern for any kind of interactions. In theory, the core pattern should not favor any particular tool or package. Anyone should be able to implement interaction libraries using the core pattern.

With that in mind, we intend to split the current Boa.Constrictor package into three separate packages, one for each of the existing parts. That way, a project can declare dependencies only on the parts of Boa Constrictor that it needs. It also enables us (and others) to develop new packages for different kinds of interactions.

Playwright support

One of the new interaction packages we intend to create is a library for Playwright interactions. Playwright is a fantastic new web testing framework from Microsoft. It provides several advantages over Selenium WebDriver, such as faster execution, automatic waiting, and trace logging.

We want to give people the ability to choose between Selenium WebDriver or Playwright for their web UI interactions. Since a test automation project would use only one, and since there could be overlap in the names and types of interactions, separating interaction packages as detailed in the previous section will be a prerequisite for developing Playwright support.

We may also try to develop an adapter for Playwright interactions that uses the same interfaces as Selenium interactions so that folks could switch from Selenium to Playwright without rewriting their interactions.

Applitools support

Another new interaction package we intend to create is a library for Applitools interactions. Applitools is the premier visual testing platform. Visual testing catches UI bugs that are difficult to catch with traditional assertions, such as missing elements, broken styling, and overlapping text. A Boa Constrictor package for Applitools interactions would make it easier to capture visual snapshots together with Selenium WebDriver interactions. It would also be an “optional” feature since it would be its own package.

Shadow DOM support

Shadow DOM is a technique for encapsulating parts of a web page. It enables a hidden DOM tree to be attached to an element in the “regular” DOM tree so that different parts between the two DOMs do not clash. Shadow DOM usage has become quite prevalent in web apps these days.

We intend to add support for Selenium interactions to pierce the shadow DOM. Selenium WebDriver requires extra calls to pierce the shadow DOM. Unfortunately, Boa Constrictor’s Selenium interactions currently do not support shadow DOM interactivity. Most likely, we will add new builder methods for Selenium-based Tasks and Questions that take in a locator for the shadow root element and then update the action methods to handle the shadow DOM if necessary.

.NET 7 targets

The main Boa Constrictor project, the unit tests project, and the example project all target .NET 5. Unfortunately, .NET 5 is no longer supported by Microsoft. The latest release is .NET 7.

We intend to add .NET 7 targets. We will make the library packages target .NET 7, .NET 5 (for backwards compatibility), and .NET Standard 2.0 (again, for backwards compatibility). We will change the unit test and example projects to target .NET 7 exclusively. In fact, we have already made this change in version 2.0.2!

Dependency updates

Many of Boa Constrictor’s dependencies have released new versions over the past year. GitHub’s Dependabot has also flagged some security vulnerabilities. It’s time to update dependency versions. This is standard periodic maintenance for any project. Already, we have updated our Selenium WebDriver dependencies to version 4.6.

Documentation enhancements

Boa Constrictor has a doc site hosted using GitHub Pages. As we make the changes described above, we must also update the documentation for the project. Most notably, we will need to update our tutorial and example project, since the packages will be different, and we will have support for more kinds of interactions.

What’s the timeline?

The core contributors and I plan to implement these enhancements within the next three months:

Today, we released two new versions with incremental changes: 2.0.1 and 2.0.2.
This week, we hope to split the existing package into three, which we intend to release as version 3.0.
In December, we will refresh the GitHub Issues for the project.
In January, the core contributors and I will host an in-person hackathon (a “Constrictathon”) in Cary, NC.

There is tons of work ahead, and we’d love for you to join us. Check out the GitHub repository, read our contributing guide, and join our Discord server!

What does Developer Relations actually do?

Have you ever attended a developer conference? Or maybe you’ve taken a software product tutorial? Perhaps you’ve listened to a tech podcast, or you follow someone with an avocado emoji 🥑 on Twitter? Chances are that the folks behind lots of that content are involved in Developer Relations.

Many software companies run Developer Relations (or “DevRel”) practices these days. They might go by different names, such as Developer Advocacy, Developer Experience, or Developer Evangelism, but what do folks in DevRel actually do? There’s a lot going on behind the posts and public appearances. In this article, I want to share insights I’ve gained from my time as a Developer Advocate for Applitools.

What’s the main purpose?

The main purpose of Developer Relations is to bridge company and community:

It helps the community understand the value of the company’s technology and how to start using it.
It helps the company understand the community’s needs and facilitates the community’s feedback.

More specifically, DevRel focuses on the community of developers – the practitioners who will actually use the technology. In fact, many DevRel folks like me are developers ourselves. We seek to serve our own communities. Our job is not to sell the C-suite on buying products and services, but rather to help developers get the most value out of what the company offers. DevRel’s goal could be restated as “making life better for developers.”

With that in mind, it is important for DevRel folks to be authentic. If members of the community feel like DevRel is merely sales in disguise, then they will flatly ignore DevRel. Any work DevRel attempts to do will be in vain.

Me, a Developer Advocate, giving a talk at All Things Open 2022 in Raleigh, NC.

How’s it done?

Saying that DevRel “bridges company and community” sounds very abstract. DevRel encompasses several activities to accomplish that purpose. For example, in my job as a Developer Advocate:

I speak at public conferences and company events.
I write articles about software testing, automation, and quality.
I develop tutorials for my company’s SDKs.
I deliver workshops on test automation and on visual testing.
I talk with customers to learn the challenges they face with testing.

The types of duties done by DevRel naturally cluster around distinct roles. DevRel is a team sport, and it is rare that one person will be highly skilled at all the different roles, let alone have enough time to fulfill them. There are six primary roles that make up a DevRel team.

The Evangelist

The evangelist popularizes the technology in an authentic, helpful way. Their job is to make sure people in the community know the big picture. Thus, the evangelist is a very outwardly focused role. They make many public appearances at events, and they develop loads of content, like videos, articles, and online courses. Travel and strong social media presence are practically required for the job. Since they are on the “front lines,” they also keep their ears open for any feedback they might hear about the technology or the company. Evangelists may be considered thought leaders and influencers.

Since evangelists are so active in their communities, it’s common for them to have large followings on Twitter, LinkedIn, and YouTube. They may even become minor celebrities in their spaces. More attention on them brings more attention to the companies they represent. Celebrity status must be handled carefully, though. One bad tweet can ruin a reputation. Or, if a company builds its entire DevRel program on one evangelist’s high profile, then the program could collapse if that person ever leaves the company.

The Educator

The educator teaches members of the community how to use the technology. While the evangelist may give introductions and demos, the educator teaches the finer details. Typically, they write product documentation, quickstart guides, and tutorials. They may even teach courses online or in person. The educator must have strong skills in technical writing and education. In fact, in many DevRel practices, educators transitioned into tech from academia! The educator does not need a high profile like the evangelist. They can be more behind-the-scenes if they choose.

My friend Sarah and I giving a BDD tutorial in person at STARWEST 2022 in Anaheim, CA.

The Community Builder

The community builder fosters scalable relationships with those using the technology. They drive and sustain engagement with the community. A good community builder creates an environment where everyone feels a sense of belonging. They provide a space for organic interactions, vulnerable questions, and helpful camaraderie. For example, folks should be able to ask basic product questions in a Slack room or product forum and get helpful answers from others in the community.

The “scalability” of those relationships is rooted in the programs the community builder runs, such as:

Building a public communication forum (like Slack, Discord, or Discourse)
Hosting events and spaces for folks to gather
Providing newsletters or product updates
Connecting folks together for cool opportunities

Furthermore, the community builder provides front-row seats to what’s happening with the technology. They get folks excited for upcoming features, new releases, and the possibilities for the future. They could provide early access to beta versions of features or hold release parties. In doing so, the community builder also solicits feedback from the community to help make the technology even better.

The Advocate

The advocate uses their expertise and community feedback to build better user experiences into the technology. In a sense, advocates are the opposite of evangelists. While evangelists are focused primarily on pushing information out from the company to the community, advocates work in the other direction, bringing feedback from the community into the company.

Specifically, advocates make feedback actionable. They work with product teams to improve the technology. They might even develop changes themselves! Sometimes, advocates are called “developer experience engineers” to emphasize how good DX (developer experience) doesn’t just happen – it must be built.

The Platform Builder

The platform builder creates the systems and infrastructure that maintain DevRel programs. They are essentially developers whose customers are other DevRel folks. For example, the platform builder could:

Create example projects for demos and educational material
Build, deploy, and administer documentation websites
Keep package versions for tutorials and example projects up to date
Implement the infrastructure needed for big events like a hackathon

The platform builder is very much behind-the-scenes and oftentimes the unsung hero.

The Visionary

The visionary casts the vision for the company’s Developer Relations practice. They lead with big ideas, set goals, and guide the team to success. Usually, the visionary carries a title like “director” or “head.” Since a DevRel practice could encompass so many different kinds of activities, they need to be wise in how they align resources to meet their goals. For example, if the company is a startup just beginning to roll out early releases, then the visionary should probably focus most on evangelism and community building. Different companies will have different needs.

Developer Relations work can require lots of travel. Try not to miss your flights!

How do these roles work together?

As I stated previously, DevRel is a team sport! Here’s an example of how these six roles could work together for a program like Test Automation University:

The visionary notices that folks are struggling to use the company’s technology, so they decide to create an online learning platform with free courses about the technology.
The platform builder develops the web app for the learning platform and deploys it to cloud resources.
The educator creates training courses and uploads them to the learning platform.
The evangelist promotes the learning platform when they speak at events and post on social media.
The community builder welcomes new students to the learning platform, sets up a Slack room for them to ask questions, and makes announcements when new courses are published.
The advocate finds ways to improve courses, such as adding completion progress visuals as students complete chapters.

These roles overlap somewhat, but each one represents a distinct area of responsibility.

Where does this team belong?

From what I’ve seen, most Developer Relations teams fall under one of two organizations: Product or Marketing. Indeed, DevRel has responsibilities to both. DevRel activities happen at the top of the marketing funnel. The team should work closely with marketing folks on messaging. At the same time, DevRel folks must be somewhat independent of sales and marketing in order to be authentic in the community. They should focus on good, valuable content that supports the product and genuinely helps their users. Ultimately, the parent organization will determine how a DevRel team’s performance is measured.

Software Stickers! — One way to measure DevRel success is how many people take your stickers!

How should success be measured?

The impact of Developer Relations is tough to measure because it benefits so many aspects of a company. It teaches folks about the company’s technology. It draws them into the marketing funnel. It provides helpful guides to get them started. In many instances, though, it’s hard to measure how one particular conference talk encouraged someone to sign up for a free account, or how one piece of vital documentation prevented a user from rage-quitting.

Overall, the best measure of success is engagement. The more engaged people are in the products and in the community, the more likely they are to pay for more. Here are a few engagement metrics to consider:

How many active users the products have and how they use them
How many people sign up for an account after an event
How many people visit certain blog articles or documentation pages
How many stars, forks, and watchers a GitHub project has
The completion rate for tutorials and courses

There are many other things that could be measured. Just remember that no single metric tells the full story.

Does DevRel really matter?

YES! Developer Relations matters as a practice because developer experience (DX) matters for the consumption of technology. It’s the 2020s. People don’t want to use lousy products. Developers expect tools and services to work well while solving their problems. DevRel is critical for building bridges with developers. It helps them understand why your fancy new tech is worth their consideration, and it helps you as a company understand what they need and what delights them.

Making Great Waves: 8 Software Testing Convictions

The Great Wave Off Kanagawa.

Katsushika Hokusai, 1830.

It is one of the most recognizable works of art in the world. It is so famous, it has an emoji: 🌊.

The Great Wave Off Kanagawa is a Japanese woodblock print. It is not a painting or a drawing but a print. In Japanese, the term for this type of art is ukiyo-e, which means “pictures of the floating world.” Ukiyo-e prints first appeared around the 1660s and did not decline in popularity until the Meiji Restoration two centuries later. While most artists focused on subjects of people, late masters like Hokusai captured perspectives of landscapes and nature. Here, in The Great Wave, we see a giant wave, full of energy and ferocity, crashing down onto three fast boats attempting to transport live fish to market. Its vibrant blue water and stark white peaks contrast against a yellowish-gray sky. In the distance is Mount Fuji, the highest mountain in Japan, yet it is dwarfed in perspective by the waves. In fact, the water spray from the waves appears to fall over Mount Fuji like snow. If you didn’t look closely, you might presume that Mount Fuji is just the crest of another wave.

The Great Wave is absolutely stunning. It is arguably Hokusai’s finest work. The colors and the lines reflect boldness. The claws of the wave impart vitality. The men on the boat show submission and possibly fear. The spray from the wave reveals delicacy and attention to detail. Personally, I love ukiyo-e prints like this. I travel the world to see them in person. The quality, creativity, and craftsmanship they exhibit inspire me to instill the highest quality possible into my own work.

As software quality professionals, there are several lessons we can learn from ukiyo-e masters like Hokusai. Testing is an art as much as it is engineering. We can take cues from these prolific artists in how we approach quality in our own work. In this article, I will share how we can make our own “Great Waves” using 8 software testing convictions inspired by ukiyo-e prints like The Great Wave. Let’s begin!

Conviction #1: Focus on behavior

Although we hold these Japanese woodblock prints today in high regard, they were seen as anything but fancy centuries ago in Japan. Ukiyo-e was “low” art for the common people, whereas paintings on silk scrolls were considered “high” art for the high classes.

Folks would buy these prints from local merchants for slightly more than the cost of a bowl of noodles (about $5 to $10 in today’s US dollars), and they would use these prints to decorate their homes. By comparison, a print of The Great Wave sold at auction for $1.11 million in September 2020.

These prints weren’t very large, either. The Great Wave measures 10 inches tall by 15 inches wide, and most prints were of similar size. That made them convenient to buy at the market, carry home, and display on the wall. To understand how the Japanese people treated these prints in their day, think about the decorations in your home that you bought at stores like HomeGoods and Target. You probably have some screen prints or posters on your walls.

Since the target consumers for ukiyo-e prints were ordinary people with working-class budgets, the prints needed to be affordable, popular, and recognizable. When Hokusai published The Great Wave, it wasn’t a standalone piece. It was the first print in a series named Thirty-six Views of Mount Fuji. Below are three other prints from that series. The central feature in each print is Mount Fuji, which would be instantly recognizable to any Japanese person. The various views would also be relatable.

*Fine Wind, Clear Morning* shows nice weather against the slopes of the mountain with a powerful contrast of colors.

*Thunderstorm Beneath the Summit* depicts Mount Fuji from a nearly identical profile, but with lightning striking the lower slopes of the mountain amidst a far darker palette.

*Kajikazawa in Kai Province* depicts two fishermen with Mount Fuji in the background.

The features of these prints made them valuable. Anyone could find a favorite print or two out of a series of 36. They made art accessible. They were inexpensive yet impressive. They were artsy yet approachable. Artists like Hokusai knew what people wanted, and they delivered the goods.

This isn’t any different from software development. Features add value for the users. For example, if you’re developing a banking app, folks had better be able to log in securely and view their latest transactions. If those features are broken or unintuitive, folks might as well move their accounts to other banks! We, as the developers and testers, are like the ukiyo-e artists: we need to know what our customers need. We need to make products that they not only want but also enjoy.

Features add value. However, I would use a better word to describe this aspect of a product: behavior. Behavior is the way one acts or conducts oneself. In software, we define behaviors in terms of inputs and responses. For example, login is a behavior: you enter valid credentials, and you expect to gain access. You gave inputs, the app did something, and you got the result.

My conviction on software testing AND development is that if you focus on good software behaviors, then everything else falls into place. When you plan development work, you prioritize the most important behaviors. When you test the features, you cover the most important behaviors. When users get your new product, they gain value from those features, and hopefully you make that money, just like Hokusai did.

This is why I strongly believe in the value of Behavior-Driven Development, or BDD for short. As a set of pragmatic practices, BDD helps you and your team stay focused on the things that matter. BDD involves activities like Three Amigos collaboration, Example Mapping, and writing Gherkin. When you focus on behavior – not on shiny new tech, or story points, or some other distractions – you win big.
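To make the login behavior concrete, here is a minimal sketch of a behavior-focused test in Python, written in pytest style with Given-When-Then comments that mirror Gherkin steps. The `login` function and the `USERS` credential store are hypothetical stand-ins invented for illustration; a real suite would drive the app’s UI or API instead.

```python
# Hypothetical stand-in for the app under test: an in-memory credential store.
USERS = {"alice": "s3cret"}


def login(username, password):
    """Return a session dict when credentials are valid, otherwise None."""
    if USERS.get(username) == password:
        return {"user": username, "authenticated": True}
    return None


def test_valid_credentials_grant_access():
    # Given a registered user
    username, password = "alice", "s3cret"
    # When they log in with valid credentials
    session = login(username, password)
    # Then they gain access
    assert session is not None
    assert session["authenticated"]


def test_wrong_password_denies_access():
    # Given a registered user
    # When they log in with the wrong password
    session = login("alice", "wrong")
    # Then access is denied
    assert session is None
```

Each test names one behavior and checks one response to one set of inputs, so a failure points straight at the behavior that broke.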

Conviction #2: Prioritize on risk

Ukiyo-e artists depicted more than just views of Mount Fuji. In fact, landscape scenes became popular only during the late period of woodblock printing – the 1830s to the 1860s. Before then, artists focused primarily on people: geisha, courtesans, sumo wrestlers, kabuki actors, and legendary figures. These were all characters from the “floating world,” a world of pleasure and hedonism apart from the dreary everyday life of feudal Japan.

Here is a renowned print of a kabuki actor by Sharaku, printed in 1794:

*Kabuki Actor Ōtani Oniji III as Yakko Edobei in the Play The Colored Reins of a Loving Wife*
Tōshūsai Sharaku, 1794

Sharaku was active only for one year, but he produced some of the most expressive portraits seen during ukiyo-e’s peak period. A yakko was a samurai’s henchman. In this portrait, we see Edobei ready for dirty deeds, with a stark grimace on his face and hands pulsing with anger.

Why would artists like Sharaku print faces like these? Because they would sell. Remember, ukiyo-e was not high-class art. It was a business. Artists would make a series of prints and sell them on the streets of Edo (now Tokyo). They needed to make prints that people wanted to buy. If they picked lousy or boring subjects, their prints wouldn’t sell. No soba noodles for them! So, what subjects did they choose? Celebrities. Actors. “Female beauties.” And some content that was not safe for work.

Artists prioritized their work based on business risk. They chose subjects that would be easy to sell. They pursued value. As testers, we should also prioritize test coverage based on risk.

I know there’s a popular slogan saying, “Test all the things!”, but that’s just impossible. It’s like saying, “Print all the pictures!” Modern apps are too complex to attempt any sort of “complete” or “100%” coverage. Instead, we should focus our testing efforts on the most important behaviors, the ones that would cause the most problems if they broke. Testing is ultimately a risk-mitigating activity. We do testing to de-risk problems that enter during development.

So, what does a risk-based testing strategy look like? Well, start by covering the most valuable behaviors. You can call them the MVBs. These are behaviors that are core to your app. If they break, then it’s game over. No soba noodles. For example, if you can’t log in, you’re done-zo. The MVBs should be tested before every release. They are non-negotiable test coverage. If your team doesn’t have enough resources to run these tests, then get more resources.

In addition to the MVBs, cover areas that were changed since the previous release. For example, if your banking app just added mobile deposits, then you should test mobile deposits. Things break where developers make changes. Also, look at testing different layers and aspects of the product. Not every test should be a web UI test. Add unit tests to pinpoint failures in the code. Add API tests to catch problems at the service layer. Consider aspects like security, accessibility, and visuals.

When planning these tests, try to keep them fast and atomic, covering individual behaviors instead of long workflows. Shorter tests are more reliable and give space for more coverage. And if you do have the resources for more coverage beyond the MVBs and areas of change, expand your coverage as resources permit. Keep adding coverage for the next most valuable behaviors until you either run out of time or the coverage isn’t worth the time.

Overall, ask yourself this when weighing risks: How painful would it be if a particular behavior failed? Would it ruin a user’s experience, or would they barely notice?
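The strategy above can be sketched as a small planning function. This is an illustration under stated assumptions, not a prescribed formula: the 1-to-5 `value` scale, the per-test time estimates, the `min_value` cutoff for coverage that “isn’t worth the time,” and the `Behavior` and `plan_tests` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Behavior:
    name: str
    value: int             # assumed scale: pain if it broke, 1 (barely noticed) to 5 (game over)
    minutes: int           # estimated time to run its test
    is_mvb: bool = False   # most valuable behavior: tested before every release
    changed: bool = False  # touched since the previous release


def plan_tests(behaviors, budget_minutes, min_value=2):
    """Return test names in priority order under a simple risk-based strategy."""
    mvbs = [b for b in behaviors if b.is_mvb]
    changed = [b for b in behaviors if b.changed and not b.is_mvb]
    others = [b for b in behaviors if not b.is_mvb and not b.changed]
    # Within each optional tier, take the highest-value behaviors first.
    changed.sort(key=lambda b: b.value, reverse=True)
    others.sort(key=lambda b: b.value, reverse=True)

    plan = list(mvbs)  # non-negotiable: included even if they exceed the budget
    spent = sum(b.minutes for b in plan)
    for b in changed + others:
        if not b.changed and b.value < min_value:
            continue  # coverage isn't worth the time
        if spent + b.minutes > budget_minutes:
            continue  # doesn't fit; a cheaper test later might
        plan.append(b)
        spent += b.minutes
    return [b.name for b in plan]


behaviors = [
    Behavior("login", value=5, minutes=2, is_mvb=True),
    Behavior("view transactions", value=5, minutes=3, is_mvb=True),
    Behavior("mobile deposit", value=4, minutes=4, changed=True),
    Behavior("export statement", value=3, minutes=2),
    Behavior("change theme", value=1, minutes=1),
]
print(plan_tests(behaviors, budget_minutes=10))
# ['login', 'view transactions', 'mobile deposit']
```

With a 12-minute budget, “export statement” would also make the cut, while “change theme” stays out because its value falls below the cutoff.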

Conviction #3: Automate

The copy of The Great Wave shown at the top of this article is located at the Metropolitan Museum of Art in New York City. However, that’s not the only version. When ukiyo-e artists produced their prints, they kept printing copies until the woodblocks wore out! Remember, these weren’t precious paintings for the rich, they were posters for the commoners. One set of woodblocks could print thousands of impressions of popular designs for the masses. It’s estimated that there were five to eight thousand original impressions of The Great Wave, but nobody knows for sure. To this day, only a few hundred have survived. And much to my own frustration, museums that have copies do not put them on public display because the pieces are so fragile.

Here are different copies of The Great Wave from different museums:

The Great Wave Off Kanagawa — From The Metropolitan Museum of Art

Print production had to be efficient and smooth. Remember, this was a business. Publishers would make more money if they could print more impressions from the same set of woodblocks. They’d gain more renown if their prints maintained high quality throughout the lifetime of the blocks. And the faster they could get their prints to market, the sooner they could get paid and enjoy all the soba noodles.


What can we learn from this? Automate! Automation is a force multiplier. If Hokusai spent all his time manually laboring over one copy of The Great Wave, then we probably wouldn’t be talking about it today. But because woodblock printing was a repeatable production process, he produced thousands of copies for everyone to enjoy. I wouldn’t call the woodblock printing process fully “automated” because it had several tedious steps with manual labor, but in Edo period Japan, it was about as automated as you could get.

Compare this to testing. If we run a test manually, we cover the target behavior one time. That’s it: lots of labor for one instance. However, if we automate that test, we can run it thousands of times. It can deliver value again and again. That’s the difference between a painting and a print.

So, how should we go about test automation? First, you should define your goals. What do you hope to achieve with automation? Do you want to speed up your testing cycles? Are you looking to widen your test coverage? Perhaps you want to empower Continuous Delivery through Continuous Testing? Carefully defining your goals from the start will help you make good decisions in your test automation strategy.

When you start automating tests, treat it like full software development. You aren’t just writing a bunch of scripts, you are developing a software system. Follow recommended practices. Use design patterns. Do code reviews. Fix bugs quickly. These principles apply whether you are using coded or codeless tools.

Another trap to avoid is delaying test automation. So many times, I’ve seen teams struggle to automate their tests because they schedule automation work as their lowest priority. They wish they could develop automation, but they just never have the time. Instead, they grind through testing their MVBs manually just to get the job done. My advice is to flip that attitude right-side up. Automate first, not last. Instead of planning a few tests to automate if there’s time, plan to automate first and cover anything that couldn’t be automated with manual testing.

Furthermore, integrate automated tests into the team’s Continuous Integration system as soon as possible. Automated tests that aren’t running are dead to me. Get them running automatically in CI so they can deliver value. Running them nightly or even weekly can be a good start, as long as they run on a continuous cadence.

Finally, learn good practices. Test automation technologies are ever-evolving. It seems like new tools and frameworks hit the market all the time. If you’re new to automation or you want to catch up with the latest trends, then take time to learn. One of the best resources I can recommend is Test Automation University. TAU has about 70 courses on everything you can imagine, taught by the best instructors in the world, and it’s 100% FREE!

Now, you might be thinking, “Andy, come on, you know everything can’t be automated!” And that’s true. There are times when human intervention adds value. We see this in ukiyo-e prints, too. Here is Plum Garden at Kameido by Utagawa Hiroshige, Hokusai’s main rival. Notice the gradient colors of green and red in the background:

*Plum Garden at Kameido*
Utagawa Hiroshige, 1857

Printers added these gradients using a technique called bokashi, in which they would apply layers of ink to the woodblocks by hand. Sometimes, they would even paint layers directly on the prints. In these cases, the “automation” of the printing process was insufficient, and humans needed to manually intervene.

It’s always good to have humans test-drive software. Automation is great for functional verification, but it can’t validate user experience. Exploratory testing is an awesome complement to automated testing because it mitigates different risks.

Nevertheless, automation can now do things it never could before. As I said before, I work at Applitools, where we specialize in automated visual testing. Take a look at these two prints of Matsumoto Hoji’s Frog from Meika Gafu. Notice anything different between the two?

Two different versions of Matsumoto Hoji’s *Frog*.

If we use Visual AI to compare these two prints, it will quickly identify the main difference:

Applitools Visual AI identifying visual differences (highlighted in magenta) between two prints.

The signature block is in a different location! Minor differences, such as small pixel offsets, are ignored, while major differences are highlighted. If you apply this style of visual testing to your web and mobile apps, you could catch a ton of visual bugs before they cause problems for your users. Modern test automation can do some really cool tricks!

Conviction #4: Shift left and right

Mokuhanga, or woodblock printing, was a huge process with multiple steps. Artists like Hokusai and Hiroshige did not print their artwork themselves. In fact, printing required multiple roles to be successful: a publisher, an artist, a carver, and a printer.

The publisher essentially ran the process. They commissioned, financed, and distributed prints. They would even collaborate with artists on print design to keep them up with the latest trends.
The artist designed the patterns for the prints. They would sketch the patterns on washi paper and give instructions to the carver and printer on how to properly produce the prints.
The carver would chisel the artist’s pattern into a set of wooden printing blocks. Each layer of ink would have its own block. Carvers typically used a smooth, hard wood like cherry.
The printer used the artist’s patterns and carver’s woodblocks to actually make the prints. They would coat the blocks in appropriately-colored water-based inks and then press paper onto the blocks.

Quality had to be considered at every step in the process, not just at the end. If the artist was not clear about colors, then the printer might make a mistake. If the carver cut a groove too deep, then ink might not adhere to the paper as intended. If the printer misaligned a page during printing, then they’d need to throw it away – wasting time, supplies, and woodblock life – or risk tarnishing everyone’s reputation with a misprint. Hokusai was noted for his stringent quality standards for carvers and printers.

The words of W. Edwards Deming ring true:

Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. As Harold F. Dodge said, “You cannot inspect quality into a product.”
W. Edwards Deming

This is just like software development. We can substitute the word “testing” for “inspection” in Deming’s quote. Testers don’t exclusively “own” quality. Every role – business, development, and testing – has a responsibility for high-caliber work. If a product owner doesn’t understand what the customer needs, or a developer skips code reviews, or if a tester neglects an important feature, then software quality will suffer.

How do we engage the whole team in quality work? Shift left and right.

Most testers are probably familiar with the term shift left. It means, start doing testing work earlier in the development process. Don’t wait until developers are “done” and throw their code “over the fence” to be tested. Run tests continuously during development. Automate tests in-sprint. Adopt test-driven and behavior-driven practices. Require unit tests. Add test implementation to the “Definition of Done.”

But what about shift right? This is a newer phrase, but not necessarily a newer practice. Shift right means, continue to monitor software quality during and after releases. Build observability into apps. Monitor apps for bugs, failures, and poor performance. Do canary deployments to see how systems respond to updates. Perform chaos testing to see how resilient environments are to outages. Issue different UIs to user groups as part of A/B testing to find out what’s most effective. And feed everything you learn back into development, à la “shift left.”
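Both canary deployments and A/B tests need a stable way to decide which users see which version. One common approach, sketched here as an assumption rather than any particular tool’s API, is to hash each user ID into one of 100 buckets so the same user always lands in the same group. The function names and the 100-bucket scheme are hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def ab_variant(user_id: str, experiment: str, percent_b: int = 50) -> str:
    """Assign a user to UI variant "A" or "B" for one experiment."""
    return "B" if bucket(user_id, experiment) < percent_b else "A"


def in_canary(user_id: str, rollout_percent: int) -> bool:
    """Route roughly rollout_percent% of users to the canary release."""
    return bucket(user_id, "canary") < rollout_percent
```

Salting the hash with the experiment name keeps assignments independent across experiments. Raising `rollout_percent` from, say, 5 to 50 to 100 widens a canary gradually, and users already on the canary stay on it because their bucket never changes.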

The DevOps Infinity Loop
(Source: https://www.atlassian.com/devops)

The famous DevOps infinity loop shows how “shift left” and “shift right” are really all part of the same flow. If you start in the middle where the paths cross, you can see arrows pointing leftward for feedback, planning, and building. Then, they push rightward with continuous integration, deployment, monitoring, and operations. We can (and should) take all the quality measures we said before as we spin through this loop perpetually. When we plan, we should build quality in with good design and feedback from the field. When we develop, we should do testing together with coding. As we deploy, automated safety checks should give thumbs-up or thumbs-down. Post-deployment, we continue to watch, learn, and adjust.

Conviction #5: Give fast feedback

The acronym CI/CD is ubiquitous in our industry, but I feel like it’s missing something important: “CT”, or Continuous Testing. CI and CD are great for pushing code fast, but without testing, they could be pushing garbage. Testing does not improve quality directly, but continuous revelation of quality helps teams find and resolve issues fast. It demands a response. Continuous Testing keeps the DevOps infinity loop safe.

Fast feedback is critical. The sooner and faster teams discover problems, the less pain those problems will cause. Think about it: if a developer is notified that their code change caused a failure within a minute, they can immediately flip back to their code, which is probably still open in an editor. If they find out within an hour, they’ll still have their code fresh in their mind. Within a day, it’ll still be familiar. A week or more later? Fuggedaboutit! Heaven forbid the problem goes undetected until a customer hits it.

Continuous testing enables fast feedback. Automation enables continuous testing. Test automation that isn’t running continuously is worthless because it provides no feedback.

Japanese woodblock printers also relied on fast feedback. If they noticed anything wrong with the prints as they pressed them, they could scrap the misprint and move on. However, since they were meticulous about quality, misprints were rare. Nevertheless, each print was unique because each impression was done manually. The amount, placement, and hue of ink could vary slightly from print to print. Over time, the woodblocks themselves wore down, too.

Here, you can see differences in the title cartouche between different prints of The Great Wave:

Differences in the title cartouche between two prints of *The Great Wave*.
(Source: https://blog.britishmuseum.org/the-great-wave-spot-the-difference/)

On the left, the outline around the title is solid, whereas on the right, the outline has breaks. This is because the keyblock had very fine ridges for printing outlines, which suffered the most from wear and tear during repeated impressions. Furthermore, if you look very closely, you can see that the Japanese characters appear bolder on the right than on the left. The printer must have used more ink or pressed the title harder for the impression on the right.

Printers would need to spot these issues quickly so they could either correct their action for future prints or warn the publisher that the woodblocks were wearing down. If the print was popular, the publisher could commission a carver to carve new woodblocks to keep production going.

Conviction #6: Go lean

As I’ve said many times now, woodblock printing was a business. Ukiyo-e was commercial art, and competition was fierce. By the 1840s, production peaked with about 250 different publishers. Artists like Hokusai and Hiroshige were rivals. While today we recognize famous prints like The Great Wave, countless other prints were also made.

Publishers competed in a rat race for the best talent and the best prints. They had to be savvy. They had to build good reputations. They needed to respond to market demands for subject material. For example, Kitagawa Utamaro was famous for prints of “female beauties.”

*Two Beauties with Bamboo*
Kitagawa Utamaro, 1795

Ukiyo-e artists also took inspiration from each other. If one artist made a popular design, then other artists would copy their style. Here is a print from Hiroshige’s series, Thirty-six Views of Mount Fuji. That’s right, Hokusai’s biggest rival made his own series of 36 prints about Mount Fuji, and he also made his own version of The Great Wave. If you can’t beat ’em, join ’em!

*The Sea off Satta in Suruga Province*
Utagawa Hiroshige, 1858

Publishers also had to innovate. Oftentimes, after a print had been in production for a while, they would instruct the printer to change the color scheme. Here are two versions of Hokusai’s Kajikazawa in Kai Province, from Thirty-six Views of Mount Fuji:

Kajikazawa in Kai Province — An early impression

The print on the left is an early impression. The only colors used were shades of blue. This was Hokusai’s original artistic intention. However, later prints, like the one on the right, added different colors to the palette. The fishermen now wear red coats. The land has a bokashi green-yellow gradient. The sky incorporates orange tones to contrast the blue. Publishers changed up the colors to squeeze more money out of existing designs without needing to pay artists for new work or carvers for new woodblocks.

However, sometimes when doing this, artistic quality was lost. Compare the fine detail in the land between these two prints. In the early impression, you can see dark blue shading used to accentuate the shadows on the side of the rocks, giving them height and depth, and making the fisherman appear high above the water. However, in the later impression, the green strip of land has almost no shading, making it appear flat and less prominent.

Ukiyo-e publishers would have completely agreed with today’s lean business model. Seek first and foremost to deliver value to your customers. Learn what they want. Try some designs, and if they fail, pivot to something else. When you find what works, get a full end-to-end process in place, and then continuously improve as you go. Respond quickly to changes.

Going lean is very important for software testing, too. Testing is engineering, and it has serious business value. At the same time, testing activities never seem to have as many resources as they should. Testers must be scrappy to deliver valuable quality feedback using the resources they have.

When I think about software testing going lean, I’m not implying that testers should skip tests or skimp on coverage. Rather, I’m saying that world-class systems and processes cannot be built overnight. The most important thing a team can do is build basic end-to-end feedback loops from the start, especially for test automation.

So many times, I’ve seen teams skew their test automation strategy entirely towards implementation. They spend weeks and weeks developing suites of automated tests before they set up any form of Continuous Testing. Instead of triggering tests as part of Continuous Integration, folks must manually push buttons or run commands to make them start. Other folks on the team see results sporadically, if ever. When testers open bug reports, developers might feel surprised.

I recommend teams set up Continuous Testing with feedback loops from the start. As soon as you automate your first test, move on to running it from CI and sending you notifications for results before automating your second test. Close the feedback loop. Start delivering results immediately. As you find hotspots, add more coverage. Talk with developers about the kinds of results they find most valuable. Then, grow your suite once you demonstrate its value. Increase the throughput. Turn those sidewalks into highways. Continue to iteratively improve upon the system as you go. Don’t waste time on tests that don’t matter or dashboards that nobody reads. Going lean means allocating your resources to the most valuable activities. What you’ll find is that success will snowball!
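As a minimal sketch of closing that loop, assume the first test runs under pytest with `--junitxml=results.xml`, and that a hypothetical `TEST_RESULTS_WEBHOOK_URL` environment variable points at a Slack-style incoming webhook that accepts a JSON `text` field. A small CI step could then summarize the report and post it. The function names here are illustrative, not part of any tool:

```python
import json
import os
import urllib.request
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> str:
    """Build a one-line summary from a JUnit XML report (e.g., pytest --junitxml)."""
    root = ET.fromstring(xml_text)
    # Newer pytest versions wrap results in <testsuites>; others emit a bare <testsuite>
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    failed = totals["failures"] + totals["errors"]
    status = "FAILED" if failed else "PASSED"
    return (f"{status}: {totals['tests']} tests, {failed} failed, "
            f"{totals['skipped']} skipped")


def notify(message: str) -> None:
    """Post the summary to a chat webhook if the (hypothetical) env var is set."""
    url = os.environ.get("TEST_RESULTS_WEBHOOK_URL")
    if not url:
        # No webhook configured: at least surface the result in the CI log
        print(message)
        return
    payload = json.dumps({"text": message}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=10)
```

In a CI job, the notification step should run even when pytest fails, so that everyone sees the result either way; that "always report" behavior is what turns a test suite into a feedback loop.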

Conviction #7: Open up

Once you have a good thing going, whether it’s woodblock printing or software testing, how can you take it to the next level? Open up! Innovation stalls when you end up staring at your own belly button for too long. Outside influences inspire new creativity.

Ukiyo-e prints had a profound impact on Western art. After Japan opened up to the rest of the world in the mid-1800s, Europeans became fascinated by Japanese art, and European artists began incorporating Japanese styles and subjects into their work. This phenomenon became known as Japonisme. Here, Claude Monet, famous for his impressionist paintings, painted a picture of his wife wearing a kimono with fans adorning the wall behind her:

Vincent van Gogh in particular loved Japanese woodblock prints. He painted his own versions of different prints. Here, we see Hiroshige’s Plum Garden at Kameido side-by-side with Van Gogh’s Flowering Plum Orchard (after Hiroshige):

Plum Garden at Kameido — Hiroshige’s original print

Flowering Plum Orchard (after Hiroshige): Van Gogh’s version

Van Gogh was drawn to the bold lines and vibrant colors of ukiyo-e prints. There is even speculation that The Great Wave inspired the design of The Starry Night, arguably Van Gogh’s most famous painting:

*The Starry Night* (Van Gogh) and *The Great Wave off Kanagawa* (Hokusai)

Notice how the shapes of the waves mirror the shapes of the swirls in the sky. Notice also how deep shades of blue contrast with yellows in each. Ukiyo-e prints served as great inspiration for what became known as Modern art in the West.

Influence was also bidirectional. Not only did Japan influence the West, but the West influenced Japan! One thing common to all of the prints in Thirty-six Views of Mount Fuji is the extensive use of blue ink. Prussian blue pigment had recently come to Japan from Europe, and Hokusai’s publisher wanted to make extensive use of the new color to make the prints stand out. Indeed, they did. To this day, Hokusai is renowned for popularizing the deep shades of Prussian blue in ukiyo-e prints.

It’s important in any line of work to be open to new ideas. If Hokusai had not been willing to experiment with new pigments, then we wouldn’t have pieces like The Great Wave.

That’s why I’m a huge proponent of Open Testing. What if we open our tests like we open our source? There are so many great advantages to open source software: helping folks learn, helping folks develop better software, and helping folks become better maintainers. If we become more open in our testing, we can improve the quality of our testing work, and thus also the quality of the software products we are building. Open testing involves many things: building open source test frameworks, getting developers involved in testing, and even publicly sharing test cases and results.

Conviction #8: Show empathy

In this article, we’ve seen lots of great artwork, and we’ve learned lots of valuable lessons from it. I think ukiyo-e prints remain popular today because their subject matter focuses on the beauty of the world. Artists strived to make pieces of the “floating world” tangible for the common people.

Ukiyo-e prints revealed the supple humanity of the Japanese people, like in this print by Utagawa Kunisada:

*Twilight Snowfall at Ueno*
Utagawa Kunisada, 1850

They revealed the serene beauty of nature in harmony with civilization, like in these prints from Hiroshige’s One Hundred Famous Views of Edo:

Prints from *One Hundred Famous Views of Edo*
Utagawa Hiroshige, 1856-1858

Ukiyo-e prints also revealed ordinary people living out their lives, like this print from Hokusai’s Thirty-six Views of Mount Fuji:

*Fuji View Field in Owari Province*
Katsushika Hokusai, 1830

Art is compelling. And software, like art, is meant for people. Show empathy. Care about your customers. Remember, as a tester, you are advocating for your users. Try to help solve their problems. Do things that matter for them. Build things that actually bring them value. Be thoughtful, mindful, and humble. Don’t be a jerk.

The Golden Conviction

These eight convictions are things I’ve learned the hard way throughout my career:

Focus on behavior
Prioritize on risk
Automate
Shift left and right
Give fast feedback
Go lean
Open up
Show empathy

I live and breathe these convictions every day. Whether you are making woodblock prints or running test cases, these principles can help you do your best work.

If I could sum up these eight convictions in one line, it would be this: Be excellent in all things. If you test software, then you are both an artist and an engineer. You have a craft. Do it with excellence.

7 Major Trends in Front End Web Testing

This article is based on my opening keynote address for Front End Test Fest 2022.

In the featured image for this article, you see a beautiful front end. It’s probably not the kind of “front end” you expected. It’s the front end of a 1974 Volkswagen Karmann Ghia. The Karmann Ghia was known as the “poor man’s Porsche.” It’s a very special car. It was actually a collaboration project between Wilhelm Karmann, a German automobile manufacturer, and Carrozzeria Ghia, an Italian automobile designer. Ghia designed the body as a work of art, and Karmann put it on the tried-and-true platform of the classic Volkswagen Beetle. When the Volkswagen executives saw it, they couldn’t say no to mass production.

The Karmann Ghia is a perfect symbol of the state of web development today. We strive to make beautiful front ends with reliable platforms supporting them on the back end. Collaboration from both sides is key to success, but what people remember most is the experience they have with your apps. My mom drove a Karmann Ghia like this when she was a teenager, and to this day she still talks about the good times she had with it.

Good quality, design, and experience are indispensable aspects of front ends – whether for classic cars or for the Web. In this article, I’ll share seven major trends I see in front end web testing. While there are a lot of cool new things happening, I want y’all to keep in mind one main thing: tools and technologies may change, but the fundamentals of testing remain the same. Testing is interaction plus verification. Tests reveal the truth about our code and our features. We do testing as part of development to gather fast feedback for fixes and improvements. All the trends I will share today are rooted in these principles. With good testing, you can make sure your apps will look visually perfect, just like… you know.

#1. End-to-end testing

Here’s our first trend: End-to-end testing has become a three-way battle. For clarity, when I say “end-to-end” testing, I mean black-box test automation that interacts with a live web app in an active browser.

Historically, Selenium has been the most popular tool for browser automation. The project has been around for over a decade, and the WebDriver protocol is a W3C standard. Selenium embraces open source, open standards, and open governance. Selenium WebDriver has bindings for C#, Java, JavaScript, Ruby, PHP, and Python. The project also includes Selenium IDE, a record-and-playback tool, and Selenium Grid, a scalable cluster for cross-browser testing. Selenium is alive and well, having just released version 4.

Over the years, though, Selenium has received a lot of criticism. Selenium WebDriver is a low-level protocol. It does not handle waiting automatically, leading many folks to unknowingly write flaky scripts. It requires clunky setup since WebDriver executables must be separately installed. Many developers dislike Selenium because coding with it requires a separate workflow or state of mind from the main apps they are developing.
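The usual cure for that flakiness is an explicit wait: keep re-checking a condition until it holds or a timeout expires, instead of assuming the page is ready. Selenium’s Python bindings provide this through `WebDriverWait` and `expected_conditions`. The pure-Python helper below is a hypothetical illustration of the same polling idea, not Selenium’s actual implementation:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return the value so callers can use whatever the check found
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

With Selenium, a call might look like `wait_until(lambda: driver.find_elements(By.ID, "login"))`, where the `login` ID is hypothetical; `find_elements` returns an empty list until the element appears. In practice, `WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "login")))` already does this for you, and tools like Cypress and Playwright build similar waiting into their actions.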

Cypress was the answer to Selenium’s shortcomings. It aimed to be a modern framework with excellent developer experience, and within a few short years, it became the darling test tool for front end developers. Cypress tests run in the browser side-by-side with the app under test. The syntax is super concise. There’s automatic waiting, meaning less flakiness. There’s visual tracing. There’s support for API calls. It’s nice. And it took a big chomp out of Selenium’s market share.

Cypress isn’t perfect, though. Its browser support is limited to Chromium-based browsers and Firefox. Cypress is also JavaScript-only, which excludes several communities. While Cypress is open source, it does not follow open standards or open governance like Selenium. And, sadly, Cypress’ performance is slow – equivalent tests run slower than Selenium.

Enter Playwright, the new open source test framework from Microsoft. Playwright is the spiritual successor to Puppeteer. It boasts the wide browser and language compatibility of Selenium with the refined developer experience of Cypress. It even has a code generator to help write tests. Plus, Playwright is fast – multiple times faster than Selenium or Cypress.

Playwright is still a newcomer, and it doesn’t yet have the footprint of the other tools. Some folks might be wary that it tests against its own builds of open source browser projects (Chromium, Firefox, and WebKit) instead of stock browsers. Nevertheless, it’s growing fast, and it could be a major contender for the #1 title. In Applitools’ recent Let The Code Speak code battles, Playwright handily beat out both Selenium and Cypress.

A side-by-side comparison of Selenium, Cypress, and Playwright

Selenium, Cypress, and Playwright are definitely now the “big three” browser automation tools for testing. A respectable fourth mention would be WebdriverIO, a JavaScript-based tool that can use either WebDriver or browser debugging protocols. It has a very large user base, though not as large as Cypress’s, and like Cypress, it is JavaScript-only. There are other tools, too. Puppeteer is still very popular but used more for web crawling than testing. Protractor, once developed by the Angular team, is now deprecated.

All these are good tools to choose (except Protractor). They can handle any kind of web app that you’re building. If you want to learn more about them, Test Automation University has courses for each.

#2. Component testing

End-to-end testing isn’t the only type of testing a team can or should do. Component testing is on the rise because components are on the rise! Many teams now build shareable component libraries to enforce consistency in their web design and to avoid code duplication. Each component is like a “unit of user interface.” Not only do they make development easier, they also make testing easier.

Component testing is distinct from unit testing. A unit test interacts directly with code. It calls a function or method and verifies its outcomes. Since components are inherently visual, they need to be rendered in the browser for proper testing. They might have multiple behaviors, or they may even trigger API calls. However, they can be tested in isolation from other components, so individually, they don’t need full end-to-end tests. That’s why, from a front end perspective, component testing is the new integration testing.

Storybook is a very popular tool for building and testing components in isolation. In Storybook, each component has a set of stories that denote how that component looks and behaves. While developing components, you can render them in the Storybook viewer. You can then manually test a component by interacting with it or changing its settings. Applitools also provides an SDK for automatically running visual tests against a Storybook library.

Cypress is also entering the component testing game. On June 1, 2022, Cypress released version 10, which included component testing support. This is a huge step forward. Before, folks would need to cobble together their own component test framework, usually as an extension of a unit test project or an end-to-end test project. Many solutions just ran automated component tests purely as Node.js processes without any browser component. Now, Cypress makes it natural to exercise component behaviors individually yet visually.

I love this quote from Cypress about their approach to component testing:

When testing anything for the web, we believe that tests should view and interact with the application in the same way that an actual user does. Anything less, and it’s hard to have confidence that your application is doing what it is supposed to.
https://www.cypress.io/blog/2022/06/01/cypress-10-release/

This quote hits on something big. So many automated tests fail to interact with apps like real users. They hinge on things like IDs, CSS selectors, and XPaths. They make minimal checks like appearance of certain elements or text. Pages could be completely broken, but automated tests could still pass.

#3. Visual testing

We really want the best of both worlds: the simplicity and sensibility of manual testing with the speed and scalability of automated testing. Historically, this has been a painful tradeoff. Most teams struggle to decide what to automate, what to check manually, and what to skip. I think there is tremendous opportunity in bridging the gap. Modern tools should help us automate human-like sensibilities into our tests, not merely fire events on a page.

That’s why visual testing has become indispensable for front end testing. Web apps are visual encounters. Visuals are the DNA of user experience. Functionality alone is insufficient. Users expect to be wowed. As app creators, we need to make sure those vital visuals are tested. Heaven forbid a button goes missing or our CSS goes sideways. And since we live in a world of continuous development and delivery, we need those visual checkpoints happening continuously at scale. Real human eyes are just too slow.

For example, I could have a login page that has an original version (left) and a changed version (right):

Visual comparison between versions of a login page

Visual testing tools alert you to meaningful changes and make it easy to compare them side-by-side. They catch things you might miss. Plus, they run just like any other automated test suite. Visual testing was tough in the past because tools merely did pixel-to-pixel comparisons, which generated lots of noise for small changes and environmental differences. Now, with a tool like Applitools Visual AI, visual comparisons accurately pinpoint the changes that matter.
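To see why strict pixel comparisons were so noisy, here is a toy sketch of a naive pixel-to-pixel diff in plain Python. It assumes images are represented as hypothetical lists of rows of RGB tuples. It is not how Applitools Visual AI works; it only shows how a tiny rendering difference gets flagged as a “change” unless you add a tolerance, and how blunt a single tolerance number is:

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def pixel_diff(baseline: Image, checkpoint: Image, tolerance: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates where any RGB channel differs by more than `tolerance`."""
    if len(baseline) != len(checkpoint) or any(
        len(a) != len(b) for a, b in zip(baseline, checkpoint)
    ):
        raise ValueError("Images must have the same dimensions")
    diffs = []
    for y, (row_a, row_b) in enumerate(zip(baseline, checkpoint)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            # Compare channel by channel; the largest deviation decides
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                diffs.append((x, y))
    return diffs
```

With `tolerance=0`, a two-unit shift from anti-aliasing on one pixel is reported as a difference. Raising the tolerance hides that noise, but the same number might also hide a subtle real change elsewhere, which is the tradeoff smarter visual comparison tries to avoid.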

Test automation needs to check visuals these days. Traditional scripts interact with only the basic bones of the page. You could break the layout and remove all styling like this, and there’s a good chance a traditional automated test would still pass:

The same login page from before, but without any CSS styling

With visual testing techniques, you can also rethink how you approach cross-browser and cross-device testing. Instead of rerunning full tests against every browser configuration you need, you can run them once and then simply re-render the visual snapshots they capture against different browsers to verify the visuals. You can do this even for browsers that the test framework doesn’t natively support! For example, using a platform like Applitools Ultrafast Test Cloud, you could run Cypress tests against Electron in CI and then perform visual checks in the Cloud against Safari and Internet Explorer, among other browsers. This style of cross-platform testing is faster, more reliable, and less expensive than traditional ways.

#4. Performance testing

Functionality isn’t the only aspect of quality that matters. Performance can make or break user experience. Most people expect any given page to load in a second or two. Back in 2016, Google found that 53% of mobile site visits were abandoned when a page took longer than 3 seconds to load. As an industry, we’ve put in so much work to make the front end faster. Modern techniques like server-side rendering, hydration, and bloat reduction all aim to improve response times. It’s important to test the performance of our pages to make sure the user experience is tight.

Thankfully, performance testing is easier than ever before. There’s no excuse for not testing performance when it is so vital to success. There are many great ways to get started.

The simplest approach is right in your browser. You can profile any site with Chrome DevTools. Just right-click the page, select “Inspect,” and switch to the Performance tab. Then start the profiler and interact with the page. Chrome DevTools will capture full metrics as a visual time series so you can explore exactly what happens as you interact with the page. You can also flip over to the Network tab to look for any API calls that take too long. If you want to learn more about this type of performance analysis, Test Automation University offers a course entitled Tools and Techniques for Performance and Load Testing by Amber Race. Amber shows how to get the most value out of that Performance tab.

Another nifty tool that’s also available in Chrome DevTools is Google Lighthouse. Lighthouse is a website auditor. It scores your site in categories like performance, accessibility, progressive web apps, and SEO, and its reports include recommendations for improving each score. You can also run Lighthouse from the command line or as a Node module instead of from Chrome DevTools.
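To turn those audits into a regular check, one option is to have the Lighthouse CLI write a JSON report (for example, `lighthouse https://example.com --output=json --output-path=./report.json`) and then fail the build when a category scores too low. The Python sketch below assumes that report shape, where each entry under `categories` carries a `score` between 0 and 1 (or null if it could not be scored). The helper names and the minimum scores are hypothetical choices, not Lighthouse defaults:

```python
import json
from typing import Dict, List


def check_lighthouse_scores(report: dict, minimums: Dict[str, float]) -> List[str]:
    """Return a message for every category that is missing or scores below its minimum."""
    problems = []
    categories = report.get("categories", {})
    for category_id, minimum in minimums.items():
        category = categories.get(category_id)
        if category is None:
            problems.append(f"{category_id}: missing from report")
            continue
        score = category.get("score")
        if score is None:
            problems.append(f"{category_id}: no score")
        elif score < minimum:
            problems.append(f"{category_id}: {score:.2f} < {minimum:.2f}")
    return problems


def load_report(path: str) -> dict:
    """Read a Lighthouse JSON report from disk."""
    with open(path, encoding="utf-8") as report_file:
        return json.load(report_file)
```

A CI step could call `check_lighthouse_scores(load_report("report.json"), {"performance": 0.9, "accessibility": 0.9})` and exit with a non-zero status if the returned list is not empty.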

Using Chrome DevTools manually for one-off checks or exploratory testing is helpful, but regular testing needs automation. One really cool way to automate performance checks is using Playwright, the end-to-end test framework I mentioned earlier. In Playwright, you can create a Chrome DevTools Protocol session and gather all the metrics you want. You can do other cool things with profiling and interception. It’s like a backdoor into the browser. Best of all, you could gather these metrics together with functional testing! One framework can meet the needs of both functional and performance test automation.

John Hill is a trailblazer in this space. He’s currently doing this as part of the Open MCT project. He’s the one who showed me how to automate performance tests with Playwright! If you want to learn more, check out this talk he gave recently on performance testing with Playwright, as well as his js-perf-toolkit project on GitHub.

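Below is an example snippet adapted from js-perf-toolkit showing how to gather performance metrics using Playwright:

const { chromium } = require('playwright');

(async () => {
  // CDP sessions are only available for Chromium-based browsers
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Open a Chrome DevTools Protocol session and start collecting metrics
  const client = await page.context().newCDPSession(page);
  await client.send('Performance.enable');

  await page.goto('https://www.google.com/');
  await page.click('[aria-label="Search"]');
  await page.fill('[aria-label="Search"]', 'playwright');

  // Submit the search and wait for the results page to load
  await Promise.all([
    page.waitForNavigation(),
    page.press('[aria-label="Search"]', 'Enter')
  ]);

  // Read the metrics gathered during the interaction
  const perfMetrics = await client.send('Performance.getMetrics');
  console.log(perfMetrics.metrics);

  await browser.close();
})();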
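Playwright also has Python bindings, so the same idea works for teams that don’t write JavaScript. The sketch below uses Playwright’s Python sync API; `collect_metrics` and `metrics_to_dict` are hypothetical helper names, and the URL you pass in is up to you. Playwright is imported inside the function so the small formatting helper can be used on its own:

```python
from typing import Dict, List


def metrics_to_dict(metrics: List[dict]) -> Dict[str, float]:
    """Flatten CDP Performance.getMetrics output ([{"name", "value"}, ...]) into a dict."""
    return {metric["name"]: metric["value"] for metric in metrics}


def collect_metrics(url: str) -> Dict[str, float]:
    """Load a page in Chromium and return its CDP performance metrics."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # CDP sessions are only available for Chromium-based browsers
        browser = p.chromium.launch()
        page = browser.new_page()
        client = page.context.new_cdp_session(page)
        client.send("Performance.enable")
        page.goto(url)
        result = client.send("Performance.getMetrics")
        browser.close()
    return metrics_to_dict(result["metrics"])
```

A functional test could then compare a value such as `collect_metrics(url)["JSHeapUsedSize"]` against a budget your team picks, so performance and functional checks live in one suite.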

#5. Machine learning models

There’s another curve ball when testing websites: what about machine learning models? For example, whenever you shop at an online store, the bottom of almost every product page has a list of recommendations for similar or complementary products. When I searched Amazon for the latest Pokémon video game, Amazon recommended other games and toys:

Recommendation systems like this might be hard-coded for small stores, but large retailers like Amazon and Walmart use machine learning models to back up their recommendations. Models like this are notoriously difficult to test. How do we know if a recommendation is “good” or “bad”? How do I know if folks who like Pokémon would be enticed to buy a Kirby game or a Zelda game? Lousy recommendations are a lost business opportunity. Other models could have more serious consequences, like introducing harmful biases that affect users.

Machine learning models need separate approaches to testing. It might be tempting to skip data validation because it’s harder than basic functional testing, but that’s a risk not worth taking. To do testing right, separate the functional correctness of the frontend from the validity of data given to it. For example, we could provide mocked data for product recommendations so that tests would have consistent outcomes for verifying visuals. Then, we could test the recommendation system apart from the UI to make sure its answers seem correct. Separating these testing concerns makes each type of test more helpful in figuring out bugs. It also makes machine learning models faster to test, since testers or scripts don’t need to navigate a UI just to exercise them.
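Here is a minimal Python sketch of that separation, assuming the UI receives its recommendations through an injectable provider. Every name here (`Product`, `Recommender`, `render_recommendations`, `FakeRecommender`) is hypothetical. The UI test swaps in a deterministic fake so its output never depends on what the model happens to say today:

```python
from dataclasses import dataclass
from html import escape
from typing import List, Protocol


@dataclass(frozen=True)
class Product:
    sku: str
    name: str


class Recommender(Protocol):
    def recommend(self, sku: str, limit: int) -> List[Product]: ...


def render_recommendations(recommender: Recommender, sku: str, limit: int = 3) -> str:
    """Render a recommendations list as HTML, whatever the recommender returns."""
    items = recommender.recommend(sku, limit)
    if not items:
        return '<section class="recs"><p>No recommendations</p></section>'
    rows = "".join(f"<li>{escape(p.name)}</li>" for p in items[:limit])
    return f'<section class="recs"><ul>{rows}</ul></section>'


class FakeRecommender:
    """Deterministic stand-in for the real model, used only in UI tests."""

    def __init__(self, products: List[Product]):
        self.products = products

    def recommend(self, sku: str, limit: int) -> List[Product]:
        return self.products[:limit]
```

The real model would get its own tests that call it directly with known inputs, with no UI in the way.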

If you want to learn more about testing machine learning models, Carlos Kidman created an excellent course all about it on Test Automation University named Intro to Testing Machine Learning Models. In his course, Carlos shows how to test models for adversarial attacks, behavioral aspects, and unfair biases.

#6. JavaScript

Now, the next trend I see will probably be controversial to many of you out there: JavaScript isn’t everything. Historically, JavaScript has been the only language for front end web development. As a result, a JavaScript monoculture has developed around the front end ecosystem. There’s nothing inherently wrong with that, but I see that changing in the coming years – and I don’t mean TypeScript.

In recent years, frustrations with single-page applications (SPAs) and client-heavy front ends have spurred a server-side renaissance. In addition to JavaScript frameworks that support server-side rendering (SSR), classic server-side projects like Django, Rails, and Laravel are alive and kicking. Folks in those communities do JavaScript when they must, but they love exploring alternatives. For example, HTMX is a framework that provides hypertext directives for many dynamic actions that would otherwise be coded directly in JavaScript. I could use any of those classic web frameworks with HTMX and almost completely avoid JavaScript code. That makes it easier for programmers to make cool things happen on the front end without needing to navigate a foreign ecosystem.

Below is an example snippet of HTML code with HTMX attributes for posting a click and showing the response:

  <script src="https://unpkg.com/htmx.org@1.7.0"></script>
  <!-- have a button POST a click via AJAX -->
  <button hx-post="/clicked" hx-swap="outerHTML">
    Click Me
  </button>
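With `hx-swap="outerHTML"`, htmx replaces the whole button with whatever HTML the server sends back. As a minimal sketch of the server side, using only Python’s standard library (a real app would more likely use a framework such as Django), a handler for the `/clicked` endpoint simply returns an HTML fragment instead of JSON. The handler and `make_server` names are hypothetical, and serving the page itself is left out:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class ClickHandler(BaseHTTPRequestHandler):
    """Answer htmx's POST /clicked with an HTML fragment."""

    def do_POST(self):
        if self.path != "/clicked":
            self.send_error(404)
            return
        # htmx marks its AJAX requests with an "HX-Request: true" header
        from_htmx = self.headers.get("HX-Request") == "true"
        text = "Clicked!" if from_htmx else "Clicked (without htmx)"
        body = f"<button disabled>{text}</button>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create a local server for the handler; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), ClickHandler)
```

Because the response is plain HTML, a black-box test can check it with any HTTP client, in any language, without touching JavaScript.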

WebAssembly, or “Wasm,” is also here. WebAssembly is essentially an assembly language for browsers. Code written in higher-level languages can be compiled down into WebAssembly code and run on the browser. All major browsers now support WebAssembly to some degree. That means JavaScript no longer holds a monopoly on the browser.

I don’t know if any language will ever dethrone JavaScript in the browser, but I predict that browsers will become multilingual platforms through WebAssembly in the coming years. For example, at PyCon 2022, Anaconda announced PyScript, a framework for running Python code in the browser. Blazor enables C# code to run in-browser. Emscripten compiles C/C++ programs to WebAssembly. Other languages like Ruby and Rust also have WebAssembly support.

Regardless of what happens inside the browser, black-box testing tools and frameworks outside the browser can use any language. Tools like Playwright and Selenium support languages other than JavaScript. That brings many more people to the table. Testers shouldn’t be forced to learn JavaScript just to automate some tests when they already know another language. This is happening today, and I don’t expect it to change.

#7. Autonomous testing

Finally, there is one more trend I want to share, and this one is more about the future than the present: autonomous testing is coming. Ironically, today’s automated testing is still manually intensive. Someone needs to figure out features, write down the test steps, develop the scripts, and maintain them when they inevitably break. Visual testing makes verification autonomous because assertions don’t need explicit code, but figuring out the right interactions to exercise features is still a hard problem.

I think the next big advancement for testing and automation will be autonomous testing: tools that autonomously look at an app, figure out what tests should be run, and then run those tests automatically. The key to making this work will be machine learning algorithms that can learn the context of the apps they target for testing. Human testers will need to work together with these tools to make them truly effective. For example, one type of tool could be a test recommendation engine that proposes tests for an app, and the human tester could pick the ones to run.

Autonomous testing will greatly simplify testing. It will make developers and testers far more productive. As an industry, we aren’t there yet, but it’s coming, and I think it’s coming soon. I delivered a keynote address on this topic at Future of Testing: Frameworks 2022.

Conclusion

There’s lots of exciting stuff happening in the world of the front end. As I said before, tools and technologies may change, but fundamentals remain the same. Each of these trends is rooted in tried-and-true principles of testing. They remind us that software quality is a multifaceted challenge, and the best strategy is the one that provides the most value for your project.

So, what do you think? Did I hit all the major front end trends? Did I miss anything? Let me know in the comments!

Modernizing Software Quality Assurance with Visual Testing

This article introduces visual testing as a technique that can revolutionize software quality assurance (QA) practices. It is based on a talk I delivered on June 9, 2022 at AITP-RTP, and its target audience includes IT professionals and leaders who may not be hands-on with testing, coding, or automation.

Visual testing techniques are an incredible way to maximize the value of your functional tests. Instead of checking traditional things like text or attributes, visual testing captures full snapshots of your application’s pages and looks for visual differences over time. This isn’t just another nice-to-have feature that’s on the bleeding edge of technology. It’s a tried-and-true technique that anyone can use, and it makes testing easier!

In this article, I want to “open your eyes” to see how visual testing can revolutionize how you approach software quality. I want you to see things in new ways, and I’ll cover five key advantages of visual testing. I’ll use Applitools as the visual testing tool for demonstration. And don’t worry, everything will be high-level – I’ll be light on the code.

What is software testing?

We all know that there are several different kinds of testing. Here’s a short list:

Unit
Integration
End-to-End
Web UI
REST API
Mobile
Load testing
Performance testing
Property-based testing
Behavior-driven
Data-driven

You name it, there’s a test for it. We could play buzzword bingo if we wanted. But what is “testing”? In simplest terms, testing = interaction + verification. That’s it! You do something, and you make sure it works. Every kind of testing reduces to this formula.
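To make the formula concrete, here is a minimal Java sketch of one test split into its two halves. The `Cart` class is a hypothetical stand-in for a system under test, not part of any real framework. In a real suite, the verification would usually be an assertion from a test framework such as JUnit rather than a returned boolean.

```java
import java.util.ArrayList;
import java.util.List;

final class InteractionPlusVerification {
    // Hypothetical system under test: a tiny shopping cart (not from any real library).
    static final class Cart {
        private final List<String> items = new ArrayList<>();

        void add(String item) {
            items.add(item);
        }

        List<String> items() {
            return List.copyOf(items);
        }
    }

    // One test = interaction (do something) + verification (make sure it worked).
    static boolean addItemTest() {
        Cart cart = new Cart();
        cart.add("coffee");                            // interaction
        return cart.items().equals(List.of("coffee")); // verification
    }

    public static void main(String[] args) {
        System.out.println(addItemTest() ? "PASS" : "FAIL"); // prints PASS
    }
}
```

Whether the test drives a web page, a REST endpoint, or a mobile screen, the shape stays the same: the interaction changes, and the verification checks the outcome.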

We’ve been testing software since the dawn of computers. The “first computer bug” happened on September 9, 1947, when a moth flew into one of the relays of the Mark II computer at Harvard University. What you’re seeing here is Grace Hopper’s bug report, with the dead moth taped onto the notebook page.

The first computer bug, discovered by Grace Hopper in 1947.
Source: https://education.nationalgeographic.org/resource/worlds-first-computer-bug

Traditional testing practices

Historically, all testing was done manually. Whether it was Grace Hopper pulling a dead moth out of computer relays with tweezers or someone banging on a keyboard to navigate through a desktop app, humans have driven testing. Manual testing was practically the only way to do testing for decades. As applications became more user-centric with the rise of PCs in the 1980s, testing became a much more approachable discipline. Folks didn’t need to hold computer science degrees or to be software engineers to be successful – they just needed common sense and grit. Companies built entire organizations for testers. Releases wouldn’t ship until QA gave them seals of approval. Test repositories could have hundreds, even thousands, of test procedures.

Unfortunately, manual testing does not scale very well. It’s a slow process. If you want to test an app, you need to set everything up, log in, and exercise all the different features. Any time you discover a problem, you need to stop, investigate, and write a report. Every time there’s a new development build, you need to do it all over again. The only way to scale is to hire more testers. Even with more people, testing cycles could take days, weeks, or even months. When I worked at NetApp, the main functional testing phase for a major release took over half a year to complete.

Manual testing is a great way to test software features because it is simple and sensible, but it doesn’t scale well.

The rise of automation

Then, automation came. It first became popular in the late 1990s with unit testing, which exercises functions and methods directly in the code itself. Then, in the mid-2000s, black box automation tools and frameworks started gaining traction. Instead of manually performing test cases step by step, testers would write scripts to automatically execute test steps.

Tools like Selenium made it possible to automate browser interactions for testing web apps. Folks could code Selenium calls using the programming language of their choice: Java, JavaScript, C#, Python, Ruby, or PHP. Later, frameworks like Cypress and Playwright refined the experience that Selenium started. Other tools like SoapUI and (later) Postman made it easy to peel back frontend layers and test APIs directly. Appium made it possible to automate tests for mobile apps. So many solutions hit the market. The ones here are only a few. (Please don’t hate me if I didn’t mention your favorite tool here!) Many were free and open source, while others were licensed software products.

Automation offered several benefits over manual testing. With automation, you could run tests more quickly. Scripts don’t need to wait for humans to react to pages or write down results. You could also run tests more frequently. Teams started running tests continuously – nightly at first, and then after every code change. These benefits enabled teams to widen their test coverage and provide faster feedback. Testing work that would take a full team days to complete could be finished in a matter of hours, if not minutes. Test results would be posted in real time instead of at the end of testing cycles. Instead of endlessly executing tests manually, testers gained time back to work on other things, like automating even more tests or doing exploratory testing activities.

Challenges with automation

Unfortunately, it wasn’t all rainbows and unicorns. Test automation was hard to develop. Since it was inherently more complex than manual testing, it required more skills. Testers needed to learn how to use tools like Selenium or Postman. On top of that, they needed to learn how to do programming. If they wanted to use codeless tools instead, then their companies probably had to shell out a pretty penny for licenses. Regardless of the tools chosen, automated scripts could never be made perfect. They are inherently fragile because they depend directly upon the features under test. For example, if a button on a web page changes, then the script will crash. Automated tests also gained a reputation for being flaky when testers didn’t appropriately handle waiting for things on the page to load. Furthermore, automation was only suitable for checking low-level things like text and numbers. That’s fine for unit tests and API tests, but it’s not suitable for user interfaces that are inherently visual. Passing tests could miss a lot of problems, giving a false sense of security.

When considering all these challenges together, we discovered as an industry that test automation isn’t fully autonomous. We dreamed of testing made easy, but automation just made things harder. Teams who could build good test automation projects reaped handsome returns, but for many, the bar was too high. It was out of reach. Many tried and failed. Trust me, I’ve talked with lots of folks who struggle with test automation.

What we really want is the best of both worlds. We want the simplicity and sensibility of manual testing, but with the speed and scalability of automated testing. To get both, most teams use a split testing strategy. They automate some tests while running others manually. Actually, I’ve commonly seen teams run all their tests manually and then automate whatever they can with the time they have left. Some teams are more proactive about automation, but not all. Folks perpetually make tradeoffs.

But, what if there was a way to get the simplicity and sensibility of manual testing with automation? What if automation could visually inspect our applications for differences like a human could?

Walking through an example

Consider a basic web application with a standard login page:

When we look at this from top to bottom, we see:

A logo
A page title
A username field
A password field
A sign-in button
A remember-me checkbox
Links to social media

However, during the course of development, we know things change – for better or worse. Here’s a different version of the same page:

Can you spot the differences? Looking at these two pages side-by-side makes comparison easier:

The logos are different, and the sign-in buttons are different. While I’d probably ask the developers about the sign-in button change, I’d categorically consider that logo change a bug. My gut tells me a human tester would catch these differences if they were paying attention, but there’s a chance they could miss them. Traditional automation would most likely fly right by these changes without stopping.

In fact, pages can be radically broken visually yet still have passing automated tests. In this version, I stripped all the CSS off the page:

We would definitely call this page broken. A traditional functional test script hinges on the most basic functionality of web pages, like IDs and element attributes. If it clicks, it works! It completely misses visuals. I even wrote a short test script with basic assertions, and sure enough, it passed on all three versions of this login page. Those are huge test gaps.
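Here is a toy illustration of that gap, under heavily simplified assumptions: the page is modeled as a hypothetical map from element IDs to inline styles (the IDs and styles are made up, and real pages are full DOM trees), and the check only asks whether the expected elements exist. Stripping every style leaves the check's answer unchanged.

```java
import java.util.Map;

final class FunctionalGap {
    // Hypothetical, simplified page model: element ID -> inline CSS style.
    // A typical functional check only asks whether the expected elements exist.
    static boolean loginPageLooksFunctional(Map<String, String> elementsById) {
        return elementsById.containsKey("logo")
                && elementsById.containsKey("username")
                && elementsById.containsKey("password")
                && elementsById.containsKey("log-in");
    }

    public static void main(String[] args) {
        Map<String, String> styled = Map.of(
                "logo", "width: 64px",
                "username", "border: 1px solid #ccc",
                "password", "border: 1px solid #ccc",
                "log-in", "background: #1e90ff");
        // The same elements with every style stripped, like the CSS-less login page.
        Map<String, String> unstyled = Map.of(
                "logo", "", "username", "", "password", "", "log-in", "");
        System.out.println(loginPageLooksFunctional(styled));   // prints true
        System.out.println(loginPageLooksFunctional(unstyled)); // prints true
    }
}
```

Both pages "pass" because nothing in the check looks at how the page renders.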

The magic of visual testing

So, what if we could visually inspect this page with automation? That would easily catch any changes that human eyes would detect, but with speed and scale. We could take a baseline snapshot that we consider “good,” and every time we run our tests, we take a new “checkpoint” snapshot. Then, we can compare the two side-by-side to detect any changes. This is what we call visual testing: take a baseline snapshot to start, take a checkpoint snapshot after every change, and look for any visual differences programmatically. If a picture is worth a thousand words, then a snapshot is worth a thousand assertions.

Visual testing: identifying differences between baseline snapshots and checkpoint snapshots.

One visual snapshot captures everything on the page. As a tester, you don’t need to explicitly state what to check: a snapshot implicitly covers layout, color, size, shape, and styling. That’s a huge advantage over traditional functional test automation.
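The simplest way to compare a baseline against a checkpoint is an exact pixel-by-pixel comparison. The following Java sketch uses the JDK's `java.awt.image.BufferedImage` on two synthetic images, where a made-up 10x10 red "button" appears only in the checkpoint. It shows only the naive technique, not how Applitools or any Visual AI engine works.

```java
import java.awt.image.BufferedImage;

final class PixelDiff {
    // Counts pixels whose RGB values differ between a baseline and a checkpoint image.
    // Returns -1 if the dimensions differ, because a pixel-by-pixel comparison is impossible then.
    static int countDifferentPixels(BufferedImage baseline, BufferedImage checkpoint) {
        if (baseline.getWidth() != checkpoint.getWidth()
                || baseline.getHeight() != checkpoint.getHeight()) {
            return -1;
        }
        int diffs = 0;
        for (int y = 0; y < baseline.getHeight(); y++) {
            for (int x = 0; x < baseline.getWidth(); x++) {
                if (baseline.getRGB(x, y) != checkpoint.getRGB(x, y)) {
                    diffs++;
                }
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Both images start out all black.
        BufferedImage baseline = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage checkpoint = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        // Simulate a visual change: a 10x10 red block appears only in the checkpoint.
        for (int y = 20; y < 30; y++) {
            for (int x = 40; x < 50; x++) {
                checkpoint.setRGB(x, y, 0xFF0000);
            }
        }
        System.out.println(countDifferentPixels(baseline, checkpoint)); // prints 100
    }
}
```

Exact comparison is easy to write, which is why it is tempting; deciding whether a given difference matters is the hard part.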

Visual Testing Advantage #1:
Visual testing covers everything on a page.

Unfortunately, not all visual testing techniques are created equal. Programming a tool to capture snapshots and perform pixel-by-pixel comparisons isn’t too difficult, but determining if those changes matter is very difficult. A good visual testing tool should ignore changes that don’t matter – like small padding differences – and focus on changes that do matter – like missing elements. Otherwise, human testers will need to review every single result, nullifying any benefit of automating visual tests.

Take a look at these two pictures. They show a cute underwater scene. There are a total of ten differences between the two pictures. Can you find them?

Unfortunately, a pixel-to-pixel comparison won’t find any of them. I ran these two pictures through Applitools Eyes using an exact pixel-to-pixel comparison, and this is what happened:

Except for the whitespace on the sides, every pixel was different. As humans, we can clearly see that these images are very similar, but because they were a few pixels off on the sides, automation failed to pinpoint meaningful differences.
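Here is a toy Java reproduction of that failure mode, assuming synthetic images rather than the underwater pictures: the same stripe pattern rendered twice, with the checkpoint shifted two pixels to the right. An exact pixel-by-pixel comparison reports every pixel as different, even though a human would call the images identical apart from the offset. This illustrates only the problem, not how Visual AI solves it.

```java
import java.awt.image.BufferedImage;

final class ShiftNoise {
    // Renders vertical black-and-white stripes, each 2 pixels wide, shifted right by `offset` pixels.
    static BufferedImage stripes(int width, int height, int offset) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                boolean white = Math.floorMod(x - offset, 4) < 2;
                img.setRGB(x, y, white ? 0xFFFFFF : 0x000000);
            }
        }
        return img;
    }

    // Exact pixel-by-pixel comparison; assumes both images have the same dimensions.
    static int countDifferentPixels(BufferedImage a, BufferedImage b) {
        int diffs = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    diffs++;
                }
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        BufferedImage baseline = stripes(100, 10, 0);
        BufferedImage checkpoint = stripes(100, 10, 2); // same content, moved 2 pixels right
        int diffs = countDifferentPixels(baseline, checkpoint);
        System.out.println(diffs + " of " + (100 * 10) + " pixels differ"); // prints "1000 of 1000 pixels differ"
    }
}
```

A one-pixel shift already flips half the pixels here, so any tool that reasons only about raw pixels drowns in noise from layout offsets.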

This is where AI really helps. Applitools uses Visual AI to detect meaningful changes that humans would see and ignore inconsequential differences that just make noise. Here, I used Applitools’ “strict” comparison, which pinpointed each of the ten differences:

That’s the second advantage of good automated visual testing: Visual AI focuses on meaningful changes to avoid noise. Visual test results shouldn’t waste testers’ time over small pixel shifts or things a human wouldn’t even notice. They should highlight what matters, like missing elements, different colors, or skewed layouts. Visual AI is a differentiator for visual testing tools. Not all tools rise above pixel-to-pixel comparisons.

Visual Testing Advantage #2:
Visual AI focuses on meaningful changes to avoid noise.

Simplifying test cases

Now, there are two main ways to automate tests. One path is to use coded tools. Tools like Selenium WebDriver are “coded” tools because they require testers to call them directly from programming code. Selenium WebDriver has bindings in Java, JavaScript, C#, Python, and Ruby, so testers can pick the language of their choice. Nevertheless, testers must essentially be developers to use coded tools.

The second path to automation is using codeless tools. Codeless tools don’t require testers to have programming skills. Instead, they record testers as they exercise features under test, and then they can replay those recorded tests at the push of a button. Most codeless tools also have some sort of visual builder through which testers can tweak and update their tests. There are several codeless tools available on the market, and many of them require paid licenses. However, Selenium IDE is a free and open source tool that does the job quite nicely.

Coded and codeless tools serve different needs. Coded tools are great for folks like me who know how to code and want high-power, customizable automation. Codeless tools are great for teams that are just getting started with automation, especially when most of their testing has historically been done manually. Regardless of approach, the good news is that you can do visual testing either way! For example, if you use Applitools, then there are SDKs and integrations for many different tools and frameworks.

As we recall, testing is interaction plus verification. When automating tests, the interactions and the verifications are scripted using either a coded or codeless tool. Testers must specify each of those operations. For example, if a test is exercising login behavior on this login page:

Then the interactions would be:

Loading the page
Entering username
Entering password
Clicking the login button
Waiting for the main page to load

And then, the verifications would be checking that the main page loads correctly:

As we can see, this main page has lots of stuff on it. We could check several things:

The title bar at the top
The side bar with different card types and lending options
The warning message about nearby branches closing soon
The values in the financial overview
The table of recent transactions

But, what should we check? The more things we verify in a test, the more coverage the test will have. However, the test will take longer to develop, require more time to run, and have a higher risk of breaking as development proceeds.

I wrote some Java code to perform high-level assertions on this page:

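// Excerpt from a test method: driver is a Selenium WebDriver, and
// waitForAppearance(By) is a helper method defined elsewhere in the test class

// Check various page elements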
waitForAppearance(By.cssSelector("div.logo-w"));
waitForAppearance(By.cssSelector("div.element-search.autosuggest-search-activator > input"));
waitForAppearance(By.cssSelector("div.avatar-w img"));
waitForAppearance(By.cssSelector("ul.main-menu"));
waitForAppearance(By.xpath("//a/span[.='Add Account']"));
waitForAppearance(By.xpath("//a/span[.='Make Payment']"));
waitForAppearance(By.xpath("//a/span[.='View Statement']"));
waitForAppearance(By.xpath("//a/span[.='Request Increase']"));
waitForAppearance(By.xpath("//a/span[.='Pay Now']"));

// Check time message
assertTrue(Pattern.matches(
        "Your nearest branch closes in:( \\d+[hms])+",
        driver.findElement(By.id("time")).getText()));

// Check menu element names
var menuElements = driver.findElements(By.cssSelector("ul.main-menu li span"));
var menuItems = menuElements.stream().map(i -> i.getText().toLowerCase()).toList();
var expected = Arrays.asList("card types", "credit cards", "debit cards", "lending", "loans", "mortgages");
assertEquals(expected, menuItems);

// Check transaction statuses
var statusElements = driver.findElements(By.xpath("//td[./span[contains(@class, 'status-pill')]]/span[2]"));
var statusNames = statusElements.stream().map(n -> n.getText().toLowerCase()).toList();
var acceptableNames = Arrays.asList("complete", "pending", "declined");
assertTrue(acceptableNames.containsAll(statusNames));

If you don’t know Java, please don’t be frightened by this code! It checks that certain elements and links appear, that the warning message displays a timeframe, and that correct names for menu items and transaction statuses appear. As you can see, that’s a lot of complicated code – and that’s what I want you to see.

Sadly, its coverage is quite shallow. This code doesn’t check the placement of any elements. It doesn’t check the title bar, the financial overview values, or any transaction values other than status. If I wanted to cover all these things, I’d probably need to add at least another hundred lines of code. That might take me an hour to find all the locators, parse the text values, and run it a few times to make sure it works. Someone else would need to do a code review before the changes could be merged, as well.

If I do visual testing, then I could eliminate all this code with a one-line snapshot call:

eyes.check(Target.window().fully().withName("Main page"));

One. Line.

As an engineer, I cannot overstate how much this simplifies test development. A single snapshot implicitly covers everything on the page: visuals, text, placement, and color. I don’t need to make tradeoffs about what to check and what not to check. Visual snapshots remove a tremendous cognitive burden. They improve test coverage and make tests more robust. This is the same whether you are using a coded tool like Selenium WebDriver in Java or a codeless tool like Selenium IDE.

This is the third major advantage visual testing has over traditional functional testing: visual snapshots greatly simplify assertions. Instead of spending hours deciding what to check, figuring out locators, and writing transformation logic, you can make one concise snapshot call and be done. I said it before, and I’ll say it again: If a picture is worth a thousand words, then a snapshot is worth a thousand assertions.

Visual Testing Advantage #3:
A snapshot is worth a thousand assertions.

Testing different browsers and devices

So, what about cross-browser and cross-device testing? It’s great if my app works on my machine, but it also needs to work on everyone else’s machine. The major browsers these days are Chrome, Edge, Firefox, and Safari. The two main mobile platforms are iOS and Android. That might not sound like too much hassle at first, but then consider:

All the versions of each browser – typically, you want to verify that your app works on the last two or three releases.
All the screen sizes – modern web apps have responsive designs that change based on viewport.
All the device types – desktops and laptops have various operating systems, and phones and tablets come in a plethora of models.

We have a combinatorial explosion! Traditional functional tests must be run start-to-finish in their entirety on each of these platforms. Most teams will pick a few of the most popular combinations to test and skip the rest, but that could still require lots of test execution.
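To see how quickly the matrix grows, here is a small Java sketch that expands every combination of browser, version, and viewport. The specific lists are illustrative assumptions, not recommendations: four browsers, three versions, and four viewports already yield 4 × 3 × 4 = 48 configurations, before counting operating systems or physical devices.

```java
import java.util.ArrayList;
import java.util.List;

final class ConfigMatrix {
    // One platform combination to run a test against.
    record Config(String browser, String version, String viewport) {}

    // Builds the Cartesian product of browsers x versions x viewports.
    static List<Config> expand(List<String> browsers, List<String> versions, List<String> viewports) {
        List<Config> configs = new ArrayList<>();
        for (String browser : browsers) {
            for (String version : versions) {
                for (String viewport : viewports) {
                    configs.add(new Config(browser, version, viewport));
                }
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        List<String> browsers = List.of("Chrome", "Edge", "Firefox", "Safari");
        List<String> versions = List.of("latest", "latest-1", "latest-2");
        List<String> viewports = List.of("1920x1080", "1366x768", "768x1024", "375x812");
        List<Config> configs = expand(browsers, versions, viewports);
        System.out.println(configs.size() + " configurations"); // prints "48 configurations"
    }
}
```

Every full functional test multiplies by this count, which is why teams end up sampling only a few combinations.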

Visual testing simplifies things here, too. We already know that visual testing captures snapshots of pages in our applications to look for differences over time. Note how I used the word “snapshot” and not “screenshot.” That was deliberate. A screenshot is merely a rasterized capture of pixels reflecting an instantaneous view. It’s frozen in time and in size. A snapshot, however, captures everything that makes up the page: the HTML structure, the CSS styling, and the JavaScript code that brings it to life.

With cross-platform visual testing, a snapshot can be captured once and then re-rendered on any browser or device configuration.

Snapshots are more powerful than screenshots because snapshots can be re-rendered. For example, I could run my test one time on my local machine using Google Chrome, and then I could re-render any snapshots I capture from that test on Firefox, Safari, or Edge. I wouldn’t need to run the test from start to finish three more times – I just need to re-render the snapshots in the new browsers and run the Visual AI checker. I could re-render them using different versions and screen sizes, too, because I have the full page, not just a flat screenshot. This works for web apps as well as mobile apps.

Visually-based cross-platform testing is lightning fast. A typical UI test case takes about a minute to run. It could be more or less, but from my experience, 1 minute is a rough industry average. A visual checkpoint backed by Visual AI takes only a few seconds to complete. Do the math: if you have a large test suite with hundreds to thousands of tests that you need to test across multiple configurations, then visual testing could save you hours, if not days, of test execution time per cycle. Plus, if you use a service like Applitools Ultrafast Test Cloud, then you won’t need to set up all those different configurations yourself. You’ll spend less time and money on your full test efforts.
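Here is that math as a rough Java sketch. All numbers are assumptions for illustration: 500 tests, 10 configurations, 60 seconds per test run (the rough average above), 5 seconds per visual checkpoint, one checkpoint per test per configuration, and everything running sequentially. Real suites run in parallel and often have several checkpoints per test, so treat the result as an order-of-magnitude comparison, not a benchmark.

```java
final class CrossPlatformTime {
    // Traditional approach: every test runs start to finish on every configuration.
    static long traditionalSeconds(int tests, int configs, int secondsPerTest) {
        return (long) tests * configs * secondsPerTest;
    }

    // Re-rendering approach: run each test once, then one visual checkpoint
    // per test per configuration (assumes a single checkpoint per test).
    static long visualSeconds(int tests, int configs, int secondsPerTest, int secondsPerCheckpoint) {
        return (long) tests * secondsPerTest + (long) tests * configs * secondsPerCheckpoint;
    }

    public static void main(String[] args) {
        int tests = 500, configs = 10, secondsPerTest = 60, secondsPerCheckpoint = 5;
        long traditional = traditionalSeconds(tests, configs, secondsPerTest); // 300,000 s
        long visual = visualSeconds(tests, configs, secondsPerTest, secondsPerCheckpoint); // 55,000 s
        System.out.printf("traditional: %.1f h, visual: %.1f h%n", traditional / 3600.0, visual / 3600.0);
    }
}
```

Under these assumptions, the traditional approach needs about 83 hours of sequential execution, versus about 15 hours when the tests run once and the snapshots are re-rendered.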

Visual Testing Advantage #4:
Visual snapshots enable lightning-fast cross-platform testing.

When to start visual testing

There is one more thing I want y’all to consider: when should a team adopt visual testing into their quality strategy? I can’t tell you how many times folks have told me, “Andy, that visual testing thing looks so cool and so helpful, but I don’t think my team will ever get there. We’re just getting started, and we’re new to automation, and automation is so hard, and I don’t think we’ll ever be mature enough to adopt visual testing techniques.” Every time I hear these reasons, I can’t help but do a facepalm.

Me, whenever others presume that visual testing is out of reach for them.
Source: https://en.wikipedia.org/wiki/Facepalm#/media/File:Paris_Tuileries_Garden_Facepalm_statue.jpg

Visual testing makes automation easier:

It makes verifications much easier to perform.
Visual snapshots cover more of a view than traditional assertions ever could.
Visual AI ensures that any visual differences identified are important.
Re-rendering snapshots on different configurations simplifies cross-platform testing.

I really think teams should do visual testing from the start. Consider this strategy: start by automating a few basic tests that navigate to different pages of an app and capture snapshots of each. The interactions would be straightforward, and the verifications would be single-step one-liners. If the testers are new to automation, they could go codeless with Selenium IDE just to get started. That would provide an immense amount of value for relatively little automation work. It’s the 80/20 rule: 80% of the value for 20% of the work. Then, later, when the team has more time or more maturity, they can expand the automation project with larger tests that use both traditional and visual assertions.
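The starter strategy above might look like the following Java sketch. `Browser` and `VisualCheck` are hypothetical interfaces standing in for a real browser driver (such as Selenium WebDriver) and a visual testing SDK call; they are not real APIs, and the URLs are placeholders. The lambdas in `main` just log calls so the sketch runs without a browser.

```java
import java.util.ArrayList;
import java.util.List;

final class SnapshotSmokeSuite {
    // Hypothetical stand-ins for a real browser driver and a visual testing SDK.
    interface Browser { void navigate(String url); }
    interface VisualCheck { void snapshot(String name); }

    record PageToCheck(String name, String url) {}

    // Each page gets one interaction (load it) and one visual verification (snapshot it).
    static void run(Browser browser, VisualCheck visual, List<PageToCheck> pages) {
        for (PageToCheck page : pages) {
            browser.navigate(page.url());
            visual.snapshot(page.name());
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Fake implementations that only record calls, so the sketch runs anywhere.
        Browser browser = url -> log.add("navigate " + url);
        VisualCheck visual = name -> log.add("snapshot " + name);
        run(browser, visual, List.of(
                new PageToCheck("Login page", "https://example.com/login"),
                new PageToCheck("Main page", "https://example.com/app")));
        log.forEach(System.out::println);
    }
}
```

Keeping the page list as data means adding coverage is one more entry; later, traditional assertions can be layered onto individual pages as the team matures.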

Visual Testing Advantage #5:
Visual testing makes functional testing easier.

Test automation is hard, no matter what tool or what language you use. Teams struggle to automate tests in time and to keep them running. Visual testing simplifies implementation and execution while catching more problems. It offers the advantage of making functional testing easier. It’s not a technique only for those on the bleeding edge. It’s here today, and it’s accessible to anyone doing test automation.

Next Steps

Overall, visual testing is a winning strategy. It has several advantages over traditional functional testing. Please note, however, that visual testing does not replace functional testing. Instead, it supercharges it. With a visual testing tool like Applitools Eyes, you can do visual testing in any major language or test framework you like, and with Applitools Ultrafast Test Cloud, you can do visual testing using any major browser or mobile configuration.

If you want to give visual testing a try with Applitools, start by registering a free account. Then, take one of the Applitools tutorials. You can pick a tutorial for any of the supported SDKs. If you get stuck and need help, just contact me – I’ll be more than happy to help!

Open Testing: Opening tests like opening source

This article is based on a talk I gave on Open Testing at a few conferences: STARWEST 2021, TAU: The Homecoming, TSQA 2022, QA or the Highway 2022, and Conf42: SRE 2022.

I’m super excited to introduce a somewhat new idea to you and to our industry, Open Testing: What if we open our tests like we open our source? I’m not merely talking about creating open source test frameworks. I’m talking about opening the tests themselves. What if it became normal to share test cases and automated procedures? What if it became normal for companies to publicly share their test results? And what are the degrees of openness in testing for which we should strive as an industry?

I think that we – whether we are testers, developers, managers, or any other role in software – can greatly improve the quality of our work if we adopt principles of openness into our testing practices. To help me explain, I’d like to share how I learned about the main benefits of open source software, and then we can cross those benefits over into testing work.

So, let’s go way back in time to when I first encountered open source software.

My first encounter with open source code

I first started programming when I was in high school. At 13 years old, I was an incoming freshman at Parkville High School in their magnet school for math, science, and computer science in good old Baltimore, Maryland. (Fun fact: Parkville’s mascots were the Knights, which is my last name!) All students in the magnet program needed to have a TI-83 Plus graphing calculator. Now, mind you, this was back in the day before smartphones existed. Flip phones were the cool trend! The TI-83 Plus was cutting-edge handheld technology at that time. It was so advanced that when I first got it, it took me 5 minutes to figure out how to turn it off!

I quickly learned that the TI-83 Plus was just a mini-computer in disguise. Did you know that this thing has a full programming language built into it? TI-BASIC! Within the first two weeks of my freshman Intro to Computer Science class, our teacher taught us how to program math formulas: Slope. Circle circumference and area. The quadratic formula. You name it, I programmed it, even if it wasn’t a homework assignment. It felt awesome! It was more fun to me than playing video games, and believe me, I was a huge Nintendo fan.

There were two extra features of the TI-83 Plus that made it ideal for programming. First, it had a link cable for sharing programs. Two people could connect their calculators and copy programs from one to the other. Needless to say, with all my formulas, I became quite popular around test time. Second, anyone could open any program file on the calculator and read its code. The TI-BASIC source code could not be hidden. By design, it was “open source.”

This is how I learned my very first lesson about open source software: Open source helps me learn. Whenever I would copy programs from others, including games, I would open the program and read the code to see how it worked. Sometimes, I would make changes to improve it. More importantly, though, many times, I would learn something new that would help me write better programs. This is how I taught myself to code. All on this tiny screen. All through ripping open other people’s code and learning it. All because the code was open to me.

From the moment I wrote my first calculator program, I knew I wanted to become a software engineer. I had that spark.

My first open source library

Let’s fast-forward to college. I entered the Computer Science program at Rochester Institute of Technology – Go Tigers! By my freshman year in college, I had learned Java, C++, a little Python, and, of all things, COBOL. All the code in all my projects until that point had been written entirely by me. Sometimes, I would look at examples in books as a guide, but I’d never use other people’s code. In fact, if a professor caught you using copied code, then you’d fail that assignment and risk being expelled from the school.

Then, in my first software engineering course, we learned how to write unit tests using a library called JUnit. We downloaded JUnit from somewhere online – this was before Maven became big – and hooked it into our Java path. Then, we started writing test classes with test case methods, and somehow, it all ran magically in ways I couldn’t figure out at the time.

I was astounded that I could use software that I didn’t write myself in a project. Permission from a professor was one thing, but the fact that someone out there in the world was giving away good code for free just blew my mind. I saw the value in unit tests, and I immediately saw the value in a simple, free test framework like JUnit.

That’s when I learned my second lesson about open source software: Open source helps me become a better developer. I could have written my own test framework, but that would have taken me a lot of time. JUnit was ready to go and free to use. Plus, since several individuals had already spent years developing JUnit, it would have more features and fewer bugs than anything I could develop on my own for a college project. Using a package like JUnit helped me write and run my unit tests without needing to become an expert in test automation frameworks. I could build cool things without needing to build every single component.

That revelation felt empowering. Within a few years of taking that software engineering course, sites for hosting open source projects like GitHub became huge. Programming language package indexes like Maven, NuGet, PyPI, and NPM became development mainstays. The running joke within Python became that you could import anything! This was way better than swapping calculator games with link cables.

My first chance to give back

When I graduated college, I was zealous for open source software. I believed in it. I was an ardent supporter. But, I was mostly a consumer. As a Software Engineer in Test, I used many major test tools and frameworks: JUnit, TestNG, Cucumber, NUnit, xUnit.net, SpecFlow, pytest, Jasmine, Mocha, Selenium WebDriver, RestSharp, Rest Assured – the list goes on and on. As a Python developer, I used many modules and frameworks in the Python ecosystem like Django, Flask, and requests.

Then, I got the chance to give back: I launched an open source project called Boa Constrictor. Boa Constrictor is a .NET implementation of the Screenplay Pattern. It helps you make better interactions for better automation. Out of the box, it provides Web UI interactions using Selenium WebDriver and REST API interactions using RestSharp, but you can use it to implement any interactions you want.

My company and I released Boa Constrictor publicly in October 2020. You can check out the boa-constrictor repository on GitHub. Originally, my team and I at Q2 developed all the code. We released it as an open source project hoping that it could help others in the industry. But then, something cool happened: folks in the industry helped us! We started receiving pull requests for new features. In fact, we even started using some new interactions developed by community members internally in our company’s test automation project. We also proudly participated in Hacktoberfest in 2020 and 2021.

Boa Constrictor: The .NET Screenplay Pattern

That’s when I learned my third lesson about open source software: Open source helps me become a better maintainer. Large projects need all the help they can get. Even a team of core maintainers can’t always handle all the work. However, when a project is open source, anyone who uses it can help out. Each little contribution can add value for the whole user base. Maintaining software then becomes easier, and the project can become more impactful.

Struggling with poor quality

As a Software Engineer in Test, I found myself caught between two worlds. In one world, I was a developer at heart who loved to write code to solve problems. In the other world, I was a software quality professional who tested software and advocated for improvements. These worlds came together primarily through test automation and continuous integration. Now that I’m a developer advocate, I still occupy this intersection, with a greater responsibility for helping others.

However, throughout my entire career, I keep hitting one major problem: Software quality has a problem with quality. Let that sink in: software quality has a big problem with quality. I’ve worked on teams with titles ranging from “Software Quality Assurance” to “Test Engineering & Architecture,” and even an “Automation Center of Excellence.” Despite the titular focus on quality, every team has suffered from aspects of poor quality in workmanship.

Here are a few poignant examples:

Manual test case repositories are full of tests with redundant steps.
Test automation projects are riddled with duplicate code.
Setup and cleanup steps are copy-pasted endlessly, whether needed or not.
Automation code uses poor practices, such as global variables instead of dependency injection.
A 90% success rate is treated as a “good” day with “limited” flakiness.
Many tests cover silly, pointless, or unimportant things instead of valuable, meaningful behaviors.
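To make the dependency-injection point concrete, here is a minimal Python sketch. `FakeSession` and the `check_*` functions are hypothetical stand-ins for real automation code, not part of any library. With a global, one check’s side effects leak into the next, so results depend on execution order; with injection, each check receives its own fresh dependency. In pytest, fixtures are a common way to inject dependencies like this.

```python
class FakeSession:
    """Hypothetical stand-in for a logged-in browser or API session."""

    def __init__(self) -> None:
        self.logged_in = False

    def log_in(self) -> None:
        self.logged_in = True


# Poor practice: every check shares one mutable global session.
SESSION = FakeSession()


def check_login_with_global() -> bool:
    SESSION.log_in()
    return SESSION.logged_in


def check_starts_logged_out_with_global() -> bool:
    # Fails whenever check_login_with_global() has already run.
    return not SESSION.logged_in


# Better practice: each check receives its session as a parameter.
def check_login(session: FakeSession) -> bool:
    session.log_in()
    return session.logged_in


def check_starts_logged_out(session: FakeSession) -> bool:
    return not session.logged_in


if __name__ == "__main__":
    # Global version: the second check fails because state leaked.
    print(check_login_with_global(), check_starts_logged_out_with_global())  # prints: True False
    # Injected version: both checks pass with fresh sessions.
    print(check_login(FakeSession()), check_starts_logged_out(FakeSession()))  # prints: True True
```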

How can we call ourselves quality professionals when our own work suffers from poor quality? Why are these kinds of problems so pervasive? I think they build up over time. Copy-pasting one procedure feels innocuous. One rogue variable won’t be noticed. One flaky test is no big deal. Once this starts happening, teams keep repeating these practices in their own silos until they make a mess. I don’t think giving teams more time to work on these problems will solve them, either, because more time does not interrupt inertia; it merely prolongs it.

The developer in me desperately wants to solve these problems. But how? I can solve them in my own projects, but because my tests are sealed behind company doors, I can’t use them to show others how to solve these problems at scale. Many of the articles and courses on how to do X are full of toy examples, too.

Changing our quality culture

So, how do we get teams to break bad habits? I think our industry needs a culture change. If we could be more open with testing like we are open with source code, then perhaps we could bring many of the benefits we see from open source into testing:

Helping people learn testing
Helping people become better testers
Helping people become better test maintainers

If we cultivate a culture of openness, then we could lead better practices by example. Furthermore, if we become transparent about our quality, it could bolster our users’ confidence in our products while simultaneously keeping us motivated to keep quality high.

There are multiple ways to start pursuing this idea of open testing. Not every possibility may be applicable for every circumstance, but my goal is to get y’all thinking about it. Hopefully, these ideas can inspire better practices for better quality.

Openness through internal collaboration

For a starting point of reference, let’s consider the least open context for testing. Imagine a team where testing work is entirely siloed by role. In this type of team, there is a harsh line between developers and testers. Only the testers ever see test cases, access test repositories, or touch automation. Test cases and test plans are essentially “closed” to non-testers due to access, readability, or even apathy. The only outputs from testers are failure percentages and bug reports. Results are based more on trust than on evidence.

This kind of team sounds pretty bleak. I hope this isn’t the kind of team you’re on, but maybe it is. Let’s see how openness can make things better.

The first step towards open testing is internal openness. Let’s break down some silos. Testers don’t exclusively own quality. Not everyone needs to be a tester by title, but everyone on the team should become quality-conscious. In fact, any software development team has three main roles: Business, Development, and Testing. Business looks for what problems to solve, Development addresses how to implement solutions, and Testing provides feedback on the solution. These three roles together are known as “The Three Amigos” or “The Three Hats.”

Each role offers a valuable perspective with unique expertise. When the Three Amigos stay apart, features under development don’t have the benefit of multiple perspectives. They might have serious design flaws, they might be unreasonable to implement, or they might be difficult to test. Misunderstandings could also cause developers to build the wrong things or testers to write useless tests. However, when the Three Amigos get together, they can jointly contribute to the design of product features. Everyone can get on the same page. The team can build quality into the product from the start. They could do activities like Question Storming and Example Mapping to help them define behaviors.

As part of this collaboration, not everyone may end up writing tests, but everyone will be thinking about quality. Testing then becomes easier because expected behaviors are well-defined and well-understood. Testers get deeper insight into what is important to cover. When testers share results and open bugs, other team members are more receptive because the feedback is more meaningful and more valuable.

We practiced Three Amigos collaboration at my previous company, Q2. My friend Steve was a developer who saw the value in Example Mapping. Many times, he’d pick up poorly-defined user stories with conflicting information or missing acceptance criteria. Sometimes, he’d burn a whole sprint just trying to figure things out! Once he learned about Example Mapping, he started setting up half-hour sessions with the other two Amigos (one of whom was me) to better understand user stories from the start. He got into it. Thanks to proactive collaboration, he could develop the stories more smoothly. One time, I remember we stopped working on a story because we couldn’t justify its business value, which saved Steve two weeks of pointless work. The story didn’t end there: Steve became a Software Engineer in Test! He shifted left so hard that he shifted into a whole new role.

Openness through living specs

Another step towards open testing is living documentation through specification by example. Collaboration like we saw with the Three Amigos is great, but the value it provides can be fleeting if it is not written down. Teams need artifacts to record designs, examples, and eventually test cases.

One reason I love Example Mapping is that it helps a team capture stories, rules, examples, and questions on color-coded cards that they can keep for future refinement.

Stories become work items.
Rules become acceptance criteria.
Examples become test cases.
Questions become spikes or future stories.

During Example Mapping, folks typically write cards quickly. An example card describes a behavior to test, but it might not carefully design the scenario. It needs further refinement. Defining behaviors using a clear, concise format like Given-When-Then makes behaviors easy to understand and easy to test.

For example, let’s say we wanted to test a web search engine. The example could be to search for a phrase like “panda.” We could write this example as the following scenario:

Given the search engine page is displayed
When the user searches for the phrase “panda”
Then the results page shows a list of links for “panda”

This special Given-When-Then format is known as the Gherkin language. Gherkin comes from Behavior-Driven Development tools like Cucumber, but it can be helpful for any type of testing. Gherkin defines testable behaviors in a concise way that follows the Arrange-Act-Assert pattern. You set things up, you interact with the feature, and you verify the outcomes.
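As a rough illustration of that Arrange-Act-Assert mapping, the panda scenario could be written as a plain Python test. This is only a sketch: `FakeSearchEngine` is a hypothetical in-memory stand-in for a real search page, and the URLs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeSearchEngine:
    """Hypothetical in-memory stand-in for a web search engine page."""

    # Maps each page URL to the words indexed for that page.
    index: dict[str, list[str]] = field(default_factory=dict)
    current_page: str = "blank"
    results: list[str] = field(default_factory=list)

    def load(self) -> None:
        self.current_page = "search"

    def search(self, phrase: str) -> None:
        # Keep every URL whose indexed words include the phrase.
        self.results = [url for url, words in self.index.items() if phrase in words]
        self.current_page = "results"


def test_search_for_panda() -> None:
    # Arrange (Given): the search engine page is displayed.
    engine = FakeSearchEngine(index={
        "https://example.com/giant-panda": ["panda", "bamboo"],
        "https://example.com/koala": ["koala", "eucalyptus"],
    })
    engine.load()
    assert engine.current_page == "search"

    # Act (When): the user searches for the phrase "panda".
    engine.search("panda")

    # Assert (Then): the results page shows a list of links for "panda".
    assert engine.current_page == "results"
    assert engine.results == ["https://example.com/giant-panda"]


if __name__ == "__main__":
    test_search_for_panda()
    print("test_search_for_panda passed")
```

A real UI test would drive a browser instead of a fake object, but the three phases would stay in the same order.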

Furthermore, Gherkin encourages Specification by Example. This scenario provides clear instructions on how to perform a search. It has real data, which is the search phrase “panda,” and clear results. Using real-world examples in specifications like this helps all Three Amigos understand the precise behavior.

Turning Example Mapping cards into Gherkin behavior specs

Behavior specifications are multifaceted artifacts:

They are requirements that define how a feature should behave.
They are acceptance criteria that must be met for a deliverable to be complete.
They are test cases with clear instructions.
They could become automated scripts with the right kind of test framework.
They are living documentation for the product.

Living documentation is open and powerful. Anyone on the team or outside the team can read it to learn about the product. Refining ideas into example cards, and then example cards into behavior specs, becomes a pipeline that delivers living documentation as a byproduct of the software development lifecycle.

SpecFlow is one of the best frameworks that supports this type of openness with Specification by Example and Living Documentation. SpecFlow is a free and open-source test automation framework for .NET. In SpecFlow, you write your test cases as Gherkin scenarios, and you automate each Given-When-Then step using C# methods.
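SpecFlow binds each Gherkin step to a C# method through step-definition attributes. The hand-rolled Python sketch below imitates that binding idea with a hypothetical `step` decorator and `run_scenario` function; it is not SpecFlow’s API or any BDD framework’s API. Each registered regular expression matches one step’s text, and captured groups such as the search phrase are passed to the step function as arguments. The scenario uses straight quotes so the regular expressions stay simple, and the search index is a placeholder.

```python
from __future__ import annotations

import re
from typing import Any, Callable

# Hypothetical registry of (keyword, compiled pattern, step function) entries.
STEPS: list[tuple[str, re.Pattern[str], Callable[..., None]]] = []


def step(keyword: str, pattern: str) -> Callable[[Callable[..., None]], Callable[..., None]]:
    """Register the decorated function as the code behind one Gherkin step."""
    def register(func: Callable[..., None]) -> Callable[..., None]:
        STEPS.append((keyword, re.compile(pattern), func))
        return func
    return register


def run_scenario(scenario: str, context: dict[str, Any]) -> None:
    """Run each Given/When/Then line by calling its matching step function."""
    for line in scenario.strip().splitlines():
        keyword, _, text = line.strip().partition(" ")
        for step_keyword, regex, func in STEPS:
            match = regex.fullmatch(text)
            if step_keyword == keyword and match:
                # Captured groups become extra arguments to the step function.
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"No step definition for: {line.strip()}")


@step("Given", r"the search engine page is displayed")
def given_search_page(ctx: dict[str, Any]) -> None:
    # Placeholder index standing in for a real search backend.
    ctx["index"] = {"panda": ["https://example.com/giant-panda"]}
    ctx["page"] = "search"


@step("When", r'the user searches for the phrase "(.+)"')
def when_user_searches(ctx: dict[str, Any], phrase: str) -> None:
    ctx["results"] = ctx["index"].get(phrase, [])
    ctx["page"] = "results"


@step("Then", r'the results page shows a list of links for "(.+)"')
def then_results_shown(ctx: dict[str, Any], phrase: str) -> None:
    assert ctx["page"] == "results"
    assert ctx["results"], f"No links found for {phrase!r}"


SCENARIO = """
Given the search engine page is displayed
When the user searches for the phrase "panda"
Then the results page shows a list of links for "panda"
"""

if __name__ == "__main__":
    context: dict[str, Any] = {}
    run_scenario(SCENARIO, context)
    print(context["results"])
```

Real BDD frameworks such as SpecFlow, Cucumber, or behave add much more (scenario outlines, hooks, reporting), but the core lookup from a plain-language step to the code behind it follows this general shape.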

One of SpecFlow’s niftiest features, however, is SpecFlow+ LivingDoc. Most test frameworks focus exclusively on automation code. When a test is automated, then only a programmer can read it and understand it. Gherkin makes this easier because steps are written in plain language, but Gherkin scenarios are nevertheless stored in a code repository that’s inaccessible to many team members. SpecFlow+ LivingDoc breaks that pattern. It turns Gherkin scenarios into a searchable doc site accessible to all Three Amigos. It makes test cases and test automation much more open. LivingDoc also provides test results for each scenario. Green check marks indicate passing tests, while red X’s indicate failures.

Historically, testers have used reports like this to provide feedback in-house to their managers and developers. Results indicate what works and what needs to be fixed. However, test results can be useful to more people than just internal team members. What if test results were shared with users and customers? I’m going to repeat that statement, because it might seem shocking: What if users and customers could see test results?

Think about it. Open test results have very positive effects. Transparency with users builds trust. If users can see that things are tested and working, then they will gain confidence in the quality of the product. If they could peer into the living documentation, then they could learn how to use the product even better. On the flip side, transparency holds development teams accountable to keeping quality high, both in the product and in the testing. Open test results offer these benefits only if the results can be trusted. If tests are useless or failures are rampant, then public test results could actually hurt the ones developing the product.

This type of radical transparency would require an enormous culture shift. It may not be appropriate for every company to create public dashboards with their test results, but it could be a strategic differentiator when used wisely. For example, when I worked at Q2, we shared LivingDoc reports with specific PrecisionLender customers after every two-week release. It built trust. Plus, since the LivingDoc report includes only high-level behavior specs with simple results, even a vice president could read it! We could share tests without sharing automation code. That was powerful.

Openness through open source

Let’s keep extending open testing outward. In addition to sharing test results and living documentation, folks can also share tools, frameworks, and other parts of their tests. This is where open testing truly is open source.

We already covered a bunch of open source projects for test automation. As an industry, we are truly blessed with so many incredible projects. Every single one of them represents a team of testers who not only solved a problem but decided to share their solution with the world. Each solution is abstract enough to apply to many circumstances but concrete enough to provide a helpful implementation. Collectively, the projects on this page have probably been downloaded more than a billion times, and that’s no joke. And if you want, you could read the open source code for any of them.

Popular open source test automation projects

Cool new projects appear all the time, too. One of my favorite projects that started in the past few years is Playwright, an awesome browser automation tool from Microsoft. Playwright makes end-to-end web testing easy, reliable, and fast. It provides cross-browser and cross-language support like Selenium, a concise syntax like Cypress, and a bunch of advanced features like automatic waiting, tracing, and code generation. Plus, Playwright is often significantly faster than other automation tools. It took what made Selenium, Cypress, and Puppeteer great to the next level.

Openness through shared test suites

So far, all the ways of approaching open testing are things we could do today. Many of us are probably already doing these things, even if we didn’t think of them under the phrase “open testing.” But where can these ideas go in the future?

My mind goes back to one of the big problems with testing that I mentioned earlier: duplication. Opening up collaboration fixes some bad habits, and sharing components eliminates some duplication in the plumbing of test automation, but so many of our tests across the industry repeat the same kinds of steps and follow the same types of patterns.

For example, think about any time you’ve ordered something from an online store. It could be Amazon, Walmart, Target – whatever. Every single online store has a virtual shopping cart. Whenever you want to buy something, you add it to your cart. Then, when you’re done shopping, you proceed to pay for all the items in your cart. If you decide you don’t want something anymore, you remove it from the cart. Easy-peasy.

As I describe this type of shopping cart, I don’t need to show you screenshots from the store website to explain it. Y’all have done so much online shopping that you intuitively know how it works, regardless of the store. Heck, I recently ordered a bunch of parts for an old Volkswagen Beetle from a site named JBugs, and the shopping cart was the same.

If so many applications have the same parts, then why do we keep duplicating the same tests in different places? Think about it. Think about how many times different teams have written nearly identical shopping cart tests. Ouch. Think about how much time was wasted on that duplication of effort.

I think this is something where Artificial Intelligence and Machine Learning could help. What if we could develop machine learning models to learn common behaviors for apps and services? The learning agents would look for things like standard icons and typical workflows. We could essentially create test suites for things like login, search, shopping, and payment that could run successfully on most apps. These kinds of tests probably couldn’t cover everything in any given application, but they could cover basic, common behaviors. Maybe that could cover a quarter of all behaviors worth testing. Maybe a third? Maybe half? Every little bit helps!

AI and ML can help us achieve true Autonomous Testing

Now, imagine sharing those generic test suites publicly. In the same way developers have open source projects to help expedite their coding, and in the same way data scientists have open data sets to use for modeling, testers could have open test suites that they could pick up and run as applicable. Not test tools, but actual runnable tests that could run against any application. If these kinds of test suites prove to be valuable, then prominent ones could become universally accepted bars of quality for software apps. For example, in the future, companies could download and execute these universal tests against the apps they’re developing, in addition to running the tests they develop in-house. I think that could be a really cool opportunity.
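Here is one hedged sketch of what a shared, app-agnostic test suite might look like in Python. Everything in it is hypothetical: the `ShoppingCart` protocol is an adapter each app would implement for its own cart, `shared_cart_suite` holds the generic checks, and `InMemoryCart` stands in for one particular store. The checks never mention a specific store, which is the whole point.

```python
from __future__ import annotations

from typing import Callable, Protocol


class ShoppingCart(Protocol):
    """Hypothetical adapter that each app would implement for its own cart."""

    def add(self, item: str, quantity: int = 1) -> None: ...

    def remove(self, item: str) -> None: ...

    def items(self) -> dict[str, int]: ...


def shared_cart_suite(make_cart: Callable[[], ShoppingCart]) -> list[str]:
    """Run generic cart checks on fresh carts and return the names of failures."""
    failures: list[str] = []

    # A brand-new cart starts empty.
    if make_cart().items() != {}:
        failures.append("new cart is empty")

    # Adding the same item twice accumulates its quantity.
    cart = make_cart()
    cart.add("widget")
    cart.add("widget", 2)
    if cart.items() != {"widget": 3}:
        failures.append("add accumulates quantity")

    # Removing an item takes it out of the cart entirely.
    cart = make_cart()
    cart.add("gadget")
    cart.remove("gadget")
    if "gadget" in cart.items():
        failures.append("remove deletes item")

    return failures


class InMemoryCart:
    """Stand-in for one particular store's cart, reached through the adapter."""

    def __init__(self) -> None:
        self._items: dict[str, int] = {}

    def add(self, item: str, quantity: int = 1) -> None:
        self._items[item] = self._items.get(item, 0) + quantity

    def remove(self, item: str) -> None:
        self._items.pop(item, None)

    def items(self) -> dict[str, int]:
        return dict(self._items)


if __name__ == "__main__":
    print(shared_cart_suite(InMemoryCart))  # prints: []
```

The hard part in practice would be the adapter layer, since each app exposes its cart differently; that is where the learning agents described above could potentially help.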

This type of testing, known as Autonomous Testing, is the future. Software developers and testers will use AI-backed tools to better learn, explore, and exercise app behaviors. These tools will make it easier than ever to automate scriptable tests.

How to start pursuing openness

As we have covered, open testing could take many forms:

It could be openness in collaboration to build better quality from the start.
It could be openness in specification by example and living documentation.
It could be openness in sharing tests and their results with customers and users.
It could be openness in sharing tools, frameworks, and platforms.
It could be openness in building shared test sets for common application behaviors.

Some of these ideas might seem far-fetched or aspirational, but quite honestly, I think each of them could add lots of value to testing practices. I think every tester and every team should look at this list and ask themselves, “Could we try some of these things?” Perhaps your team could take baby steps with better collaboration or better specification. Perhaps your team has a cool project you built in-house that you could release as an open source project, like my old team and I did with Boa Constrictor. Perhaps there’s a startup idea in using machine learning for autonomous testing. Perhaps there are other ways to achieve open testing that aren’t listed here. Who knows? It could be cool!

We should also consider the flip side. Are there certain aspects of testing that should remain closed? My mind goes to security. Could fully open testing inadvertently reveal security vulnerabilities? Could publicly visible gaps in coverage help attackers find weak spots faster? I don’t know, but I think we should consider possibilities like these.

If you want to pursue open testing, here are three questions to get you started:

How is your testing today?
1. In what ways is it already open?
2. In what ways is it closed?
How could your testing improve with incremental openness?
1. We’re talking baby steps here – small improvements that you could easily achieve today.
2. It could be as small as trying Example Mapping or joining a mob programming session.
How could your testing improve with radical openness?
1. Shoot the moon! Dream big! Get creative!
2. In the world of software, anything is possible.

Conclusion

We should also remember that open testing isn’t a goal unto itself. It’s a means to an end, and that end is higher quality: quality in our practices, quality in our artifacts, and ultimately quality in the software we create. We shouldn’t seek openness in testing just because I’m spouting lots of buzzwords in this article. At the same time, we also shouldn’t brush off these ideas as too radical or idealistic. What we should do is seek ways for perpetual improvement. Remember that this whole idea of open testing came from the tried-and-true benefits of open source code.