
1974 VW Karmann Ghia Convertible

7 Major Trends in Front End Web Testing

This article is based on my opening keynote address for Front End Test Fest 2022.

In the featured image for this article, you see a beautiful front end. It’s probably not the kind of “front end” you expected. It’s the front end of a 1974 Volkswagen Karmann Ghia. The Karmann Ghia was known as the “poor man’s Porsche.” It’s a very special car. It was actually a collaboration project between Wilhelm Karmann, a German automobile manufacturer, and Carrozzeria Ghia, an Italian automobile designer. Ghia designed the body as a work of art, and Karmann put it on the tried-and-true platform of the classic Volkswagen Beetle. When the Volkswagen executives saw it, they couldn’t say no to mass production.

The Karmann Ghia is a perfect symbol of the state of web development today. We strive to make beautiful front ends with reliable platforms supporting them on the back end. Collaboration from both sides is key to success, but what people remember most is the experience they have with your apps. My mom drove a Karmann Ghia like this when she was a teenager, and to this day she still talks about the good times she had with it.

Good quality, design, and experience are indispensable aspects of front ends – whether for classic cars or for the Web. In this article, I’ll share seven major trends I see in front end web testing. While there are a lot of cool new things happening, I want y’all to keep in mind one main thing: tools and technologies may change, but the fundamentals of testing remain the same. Testing is interaction plus verification. Tests reveal the truth about our code and our features. We do testing as part of development to gather fast feedback for fixes and improvements. All the trends I will share today are rooted in these principles. With good testing, you can make sure your apps will look visually perfect, just like… you know.

#1. End-to-end testing

Here’s our first trend: End-to-end testing has become a three-way battle. For clarity, when I say “end-to-end” testing, I mean black-box test automation that interacts with a live web app in an active browser.

Historically, Selenium has been the most popular tool for browser automation. The project has been around for over a decade, and the WebDriver protocol is a W3C standard. It is open source, open standards, and open governance. Selenium WebDriver has bindings for C#, Java, JavaScript, Ruby, PHP, and Python. The project also includes Selenium IDE, a record-and-playback tool, and Selenium Grid, a scalable cluster for cross-browser testing. Selenium is alive and well, having just released version 4.

Over the years, though, Selenium has received a lot of criticism. Selenium WebDriver is a low-level protocol. It does not handle waiting automatically, leading many folks to unknowingly write flaky scripts. It requires clunky setup since WebDriver executables must be separately installed. Many developers dislike Selenium because coding with it requires a separate workflow or state of mind from the main apps they are developing.
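
To make the waiting problem concrete, here’s a minimal sketch of a Selenium WebDriver login test in JavaScript. The URL and selectors are hypothetical; the point is that the script must wait explicitly for elements and pages, or it becomes flaky:

const { Builder, By, until } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com/login');  // hypothetical URL
    // WebDriver does not wait automatically, so wait explicitly for the form.
    await driver.wait(until.elementLocated(By.id('username')), 10000);
    await driver.findElement(By.id('username')).sendKeys('pandy');
    await driver.findElement(By.id('password')).sendKeys('SuperSecret');
    await driver.findElement(By.id('login')).click();
    await driver.wait(until.titleContains('Dashboard'), 10000);
  } finally {
    await driver.quit();
  }
})();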

Cypress was the answer to Selenium’s shortcomings. It aimed to be a modern framework with excellent developer experience, and in a few short years, it quickly became the darling test tool for front end developers. Cypress tests run in the browser side-by-side with the app under test. The syntax is super concise. There’s automatic waiting, meaning less flakiness. There’s visual tracing. There’s API calls. It’s nice. And it took a big chomp out of Selenium’s market share.
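
To show what that developer experience looks like, here’s a rough sketch of the same login test in Cypress (the selectors are hypothetical). Commands like cy.get() retry automatically until elements appear, so there are no explicit waits:

describe('Login', () => {
  it('logs in with valid credentials', () => {
    cy.visit('/login');  // relative to the baseUrl in the Cypress config
    cy.get('#username').type('pandy');
    cy.get('#password').type('SuperSecret');
    cy.get('#login').click();
    cy.contains('h1', 'Dashboard').should('be.visible');
  });
});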

Cypress isn’t perfect, though. Its browser support is limited to Chromium-based browsers and Firefox. Cypress is also JavaScript-only, which excludes communities that work in other languages. While Cypress is open source, it does not follow open standards or open governance like Selenium. And, sadly, Cypress’ performance is slow – equivalent tests run slower than they do in Selenium.

Enter Playwright, the new open source test framework from Microsoft. Playwright is the spiritual successor to Puppeteer. It boasts the wide browser and language compatibility of Selenium with the refined developer experience of Cypress. It even has a code generator to help write tests. Plus, Playwright is fast – multiple times faster than Selenium or Cypress.
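
Here’s a sketch of the same login test using Playwright’s test runner (again, the URL and selectors are hypothetical). Auto-waiting is built into actions like fill() and click(), and web-first assertions like toHaveText() retry until they pass:

const { test, expect } = require('@playwright/test');

test('logs in with valid credentials', async ({ page }) => {
  await page.goto('https://example.com/login');  // hypothetical URL
  await page.fill('#username', 'pandy');
  await page.fill('#password', 'SuperSecret');
  await page.click('#login');
  await expect(page.locator('h1')).toHaveText('Dashboard');
});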

Playwright is still a newcomer, and it doesn’t yet have the footprint of the other tools. Some folks might be cautious that it uses browser projects instead of stock browsers. Nevertheless, it’s growing fast, and it could be a major contender for the #1 title. In Applitools’ recent Let The Code Speak code battles, Playwright handily beat out both Selenium and Cypress.

A side-by-side comparison of Selenium, Cypress, and Playwright

Selenium, Cypress, and Playwright are definitely now the “big three” browser automation tools for testing. A respectable fourth mention would be WebdriverIO. WebdriverIO is a JavaScript-based tool that can use WebDriver or debug protocols. It has a very large user base, but it is JavaScript-only, and it is not as big as Cypress. There are other tools, too. Puppeteer is still very popular but used more for web crawling than testing. Protractor, once developed by the Angular team, is now deprecated.

All these are good tools to choose (except Protractor). They can handle any kind of web app that you’re building. If you want to learn more about them, Test Automation University has courses for each.

#2. Component testing

End-to-end testing isn’t the only type of testing a team can or should do. Component testing is on the rise because components are on the rise! Many teams now build shareable component libraries to enforce consistency in their web design and to avoid code duplication. Each component is like a “unit of user interface.” Not only do they make development easier, they also make testing easier.

Component testing is distinct from unit testing. A unit test interacts directly with code. It calls a function or method and verifies its outcomes. Since components are inherently visual, they need to be rendered in the browser for proper testing. They might have multiple behaviors, or they may even trigger API calls. However, they can be tested in isolation of other components, so individually, they don’t need full end-to-end tests. That’s why, from a front end perspective, component testing is the new integration testing.

Storybook is a very popular tool for building and testing components in isolation. In Storybook, each component has a set of stories that denote how that component looks and behaves. While developing components, you can render them in the Storybook viewer. You can then manually test components by interacting with them or changing their settings. Applitools also provides an SDK for automatically running visual tests against a Storybook library.

The Storybook viewer
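
As a rough illustration, here’s what a story file might look like in Component Story Format, assuming a simple React Button component (the component, titles, and args here are made up):

import { Button } from './Button';

export default {
  title: 'Example/Button',
  component: Button,
};

// Each named export is a story: one rendered state of the component.
export const Primary = {
  args: { primary: true, label: 'Sign In' },
};

export const Disabled = {
  args: { disabled: true, label: 'Sign In' },
};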

Cypress is also entering the component testing game. On June 1, 2022, Cypress released version 10, which included component testing support. This is a huge step forward. Before, folks would need to cobble together their own component test framework, usually as an extension of a unit test project or an end-to-end test project. Many solutions just ran automated component tests purely as Node.js processes without rendering anything in a real browser. Now, Cypress makes it natural to exercise component behaviors individually yet visually.
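
Here’s a hedged sketch of what a Cypress component test could look like for that same hypothetical React button, assuming the cy.mount command has been set up per the Cypress component testing docs:

import Button from './Button';

describe('<Button />', () => {
  it('renders its label and reports clicks', () => {
    // Stub the click handler so the test can verify the component's behavior.
    const onClick = cy.stub().as('onClick');
    cy.mount(<Button label="Sign In" onClick={onClick} />);
    cy.contains('button', 'Sign In').click();
    cy.get('@onClick').should('have.been.calledOnce');
  });
});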

I love this quote from Cypress about their approach to component testing:

When testing anything for the web, we believe that tests should view and interact with the application in the same way that an actual user does. Anything less, and it’s hard to have confidence that your application is doing what it is supposed to.

https://www.cypress.io/blog/2022/06/01/cypress-10-release/

This quote hits on something big. So many automated tests fail to interact with apps like real users. They hinge on things like IDs, CSS selectors, and XPaths. They make minimal checks like appearance of certain elements or text. Pages could be completely broken, but automated tests could still pass.

#3. Visual testing

We really want the best of both worlds: the simplicity and sensibility of manual testing with the speed and scalability of automated testing. Historically, this has been a painful tradeoff. Most teams struggle to decide what to automate, what to check manually, and what to skip. I think there is tremendous opportunity in bridging the gap. Modern tools should help us automate human-like sensibilities into our tests, not merely fire events on a page.

That’s why visual testing has become indispensable for front end testing. Web apps are visual encounters. Visuals are the DNA of user experience. Functionality alone is insufficient. Users expect to be wowed. As app creators, we need to make sure those vital visuals are tested. Heaven forbid a button goes missing or our CSS goes sideways. And since we live in a world of continuous development and delivery, we need those visual checkpoints happening continuously at scale. Real human eyes are just too slow.

For example, I could have a login page that has an original version (left) and a changed version (right):

Visual comparison between versions of a login page

Visual testing tools alert you to meaningful changes and make it easy to compare them side-by-side. They catch things you might miss. Plus, they run just like any other automated test suite. Visual testing was tough in the past because tools merely did pixel-to-pixel comparisons, which generated lots of noise for small changes and environmental differences. Now, with a tool like Applitools Visual AI, visual comparisons accurately pinpoint the changes that matter.

Test automation needs to check visuals these days. Traditional scripts interact with only the basic bones of the page. You could break the layout and remove all styling like this, and there’s a good chance a traditional automated test would still pass:

The same login page from before, but without any CSS styling

With visual testing techniques, you can also rethink how you approach cross-browser and cross-device testing. Instead of rerunning full tests against every browser configuration you need, you can run them once and then simply re-render the visual snapshots they capture against different browsers to verify the visuals. You can do this even for browsers that the test framework doesn’t natively support! For example, using a platform like Applitools Ultrafast Test Cloud, you could run Cypress tests against Electron in CI and then perform visual checks in the Cloud against Safari and Internet Explorer, among other browsers. This style of cross-platform testing is faster, more reliable, and less expensive than traditional ways.
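
For instance, with the Applitools Eyes SDK for Cypress, the browsers and devices for the Ultrafast Grid can be declared in an applitools.config.js file roughly like this (the exact values here are illustrative):

module.exports = {
  testConcurrency: 5,
  batchName: 'Login page visual checks',
  // Each entry is a configuration the captured snapshots get re-rendered against.
  browser: [
    { width: 1280, height: 800, name: 'chrome' },
    { width: 1280, height: 800, name: 'firefox' },
    { width: 1280, height: 800, name: 'safari' },
    { deviceName: 'iPhone X', screenOrientation: 'portrait' },
  ],
};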

#4. Performance testing

Functionality isn’t the only aspect of quality that matters. Performance can make or break user experience. Most people expect any given page to load in a second or two. Back in 2016, Google found that more than half of mobile visitors abandon a site if it takes longer than 3 seconds to load. As an industry, we’ve put in so much work to make the front end faster. Modern techniques like server-side rendering, hydration, and bloat reduction all aim to improve response times. It’s important to test the performance of our pages to make sure the user experience is tight.

Thankfully, performance testing is easier than ever before. There’s no excuse for not testing performance when it is so vital to success. There are many great ways to get started.

The simplest approach is right in your browser. You can profile any site with Chrome DevTools. Just right-click the page, select “Inspect,” and switch to the Performance tab. Then start the profiler and interact with the page. Chrome DevTools will capture full metrics as a visual time series so you can explore exactly what happens as you interact with the page. You can also flip over to the Network tab to look for any API calls that take too long. If you want to learn more about this type of performance analysis, Test Automation University offers a course entitled Tools and Techniques for Performance and Load Testing by Amber Race. Amber shows how to get the most value out of that Performance tab.

Chrome DevTools Performance tab

Another nifty tool that’s also available in Chrome DevTools is Google Lighthouse. Lighthouse is a website auditor. It scores how well your site performs for performance, accessibility, progressive web apps, SEO, and more. It also provides recommendations for how to improve your scores right within its reports. You can also run Lighthouse from the command line or as a Node module instead of from Chrome DevTools.

Google Lighthouse from Chrome DevTools
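
For example, here’s a minimal sketch of running a Lighthouse performance audit as a Node module with chrome-launcher (the target URL is a placeholder):

const lighthouse = require('lighthouse');
const chromeLauncher = require('chrome-launcher');

(async () => {
  // Launch a headless Chrome instance for Lighthouse to drive.
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  const options = { port: chrome.port, output: 'html', onlyCategories: ['performance'] };
  const result = await lighthouse('https://example.com', options);  // placeholder URL
  console.log('Performance score:', result.lhr.categories.performance.score * 100);
  await chrome.kill();
})();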

Using Chrome DevTools manually for one-off checks or exploratory testing is helpful, but regular testing needs automation. One really cool way to automate performance checks is using Playwright, the end-to-end test framework I mentioned earlier. In Playwright, you can create a Chrome DevTools Protocol session and gather all the metrics you want. You can do other cool things with profiling and interception. It’s like a backdoor into the browser. Best of all, you could gather these metrics together with functional testing! One framework can meet the needs of both functional and performance test automation.

John Hill is a trailblazer in this space. He’s currently doing this as part of the Open MCT project. He’s the one who showed me how to automate performance tests with Playwright! If you want to learn more, check out this talk he gave recently on performance testing with Playwright, as well as his js-perf-toolkit project on GitHub.

Below is an example snippet I copied from js-perf-toolkit showing how to gather performance metrics using Playwright:

const client = await page.context().newCDPSession(page);
await client.send('Performance.enable'); 

await page.goto('https://www.google.com/');
await page.click('[aria-label="Search"]');
await page.fill('[aria-label="Search"]', 'playwright');

await Promise.all([
    page.waitForNavigation(),
    page.press('[aria-label="Search"]', 'Enter')
]);

let perfMetrics = await client.send('Performance.getMetrics');
console.log( perfMetrics.metrics );

#5. Machine learning models

There’s another curve ball when testing websites: what about machine learning models? Whenever you shop at an online store, the bottom of almost every product page has a list of recommendations for similar or complementary products. For example, when I searched Amazon for the latest Pokémon video game, Amazon recommended other games and toys.

Recommendation systems like this might be hard-coded for small stores, but large retailers like Amazon and Walmart use machine learning models to generate their recommendations. Models like this are notoriously difficult to test. How do we know if a recommendation is “good” or “bad”? How do I know if folks who like Pokémon would be enticed to buy a Kirby game or a Zelda game? Lousy recommendations are a lost business opportunity. Other models could have more serious consequences, like introducing harmful biases that affect users.

Machine learning models need separate approaches to testing. It might be tempting to skip data validation because it’s harder than basic functional testing, but that’s a risk not worth taking. To do testing right, separate the functional correctness of the frontend from the validity of data given to it. For example, we could provide mocked data for product recommendations so that tests would have consistent outcomes for verifying visuals. Then, we could test the recommendation system apart from the UI to make sure its answers seem correct. Separating these testing concerns makes each type of test more helpful in figuring out bugs. It also makes machine learning models faster to test, since testers or scripts don’t need to navigate a UI just to exercise them.
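
As a sketch of that separation, a UI test could stub the recommendations endpoint so the front end always renders a known list. Here’s what that might look like with Playwright’s request interception; the endpoint, payload, and selector are hypothetical:

const { test, expect } = require('@playwright/test');

test('shows recommended products for a game', async ({ page }) => {
  // Stub the (hypothetical) recommendations API with fixed data.
  await page.route('**/api/recommendations*', route =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify([
        { id: 101, name: 'Kirby and the Forgotten Land' },
        { id: 102, name: 'The Legend of Zelda: Breath of the Wild' },
      ]),
    }));

  await page.goto('https://example.com/products/pokemon');  // hypothetical URL
  await expect(page.locator('.recommendation')).toHaveCount(2);
});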

If you want to learn more about testing machine learning models, Carlos Kidman created an excellent course all about it on Test Automation University named Intro to Testing Machine Learning Models. In his course, Carlos shows how to test models for adversarial attacks, behavioral aspects, and unfair biases.

#6. JavaScript isn’t everything

Now, the next trend I see will probably be controversial to many of you out there: JavaScript isn’t everything. Historically, JavaScript has been the only language for front end web development. As a result, a JavaScript monoculture has developed around the front end ecosystem. There’s nothing inherently wrong with that, but I see that changing in the coming years – and I don’t mean TypeScript.

In recent years, frustrations with single-page applications (SPAs) and client-heavy front ends have spurred a server-side renaissance. In addition to JavaScript frameworks that support SSR, classic server-side projects like Django, Rails, and Laravel are alive and kicking. Folks in those communities do JavaScript when they must, but they love exploring alternatives. For example, HTMX is a framework that provides hypertext directives for many dynamic actions that would otherwise be coded directly in JavaScript. I could use any of those classic web frameworks with HTMX and almost completely avoid JavaScript code. That makes it easier for programmers to make cool things happen on the front end without needing to navigate a foreign ecosystem.

Below is an example snippet of HTML code with HTMX attributes for posting a click and showing the response:

  <script src="https://unpkg.com/htmx.org@1.7.0"></script>
  <!-- have a button POST a click via AJAX -->
  <button hx-post="/clicked" hx-swap="outerHTML">
    Click Me
  </button>

WebAssembly, or “Wasm,” is also here. WebAssembly is essentially an assembly language for browsers. Code written in higher-level languages can be compiled down into WebAssembly code and run in the browser. All major browsers now support WebAssembly to some degree. That means JavaScript no longer holds a monopoly on the browser.

I don’t know if any language will ever dethrone JavaScript in the browser, but I predict that browsers will become multilingual platforms through WebAssembly in the coming years. For example, at PyCon 2022, Anaconda announced PyScript, a framework for running Python code in the browser. Blazor enables C# code to run in-browser. Emscripten compiles C/C++ programs to WebAssembly. Other languages like Ruby and Rust also have WebAssembly support.

Regardless of what happens inside the browser, black-box testing tools and frameworks outside the browser can use any language. Tools like Playwright and Selenium support languages other than JavaScript. That brings many more people to the table. Testers shouldn’t be forced to learn JavaScript just to automate some tests when they already know another language. This is happening today, and I don’t expect it to change.

#7. Autonomous testing

Finally, there is one more trend I want to share, and this one is more about the future than the present: autonomous testing is coming. Ironically, today’s automated testing is still manually-intensive. Someone needs to figure out features, write down the test steps, develop the scripts, and maintain them when they inevitably break. Visual testing makes verification autonomous because assertions don’t need explicit code, but figuring out the right interactions to exercise features is still a hard problem.

I think the next big advancement for testing and automation will be autonomous testing: tools that autonomously look at an app, figure out what tests should be run, and then run those tests automatically. The key to making this work will be machine learning algorithms that can learn the context of the apps they target for testing. Human testers will need to work together with these tools to make them truly effective. For example, one type of tool could be a test recommendation engine that proposes tests for an app, and the human tester could pick the ones to run.

Autonomous testing will greatly simplify testing. It will make developers and testers far more productive. As an industry, we aren’t there yet, but it’s coming, and I think it’s coming soon. I delivered a keynote address on this topic at Future of Testing: Frameworks 2022.

Conclusion

There’s lots of exciting stuff happening in the world of the front end. As I said before, tools and technologies may change, but fundamentals remain the same. Each of these trends is rooted in tried-and-true principles of testing. They remind us that software quality is a multifaceted challenge, and the best strategy is the one that provides the most value for your project.

So, what do you think? Did I hit all the major front end trends? Did I miss anything? Let me know in the comments!


Modernizing Software Quality Assurance with Visual Testing

This article introduces visual testing as a technique that can revolutionize software quality assurance (QA) practices. It is based on a talk I delivered on June 9, 2022 at AITP-RTP, and its target audience includes IT professionals and leaders who may not be hands-on with testing, coding, or automation.

Visual testing techniques are an incredible way to maximize the value of your functional tests. Instead of checking traditional things like text or attributes, visual testing captures full snapshots of your application’s pages and looks for visual differences over time. This isn’t just another nice-to-have feature that’s on the bleeding edge of technology. It’s a tried-and-true technique that anyone can use, and it makes testing easier!

In this article, I want to “open your eyes” to see how visual testing can revolutionize how you approach software quality. I want you to see things in new ways, and I’ll cover five key advantages of visual testing. I’ll use Applitools as the visual testing tool for demonstration. And don’t worry, everything will be high-level – I’ll be light on the code.

What is software testing?

We all know that there are several different kinds of testing. Here’s a short list:

  • Unit
  • Integration
  • End-to-end
  • Web UI
  • REST API
  • Mobile
  • Load
  • Performance
  • Property-based
  • Behavior-driven
  • Data-driven

You name it, there’s a test for it. We could play buzzword bingo if we wanted. But what is “testing”? In simplest terms, testing = interaction + verification. That’s it! You do something, and you make sure it works. Every kind of testing reduces to this formula.

We’ve been testing software since the dawn of computers. The “first computer bug” happened on September 9, 1947, when a moth flew into one of the relays of the Mark II computer at Harvard University. What you’re seeing here is Grace Hopper’s bug report, with the dead moth taped onto the notebook page.

The first computer bug, discovered by Grace Hopper in 1947.
Source: https://education.nationalgeographic.org/resource/worlds-first-computer-bug

Traditional testing practices

Historically, all testing was done manually. Whether it was Grace Hopper pulling a dead moth out of computer relays with tweezers or someone banging on a keyboard to navigate through a desktop app, humans have driven testing. Manual testing was practically the only way to do testing for decades. As applications became more user-centric with the rise of PCs in the 1980s, testing became a much more approachable discipline. Folks didn’t need to hold computer science degrees or to be software engineers to be successful – they just needed common sense and grit. Companies built entire organizations for testers. Releases wouldn’t ship until QA gave them seals of approval. Test repositories could have hundreds, even thousands, of test procedures.

Unfortunately, manual testing does not scale very well. It’s a slow process. If you want to test an app, you need to set everything up, log in, and exercise all the different features. Any time you discover a problem, you need to stop, investigate, and write a report. Every time there’s a new development build, you need to do it all over again. The only way to scale is to hire more testers. Even with more people, testing cycles could take days, weeks, or even months. When I worked at NetApp, the main functional testing phase for a major release took over half a year to complete.

Manual testing is a great way to test software features because it is simple and sensible, but it doesn’t scale well.

The rise of automation

Then, automation came. It first became popular in the late 1990s with unit testing, which exercises functions and methods directly in the code; black-box automation tools and frameworks followed in the mid-2000s. Instead of manually performing test cases step by step, testers would write scripts to automatically execute test steps.

Tools like Selenium made it possible to automate browser interactions for testing web apps. Folks could code Selenium calls using the programming language of their choice: Java, JavaScript, C#, Python, Ruby, or PHP. Later, frameworks like Cypress and Playwright refined the experience that Selenium started. Other tools like SoapUI and (later) Postman made it easy to peel back frontend layers and test APIs directly. Appium made it possible to automate tests for mobile apps. So many solutions hit the market. The ones here are only a few. (Please don’t hate me if I didn’t mention your favorite tool here!) Many were free and open source, while others were licensed software products.

Automation offered several benefits over manual testing. With automation, you could run tests more quickly. Scripts don’t need to wait for humans to react to pages or write down results. You could also run tests more frequently. Teams started running tests continuously – nightly at first, and then after every code change. These benefits enabled teams to widen their test coverage and provide faster feedback. Testing work that would take a full team days to complete could be finished in a matter of hours, if not minutes. Test results would be posted in real time instead of at the end of testing cycles. Instead of endlessly executing tests manually, testers gained time back to work on other things, like automating even more tests or doing exploratory testing activities.

Popular test automation tools

Challenges with automation

Unfortunately, it wasn’t all rainbows and unicorns. Test automation was hard to develop. Since it was inherently more complex than manual testing, it required more skills. Testers needed to learn how to use tools like Selenium or Postman. On top of that, they needed to learn how to do programming. If they wanted to use codeless tools instead, then their companies probably had to shell out a pretty penny for licenses. Regardless of the tools chosen, automated scripts could never be made perfect. They are inherently fragile because they depend directly upon the features under test. For example, if a button on a web page changes, then the script will crash. Automated tests also gained a reputation for being flaky when testers didn’t appropriately handle waiting for things on the page to load. Furthermore, automation was only suitable for checking low-level things like text and numbers. That’s fine for unit tests and API tests, but it’s not suitable for user interfaces that are inherently visual. Passing tests could miss a lot of problems, giving a false sense of security.

When considering all these challenges together, we discovered as an industry that test automation isn’t fully autonomous. We dreamed that automation would make testing easy, but in many ways it just made things harder. Teams who could build good test automation projects reaped handsome returns, but for many, the bar was too high. It was out of reach. Many tried and failed. Trust me, I’ve talked with lots of folks who struggle with test automation.

What we really want is the best of both worlds. We want the simplicity and sensibility of manual testing, but with the speed and scalability of automated testing. To get both, most teams use a split testing strategy. They automate some tests while running others manually. Actually, I’ve commonly seen teams run all their tests manually and then automate whatever they can with the time they have left. Some teams are more forward with their automation work, but not all. Folks perpetually make tradeoffs.

But, what if there was a way to get the simplicity and sensibility of manual testing with automation? What if automation could visually inspect our applications for differences like a human could?

Walking through an example

Consider a basic web application with a standard login page:

When we look at this from top to bottom, we see:

  • A logo
  • A page title
  • A username field
  • A password field
  • A sign-in button
  • A remember-me checkbox
  • Links to social media

However, during the course of development, we know things change – for better or worse. Here’s a different version of the same page:

Can you spot the differences? Looking at these two pages side-by-side makes comparison easier:

The logos are different, and the sign-in buttons are different. While I’d probably ask the developers about the sign-in button change, I’d categorically consider that logo change a bug. My gut tells me a human tester would catch these differences if they were paying attention, but there’s a chance they could miss them. Traditional automation would most likely fly right by these changes without stopping.

In fact, pages can be radically broken visually yet still have passing automated tests. In this version, I stripped all the CSS off the page:

We would definitely call this page broken. A traditional functional test script hinges on the most basic functionality of web pages, like IDs and element attributes. If it clicks, it works! It completely misses visuals. I even wrote a short test script with basic assertions, and sure enough, it passed on all three versions of this login page. Those are huge test gaps.

The magic of visual testing

So, what if we could visually inspect this page with automation? That would easily catch any changes that human eyes would detect, but with speed and scale. We could take a baseline snapshot that we consider “good,” and every time we run our tests, we take a new “checkpoint” snapshot. Then, we can compare the two side-by-side to detect any changes. This is what we call visual testing: take a baseline snapshot to start, take a checkpoint snapshot after every change, and look for any visual differences programmatically. If a picture is worth a thousand words, then a snapshot is worth a thousand assertions.

Visual testing: identifying differences between baseline snapshots and checkpoint snapshots.
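
As a minimal sketch of that flow, here’s roughly what a visual test looks like with the Applitools Eyes SDK for JavaScript and Selenium WebDriver (the later examples in this article use the Java SDK; the idea is the same, and the URL here is a placeholder). The first run saves baseline snapshots, and each later run captures checkpoints and compares them:

const { Builder } = require('selenium-webdriver');
const { Eyes, Target } = require('@applitools/eyes-selenium');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  const eyes = new Eyes();
  try {
    await eyes.open(driver, 'Demo Bank', 'Login page looks right');
    await driver.get('https://example.com/login');             // placeholder URL
    await eyes.check('Login page', Target.window().fully());   // full-page snapshot
    await eyes.close();
  } finally {
    await eyes.abortIfNotClosed();  // end the Eyes test if it wasn't closed cleanly
    await driver.quit();
  }
})();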

One visual snapshot captures everything on the page. As a tester, you don’t need to explicitly state what to check: a snapshot implicitly covers layout, color, size, shape, and styling. That’s a huge advantage over traditional functional test automation.

Visual Testing Advantage #1:

Visual testing covers everything on a page.

Unfortunately, not all visual testing techniques are created equal. Programming a tool to capture snapshots and perform pixel-by-pixel comparisons isn’t too difficult, but determining if those changes matter is very difficult. A good visual testing tool should ignore changes that don’t matter – like small padding differences – and focus on changes that do matter – like missing elements. Otherwise, human testers will need to review every single result, nullifying any benefit of automating visual tests.

Take a look at these two pictures. They show a cute underwater scene. There are a total of ten differences between the two pictures. Can you find them?

Unfortunately, a pixel-to-pixel comparison won’t find any of them. I ran these two pictures through Applitools Eyes using an exact pixel-to-pixel comparison, and this is what happened:

Except for the whitespace on the sides, every pixel was different. As humans, we can clearly see that these images are very similar, but because they were a few pixels off on the sides, automation failed to pinpoint meaningful differences.

This is where AI really helps. Applitools uses Visual AI to detect meaningful changes that humans would see and ignore inconsequential differences that just make noise. Here, I used Applitools’ “strict” comparison, which pinpointed each of the ten differences:

That’s the second advantage of good automated visual testing: Visual AI focuses on meaningful changes to avoid noise. Visual test results shouldn’t waste testers’ time over small pixel shifts or things a human wouldn’t even notice. They should highlight what matters, like missing elements, different colors, or skewed layouts. Visual AI is a differentiator for visual testing tools. Not all tools rise above pixel-to-pixel comparisons.

Visual Testing Advantage #2:

Visual AI focuses on meaningful changes to avoid noise.

Simplifying test cases

Now, there are two main ways to automate tests. One path is to use coded tools. Tools like Selenium WebDriver are “coded” tools because they require testers to call them directly from programming code. Selenium WebDriver has bindings in Java, JavaScript, C#, Python, or Ruby, so testers can pick the language of their choice. Nevertheless, testers must essentially be developers to use coded tools.

The second path to automation is using codeless tools. Codeless tools don’t require testers to have programming skills. Instead, they record testers as they exercise features under test, and then they can replay those recorded tests at the push of a button. Most codeless tools also have some sort of visual builder through which testers can tweak and update their tests. There are several codeless tools available on the market, and many of them require paid licenses. However, Selenium IDE is a free and open source tool that does the job quite nicely.

Coded and codeless tools serve different needs. Coded tools are great for folks like me who know how to code and want high-power, customizable automation. Codeless tools are great for teams that are just getting started with automation, especially when most of their testing has historically been done manually. Regardless of approach, the good news is that you can do visual testing either way! For example, if you use Applitools, then there are SDKs and integrations for many different tools and frameworks.

As we recall, testing is interaction plus verification. When automating tests, the interactions and the verifications are scripted using either a coded or codeless tool. Testers must specify each of those operations. For example, if a test is exercising login behavior on this login page:

Then the interactions would be:

  1. Loading the page
  2. Entering username
  3. Entering password
  4. Clicking the login button
  5. Waiting for the main page to load

And then, the verifications would be checking that the main page loads correctly:

As we can see, this main page has lots of stuff on it. We could check several things:

  • The title bar at the top
  • The side bar with different card types and lending options
  • The warning message about nearby branches closing soon
  • The values in the financial overview
  • The table of recent transactions

But, what should we check? The more things we verify in a test, the more coverage the test will have. However, the test will take longer to develop, require more time to run, and have a higher risk of breaking as development proceeds.

I wrote some Java code to perform high-level assertions on this page:

// Check various page elements
waitForAppearance(By.cssSelector("div.logo-w"));
waitForAppearance(By.cssSelector("div.element-search.autosuggest-search-activator > input"));
waitForAppearance(By.cssSelector("div.avatar-w img"));
waitForAppearance(By.cssSelector("ul.main-menu"));
waitForAppearance(By.xpath("//a/span[.='Add Account']"));
waitForAppearance(By.xpath("//a/span[.='Make Payment']"));
waitForAppearance(By.xpath("//a/span[.='View Statement']"));
waitForAppearance(By.xpath("//a/span[.='Request Increase']"));
waitForAppearance(By.xpath("//a/span[.='Pay Now']"));

// Check time message
assertTrue(Pattern.matches(
        "Your nearest branch closes in:( \\d+[hms])+",
        driver.findElement(By.id("time")).getText()));

// Check menu element names
var menuElements = driver.findElements(By.cssSelector("ul.main-menu li span"));
var menuItems = menuElements.stream().map(i -> i.getText().toLowerCase()).toList();
var expected = Arrays.asList("card types", "credit cards", "debit cards", "lending", "loans", "mortgages");
assertEquals(expected, menuItems);

// Check transaction statuses
var statusElements = driver.findElements(By.xpath("//td[./span[contains(@class, 'status-pill')]]/span[2]"));
var statusNames = statusElements.stream().map(n -> n.getText().toLowerCase()).toList();
var acceptableNames = Arrays.asList("complete", "pending", "declined");
assertTrue(acceptableNames.containsAll(statusNames));

If you don’t know Java, please don’t be frightened by this code! It checks that certain elements and links appear, that the warning message displays a timeframe, and that correct names for menu items and transaction statuses appear. As you can see, that’s a lot of complicated code – and that’s what I want you to see.

Sadly, its coverage is quite shallow. This code doesn’t check the placement of any elements. It doesn’t check the title bar, the financial overview values, or any transaction values other than status. If I wanted to cover all these things, I’d probably need to add at least another hundred lines of code. That might take me an hour to find all the locators, parse the text values, and run it a few times to make sure it works. Someone else would need to do a code review before the changes could be merged, as well.

If I do visual testing, then I could eliminate all this code with a one-line snapshot call:

eyes.check(Target.window().fully().withName("Main page"));

One. Line.

As an engineer, I cannot overstate how much this simplifies test development. A single snapshot implicitly covers everything on the page: visuals, text, placement, and color. I don’t need to make tradeoffs about what to check and what not to check. Visual snapshots remove a tremendous cognitive burden. They improve test coverage and make tests more robust. This is the same whether you are using a coded tool like Selenium WebDriver in Java or a codeless tool like Selenium IDE.

This is the third major advantage visual testing has over traditional functional testing: visual snapshots greatly simplify assertions. Instead of spending hours deciding what to check, figuring out locators, and writing transformation logic, you can make one concise snapshot call and be done. I said it before, and I’ll say it again: If a picture is worth a thousand words, then a snapshot is worth a thousand assertions.

Visual Testing Advantage #3:

A snapshot is worth a thousand assertions.

Testing different browsers and devices

So, what about cross-browser and cross-device testing? It’s great if my app works on my machine, but it also needs to work on everyone else’s machine. The major browsers these days are Chrome, Edge, Firefox, and Safari. The two main mobile platforms are iOS and Android. That might not sound like too much hassle at first, but then consider:

  • All the versions of each browser – typically, you want to verify that your app works on the last two or three releases.
  • All the screen sizes – modern web apps have responsive designs that change based on viewport.
  • All the device types – desktops and laptops have various operating systems, and phones and tablets come in a plethora of models.

We have a combinatorial explosion! Traditional functional tests must be run start-to-finish in their entirety on each of these platforms. Most teams will pick a few of the most popular combinations to test and skip the rest, but that could still require lots of test execution.

Visual testing simplifies things here, too. We already know that visual testing captures snapshots of pages in our applications to look for differences over time. Note how I used the word “snapshot” and not “screenshot.” That was deliberate. A screenshot is merely a rasterized capture of pixels reflecting an instantaneous view. It’s frozen in time and in size. A snapshot, however, captures everything that makes up the page: the HTML structure, the CSS styling, and the JavaScript code that brings it to life.

With cross-platform visual testing, a snapshot can be captured once and then re-rendered on any browser or device configuration.

Snapshots are more powerful than screenshots because snapshots can be re-rendered. For example, I could run my test one time on my local machine using Google Chrome, and then I could re-render any snapshots I capture from that test on Firefox, Safari, or Edge. I wouldn’t need to run the test from start to finish three more times – I’d just need to re-render the snapshots in the new browsers and run the Visual AI checker. I could re-render them using different versions and screen sizes, too, because I have the full page, not just a flat screenshot. This works for web apps as well as mobile apps.

Visually-based cross-platform testing is lightning fast. A typical UI test case takes about a minute to run. It could be more or less, but from my experience, 1 minute is a rough industry average. A visual checkpoint backed by Visual AI takes only a few seconds to complete. Do the math: if you have a large test suite with hundreds to thousands of tests that you need to test across multiple configurations, then visual testing could save you hours, if not days, of test execution time per cycle. For instance, rerunning 1,000 one-minute tests against three extra configurations would take about 50 hours of execution, while re-rendering and checking those same snapshots at a few seconds apiece would take only a few hours. Plus, if you use a service like Applitools Ultrafast Test Cloud, then you won’t need to set up all those different configurations yourself. You’ll spend less time and money on your full test efforts.

Visual Testing Advantage #4: 

Visual snapshots enable lightning-fast cross-platform testing.

When to start visual testing

There is one more thing I want y’all to consider: when should a team adopt visual testing into their quality strategy? I can’t tell you how many times folks have told me, “Andy, that visual testing thing looks so cool and so helpful, but I don’t think my team will ever get there. We’re just getting started, and we’re new to automation, and automation is so hard, and I don’t think we’ll ever be mature enough to adopt visual testing techniques.” Every time I hear these reasons, I can’t help but do a facepalm.

Me, whenever others presume that visual testing is out of reach for them.
Source: https://en.wikipedia.org/wiki/Facepalm#/media/File:Paris_Tuileries_Garden_Facepalm_statue.jpg

Visual testing makes automation easier:

  • It makes verifications much easier to perform.
  • Visual snapshots cover more of a view than traditional assertions ever could.
  • Visual AI ensures that any visual differences identified are important.
  • Re-rendering snapshots on different configurations simplifies cross-platform testing.

I really think teams should do visual testing from the start. Consider this strategy: start by automating a few basic tests that navigate to different pages of an app and capture snapshots of each. The interactions would be straightforward, and the verifications would be single-step one-liners. If the testers are new to automation, they could go codeless with Selenium IDE just to get started. That would provide an immense amount of value for relatively little automation work. It’s the 80/20 rule: 80% of the value for 20% of the work. Then, later, when the team has more time or more maturity, they can expand the automation project with larger tests that use both traditional and visual assertions.

Visual Testing Advantage #5: 

Visual testing makes functional testing easier.

Test automation is hard, no matter what tool or what language you use. Teams struggle to automate tests in time and to keep them running. Visual testing simplifies implementation and execution while catching more problems. It offers the advantage of making functional testing easier. It’s not a technique only for those on the bleeding edge. It’s here today, and it’s accessible to anyone doing test automation.

Next Steps

Overall, visual testing is a winning strategy. It has several advantages over traditional functional testing. Please note, however, that visual testing does not replace functional testing. Instead, it supercharges it. With a visual testing tool like Applitools Eyes, you can do visual testing in any major language or test framework you like, and with Applitools Ultrafast Test Cloud, you can do visual testing using any major browser or mobile configuration.

If you want to give visual testing a try with Applitools, start by registering a free account. Then, take one of the Applitools tutorials. You can pick a tutorial for any of the supported SDKs. If you get stuck and need help, just contact me – I’ll be more than happy to help!

Open Testing: Opening tests like opening source

This article is based on a talk I gave on Open Testing at a few conferences: STARWEST 2021, TAU: The Homecoming, TSQA 2022, QA or the Highway 2022, and Conf42: SRE 2022.

I’m super excited to introduce a somewhat new idea to you and to our industry: open testing. What if we opened our tests like we open our source? I’m not merely talking about creating open source test frameworks. I’m talking about opening the tests themselves. What if it became normal to share test cases and automated procedures? What if it became normal for companies to publicly share their test results? And what are the degrees of openness in testing for which we should strive as an industry?

I think that we – whether we are testers, developers, managers, or any other role in software – can greatly improve the quality of our work if we adopt principles of openness into our testing practices. To help me explain, I’d like to share how I learned about the main benefits of open source software, and then we can cross those benefits over into testing work.

So, let’s go way back in time to when I first encountered open source software.

My first encounter with open source code

I first started programming when I was in high school. At 13 years old, I was an incoming freshman at Parkville High School in their magnet school for math, science, and computer science in good old Baltimore, Maryland. (Fun fact: Parkville’s mascots were the Knights, which is my last name!) All students in the magnet program needed to have a TI-83 Plus graphing calculator. Now, mind you, this was back in the day before smart phones existed. Flip phones were the cool trend! The TI-83 Plus was cutting-edge handheld technology at that time. It was so advanced that when I first got it, it took me 5 minutes to figure out how to turn it off!

The TI-83 Plus

I quickly learned that the TI-83 Plus was just a mini-computer in disguise. Did you know that this thing has a full programming language built into it? TI-BASIC! Within the first two weeks of my freshman Intro to Computer Science class, our teacher taught us how to program math formulas: Slope. Circle circumference and area. The quadratic formula. You name it, I programmed it, even if it wasn’t a homework assignment. It felt awesome! It was more fun to me than playing video games, and believe me, I was a huge Nintendo fan.

There were two extra features of the TI-83 Plus that made it ideal for programming. First, it had a link cable for sharing programs. Two people could connect their calculators and copy programs from one to the other. Needless to say, with all my formulas, I became quite popular around test time. Second, anyone could open any program file on the calculator and read its code. The TI-BASIC source code could not be hidden. By design, it was “open source.”

This is how I learned my very first lesson about open source software: Open source helps me learn. Whenever I would copy programs from others, including games, I would open the program and read the code to see how it worked. Sometimes, I would make changes to improve it. More importantly, though, many times, I would learn something new that would help me write better programs. This is how I taught myself to code. All on this tiny screen. All through ripping open other people’s code and learning it. All because the code was open to me.

From the moment I wrote my first calculator program, I knew I wanted to become a software engineer. I had that spark.

My first open source library

Let’s fast-forward to college. I entered the Computer Science program at Rochester Institute of Technology – Go Tigers! By my freshman year in college, I had learned Java, C++, a little Python, and, of all things, COBOL. All the code in all my projects until that point had been written entirely by me. Sometimes, I would look at examples in books as a guide, but I’d never use other people’s code. In fact, if a professor caught you using copied code, then you’d fail that assignment and risk being expelled from the school.

Then, in my first software engineering course, we learned how to write unit tests using a library called JUnit. We downloaded JUnit from somewhere online – this was before Maven became big – and hooked it into our Java path. Then, we started writing test classes with test case methods, and somehow, it all ran magically in ways I couldn’t figure out at the time.

I was astounded that I could use software that I didn’t write myself in a project. Permission from a professor was one thing, but the fact that someone out there in the world was giving away good code for free just blew my mind. I saw the value in unit tests, and I immediately saw the value in a simple, free test framework like JUnit.

That’s when I learned my second lesson about open source software: Open source helps me become a better developer. I could have written my own test framework, but that would have taken me a lot of time. JUnit was ready to go and free to use. Plus, since several individuals had already spent years developing JUnit, it would have more features and fewer bugs than anything I could develop on my own for a college project. Using a package like JUnit helped me write and run my unit tests without needing to become an expert in test automation frameworks. I could build cool things without needing to build every single component.

That revelation felt empowering. Within a few years of taking that software engineering course, sites for hosting open source projects like GitHub became huge. Programming language package indexes like Maven, NuGet, PyPI, and NPM became development mainstays. The running joke within Python became that you could import anything! This was way better than swapping calculator games with link cables.

My first chance to give back

When I graduated college, I was zealous for open source software. I believed in it. I was an ardent supporter. But, I was mostly a consumer. As a Software Engineer in Test, I used many major test tools and frameworks: JUnit, TestNG, Cucumber, NUnit, xUnit.net, SpecFlow, pytest, Jasmine, Mocha, Selenium WebDriver, RestSharp, Rest Assured – the list goes on and on. As a Python developer, I used many modules and frameworks in the Python ecosystem like Django, Flask, and requests.

Then, I got the chance to give back: I launched an open source project called Boa Constrictor. Boa Constrictor is a .NET implementation of the Screenplay Pattern. It helps you make better interactions for better automation. Out of the box, it provides Web UI interactions using Selenium WebDriver and Rest API interactions using RestSharp, but you can use it to implement any interactions you want.

My company and I released Boa Constrictor publicly in October 2020. You can check out the boa-constrictor repository on GitHub. Originally, my team and I at Q2 developed all the code. We released it as an open source project hoping that it could help others in the industry. But then, something cool happened: folks in the industry helped us! We started receiving pull requests for new features. In fact, we even started using some new interactions developed by community members internally in our company’s test automation project. We also proudly participated in Hacktoberfest in 2020 and 2021.

Boa Constrictor: The .NET Screenplay Pattern

That’s when I learned my third lesson about open source software: Open source helps me become a better maintainer. Large projects need all the help they can get. Even a team of core maintainers can’t always handle all the work. However, when a project is open source, anyone who uses it can help out. Each little contribution can add value for the whole user base. Maintaining software then becomes easier, and the project can become more impactful.

Struggling with poor quality

As a Software Engineer in Test, I found myself caught between two worlds. In one world, I was a developer at heart who loved to write code to solve problems. In the other world, I was a software quality professional who tested software and advocated for improvements. These worlds came together primarily through test automation and continuous integration. Now that I’m a developer advocate, I still occupy this intersectionality with a greater responsibility for helping others.

However, throughout my entire career, I keep hitting one major problem: Software quality has a problem with quality. Let that sink in: software quality has a big problem with quality. I’ve worked on teams with titles ranging from “Software Quality Assurance” to “Test Engineering & Architecture,” and even an “Automation Center of Excellence.” Despite the titular focus on quality, every team has suffered from aspects of poor quality in workmanship.

Here are a few poignant examples:

  • Manual test case repositories are full of tests with redundant steps.
  • Test automation projects are riddled with duplicate code.
  • Setup and cleanup steps are copy-pasted endlessly, whether needed or not.
  • Automation code uses poor practices, such as global variables instead of dependency injection.
  • A 90% success rate is treated as a “good” day with “limited” flakiness.
  • Many tests cover silly, pointless, or unimportant things instead of valuable, meaningful behaviors.

How can we call ourselves quality professionals when our own work suffers from poor quality? Why are these kinds of problems so pervasive? I think they build up over time. Copy-pasting one procedure feels innocuous. One rogue variable won’t be noticed. One flaky test is no big deal. Once this starts happening, teams insularly keep repeating these practices until they make a mess. I don’t think giving teams more time to work on these problems will solve them, either, because more time does not interrupt inertia – it merely prolongs it.

The developer in me desperately wants to solve these problems. But how? I can fix them in my own projects, but because my tests are sealed behind company doors, I can’t use them to show others how to do it at scale. Many of the articles and courses we have on how-to-do-X are full of toy examples, too.

Changing our quality culture

So, how do we get teams to break bad habits? I think our industry needs a culture change. If we could be more open with testing like we are open with source code, then perhaps we could bring many of the benefits we see from open source into testing:

  1. Helping people learn testing
  2. Helping people become better testers
  3. Helping people become better test maintainers

If we cultivate a culture of openness, then we could lead better practices by example. Furthermore, if we become transparent about our quality, it could bolster our users’ confidence in our products while simultaneously keeping us motivated to keep quality high.

There are multiple ways to start pursuing this idea of open testing. Not every possibility may be applicable for every circumstance, but my goal is to get y’all thinking about it. Hopefully, these ideas can inspire better practices for better quality.

Openness through internal collaboration

For a starting point of reference, let’s consider the least open context for testing. Imagine a team where testing work is entirely siloed by role. In this type of team, there is a harsh line between developers and testers. Only the testers ever see test cases, access test repositories, or touch automation. Test cases and test plans are essentially “closed” to non-testers due to access, readability, or even apathy. The only output from testers are failure percentages and bug reports. Results are based more on trust than on evidence.

This kind of team sounds pretty bleak. I hope this isn’t the kind of team you’re on, but maybe it is. Let’s see how openness can make things better.

The first step towards open testing is internal openness. Let’s break down some siloes. Testers don’t exclusively own quality. Not everyone needs to be a tester by title, but everyone on the team should become quality-conscious. In fact, any software development team has three main roles: Business, Development, and Testing. Business looks for what problems to solve, Development addresses how to implement solutions, and Testing provides feedback on the solution. These three roles together are known as “The Three Amigos” or “The Three Hats.”

Each role offers a valuable perspective with unique expertise. When the Three Amigos stay apart, features under development don’t have the benefit of multiple perspectives. They might have serious design flaws, they might be unreasonable to implement, or they might be difficult to test. Misunderstandings could also cause developers to build the wrong things or testers to write useless tests. However, when the Three Amigos get together, they can jointly contribute to the design of product features. Everyone can get on the same page. The team can build quality into the product from the start. They could do activities like Question Storming and Example Mapping to help them define behaviors.

The Three Amigos

As part of this collaboration, not everyone may end up writing tests, but everyone will be thinking about quality. Testing then becomes easier because expected behaviors are well-defined and well-understood. Testers get deeper insight into what is important to cover. When testers share results and open bugs, other team members are more receptive because the feedback is more meaningful and more valuable.

We practiced Three Amigos collaboration at my previous company, Q2. My friend Steve was a developer who saw the value in Example Mapping. Many times, he’d pick up poorly-defined user stories with conflicting information or missing acceptance criteria. Sometimes, he’d burn a whole sprint just trying to figure things out! Once he learned about Example Mapping, he started setting up half-hour sessions with the other two Amigos (one of whom was me) to better understand user stories from the start. He got into it. Thanks to proactive collaboration, he could develop the stories more smoothly. One time, I remember we stopped working on a story because we couldn’t justify its business value, which saved Steve two weeks of pointless work. The story didn’t end there: Steve became a Software Engineer in Test! He shifted left so hard that he shifted into a whole new role.

Openness through living specs

Another step towards open testing is living documentation through specification by example. Collaboration like we saw with the Three Amigos is great, but the value it provides can be fleeting if it is not written down. Teams need artifacts to record designs, examples, and eventually test cases.

One reason I love Example Mapping is that it helps a team spell out stories, rules, examples, and questions on color-coded cards that they can keep for future refinement.

  1. Stories become work items.
  2. Rules become acceptance criteria.
  3. Examples become test cases.
  4. Questions become spikes or future stories.

During Example Mapping, folks typically write cards quickly. An example card describes a behavior to test, but it might not carefully design the scenario. It needs further refinement. Defining behaviors using a clear, concise format like Given-When-Then makes behaviors easy to understand and easy to test.

For example, let’s say we wanted to test a web search engine. The example could be to search for a phrase like “panda”. We could write this example as the following scenario:

  1. Given the search engine page is displayed
  2. When the user searches for the phrase “panda”
  3. Then the results page shows a list of links for “panda”

This special Given-When-Then format is known as the Gherkin language. Gherkin comes from Behavior-Driven Development tools like Cucumber, but it can be helpful for any type of testing. Gherkin defines testable behaviors in a concise way that follows the Arrange-Act-Assert pattern. You set things up, you interact with the feature, and you verify the outcomes.

Furthermore, Gherkin encourages Specification by Example. This scenario provides clear instructions on how to perform a search. It has real data, which is the search phrase “panda,” and clear results. Using real-world examples in specifications like this helps all Three Amigos understand the precise behavior.

Turning Example Mapping cards into Gherkin behavior specs

Behavior specifications are multifaceted artifacts:

  1. They are requirements that define how a feature should behave.
  2. They are acceptance criteria that must be met for a deliverable to be complete.
  3. They are test cases with clear instructions.
  4. They could become automated scripts with the right kind of test framework.
  5. They are living documentation for the product.

Living documentation is open and powerful. Anyone on the team or outside the team can read it to learn about the product. Refining ideas into example cards into behavior specs becomes a pipeline that delivers living doc as a byproduct of the software development lifecycle.

SpecFlow is one of the best frameworks that supports this type of openness with Specification by Example and Living Documentation. SpecFlow is a free and open-source test automation framework for .NET. In SpecFlow, you write your test cases as Gherkin scenarios, and you automate each Given-When-Then step using C# methods.
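
To make that concrete, here is a minimal sketch of what step bindings for the “panda” search scenario above might look like. The locators, the NUnit assertion, and the injected WebDriver are illustrative assumptions, not code from a real project.

```csharp
// A minimal sketch of SpecFlow step bindings for the "panda" search scenario above.
// Locators and setup details are illustrative assumptions, not production code.
using NUnit.Framework;
using OpenQA.Selenium;
using TechTalk.SpecFlow;

[Binding]
public class WebSearchSteps
{
    private readonly IWebDriver _driver;

    // SpecFlow's built-in dependency injection supplies the WebDriver,
    // assuming it was registered elsewhere (for example, in a BeforeScenario hook).
    public WebSearchSteps(IWebDriver driver) => _driver = driver;

    [Given(@"the search engine page is displayed")]
    public void GivenTheSearchEnginePageIsDisplayed() =>
        _driver.Navigate().GoToUrl("https://duckduckgo.com/");

    [When(@"the user searches for the phrase ""(.*)""")]
    public void WhenTheUserSearchesForThePhrase(string phrase) =>
        _driver.FindElement(By.Name("q")).SendKeys(phrase + Keys.Enter);

    [Then(@"the results page shows a list of links for ""(.*)""")]
    public void ThenTheResultsPageShowsLinksFor(string phrase)
    {
        var links = _driver.FindElements(By.CssSelector("#links a"));
        Assert.That(links, Is.Not.Empty);
    }
}
```

When SpecFlow runs the scenario, it matches each Given-When-Then line to one of these methods, so the Gherkin stays readable while the C# does the work.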

One of SpecFlow’s niftiest features, however, is SpecFlow+ LivingDoc. Most test frameworks focus exclusively on automation code. When a test is automated, then only a programmer can read it and understand it. Gherkin makes this easier because steps are written in plain language, but Gherkin scenarios are nevertheless stored in a code repository that’s inaccessible to many team members. SpecFlow+ LivingDoc breaks that pattern. It turns Gherkin scenarios into a searchable doc site accessible to all Three Amigos. It makes test cases and test automation much more open. LivingDoc also provides test results for each scenario. Green check marks indicate passing tests, while red X’s indicate failures.

Historically, testers have used reports like this to provide in-house feedback to their managers and developers. Results indicate what works and what needs to be fixed. However, test results can be useful to more people than just internal team members. What if test results were shared with users and customers? I’m going to repeat that statement, because it might seem shocking: What if users and customers could see test results?

Think about it. Open test results have very positive effects. Transparency with users builds trust. If users can see that things are tested and working, then they will gain confidence in the quality of the product. If they could peer into the living documentation, then they could learn how to use the product even better. On the flip side, transparency holds development teams accountable to keeping quality high, both in the product and in the testing. Open test results offer these benefits only if the results can be trusted. If tests are useless or failures are rampant, then public test results could actually hurt the ones developing the product.

A SpecFlow+ LivingDoc report

This type of radical transparency would require an enormous culture shift. It may not be appropriate for every company to create public dashboards with their test results, but it could be a strategic differentiator when used wisely. For example, when I worked at Q2, we shared LivingDoc reports with specific PrecisionLender customers after every two-week release. It built trust. Plus, since the LivingDoc report includes only high-level behavior specs with simple results, even a vice president could read it! We could share tests without sharing automation code. That was powerful.

Openness through open source

Let’s keep extending open testing outward. In addition to sharing test results and living documentation, folks can also share tools, frameworks, and other parts of their tests. This is where open testing truly is open source.

We already covered a bunch of open source projects for test automation. As an industry, we are truly blessed with so many incredible projects. Every single one of them represents a team of testers who not only solved a problem but decided to share their solution with the world. Each solution is abstract enough to apply to many circumstances but concrete enough to provide a helpful implementation. Collectively, these projects have probably been downloaded more than a billion times, and that’s no joke. And if you want, you can read the open source code for any of them.

Popular open source test automation projects

Cool new projects appear all the time, too. One of my favorite projects that started in the past few years is Playwright, an awesome browser automation tool from Microsoft. Playwright makes end-to-end web testing easy, reliable, and fast. It provides cross-browser and cross-language support like Selenium, a concise syntax like Cypress, and a bunch of advanced features like automatic waiting, tracing, and code generation. Plus, Playwright is orders of magnitude faster than other automation tools. It takes the things that made Selenium, Cypress, and Puppeteer great and raises them to the next level.
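
To give a feel for that concise syntax, here is a minimal sketch using Playwright’s .NET flavor (the Microsoft.Playwright package), kept in C# to match the other sketches in this article; the search-box selector is an assumption and may not match DuckDuckGo’s current markup.

```csharp
// A minimal sketch of Playwright's auto-waiting API via the Microsoft.Playwright
// package. The search-box selector is an assumption about DuckDuckGo's markup.
using System;
using System.Threading.Tasks;
using Microsoft.Playwright;

class PlaywrightSketch
{
    public static async Task Main()
    {
        using var playwright = await Playwright.CreateAsync();
        var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();

        await page.GotoAsync("https://duckduckgo.com");
        await page.FillAsync("input[name='q']", "panda");  // auto-waits for the element
        await page.Keyboard.PressAsync("Enter");
        await page.WaitForLoadStateAsync();
        Console.WriteLine(await page.TitleAsync());

        await browser.CloseAsync();
    }
}
```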

Openness through shared test suites

So far, all the ways of approaching open testing are things we could do today. Many of us are probably already doing these things, even if we didn’t think of them under the phrase “open testing.” But where can these ideas go in the future?

My mind goes back to one of the big problems with testing that I mentioned earlier: duplication. Opening up collaboration fixes some bad habits, and sharing components eliminates some duplication in the plumbing of test automation, but so many of our tests across the industry repeat the same kinds of steps and follow the same types of patterns.

For example, think about any time you’ve ordered something from an online store. It could be Amazon, Walmart, Target – whatever. Every single online store has a virtual shopping cart. Whenever you want to buy something, you add it to your cart. Then, when you’re done shopping, you proceed to pay for all the items in your cart. If you decide you don’t want something anymore, you remove it from the cart. Easy-peasy.

As I describe this type of shopping cart, I don’t need to show you screenshots from the store website to explain it. Y’all have done so much online shopping that you intuitively know how it works, regardless of the store. Heck, I recently ordered a bunch of parts for an old Volkswagen Beetle from a site named JBugs, and the shopping cart was the same.

If so many applications have the same parts, then why do we keep duplicating the same tests in different places? Think about it. Think about how many times different teams have written nearly identical shopping cart tests. Ouch. Think about how much time was wasted on that duplication of effort.

I think this is something where Artificial Intelligence and Machine Learning could help. What if we could develop machine learning models to learn common behaviors for apps and services? The learning agents would look for things like standard icons and typical workflows. We could essentially create test suites for things like login, search, shopping, and payment that could run successfully on most apps. These kinds of tests probably couldn’t cover everything in any given application, but they could cover basic, common behaviors. Maybe that could cover a quarter of all behaviors worth testing. Maybe a third? Maybe half? Every little bit helps!

AI and ML can help us achieve true Autonomous Testing

Now, imagine sharing those generic test suites publicly. In the same way developers have open source projects to help expedite their coding, and in the same way data scientists have open data sets to use for modeling, testers could have open test suites that they could pick up and run as applicable. Not test tools – but actual runnable tests that could run against any application. If these kinds of test suites prove to be valuable, then prominent ones could become universally accepted bars of quality for software apps. For example, in the future, companies could download and execute these generic test suites against the apps they’re developing, in addition to the tests they develop in-house. I think that could be a really cool opportunity.

This type of testing – Autonomous Testing – is the future. Software developers and testers will use AI-backed tools to better learn, explore, and exercise app behaviors. These tools will make it easier than ever to automate scriptable tests.

How to start pursuing openness

As we have covered, open testing could take many forms:

  1. It could be openness in collaboration to build better quality from the start.
  2. It could be openness in specification by example and living documentation.
  3. It could be openness in sharing tests and their results with customers and users.
  4. It could be openness in sharing tools, frameworks, and platforms.
  5. It could be openness in building shared test sets for common application behaviors.

Some of these ideas might seem far-fetched or aspirational, but quite honestly, I think each of them could add lots of value to testing practices. I think every tester and every team should look at this list and ask themselves, “Could we try some of these things?” Perhaps your team could take baby steps with better collaboration or better specification. Perhaps your team has a cool project you built in-house that you could release as an open source project, like my old team and I did with Boa Constrictor. Perhaps there’s a startup idea in using machine learning for autonomous testing. Perhaps there are other ways to achieve open testing that aren’t listed here. Who knows? It could be cool!

We should also consider the flip side. Are there certain aspects of testing that should remain closed? My mind goes to security. Could fully open testing inadvertently reveal security vulnerabilities? Could lack of coverage in some areas welcome expedited exploitation? I don’t know, but I think we should consider possibilities like these.

If you want to pursue open testing, here are three questions to get you started:

  1. How is your testing today?
    1. In what ways is it already open?
    2. In what ways is it closed?
  2. How could your testing improve with incremental openness?
    1. We’re talking baby steps here – small improvements that you could easily achieve today.
    2. It could be as small as trying Example Mapping or joining a mob programming session.
  3. How could your testing improve with radical openness?
    1. Shoot the moon! Dream big! Get creative!
    2. In the world of software, anything is possible.

Conclusion

We should also remember that open testing isn’t a goal unto itself. It’s a means to an end, and that end is higher quality: quality in our practices, quality in our artifacts, and ultimately quality in the software we create. We shouldn’t seek openness in testing just because I’m spouting lots of buzzwords in this article. At the same time, we also shouldn’t brush off these ideas as too radical or idealistic. What we should do is seek ways for perpetual improvement. Remember that this whole idea of open testing came from the tried-and-true benefits of open source code.

Want to practice test automation? Try these demo sites!

One of the biggest struggles in learning how to develop top-notch automation is practice. Testing is as much an art as it is a science. It takes time to discern when to add explicit waits, how to craft robust locators, and why to verify one element over another. It also requires apps with specific elements or endpoints to try certain operations. Unfortunately, although resources for learning how to automate tests (like Test Automation University) are abundant these days, public demo web sites for practicing remain elusive. I’ve struggled to find ones that I like, and others frequently ask me for recommendations.

Which demo sites are good?

Below is a list of demo sites I’ve found either by searching online or through recommendations from friends. Many of these sites appear on other “Top N Demo Sites for Testing” articles as well. My list aims not only to provide links to popular demo sites but also to provide recommendations on how to use them.

Types of sites:

  • Web UI site – looks like a real web site
  • Web UI elements – tutorial pages showcasing web element types
  • Mobile UI site – looks like a real mobile site
  • API site – provides public APIs for testing
  • DIY – “do it yourself”; you must set up and run the demo site yourself
Demo sites:

  • ParaBank (Web UI & API site): An online banking site from Parasoft with login and REST/SOAP APIs. You can also access the database if you run the project locally using the source code.
  • Restful Booker (Web UI & API site): An online site for bed & breakfast bookings from Mark Winteringham. The frontend is a React app (source), and the backend is a REST API (source).
  • Automation Practice Website (Web UI site): A basic online store with optional login from SeleniumFramework.com. Great for example web UI tests.
  • Luma – Magento eCommerce (Web UI site): Another online store with several pages.
  • Demoblaze (Web UI site): A basic online store with optional login from BlazeMeter. Great for example web UI tests.
  • Swag Labs (Web UI site): A basic online store with required login from Sauce Labs. Great for example web UI tests.
  • Applitools demo site (Web UI site): A small site with a login page and home page from Applitools. Compare/contrast with a second version for visual testing.
  • Automation Bookstore (Web UI site): A one-page site for dynamically searching book titles. Good for testing responsive design in short demos.
  • JPetStore Demo (Web UI site): A pet store site from MyBatis built on top of MyBatis 3, Spring 3, and Stripes (source).
  • GlobalSQA Banking Project (Web UI site): A basic banking app with dropdown login and simple pages from GlobalSQA.
  • Gatling Computers Database (Web UI site): A one-page site from Gatling that provides a paginated list of computer models with the ability to filter and add new computers.
  • CandyMapper (Web UI site): A Halloween-themed site from Paul Grossman that shows scary bugs. It also comes with a second release that has the fixes.
  • OWASP Juice Shop (DIY Web UI site): A test site written entirely in JavaScript using Node.js, Express, and Angular. Specifically built for testing security vulnerabilities.
  • Cypress Real-World App (DIY Web UI site): A fake payment app from Cypress meant for demonstrating real-world Cypress testing. Could be used for other purposes.
  • RealWorld example apps (DIY Web UI site): One demo app implemented in multiple languages and frameworks. Not developed for testing but could nevertheless be used.
  • the-internet (Web UI elements): A site from Dave Haeffner and Elemental Selenium with several concise examples of web elements and interactions.
  • Selenium Test Pages (Web UI elements): A site with several pages that have slightly deeper examples than the-internet.
  • LetCode (Web UI elements): A set of very clean pages along with video tutorials explaining how to automate interactions.
  • DemoQA (Web UI elements): A practice site from ToolsQA that includes pages for elements, forms, frames, interactions, and even a small bookstore.
  • Ultimate QA Automation Practice (Web UI elements): A set of rich practice pages from Ultimate QA.
  • UI Test Automation Playground (Web UI elements): A set of educational pages with interactable elements from the Rapise team.
  • SelectorsHub Practice Page (Web UI elements): A practice page from SelectorsHub for interacting with different kinds of web elements.
  • WebDriverUniversity.com (Web UI elements): Another set of educational pages with interactable elements.
  • Sauce Labs Native Sample Application (DIY Mobile UI site): A mobile app from Sauce Labs for Android and iOS that is very similar to the Swag Labs web site.
  • JSONPlaceholder (API site): A public REST API for generating fake data from a set of predefined resources. Follow the guide to learn how to make requests.
  • Swagger Petstore (API site): A public REST API from Swagger for testing authentication and CRUD operations for pet store data.
  • Public APIs (API site): A long list of public APIs that can be used for testing.
  • Device Registry Service (DIY API site): A Flask app I developed for teaching how to test REST APIs. Includes the app and automated API tests, which can both be run locally.
  • Best Buy API Playground (DIY API site): A JavaScript-based REST API demo service from Best Buy.

Which ones do you recommend most?

There are lots of demo sites above. Here are the ones I’d personally recommend for different needs:

Of course, the others are still good. That’s why they made the list!

Why bother with “fake” sites?

You might be wondering, “Why do we need demo sites? Why don’t we just use real sites?” Well, demo or “fake” web sites meet a few needs that real sites cannot:

  • Demo sites offer consistency. They are implemented one way and then do not change. Folks can trust that their tests will always work on them.
  • Demo sites are often simpler than real sites. They feel less intimidating for newcomers.
  • Demo sites can be designed for instruction. If they are part of a tutorial, the author can add features to the site to demonstrate concepts.
  • Demo sites are safer for publications like articles, tutorials, and books. Written content is static, so any sites referenced by examples should also be static. Real sites could change.
  • Real sites might require end user agreements that forbid automated requests. Some may even throttle or block requests if they suspect the requests come from a “bot.”
  • Real sites might also bear legal or copyright implications, especially if a company is using the sites for their own content.

What limitations do demo sites have?

Unfortunately, demo sites do have limitations:

  • Demo sites may be too simplified. They may lack large workflows or real-world data. Folks might become frustrated when they discover inactive elements that appear to be real.
  • Demo sites may not be built to scale. High request volume or large-scale parallel testing might cripple them.
  • Demo sites may seem to have poor quality, whether or not they actually do. Sometimes, they are built quickly for testing purposes and therefore don’t receive the same attention to detail as real sites.
  • Demo sites with significant company branding may be inappropriate to use. For example, if A and B are competitors, then A shouldn’t use B’s demo site for their product tutorials.

Other great lists

Here are other great articles that list demo sites for testing. I referred to them while compiling this list:

Do you have other demo sites to recommend? Put them in the comments below!

Playwright Workshop for TAU: The Homecoming

Want to learn Playwright with Python? Take this workshop!

Playwright is an awesome new browser automation library. With Playwright, you can automate web UI interactions for testing or for web scraping with a concise, uniform API in one of four languages: Python, C#, Java, and JavaScript. Playwright is also completely open source and backed by Microsoft. It’s a powerful alternative to Selenium WebDriver.

On December 1, 2021, I delivered a workshop on Playwright for TAU: The Homecoming. In my workshop, I taught how to build a test automation project in Python using Playwright with pytest, Python’s most popular test framework. We automated a test case together for performing a DuckDuckGo web search.

If you missed the workshop, no worries: You can still take the workshop as a self-guided tutorial! The workshop instructions and example code are located in this GitHub repository:

https://github.com/AutomationPanda/tau-playwright-workshop

To take the workshop as a self-guided tutorial, read the repository’s README, and then follow the instructions in the Markdown guides under the workshop folder. The workshop has five main parts:

  1. Getting started
    1. What is Playwright?
    2. Our web search test
    3. Test project setup
  2. First steps with Playwright
    1. Browsers, contexts, and pages
    2. Navigating to a web page
    3. Performing a search
  3. Writing assertions
    1. Checking the search field
    2. Checking the result links
    3. Checking the title
  4. Refactoring using page objects
    1. The search page
    2. The result page
    3. Page object fixtures
  5. Nifty Playwright tricks
    1. Testing different browsers
    2. Capturing screenshots and videos
    3. Running tests in parallel

If you get stuck or have any questions, please open issues against the GitHub repository, and I’ll try to help. Happy coding!

Boa Constrictor’s Awesome Hacktoberfest 2021

Boa Constrictor is the .NET Screenplay Pattern. It helps you make better interactions for better automation! Its primary use case is Web UI and REST API test automation, but it can be used to automate any kind of interactions. The Screenplay Pattern is much more scalable for development and execution than the Page Object Model.

The Boa Constrictor maintainers and I strongly support open source software. That’s why we participated in Hacktoberfest 2021. In fact, this was the second Hacktoberfest we did. We launched Boa Constrictor as an open source project a year ago during Hacktoberfest 2020! We love sharing our code with the community and inspiring others to get involved. To encourage participation this year, we added the “hacktoberfest” label to open issues, and we offered cool stickers to anyone who contributed.

The Boa Constrictor sticker medallion (“Boa Constrictor: The .NET Screenplay Pattern”)

Hacktoberfest 2021 was a tremendous success for Boa Constrictor. Even though the project is small, we received several contributions. Here’s a summary of all the new stuff we added to Boa Constrictor:

  • Updated WebDriver interactions to use Selenium WebDriver 4.0
  • Implemented asynchronous programming for Tasks and Questions
  • Extended the Wait Task to wait for multiple Questions using AND and OR logic
  • Standardized ToString methods for all WebDriver interactions
  • Automated unit tests for WebDriver Questions
  • Wrote new user guides for test framework integrations and interaction patterns
  • Made small refinements to the doc site
  • Created GitHub templates for issues and pull requests
  • Replaced the symbols NuGet package with embedded debugging
  • Added the README to the NuGet package
  • Added Shields to the README
  • Restructured projects for docs, logos, and talk

During Hacktoberfest 2021, we made a series of four releases because we believe in lean development that puts new features in the hands of developers ASAP. The final capstone release was version 2.0.0: a culmination of all Hacktoberfest work! Here’s a view of the Boa Constrictor NuGet package with its new README (Shields included):

The Boa Constrictor NuGet package with the new README and Shields

If you like project stats, then here’s a breakdown of the contributions by numbers:

  • 11 total contributors (5 submitting more than one pull request)
  • 41 pull requests closed
  • 151 commits made
  • Over 10K new lines of code

GitHub’s Code Frequency graph for Boa Constrictor, shown below, illustrates how much activity the project had during Hacktoberfest 2021. Notice the huge green and red spikes on the right side of the chart, corresponding to October 2021. That’s a lot of activity!

The GitHub Code Frequency Graph for Boa Constrictor

Furthermore, every member of my Test Engineering & Architecture (TEA) team at Q2 completed four pull requests for Hacktoberfest, thus earning our prizes and our bragging rights. For the three others on the team, this was their first Hacktoberfest, and Boa Constrictor was their first open source project. We all joined together to make Boa Constrictor better for everyone. I’m very proud of each of them individually and of our team as a whole.

Personally, I gained more experience as an open source project maintainer. I brainstormed ideas with my team, assigned work to volunteers, and provided reviews for pull requests. I also had to handle slightly awkward situations, like politely turning down pull requests that could not be accepted. Thankfully, the project had very little spam, but we did have many potential contributors request to work on issues but then essentially disappear after being assigned. That made me appreciate the folks who did complete their pull requests even more.

Overall, Hacktoberfest 2021 was a great success for Boa Constrictor. We added several new features, docs, and quality-of-life improvements to the project. We also got people excited about open source contributions. Many thanks to Digital Ocean, Appwrite, Intel, and DeepSource for sponsoring Hacktoberfest 2021. Also, special thanks to Digital Ocean for featuring Boa Constrictor in their Hacktoberfest kickoff event. Keep on hacking!

Boa Constrictor is doing Hacktoberfest 2021!

Boa Constrictor is the .NET Screenplay Pattern. It helps you make better interactions for better automation! Its primary use case is Web UI and REST API test automation, but it can be used to automate any kind of interactions. The Screenplay Pattern is much more scalable for development and execution than the Page Object Model.

My team and I at Q2 developed Boa Constrictor for testing the PrecisionLender web app. Originally, we developed it internally as part of our C# test automation solution named “Boa”, but we later released it as an open source project on GitHub so that others could use it. In fact, we released it publicly in October 2020 during last year’s Hacktoberfest!

We are delighted to announce that Boa Constrictor will participate in Hacktoberfest 2021. Open source software is vital for our industry, and we strongly support efforts like Hacktoberfest to encourage folks to contribute to open source projects. Many thanks to Digital Ocean, Appwrite, Intel, and DeepSource for sponsoring Hacktoberfest again this year.

So, how can you contribute to Boa Constrictor? Take these five easy steps:

  1. Start by learning about the project.
  2. Read our guide to contributing code.
  3. Clone the GitHub repository.
  4. Look for unassigned open issues labeled “hacktoberfest”.
    1. Or, open an issue to propose a new idea!
  5. Add a comment to the issue saying that you’d like to do it.

To encourage contributions, I will give free Boa Constrictor stickers to anyone who makes a valid pull request to the project during Hacktoberfest 2021! (I’ll share a link where you can privately share your mailing address. I’ll mail stickers anywhere in the world – not just inside the United States.) The sticker is a 2″ medallion that looks like this:

The Boa Constrictor Sticker

Remember, you have until October 31 to make four qualifying pull requests for Hacktoberfest. We’d love for you to make at least one of those pull requests for Boa Constrictor.

How Q2 uses BDD with SpecFlow for testing PrecisionLender

This case study was written by Andrew Knight, Lead Software Engineer in Test for Q2’s PrecisionLender product, in collaboration with Q2 and Tricentis. It explains the PrecisionLender team’s continuous testing journey and how SpecFlow served as a cornerstone for success.

What is PrecisionLender?

PrecisionLender is a web application that empowers commercial bankers with in-the-moment insights that help them structure and price commercial deals. Andi®, PrecisionLender’s intelligent virtual analyst, delivers these hyper-focused recommendations in real-time, allowing relationship managers to make data-driven decisions while pricing their commercial deals. PrecisionLender is owned and developed by Q2, a financial experience software company dedicated to providing digital banking and lending solutions to banks, credit unions, alternative finance, and fintech companies in the U.S. and internationally.

The PrecisionLender Opportunity Screen
(Picture taken from the PrecisionLender Support Center)

The starting point

The PrecisionLender team had a robust Continuous Integration (CI) delivery pipeline with strong unit test coverage, but they lacked end-to-end feature coverage. Developers would fill this gap by manually inspecting their changes in a shared development environment. However, as the PrecisionLender app grew, manual checks could not cover all possible integrations. The team knew they needed continuous automated testing to provide a safety net for development to remain lean and efficient. In April 2018, they hired Andrew Knight as their first Software Engineer in Test (SET) – a new role for the company – to lead the effort.

Automating tests with SpecFlow

The PrecisionLender team developed the Boa test solution – a project for automating end-to-end tests at scale. Boa would become PrecisionLender’s internal platform for test automation development. The name “Boa” is a loose acronym for “Behavior-Oriented Automation.”

The team chose SpecFlow to be the core framework for Boa tests. Since the PrecisionLender app’s backend is developed using .NET, SpecFlow was a natural fit. SpecFlow’s Gherkin syntax made tests readable and understandable, even to product owners and product support specialists who do not code.

The SpecFlow framework integrates with tools like Selenium WebDriver for testing Web UIs and RestSharp for testing REST APIs to exercise vital pathways for thorough app coverage. SpecFlow’s dependency injection mechanisms are solid yet simple, and the online docs are thorough. Plus, SpecFlow is an open-source project, so anyone can look at its code to learn how things work, open requests for new features, and even offer code contributions.

An example Boa test, written in Gherkin using SpecFlow.
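
As a quick illustration of that dependency injection (a sketch based on standard SpecFlow context injection, not code from the actual Boa solution), a scenario hook can register a WebDriver instance that step definition classes then receive through their constructors:

```csharp
// A sketch of SpecFlow context injection: a hook registers a WebDriver so that
// step definition classes can receive it via constructor injection. Illustrative
// only; the real Boa solution's hooks are not shown here.
using BoDi;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using TechTalk.SpecFlow;

[Binding]
public class WebDriverHooks
{
    private readonly IObjectContainer _container;

    public WebDriverHooks(IObjectContainer container) => _container = container;

    [BeforeScenario]
    public void CreateWebDriver() =>
        _container.RegisterInstanceAs<IWebDriver>(new ChromeDriver());

    [AfterScenario]
    public void QuitWebDriver() =>
        _container.Resolve<IWebDriver>().Quit();
}
```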

Executing tests with SpecFlow+ Runner

Writing good tests was only part of the challenge. The PrecisionLender team needed to execute Boa tests continuously to provide fast feedback on changes to the app. The team chose to run Boa tests using SpecFlow+ Runner, which is tailored for SpecFlow tests. The team uses SpecFlow+ Runner to launch tests in parallel in TeamCity any time a developer deploys a code change to internal pre-production environments. The entire test suite also runs every night against multiple product configurations. SpecFlow+ Runner produces a helpful test report with everything needed to triage test failures: pass-and-fail tallies overall and per feature, a visual execution timeline, and full system logs. If engineers need to investigate certain failures more closely, they can use SpecFlow tags and SpecFlow+ Runner profiles to selectively filter tests for reruns. SpecFlow+ Runner’s multiple features help the team expedite test execution and investigation.

The SpecFlow+ Runner report for a dozen smoke tests.

Sharing features with SpecFlow+ LivingDoc

Good test cases are more than just verification procedures – they are behavior specifications. They define how features should work. Instead of keeping testing work siloed by role, the PrecisionLender team wanted to share Boa tests as behavior specs with all stakeholders to foster greater collaboration and understanding around features. The team also wanted to share Boa tests with specific customers without sharing the entire automation code.

SpecFlow+ LivingDoc enabled the PrecisionLender team to turn Gherkin feature files into living documentation. Whereas the SpecFlow+ Runner report focuses on automation execution, the SpecFlow+ LivingDoc report focuses on behavior specification apart from coding and automation details. LivingDoc displays Gherkin scenarios in a readable, searchable way that both internal folks and customers can consume. It can also optionally include high-level pass-and-fail results for each scenario, providing just enough information to be helpful and not overwhelming. LivingDoc has also helped PrecisionLender’s engineers identify and eliminate unused step definitions within the automation code. PrecisionLender benefits greatly from complementary reports from SpecFlow+ Runner and SpecFlow+ LivingDoc.

The SpecFlow+ LivingDoc report for a dozen smoke tests with their pass-and-fail results.

Improving interactions with Boa Constrictor

The Boa test solution initially used the Page Object Model to model interactions with the PrecisionLender app. However, as the PrecisionLender team automated more and more Boa tests, it became apparent that page objects did not scale well. Many page object classes had duplicative methods, making automation code messy. Some methods also did not include appropriate waiting mechanisms, introducing flaky failures.

PrecisionLender’s SETs developed Boa Constrictor, a .NET implementation of the Screenplay Pattern, to make better interactions for better automation. In Screenplay, actors use abilities to perform interactions. For example, an ability could be using Selenium WebDriver, and an interaction could be clicking an element. The Screenplay Pattern can be seen as a refactoring of the Page Object Model that minimizes duplicate code through a better separation of concerns. Individual interactions can be hardened for robustness, eliminating flaky hotspots. The Boa test solution now exclusively uses Boa Constrictor for interactions.
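
Here is a rough sketch of that pattern, loosely based on Boa Constrictor’s public quickstart. The SearchDuckDuckGo task and ResultPage locators stand in for classes a team would write themselves, and the exact interaction names may vary by library version.

```csharp
// A rough sketch of the Screenplay Pattern with Boa Constrictor, loosely based on
// the project's quickstart. SearchDuckDuckGo and ResultPage are placeholders for
// task and locator classes a team would write; exact names may vary by version.
using Boa.Constrictor.Logging;
using Boa.Constrictor.Screenplay;
using Boa.Constrictor.WebDriver;
using OpenQA.Selenium.Chrome;

// The actor gains an ability: browsing the web with Selenium WebDriver.
IActor actor = new Actor(name: "Andy", logger: new ConsoleLogger());
actor.Can(BrowseTheWeb.With(new ChromeDriver()));

// Tasks perform actions; questions return state that tests can assert on.
actor.AttemptsTo(Navigate.ToUrl("https://duckduckgo.com/"));
actor.AttemptsTo(SearchDuckDuckGo.For("panda"));                     // custom task (placeholder)
bool resultsShown = actor.AsksFor(Appearance.Of(ResultPage.Links));  // question on a placeholder locator
```

Because each task and question is a small, self-contained interaction, it can be hardened in one place (for example, with proper waiting) and reused everywhere.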

In October 2020, Q2 released Boa Constrictor as an open-source project so that anyone can use it. It is fully compatible with SpecFlow and other .NET test frameworks, and it provides rich interactions for Selenium WebDriver and RestSharp out of the box.

Boa Constrictor, the .NET Screenplay Pattern.

Scaling massively with Selenium Grid

When the PrecisionLender team first started automating Boa tests, they ran tests one at a time. That soon became too slow since the average Boa test took 20 to 50 seconds to complete. The team then started running up to 3 tests in parallel on one machine, but that also was not fast enough. They turned to Selenium Grid, a tool for running WebDriver sessions remotely across multiple machines.
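
In practice, “remote” means the test asks a Grid hub for a browser session instead of launching one locally. Here is a minimal sketch; the hub URL is a made-up placeholder, not a real PrecisionLender endpoint.

```csharp
// A minimal sketch of opening a WebDriver session on a Selenium Grid hub rather
// than a local browser. The hub URL is a placeholder, not a real endpoint.
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Remote;

var options = new ChromeOptions();
IWebDriver driver = new RemoteWebDriver(
    new Uri("http://selenium-grid.example.internal:4444/wd/hub"),
    options.ToCapabilities());

driver.Navigate().GoToUrl("https://example.com/");
driver.Quit();
```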

PrecisionLender built a set of internal Selenium Grid instances using Microsoft Azure virtual machines to run Boa tests at high scale. As of July 2021, PrecisionLender has over 1800 unique Boa tests that run across four distinct product configurations. Whenever TeamCity detects a code change, it triggers a “continuous” Boa test suite of over 1000 tests, running 50 tests in parallel using Google Chrome on Selenium Grid. It completes execution in about 10 minutes. TeamCity launches the full test suite every night against all product configurations with 64-100 parallel tests on Selenium Grid. Continuous Integration currently runs up to 10K Boa tests daily against the PrecisionLender app with SpecFlow+ Runner and Selenium Grid.

The Boa test solution architecture, including Continuous Integration through TeamCity and parallel testing with SpecFlow+ Runner and Selenium Grid.

Shifting left with BDD

Better testing and automation practices eventually inspired better development practices. Product owners would create user stories, but developers would struggle to understand requirements and business purposes fully. PrecisionLender’s SETs started bringing together the Three Amigos – business, development, and testing roles – to discuss product behaviors proactively while creating user stories. They introduced Behavior-Driven Development (BDD) activities like Example Mapping to explore behaviors together. Then, well-defined stories could be easily connected to SpecFlow tests written in Gherkin following Specification by Example (SBE). Teams repeatedly saved time by thinking before coding and specifying before testing. They built higher quality into features from the beginning, and they stopped before working on half-baked stories with unjustified value propositions. Developers who participated in these behavior-driven practices were also more likely to automate Boa tests on their own. Furthermore, one of PrecisionLender’s developers loved BDD practices so much that he joined the team of SETs! Through Gherkin, SpecFlow provided a foundation that enabled quality work to shift left.

Challenges along the way

Achieving true continuous testing had its challenges along the way. Intermittent failure was the most significant issue PrecisionLender faced at scale. With so many tests, environments, and infrastructural pieces, arbitrary failures were statistically unavoidable. The PrecisionLender team took a two-pronged approach to handle intermittent failures: (1) eliminate race conditions in automation using good interactions with Boa Constrictor, and (2) use SpecFlow+ Runner to automatically retry failed tests to determine if failures were consistent or intermittent. These two approaches reduced the frequency of flaky failures and helped engineers quickly resolve any remaining issues. As a result, Boa tests enjoy well above a 99% success rate, and most failures are due to actual bugs.

PrecisionLender app performance at scale was a second big challenge. Running up to 100 tests in parallel turned functional tests into de facto load tests. Testing at scale repeatedly uncovered performance bottlenecks in the app. Performance issues caused widespread test failures that were difficult to diagnose because they appeared intermittently. Still, the visual timeline and timestamps in the SpecFlow+ Runner report helped the team identify periods of failure that could be cross-checked against backend logs, metrics, and database queries. Developers resolved many performance issues and significantly improved the app’s response times and load capacity.

Training team members to develop solid test automation was the third challenge. At the start of the journey, test automation, Gherkin, and BDD were all new to PrecisionLender. The PrecisionLender SETs took active steps to train others on how to develop good tests and good automation through group workshops, Three Amigos meetings, and one-on-one mentoring sessions. They shared resources like the Automation Panda blog for how to write good tests and good Gherkin. The investment in education paid off: many developers have joined the SETs in writing readable, reliable Boa tests that run continuously.

Benefits to the business

Developing a continuous testing solution brought many incredible benefits to PrecisionLender. First, the quality of the PrecisionLender app improved because continuous testing provided fast feedback on failures that developers could quickly fix. Instead of relying on manual spot checks, the team could trust the comprehensive safety net of Boa tests to catch bugs. Many issues would be caught within an hour of a developer making a code commit, and the longest feedback cycle would be only one business day for the full nightly test suites to run. Boa tests catch failures before customers ever experience them. The continuous nature of testing enables PrecisionLender to publish new releases every two weeks.

Second, the high reliability of the Boa test solution means that the PrecisionLender team can trust test results. When a test passes, the behavior is working. When a test fails, there is a real bug. Reliability also means that engineers spend less time on automation maintenance and more time on more valuable activities, like developing new features and adding new tests. Quality is present in both the product code and the test code.

Third, continuous testing boosts customer confidence in PrecisionLender. Customers trust the software quality because they know that PrecisionLender thoroughly tests every release. The PrecisionLender team also shares SpecFlow+ LivingDoc reports with specific clients to prove quality.

A bright future

PrecisionLender’s continuous testing journey is not over. Since the PrecisionLender team hired its first SET, it has hired three more, in addition to a testing manager, to grow quality improvement efforts. Multiple development teams have written their own Boa tests, and they plan to write more tests independently. SpecFlow’s tools have been indispensable in helping the PrecisionLender team achieve successful quality assurance. As PrecisionLender welcomes more customers, the Boa solution will be ready to scale with more tests, more configurations, and more executions.

My Upside-Down QA Story for Global Testers Day

Happy Global Testers Day! For 2021, QA Touch is celebrating with webinars, games, competitions, blogs, and videos. I participated by sharing an “upside-down” story from years ago when I accidentally wiped out all of NetApp’s continuous integration testing. Please watch my story below. I hope you find it both insightful and entertaining!

Are Automated Test Retries Good or Bad?

What happens when a test fails? If someone is manually running the test, then they will pause and poke around to learn more about the problem. However, when an automated test fails, the rest of the suite keeps running. Testers won’t get to view results until the suite is complete, and the automation won’t perform any extra exploration at the time of failure. Instead, testers must review logs and other artifacts gathered during testing, and they even might need to rerun the failed test to check if the failure is consistent.

Since testers typically rerun failed tests as part of their investigation, why not configure automated tests to automatically rerun failed tests? On the surface, this seems logical: automated retries can eliminate one more manual step. Unfortunately, automated retries can also enable poor practices, like ignoring legitimate issues.

So, are automated test retries good or bad? This is actually a rather controversial topic. I’ve heard many voices strongly condemn automated retries as an antipattern (see here, here, and here). While I agree that automated retries can be abused, I nevertheless still believe they can add value to test automation. A deeper understanding needs a nuanced approach.

So, how do automated retries work?

To avoid any confusion, let’s carefully define what we mean by “automated test retries.”

Let’s say I have a suite of 100 automated tests. When I run these tests, the framework will execute each test individually and yield a pass or fail result for the test. At the end of the suite, the framework will aggregate all the results together into one report. In the best case, all tests pass: 100/100.

However, suppose that one of the tests fails. Upon failure, the test framework would capture any exceptions, perform any cleanup routines, log a failure, and safely move onto the next test case. At the end of the suite, the report would show 99/100 passing tests with one test failure.

By default, most test frameworks will run each test one time. However, some test frameworks have features for automatically rerunning test cases that fail. The framework may even enable testers to specify how many retries to attempt. So, let’s say that we configure 2 retries for our suite of 100 tests. When that one test fails, the framework would queue that failing test to run twice more before moving onto the next test. It would also add more information to the test report. For example, if one retry passed but another one failed, the report would show 99/100 passing tests with a 1/3 pass rate for the failing test.
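
To make those mechanics concrete, here is a conceptual sketch, not any particular framework’s implementation, of how a runner configured with two retries could collect the per-attempt results behind that 1/3 pass rate.

```csharp
// A conceptual sketch (not any specific framework's code) of how a runner with a
// retry count of 2 could record per-attempt results for a failing test.
using System;
using System.Collections.Generic;

public static class RetryRunner
{
    public static IReadOnlyList<bool> Run(Func<bool> test, int retryCount)
    {
        var attempts = new List<bool> { test() };  // initial attempt
        if (!attempts[0])                          // only failing tests get retried
        {
            for (int i = 0; i < retryCount; i++)
                attempts.Add(test());
        }
        return attempts;  // e.g., [false, true, false] would be reported as a 1/3 pass rate
    }
}
```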

In this article, we will focus on automated retries for test cases. Testers could also program other types of retries into automated tests, such as retrying browser page loads or REST requests. Interaction-level retries require sophisticated, context-specific logic, whereas test-level retry logic works the same for any kind of test case. (Interaction-level retries would also need their own article.)

Automated retries can be a terrible antipattern

Let’s see how automated test retries can be abused:

Jeremy is a member of a team that runs a suite of 300 automated tests for their web app every night. Unfortunately, the tests are notoriously flaky. About a dozen different tests fail every night, and Jeremy spends a lot of time each morning triaging the failures. Whenever he reruns failed tests individually on his laptop, they almost always pass.

To save himself time in the morning, Jeremy decides to add automatic retries to the test suite. Whenever a test fails, the framework will attempt one retry. Jeremy will only investigate tests whose retries failed. If a test had a passing retry, then he will presume that the original failure was just a flaky test.

Ouch! There are several problems here.

First, Jeremy is using retries to conceal information rather than reveal information. If a test fails but its retries pass, then the test still reveals a problem! In this case, the underlying problem is flaky behavior. Jeremy is using automated retries to overwrite intermittent failures with intermittent passes. Instead, he should investigate why the tests are flaky. Perhaps automated interactions have race conditions that need more careful waiting. Or, perhaps features in the web app itself are behaving unexpectedly. Test failures indicate a problem – either in test code, product code, or infrastructure.

Second, Jeremy is using automated retries to perpetuate poor practices. Before adding automated retries to the test suite, Jeremy was already manually retrying tests and disregarding flaky failures. Adding retries to the test suite merely speeds up the process, making it easier to sidestep failures.

Third, the way Jeremy uses automated retries indicates that the team does not value their automated test suite very much. Good test automation requires effort and investment. Persistent flakiness is a sign of neglect, and it fosters low trust in testing. Using retries is merely a “band-aid” on both the test failures and the team’s attitude about test automation.

In this example, automated test retries are indeed a terrible antipattern. They enable Jeremy and his team to ignore legitimate issues. In fact, they incentivize the team to ignore failures because they institutionalize the practice of replacing red X’s with green checkmarks. This team should scrap automated test retries and address the root causes of flakiness.

Testers should not conceal failures by overwriting them with passes.

Automated retries are not the main problem

Ignoring flaky failures is unfortunately all too common in the software industry. I must admit that in my days as a newbie engineer, I was guilty of rerunning tests to get them to pass. Why do people do this? The answer is simple: intermittent failures are difficult to resolve.

Testers love to find consistent, reproducible failures because those are easy to explain. Other developers can’t push back against hard evidence. However, intermittent failures take much more time to isolate. Root causes can become mind-bending puzzles. They might be triggered by environmental factors or awkward timings. Sometimes, teams never figure out what causes them. In my personal experience, bug tickets for intermittent failures get far less traction than bug tickets for consistent failures. All these factors incentivize folks to turn a blind eye to intermittent failures when convenient.

Automated retries are just a tool and a technique. They may enable bad practices, but they aren’t inherently bad. The main problem is willfully ignoring certain test results.

Automated retries can be incredibly helpful

So, what is the right way to use automated test retries? Use them to gather more information from the tests. Test results are simply artifacts of feedback. They reveal how a software product behaved under specific conditions and stimuli. The pass-or-fail nature of assertions simplifies test results at the top level of a report in order to draw attention to failures. However, reports can give more information than just binary pass-or-fail results. Automated test retries yield a series of results for a failing test that indicate a success rate.

For example, SpecFlow and the SpecFlow+ Runner make it easy to use automatic retries the right way. Testers simply need to add the retryFor setting to their SpecFlow+ Runner profile to set the number of retries to attempt. In the final report, SpecFlow records the success rate of each test with color-coded counts. Results are revealed, not concealed.

Here is a snippet of the SpecFlow+ Report showing both intermittent failures (in orange) and consistent failures (in red).

This information jumpstarts analysis. As a tester, one of the first questions I ask myself about a failing test is, “Is the failure reproducible?” Without automated retries, I need to manually rerun the test to find out – often at a much later time and potentially within a different context. With automated retries, that step happens automatically and in the same context. Analysis then takes two branches:

  1. If all retry attempts failed, then the failure is probably consistent and reproducible. I would expect it to be a clear functional failure that would be fast and easy to report. I jump on these first to get them out of the way.
  2. If some retry attempts passed, then the failure is intermittent, and it will probably take more time to investigate. I will look more closely at the logs and screenshots to determine what went wrong. I will try to exercise the product behavior manually to see if the product itself is inconsistent. I will also review the automation code to make sure there are no unhandled race conditions. I might even need to rerun the test multiple times to measure a more accurate failure rate.

I do not ignore any failures. Instead, I use automated retries to gather more information about the nature of the failures. In the moment, this extra info helps me expedite triage. Over time, the trends this info reveals help me identify weak spots in both the product under test and the test automation.

Automated retries are most helpful at high scale

When used appropriately, automated retries can be helpful for a test automation project of any size. However, they are arguably more helpful for large projects running tests at high scale than for small projects. Why? Two main reasons: complexity and priority.

Large-scale test projects have many moving parts. For example, at PrecisionLender, we presently run 4K-10K end-to-end tests against our web app every business day. (We also run ~100K unit tests every business day.) Our tests launch from TeamCity as part of our Continuous Integration system, and they use in-house Selenium Grid instances to run 50-100 tests in parallel. The PrecisionLender application itself is enormous, too.

Intermittent failures are inevitable in large-scale projects for many different reasons. There could be problems in the test code, but those aren’t the only possible problems. At PrecisionLender, Boa Constrictor already protects us from race conditions, so our intermittent test failures are rarely due to problems in automation code. Other causes for flakiness include:

  • The app’s complexity makes certain features behave inconsistently or unexpectedly
  • Extra load on the app slows down response times
  • The cloud hosting platform has a service blip
  • Selenium Grid arbitrarily chokes on a browser session
  • The DevOps team recycles some resources
  • An engineer makes a system change while tests were running
  • The CI pipeline deploys a new change in the middle of testing

Many of these problems result from infrastructure and process. They can’t easily be fixed, especially when environments are shared. As one tester, I can’t rewrite my whole company’s CI pipeline to be “better.” I can’t rearchitect the app’s whole delivery model to avoid all collisions. I can’t perfectly guarantee 100% uptime for my cloud resources or my test tools like Selenium Grid. Some of these might be good initiatives to pursue, but one tester’s dictates do not immediately become reality. Many times, we need to work with what we have. Curt demands to “just fix the tests” come off as pedantic.

Automated test retries provide very useful information for discerning the nature of such intermittent failures. For example, at PrecisionLender, we hit Selenium Grid problems frequently. Roughly 1/10000 Selenium Grid browser sessions will inexplicably freeze during testing. We don’t know why this happens, and our investigations have been unfruitful. We chalk it up to minor instability at scale. Whenever the 1/10000 failure strikes, our suite’s automated retries kick in and pass. When we review the test report, we see the intermittent failure along with its exception message. Based on its signature, we immediately know that the test is fine. We don’t need to do extra investigation work or manual reruns. Automated retries gave us the info we needed.

Selenium Grid is a large cluster with many potential points of failure. (Image source: LambdaTest.)
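Here is a rough Python sketch of that signature-based shortcut. The exception messages and the known-cause table are invented for illustration; the idea is simply to bucket retried failures by exception signature so familiar infrastructure blips can be recognized at a glance.

```python
from collections import Counter

# Hypothetical retry records pulled from a test report:
# (test name, exception message from the failed attempt).
# The messages below are invented, not real Selenium exceptions.
retried_failures = [
    ("test_price_loan", "WebDriverException: browser session stopped responding"),
    ("test_open_pipeline", "TimeoutException: page did not load within 60 seconds"),
    ("test_price_loan", "WebDriverException: browser session stopped responding"),
]

# Signatures we have already investigated and attributed to known causes.
KNOWN_FLAKY_SIGNATURES = {
    "session stopped responding": "Selenium Grid session froze",
    "did not load within": "app performance blip",
}


def classify(message: str) -> str:
    """Map an exception message to a known flaky cause, if any."""
    for signature, cause in KNOWN_FLAKY_SIGNATURES.items():
        if signature in message:
            return cause
    return "needs manual investigation"


for cause, count in Counter(classify(msg) for _, msg in retried_failures).items():
    print(f"{count} retried failure(s): {cause}")
```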

Another common type of failure is intermittently slow performance in the PrecisionLender application. Occasionally, the app will freeze for a minute or two and then recover. When that happens, we see a “brick wall” of failures in our report: all tests during that time frame fail. Then, automated retries kick in, and the tests pass once the app recovers. Automated retries prove in the moment that the app momentarily froze but that the individual behaviors covered by the tests are okay: functional correctness holds even amidst a performance failure. Our team has used these results on multiple occasions to identify performance bugs in the app, cross-checking system logs and database queries during the time intervals covered by those brick walls of intermittent failures. Again, automated retries gave us extra information that helped us find deep issues.
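As a rough illustration of that cross-checking, the sketch below groups intermittent failures into time windows. The timestamps are made up, and the 5-minute bucket size is arbitrary; a window packed with failures points to an app-wide freeze rather than independent flaky tests, and that interval is what we would compare against system logs and database metrics.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps of intermittent failures from one test run.
failure_times = [
    datetime(2022, 6, 1, 14, 2),
    datetime(2022, 6, 1, 14, 3),
    datetime(2022, 6, 1, 14, 4),
    datetime(2022, 6, 1, 16, 40),
]


def window_start(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its time window."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


counts = Counter(window_start(ts) for ts in failure_times)

# Several failures in one window suggest an app-wide freeze ("brick wall")
# rather than independent flaky tests.
for start, count in sorted(counts.items()):
    flag = "  <-- possible brick wall" if count >= 3 else ""
    print(f"{start:%Y-%m-%d %H:%M}: {count} failure(s){flag}")
```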

Automated retries delineate failure priorities

That answers complexity, but what about priority? Unfortunately, in large projects, there is more work to do than any team can handle. Teams need to make tough decisions about what to do now, what to do later, and what to skip. That’s just business. Testing decisions become part of that prioritization.

In almost all cases, consistent failures are inherently a higher priority than intermittent failures because they have a greater impact on the end users. If a feature fails every single time it is attempted, then the user is blocked from using the feature and cannot receive any value from it. However, if a feature works some of the time, then the user can still get some value out of it. Furthermore, the rarer the intermittency, the lower the impact, and consequently the lower the priority. Intermittent failures are still important to address, but they must be prioritized relative to other work at hand.

Automated test retries automate that initial prioritization. When I triage PrecisionLender tests, I look into consistent “red” failures first. Our SpecFlow reports make them very obvious. I know those failures will be straightforward to reproduce, explain, and hopefully resolve. Then, I look into intermittent “orange” failures second. Those take more time. I can quickly identify issues like Selenium Grid disconnections, but other issues may not be obvious (like system interruptions) or may need additional context (like the performance freezes). Sometimes, we may need to let tests run for a few days to gather more data. If I get called away to a more urgent task while I’m triaging results, then at least I can finish the consistent failures first. It’s a classic 80/20 rule: investigating consistent failures typically gives more return for less work, while investigating intermittent failures gives less return for more work. It is what it is.
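To spell out that red/orange/green split, here is a toy Python classification. The test names, attempt counts, and labels are invented; real reports obviously carry far more detail.

```python
# Toy report entries: (test name, number of attempts, number of passing attempts).
# Names and data are invented for illustration.
results = [
    ("test_save_opportunity", 3, 0),  # failed every attempt
    ("test_price_loan", 3, 2),        # failed once, then passed on retry
    ("test_open_dashboard", 1, 1),    # passed outright
]


def bucket(attempts: int, passes: int) -> str:
    """Classify a result the way a report might color it."""
    if passes == attempts:
        return "green (passed)"
    if passes == 0:
        return "red (consistent failure - triage first)"
    return "orange (intermittent failure - triage second)"


for name, attempts, passes in results:
    print(f"{name}: {bucket(attempts, passes)}")
```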

The only time I would prioritize an intermittent failure over a consistent failure would be if the intermittent failure causes catastrophic or irreversible damage, like wiping out an entire system, corrupting data, or burning money. However, that type of disastrous failure is very rare. In my experience, almost all intermittent failures are due to poorly written test code, automation timeouts from poor app performance, or infrastructure blips.

Context matters

Automated test retries can be a blessing or a curse. It all depends on how testers use them. If testers use retries to reveal more information about failures, then retries greatly assist triage. Otherwise, if testers use retries to conceal intermittent failures, then they aren’t doing their jobs as testers. Folks should not be quick to presume that automated retries are always an antipattern. We couldn’t achieve our scale of testing at PrecisionLender without them. Context matters.