Playwright Workshop for TAU: The Homecoming

Want to learn Playwright with Python? Take this workshop!

Playwright is an awesome new browser automation library. With Playwright, you can automate web UI interactions for testing or for web scraping with a concise, uniform API in one of four languages: Python, C#, Java, and JavaScript. Playwright is also completely open source and backed by Microsoft. It’s a powerful alternative to Selenium WebDriver.

On December 1, 2021, I delivered a workshop on Playwright for TAU: The Homecoming. In my workshop, I taught how to build a test automation project in Python using Playwright with pytest, Python’s most popular test framework. We automated a test case together for performing a DuckDuckGo web search.
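
If you'd like a preview of where the workshop ends up, here is a minimal sketch of that kind of test. It assumes the pytest-playwright plugin (which provides the `page` fixture), and the DuckDuckGo selectors are only illustrative – they may not match the site's current markup or the workshop's exact code:

```python
# A minimal sketch of a DuckDuckGo search test, in the spirit of the workshop.
# Assumes the pytest-playwright plugin, which injects the `page` fixture.
# Selectors are illustrative and may differ from DuckDuckGo's current markup.

from playwright.sync_api import Page


def test_basic_duckduckgo_search(page: Page):
    # Given the DuckDuckGo home page is displayed
    page.goto("https://www.duckduckgo.com")

    # When the user searches for a phrase
    page.fill("#search_form_input_homepage", "panda")
    page.click("#search_button_homepage")

    # Then the search input on the result page contains the phrase
    assert "panda" == page.input_value("#search_form_input")

    # And at least one result link pertains to the phrase
    page.wait_for_selector("a.result__a")
    titles = page.locator("a.result__a").all_text_contents()
    assert any("panda" in title.lower() for title in titles)

    # And the page title contains the phrase
    assert "panda" in page.title()
```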

If you missed the workshop, no worries: You can still take the workshop as a self-guided tutorial! The workshop instructions and example code are located in this GitHub repository:

https://github.com/AutomationPanda/tau-playwright-workshop

To take the workshop as a self-guided tutorial, read the repository’s README, and then follow the instructions in the Markdown guides under the workshop folder. The workshop has five main parts:

  1. Getting started
    1. What is Playwright?
    2. Our web search test
    3. Test project setup
  2. First steps with Playwright
    1. Browsers, contexts, and pages
    2. Navigating to a web page
    3. Performing a search
  3. Writing assertions
    1. Checking the search field
    2. Checking the result links
    3. Checking the title
  4. Refactoring using page objects
    1. The search page
    2. The result page
    3. Page object fixtures (see the sketch after this list)
  5. Nifty Playwright tricks
    1. Testing different browsers
    2. Capturing screenshots and videos
    3. Running tests in parallel
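
As a taste of Part 4, here is a rough sketch of what a page object and its pytest fixture might look like by the end of the workshop. The class, fixture, and selector names are illustrative rather than copied from the workshop code:

```python
# A rough sketch of a page object plus a pytest fixture (Part 4 of the workshop).
# Names and selectors are illustrative; the workshop's actual code may differ.

import pytest
from playwright.sync_api import Page


class DuckDuckGoSearchPage:
    """Page object for the DuckDuckGo home/search page."""

    URL = "https://www.duckduckgo.com"

    def __init__(self, page: Page):
        self.page = page
        self.search_input = page.locator("#search_form_input_homepage")
        self.search_button = page.locator("#search_button_homepage")

    def load(self):
        self.page.goto(self.URL)

    def search(self, phrase: str):
        self.search_input.fill(phrase)
        self.search_button.click()


@pytest.fixture
def search_page(page: Page) -> DuckDuckGoSearchPage:
    # Builds the page object on top of pytest-playwright's `page` fixture
    return DuckDuckGoSearchPage(page)


def test_search_with_page_objects(search_page: DuckDuckGoSearchPage):
    search_page.load()
    search_page.search("panda")
```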

If you get stuck or have any questions, please open issues against the GitHub repository, and I’ll try to help. Happy coding!

A Simple High-Quality Mic Setup for Software Pros

I’ve always been frustrated with poor-quality audio recordings. Microphones built into laptops are very convenient, but they usually yield tinny sound lacking the depth of real voices. When I asked my audiophile friends for advice, they recommended studio-level equipment that was beyond my comprehension and my budget. As a software guy, I just wanted a setup that captured high-quality audio while still being convenient for everyday use. I’d need it for remote meetings as well as for recording talks and tutorials. I was willing to pay for good equipment as long as I could use it well. Unfortunately, my biggest frustration was ignorance: I didn’t know anything about recording.

Here’s what I finally found to work well for me:

  1. A Blue Yeti microphone
  2. A Blue Compass boom arm
  3. A Blue Radius III shockmount
  4. A foam microphone windscreen

When I did my research, the Blue Yeti microphone was at the top of everyone’s recommendation list. The nicest thing about a Blue Yeti mic is that it connects via USB. You simply plug it right into your laptop and select audio input and output channels. The mic doesn’t need an external power source, either.

Initially, I bought only the Blue Yeti mic and the foam windscreen. Instead of using the boom arm, I used the tabletop stand that came with the mic for all my recordings. This was a big mistake. The tabletop stand picked up a lot of local noise, like typing on a keyboard. The audio it captured also had a bit of an echo, which may have been due to sound bouncing off the tabletop or my hardwood floors.

The boom arm and shockmount made a huge improvement in recording quality. Plus, with the boom arm mounted firmly to my desk, I can easily move it towards me for recording or out of the way otherwise. It feels quite sturdy. I chose to use all Blue products so that I could be certain that they’d work together. You can save quite a bit of money if you buy the microphone, boom arm, and shockmount together as a bundle on Amazon (~$200), instead of a la carte like I did (~$250).

My audio recording setup: a Blue Yeti microphone mounted onto a Blue Radius III Shockmount dangling from a Blue Compass boom arm. Pardon the mess on my desk – it’s a work in progress!

I plan to get a docking station for my laptop so that I can plug the microphone’s USB cable into the dock, simplifying my desk’s cable management.

Even with this new setup, I still felt like I didn’t understand how to use my Blue Yeti microphone to its full potential. Thankfully, YouTube came to the rescue! This video greatly helped me understand things like tuning the mic’s gain and positioning the mic while speaking:

In summary, here’s what I like about the setup:

  • It yields very good (maybe professional?) audio recording quality.
  • It is simple enough for anyone to set up and use.
  • It is convenient to use for remote meetings, video recordings, etc.
  • It feels quite sturdy.
  • It is relatively affordable (compared to other audio equipment).

Please note: I am not an expert in audio equipment, and this article is not sponsored by any company. I simply hope that someone can benefit from the things I learned (and possibly save a few bucks) if they want to improve their own recording game!

Boa Constrictor’s Awesome Hacktoberfest 2021

Boa Constrictor is the .NET Screenplay Pattern. It helps you make better interactions for better automation! Its primary use case is Web UI and REST API test automation, but it can be used to automate any kind of interactions. The Screenplay Pattern is much more scalable for development and execution than the Page Object Model.

The Boa Constrictor maintainers and I strongly support open source software. That’s why we participated in Hacktoberfest 2021. In fact, this was the second Hacktoberfest we did. We launched Boa Constrictor as an open source project a year ago during Hacktoberfest 2020! We love sharing our code with the community and inspiring others to get involved. To encourage participation this year, we added the “hacktoberfest” label to open issues, and we offered cool stickers to anyone who contributed.

Boa Constrictor sticker
Boa Constrictor: The .NET Screenplay Pattern sticker medallion

Hacktoberfest 2021 was a tremendous success for Boa Constrictor. Even though the project is small, we received several contributions. Here’s a summary of all the new stuff we added to Boa Constrictor:

  • Updated WebDriver interactions to use Selenium WebDriver 4.0
  • Implemented asynchronous programming for Tasks and Questions
  • Extended the Wait Task to wait for multiple Questions using AND and OR logic
  • Standardized ToString methods for all WebDriver interactions
  • Automated unit tests for WebDriver Questions
  • Wrote new user guides for test framework integrations and interaction patterns
  • Made small refinements to the doc site
  • Created GitHub templates for issues and pull requests
  • Replaced the symbols NuGet package with embedded debugging
  • Added the README to the NuGet package
  • Added Shields to the README
  • Restructured projects for docs, logos, and talk

During Hacktoberfest 2021, we made a series of four releases because we believe in lean development that puts new features in the hands of developers ASAP. The final capstone release was version 2.0.0: a culmination of all Hacktoberfest work! Here’s a view of the Boa Constrictor NuGet package with its new README (Shields included):

The Boa Constrictor NuGet package with the new README and Shields

If you like project stats, then here’s a breakdown of the contributions by numbers:

  • 11 total contributors (5 submitting more than one pull request)
  • 41 pull requests closed
  • 151 commits made
  • Over 10K new lines of code

GitHub’s Code Frequency graph for Boa Constrictor shown below illustrates how much activity the project had during Hacktoberfest 2021. Notice the huge green and red spikes on the right side of the chart corresponding to the month of October 2021. That’s a lot of activity!

Hacktoberfest Contributions
The GitHub Code Frequency Graph for Boa Constrictor

Furthermore, every member of my Test Engineering & Architecture (TEA) team at Q2 completed four pull requests for Hacktoberfest, thus earning our prizes and our bragging rights. For the three others on the team, this was their first Hacktoberfest, and Boa Constrictor was their first open source project. We all joined together to make Boa Constrictor better for everyone. I’m very proud of each of them individually and of our team as a whole.

Personally, I gained more experience as an open source project maintainer. I brainstormed ideas with my team, assigned work to volunteers, and provided reviews for pull requests. I also had to handle slightly awkward situations, like politely turning down pull requests that could not be accepted. Thankfully, the project had very little spam, but we did have many potential contributors request to work on issues but then essentially disappear after being assigned. That made me appreciate the folks who did complete their pull requests even more.

Overall, Hacktoberfest 2021 was a great success for Boa Constrictor. We added several new features, docs, and quality-of-life improvements to the project. We also got people excited about open source contributions. Many thanks to Digital Ocean, Appwrite, Intel, and DeepSource for sponsoring Hacktoberfest 2021. Also, special thanks to Digital Ocean for featuring Boa Constrictor in their Hacktoberfest kickoff event. Keep on hacking!

Boa Constrictor is doing Hacktoberfest 2021!

Boa Constrictor is the .NET Screenplay Pattern. It helps you make better interactions for better automation! Its primary use case is Web UI and REST API test automation, but it can be used to automate any kind of interactions. The Screenplay Pattern is much more scalable for development and execution than the Page Object Model.

My team and I at Q2 developed Boa Constrictor for testing the PrecisionLender web app. Originally, we developed it internally as part of our C# test automation solution named “Boa”, but we later released it as an open source project on GitHub so that others could use it. In fact, we released it publicly in October 2020 during last year’s Hacktoberfest!

We are delighted to announce that Boa Constrictor will participate in Hacktoberfest 2021. Open source software is vital for our industry, and we strongly support efforts like Hacktoberfest to encourage folks to contribute to open source projects. Many thanks to Digital Ocean, Appwrite, Intel, and DeepSource for sponsoring Hacktoberfest again this year.

So, how can you contribute to Boa Constrictor? Take these five easy steps:

  1. Start by learning about the project.
  2. Read our guide to contributing code.
  3. Clone the GitHub repository.
  4. Look for unassigned open issues labeled “hacktoberfest”.
    1. Or, open an issue to propose a new idea!
  5. Add a comment to the issue saying that you’d like to do it.

To encourage contributions, I will give free Boa Constrictor stickers to anyone who makes a valid pull request to the project during Hacktoberfest 2021! (I’ll share a link where you can privately share your mailing address. I’ll mail stickers anywhere in the world – not just inside the United States.) The sticker is a 2″ medallion that looks like this:

Boa Constrictor sticker
The Boa Constrictor Sticker

Remember, you have until October 31 to make four qualifying pull requests for Hacktoberfest. We’d love for you to make at least one of those pull requests for Boa Constrictor.

How Q2 uses BDD with SpecFlow for testing PrecisionLender

This case study was written by Andrew Knight, Lead Software Engineer in Test for Q2’s PrecisionLender product, in collaboration with Q2 and Tricentis. It explains the PrecisionLender team’s continuous testing journey and how SpecFlow served as a cornerstone for success.

What is PrecisionLender?

PrecisionLender is a web application that empowers commercial bankers with in-the-moment insights that help them structure and price commercial deals. Andi®, PrecisionLender’s intelligent virtual analyst, delivers these hyper-focused recommendations in real-time, allowing relationship managers to make data-driven decisions while pricing their commercial deals. PrecisionLender is owned and developed by Q2, a financial experience software company dedicated to providing digital banking and lending solutions to banks, credit unions, alternative finance, and fintech companies in the U.S. and internationally.

The PrecisionLender Opportunity Screen
(Picture taken from the PrecisionLender Support Center)

The starting point

The PrecisionLender team had a robust Continuous Integration (CI) delivery pipeline with strong unit test coverage, but they lacked end-to-end feature coverage. Developers would fill this gap by manually inspecting their changes in a shared development environment. However, as the PrecisionLender app grew, manual checks could not cover all possible integrations. The team knew they needed continuous automated testing to provide a safety net for development to remain lean and efficient. In April 2018, they hired Andrew Knight as their first Software Engineer in Test (SET) – a new role for the company – to lead the effort.

Automating tests with SpecFlow

The PrecisionLender team developed the Boa test solution – a project for automating end-to-end tests at scale. Boa would become PrecisionLender’s internal platform for test automation development. The name “Boa” is a loose acronym for “Behavior-Oriented Automation.”

The team chose SpecFlow to be the core framework for Boa tests. Since the PrecisionLender app’s backend is developed using .NET, SpecFlow was a natural fit. SpecFlow’s Gherkin syntax made tests readable and understandable, even to product owners and product support specialists who do not code.

The SpecFlow framework integrates with tools like Selenium WebDriver for testing Web UIs and RestSharp for testing REST APIs to exercise vital pathways for thorough app coverage. SpecFlow’s dependency injection mechanisms are solid yet simple, and the online docs are thorough. Plus, SpecFlow is an open-source project, so anyone can look at its code to learn how things work, open requests for new features, and even offer code contributions.

An example Boa test, written in Gherkin using SpecFlow.

Executing tests with SpecFlow+ Runner

Writing good tests was only part of the challenge. The PrecisionLender team needed to execute Boa tests continuously to provide fast feedback on changes to the app. The team chose to run Boa tests using SpecFlow+ Runner, which is tailored for SpecFlow tests. The team uses SpecFlow+ Runner to launch tests in parallel in TeamCity any time a developer deploys a code change to internal pre-production environments. The entire test suite also runs every night against multiple product configurations. SpecFlow+ Runner produces a helpful test report with everything needed to triage test failures: pass-and-fail tallies overall and per feature, a visual execution timeline, and full system logs. If engineers need to investigate certain failures more closely, they can use SpecFlow tags and SpecFlow+ Runner profiles to selectively filter tests for reruns. SpecFlow+ Runner’s multiple features help the team expedite test execution and investigation.

The SpecFlow+ Runner report for a dozen smoke tests.

Sharing features with SpecFlow+ LivingDoc

Good test cases are more than just verification procedures – they are behavior specifications. They define how features should work. Instead of keeping testing work siloed by role, the PrecisionLender team wanted to share Boa tests as behavior specs with all stakeholders to foster greater collaboration and understanding around features. The team also wanted to share Boa tests with specific customers without sharing the entire automation code.

SpecFlow+ LivingDoc enabled the PrecisionLender team to turn Gherkin feature files into living documentation. Whereas the SpecFlow+ Runner report focuses on automation execution, the SpecFlow+ LivingDoc report focuses on behavior specification apart from coding and automation details. LivingDoc displays Gherkin scenarios in a readable, searchable way that both internal folks and customers can consume. It can also optionally include high-level pass-and-fail results for each scenario, providing just enough information to be helpful and not overwhelming. LivingDoc has also helped PrecisionLender’s engineers identify and eliminate unused step definitions within the automation code. PrecisionLender benefits greatly from complementary reports from SpecFlow+ Runner and SpecFlow+ LivingDoc.

The SpecFlow+ LivingDoc report for a dozen smoke tests with their pass-and-fail results.

Improving interactions with Boa Constrictor

The Boa test solution initially used the Page Object Model to model interactions with the PrecisionLender app. However, as the PrecisionLender team automated more and more Boa tests, it became apparent that page objects did not scale well. Many page object classes had duplicative methods, making automation code messy. Some methods also did not include appropriate waiting mechanisms, introducing flaky failures.

PrecisionLender’s SETs developed Boa Constrictor, a .NET implementation of the Screenplay Pattern, to make better interactions for better automation. In Screenplay, actors use abilities to perform interactions. For example, an ability could be using Selenium WebDriver, and an interaction could be clicking an element. The Screenplay Pattern can be seen as a refactoring of the Page Object Model that minimizes duplicate code through a better separation of concerns. Individual interactions can be hardened for robustness, eliminating flaky hotspots. The Boa test solution now exclusively uses Boa Constrictor for interactions.
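
Boa Constrictor itself is a C# library, so the snippet below is only a tiny illustrative sketch of the pattern’s core idea (actors hold abilities and perform interactions), written in Python for brevity. It is not Boa Constrictor’s actual API:

```python
# A tiny, illustrative sketch of the Screenplay Pattern's core idea:
# actors use abilities to perform interactions.
# This is NOT Boa Constrictor's actual API (which is C#); names are made up.

class BrowseTheWeb:
    """Ability: wraps a Selenium WebDriver instance."""

    def __init__(self, driver):
        self.driver = driver


class Click:
    """Interaction: clicks an element; waiting logic would be hardened here, once."""

    def __init__(self, locator):
        self.locator = locator

    def perform_as(self, actor):
        driver = actor.ability_to(BrowseTheWeb).driver
        # A real implementation would add explicit waits before clicking.
        driver.find_element(*self.locator).click()


class Actor:
    """Actor: holds abilities and performs interactions."""

    def __init__(self):
        self._abilities = {}

    def can(self, ability):
        self._abilities[type(ability)] = ability

    def ability_to(self, ability_type):
        return self._abilities[ability_type]

    def attempts_to(self, *interactions):
        for interaction in interactions:
            interaction.perform_as(self)
```

Because every test reuses the same hardened Click interaction, waiting logic lives in one place instead of being duplicated across dozens of page object methods.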

In October 2020, Q2 released Boa Constrictor as an open-source project so that anyone can use it. It is fully compatible with SpecFlow and other .NET test frameworks, and it provides rich interactions for Selenium WebDriver and RestSharp out of the box.

Boa Constrictor, the .NET Screenplay Pattern.

Scaling massively with Selenium Grid

When the PrecisionLender team first started automating Boa tests, they ran tests one at a time. That soon became too slow since the average Boa test took 20 to 50 seconds to complete. The team then started running up to 3 tests in parallel on one machine, but that also was not fast enough. They turned to Selenium Grid, a tool for running WebDriver sessions remotely across multiple machines.

PrecisionLender built a set of internal Selenium Grid instances using Microsoft Azure virtual machines to run Boa tests at high scale. As of July 2021, PrecisionLender has over 1800 unique Boa tests that run across four distinct product configurations. Whenever TeamCity detects a code change, it triggers a “continuous” Boa test suite of over 1000 tests, running 50 tests in parallel using Google Chrome on Selenium Grid. It completes execution in about 10 minutes. TeamCity launches the full test suite every night against all product configurations with 64-100 parallel tests on Selenium Grid. Continuous Integration currently runs up to 10K Boa tests daily against the PrecisionLender app with SpecFlow+ Runner and Selenium Grid.

The Boa test solution architecture, including Continuous Integration through TeamCity and parallel testing with SpecFlow+ Runner and Selenium Grid.

Shifting left with BDD

Better testing and automation practices eventually inspired better development practices. Product owners would create user stories, but developers would struggle to understand requirements and business purposes fully. PrecisionLender’s SETs started bringing together the Three Amigos – business, development, and testing roles – to discuss product behaviors proactively while creating user stories. They introduced Behavior-Driven Development (BDD) activities like Example Mapping to explore behaviors together. Then, well-defined stories could be easily connected to SpecFlow tests written in Gherkin following Specification by Example (SBE). Teams repeatedly saved time by thinking before coding and specifying before testing. They built higher quality into features from the beginning, and they stopped before working on half-baked stories with unjustified value propositions. Developers who participated in these behavior-driven practices were also more likely to automate Boa tests on their own. Furthermore, one of PrecisionLender’s developers loved BDD practices so much that he joined the team of SETs! Through Gherkin, SpecFlow provided a foundation that enabled quality work to shift left.

Challenges along the way

Achieving true continuous testing had its challenges along the way. Intermittent failure was the most significant issue PrecisionLender faced at scale. With so many tests, environments, and infrastructural pieces, arbitrary failures were statistically unavoidable. The PrecisionLender team took a two-pronged approach to handle intermittent failures: (1) eliminate race conditions in automation using good interactions with Boa Constrictor, and (2) use SpecFlow+ Runner to automatically retry failed tests to determine if failures were consistent or intermittent. These two approaches reduced the frequency of flaky failures and helped engineers quickly resolve any remaining issues. As a result, Boa tests enjoy well above a 99% success rate, and most failures are due to actual bugs.

PrecisionLender app performance at scale was a second big challenge. Running up to 100 tests in parallel turned functional tests into de facto load tests. Testing at scale repeatedly uncovered performance bottlenecks in the app. Performance issues caused widespread test failures that were difficult to diagnose because they appeared intermittently. Still, the visual timeline and timestamps in the SpecFlow+ Runner report helped the team identify periods of failure that could be crosschecked against backend logs, metrics, and database queries. Developers resolved many performance issues and significantly boosted the app’s response times and load capacity.

Training team members to develop solid test automation was the third challenge. At the start of the journey, test automation, Gherkin, and BDD were all new to PrecisionLender. The PrecisionLender SETs took active steps to train others on how to develop good tests and good automation through group workshops, Three Amigos meetings, and one-on-one mentoring sessions. They shared resources like the Automation Panda blog for how to write good tests and good Gherkin. The investment in education paid off: many developers have joined the SETs in writing readable, reliable Boa tests that run continuously.

Benefits to the business

Developing a continuous testing solution brought many incredible benefits to PrecisionLender. First, the quality of the PrecisionLender app improved because continuous testing provided fast feedback on failures that developers could quickly fix. Instead of relying on manual spot checks, the team could trust the comprehensive safety net of Boa tests to catch bugs. Many issues would be caught within an hour of a developer making a code commit, and the longest feedback cycle would be only one business day for the full nightly test suites to run. Boa tests catch failures before customers ever experience them. The continuous nature of testing enables PrecisionLender to publish new releases every two weeks.

Second, the high reliability of the Boa test solution means that the PrecisionLender team can trust test results. When a test passes, the behavior is working. When a test fails, there is a real bug. Reliability also means that engineers spend less time on automation maintenance and more time on more valuable activities, like developing new features and adding new tests. Quality is present in both the product code and the test code.

Third, continuous testing boosts customer confidence in PrecisionLender. Customers trust the software quality because they know that PrecisionLender thoroughly tests every release. The PrecisionLender team also shares SpecFlow+ LivingDoc reports with specific clients to prove quality.

A bright future

PrecisionLender’s continuous testing journey is not over. Since the PrecisionLender team hired its first SET, it has hired three more, in addition to a testing manager, to grow quality improvement efforts. Multiple development teams have written their own Boa tests, and they plan to write more tests independently. SpecFlow’s tools have been indispensable in helping the PrecisionLender team achieve successful quality assurance. As PrecisionLender welcomes more customers, the Boa solution will be ready to scale with more tests, more configurations, and more executions.

My Upside-Down QA Story for Global Testers Day

Happy Global Testers Day! For 2021, QA Touch is celebrating with webinars, games, competitions, blogs, and videos. I participated by sharing an “upside-down” story from years ago when I accidentally wiped out all of NetApp’s continuous integration testing. Please watch my story below. I hope you find it both insightful and entertaining!


Skin Rashes and Software Testing

Many of you know me as the “Automation Panda” through my blog, my Twitter handle, or my online courses. Maybe you’ve attended one of my conference talks. I love connecting with others, but many people don’t get to know me personally. Behind my black-and-white façade, I’m a regular guy. When I’m not programming, I enjoy cooking and video gaming. I’m also currently fixing up a vintage Volkswagen Beetle. However, for nearly two years, I’ve suffered a skin rash that will not go away. I haven’t talked about it much until recently, when it became unbearable.

For a while, things turned bad. Thankfully, things are a little better now. I’d like to share my journey publicly because it helps to humanize folks in tech like me. We’re all people, not machines. Vulnerability is healthy. My journey also reminded me of a few important tenets for testing software. I hope you’ll read my story, and I promise to avoid the gross parts.


Distant Precursors

I’ve been blessed with great health and healthcare my entire life. However, when I was a teenager, I had a weird skin issue: the skin around my right eye socket turned dry, itchy, and red. Imagine dandruff, but on your face – it was flaky like a bad test (ha!). Lotions and creams did nothing to help. After it persisted for several weeks, my parents scheduled an appointment with my pediatrician. They didn’t know what caused it, but they gave me a sample tube of topical steroids to try. The steroids worked great. The skin around my eye cleared up and stayed normal. This issue resurfaced while I was in college, but it went away on its own after a month or two.

Rise of the Beast

Around October 2019, I noticed this same rash started appearing again around my right eye for the first time in a decade. The exact date is fuzzy in my mind, but I remember it started before TestBash San Francisco 2019. At that time, my response was to ignore it. I’d continue to keep up my regular hygiene (like washing my face), and eventually it would go away like last time.

Unfortunately, the rash got worse. It started spreading to my cheek and my forehead. I started using body lotion on it, but it would burn whenever I’d apply it, and the rash would persist. My wife started trying a bunch of fancy, high-dollar lotions (like Kiehl’s and other brands I didn’t know), but they didn’t help at all. By Spring 2020, my hands and forearms started breaking out with dry, itchy red spots, too. These were worse: if I scratched them, they would bleed. I also remember taking my morning shower one day and thinking to myself, “Gosh, my whole body is super itchy!”

I had enough, so I visited a local dermatologist. He took one look at my face and arms and prescribed topical steroids. He told me to use them for about four weeks and then stop for two weeks before a reevaluation. The steroids partially worked. The itching would subside, but the rash wouldn’t go away completely. When I stopped using the steroids, the rash returned to the same places. I also noticed the rash slowly spreading, too. Eventually, it crept up my upper arms to my neck and shoulders, down my back and torso, and all the way down to my legs.

On a second attempt, the dermatologist prescribed a much stronger topical steroid in addition to a round of oral steroids. My skin healed much better with the stronger medicine, but, inevitably, when I stopped using them, the rash returned. By now, patches of the rash could be found all over my body, and they itched like crazy. I couldn’t wear white shirts anymore because spots would break out and bleed, as if I nicked myself while shaving. I don’t remember the precise timings, but I also asked the dermatologist to remove a series of three moles on my body that became infected badly by the rash.

Fruitless Mitigations

As a good tester, I wanted to know the root cause for my rash. Was I allergic to something? Did I have a deeper medical issue? Steroids merely addressed the symptoms, and they did a mediocre job at best. So, I tried investigating what triggered my rash.

When the dry patch first appeared above my eye, I suspected cold, dry weather. “Maybe the crisp winter air is drying my skin too much.” That’s when I tried using an assortment of creams. When the rash started spreading, that’s when I knew the cause was more than winter weather.

Then, my mind turned to allergies. I knew I was allergic to pet dander, but I never had reactions to anything else before. At the same time the rash started spreading from my eye, I noticed I had a small problem with mangos. Whenever I would bite into a fresh mango, my lips would become severely, painfully chapped for the next few days. I learned from reading articles online that some folks who are allergic to poison ivy (like me) are also sensitive to mangos because both plants contain urushiol in their skin and sap. At that time, my family and I were consuming a boxful of mangos. I immediately cut out mangos and hoped for the best. No luck.

When the rash spread to my whole body, I became serious about finding the root cause. Every effect has a cause, and I wanted to investigate all potential causes in order of likelihood. I had already crossed dry weather and mangoes off the list. Since the rash appeared in splotches across my whole body, I reasoned that its trigger could be either external – something coming in contact with all parts of my skin – or internal – something not right from the inside pushing out.

What comes in contact with the whole body? Air, water, and cloth. Skin reactions to things in air and water seemed unlikely, so I focused on clothing and linens. That’s when I remembered a vague story from childhood. When my parents taught me how to do laundry, they told me they used a scent-free, hypoallergenic detergent because my dad had a severe skin reaction to regular detergent one time long before I was born. Once I confirmed the story with my parents, I immediately sprang into action. I switched over my detergent and fabric softener. I rewashed all my clothes and linens – all of them. I even thoroughly cleaned out my dryer duct to make sure no chemicals could leach back into the machine. (Boy, that was a heaping pile of dust.) Despite these changes, my rash persisted. I also changed my soaps and shampoos to no avail.

At the same time, I looked internally. I read in a few online articles that skin rashes could be caused by deficiencies. I started taking a daily multivitamin. I also tried supplements for calcium, Vitamin B6, Vitamin D, and collagen. Although I’m sure my body was healthier as a result, none of these supplements made a noticeable difference.

My dermatologist even did a skin punch test. He cut a piece of skin out of my back about 3mm wide through all layers of the skin. The result of the biopsy was “atopic dermatitis.” Not helpful.

For relief, I tried an assortment of creams from Eucerin, CeraVe, Aveeno, and O’Keeffe’s. None of them staved off the persistent itching or reduced the redness. They were practically useless. The only cream that had any impact (other than steroids) was an herbal Chinese medicine. With a cooling burn, it actually stopped the itch and visibly reduced the redness. By Spring 2021, I stopped going to the dermatologist and simply relied on the Chinese cream. I didn’t have a root cause, but I had an inexpensive mitigation that was 差不多 (chà bù duō; “almost even” or “good enough”).

Insufferability

Up until Summer 2021, my rash was mostly an uncomfortable inconvenience. Thankfully, since everything shut down for the COVID pandemic, I didn’t need to make public appearances with unsightly skin. The itchiness was the worst symptom, but nobody would see me scratch at home.

Then, around the end of June 2021, the rash got worse. My whole face turned red, and my splotches became itchier than ever. Worst of all, the Chinese cream no longer had much effect. Timing was lousy, too, since my wife and I were going to spend most of July in Seattle. I needed medical help ASAP. I needed to see either my primary care physician or an allergist. I called both offices to schedule appointments. The wait time for my primary doctor was 1.5 months, while the wait time for an allergist was 3 months! Even if I wouldn’t be in Seattle, I couldn’t see a doctor anyway.

My rash plateaued while in Seattle. It was not great, but it didn’t stop me from enjoying our visit. I was okay during a quick stop in Salt Lake City, too. However, as soon as I returned home to the Triangle, the rash erupted. It became utterly unbearable – to the point where I couldn’t sleep at night. I was exhausted. My skin was raw. I could not focus on any deep work. I hit the point of thorough debilitation.

When I visited my doctor on August 3, she performed a series of blood tests. Those confirmed what the problem was not:

  1. My metabolic panel was okay.
  2. My cholesterol was okay.
  3. My thyroid was okay.
  4. I did not have celiac disease (gluten intolerance).
  5. I did not have hepatitis C.

Nevertheless, these results did not indicate any culprit. My doctor then referred me to an allergist for further testing on August 19.

The two weeks between appointments was hell. I was not allowed to take steroids or antihistamines for one week before the allergy test. My doctor also prescribed me hydroxyzine under the presumption that the root cause was an allergy. Unfortunately, I did not react well to hydroxyzine. It did nothing to relieve the rash or the itching. Even though I took it at night, I would feel off the next day, to the point where I literally could not think critically while trying to do my work. It affected me so badly that I accidentally ran a red light. During the two weeks between appointments, I averaged about 4 hours of sleep per night. I had to take sick days off work, and on days I could work, I had erratic hours. (Thankfully, my team, manager, and company graciously accommodated my needs.) I had no relief. The creams did nothing. I even put myself on an elimination diet in a desperate attempt to avoid potential allergens.

If you saw this tweet, now you know the full story behind it:

A Possible Answer

On August 19, 2021, the allergist finally performed a skin prick allergy test. A skin prick test is one of the fastest, easiest ways to reveal common allergies. The nurse drew dots down both of my forearms. She then lightly scratched my skin next to each dot with a plastic nub that had been dunked in an allergen. After waiting 15 minutes, she measured the diameter of each spot to determine the severity of the allergic reaction, if any. She must have tested about 60 different allergens.

The results yielded immediate answers:

  • I did not have any of the “Big 8” food allergies.
  • I am allergic to cats and dogs, which I knew.
  • I am allergic to certain pollens, which I suspected.
  • I am allergic to certain fungi, which is no surprise.

Then, there was the major revelation: I am allergic to dust mites.

Once I did a little bit of research, this made lots of sense. Dust mites are microscopic bugs that live in plush environments (like mattresses and pillows) and eat dead skin cells. The allergy is not to the mite itself but rather to its waste. They can appear anywhere but are typically most prevalent in bedrooms. My itchiness always seemed strongest at night while in bed. The worst areas on my skin were my face and upper torso, which have the most contact with my bed pillows and covers. Since I sleep in bed every night, the allergic reaction would be recurring and ongoing. No wonder I couldn’t get any relief!

I don’t yet know if eliminating dust mites will completely cure my skin problems, but the skin prick test at least provides hard evidence that I have a demonstrable allergy to dust mites.


Lessons for Software Testing

After nearly two years of suffering, I’m grateful to have a probable root cause for my rash. Nevertheless, I can’t help but feel frustrated that it took so long to find a meaningful answer. As a software tester, I feel like I should have been able to figure it out much sooner. Reflecting on my journey reminds me of important lessons for software testing.

First and foremost, formal testing with wide coverage is better than random checking. When I first got my rash, I tried to address it based on intuition. Maybe I should stop eating mangoes? Maybe I should change my shower soap, or my laundry detergent? Maybe eating probiotics will help? These ideas, while not entirely bad, were based more on conjecture than evidence. Checking them took weeks at a time and yielded unclear results. Compare that to the skin prick test, which took half an hour in total time and immediately yielded definite answers. So many software development teams do their testing more like tossing mangoes than like a skin prick test. They spot-check a few things on the new features they develop and ship them instead of thoroughly covering at-risk behaviors. Spot checks feel acceptable when everything is healthy, but they are not helpful when something systemic goes wrong. Hard evidence is better than wild guesses. Running an established set of tests, even if they seem small or basic, can deliver immense value in a short time.

When tests yield results, knowing what is “good” is just as important as knowing what is “bad.” Frequently, software engineers only look at failing tests. If a test passes, who cares? Failures are the things that need attention. Well, passing tests rule out potential root causes. One of the best results from my allergy test is that I’m not allergic to any of the “Big 8” food allergies: eggs, fish, milk, peanuts, shellfish, soy, tree nuts, and wheat. That’s a huge relief, because I like to eat good food. When we as software engineers get stuck trying to figure out why things are broken, it may be helpful to remember what isn’t broken.

Unfortunately, no testing has “complete” coverage, either. My skin prick test covered about 60 common allergens. Thankfully, it revealed my previously-unknown allergy to dust mites, but I recognize that I might be allergic to other things not covered by the test. Even if I mitigate dust mites in my house, I might still have this rash. That worries me a bit. As a software tester, I worry about product behaviors I am unable to cover with tests. That’s why I try to maximize coverage on the riskiest areas with the resources I have. I also try to learn as much as I can about the behaviors under test to minimize unknowns.

Testing is expensive but worthwhile. My skin prick allergy test cost almost $600 out of pocket. To me, that cost is outrageously high, but it was grudgingly worthwhile to get a definitive answer. (I won’t digress into problems with American healthcare costs.) Many software teams shy away from regular, formal testing work because they don’t want to spend the time doing it or pay the dollars for someone else to do it. I would’ve gladly shelled out a grand a year ago if I could have known the root cause of my rash. My main regret is not visiting an allergist sooner.

Finally, test results are useless without corrective action. Now that I know I have a dust mite allergy, I need to mitigate dust mites in my house:

  1. I need to encase my mattress and pillow with hypoallergenic barriers that keep dust mites out.
  2. I need to wash all my bedding in hot water (at least 130° F) or freeze it for over 24 hours.
  3. I need to deeply clean my bedroom suite to eliminate existing dust.
  4. I need to maintain a stricter cleaning schedule for my house.
  5. I need to upgrade my HVAC air filters.
  6. I need to run an air purifier in my bedroom to eliminate any other airborne allergens.

In the worst case, I can take allergy shots to abate my symptoms.

Simply knowing my allergies doesn’t fix them. The same goes for software testing – testing does not improve quality, it merely indicates problems with quality. We as engineers must improve software behaviors based on feedback from testing, whether that means fixing bugs, improving user experience, or shipping warnings for known issues.


Next Steps

Now that I know I have an allergy to dust mites, I will do everything I can to abate them. I already ordered covers and an air purifier from Amazon. I also installed new HVAC air filters that catch more allergens. For the past few nights, I slept in a different bed, and my skin has noticeably improved. Hopefully, this is the main root cause and I won’t need to do more testing!

My New GitHub Account: AutomationPanda

I’m changing my GitHub account from AndyLPK247 to AutomationPanda!

My original GitHub username is “AndyLPK247”. I created it long before I became the “Automation Panda.” However, I’d like my GitHub account to match the name of my blog (AutomationPanda.com) and my Twitter handle (@AutomationPanda). Using the same name across platforms makes it easier for folks to recognize my work. Plus, I’m writing a book, and I want the links printed in the book to reference Automation Panda.

Therefore, I created a new GitHub account named “AutomationPanda”. (Actually, I created it a while ago to reserve it, just in case I ever wanted to make a change.) I decided to create an entirely new account instead of simply changing my existing account’s username because I’ve created many materials – articles, videos, and courses – that link to my “AndyLPK247” account. If I were to change that username, then many of those links would break. (GitHub’s docs state that repository links would redirect to the new username, but links to gists and to the old name itself would break.)

From this moment forward, I will use “AutomationPanda” as my primary GitHub account. I intend to create all my new repositories under “AutomationPanda” while preserving existing repositories under “AndyLPK247”. Maintaining two GitHub accounts will be a bit of a hassle, but I think it is the best strategy for my situation. Over time, the new account will supersede the old.

If you’d like to watch me gain my green tiles back, be sure to follow me at https://github.com/AutomationPanda!

Are Automated Test Retries Good or Bad?

What happens when a test fails? If someone is manually running the test, then they will pause and poke around to learn more about the problem. However, when an automated test fails, the rest of the suite keeps running. Testers won’t get to view results until the suite is complete, and the automation won’t perform any extra exploration at the time of failure. Instead, testers must review logs and other artifacts gathered during testing, and they even might need to rerun the failed test to check if the failure is consistent.

Since testers typically rerun failed tests as part of their investigation, why not configure automated tests to automatically rerun failed tests? On the surface, this seems logical: automated retries can eliminate one more manual step. Unfortunately, automated retries can also enable poor practices, like ignoring legitimate issues.

So, are automated test retries good or bad? This is actually a rather controversial topic. I’ve heard many voices strongly condemn automated retries as an antipattern (see here, here, and here). While I agree that automated retries can be abused, I nevertheless still believe they can add value to test automation. A deeper understanding needs a nuanced approach.

So, how do automated retries work?

To avoid any confusion, let’s carefully define what we mean by “automated test retries.”

Let’s say I have a suite of 100 automated tests. When I run these tests, the framework will execute each test individually and yield a pass or fail result for the test. At the end of the suite, the framework will aggregate all the results together into one report. In the best case, all tests pass: 100/100.

However, suppose that one of the tests fails. Upon failure, the test framework would capture any exceptions, perform any cleanup routines, log a failure, and safely move on to the next test case. At the end of the suite, the report would show 99/100 passing tests with one test failure.

By default, most test frameworks will run each test one time. However, some test frameworks have features for automatically rerunning test cases that fail. The framework may even enable testers to specify how many retries to attempt. So, let’s say that we configure 2 retries for our suite of 100 tests. When that one test fails, the framework would queue that failing test to run twice more before moving on to the next test. It would also add more information to the test report. For example, if one retry passed but another one failed, the report would show 99/100 passing tests with a 1/3 pass rate for the failing test.
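
To make this concrete, here is a minimal sketch of test-level retries in pytest, assuming the pytest-rerunfailures plugin is installed. (SpecFlow+ Runner, covered later in this article, exposes the same idea through its own settings.)

```python
# Test-level retries in pytest, assuming the pytest-rerunfailures plugin
# is installed (pip install pytest-rerunfailures).

import random

import pytest


# Retry this test up to 2 more times if it fails. Rerun attempts show up
# in the test report rather than silently replacing the failure.
@pytest.mark.flaky(reruns=2)
def test_sometimes_flaky():
    # Stand-in for a test with an intermittent failure (e.g., a race condition).
    assert random.random() > 0.1


# Retries can also be applied suite-wide from the command line:
#   pytest --reruns 2
```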

In this article, we will focus on automated retries for test cases. Testers could also program other types of retries into automated tests, such as retrying browser page loads or REST requests. Interaction-level retries require sophisticated, context-specific logic, whereas test-level retry logic works the same for any kind of test case. (Interaction-level retries would also need their own article.)
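
Just to make that distinction concrete, an interaction-level retry might look something like this hypothetical helper, which retries a single REST request inside a test instead of rerunning the whole test case:

```python
# A hypothetical interaction-level retry: retrying one REST request inside a
# test, as opposed to rerunning the whole test case. The retry policy here is
# illustrative only.

import time

import requests


def get_with_retries(url: str, attempts: int = 3, delay_seconds: float = 2.0):
    """Attempt a GET request a few times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as error:
            last_error = error
            time.sleep(delay_seconds)
    raise last_error
```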

Automated retries can be a terrible antipattern

Let’s see how automated test retries can be abused:

Jeremy is a member of a team that runs a suite of 300 automated tests for their web app every night. Unfortunately, the tests are notoriously flaky. About a dozen different tests fail every night, and Jeremy spends a lot of time each morning triaging the failures. Whenever he reruns failed tests individually on his laptop, they almost always pass.

To save himself time in the morning, Jeremy decides to add automatic retries to the test suite. Whenever a test fails, the framework will attempt one retry. Jeremy will only investigate tests whose retries failed. If a test had a passing retry, then he will presume that the original failure was just a flaky test.

Ouch! There are several problems here.

First, Jeremy is using retries to conceal information rather than reveal information. If a test fails but its retries pass, then the test still reveals a problem! In this case, the underlying problem is flaky behavior. Jeremy is using automated retries to overwrite intermittent failures with intermittent passes. Instead, he should investigate why the tests are flaky. Perhaps automated interactions have race conditions that need more careful waiting. Or, perhaps features in the web app itself are behaving unexpectedly. Test failures indicate a problem – either in test code, product code, or infrastructure.

Second, Jeremy is using automated retries to perpetuate poor practices. Before adding automated retries to the test suite, Jeremy was already manually retrying tests and disregarding flaky failures. Adding retries to the test suite merely speeds up the process, making it easier to sidestep failures.

Third, the way Jeremy uses automated retries indicates that the team does not value their automated test suite very much. Good test automation requires effort and investment. Persistent flakiness is a sign of neglect, and it fosters low trust in testing. Using retries is merely a “band-aid” on both the test failures and the team’s attitude about test automation.

In this example, automated test retries are indeed a terrible antipattern. They enable Jeremy and his team to ignore legitimate issues. In fact, they incentivize the team to ignore failures because they institutionalize the practice of replacing red X’s with green checkmarks. This team should scrap automated test retries and address the root causes of flakiness.

green check red x
Testers should not conceal failures by overwriting them with passes.

Automated retries are not the main problem

Ignoring flaky failures is unfortunately all too common in the software industry. I must admit that in my days as a newbie engineer, I was guilty of rerunning tests to get them to pass. Why do people do this? The answer is simple: intermittent failures are difficult to resolve.

Testers love to find consistent, reproducible failures because those are easy to explain. Other developers can’t push back against hard evidence. However, intermittent failures take much more time to isolate. Root causes can become mind-bending puzzles. They might be triggered by environmental factors or awkward timings. Sometimes, teams never figure out what causes them. In my personal experience, bug tickets for intermittent failures get far less traction than bug tickets for consistent failures. All these factors incentivize folks to turn a blind eye to intermittent failures when convenient.

Automated retries are just a tool and a technique. They may enable bad practices, but they aren’t inherently bad. The main problem is willfully ignoring certain test results.

Automated retries can be incredibly helpful

So, what is the right way to use automated test retries? Use them to gather more information from the tests. Test results are simply artifacts of feedback. They reveal how a software product behaved under specific conditions and stimuli. The pass-or-fail nature of assertions simplifies test results at the top level of a report in order to draw attention to failures. However, reports can give more information than just binary pass-or-fail results. Automated test retries yield a series of results for a failing test that indicate a success rate.

For example, SpecFlow and the SpecFlow+ Runner make it easy to use automatic retries the right way. Testers simply need to add the retryFor setting to their SpecFlow+ Runner profile to set the number of retries to attempt. In the final report, SpecFlow records the success rate of each test with color-coded counts. Results are revealed, not concealed.

Here is a snippet of the SpecFlow+ Report showing both intermittent failures (in orange) and consistent failures (in red).

This information jumpstarts analysis. As a tester, one of the first questions I ask myself about a failing test is, “Is the failure reproducible?” Without automated retries, I need to manually rerun the test to find out – often at a much later time and potentially within a different context. With automated retries, that step happens automatically and in the same context. Analysis then takes two branches:

  1. If all retry attempts failed, then the failure is probably consistent and reproducible. I would expect it to be a clear functional failure that would be fast and easy to report. I jump on these first to get them out of the way.
  2. If some retry attempts passed, then the failure is intermittent, and it will probably take more time to investigate. I will look more closely at the logs and screenshots to determine what went wrong. I will try to exercise the product behavior manually to see if the product itself is inconsistent. I will also review the automation code to make sure there are no unhandled race conditions. I might even need to rerun the test multiple times to measure a more accurate failure rate.

I do not ignore any failures. Instead, I use automated retries to gather more information about the nature of the failures. In the moment, this extra info helps me expedite triage. Over time, the trends this info reveals help me identify weak spots in both the product under test and the test automation.

Automated retries are most helpful at high scale

When used appropriately, automated retries can be helpful for any size test automation project. However, they are arguably more helpful for large projects running tests at high scale than for small projects. Why? Two main reasons: complexities and priorities.

Large-scale test projects have many moving parts. For example, at PrecisionLender, we presently run 4K-10K end-to-end tests against our web app every business day. (We also run ~100K unit tests every business day.) Our tests launch from TeamCity as part of our Continuous Integration system, and they use in-house Selenium Grid instances to run 50-100 tests in parallel. The PrecisionLender application itself is enormous, too.

Intermittent failures are inevitable in large-scale projects for many different reasons. There could be problems in the test code, but those aren’t the only possible problems. At PrecisionLender, Boa Constrictor already protects us from race conditions, so our intermittent test failures are rarely due to problems in automation code. Other causes for flakiness include:

  • The app’s complexity makes certain features behave inconsistently or unexpectedly
  • Extra load on the app slows down response times
  • The cloud hosting platform has a service blip
  • Selenium Grid arbitrarily chokes on a browser session
  • The DevOps team recycles some resources
  • An engineer makes a system change while tests were running
  • The CI pipeline deploys a new change in the middle of testing

Many of these problems result from infrastructure and process. They can’t easily be fixed, especially when environments are shared. As one tester, I can’t rewrite my whole company’s CI pipeline to be “better.” I can’t rearchitect the app’s whole delivery model to avoid all collisions. I can’t perfectly guarantee 100% uptime for my cloud resources or my test tools like Selenium Grid. Some of these might be good initiatives to pursue, but one tester’s dictates do not immediately become reality. Many times, we need to work with what we have. Curt demands to “just fix the tests” come off as pedantic.

Automated test retries provide very useful information for discerning the nature of such intermittent failures. For example, at PrecisionLender, we hit Selenium Grid problems frequently. Roughly 1 in 10,000 Selenium Grid browser sessions will inexplicably freeze during testing. We don’t know why this happens, and our investigations have been unfruitful. We chalk it up to minor instability at scale. Whenever that 1-in-10,000 failure strikes, our suite’s automated retries kick in and pass. When we review the test report, we see the intermittent failure along with its exception message. Based on its signature, we immediately know that test is fine. We don’t need to do extra investigation work or manual reruns. Automated retries gave us the info we needed.

Selenium Grid
Selenium Grid is a large cluster with many potential points of failure.
(Image source: LambdaTest.)

Another type of common failure is intermittently slow performance in the PrecisionLender application. Occasionally, the app will freeze for a minute or two and then recover. When that happens, we see a “brick wall” of failures in our report: all tests during that time frame fail. Then, automated retries kick in, and the tests pass once the app recovers. Automatic retries prove in the moment that the app momentarily froze but that the individual behaviors covered by the tests are okay. This indicates functional correctness for the behaviors amidst a performance failure in the app. Our team has used these kinds of results on multiple occasions to identify performance bugs in the app by cross-checking system logs and database queries during the time intervals for those brick walls of intermittent failures. Again, automated retries gave us extra information that helped us find deep issues.

Automated retries delineate failure priorities

That answers complexity, but what about priority? Unfortunately, in large projects, there is more work to do than any team can handle. Teams need to make tough decisions about what to do now, what to do later, and what to skip. That’s just business. Testing decisions become part of that prioritization.

In almost all cases, consistent failures are inherently a higher priority than intermittent failures because they have a greater impact on the end users. If a feature fails every single time it is attempted, then the user is blocked from using the feature, and they cannot receive any value from it. However, if a feature works some of the time, then the user can still get some value out of it. Furthermore, the rarer the intermittency, the lower the impact, and consequentially the lower the priority. Intermittent failures are still important to address, but they must be prioritized relative to other work at hand.

Automated test retries automate that initial prioritization. When I triage PrecisionLender tests, I look into consistent “red” failures first. Our SpecFlow reports make them very obvious. I know those failures will be straightforward to reproduce, explain, and hopefully resolve. Then, I look into intermittent “orange” failures second. Those take more time. I can quickly identify issues like Selenium Grid disconnections, but other issues may not be obvious (like system interruptions) or may need additional context (like the performance freezes). Sometimes, we may need to let tests run for a few days to get more data. If I get called away to another more urgent task while I’m triaging results, then at least I could finish the consistent failures. It’s a classic 80/20 rule: investigating consistent failures typically gives more return for less work, while investigating intermittent failures gives less return for more work. It is what it is.

The only time I would prioritize an intermittent failure over a consistent failure would be if the intermittent failure causes catastrophic or irreversible damage, like wiping out an entire system, corrupting data, or burning money. However, that type of disastrous failure is very rare. In my experience, almost all intermittent failures are due to poorly written test code, automation timeouts from poor app performance, or infrastructure blips.

Context matters

Automated test retries can be a blessing or a curse. It all depends on how testers use them. If testers use retries to reveal more information about failures, then retries greatly assist triage. Otherwise, if testers use retries to conceal intermittent failures, then they aren’t doing their jobs as testers. Folks should not be quick to presume that automated retries are always an antipattern. We couldn’t achieve our scale of testing at PrecisionLender without them. Context matters.