Hands-On UI Testing with Python (SmartBear Webinar)

On August 14, 2019, I teamed up with SmartBear to deliver a one-hour webinar about Web UI testing with Python! It was an honor to work with Nicholas Brown, Digital Marketing Manager for CrossBrowserTesting at SmartBear Software, to make this webinar happen.

The Webinar


In the webinar, I showed how to build a basic Web UI test automation solution using Python, pytest, and Selenium WebDriver. The tutorial covered automating one test, a simple DuckDuckGo search, from inception to automation. It also showed how to use CrossBrowserTesting to scale the solution so that it can run tests on any browser, any platform, and any version in the cloud as a service!

The example test project for the webinar is hosted in Github here:

I encourage you to clone the Github repository and try to run the example test on your own! Make sure to get a CrossBrowserTesting trial license so you can try different browsers. You can also try to write new tests of your own. All instructions are in the README. Have fun with it!

The Q&A

After the tutorial, we took questions from the audience. Here are answers to the top questions:

How can we automate UI interactions for CAPTCHA?

CAPTCHA is a feature many websites use to determine whether or not a user is human. Most CAPTCHAs require the user to read obscured text from an image, but there are other variations. By their very nature, CAPTCHAs are designed to thwart UI automation.

When someone asked this question during the webinar, I didn’t have an answer, so I did some research afterwards. Unfortunately, it looks like there’s no easy solution. The best workarounds involve driving apps through their APIs to avoid CAPTCHAs. I also saw some services that offer to solve CAPTCHAs.

Are there any standard Page Object Pattern implementations in Python?

Not really. Mozilla maintains the PyPOM project, but I personally haven’t used it. I like to keep my page objects pretty simple, as shown in the tutorial. I also recommend the Screenplay Pattern, which handles concerns better as test automation solutions grow larger. I’m actually working on a Pythonic implementation of the Screenplay Pattern that I hope to release soon!

How can I run Python tests that use Selenium WebDriver and pytest from Jenkins?

Any major Continuous Integration tool like Jenkins can easily run Web UI tests in any major language. First, make sure the nodes are properly configured to run the tests – they’ll need Python with the appropriate packages. If you plan to use local browsers, make sure the nodes have the browsers and WebDriver executables properly installed. If you plan to use remote browsers (like with CrossBrowserTesting), make sure your CI environment can call out to the remote service. Test jobs can simply call pytest from the command line to launch the tests. I also recommend the “JUnit” pytest option to generate a JUnit-style XML test report because most CI tools require that format for displaying and tracking test results.

How can I combine API and database testing with Web UI testing?

One way to handle API and database testing is to write integration tests separate from Web UI tests. You can still use pytest, but you’d use a library like requests for APIs and SQLAlchemy for databases.

Another approach is to write “hybrid” tests that use APIs and database calls to help Web UI testing. Browsers are notoriously slow compared to direct back-end calls. For example, database calls could pre-populate data so that, upon login, the website already displays stuff to test. Hybrid tests can make tests much faster and much safer.

How can we test mobile apps and browsers using Python?

Even though our tutorial covered desktop-based browser UI interactions, the strategy for testing mobile apps and browsers is the same. Mobile tests need Appium, which is like a special version of WebDriver for mobile features. The Page Object Pattern (or Screenplay Pattern) still applies. CrossBrowserTesting provides mobile platforms, too!

Tutorial: Web Testing Made Easy with Python

Have you ever discovered a bug in a web app? Yuck! Almost everyone has. Bugs look bad, interrupt the user’s experience, and cheapen the web app’s value. Severe bugs can incur serious business costs and tarnish the provider’s reputation.

So, how can we prevent these bugs from reaching users? The best way to catch bugs is to test the web app. However, web UI testing can be difficult: it requires more effort than unit testing, and it has a bad rap for being flaky.

Never fear! Recently, I teamed up with the awesome folks at TestProject to develop a helpful tutorial that makes web UI test automation easy with the power of Python! The tutorial is named Web Testing Made Easy with Python, Pytest and Selenium WebDriver. It is available for free as a set of TestProject blog articles together with a GitHub example project.

In our tutorial, we will build a simple yet robust web UI test solution using Pythonpytest, and Selenium WebDriver. We cover strategies for good test design as well as patterns for good automation code. By the end of the tutorial, you’ll be a web test automation champ! Your Python test project can be the foundation for your own test cases, too.

How can you take the tutorial? Start reading here, and follow the instructions:

I personally want to thank TestProject for this collaboration. TestProject provides helpful tools that can supercharge your test automation. They offer a smart test recorder, a bunch of add-ons that act like test case building blocks, an SDK that can make test automation coding easier, and beautiful analytics to see exactly what the tests are doing. Not only is TestProject a cool platform, but the people with whom I’ve worked there are great. Be sure to check it out!

WebDriver Element Existence vs. Appearance

Web UI tests with Selenium WebDriver must interact with elements on a Web page. Locating elements can be tricky because expected elements may or may not be on the page. Furthermore, WebDriver might not be able to interact with some elements that exist on the page. That may seem crazy, but let’s understand why.

Web UI interactions universally follow these steps:

  1. Wait for an element to be ready.
  2. Get the element using a locator (ID, CSS selector, XPath, etc.).
  3. Send commands (like clicking or typing) or queries (like getting text) to the element.

Clearly, an element must be “ready” before interactions can happen. As humans, we intuitively define “ready” as, “The page is loaded, and the element is visible.” Automation code is a bit more technical because there are two different ways to define readiness:

  1. Existence: the element exists in the HTML structure of the page.
  2. Appearance: the element exists and it is visible on the page.

Existence can easily be determined by WebDriver’s “find elements” method. The plural “find elements” method will return a list of all elements matching a locator query. If no elements match the locator, then an empty list is returned. The singular “find element” method, on the other hand, will return the first element matching the locator or throw an exception if no elements are found. Thus, the plural version is more convenient to use for checking existence.

Here’s an example existence method in C#:

public bool Exists(IWebDriver driver, By locator) =>
    driver.FindElements(locator).Count > 0;

Checking for existence is the most basic level of readiness. If an element doesn’t exist, interactions with it simply cannot happen. However, existence alone may not be sufficient for interactions. Selenium WebDriver requires elements to not only exist but also to be displayed for interactions like sending clicks and scraping text. Existing elements may be scrolled out of view or even deliberately hidden. WebDriver calls to such elements will yield cryptic exceptions. That’s why waiting for appearance is usually the better readiness condition.

Here’s an example appearance method in C#:

// Assume that the locator targets one element, not multiple
public bool Appears(IWebDriver driver, By locator) =>
    Exists(driver, locator) && driver.FindElement(locator).Displayed;

Existence must be checked first, or else the “Displayed” call will throw an exception whenever existence is false.

Putting it all together, here’s what a button click interaction could look like in C#:

// Assume this is a method in a Page Object class
// Assume that "Driver" is the WebDriver instance
public void ClickThatButton()
    var button = By.Id("that-button");
    var wait = new WebDriverWait(Driver, new System.Timespan(0, 0, 15));
    wait.Until((driver) => Appears(driver, button));

It’s good practice to make explicit waits before locating and using elements. It’s also good practice to get fresh elements for every interaction call in order to avoid pesky stale element exceptions. Calls like these should be placed in Page Object methods or Screenplay Pattern tasks and questions so that interactions are safe and thorough.

Appearance may not always be the right choice. There may be times when a test should check if an element doesn’t exist or if an element exists but is hidden. Just think before you code.

Web Element Locators for Test Automation

Do you want a full course? Check out Web Element Locator Strategies on Test Automation University!

If you do any Web UI test automation (like with Selenium WebDriver), then you probably spend a large chunk of your test development time finding elements on a page, like buttons, inputs, and divs. Finding the right elements, however, can be challenging, especially when they lack unique IDs or class names. This guide will show you how to locate any Web element like a pro.

What are Web elements?

A Web element is an individual entity rendered on a Web page. Everything a user sees on a Web page (and even some things they don’t see) are elements: title headers, okay buttons, input fields, text areas, and more. Elements are specified in HTML by tag name, attributes, and contents. They may also have child elements, such as a table containing rows. CSS may be applied to elements to style them with colors, sizes, position, etc. Programming languages typically access Web elements as nodes in the Document Object Model (DOM).

What are Web element locators?

Web elements and locators are two different things. A Web element locator is an object that finds and returns Web elements on a page using a given query. In short, locators find elements.

Why are locators needed? As human users, we interact with Web pages visually: We look, scroll, click, and type through a browser. However, test automation interacts with Web pages programmatically: it needs a coded way to find and manipulate those same elements. Traditional automation won’t “look” at the page like a human* – it will search the DOM instead.

(*Newer automation technologies enable visual testing, which will be discussed later in this article.)

Selenium WebDriver separates the concerns of element location and interaction. WebDriver calls for these two concerns are frequently written back-to-back:

// WebDriver example: typing a search phrase at
// This code is written in C#, but the calls are the same in any language

// First, element location
IWebElement searchField = driver.FindElement(By.Name("q"));

// Second, element interaction

WebDriver provides the following locator query types using “By”:

Which one is best? We’ll discuss that below.

Locators may also return multiple elements, or none at all! For example:

// Get the list of results from a Google search
// Using "FindElements" will return a list of all elements found in order
// Using "FindElement" would return the first element found (or throw an exception if no elements were found)
IList<IWebElement> results = driver.FindElements(By.CssSelector("div.r"));

Large test frameworks often use design patterns for structuring locators and interactions. The Page Object Model organizes locators and action methods together in classes by page or component. However, I strongly recommend the Screenplay Pattern over page objects because its pieces are more reusable and scalable. Whatever the pattern, locators are needed.

How do I find elements?

Elements can be a hassle to find when writing locators for test automation. To simplify my work flow, I use Google Chrome’s Developer Tools side-by-side with my IDE. Why choose Chrome?

To inspect any Web page in Chrome, simply right-click anywhere on the page:

Voila! DevTools will open. For finding Web elements, we want to use the Elements tab.

Visually pinpointing an element is easy. Click the “select” tool in the upper-left corner of the DevTools pane. (It looks like a square with a cursor on it.) The icon should turn blue.

Then, move the cursor to the desired element on the page. You will see each element highlighted in different colors as the mouse moves over. The corresponding HTML source code in the Elements tab will simultaneously be highlighted, too. Nice! Click on the desired element to set the highlighting so that it won’t disappear when you move the cursor elsewhere.

From here, you can check out the element’s tag, classes, attributes, contents, parents, and children.

How do I write good locators?

Finding the element is half the battle. Forming a unique locator query is the other half. If a locator is too broad, then it could return false positives. However, if a locator is too specific, then it could be susceptible to break whenever the DOM changes, and it could also be difficult for others to read. The best philosophy is this: Write the simplest locator query that uniquely identifies the target element(s).

My locator query type order-of-preference is:

  1. ID (if unique)
  2. Name (if unique)
  3. Class name
  4. CSS Selector
  5. XPath without text or indexing
  6. Link text / partial link text
  7. XPath with text and/or indexing

Unique IDs, names, and class names make locators super easy to write: queries are short and don’t need extra anchors. Always encourage developers on the team to use unique identifiers like class names for all elements. However, many elements do not have them, which means locators must fall back on more complicated CSS selectors and XPaths (*shiver*). Whenever this happens, here’s some advice:

  • Use parents as anchors if they have unique identifiers.
    • CSS selector example: “#some-list > li”
    • XPath example: “//ul[@id=’some-list’]/li”
  • Avoid XPaths that use text or indexing if possible.
    • Bad example: “//div[3]//span[text()=’hello’]”
    • Those tend to be the most brittle checks.
  • Use the “contains” function when checking for classes in XPath.
    • Example: “//div[contains(@class, ‘some-class’)]”
    • Elements frequently have more than one class.
    • “contains” will check a substring instead of the full class string.
    • However, be careful because “some-class2” would be matched!

Always test locators, too. Syntax errors and false positives happen frequently. Chrome DevTools makes testing locators easy. Simply hit Ctrl-F on the Elements tab and then paste the locator query into the finder field. DevTools will highlight all the matching elements in order. Spiffy!

Sometimes, when I can’t figure out why a locator isn’t working for a test case, I’ll do the following:

  1. Run the test case with debugging from my IDE.
  2. Set a break point on the locator.
  3. Wait for the test case to stop at the break point.
  4. Enter DevTools on the active Chrome window.
  5. Check the DOM and test the locators on the live page.

What if my tests are flaky?

Web UI testing is roundly criticized for being “flaky” because tests often crash for unexpected reasons. However, much of the unreliability people hit with Web UI testing (and often with Selenium WebDriver itself) is that all Web interactions inherently pose race conditions. The automation and the browser execute independently, so interactions must be synchronized with page state. Otherwise, WebDriver will throw exceptions for timeouts, stale elements, and elements not found. Many times, these issues happen intermittently, so they can be difficult to trace and resolve.

The best way to avoid race conditions is this: Always wait for an element to exist before interacting with it. This may seem basic, but it’s easy to overlook. Selenium WebDriver packages all offer some sort of WebDriverWait object that will force the driver to wait for a given condition to be true before proceeding. The easiest way to check if an element exists is to check if the list of elements returned by a FindElements (plural) call is non-empty. Adding another call for each interaction may feel burdensome, but design patterns within well-designed frameworks (like the Screenplay Pattern) can make these checks happen automatically.

Another good practice is this: Always fetch fresh elements. Sometimes, automation will first get some elements and then use a second query to get more elements. Or, in the case of the Page Object Factory (which should never be used because, bluntly, its design is terrible), elements are fetched once when the page object is constructed and referenced thereafter. No matter which way, the longer a Web element object exists, the more prone it is to become stale and cause exceptions. I’ve seen elements turn stale inexplicably even when they still seem to be on the page, too. Always get an element in the moment when it is needed. That way, it can’t go stale!

Want some helpful tips for clicking tricky elements? Check out this article: Clicking Web Elements with Selenium WebDriver.

How can AI help Web UI testing?

Several new AI-based projects/products aim to improve automated Web UI testing over traditional methods:

  • Applitools extends Selenium WebDriver automation with checks for nontrivial visual differences.
  • Testim can automatically heal locators whenever they break, avoiding test flakiness due to front-end changes.
  • Mabl is an assistant that will learn and rerun tests that developers teach it without writing any code.
  • runs common user tests like login, searching, and shopping on mobile apps based on what its AI has learned from several other apps.
  • Rainforest QA uses crowdsourcing plus AI to run manual tests specified by a team almost like they are automated.

Test Automation University also offers a free course on using AI for element selection: AI for Element Selection: Erasing the Pain of Fragile Test Scripts.

Many AI testing tools definitely add value, but keep in mind, under the hood, locators are still used somewhere.

EGAD! How Do We Start Writing (Better) Tests?

Some have never automated tests and can’t check themselves before they wreck themselves. Others have 1000s of tests that are flaky, duplicative, and slow. Wa-do-we-do? Well, I gave a talk about this problem at a few Python conferences. The language used for example code was Python, but the principles apply to any language.

Here’s the PyTexas 2019 talk:

And here’s the PyGotham 2018 talk:

And here’s the first time I gave this talk, at PyOhio 2018:

I also gave this talk at PyCaribbean 2019, but it was not recorded.

Clicking Web Elements with Selenium WebDriver

Selenium WebDriver is the most popular open source package for Web UI test automation. It allows tests to interact directly with a web page in a live browser. However, using Selenium WebDriver can be very frustrating because basic interactions often lack robustness, causing intermittent errors for tests.

The Basics

One such vulnerable interaction is clicking elements on a page. Clicking is probably the most common interaction for tests. In C#, a basic click would look like this:


This is the easy and standard way to click elements using Selenium WebDriver. However, it will work only if the targeted element exists and is visible on the page. Otherwise, the WebDriver will throw exceptions. This is when programmers pull their hair out.

Waiting for Existence

To avoid race conditions, interactions should not happen until the target element exists on the page. Even split-second loading times can break automation. The best practice is to use explicit waits before interactions with a reasonable timeout value, like this:

const int timeoutSeconds = 15;
var ts = new TimeSpan(0, 0, timeoutSeconds);
var wait = new WebDriverWait(webDriver, ts);

wait.Until((driver) => driver.FindElements(By.Id("my-id")).Count > 0);

Other Preconditions

Sometimes, Web elements won’t appear without first triggering something else. Even if the element exists on the page, the WebDriver cannot click it until it is made visible. Always look for the proper way to make that element available for clicking. Click on any parent panels or expanders first. Scroll if necessary. Make sure the state of the system should permit the element to be clickable.

If the element is scrolled out of view, move to the element before clicking it:

new Actions(webDriver)

Last Ditch Efforts

Nevertheless, there are times when clickable elements just don’t cooperate. They just can’t seem to be made visible. When all else fails, drop directly into JavaScript:


Do this only when absolutely necessary. It is a best practice to use Selenium WebDriver methods because they make automated interaction behave more like a real user than raw JavaScript calls. Make sure to give good reasons in code comments whenever doing this, too.

Final Advice

This article was written specifically for clicks, but its advice can be applied to other sorts of interactions, too. Just be smart about waits and preconditions.

Note: Code examples on this page are written in C#, but calls are similar for other languages supported by Selenium WebDriver.

The Airing of Grievances: Selenium WebDriver

Selenium WebDriver is the de facto standard for Web UI automation. It’s a great tool, but like anything good, it can also be misused. And that’s where I have grievances. I got a lot of problems with Selenium WebDriver abuses, and now you’re gonna hear about it!

WebDriver “Unit Tests”

“WebDriver unit tests” are like square circles – definitionally, they are logical fallacies. In my books, a unit test must be white box, meaning it has direct access to the product code. However, Web UI tests using WebDriver are inherently black box tests because they are interacting with an actively running website. Thus, they must be above-unit tests by definition. Don’t call them unit tests!

Making Every Test a Web Test

NO! The Testing Pyramid is vital to a healthy overall testing strategy. Web tests are great because they test a website in the ways a user would interact with it, but they have a significant cost. As compared to lower-level tests, they are more fragile, they require more development resources, and they take much more time to run. Browser differences may also affect testing. Furthermore, problems in lower level components should be caught at those lower levels! Sure, HTTP 400s and 500s will appear at the web app layer, but they would be much faster to find and fix with service layer tests. Different layers of testing mitigate risk at their optimal returns-on-investment.

No WebDriver Cleanup

Every WebDriver instance spawns a new system process for “driving” web browser interactions. When the test automation process completes, the WebDriver process may not necessary terminate with it. It is imperative that test automation quits the WebDriver instance once testing is complete. Make sure cleanup happens even when abortive exceptions occur! Otherwise, zombie WebDriver processes may continue on the test machine, causing any number of problems: locked files and directories, high memory usage, wasted CPU cycles, and blocked network ports. These problems can cripple a system and even break future test runs, especially on shared testing machines (like Jenkins nodes). Please, only you can stop the zombie apocalypse – always quit WebDriver instances!

Using “Close” Instead of “Quit”

Regardless of programming language, the WebDriver class has both “close” and “quit” methods. “Close” will close the current browser tab or window, while “quit” will close all windows and terminate the WebDriver process. Make sure to quit during final cleanup. Doing only a close may result in zombie WebDriver processes. It’s a rookie mistake.

Not Optimizing Setup/Cleanup with Service Calls

Web tests are notoriously slow. Whenever you can speed them up, do it! Some tests can be optimized by preparing initial state with service calls. For example, let’s say a user visiting a car dealership website needs to have favorite cars pre-selected for a comparison page test. Rather than navigating to a bunch of car pages and clicking a “favorite” icon, make a setup routine that calls a service to select favorites. Not all tests can do this sort of optimization, but definitely do it for those that can!

Web Elements with No ID

Developers, we need to talk – give every significant element a unique ID. PLEASE! WebDriver calls are so much easier to write and so much more robust to run when locator queries can use IDs instead of CSS selectors or XPaths. Let’s pick ID names during our Three Amigos meetings so that I can program the tests while you develop the features. Determining what elements are import should be easy based on our wireframes. You will save us automators so much time and frustration, since we won’t need to dig through HTML and wonder why our XPaths don’t work.

Changing Web Elements Without Warning

Hey, another thing, developers – don’t change the web page structure without telling us! WebDriver locator queries will break if you change the web elements. Even a seemingly innocuous change could wipe out hundreds of tests. Automation effort is non-trivial. Changes must be planned and sized with automation considerations in mind.

Not Using the Page Object Model

The Page Object Model is a widely-used design pattern for modeling a web page (or components on a web page) as an object in terms of its web elements and user interactions with it. It abstracts Web UI interactions into a common layer that can be reused by many different tests. (The Screenplay pattern, also good, is an evolution of the Page Object Model; tutorial here.) Not using the Page Object Model is Selenium suicide. It will result in rampant code duplication.

Demonizing XPath

XPaths have long been criticized for being slower than CSS selectors. That claim is outdated baloney. In many cases, XPaths outperform CSS selectors – see here, here, and here. Another common complaint is that XPath syntax is more complicated than CSS selector syntax. Honestly, I think they’re about the same in terms of learning curve. XPaths are also more powerful that CSS selectors because they can uniquely pinpoint any element on the page.

Inefficient Web Element Access

Web element IDs make access extremely efficient. However, when IDs are not provided, other locator query types are needed. It is always better to use locator queries to pinpoint elements, rather than to get a list of elements (or even a parent/child chain) to traverse using programming code. For example, I often see code reviews in which an XPath returns a list of results with text labels, and then the programming code (C# or Java or whatever) has a for loop that iterates over each element in the list and exits when the element with the desired label is found. Just add “[text()=’desired text’]” or “[contains(text(), ‘desired text’)]” to the XPath! Use locator queries for all they’re worth.

Interacting with Web Elements Before the Page is Ready

Web UI test automation is inherently full of race conditions. Make sure the elements are ready before calling them, or else face a bunch of “element not found” exceptions. Use WebDriver waits for efficient waiting. Do not use hard sleeps (like Java’s Thread.sleep).

Untuned Timeouts

WebDriver calls need timeouts, or else they could hang forever if there is a problem. (Check online docs for default timeout values.) Timeout value ought to be tuned appropriately for different test environments and different websites. Timeouts that are too short will unnecessarily abort tests, while timeouts that are too long will lengthen precious test runtime.