On October 4, 2019, I gave a talk entitled Beyond Unit Tests: End-to-End Web UI Testing at PyGotham 2019. Check it out below! I show how to write a concise-yet-complete test solution for Web UI test cases using Python, pytest, and Selenium WebDriver.
On August 14, 2019, I teamed up with SmartBear to deliver a one-hour webinar about Web UI testing with Python! It was an honor to work with Nicholas Brown, Digital Marketing Manager for CrossBrowserTesting at SmartBear Software, to make this webinar happen.
In the webinar, I showed how to build a basic Web UI test automation solution using Python, pytest, and Selenium WebDriver. The tutorial covered automating one test, a simple DuckDuckGo search, from inception to automation. It also showed how to use CrossBrowserTesting to scale the solution so that it can run tests on any browser, any platform, and any version in the cloud as a service!
I encourage you to clone the Github repository and try to run the example test on your own! Make sure to get a CrossBrowserTesting trial license so you can try different browsers. You can also try to write new tests of your own. All instructions are in the README. Have fun with it!
After the tutorial, we took questions from the audience. Here are answers to the top questions:
How can we automate UI interactions for CAPTCHA?
CAPTCHA is a feature many websites use to determine whether or not a user is human. Most CAPTCHAs require the user to read obscured text from an image, but there are other variations. By their very nature, CAPTCHAs are designed to thwart UI automation.
When someone asked this question during the webinar, I didn’t have an answer, so I did some research afterwards. Unfortunately, it looks like there’s no easy solution. The best workarounds involve driving apps through their APIs to avoid CAPTCHAs. I also saw some services that offer to solve CAPTCHAs.
Are there any standard Page Object Pattern implementations in Python?
Not really. Mozilla maintains the PyPOM project, but I personally haven’t used it. I like to keep my page objects pretty simple, as shown in the tutorial. I also recommend the Screenplay Pattern, which handles concerns better as test automation solutions grow larger. I’m actually working on a Pythonic implementation of the Screenplay Pattern that I hope to release soon!
How can I run Python tests that use Selenium WebDriver and pytest from Jenkins?
Any major Continuous Integration tool like Jenkins can easily run Web UI tests in any major language. First, make sure the nodes are properly configured to run the tests – they’ll need Python with the appropriate packages. If you plan to use local browsers, make sure the nodes have the browsers and WebDriver executables properly installed. If you plan to use remote browsers (like with CrossBrowserTesting), make sure your CI environment can call out to the remote service. Test jobs can simply call pytest from the command line to launch the tests. I also recommend the “JUnit” pytest option to generate a JUnit-style XML test report because most CI tools require that format for displaying and tracking test results.
How can I combine API and database testing with Web UI testing?
One way to handle API and database testing is to write integration tests separate from Web UI tests. You can still use pytest, but you’d use a library like requests for APIs andSQLAlchemy for databases.
Another approach is to write “hybrid” tests that use APIs and database calls to help Web UI testing. Browsers are notoriously slow compared to direct back-end calls. For example, database calls could pre-populate data so that, upon login, the website already displays stuff to test. Hybrid tests can make tests much faster and much safer.
How can we test mobile apps and browsers using Python?
Even though our tutorial covered desktop-based browser UI interactions, the strategy for testing mobile apps and browsers is the same. Mobile tests need Appium, which is like a special version of WebDriver for mobile features. The Page Object Pattern (or Screenplay Pattern) still applies. CrossBrowserTesting provides mobile platforms, too!
Web UI tests with Selenium WebDriver must interact with elements on a Web page. Locating elements can be tricky because expected elements may or may not be on the page. Furthermore, WebDriver might not be able to interact with some elements that exist on the page. That may seem crazy, but let’s understand why.
Web UI interactions universally follow these steps:
Wait for an element to be ready.
Get the element using a locator (ID, CSS selector, XPath, etc.).
Send commands (like clicking or typing) or queries (like getting text) to the element.
Clearly, an element must be “ready” before interactions can happen. As humans, we intuitively define “ready” as, “The page is loaded, and the element is visible.” Automation code is a bit more technical because there are two different ways to define readiness:
Existence: the element exists in the HTML structure of the page.
Appearance: the element exists and it is visible on the page.
Existence can easily be determined by WebDriver’s “find elements” method. The plural “find elements” method will return a list of all elements matching a locator query. If no elements match the locator, then an empty list is returned. The singular “find element” method, on the other hand, will return the first element matching the locator or throw an exception if no elements are found. Thus, the plural version is more convenient to use for checking existence.
Here’s an example existence method in C#:
public bool Exists(IWebDriver driver, By locator) =>
driver.FindElements(locator).Count > 0;
Checking for existence is the most basic level of readiness. If an element doesn’t exist, interactions with it simply cannot happen. However, existence alone may not be sufficient for interactions. Selenium WebDriver requires elements to not only exist but also to be displayed for interactions like sending clicks and scraping text. Existing elements may be scrolled out of view or even deliberately hidden. WebDriver calls to such elements will yield cryptic exceptions. That’s why waiting for appearance is usually the better readiness condition.
Here’s an example appearance method in C#:
// Assume that the locator targets one element, not multiple
public bool Appears(IWebDriver driver, By locator) =>
Exists(driver, locator) && driver.FindElement(locator).Displayed;
Existence must be checked first, or else the “Displayed” call will throw an exception whenever existence is false.
Putting it all together, here’s what a button click interaction could look like in C#:
// Assume this is a method in a Page Object class
// Assume that "Driver" is the WebDriver instance
public void ClickThatButton()
var button = By.Id("that-button");
var wait = new WebDriverWait(Driver, new System.Timespan(0, 0, 15));
wait.Until((driver) => Appears(driver, button));
It’s good practice to make explicit waits before locating and using elements. It’s also good practice to get fresh elements for every interaction call in order to avoid pesky stale element exceptions. Calls like these should be placed in Page Object methods or Screenplay Pattern tasks and questions so that interactions are safe and thorough.
Appearance may not always be the right choice. There may be times when a test should check if an element doesn’t exist or if an element exists but is hidden. Just think before you code.
If you do any Web UI test automation (like with Selenium WebDriver), then you probably spend a large chunk of your test development time finding elements on a page, like buttons, inputs, and divs. Finding the right elements, however, can be challenging, especially when they lack unique IDs or class names. This guide will show you how to locate any Web element like a pro.
What are Web elements?
A Web element is an individual entity rendered on a Web page. Everything a user sees on a Web page (and even some things they don’t see) are elements: title headers, okay buttons, input fields, text areas, and more. Elements are specified in HTML by tag name, attributes, and contents. They may also have child elements, such as a table containing rows. CSS may be applied to elements to style them with colors, sizes, position, etc. Programming languages typically access Web elements as nodes in the Document Object Model (DOM).
What are Web element locators?
Web elements and locators are two different things. A Web element locator is an object that finds and returns Web elements on a page using a given query. In short, locators find elements.
Why are locators needed? As human users, we interact with Web pages visually: We look, scroll, click, and type through a browser. However, test automation interacts with Web pages programmatically: it needs a coded way to find and manipulate those same elements. Traditional automation won’t “look” at the page like a human* – it will search the DOM instead.
(*Newer automation technologies enable visual testing, which will be discussed later in this article.)
Selenium WebDriver separates the concerns of element location and interaction. WebDriver calls for these two concerns are frequently written back-to-back:
// WebDriver example: typing a search phrase at www.google.com
// This code is written in C#, but the calls are the same in any language
// First, element location
IWebElement searchField = driver.FindElement(By.Name("q"));
// Second, element interaction
WebDriver provides the following locator query types using “By”:
Locators may also return multiple elements, or none at all! For example:
// Get the list of results from a Google search
// Using "FindElements" will return a list of all elements found in order
// Using "FindElement" would return the first element found (or throw an exception if no elements were found)
IList<IWebElement> results = driver.FindElements(By.CssSelector("div.r"));
Large test frameworks often use design patterns for structuring locators and interactions. The Page Object Model organizes locators and action methods together in classes by page or component. However, I strongly recommend the Screenplay Pattern over page objects because its pieces are more reusable and scalable. Whatever the pattern, locators are needed.
How do I find elements?
Elements can be a hassle to find when writing locators for test automation. To simplify my work flow, I use Google Chrome’s Developer Tools side-by-side with my IDE. Why choose Chrome?
Chrome’s DevTools are really easy to use and provide rich info.
To inspect any Web page in Chrome, simply right-click anywhere on the page:
Voila! DevTools will open. For finding Web elements, we want to use the Elements tab.
Visually pinpointing an element is easy. Click the “select” tool in the upper-left corner of the DevTools pane. (It looks like a square with a cursor on it.) The icon should turn blue.
Then, move the cursor to the desired element on the page. You will see each element highlighted in different colors as the mouse moves over. The corresponding HTML source code in the Elements tab will simultaneously be highlighted, too. Nice! Click on the desired element to set the highlighting so that it won’t disappear when you move the cursor elsewhere.
From here, you can check out the element’s tag, classes, attributes, contents, parents, and children.
How do I write good locators?
Finding the element is half the battle. Forming a unique locator query is the other half. If a locator is too broad, then it could return false positives. However, if a locator is too specific, then it could be susceptible to break whenever the DOM changes, and it could also be difficult for others to read. The best philosophy is this: Write the simplest locator query that uniquely identifies the target element(s).
My locator query type order-of-preference is:
ID (if unique)
Name (if unique)
XPath without text or indexing
Link text / partial link text
XPath with text and/or indexing
Unique IDs, names, and class names make locators super easy to write: queries are short and don’t need extra anchors. Always encourage developers on the team to use unique identifiers like class names for all elements. However, many elements do not have them, which means locators must fall back on more complicated CSS selectors and XPaths (*shiver*). Whenever this happens, here’s some advice:
Use parents as anchors if they have unique identifiers.
CSS selector example: “#some-list > li”
XPath example: “//ul[@id=’some-list’]/li”
Avoid XPaths that use text or indexing if possible.
Bad example: “//div//span[text()=’hello’]”
Those tend to be the most brittle checks.
Use the “contains” function when checking for classes in XPath.
Example: “//div[contains(@class, ‘some-class’)]”
Elements frequently have more than one class.
“contains” will check a substring instead of the full class string.
However, be careful because “some-class2” would be matched!
Always test locators, too. Syntax errors and false positives happen frequently. Chrome DevTools makes testing locators easy. Simply hit Ctrl-F on the Elements tab and then paste the locator query into the finder field. DevTools will highlight all the matching elements in order. Spiffy!
Sometimes, when I can’t figure out why a locator isn’t working for a test case, I’ll do the following:
Run the test case with debugging from my IDE.
Set a break point on the locator.
Wait for the test case to stop at the break point.
Enter DevTools on the active Chrome window.
Check the DOM and test the locators on the live page.
What if my tests are flaky?
Web UI testing is roundly criticized for being “flaky” because tests often crash for unexpected reasons. However, much of the unreliability people hit with Web UI testing (and often with Selenium WebDriver itself) is that all Web interactions inherently pose race conditions. The automation and the browser execute independently, so interactions must be synchronized with page state. Otherwise, WebDriver will throw exceptions for timeouts, stale elements, and elements not found. Many times, these issues happen intermittently, so they can be difficult to trace and resolve.
The best way to avoid race conditions is this: Always wait for an element to exist before interacting with it. This may seem basic, but it’s easy to overlook. Selenium WebDriver packages all offer some sort of WebDriverWait object that will force the driver to wait for a given condition to be true before proceeding. The easiest way to check if an element exists is to check if the list of elements returned by a FindElements (plural) call is non-empty. Adding another call for each interaction may feel burdensome, but design patterns within well-designed frameworks (like the Screenplay Pattern) can make these checks happen automatically.
Another good practice is this: Always fetch fresh elements. Sometimes, automation will first get some elements and then use a second query to get more elements. Or, in the case of the Page Object Factory (which should never be used because, bluntly, its design is terrible), elements are fetched once when the page object is constructed and referenced thereafter. No matter which way, the longer a Web element object exists, the more prone it is to become stale and cause exceptions. I’ve seen elements turn stale inexplicably even when they still seem to be on the page, too. Always get an element in the moment when it is needed. That way, it can’t go stale!
Some have never automated tests and can’t check themselves before they wreck themselves. Others have 1000s of tests that are flaky, duplicative, and slow. Wa-do-we-do? Well, I gave a talk about this problem at a few Python conferences. The language used for example code was Python, but the principles apply to any language.
A sleek dashboard with automatic reloads for Test-Driven Development
Network traffic control for validation and mocking
Automatic screenshots and videos
Cypress was clearly developed for developers. It enables rapid test development with rapid feedback. The Cypress Test Runner is free, while the Cypress Dashboard Service (for better reporting and CI) will require a paid license.
How Do I Start Using Cypress?
Will Cypress Replace WebDriver?
While Selenium WebDriver supports nearly all major browsers, Cypress currently supports only one browser: Google Chrome. That’s a major limitation. Web apps do not work the same across browsers. Many industries (especially banking and finance) put strict controls on browser types and versions, too.
The WebDriver standard is a W3C Recommendation. What does this mean? All major browsers have a vested interest in implementing the standard. Selenium is simply the most popular implementation of the standard. It’s not going away. Cypress, however, is just a cool project backed with commercial intent.
“Bundled” test frameworks are becoming popular. Historically, a test framework simply provided test structure, basic fixtures, and maybe an assertion library (like JUnit). Then, extra test packages became popular (like Selenium WebDriver, REST APIs, mocking, logging, etc.). Now, new frameworks like Cypress and Protractor aim to provide pre-canned recipes of all these pieces to simplify the setup.
Many new test frameworks will likely be developer-centric. There is a trend in the software industry (especially with Agile) of eliminating traditional tester roles and putting testing work onto developers. The role of the “Software Engineer in Test” – a developer who builds test systems – is also on the rise. Test automation tools and frameworks will need to provide good developer experience (DX) to survive. Cypress is poised to ride that wave.
WebDriver is not perfect. Cypress was developed in large part to address WebDriver’s shortcomings, namely the slowness, difficulty, and unreliability (though unreliability is often a result of poor implementation). Many developers don’t like to use Selenium WebDriver, and so there will be a constant itch to make something better. Cypress isn’t there yet, but it might get there one day.
Selenium WebDriver is the most popular open source package for Web UI test automation. It allows tests to interact directly with a web page in a live browser. However, using Selenium WebDriver can be very frustrating because basic interactions often lack robustness, causing intermittent errors for tests.
One such vulnerable interaction is clicking elements on a page. Clicking is probably the most common interaction for tests. In C#, a basic click would look like this:
This is the easy and standard way to click elements using Selenium WebDriver. However, it will work only if the targeted element exists and is visible on the page. Otherwise, the WebDriver will throw exceptions. This is when programmers pull their hair out.
Waiting for Existence
To avoid race conditions, interactions should not happen until the target element exists on the page. Even split-second loading times can break automation. The best practice is to use explicit waits before interactions with a reasonable timeout value, like this:
const int timeoutSeconds = 15;
var ts = new TimeSpan(0, 0, timeoutSeconds);
var wait = new WebDriverWait(webDriver, ts);
wait.Until((driver) => driver.FindElements(By.Id("my-id")).Count > 0);
Sometimes, Web elements won’t appear without first triggering something else. Even if the element exists on the page, the WebDriver cannot click it until it is made visible. Always look for the proper way to make that element available for clicking. Click on any parent panels or expanders first. Scroll if necessary. Make sure the state of the system should permit the element to be clickable.
If the element is scrolled out of view, move to the element before clicking it:
Last Ditch Efforts
This article was written specifically for clicks, but its advice can be applied to other sorts of interactions, too. Just be smart about waits and preconditions.
Note: Code examples on this page are written in C#, but calls are similar for other languages supported by Selenium WebDriver.