automation

BDD 101: Manual Testing

Behavior-driven development takes an automation-first philosophy: behavior specs should become automated tests. However, BDD can also accommodate manual testing. Manual testing has a place and a purpose, even in BDD. Remember, behavior scenarios are first and foremost behavior specifications, and they provide value beyond testing and automation. Any behavior scenario could be run as a manual test. The main questions, then, are (1) when is manual testing appropriate and (2) how should it be handled.

(Check the Automation Panda BDD page for the full BDD 101 table of contents.)

When is Manual Testing Appropriate?

Automation is not a silver bullet – it doesn’t satisfy all testing needs. Scenarios should be written for all behaviors, but they likely shouldn’t be automated under the following circumstances:

  • The return-on-investment to automate the scenarios is too low.
  • The scenarios won’t be included in regression or continuous integration.
  • The behaviors are temporary (ex: hotfixes).
  • The automation itself would be too complex or too fragile.
  • The nature of the feature is non-functional (ex: performance, UX, etc.).
  • The team is still learning BDD and is not yet ready to automate all scenarios.

Manual testing is also appropriate for exploratory testing, in which engineers rely upon experience rather than explicit test procedures to “explore” the product under test for bugs and quality concerns. It complements automation because both testing styles serve different purposes. However, behavior scenarios themselves are incompatible with exploratory testing. The point of exploring is for engineers to go “unscripted” – without formal test plans – to find problems only a user would catch. Rather than writing scenarios, the appropriate way to approach behavior-driven exploratory testing is more holistic: testers should assume the role of a user and exercise the product under test as a collection of interacting behaviors. If exploring uncovers any glaring behavior gaps, then new behavior scenarios should be added to the catalog.

How Should Manual Testing Be Handled?

Manual testing fits into BDD in much the same way as automated testing because both formats share the same process for behavior specification. Where the two ways diverge is in how the tests are run. There are a few special considerations to make when writing scenarios that won’t be automated.

Repository

Both manual and automated behavior scenarios should be stored in the same repository. The natural way to organize behaviors is by feature, regardless of how the tests will be run. All scenarios should also be managed by some form of version control.

Furthermore, all scenarios should be co-located for document-generation tools like Pickles. Doc tools make it easy to expose behavior specs and steps to everyone. They make it easier for the Three Amigos to collaborate. Non-technical people are not likely to dig into programming projects.

Tags

Scenarios must be classified as manual or automated. When BDD frameworks run tests, they need a way to exclude tests that are not automated. Otherwise, test reports would be full of errors! In Gherkin, scenarios should be classified using tags. For example, scenarios could be tagged as either “@manual” or “@automated”. A third tag, “@automatable”, could be used to distinguish scenarios that are not yet automated but are targeted for automation.

Some BDD frameworks have nifty features for tags. In Cucumber-JVM, tags can be set as runner class options for convenience. This means that tag options could be set to “~@manual” to avoid manual tests. In SpecFlow, any scenario with the special “@ignore” tag will automatically be skipped. Nevertheless, I strongly recommend using custom tags to denote manual tests, since there are many reasons why a test may be ignored (such as known bugs).

Extra Comments

The conciseness of behavior scenarios is problematic for manual testing because steps don’t provide all the information a tester may need. For example, test data may not be written explicitly in the spec. The best way to add extra information to a scenario is to add comments. Gherkin allows any number of lines for comments and description. Comments provide extra information to the reader but are ignored by the automation.

It may be tempting to simply write new Gherkin steps to handle the extra information for manual testing. However, this is not a good approach. Principles of good Gherkin should be used for all scenarios, regardless of whether or not the scenarios will be automated. High-quality specification should be maintained for consistency, for documentation tools, and for potential future automation.

An Example

Below is a feature that shows how to write behavior scenarios for manual tests:

Feature: Google Searching

  @automated
  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  @manual
  Scenario: Image search
    # The Google home page URL is: http://www.google.com/
    # Make sure the images shown include pandas eating bamboo
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

It’s not really different from any other behavior scenarios.

 

As stated in the beginning, BDD should be automation-first. Don’t use the content of this article to justify avoiding automation. Rather, use the techniques outlined here for manual testing only as needed.

 

Test Automation Myth-Busting

Test automation is a vital part of software quality assurance, now more than ever. However, it is a discipline that is often poorly understood. I’ve heard all sorts of crazy claims about automation throughout my career. This post debunks a number of commonly held but erroneous beliefs about automation.

Myth #1: Every test should be automated.

“100% automation” seems to be a new buzz-phrase. If automation is so great, why not automate every test? Not every test is worth automating in terms of return-on-investment. Automation requires significant expertise to design, implement, and maintain. There are limits to how many tests a team can reasonably produce and manage. Furthermore, not all tests are equal. Some require more effort to handle, or may not be run as frequently, or cover less important features. Just because a test could be automated does not mean that it should be automated. Using a risk-based test strategy, tests to automate should be prioritized by highest ROI.

Automated testing does not completely replace manual testing, either. Automated testing is defensive: it protects a code line by consistently running scripted tests for core functionality. However, manual testing is offensive: it uses human expertise to explore features off-script, test-to-break, and evaluate wholistic quality. Returns-on-investment for the same tests are often opposites between automated and manual approaches. Automated and manual testing together fulfill vital, complementary roles.

Myth #2: Automation means we can downsize QA.

Executives often see test automation as a way to automate QA out of a job. This is simply not true: Automation makes QA jobs more efficient and all the more necessary. Automation is software and thus requires strong software development skills. It also requires extra tools, processes, and labor to maintain. The benefit is that more tests can be run more quickly. QA jobs won’t vanish due to automation – they simply assume new responsibilities.

Myth #3: Automation will catch all bugs.

By their very nature, automated tests are “scripted” – each test always follows the same pre-programmed steps. This is a very good thing for catching regression bugs, but it inherently cannot handle new, unforeseen situations. That’s why manual, exploratory testing is needed. Automation, being software, may also have its own bugs. Automation is not a silver bullet.

Myth #4: Automation must be written in the same language as the product code.

Automation must be written in the same programming language as the product code for white-box unit tests. However, any programming language may be used for black-box functional tests. Black-box functional tests (like integration and end-to-end tests) test a live product. There’s no direct connection between the automation code and the product code. For example, a web app could have a REST service layer written in Java, a Web UI frontend written in .NET and JavaScript, and test automation written in Python using requests and Selenium WebDriver. It may be helpful to write automation in the same language as the product so that developers can more easily contribute, but it is not required. Choose the best language for test automation to meet the present needs.

Myth #5: All tests should be such-and-such-level tests.

This argument varies by product type and team. For web apps, it could be phrased as, “All tests should be web UI tests, since that’s how the user interacts with the product.” This is nonsense – different layers of testing mitigate risk at their optimal returns-on-investment. The Testing Pyramid applies this principle. Consider a web app with a service layer in terms of automation – service calls have faster response times and are more reliable than web UI interactions. It would likely be wise to test input combinations at the service layer while focusing on UI-specific functionality only at the web layer.

Myth #6: Unit tests aren’t necessary because the QA team does the testing.

The existence of a QA team or of a black-box automated test suite does not negate the need for unit tests. Unit tests are an insurance policy – they make sure the software programming is fundamentally working. In continuous integration, they make sure builds are good. They are essential for good software development. Many times, I caught bugs in my own code through writing unit tests – bugs that nobody else ever saw because I fixed them before committing code. Personally, I would never want to work on a product without strong unit tests.

Myth #7: We can complete a user story this sprint and automate its tests next sprint.

In Agile Scrum, teams face immense pressure to finish user stories within a sprint. Test automation is often the last part of a story to be done. If the automation isn’t completed by the end of the sprint, teams are tempted to mark the story as complete and finish the test automation in the future. This is a terrible mistake and a violation of Agile principles. Test automation should be included in the definition of done. A story isn’t complete without its prescribed tests! Punting tests into the next sprint merely builds technical debt and forces QA into constant catch-up. To mitigate the risk of incomplete stories, teams should size stories to include automation work, shift left to start QA sooner, or reduce the total sprint commitment size. Incomplete test automation often happens when product code is delivered late or a team’s capacity is overestimated.

Myth #8: Automation is just a bunch of “test scripts.”

It’s quite common to hear developers or managers refer to automated tests as “test scripts.” While this term itself is not inherently derogatory, it oversimplifies the complexity of test automation. “Scripts” sound like short, hacky sequences of commands to do system dirty-work. Test automation, however, is a full stack: in addition to the product under test, automation involves design patterns, dependency packages, development processes, version control, builds, deployments, reporting, and failure triage. Referring to test automation as “scripting” leads to chronic planning underestimations. Automation is a discipline, and the investment it requires should be honored.

 

Do you have any other automation myths to debunk? Share them in the comment section below!

BDD 101: Test Data

How should test data be handled in a behavior-driven test framework? This is a common question I hear from teams working on BDD test automation. A better question to ask first is, What is test data? This article will explain different types of test data and provide best practices for handling each. The strategies covered here can be applied to any BDD test framework. (Check the Automation Panda BDD page for the full table of contents.)

Types of Test Data

Personally, I hate the phrase “test data” because its meaning is so ambiguous. For functional test automation, there are three primary types of test data:

  1. Test Case Values. These are the input and expected output values for test cases. For example, when testing calculator addition “1 + 2 = 3”, “1” and “2” would be input values, and “3” would be the expected output value. Input values are often parameterized for reusability, and output values are used in assertions.
  2. Configuration Data. Config data represents the system or environment in which the tests run. Changes in config data should allow the same test procedure to run in different environments without making any other changes to the automation code. For example, a calculator service with an addition endpoint may be available in three different environments: development, test, and production. Three sets of config data would be needed to specify URLs and authentication in each environment (the config data), but 1 + 2 should always equal 3 in any environment (the test case values).
  3. Ready State. Some tests require initial state to be ready within a system. “Ready” state could be user accounts, database tables, app settings, or even cluster data. If testing makes any changes, then the data must be reverted to the ready state.

Each type of test data has different techniques for handling it.

Test Case Values

There are 4 main ways to specify test case values in BDD frameworks, ranging from basic to complex.

In The Specs

The most basic way to specify test case values is directly within the behavior scenarios themselves! The Gherkin language makes it easy – test case values can be written into the plain language of a step, as step parameters, or in Examples tables. Consider the following example:

Scenario Outline: Simple Google searches
  Given a web browser is on the Google page
  When the search phrase "<phrase>" is entered
  Then results for "<phrase>" are shown
  
  Examples: Animals
    | phrase   |
    | panda    |
    | elephant |
    | rhino    |

The test case value used is the search phrase. The When and Then steps both have a parameter for this phrase, which will use three different values provided by the Examples table. It is perfectly suitable to put these test case values directly into the scenario because the values are small and descriptive.

Furthermore, notice how specific result values are not specified for the Then step. Values like “Panda Express” or “Elephant man” are not hard-coded. The step wording presumes that the step definition will have some sort of programmed mechanism for checking that result links relate to the search phrase (likely through regular expression matching).

Key-Value Lookup

Direct specification is great for small sets of simple values, but one size does not fit all needs. Key-value lookups are appropriate when test data is lengthier. For example, I’ve often seen steps like this:

Given the user navigates to "http://www.somewebsite.com/long/path/to/the/profile/page"

URLs, hexadecimal numbers, XML blocks, and comma-separated lists are all the usual suspects. While it is not incorrect to put these values directly into a step parameter, something like this would be more readable:

Given the user navigates to the "profile" page

Or even:

Given the user navigates to their profile page

The automation would store URLs in a lookup table so that these new steps could easily fetch the URL for the profile page by name. These steps are also more declarative than imperative and better resist changes in the underlying environment.

Another way to use key-value lookup is to refer to a set of values by one name. Consider the following scenario for entering an address:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user sets the street address to "<street>"
  And the user sets the second address line to "<second>"  
  And the user sets the city to "<city>"
  And the user sets the state to "<state>"
  And the user sets the zipcode to "<zipcode>"
  And the user sets the country to "<country>"
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | street | second | city | state | zipcode | country |
    ...

An address has a lot of fields. Specifying each in the scenario makes it very imperative and long. Furthermore, if the scenario is an outline, the Examples table can easily extend far to the right, off the page. This, again, is not readable. This scenario would be better written like this:

Scenario Outline: Address entry
  Given the profile edit page is displayed
  When the user enters the "<address-type>" address
  And the user clicks the save button
  Then ...

  Examples: Addresses
    | address-type |
    | basic        |
    | two-line     |
    | foreign      |

Rather than specifying all the values for different addresses, this scenario names the classifications of addresses. The step definition can be written to link the name of the address class to the desired values.

Data Files

Sometimes, test case values should be stored in data files apart from the specs or the automation code. Reasons could be:

  • The data is simply too large to reasonably write into Gherkin or into code.
  • The data files may be generated by another tool or process.
  • The values are different between environments or other circumstances.
  • The values must be selected or switched at runtime (without re-compiling code).
  • The files themselves are used as payloads (ex: REST request bodies or file upload).

Scenario steps can refer to data files using the key-value lookup mechanisms described above. Lightweight, text-based, tabular file formats like CSV, XML, or JSON work the best. They can parsed easily and efficiently, and changes to them can easily be diff’ed. Microsoft Excel files are not recommended because they have extra bloat and cannot be easily diff’ed line-by-line. Custom text file formats are also not recommended because custom parsing is an extra automation asset requiring unnecessary development and maintenance. Personally, I like using JSON because its syntax is concise and its parsing tools seem to be the simplest in most programming languages.

External Sources

An external dependency exists when the data for test case values exists outside of the automation code base. For example, test case values could reside in a database instead of a CSV file, or they could be fetched from a REST service instead of a JSON file. This would be appropriate if the data is too large to manage as a set of files or if the data is constantly changing.

As a word of caution, external sources should be used only if absolutely necessary:

  1. External sources introduce an additional point-of-failure. If that database or service goes down, then the test automation cannot run.
  2. External sources degrade performance. It is slower to get data from a network connection than from a local machine.
  3. Test case values are harder to audit. When they are in the specs, the code, or data files, history is tracked by version control, and any changes are easy to identify in code reviews.
  4. Test case values may be unpredictable. The automation code base does not control the values. Bad values can fail tests.

External sources can be very useful, if not necessary, for performance / stress / load / limits testing, but it is not necessary for the vast majority of functional testing. It may be convenient to mock external sources with either a mocking framework like Mockito or with a dummy service.

Configuration Data

Config data pertain to the test environments, not the test cases. Test automation should never contain hard-coded values for config data like URLs, usernames, or passwords. Rather, test automation should read config data when it launches tests and make references to the required values. This should be done in Before hooks and not in Gherkin steps. In this way, automated tests can run on any configuration, such as different test environments before being released to production.

Config data can be stored in data files or accessed through some other dependency. (Read the previous section for pros and cons of those approaches.) The config to use should be somehow dynamically selectable when tests run. For example, the path to the config file to use could be provided as a command line argument to the test launch command.

Config data can be used to select test values to use at runtime. For example, different environments may need different test value data files. Conversely, scenario tagging can control what parts of config data should be used. For example, a tag could specify a username to use for the scenario, and a Before hook could use that username to fetch the right password from the config data.

For efficiency, only the necessary config data should be accessed or read into memory. In many cases, fetching the config data should also be done once globally, rather than before each test case.

Ready State

All scenarios have a starting point, and often, that starting point involves data. Setup operations must bring the system into the ready state, and cleanup operations must return the system to the ready state. Test data should leave no trace – temporary files should be deleted and records should be reverted. Otherwise, disk space may run out or duplicate records may fail tests. Maintaining the ready state between tests is necessary for true test independence.

During the Test Run

Simple setup and cleanup operations may be done directly within the automation. For example, when testing CRUD operations, records must be created before they can be retrieved, updated, or deleted. Setup would create a record, and cleanup would guarantee the record’s deletion. If the setup is appropriate to mention as part of the behavior, then it should be written as Given steps. This is true of CRUD operations: “Given a record has been created, When it is deleted, …”. If multiple scenarios share this same setup, then those Given steps should be put into a Background section.

However, sometimes setup details are not pertinent to the behavior at hand. For example, perhaps fresh authentication tokens must be generated for those CRUD calls. Those operations should be handled in Before hooks. The automation will take care of it, while the Gherkin steps can focus exclusively on the behavior.

No matter what, After hooks must do cleanup. It is incorrect to write final Then steps to do cleanup. Then steps should verify outcomes, not take more actions. Plus, the final Then steps will not be run if the test has a failure and aborts!

External Preparation

Some data simply takes too long to set up fresh for each test launch. Consider complicated user accounts or machine learning data: these are things that can be created outside of the test automation. The automation can simply presume that they exist as a precondition. These types of data require tool automation to prepare. Tool automation could involve a set of scripts to load a database, make a bunch of service calls, or navigate through a web portal to update settings. Automating this type of setup outside of the test automation enables engineers to more easily replicate it across different environments. Then, tests can run in much less time because the data is already there.

However, this external preparation must be carefully maintained. If any damage is done to the data, then test case independence is lost. For example, deleting a user account without replacing it means that subsequent test runs cannot log in! Along with setup tools, it is important to create maintenance tools to audit the data and make repairs or updates.

Advice for Any Approach

Use the minimal amount of test data necessary to test the functionality of the product under test. More test data requires more time to develop and manage. As a corollary, use the simplest approach that can pragmatically handle the test data. Avoid external dependencies as much as possible.

To minimize test data, remember that BDD is specification by example: scenarios should use descriptive values. Furthermore, variations should be reduced to input equivalence classes. For example, in the first scenario example on this page, it would probably be sufficient to test only one of those three animals, because the other two animals would not exhibit any different searching behavior.

Finally, be cautioned against randomization in test data. Functional tests are meant to be deterministic – they must always pass or fail consistently, or else test results will not be reliable. (Not only could this drive a tester crazy, but it would also break a continuous integration system.) Using equivalence classes is the better way to cover different types of inputs. Use a unique number counting mechanism whenever values must be unique.

For handling unpredictable test data, check out Unpredictable Test Data.

BDD‑‑; Collaboration without Automation

In the previous post, I described the tradeoffs of using a BDD test automation framework without the full BDD process. But, what about the opposite? What if a team wants to adopt BDD practices without a test framework to support it? Again, behavior-driven practices are beneficial apart from automation, but not without shortcomings.

The Power of Process

BDD should be a refinement, not an overhaul, of Agile software development. All of the problems BDD solves are simply aspects of the development process that must be solved anyway. BDD simply provides formal practices for solving them uniformly. Consider how BDD addresses the following problems:

Problem Solution
Biz, dev, and test roles are siloed and do not talk together much. BDD brings these three roles together in Three Amigos meetings.
Acceptance criteria are missing or poorly defined, wasting in-sprint time. Acceptance criteria are formalized as specifications using Gherkin.
Product features are hard to explain. Scenarios describe individual behaviors in plain language.
Team members have open questions or conflicting views about behaviors. Example Mapping efficiently unifies a team’s understanding and identifies areas for further refinement.
Edge cases are overlooked during testing. Well-defined behavior scenarios capture specifications by example early in development.

All of these problems can be solved through better, behavior-driven practices, and none of them pertain to test automation.

Spec-Less Automation

BDD process improvements don’t necessarily need a BDD framework for test automation. Any test framework could still automate scenario steps. The major difference is that there would be no mechanism to translate Gherkin lines into method/function calls: The automation engineer would simply need to program test cases the “good old-fashioned way.” It would not be much different from translating any other procedure-driven test cases into code.

The weakness of this approach is that specifications are not strongly linked to the test automation. The end-to-end development process is less efficient because behavior scenarios must essentially be rewritten into automation code, rather than becoming part of the automation code. There is also a higher risk that automated test cases won’t cover the actual intention of the test steps. Review and maintenance are more difficult because engineers must always cross-examine the automation code with the Gherkin to make sure they align. All of these problems make it harder to shift left with QA work.

The lack of a behavior-driven test framework is also a double-edged sword for Gherkin steps. On one hand, steps do not need to be scrutinized as strongly in review, since automation code does not directly depend upon them. It is not critical to reuse steps word-for-word or to worry about parameterization. However, sloppy steps can lead to miscommunication and will make adopting a BDD test framework in the future very difficult.

Better Than Nothing

Just like for automation without collaboration, using BDD practices without using a BDD test framework does improve the development process. There aren’t really any disadvantages because the process problems must be solved anyway. A “BDD‑‑;” situation (that’s a postfix decrement, to denote that automation did not follow collaboration) isn’t ideal, but at least it’s better than nothing.

Can Performance Tests be Unit Tests?

A friend recently asked me this question (albeit with some rephrasing):

Can a unit test be a performance test? For example, can a unit test wait for an action to complete and validate that the time it took is below a preset threshold?

I cringed when I heard this question, not only because it is poor practice, but also because it reflects common misunderstandings about types of testing.

QA Buzzword Bingo

The root of this misunderstanding is the lack of standard definitions for types of tests. Every company where I’ve worked has defined test types differently. Individuals often play fast and loose with buzzword bingo, especially when new hires from other places used different buzzwords. Here are examples of some of those buzzwords:

  • Unit testing
  • Integration testing
  • End-to-end testing
  • Functional testing
  • System testing
  • Performance testing
  • Regression testing
  • Test-’til-it-breaks
  • Measurements / benchmarks / metrics
  • Continuous integration testing

And here are some games of buzzword bingo gone wrong:

  • Trying to separate “systemic” tests from “system” tests.
  • Claiming that “unit” tests should interact with a live web page.
  • Separating “regression” tests from other test types.

Before any meaningful discussions about testing can happen, everyone must agree to a common and explicit set of testing type definitions. For example, this could be a glossary on a team wiki page. Whenever I have discussions with others on this topic, I always seek to establish definitions first.

What defines a unit test?

Here is my definition:

A unit test is a functional, white box test that verifies the correctness of a single unit of software code. It is functional in that it gives a deterministic pass-or-fail result. It is white box in that the test code directly calls the product source code under test. The unit is typically a function or method, and there should be separate unit tests for each equivalence class of inputs.

Unit tests should focus on one thing, and they are typically short – both in lines of code and in execution time. Unit tests become extremely useful when they are automated. Every major programming language has unit test frameworks. Some popular examples include JUnit, xUnit.net, and pytest. These frameworks often integrate with code coverage, too.

In continuous integration, automated unit tests can be run automatically every time a new build is made to indicate if the build is good or bad. That’s why unit tests must be deterministic – they must yield consistent results in order to trust build status and expedite failure triage. For example, if a build was green at 10am but turned red at 11am, then, so long as the tests were deterministic, it is reasonable to deduce that a defective change was committed to the code line between 10-11am. Good build status indicates that the build is okay to deploy to a test environment and then hopefully to production.

(As a side note, I’ve heard arguments that unit tests can be black box, but I disagree. Even if a black box test covers only one “unit”, it is still at least an integration test because it covers the connection between the actual product and some caller (script, web browser, etc.).)

What defines a performance test?

Again, here’s my definition:

performance test is a test that measures aspects of a controlled system. It is white box if it tests code directly, such as profiling individual functions or methods. It is black box if it tests a real, live, deployed product. Typically, when people talk about testing software performance, they mean black box style testing. The aspects to measure must be pre-determined, and the system under test must be controlled in order to achieve consistent measurements.

Performance tests are not functional tests:

  • Functional tests answer if a thing works.
  • Performance tests answer how efficiently a thing works.

Rather than yield pass-or-fail results, performance tests yield measurements. These measurements could track things as general as CPU or memory usage, or they could track specific product features like response times. Once measurements are gathered, data analysis should evaluate the goodness of the measurements. This often means comparison to other measurements, which could be from older releases or with other environment controls.

Performance testing is challenging to set up and measure properly. While unit tests will run the same in any environment, performance tests are inherently sensitive to the environment. For example, an enterprise cloud server will likely have better response time than a 7-year-old Macbook.

Why should performance tests not be unit tests?

Returning to the original question, it is theoretically possible to frame a performance test as a functional test by validating a specific measurement against a preset threshold. However, there are 3 main reasons why a unit test should not be a performance test:

  1. Performance checks in unit tests make the build process more vulnerable to environmental issues. Bad measurements from environment issues could cause unit tests to fail for reasons unrelated to code correctness. Any unit test failure will block a build, trigger triage, and stall progress. This means time and money. The build process must not be interrupted by environment problems.
  2. Proper performance tests require lots of setup beyond basic unit test support. Unit tests should be short and sweet, and unit testing frameworks don’t have the tools needed to take good measurements. Unit test environments are often not set up in tightly controlled environments, either. It would take a lot of work to properly put performance checks into a unit test.
  3. Performance tests yield metrics that should not be shoehorned into a binary pass/fail status. Performance data is complex and rich with information. Teams should analyze performance data, especially over time. It can also be volatile.

These points are based on the explicit definitions provided above. Note that I am not saying that performance testing should not be done, but rather that performances checks should not be part of unit testing. Unit testing and performance testing should be categorically separate types of testing.

10 Gotchas for Automation Code Reviews

Lately, I’ve been doing lots of code reviews. I probably spend about an hour every work day handling reviews for my team, both as a reviewer and an author. All of the reviews exclusively cover end-to-end test automation: new tests, old fixes, config changes, and framework updates. I adamantly believe that test automation code should undergo the same scrutiny of review as the product code it tests, because test automation is a product. Thus, all of the same best practices (like the guides here and here) should be applied. Furthermore, I also look for problems that, anecdotally, seem to appear more frequently in test automation than in other software domains. Below is a countdown of my “Top 10 Gotchas”. They are the big things I emphasize in test automation code reviews, in addition to the standard review checklist items.

#10: No Proof of Success

Trust, but verify,” as Ronald Reagan would say. Tests need to run successfully in order to pass review, and proof of success (such as a log or a screen shot) must be attached to the review. In the best case, this means something green (or perhaps blue for Jenkins). However, if the product under test is not ready or has a bug, this could also mean a successful failure with proof that the critical new sections of the code were exercised. Tests should also be run in the appropriate environments, to avoid the “it-ran-fine-on-my-machine” excuse later.

#9: Typos and Bad Formatting

My previous post, Should I Reject a Code Review for Typos?, belabored this point. Typos and bad formatting reflect carelessness, cause frustration, and damage reputation. They are especially bad for Behavior-Driven Development frameworks.

#8: Hard-Coded Values

Hard-coded values often indicate hasty development. Sometimes, they aren’t a big problem, but they can cripple an automation code base’s flexibility. I always ask the following questions when I see a hard-coded value:

  • Should this be a shared constant?
  • Should this be a parameterized value for the method/function/step using it?
  • Should this be passed into the test as an external input (such as from a config file or the command line)?

#7: Incorrect Test Coverage

It is surprisingly common to see an automated test that doesn’t actually cover the intended test steps. A step from the test procedure may be missing, or an assertion may yield a false positive. Sometimes, assertions may not even be performed! When reviewing tests, keep the original test procedure handy, and watch out for missing coverage.

#6: Inadequate Documentation

Documentation is vital for good testing and good maintenance. When a test fails, the doc it provides (both in the logs it prints and in its very own code) significantly assist triage. Automated test cases should read like test procedures. This is one reason why self-documenting behavior-driven test frameworks are so popular. Even without BDD, test automation should be flush with comments and self-documenting identifiers. If I cannot understand a test by skimming its code in a code review, then I ask questions, and when the author provides answers, I ask them to add their answers as comments to the code.

#5: Poor Code Placement

Automation projects tend to grow fast. Along with new tests, new shared code like page objects and data models are added all the time. Maintaining a good, organized structure is necessary for project scalability and teamwork. Test cases should be organized by feature area. Common code should be abstracted from test cases and put into shared libraries. Framework-level code for things like inputs and logging should be separated from test-level code. If code is put in the wrong place, it could be difficult to find or reuse. It could also create a dependency nightmare. For example, non-web tests should not have a dependency on Selenium WebDriver. Make sure new code is put in the right place.

#4: Bad Config Changes

Even the most seemingly innocuous configuration tweak can have huge impacts:

  • A username change can cause tests to abort setup.
  • A bad URL can direct a test to the wrong site.
  • Committing local config files to version control can cause other teammates’ local projects to fail to build.
  • Changing test input values may invalidate test runs.
  • One time, I brought down a whole continuous integration pipeline by removing one dependency.

As a general rule, submit any config changes in a separate code review from other changes, and provide a thorough explanation to the reviewers for why the change is needed. Any time I see unusual config changes, I always call them out.

#3: Framework Hacks

A framework is meant to help engineers automate tests. However, sometimes the framework may also be a hindrance. Rather than improve the framework design, many engineers will try to hack around the framework. Sometimes, the framework may already provide the desired feature! I’ve seen this very commonly with dependency injection – people just don’t know how to use it. Hacks should be avoided because test automation projects need a strong overall design strategy.

#2: Brittleness

Test automation must be robust enough to handle bumps in the road. However, test logic is not always written to handle slightly unexpected cases. Here are a few examples of brittleness to watch out for in review:

  • Do test cases have adequate cleanup routines, even when they crash?
  • Are all exceptions handled properly, even unexpected ones?
  • Is Selenium WebDriver always disposed?
  • Will SSH connections be automatically reconnected if dropped?
  • Are XPaths too loose or too strict?
  • Is a REST API response code of 201 just as good as 200?

#1: Duplication

Duplication is the #1 problem for test automation. I wrote a whole article about it: Why is Automation Full of Duplicate Code? Many testing operations are inherently repetitive. Engineers sometimes just copy-paste code blocks, rather than seek existing methods or add new helpers, to save development time. Plus, it can be difficult to find reusable parts that meet immediate needs in a large code base. Nevertheless, good code reviews should catch code redundancy and suggest better solutions.

 

Please let me know in the comments section if there are any other specific things you look for when reviewing test automation code!

Gotta Catch ’em All!

How can lessons from Pokémon apply to software testing and automation?

It’s no secret that I’m a lifelong Nintendo fanboy, and one of my favorite game franchises is Pokémon. Since Christmas, I have been playing the latest installment in the series, Pokémon Moon. The basic gameplay in all Pokémon games is to capture “pocket monsters” in the wild and train them for competitive battles. The main quest is to become the Pokémon League Champion by defeating the strongest trainers in the land. However, as any child of the ’90s will recall, the other major goal of the game is to catch all species of Pokémon. With 300 unique species in the latest installment, that’s no small feat. I can proudly say that I caught ’em all in Pokémon Moon.

It may seem strange to talk about video games on a professional blog, but I see five major parallels between catching Pokémon and my career in software quality and automation.

#1: As QA, We Gotta Catch ’em All

It is the quality engineer’s job to find and resolve all software problems: bugs, defects, design flaws, bad code check-ins, test failures, environment instabilities, deployment hiccups, and even automation crashes. We get paid to make sure things are good. And if we don’t catch a problem, then we haven’t done our jobs right.

#2: Coverage is Key to Success

One of the reasons to catch new Pokémon is not just to make the game’s Pokémon Professor happy for “scientific research,” but moreover to find stronger monsters to assist with the quest. Likewise, more test coverage means more problems discovered, which assists our quest to guarantee product quality. Our goal should be to achieve as close to complete test coverage as reasonably possible. When necessary, we should use risk-based approaches to smartly minimize test gaps as well. And our tests should be strong enough to legitimately exercise the features under test.

#3: Automating Tests Takes Time

As my trainer passport shows, it took me over 100 hours of gameplay to catch all 300 Pokémon. That’s a serious time investment. Test automation is the same way: it takes time to automate tests properly. It requires a robust, scalable, and extendable framework upon which to build test cases. Test automation is a software product in its own right that requires the same best practices and discipline as the product it tests. Software teams must allocate resources to its development and maintenance. It’s not as simple as just “writing test scripts.” However, when done right, the investment pays off.

#4: Not All Tests are Equal

Test metrics like “X passed and Y failed” or “N% of tests automated” can be very misleading because they do not account for differences between tests. Some tests have more coverage than others. Some require more time to run. Some require more time to automate. For example, 100 tests for feature A may take a day to automate and run in 3 minutes, while 5 tests for feature B may take a full week to automate and run in 3 hours. Yet, feature A may still be more important. All Pokémon are likewise not created equal. Some are simply given to you (like Rowlet, Litten, and Popplio), while others take hours of searching to find (like Castform) or are simply too tough to capture without a hard fight (like the Ultra Beasts). Be mindful of test differences for planning, execution, and reporting.

#5: Never Leave Work Incomplete

As a completionist, I would not consider my Pokémon adventure complete without a full Pokédex. There is great satisfaction in accomplishing the full measure of a goal. The same thing goes for testing and automation: my job is not done until all tests are automated and all lines are green. At times, it may be easy to give up because that “Ultra Beast” just won’t cooperate, but it’s the job to catch it. Always complete the ‘dex; always complete the job.

Pokédex Proof

Here’s the proof that I caught ’em all:

Complete Aloladex

The “Pokédex” is a device that indexes all of the Pokémon captured by a trainer. Mine is 100% complete!

Trainer Stamp

Here’s the stamp in my trainer passport to prove it!

 

If you see any more parallels between Pokémon and QA, please add them to the comments section below!

Python Testing 101: pytest

Overview

pytest is an awesome Python test framework. According to its homepage:

pytest is a mature full-featured Python testing tool that helps you write better programs.

Pytests may be written either as functions or as methods in classes – unlike unittest, which forces tests to be inside classes. Test classes must be named “Test*”, and test functions/methods must be named “test_*”. Test classes also need not inherit from unittest.TestCase or any other base class. Thus, pytests tend to be more concise and more Pythonic. pytest can also run unittest and nose tests.

pytest provides many advanced test framework features:

pytest is actively supported for both Python 2 and 3.

Installation

Use pip to install the pytest module. Optionally, install other plugins as well.

pip install pytest
pip install pytest-cov
pip install pytest-xdist
pip install pytest-bdd

Project Structure

The modules containing pytests should be named “test_*.py” or “*_test.py”. While the pytest discovery mechanism can find tests anywhere, pytests must be placed into separate directories from the product code packages. These directories may either be under the project root or under the Python package. However, the pytest directories must not be Python packages themselves, meaning that they should not have “__init__.py” files. (My recommendation is to put all pytests under “[project root]/tests”.) Test configuration may be added to configuration files, which may go by the names “pytest.ini”, “tox.ini”, or “setup.cfg”.

[project root directory]
|‐‐ [product code packages]
|-- [test directories]
|   |-- test_*.py
|   `-- *_test.py
`-- [pytest.ini|tox.ini|setup.cfg]

Example Code

An example project named example-py-pytest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-pytest
|-- com.automationpanda.example
|   |-- __init__.py
|   |-- calc_class.py
|   `-- calc_func.py
|-- tests
|   |-- test_calc_class.py
|   `-- test_calc_func.py
|-- README.md
`-- pytest.ini

The pytest.ini file is simply a configuration file stub. Feel free to add contents for local testing needs.


# Add pytest options here
[pytest]

view raw

pytest.ini

hosted with ❤ by GitHub

The com.automationpanda.example.calc_func module contains basic math functions.


def add(a, b):
return a + b
def subtract(a, b):
return a – b
def multiply(a, b):
return a * b
def divide(a, b):
return a * 1.0 / b
def maximum(a, b):
return a if a >= b else b
def minimum(a, b):
return a if a <= b else b

view raw

calc_func.py

hosted with ❤ by GitHub

The calc_func tests located in tests/test_calc_func.py are written as functions. Test functions are preferable to test classes when testing functions without side effects.


import pytest
from com.automationpanda.example.calc_func import *
NUMBER_1 = 3.0
NUMBER_2 = 2.0
def test_add():
value = add(NUMBER_1, NUMBER_2)
assert value == 5.0
def test_subtract():
value = subtract(NUMBER_1, NUMBER_2)
assert value == 1.0
def test_subtract_negative():
value = subtract(NUMBER_2, NUMBER_1)
assert value == -1.0
def test_multiply():
value = multiply(NUMBER_1, NUMBER_2)
assert value == 6.0
def test_divide():
value = divide(NUMBER_1, NUMBER_2)
assert value == 1.5

The divide-by-zero test uses pytest.raises:


def test_divide_by_zero():
with pytest.raises(ZeroDivisionError) as e:
divide(NUMBER_1, 0)
assert "division by zero" in str(e.value)

And the min/max tests use parameterization:


@pytest.mark.parametrize("a,b,expected", [
(NUMBER_1, NUMBER_2, NUMBER_1),
(NUMBER_2, NUMBER_1, NUMBER_1),
(NUMBER_1, NUMBER_1, NUMBER_1),
])
def test_maximum(a, b, expected):
assert maximum(a, b) == expected
@pytest.mark.parametrize("a,b,expected", [
(NUMBER_1, NUMBER_2, NUMBER_2),
(NUMBER_2, NUMBER_1, NUMBER_2),
(NUMBER_2, NUMBER_2, NUMBER_2),
])
def test_minimum(a, b, expected):
assert minimum(a, b) == expected

The com.automationpanda.example.calc_class module contains the Calculator class, which uses the math functions from calc_func. Keeping the functional spirit, the private _do_math method takes in a reference to the math function for greater code reusability.


from com.automationpanda.example.calc_func import *
class Calculator(object):
def __init__(self):
self._last_answer = 0.0
@property
def last_answer(self):
return self._last_answer
def _do_math(self, a, b, func):
self._last_answer = func(a, b)
return self.last_answer
def add(self, a, b):
return self._do_math(a, b, add)
def subtract(self, a, b):
return self._do_math(a, b, subtract)
def multiply(self, a, b):
return self._do_math(a, b, multiply)
def divide(self, a, b):
return self._do_math(a, b, divide)
def maximum(self, a, b):
return self._do_math(a, b, maximum)
def minimum(self, a, b):
return self._do_math(a, b, minimum)

view raw

calc_class.py

hosted with ❤ by GitHub

While tests for the Calculator class could be written using a test class, pytest test functions are just as capable. Fixtures enable a more fine-tuned setup/cleanup mechanism than the typical xUnit-like methods found in test classes. Fixtures can also be used in conjunction with parameterized methods. The tests/test_calc_class.py module is very similar to tests/test_calc_func.py and shows how to use fixtures for testing a class.


import pytest
from com.automationpanda.example.calc_class import Calculator
# "Constants"
NUMBER_1 = 3.0
NUMBER_2 = 2.0
# Fixtures
@pytest.fixture
def calculator():
return Calculator()
# Helpers
def verify_answer(expected, answer, last_answer):
assert expected == answer
assert expected == last_answer
# Test Cases
def test_last_answer_init(calculator):
assert calculator.last_answer == 0.0
def test_add(calculator):
answer = calculator.add(NUMBER_1, NUMBER_2)
verify_answer(5.0, answer, calculator.last_answer)
def test_subtract(calculator):
answer = calculator.subtract(NUMBER_1, NUMBER_2)
verify_answer(1.0, answer, calculator.last_answer)
def test_subtract_negative(calculator):
answer = calculator.subtract(NUMBER_2, NUMBER_1)
verify_answer(-1.0, answer, calculator.last_answer)
def test_multiply(calculator):
answer = calculator.multiply(NUMBER_1, NUMBER_2)
verify_answer(6.0, answer, calculator.last_answer)
def test_divide(calculator):
answer = calculator.divide(NUMBER_1, NUMBER_2)
verify_answer(1.5, answer, calculator.last_answer)
def test_divide_by_zero(calculator):
with pytest.raises(ZeroDivisionError) as e:
calculator.divide(NUMBER_1, 0)
assert "division by zero" in str(e.value)
@pytest.mark.parametrize("a,b,expected", [
(NUMBER_1, NUMBER_2, NUMBER_1),
(NUMBER_2, NUMBER_1, NUMBER_1),
(NUMBER_1, NUMBER_1, NUMBER_1),
])
def test_maximum(calculator, a, b, expected):
answer = calculator.maximum(a, b)
verify_answer(expected, answer, calculator.last_answer)
@pytest.mark.parametrize("a,b,expected", [
(NUMBER_1, NUMBER_2, NUMBER_2),
(NUMBER_2, NUMBER_1, NUMBER_2),
(NUMBER_2, NUMBER_2, NUMBER_2),
])
def test_minimum(calculator, a, b, expected):
answer = calculator.minimum(a, b)
verify_answer(expected, answer, calculator.last_answer)

Personally, I prefer to write pytests as functions because they are usually cleaner and more flexible than classes. Plus, test functions appeal to my affinity for functional programming.

Test Launch

Basic Test Execution

pytest has a very powerful command line for launching tests. Simply run the pytest module from within the project root directory, and pytest will automatically discover tests.

# Find and run all pytests from the current directory
python -m pytest

# Run pytests under a given path
python -m pytest 

# Run pytests in a specific module
python -m pytest tests/test_calc_func.py

# Generate JUnit-style XML test reports
python -m pytest --junitxml=[path-to-file]

# Get command help
python -m pytest -h

The terminal output looks like this:

python -m pytest
=============================== test session starts ===============================
platform darwin -- Python 2.7.13, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: cov-2.4.0
collected 25 items

tests/test_calc_class.py .............
tests/test_calc_func.py ............

============================ 25 passed in 0.11 seconds ============================

pytest also provides shorter “pytest” and “py.test” command that may be run instead of the longer “python -m pytest” module form. However, the shorter commands do not append the current path to PYTHONPATH, meaning modules under test may not be importable. Make sure to update PYTHONPATH before using the shorter commands.

# Update the Python path
PYTHONPATH=$PYTHONPATH:.

# Discover and run tests using the shorter command
pytest

Code Coverage

To run code coverage with the pytest-cov plugin module, use the following command. The report types are optional, but all four types are show below. Specific paths for each report may be appended using “:”.

# Run tests with code coverage
python -m pytest [test-path] [other-options] \
      --cov= \
      --cov-report=annotate \
      --cov-report=html \
      --cov-report=term \
      --cov-report=xml

Code coverage output on the terminal (“term” cov-report) looks like this:

python -m pytest --cov=com --cov-report=term
============================= test session starts ==============================
platform darwin -- Python 3.6.5, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: cov-2.4.0
collected 25 items 

tests/test_calc_class.py .............
tests/test_calc_func.py ............

---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
com/__init__.py                                 0      0   100%
com/automationpanda/__init__.py                 0      0   100%
com/automationpanda/example/__init__.py         0      0   100%
com/automationpanda/example/calc_class.py      21      0   100%
com/automationpanda/example/calc_func.py       12      0   100%
---------------------------------------------------------------
TOTAL                                          33      0   100%

========================== 25 passed in 0.11 seconds ===========================<span id="mce_SELREST_start" style="overflow:hidden;line-height:0;"></span>

Parallel Testing

Parallel testing is vital for more intense testing, such as web testing. The pytest-xdist plugin makes it possible both to scale-up tests by running more than one test process and to scale-out by running tests on other machines. (As a prerequisite, machines need rsync and SSH.) The command below shows how to run multiple test sub-processes; refer to official documentation for multi-machine setup

python -m pytest -n 4
============================= test session starts ==============================
platform darwin -- Python 3.6.5, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: xdist-1.22.2, forked-0.2, cov-2.4.0
gw0 [25] / gw1 [25] / gw2 [25] / gw3 [25]
scheduling tests via LoadScheduling
.........................
========================== 25 passed in 1.30 seconds ===========================

Pros and Cons

I’ll say it again: pytest is awesome. It is a powerful test framework with many features, yet its tests are concise and readable. It is very popular and actively supported for both versions of Python. It can handle testing at the unit, integration, and end-to-end levels. It can also be extended with plugins, notably ones for code coverage, parallel execution, and BDD. The only challenge with pytest is that advanced features (namely fixtures) have a learning curve.

My recommendation is to use pytest for standard functional testing in Python. It is one of the best and most popular test frameworks available, and it beats the pants off of alternatives like unittest and nosepytest is my go-to Python framework, period.

This article is meant to be an introduction. Check out Python Testing with pytest by Brian Okken for deeper study.

 

Update: On 4/21/2018, I added pytest-xdist and pytest-bdd plugins, and I made some cosmetic changes.

Update: On 7/29/2018, I added the book recommendation for “Python Testing with pytest.”

Python Testing 101: doctest

Overview

doctest is a rather unique Python test framework: it turns documented Python statements into test cases. Doctests may be written in two places:

  1. Directly in the docstrings of the module under test
  2. In separate text files (potentially for better organization)

The doctest module searches for examples (lines starting with “>>>”) and runs them as if they were interactive sessions entered into a Python shell. The subsequent lines, until the next “>>>” or a blank line, contain the expected output. A doctest will fail if the actual output does not match the expected output verbatim (e.g., string equality).

The doctest module is included out-of-the-box with Python 2 and 3. Like unittest, it can generate XML reports using unittest-xml-reporting.

Installation

doctest does not need any special installation because it comes with Python. However, the unittest-xml-reporting module may be installed with pip if needed:

> pip install unittest-xml-reporting

Project Structure

When doctests are embedded into docstrings, no structural differences are needed. However, if doctests are written as separate text files, then text files should be put under a directory named “doctests”. It may be prudent to create subdirectories that align with the Python package names. Doctest text files should be named after the modules they cover.

[project root directory]
|‐‐ [product code packages]
`-- doctests (?)
    `-- test_*.txt (?)

Example Code

An example project named example-py-doctest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-doctest
|-- com.automationpanda.example
|   |-- __init__.py
|   |-- calc_class.py
|   `-- calc_func.py
|-- doctests
|   `-- test_calc_class.txt
`-- README.md

The com.automationpanda.example.calc_func module contains doctests embedded in the docstrings of math functions, alongside other comments:


def add(a, b):
"""
Adds two numbers.
>>> add(3, 2)
5
"""
return a + b
def subtract(a, b):
"""
Subtracts two numbers.
>>> subtract(3, 2)
1
>>> subtract(2, 3)
-1
"""
return a – b
def multiply(a, b):
"""
Multiplies two numbers.
>>> multiply(3, 2)
6
"""
return a * b
def divide(a, b):
"""
Divides two numbers.
Automatically raises ZeroDivisionError.
>>> divide(3.0, 2.0)
1.5
>>> divide(1.0, 0)
Traceback (most recent call last):
ZeroDivisionError: float division by zero
"""
return a * 1.0 / b
def maximum(a, b):
"""
Finds the maximum of two numbers.
>>> maximum(3, 2)
3
>>> maximum(2, 3)
3
>>> maximum(3, 3)
3
"""
return a if a >= b else b
def minimum(a, b):
"""
Finds the minimum of two numbers.
>>> minimum(3, 2)
2
>>> minimum(2, 3)
2
>>> minimum(2, 2)
2
"""
return a if a <= b else b

view raw

calc_func.py

hosted with ❤ by GitHub

On the other hand, the com.automationpanda.example.calc_class module contains a Calculator class without doctests in docstrings:


class Calculator(object):
def __init__(self):
self._last_answer = 0.0
@property
def last_answer(self):
return self._last_answer
def add(self, a, b):
self._last_answer = a + b
return self.last_answer
def subtract(self, a, b):
self._last_answer = a – b
return self.last_answer
def multiply(self, a, b):
self._last_answer = a * b
return self.last_answer
def divide(self, a, b):
# automatically raises ZeroDivisionError
self._last_answer = a * 1.0 / b
return self.last_answer
def maximum(self, a, b):
self._last_answer = a if a >= b else b
return self.last_answer
def minimum(self, a, b):
self._last_answer = a if a <= b else b
return self.last_answer

view raw

calc_class.py

hosted with ❤ by GitHub

Its doctests are located in a separate text file at doctests/test_calc_class.txt:


The “com.automationpanda.example.calc_class“ module
=====================================================
>>> from com.automationpanda.example.calc_class import Calculator
>>> calc = Calculator()
>>> calc.add(3, 2)
5
>>> calc.subtract(3, 2)
1
>>> calc.subtract(2, 3)
-1
>>> calc.multiply(3, 2)
6
>>> calc.divide(3.0, 2.0)
1.5
>>> calc.divide(1.0, 0)
Traceback (most recent call last):
ZeroDivisionError: float division by zero
>>> calc.maximum(3, 2)
3
>>> calc.maximum(2, 3)
3
>>> calc.maximum(3, 3)
3
>>> calc.minimum(3, 2)
2
>>> calc.minimum(2, 3)
2
>>> calc.minimum(2, 2)
2

Doctests are run in the order in which they are written. The examples above align functions with docstrings and classes with text files, but this is not required. Functions may have doctests in separate text files, and classes may have doctests embedded in method docstrings. Additional tricks are documented online.

Test Launch

To launch tests from the command line, change directory to the project root directory and run the doctest module directly from the python command. Note that doctests use file paths, not module names.

# Run doctests embedded as docstrings
> python -m doctest com/automationpanda/example/calc_func.py

# Run doctests written in separate text files
> python -m doctest doctests/test_calc_class.txt

When doctests run successfully, they don’t print any output! This may be surprising to a first-time user, but no news is good news. However, to force output, include the “-v” option:

# Run doctests with verbose output to print successes as well as failures
> python -m doctest -v com/automationpanda/example/calc_func.py
> python -m doctest -v doctests/test_calc_class.txt

Output should look something like this:

> python -m doctest -v doctests/test_calc_class.txt 
Trying:
    from com.automationpanda.example.calc_class import Calculator
Expecting nothing
ok
Trying:
    calc = Calculator()
Expecting nothing
ok
Trying:
    calc.add(3, 2)
Expecting:
    5
ok
...

1 items passed all tests:
  14 tests in test_calc_class.txt
14 tests in 1 items.
14 passed and 0 failed.
Test passed.

Doctests can also generate XML reports using unittest-xml-reporting. Follow the same instructions given for unittest. Furthermore, doctests can integrate with unittest discovery, so that test suites can run together.

Pros and Cons

doctest has many positive aspects. It is very simple yet powerful, and it has practically no learning curve. Since the doctest module comes with Python out of the box, no extra dependencies are required. It integrates nicely with unittest. Tests can be written in-line with the code, providing not only verification tests but also examples for the reader. And if in-line tests are deemed too messy, they can be moved to separate text files.

However, doctest has limitations. First of all, doctests are not independent: Python commands run sequentially and build upon each other. Thus, doctests may not be run individually, and side effects from one example may affect another. doctest also lacks many features of advanced frameworks, including hooks, assertions, tracing, discovery, replay, and advanced reporting. Theoretically, many of these things could be put into doctests, but they would be inelegantly jury-rigged. Long doctests become cumbersome. Furthermore, console output string-matching is not a robust assertion method. Silent Python statements that do not return a value or print output cannot be legitimately tested. Programmers can easily mistype expected output. Output format might also change in future development, or it may be nondeterministic (like for timestamps).

My main recommendation is this: use doctest for small needs but not big needs. doctest would be a good option for small tools and scripts that need spot checks instead of intense testing. It is also well suited for functional programming testing, in which expressions do not have side effects. Doctests should also be used to provide standard examples in docstrings wherever possible, in conjunction with other tests. Rich documentation is wonderful, and working examples can be a godsend. However, serious testing needs a serious framework, such as pytest or behave.

Python Testing 101: unittest

Overview

unittest is the standard Python unit testing framework. Inspired by JUnit, it is included with the standard CPython distribution. unittest provides a base class named TestCase, which provides methods for assertions and setup/cleanup routines. All test case classes must inherit from TestCase. Each method in a TestCase subclass whose name starts with “test” will be run as a test case. Tests can be grouped and loaded using the TestSuite class and load methods, which together can build custom test runners. unittest can also generate XML reports (like JUnit) using unittest-xml-reporting.

unittest is supported in both Python 2 and 3. However, use the unittest2 backport for versions earlier than Python 2.7.

Installation

Basic unittest does not need any special installation because it comes with Python. However, additional modules may be installed with pip if you need them:

> pip install unittest2
> pip install unittest-xml-reporting

Project Structure

Product code modules and unittest test code modules should be placed into separate Python packages within the same project. Test modules must be named “test_*.py” and must be put into packages in order for discovery to work when launching tests. Remember, a Python package is simply a directory with a file named “__init__.py“.

[project root directory]
|‐‐ [product code packages]
`‐‐ tests
    |‐‐ __init__.py
    `‐‐ test_*.py

Example Code

An example project named example-py-unittest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-unittest
|-- com.automationpanda.example
|   |-- __init__.py
|   `-- calc.py
|-- com.automationpanda.tests
|   |-- __init__.py
|   `-- test_calc.py
`-- README.md

The com.automationpanda.example.calc module contains a Calculator class with basic math methods:


class Calculator(object):
def __init__(self):
self._last_answer = 0.0
@property
def last_answer(self):
return self._last_answer
def add(self, a, b):
self._last_answer = a + b
return self.last_answer
def subtract(self, a, b):
self._last_answer = a – b
return self.last_answer
def multiply(self, a, b):
self._last_answer = a * b
return self.last_answer
def divide(self, a, b):
# automatically raises ZeroDivisionError
self._last_answer = a * 1.0 / b
return self.last_answer

view raw

calc.py

hosted with ❤ by GitHub

The com.automationpanda.tests.test_calc module contains a unittest.TestCase subclass, shown below. The test class uses the setUp method to construct a Calculator object, which each test method uses. The assertion methods used are assertEqual and assertRaises. A fresh instance of CalculatorTest is instantiated for every test method run.


from com.automationpanda.example.calc import Calculator
NUMBER_1 = 3.0
NUMBER_2 = 2.0
FAILURE = 'incorrect value'
class CalculatorTest(unittest.TestCase):
def setUp(self):
self.calc = Calculator()
def test_last_answer_init(self):
value = self.calc.last_answer
self.assertEqual(value, 0.0, FAILURE)
def test_add(self):
value = self.calc.add(NUMBER_1, NUMBER_2)
self.assertEqual(value, 5.0, FAILURE)
self.assertEqual(value, self.calc.last_answer, FAILURE)
def test_subtract(self):
value = self.calc.subtract(NUMBER_1, NUMBER_2)
self.assertEqual(value, 1.0, FAILURE)
self.assertEqual(value, self.calc.last_answer, FAILURE)
def test_subtract_negative(self):
value = self.calc.subtract(NUMBER_2, NUMBER_1)
self.assertEqual(value, -1.0, FAILURE)
self.assertEqual(value, self.calc.last_answer, FAILURE)
def test_multiply(self):
value = self.calc.multiply(NUMBER_1, NUMBER_2)
self.assertEqual(value, 6.0, FAILURE)
self.assertEqual(value, self.calc.last_answer, FAILURE)
def test_divide(self):
value = self.calc.divide(NUMBER_1, NUMBER_2)
self.assertEqual(value, 1.5, FAILURE)
self.assertEqual(value, self.calc.last_answer, FAILURE)
def test_divide_by_zero(self):
self.assertRaises(ZeroDivisionError, self.calc.divide, NUMBER_1, 0)

view raw

test_calc.py

hosted with ❤ by GitHub

Test Launch

To launch tests from the command line, change directory to the project root directory and run the unittest module directly from the python command:

# Discover and run all tests in the project
> python -m unittest discover

# Run all tests in the given module
> python -m unittest com.automationpanda.tests.test_calc

# Run all tests in the given test class
> python -m unittest com.automationpanda.tests.test_calc.CalculatorTest

# Run all tests in the given Python file (useful for path completion)
> python -m unittest com/automationpanda/tests/test_calc.py

Test output should look like this:

> python -m unittest discover
.............
----------------------------------------------------------------------
Ran 13 tests in 0.002s

OK

In order to generate XML reports, install unittest-xml-reporting and add the following “main” logic to the bottom of the test case module. The example below will generate the XML report into a directory named “test-reports”.


if __name__ == '__main__':
import xmlrunner
unittest.main(
testRunner=xmlrunner.XMLTestRunner(output='test-reports'),
failfast=False,
buffer=False,
catchbreak=False)

view raw

test_calc.py

hosted with ❤ by GitHub

Then, run the test module directly from the command line:

# Run the test module directly
# Do this whenever "main" logic is written to run a test
# Examples: XML results file, custom test suites
> python -m com.automationpanda.tests.test_calc

Pros and Cons

unittest is “Old Reliable”. It is included out-of-the-box with Python, and it provides a basic, universal test class. Many other test frameworks are compatible with unittest. However, unittest is somewhat clunky: it forces class inheritance instead of allowing functions as test cases. The OOP style feels less Pythonic. Tests cannot be parameterized, either.

My recommendation is to use unittest if you need a basic unit test framework with no additional dependencies. Otherwise, there are better test frameworks available, such as pytest.