
Who Should Lead BDD?

Behavior-driven development offers great benefits: better communication, easier test automation, and higher code quality. There are many ways for a team to start doing BDD, and naturally, someone needs to stand up and lead the effort. In my experience, adopting BDD is its own process: an evangelist converts team leaders, training sessions are given, and Gherkinized acceptance criteria start being automated. However, not everyone will embrace the changes, especially people in different roles. And big changes take time. Rome wasn’t built in a day, and neither will a mature, effective BDD process be.

This post covers three possible ways to lead BDD adoption, each from one of the Three Amigos roles. Any role can lead the charge, but each will face its own unique struggles. These possibilities are advisory, not prescriptive. If you want to move your team into BDD, use these three approaches as guidelines for crafting a plan that best meets your needs. And, of course, the advice in Winning Support for BDD pertains to all approaches. Furthermore, as you read these approaches, put yourself in the shoes of roles other than your own, so you can better understand the struggles each role faces.

Note: The approaches below presume that the underlying software development process is Agile Scrum. Nevertheless, they may be tweaked and applied to other processes, like Waterfall or Kanban.

The Starting Point

The starting point for all three approaches below is a “traditional” Agile sprint – one that is not (yet) behavior-driven. Product owners write user stories, developers implement the solutions, and testers test the deliverables. The diagram below shows the main flow of sprint work in this type of sprint, and it will serve as the basis for illustrating BDD adoption:

Traditional Sprint

The overall flow of a “traditional,” non-behavior-driven Agile sprint. Ceremonies like planning, review, and retrospective should still happen, but they are left out of this diagram to put emphasis on the parts affected by BDD.

QA-Led BDD

Circumstances

The most common approach I’ve seen is QA-led BDD adoption, because testers arguably have the most to gain. It is most applicable when the Three Amigos roles (biz, dev, and test) are well-defined and separate. The impetus for QA to lead BDD adoption could be that developers deliver code too late to adequately test and automate within a sprint, or it could be that the QA team is struggling to scale their test automation development. There may also be resistance to BDD from biz and dev roles.

Steps

The sensible path for QA is to start all the way to the right and progressively shift left. This means that the starting point would be test automation. Start by building a solid automation code base. Pick a well-supported BDD framework like Cucumber, SpecFlow, or behave, and start adding scenarios and step definitions. Select scenarios for core product features rather than the latest sprint stories, so that the code base will be populated with the most basic, useful steps. Once the automation code reaches a “critical mass” for step reusability, QA can then proactively classify new test scenarios as automated or manual. Automated tests become easier and easier to write, giving QA more time to be exploratory with manual testing. Ideally, all manual testing would become exploratory.

Then, it’s time to start shifting left. At this point, all Gherkin steps would be in the automation code only, so set up a tool like Pickles to expose the steps to all team members as living documentation. QA should then schedule Three Amigos meetings with biz and dev to proactively discuss user story expectations. In those meetings, QA should start demonstrating how to write acceptance criteria in Gherkin, which then expedites testing. A big win would be if a QA engineer could write a new scenario using only pre-existing, pre-automated steps and then run it successfully on the spot.

Once biz and dev folks are convinced of BDD’s benefits, encourage them to participate in writing Gherkin. When they get comfortable, encourage product owners to write acceptance criteria in Gherkin when they write user stories, and hold Three Amigos meetings before sprint planning as part of grooming. Convince them that helping to write Gherkin scenarios makes the whole team more efficient.

 


Struggles

Shifting left is never easy, especially when team members are hardened into their roles. That’s why QA must write both really good test automation and really good Gherkin scenarios. Success should speak for itself once QA delivers good automation fast. Furthermore, QA must be clear that BDD is not merely a test tool; it’s a process that requires a paradigm shift. Otherwise, BDD could easily be pigeonholed as a “QA thing.”

Dev-Led BDD

Circumstances

There are a few reasons that could push developers to lead BDD. On some Agile teams, there’s no distinction between dev and QA roles: all team members are software engineers responsible for both developing and testing the software. Or, developers may not be satisfied with the testing effort. Maybe too many bugs are escaping the sprint, or maybe automation isn’t getting done in time. Or, perhaps the product owner is not happy with the deliverables and putting pressure on the team to do better. Whatever the circumstance, developers are more than capable of winning with BDD.

Steps

The best way for developers to start is to set up Three Amigos meetings, to stop the game of telephone between biz and test. In those meetings, start translating acceptance criteria into Gherkin. Then, start helping out with test automation – that may mean anything from offering advice to QA to building the framework from scratch. Then, start pushing left and right to get biz and test on board with BDD.

 


Struggles

It may be difficult for developers to work on test automation because they may lack either the expertise or the time to devote to good test automation. Automation is a specialized discipline, and it takes time and diligence to build up expertise to do it right. I’ve seen very skilled developers haughtily build very shabby automation frameworks.

Developers must also be careful not to be too technical, or else biz and test roles may reject BDD for being too complicated or beyond their abilities. Furthermore, some teams may be resistant to developing test automation. For example, automation work may be “starved” for points because it is underestimated, or similarly starved for time because it is deemed lower in priority than other work.

Biz-Led BDD

Circumstances

BDD is designed to bring technical and business roles together into healthier collaboration, and biz folks can certainly lead BDD adoption as successfully as more technical folks can. Major reasons for biz to take the lead could be if development is perpetually running behind schedule, if deliverables don’t meet the original requirements, or if software bugs are rampant.

Steps

For biz roles, “shift left” could be better called “pull left.” Start by writing solid user stories and Gherkin acceptance criteria. Focus on good Gherkin that is readable and reusable. Then, introduce BDD as a refinement to the Agile process, highlighting its benefits. Initiate Three Amigos meetings to make sure that you are communicating the right things to dev and test. Once collaboration is going well, suggest BDD automation as a way to expedite dev and test work. If acceptance criteria are all Gherkinized, then developing BDD automation would be a natural extension.

 


Struggles

In my experience, biz roles (specifically product owners) tend to be the most hesitant about BDD. They often see writing Gherkin as a burdensome requirement rather than a way to help their team. Or, they may fear that BDD is “too technical” for them. It may also be difficult for them to pitch BDD automation to the team. To be successful, biz roles need to step outside their comfort zone to win supporters from dev and test.

Process paradigm shifts can be hard, especially on teams that are already overwhelmed with work. Some people just don’t like change. Process and automation change can also be a big challenge if QA is outsourced (which is common).

Side-By-Side Comparison

Here’s the TL;DR:

QA

  Circumstances:
  • Code is delivered too late to test and automate
  • Automation development is not scaling

  Steps:
  1. Build a solid BDD automation framework
  2. Demonstrate automation success
  3. Set up Three Amigos meetings during the sprint
  4. Start writing Gherkin scenarios with biz and dev as part of grooming

  Struggles:
  • Showing that BDD is a whole development process, not just a QA thing
  • Getting the team to truly shift left

Dev

  Circumstances:
  • No separation of dev and QA roles
  • Too many bugs are escaping the sprint
  • Pressure from biz to do better

  Steps:
  1. Initiate better collaboration through Three Amigos and Gherkin
  2. Push right by helping QA with testing and automation
  3. Push left by helping biz write better acceptance criteria

  Struggles:
  • Humbly learning good automation practices
  • Dedicating time for automation and more meetings

Biz

  Circumstances:
  • Missed deadlines
  • Deliverables not matching expectations
  • Too many bugs

  Steps:
  1. Write acceptance criteria in good Gherkin
  2. Set up Three Amigos meetings to review Gherkin
  3. Pitch BDD automation

  Struggles:
  • Learning semi-technical things
  • Pushing all the way to automation

Conclusion

These are just three general approaches intended to show how BDD is for everyone. If you have other approaches, please describe them in the comment section below! Whatever the approach, make sure to demonstrate that BDD helps everyone, or else people may feel forced into corners and reject BDD for bad reasons. And remember, software quality is not just QA’s responsibility; it is everyone’s responsibility.

Winning Support for BDD

Adopting behavior-driven development practices can greatly improve software quality and productivity, but like any big change, it will have opponents along with supporters. I’ve met resistance from all roles: testers, developers, product owners, and managers. And some people can be stubborn. As with any proposal, the best way to win support is not just to tell the benefits but to demonstrate them. Below are five major ways to demonstrate the benefits of BDD.

Make it a Refinement, not an Overhaul

I remember talking with a scrum master one time about challenges his team faced with testing and automation. The user stories his team wrote were a mess: they may or may not have had acceptance criteria, and the product owner would often ask for features to be scrapped or redone after a sprint or two. The team basically gave up on automated testing due to feature flux. Naturally, I proposed BDD to him, suggesting that it could help drive better features through formalization. However, this scrum master balked at the idea: “My team is stretched so thin right now, there’s no way we can overhaul our process right now.”

Clearly, the team had a serious problem, but they weren’t willing to try any solution deemed too “big.” The scrum master’s perception was that BDD would be a disruptive change that would hurt them more than help them. In cases like this, it is best to present BDD as a refinement of Agile, and not an overhaul of it. Agile says user stories should have acceptance criteria; BDD says acceptance criteria should be formalized. Agile says that the definition of done should include test automation; BDD says automation is a natural extension of the acceptance criteria. There’s nothing in Agile that BDD undoes, and there are shortcomings in Agile that BDD solves.

Write Good Gherkin

There is a big difference between Gherkin and good Gherkin. Anyone can add BDD buzzwords to existing test procedures, but effective BDD needs a paradigm shift. Unfortunately, bad Gherkin can ruin many of the benefits BDD can bring. For example, imperative steps will frustrate product owners, and mixed point-of-view will confuse testers. Nothing will ever be truly perfect, but it is important to strive for good Gherkin from the start, especially since the first behavior scenarios will often be used as examples for future scenarios.

Start the Automation Snowball

BDD and automation go together like peas and carrots. Not only can test automation shift left (since Gherkin scenarios are both acceptance criteria and tests), but steps can be implemented once and reused by any scenarios. When the first BDD scenarios are written, obviously all steps are new steps. As sprints pass, though, many common steps will likely be reused. I’ve even written new scenarios without adding any new steps!

Test automation is often the last thing to be done for a story, if it’s even reached at all. The inherent step reusability helps BDD automation get done sooner. It may take a while to build up useful, reusable steps in the code base, but they will cause an “automation snowball” once they are there. Imagine telling your team that the test automation is already done once a scenario is written in Gherkin!
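
To make the snowball concrete, here is a minimal sketch of reusable step definitions using behave in Python. The application object, page names, and step wording are my own illustrative assumptions, not steps from a real project:

# features/steps/common_steps.py (hypothetical) -- reusable behave steps.
# A later scenario like the following would need no new automation code:
#
#   Scenario: View the orders page
#     Given the user is logged in
#     When the user opens the "Orders" page
#     Then the "Orders" page is displayed

from behave import given, when, then

@given('the user is logged in')
def step_user_logged_in(context):
    # One login step, shared by every scenario that needs an authenticated user.
    # context.app would be set up in environment.py (an assumption of this sketch).
    context.session = context.app.login("standard_user")

@when('the user opens the "{page}" page')
def step_open_page(context, page):
    # Parameterized wording: one implementation covers any page name.
    context.current_page = context.session.open_page(page)

@then('the "{page}" page is displayed')
def step_page_displayed(context, page):
    assert context.current_page.title == page

Each step like these is written once; after a few sprints, many new scenarios can be assembled entirely from steps that are already automated.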

Take Baby Steps

Rome wasn’t built in a day, and neither will a mature BDD process be. People take time to adjust to new paradigms. Start out slow, and do it right. Train the team how to write good Gherkin. Try a few stories one sprint, rather than taking on the whole backlog. For a product-owner-led approach, start with Gherkinizing acceptance criteria for a sprint or two before attempting any automation. Alternatively, for a test-led approach, work on the automation framework first, and then start to shift the scenario writing left to the developers and then to the product owners once the snowball gets bigger.

It’s okay if things aren’t perfect at first. Learn the lessons and iterate for improvement. Take baby steps!

Highlight how Everyone Wins

BDD is truly a win/win for everyone. It’s not a way to shuffle responsibilities or push around busywork; it’s a way to make a team more interdependent. Each role in the Three Amigos is empowered to do the right things, with support from the others in lockstep. Consider how BDD process changes help each role work together better:

Product Owner
  New responsibility: Learn to express requirements in a more formalized, slightly techy way
  Interdependent benefit: Better assurance that features will be what they actually want, be working correctly, and be protected against future regressions

Developer
  New responsibility: Contribute more to grooming and test planning
  Interdependent benefit: Less likely to develop the wrong thing or to be “held up” by testing

Tester
  New responsibility: Build and learn a new automation framework
  Interdependent benefit: Automation will snowball, allowing testers to meet sprint commitments and focus extra time on exploratory testing

Everyone
  New responsibility: Another meeting or two
  Interdependent benefit: Better communication and fewer problems

 

Nobody on an Agile team can rightly say, “BDD isn’t useful to me.” Software quality is everyone’s responsibility, and BDD is a great way to improve it.

Can Performance Tests be Unit Tests?

A friend recently asked me this question (albeit with some rephrasing):

Can a unit test be a performance test? For example, can a unit test wait for an action to complete and validate that the time it took is below a preset threshold?

I cringed when I heard this question, not only because it is poor practice, but also because it reflects common misunderstandings about types of testing.

QA Buzzword Bingo

The root of this misunderstanding is the lack of standard definitions for types of tests. Every company where I’ve worked has defined test types differently. Individuals often play fast and loose with buzzword bingo, especially when new hires bring different buzzwords from their previous companies. Here are examples of some of those buzzwords:

  • Unit testing
  • Integration testing
  • End-to-end testing
  • Functional testing
  • System testing
  • Performance testing
  • Regression testing
  • Test-’til-it-breaks
  • Measurements / benchmarks / metrics
  • Continuous integration testing

And here are some games of buzzword bingo gone wrong:

  • Trying to separate “systemic” tests from “system” tests.
  • Claiming that “unit” tests should interact with a live web page.
  • Separating “regression” tests from other test types.

Before any meaningful discussions about testing can happen, everyone must agree to a common and explicit set of testing type definitions. For example, this could be a glossary on a team wiki page. Whenever I have discussions with others on this topic, I always seek to establish definitions first.

What defines a unit test?

Here is my definition:

A unit test is a functional, white box test that verifies the correctness of a single unit of software code. It is functional in that it gives a deterministic pass-or-fail result. It is white box in that the test code directly calls the product source code under test. The unit is typically a function or method, and there should be separate unit tests for each equivalence class of inputs.

Unit tests should focus on one thing, and they are typically short – both in lines of code and in execution time. Unit tests become extremely useful when they are automated. Every major programming language has unit test frameworks. Some popular examples include JUnit, xUnit.net, and pytest. These frameworks often integrate with code coverage, too.
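
For instance, the equivalence classes of a hypothetical absolute-value function (positive, negative, and zero inputs) could each get a small, deterministic pytest case. This is just a sketch to illustrate the definition above, not code from a real project:

import pytest

def absolute(n):
    # Hypothetical unit under test.
    return n if n >= 0 else -n

# One case per equivalence class of inputs; each runs as a separate test.
@pytest.mark.parametrize("value, expected", [
    (5, 5),    # positive input
    (-5, 5),   # negative input
    (0, 0),    # zero boundary
])
def test_absolute(value, expected):
    assert absolute(value) == expected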

In continuous integration, automated unit tests can be run automatically every time a new build is made to indicate if the build is good or bad. That’s why unit tests must be deterministic – they must yield consistent results in order to trust build status and expedite failure triage. For example, if a build was green at 10am but turned red at 11am, then, so long as the tests were deterministic, it is reasonable to deduce that a defective change was committed to the code line between 10-11am. Good build status indicates that the build is okay to deploy to a test environment and then hopefully to production.

(As a side note, I’ve heard arguments that unit tests can be black box, but I disagree. Even if a black box test covers only one “unit”, it is still at least an integration test because it covers the connection between the actual product and some caller (script, web browser, etc.).)

What defines a performance test?

Again, here’s my definition:

A performance test is a test that measures aspects of a controlled system. It is white box if it tests code directly, such as profiling individual functions or methods. It is black box if it tests a real, live, deployed product. Typically, when people talk about testing software performance, they mean black box style testing. The aspects to measure must be pre-determined, and the system under test must be controlled in order to achieve consistent measurements.

Performance tests are not functional tests:

  • Functional tests answer if a thing works.
  • Performance tests answer how efficiently a thing works.

Rather than yield pass-or-fail results, performance tests yield measurements. These measurements could track things as general as CPU or memory usage, or they could track specific product features like response times. Once measurements are gathered, data analysis should evaluate the goodness of the measurements. This often means comparison to other measurements, which could be from older releases or with other environment controls.

Performance testing is challenging to set up and measure properly. While unit tests will run the same in any environment, performance tests are inherently sensitive to the environment. For example, an enterprise cloud server will likely have better response time than a 7-year-old Macbook.

Why should performance tests not be unit tests?

Returning to the original question, it is theoretically possible to frame a performance test as a functional test by validating a specific measurement against a preset threshold. However, there are 3 main reasons why a unit test should not be a performance test:

  1. Performance checks in unit tests make the build process more vulnerable to environmental issues. Bad measurements from environment issues could cause unit tests to fail for reasons unrelated to code correctness. Any unit test failure will block a build, trigger triage, and stall progress. This means time and money. The build process must not be interrupted by environment problems.
  2. Proper performance tests require lots of setup beyond basic unit test support. Unit tests should be short and sweet, and unit testing frameworks don’t have the tools needed to take good measurements. Unit test environments are often not set up in tightly controlled environments, either. It would take a lot of work to properly put performance checks into a unit test.
  3. Performance tests yield metrics that should not be shoehorned into a binary pass/fail status. Performance data is complex, rich with information, and often volatile, so teams should analyze it over time rather than reduce it to a single pass or fail.

These points are based on the explicit definitions provided above. Note that I am not saying that performance testing should not be done, but rather that performances checks should not be part of unit testing. Unit testing and performance testing should be categorically separate types of testing.
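
As an illustration of that separation, here is a small sketch in which the function, data, and numbers are hypothetical: the unit test verifies correctness deterministically, while a separate measurement routine records timing for later analysis instead of forcing a pass/fail threshold.

import time

def compute(data):
    # Hypothetical unit under test.
    return sorted(data)

# Unit test: deterministic, environment-independent, checks correctness only.
def test_compute_sorts_values():
    assert compute([3, 1, 2]) == [1, 2, 3]

# Performance measurement: run separately, in a controlled environment, and
# record the result for trend analysis rather than asserting a threshold.
def measure_compute(samples=100):
    data = list(range(10000, 0, -1))
    start = time.perf_counter()
    for _ in range(samples):
        compute(list(data))
    elapsed = time.perf_counter() - start
    print("compute: {:.6f} seconds per call".format(elapsed / samples))

if __name__ == "__main__":
    measure_compute()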

10 Gotchas for Automation Code Reviews

Lately, I’ve been doing lots of code reviews. I probably spend about an hour every work day handling reviews for my team, both as a reviewer and an author. All of the reviews exclusively cover end-to-end test automation: new tests, old fixes, config changes, and framework updates. I adamantly believe that test automation code should undergo the same scrutiny of review as the product code it tests, because test automation is a product. Thus, all of the same best practices (like the guides here and here) should be applied. Furthermore, I also look for problems that, anecdotally, seem to appear more frequently in test automation than in other software domains. Below is a countdown of my “Top 10 Gotchas”. They are the big things I emphasize in test automation code reviews, in addition to the standard review checklist items.

#10: No Proof of Success

“Trust, but verify,” as Ronald Reagan would say. Tests need to run successfully in order to pass review, and proof of success (such as a log or a screenshot) must be attached to the review. In the best case, this means something green (or perhaps blue for Jenkins). However, if the product under test is not ready or has a bug, this could also mean a successful failure with proof that the critical new sections of the code were exercised. Tests should also be run in the appropriate environments, to avoid the “it-ran-fine-on-my-machine” excuse later.

#9: Typos and Bad Formatting

My previous post, Should I Reject a Code Review for Typos?, belabored this point. Typos and bad formatting reflect carelessness, cause frustration, and damage reputation. They are especially bad for Behavior-Driven Development frameworks.

#8: Hard-Coded Values

Hard-coded values often indicate hasty development. Sometimes, they aren’t a big problem, but they can cripple an automation code base’s flexibility. I always ask the following questions when I see a hard-coded value (a short sketch of one possible fix follows the list):

  • Should this be a shared constant?
  • Should this be a parameterized value for the method/function/step using it?
  • Should this be passed into the test as an external input (such as from a config file or the command line)?
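
Here is a hypothetical before-and-after sketch. The URL, config file name, and the “browser” fixture are invented for illustration; the browser fixture is assumed to be defined elsewhere, such as in a conftest.py:

import json
import pytest

# Before: the environment URL is hard-coded into the test.
def test_login_page_title_hardcoded(browser):
    browser.get("https://staging.example.com/login")
    assert browser.title == "Login"

# After: the URL comes from an external config file via a shared fixture,
# so the same test can run against any environment.
@pytest.fixture(scope="session")
def config():
    with open("config.json") as config_file:  # file name is an assumption for this sketch
        return json.load(config_file)

def test_login_page_title(browser, config):
    browser.get(config["base_url"] + "/login")
    assert browser.title == "Login"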

#7: Incorrect Test Coverage

It is surprisingly common to see an automated test that doesn’t actually cover the intended test steps. A step from the test procedure may be missing, or an assertion may yield a false positive. Sometimes, assertions may not even be performed! When reviewing tests, keep the original test procedure handy, and watch out for missing coverage.

#6: Inadequate Documentation

Documentation is vital for good testing and good maintenance. When a test fails, the documentation it provides (both in the logs it prints and in its own code) significantly assists triage. Automated test cases should read like test procedures. This is one reason why self-documenting behavior-driven test frameworks are so popular. Even without BDD, test automation should be flush with comments and self-documenting identifiers. If I cannot understand a test by skimming its code in a code review, then I ask questions, and when the author provides answers, I ask them to add their answers as comments to the code.

#5: Poor Code Placement

Automation projects tend to grow fast. Along with new tests, new shared code like page objects and data models are added all the time. Maintaining a good, organized structure is necessary for project scalability and teamwork. Test cases should be organized by feature area. Common code should be abstracted from test cases and put into shared libraries. Framework-level code for things like inputs and logging should be separated from test-level code. If code is put in the wrong place, it could be difficult to find or reuse. It could also create a dependency nightmare. For example, non-web tests should not have a dependency on Selenium WebDriver. Make sure new code is put in the right place.

#4: Bad Config Changes

Even the most seemingly innocuous configuration tweak can have huge impacts:

  • A username change can cause tests to abort setup.
  • A bad URL can direct a test to the wrong site.
  • Committing local config files to version control can cause other teammates’ local projects to fail to build.
  • Changing test input values may invalidate test runs.
  • One time, I brought down a whole continuous integration pipeline by removing one dependency.

As a general rule, submit any config changes in a separate code review from other changes, and provide a thorough explanation to the reviewers for why the change is needed. Any time I see unusual config changes, I always call them out.

#3: Framework Hacks

A framework is meant to help engineers automate tests. However, sometimes the framework may also be a hindrance. Rather than improve the framework design, many engineers will try to hack around the framework. Sometimes, the framework may already provide the desired feature! I’ve seen this very commonly with dependency injection – people just don’t know how to use it. Hacks should be avoided because test automation projects need a strong overall design strategy.

#2: Brittleness

Test automation must be robust enough to handle bumps in the road. However, test logic is not always written to handle slightly unexpected cases. Here are a few examples of brittleness to watch out for in review, followed by a sketch of one common mitigation:

  • Do test cases have adequate cleanup routines, even when they crash?
  • Are all exceptions handled properly, even unexpected ones?
  • Is Selenium WebDriver always disposed?
  • Will SSH connections be automatically reconnected if dropped?
  • Are XPaths too loose or too strict?
  • Is a REST API response code of 201 just as good as 200?
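
For instance, here is a minimal sketch (pytest with Selenium WebDriver; the site and assertion are hypothetical) of making WebDriver disposal robust: the fixture’s cleanup runs even if the test body crashes.

import pytest
from selenium import webdriver

@pytest.fixture
def browser():
    driver = webdriver.Chrome()
    try:
        yield driver
    finally:
        driver.quit()  # always disposed, even after exceptions or failures

def test_home_page_loads(browser):
    browser.get("https://example.com")  # hypothetical site under test
    assert "Example" in browser.title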

#1: Duplication

Duplication is the #1 problem for test automation. I wrote a whole article about it: Why is Automation Full of Duplicate Code? Many testing operations are inherently repetitive. Engineers sometimes just copy-paste code blocks, rather than seek existing methods or add new helpers, to save development time. Plus, it can be difficult to find reusable parts that meet immediate needs in a large code base. Nevertheless, good code reviews should catch code redundancy and suggest better solutions.
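
As a hypothetical sketch of the fix, repeated login steps copied into many tests can be collapsed into one shared helper. The module name and element IDs are invented for illustration:

# pages/login.py (hypothetical shared module)
from selenium.webdriver.common.by import By

def log_in(browser, base_url, username, password):
    # One reusable helper instead of the same four lines pasted into every test.
    browser.get(base_url + "/login")
    browser.find_element(By.ID, "username").send_keys(username)
    browser.find_element(By.ID, "password").send_keys(password)
    browser.find_element(By.ID, "login-button").click()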

 

Please let me know in the comments section if there are any other specific things you look for when reviewing test automation code!

Gotta Catch ’em All!

How can lessons from Pokémon apply to software testing and automation?

It’s no secret that I’m a lifelong Nintendo fanboy, and one of my favorite game franchises is Pokémon. Since Christmas, I have been playing the latest installment in the series, Pokémon Moon. The basic gameplay in all Pokémon games is to capture “pocket monsters” in the wild and train them for competitive battles. The main quest is to become the Pokémon League Champion by defeating the strongest trainers in the land. However, as any child of the ’90s will recall, the other major goal of the game is to catch all species of Pokémon. With 300 unique species in the latest installment, that’s no small feat. I can proudly say that I caught ’em all in Pokémon Moon.

It may seem strange to talk about video games on a professional blog, but I see five major parallels between catching Pokémon and my career in software quality and automation.

#1: As QA, We Gotta Catch ’em All

It is the quality engineer’s job to find and resolve all software problems: bugs, defects, design flaws, bad code check-ins, test failures, environment instabilities, deployment hiccups, and even automation crashes. We get paid to make sure things are good. And if we don’t catch a problem, then we haven’t done our jobs right.

#2: Coverage is Key to Success

One of the reasons to catch new Pokémon is not just to make the game’s Pokémon Professor happy for “scientific research,” but moreover to find stronger monsters to assist with the quest. Likewise, more test coverage means more problems discovered, which assists our quest to guarantee product quality. Our goal should be to achieve as close to complete test coverage as reasonably possible. When necessary, we should use risk-based approaches to smartly minimize test gaps as well. And our tests should be strong enough to legitimately exercise the features under test.

#3: Automating Tests Takes Time

As my trainer passport shows, it took me over 100 hours of gameplay to catch all 300 Pokémon. That’s a serious time investment. Test automation is the same way: it takes time to automate tests properly. It requires a robust, scalable, and extendable framework upon which to build test cases. Test automation is a software product in its own right that requires the same best practices and discipline as the product it tests. Software teams must allocate resources to its development and maintenance. It’s not as simple as just “writing test scripts.” However, when done right, the investment pays off.

#4: Not All Tests are Equal

Test metrics like “X passed and Y failed” or “N% of tests automated” can be very misleading because they do not account for differences between tests. Some tests have more coverage than others. Some require more time to run. Some require more time to automate. For example, 100 tests for feature A may take a day to automate and run in 3 minutes, while 5 tests for feature B may take a full week to automate and run in 3 hours. Yet, feature A may still be more important. All Pokémon are likewise not created equal. Some are simply given to you (like Rowlet, Litten, and Popplio), while others take hours of searching to find (like Castform) or are simply too tough to capture without a hard fight (like the Ultra Beasts). Be mindful of test differences for planning, execution, and reporting.

#5: Never Leave Work Incomplete

As a completionist, I would not consider my Pokémon adventure complete without a full Pokédex. There is great satisfaction in accomplishing the full measure of a goal. The same thing goes for testing and automation: my job is not done until all tests are automated and all lines are green. At times, it may be easy to give up because that “Ultra Beast” just won’t cooperate, but it’s the job to catch it. Always complete the ‘dex; always complete the job.

Pokédex Proof

Here’s the proof that I caught ’em all:

Complete Aloladex

The “Pokédex” is a device that indexes all of the Pokémon captured by a trainer. Mine is 100% complete!

Trainer Stamp

Here’s the stamp in my trainer passport to prove it!

 

If you see any more parallels between Pokémon and QA, please add them to the comments section below!

Python Testing 101: pytest

Overview

pytest is an awesome Python test framework. According to its homepage:

pytest is a mature full-featured Python testing tool that helps you write better programs.

Pytests may be written either as functions or as methods in classes – unlike unittest, which forces tests to be inside classes. Test classes must be named “Test*”, and test functions/methods must be named “test_*”. Test classes also need not inherit from unittest.TestCase or any other base class. Thus, pytests tend to be more concise and more Pythonic. pytest can also run unittest and nose tests.

pytest also provides many advanced features, such as fixtures for setup and cleanup, test parametrization, custom markers, and a rich plugin ecosystem.

pytest is actively supported for both Python 2 and 3.

Installation

Use pip to install the pytest module. Optionally, install other plugins as well.

pip install pytest
pip install pytest-cov
pip install pytest-xdist
pip install pytest-bdd

Project Structure

The modules containing pytests should be named “test_*.py” or “*_test.py”. While the pytest discovery mechanism can find tests anywhere, pytests must be placed into separate directories from the product code packages. These directories may either be under the project root or under the Python package. However, the pytest directories must not be Python packages themselves, meaning that they should not have “__init__.py” files. (My recommendation is to put all pytests under “[project root]/tests”.) Test configuration may be added to configuration files, which may go by the names “pytest.ini”, “tox.ini”, or “setup.cfg”.

[project root directory]
|-- [product code packages]
|-- [test directories]
|   |-- test_*.py
|   `-- *_test.py
`-- [pytest.ini|tox.ini|setup.cfg]

Example Code

An example project named example-py-pytest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-pytest
|-- com.automationpanda.example
|   |-- __init__.py
|   |-- calc_class.py
|   `-- calc_func.py
|-- tests
|   |-- test_calc_class.py
|   `-- test_calc_func.py
|-- README.md
`-- pytest.ini

The pytest.ini file is simply a configuration file stub. Feel free to add contents for local testing needs.


# Add pytest options here
[pytest]


The com.automationpanda.example.calc_func module contains basic math functions.


def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    return a * 1.0 / b

def maximum(a, b):
    return a if a >= b else b

def minimum(a, b):
    return a if a <= b else b


The calc_func tests located in tests/test_calc_func.py are written as functions. Test functions are preferable to test classes when testing functions without side effects.


import pytest
from com.automationpanda.example.calc_func import *

NUMBER_1 = 3.0
NUMBER_2 = 2.0

def test_add():
    value = add(NUMBER_1, NUMBER_2)
    assert value == 5.0

def test_subtract():
    value = subtract(NUMBER_1, NUMBER_2)
    assert value == 1.0

def test_subtract_negative():
    value = subtract(NUMBER_2, NUMBER_1)
    assert value == -1.0

def test_multiply():
    value = multiply(NUMBER_1, NUMBER_2)
    assert value == 6.0

def test_divide():
    value = divide(NUMBER_1, NUMBER_2)
    assert value == 1.5

The divide-by-zero test uses pytest.raises:


def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError) as e:
        divide(NUMBER_1, 0)
    assert "division by zero" in str(e.value)

And the min/max tests use parameterization:


@pytest.mark.parametrize("a,b,expected", [
    (NUMBER_1, NUMBER_2, NUMBER_1),
    (NUMBER_2, NUMBER_1, NUMBER_1),
    (NUMBER_1, NUMBER_1, NUMBER_1),
])
def test_maximum(a, b, expected):
    assert maximum(a, b) == expected

@pytest.mark.parametrize("a,b,expected", [
    (NUMBER_1, NUMBER_2, NUMBER_2),
    (NUMBER_2, NUMBER_1, NUMBER_2),
    (NUMBER_2, NUMBER_2, NUMBER_2),
])
def test_minimum(a, b, expected):
    assert minimum(a, b) == expected

The com.automationpanda.example.calc_class module contains the Calculator class, which uses the math functions from calc_func. Keeping the functional spirit, the private _do_math method takes in a reference to the math function for greater code reusability.


from com.automationpanda.example.calc_func import *

class Calculator(object):

    def __init__(self):
        self._last_answer = 0.0

    @property
    def last_answer(self):
        return self._last_answer

    def _do_math(self, a, b, func):
        self._last_answer = func(a, b)
        return self.last_answer

    def add(self, a, b):
        return self._do_math(a, b, add)

    def subtract(self, a, b):
        return self._do_math(a, b, subtract)

    def multiply(self, a, b):
        return self._do_math(a, b, multiply)

    def divide(self, a, b):
        return self._do_math(a, b, divide)

    def maximum(self, a, b):
        return self._do_math(a, b, maximum)

    def minimum(self, a, b):
        return self._do_math(a, b, minimum)


While tests for the Calculator class could be written using a test class, pytest test functions are just as capable. Fixtures enable a more fine-tuned setup/cleanup mechanism than the typical xUnit-like methods found in test classes. Fixtures can also be used in conjunction with parameterized methods. The tests/test_calc_class.py module is very similar to tests/test_calc_func.py and shows how to use fixtures for testing a class.


import pytest
from com.automationpanda.example.calc_class import Calculator

# "Constants"
NUMBER_1 = 3.0
NUMBER_2 = 2.0

# Fixtures
@pytest.fixture
def calculator():
    return Calculator()

# Helpers
def verify_answer(expected, answer, last_answer):
    assert expected == answer
    assert expected == last_answer

# Test Cases
def test_last_answer_init(calculator):
    assert calculator.last_answer == 0.0

def test_add(calculator):
    answer = calculator.add(NUMBER_1, NUMBER_2)
    verify_answer(5.0, answer, calculator.last_answer)

def test_subtract(calculator):
    answer = calculator.subtract(NUMBER_1, NUMBER_2)
    verify_answer(1.0, answer, calculator.last_answer)

def test_subtract_negative(calculator):
    answer = calculator.subtract(NUMBER_2, NUMBER_1)
    verify_answer(-1.0, answer, calculator.last_answer)

def test_multiply(calculator):
    answer = calculator.multiply(NUMBER_1, NUMBER_2)
    verify_answer(6.0, answer, calculator.last_answer)

def test_divide(calculator):
    answer = calculator.divide(NUMBER_1, NUMBER_2)
    verify_answer(1.5, answer, calculator.last_answer)

def test_divide_by_zero(calculator):
    with pytest.raises(ZeroDivisionError) as e:
        calculator.divide(NUMBER_1, 0)
    assert "division by zero" in str(e.value)

@pytest.mark.parametrize("a,b,expected", [
    (NUMBER_1, NUMBER_2, NUMBER_1),
    (NUMBER_2, NUMBER_1, NUMBER_1),
    (NUMBER_1, NUMBER_1, NUMBER_1),
])
def test_maximum(calculator, a, b, expected):
    answer = calculator.maximum(a, b)
    verify_answer(expected, answer, calculator.last_answer)

@pytest.mark.parametrize("a,b,expected", [
    (NUMBER_1, NUMBER_2, NUMBER_2),
    (NUMBER_2, NUMBER_1, NUMBER_2),
    (NUMBER_2, NUMBER_2, NUMBER_2),
])
def test_minimum(calculator, a, b, expected):
    answer = calculator.minimum(a, b)
    verify_answer(expected, answer, calculator.last_answer)

Personally, I prefer to write pytests as functions because they are usually cleaner and more flexible than classes. Plus, test functions appeal to my affinity for functional programming.

Test Launch

Basic Test Execution

pytest has a very powerful command line for launching tests. Simply run the pytest module from within the project root directory, and pytest will automatically discover tests.

# Find and run all pytests from the current directory
python -m pytest

# Run pytests under a given path
python -m pytest [test-path]

# Run pytests in a specific module
python -m pytest tests/test_calc_func.py

# Generate JUnit-style XML test reports
python -m pytest --junitxml=[path-to-file]

# Get command help
python -m pytest -h

The terminal output looks like this:

python -m pytest
=============================== test session starts ===============================
platform darwin -- Python 2.7.13, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: cov-2.4.0
collected 25 items

tests/test_calc_class.py .............
tests/test_calc_func.py ............

============================ 25 passed in 0.11 seconds ============================

pytest also provides shorter “pytest” and “py.test” commands that may be run instead of the longer “python -m pytest” module form. However, the shorter commands do not append the current path to PYTHONPATH, meaning modules under test may not be importable. Make sure to update PYTHONPATH before using the shorter commands.

# Update the Python path
export PYTHONPATH=$PYTHONPATH:.

# Discover and run tests using the shorter command
pytest

Code Coverage

To run code coverage with the pytest-cov plugin module, use the following command. The report types are optional, but all four types are shown below. Specific paths for each report may be appended using “:”.

# Run tests with code coverage
python -m pytest [test-path] [other-options] \
      --cov=[source-path] \
      --cov-report=annotate \
      --cov-report=html \
      --cov-report=term \
      --cov-report=xml

Code coverage output on the terminal (“term” cov-report) looks like this:

python -m pytest --cov=com --cov-report=term
============================= test session starts ==============================
platform darwin -- Python 3.6.5, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: cov-2.4.0
collected 25 items 

tests/test_calc_class.py .............
tests/test_calc_func.py ............

---------- coverage: platform darwin, python 3.6.5-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
com/__init__.py                                 0      0   100%
com/automationpanda/__init__.py                 0      0   100%
com/automationpanda/example/__init__.py         0      0   100%
com/automationpanda/example/calc_class.py      21      0   100%
com/automationpanda/example/calc_func.py       12      0   100%
---------------------------------------------------------------
TOTAL                                          33      0   100%

========================== 25 passed in 0.11 seconds ===========================

Parallel Testing

Parallel testing is vital for more intense testing, such as web testing. The pytest-xdist plugin makes it possible both to scale up tests by running more than one test process and to scale out by running tests on other machines. (As a prerequisite, machines need rsync and SSH.) The command below shows how to run multiple test sub-processes; refer to the official documentation for multi-machine setup.

python -m pytest -n 4
============================= test session starts ==============================
platform darwin -- Python 3.6.5, pytest-3.0.6, py-1.4.32, pluggy-0.4.0
rootdir: /Users/andylpk247/Programming/automation-panda/python-testing-101/example-py-pytest, inifile: pytest.ini
plugins: xdist-1.22.2, forked-0.2, cov-2.4.0
gw0 [25] / gw1 [25] / gw2 [25] / gw3 [25]
scheduling tests via LoadScheduling
.........................
========================== 25 passed in 1.30 seconds ===========================

Pros and Cons

I’ll say it again: pytest is awesome. It is a powerful test framework with many features, yet its tests are concise and readable. It is very popular and actively supported for both versions of Python. It can handle testing at the unit, integration, and end-to-end levels. It can also be extended with plugins, notably ones for code coverage, parallel execution, and BDD. The only challenge with pytest is that advanced features (namely fixtures) have a learning curve.

My recommendation is to use pytest for standard functional testing in Python. It is one of the best and most popular test frameworks available, and it beats the pants off of alternatives like unittest and nose. pytest is my go-to Python framework, period.

This article is meant to be an introduction. Check out Python Testing with pytest by Brian Okken for deeper study.

 

Update: On 4/21/2018, I added pytest-xdist and pytest-bdd plugins, and I made some cosmetic changes.

Update: On 7/29/2018, I added the book recommendation for “Python Testing with pytest.”

Python Testing 101: doctest

Overview

doctest is a rather unique Python test framework: it turns documented Python statements into test cases. Doctests may be written in two places:

  1. Directly in the docstrings of the module under test
  2. In separate text files (potentially for better organization)

The doctest module searches for examples (lines starting with “>>>”) and runs them as if they were interactive sessions entered into a Python shell. The subsequent lines, until the next “>>>” or a blank line, contain the expected output. A doctest will fail if the actual output does not match the expected output verbatim (e.g., string equality).

The doctest module is included out-of-the-box with Python 2 and 3. Like unittest, it can generate XML reports using unittest-xml-reporting.

Installation

doctest does not need any special installation because it comes with Python. However, the unittest-xml-reporting module may be installed with pip if needed:

> pip install unittest-xml-reporting

Project Structure

When doctests are embedded into docstrings, no structural differences are needed. However, if doctests are written as separate text files, then text files should be put under a directory named “doctests”. It may be prudent to create subdirectories that align with the Python package names. Doctest text files should be named after the modules they cover.

[project root directory]
|‐‐ [product code packages]
`-- doctests (?)
    `-- test_*.txt (?)

Example Code

An example project named example-py-doctest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-doctest
|-- com.automationpanda.example
|   |-- __init__.py
|   |-- calc_class.py
|   `-- calc_func.py
|-- doctests
|   `-- test_calc_class.txt
`-- README.md

The com.automationpanda.example.calc_func module contains doctests embedded in the docstrings of math functions, alongside other comments:


def add(a, b):
    """
    Adds two numbers.

    >>> add(3, 2)
    5
    """
    return a + b

def subtract(a, b):
    """
    Subtracts two numbers.

    >>> subtract(3, 2)
    1
    >>> subtract(2, 3)
    -1
    """
    return a - b

def multiply(a, b):
    """
    Multiplies two numbers.

    >>> multiply(3, 2)
    6
    """
    return a * b

def divide(a, b):
    """
    Divides two numbers.
    Automatically raises ZeroDivisionError.

    >>> divide(3.0, 2.0)
    1.5
    >>> divide(1.0, 0)
    Traceback (most recent call last):
    ZeroDivisionError: float division by zero
    """
    return a * 1.0 / b

def maximum(a, b):
    """
    Finds the maximum of two numbers.

    >>> maximum(3, 2)
    3
    >>> maximum(2, 3)
    3
    >>> maximum(3, 3)
    3
    """
    return a if a >= b else b

def minimum(a, b):
    """
    Finds the minimum of two numbers.

    >>> minimum(3, 2)
    2
    >>> minimum(2, 3)
    2
    >>> minimum(2, 2)
    2
    """
    return a if a <= b else b


On the other hand, the com.automationpanda.example.calc_class module contains a Calculator class without doctests in docstrings:


class Calculator(object):

    def __init__(self):
        self._last_answer = 0.0

    @property
    def last_answer(self):
        return self._last_answer

    def add(self, a, b):
        self._last_answer = a + b
        return self.last_answer

    def subtract(self, a, b):
        self._last_answer = a - b
        return self.last_answer

    def multiply(self, a, b):
        self._last_answer = a * b
        return self.last_answer

    def divide(self, a, b):
        # automatically raises ZeroDivisionError
        self._last_answer = a * 1.0 / b
        return self.last_answer

    def maximum(self, a, b):
        self._last_answer = a if a >= b else b
        return self.last_answer

    def minimum(self, a, b):
        self._last_answer = a if a <= b else b
        return self.last_answer


Its doctests are located in a separate text file at doctests/test_calc_class.txt:


The "com.automationpanda.example.calc_class" module
=====================================================
>>> from com.automationpanda.example.calc_class import Calculator
>>> calc = Calculator()
>>> calc.add(3, 2)
5
>>> calc.subtract(3, 2)
1
>>> calc.subtract(2, 3)
-1
>>> calc.multiply(3, 2)
6
>>> calc.divide(3.0, 2.0)
1.5
>>> calc.divide(1.0, 0)
Traceback (most recent call last):
ZeroDivisionError: float division by zero
>>> calc.maximum(3, 2)
3
>>> calc.maximum(2, 3)
3
>>> calc.maximum(3, 3)
3
>>> calc.minimum(3, 2)
2
>>> calc.minimum(2, 3)
2
>>> calc.minimum(2, 2)
2

Doctests are run in the order in which they are written. The examples above align functions with docstrings and classes with text files, but this is not required. Functions may have doctests in separate text files, and classes may have doctests embedded in method docstrings. Additional tricks are documented online.

Test Launch

To launch tests from the command line, change directory to the project root directory and run the doctest module directly from the python command. Note that doctests use file paths, not module names.

# Run doctests embedded as docstrings
> python -m doctest com/automationpanda/example/calc_func.py

# Run doctests written in separate text files
> python -m doctest doctests/test_calc_class.txt

When doctests run successfully, they don’t print any output! This may be surprising to a first-time user, but no news is good news. However, to force output, include the “-v” option:

# Run doctests with verbose output to print successes as well as failures
> python -m doctest -v com/automationpanda/example/calc_func.py
> python -m doctest -v doctests/test_calc_class.txt

Output should look something like this:

> python -m doctest -v doctests/test_calc_class.txt 
Trying:
    from com.automationpanda.example.calc_class import Calculator
Expecting nothing
ok
Trying:
    calc = Calculator()
Expecting nothing
ok
Trying:
    calc.add(3, 2)
Expecting:
    5
ok
...

1 items passed all tests:
  14 tests in test_calc_class.txt
14 tests in 1 items.
14 passed and 0 failed.
Test passed.

Doctests can also generate XML reports using unittest-xml-reporting. Follow the same instructions given for unittest. Furthermore, doctests can integrate with unittest discovery, so that test suites can run together.
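
Here is a minimal sketch of that integration, based on the example project above. The file name and location are my assumption; the load_tests hook is part of the standard unittest protocol:

# test_doctests.py at the project root (hypothetical file)
import doctest
import unittest

from com.automationpanda.example import calc_func

def load_tests(loader, tests, ignore):
    # Embedded docstring examples become unittest test cases.
    tests.addTests(doctest.DocTestSuite(calc_func))
    # The separate doctest text file can be included as well.
    tests.addTests(doctest.DocFileSuite("doctests/test_calc_class.txt"))
    return tests

if __name__ == "__main__":
    unittest.main()

With this in place, running “python -m unittest discover” from the project root would pick up the doctests along with any regular unittest cases.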

Pros and Cons

doctest has many positive aspects. It is very simple yet powerful, and it has practically no learning curve. Since the doctest module comes with Python out of the box, no extra dependencies are required. It integrates nicely with unittest. Tests can be written in-line with the code, providing not only verification tests but also examples for the reader. And if in-line tests are deemed too messy, they can be moved to separate text files.

However, doctest has limitations. First of all, doctests are not independent: Python commands run sequentially and build upon each other. Thus, doctests may not be run individually, and side effects from one example may affect another. doctest also lacks many features of advanced frameworks, including hooks, assertions, tracing, discovery, replay, and advanced reporting. Theoretically, many of these things could be put into doctests, but they would be inelegantly jury-rigged. Long doctests become cumbersome. Furthermore, console output string-matching is not a robust assertion method. Silent Python statements that do not return a value or print output cannot be legitimately tested. Programmers can easily mistype expected output. Output format might also change in future development, or it may be nondeterministic (like for timestamps).

My main recommendation is this: use doctest for small needs but not big needs. doctest would be a good option for small tools and scripts that need spot checks instead of intense testing. It is also well suited for functional programming testing, in which expressions do not have side effects. Doctests should also be used to provide standard examples in docstrings wherever possible, in conjunction with other tests. Rich documentation is wonderful, and working examples can be a godsend. However, serious testing needs a serious framework, such as pytest or behave.

Python Testing 101: unittest

Overview

unittest is the standard Python unit testing framework. Inspired by JUnit, it is included with the standard CPython distribution. unittest provides a base class named TestCase, which provides methods for assertions and setup/cleanup routines. All test case classes must inherit from TestCase. Each method in a TestCase subclass whose name starts with “test” will be run as a test case. Tests can be grouped and loaded using the TestSuite class and load methods, which together can build custom test runners. unittest can also generate XML reports (like JUnit) using unittest-xml-reporting.

unittest is supported in both Python 2 and 3. However, use the unittest2 backport for versions earlier than Python 2.7.

Installation

Basic unittest does not need any special installation because it comes with Python. However, additional modules may be installed with pip if you need them:

> pip install unittest2
> pip install unittest-xml-reporting

Project Structure

Product code modules and unittest test code modules should be placed into separate Python packages within the same project. Test modules must be named “test_*.py” and must be put into packages in order for discovery to work when launching tests. Remember, a Python package is simply a directory with a file named “__init__.py”.

[project root directory]
|‐‐ [product code packages]
`‐‐ tests
    |‐‐ __init__.py
    `‐‐ test_*.py

Example Code

An example project named example-py-unittest is located in my GitHub python-testing-101 repository. The project has the following structure:

example-py-unittest
|-- com.automationpanda.example
|   |-- __init__.py
|   `-- calc.py
|-- com.automationpanda.tests
|   |-- __init__.py
|   `-- test_calc.py
`-- README.md

The com.automationpanda.example.calc module contains a Calculator class with basic math methods:


class Calculator(object):

    def __init__(self):
        self._last_answer = 0.0

    @property
    def last_answer(self):
        return self._last_answer

    def add(self, a, b):
        self._last_answer = a + b
        return self.last_answer

    def subtract(self, a, b):
        self._last_answer = a - b
        return self.last_answer

    def multiply(self, a, b):
        self._last_answer = a * b
        return self.last_answer

    def divide(self, a, b):
        # automatically raises ZeroDivisionError
        self._last_answer = a * 1.0 / b
        return self.last_answer


The com.automationpanda.tests.test_calc module contains a unittest.TestCase subclass, shown below. The test class uses the setUp method to construct a Calculator object, which each test method uses. The assertion methods used are assertEqual and assertRaises. A fresh instance of CalculatorTest is instantiated for every test method run.


import unittest

from com.automationpanda.example.calc import Calculator

NUMBER_1 = 3.0
NUMBER_2 = 2.0
FAILURE = 'incorrect value'

class CalculatorTest(unittest.TestCase):

    def setUp(self):
        self.calc = Calculator()

    def test_last_answer_init(self):
        value = self.calc.last_answer
        self.assertEqual(value, 0.0, FAILURE)

    def test_add(self):
        value = self.calc.add(NUMBER_1, NUMBER_2)
        self.assertEqual(value, 5.0, FAILURE)
        self.assertEqual(value, self.calc.last_answer, FAILURE)

    def test_subtract(self):
        value = self.calc.subtract(NUMBER_1, NUMBER_2)
        self.assertEqual(value, 1.0, FAILURE)
        self.assertEqual(value, self.calc.last_answer, FAILURE)

    def test_subtract_negative(self):
        value = self.calc.subtract(NUMBER_2, NUMBER_1)
        self.assertEqual(value, -1.0, FAILURE)
        self.assertEqual(value, self.calc.last_answer, FAILURE)

    def test_multiply(self):
        value = self.calc.multiply(NUMBER_1, NUMBER_2)
        self.assertEqual(value, 6.0, FAILURE)
        self.assertEqual(value, self.calc.last_answer, FAILURE)

    def test_divide(self):
        value = self.calc.divide(NUMBER_1, NUMBER_2)
        self.assertEqual(value, 1.5, FAILURE)
        self.assertEqual(value, self.calc.last_answer, FAILURE)

    def test_divide_by_zero(self):
        self.assertRaises(ZeroDivisionError, self.calc.divide, NUMBER_1, 0)


Test Launch

To launch tests from the command line, change directory to the project root directory and run the unittest module directly from the python command:

# Discover and run all tests in the project
> python -m unittest discover

# Run all tests in the given module
> python -m unittest com.automationpanda.tests.test_calc

# Run all tests in the given test class
> python -m unittest com.automationpanda.tests.test_calc.CalculatorTest

# Run all tests in the given Python file (useful for path completion)
> python -m unittest com/automationpanda/tests/test_calc.py

Test output should look like this:

> python -m unittest discover
.......
----------------------------------------------------------------------
Ran 7 tests in 0.002s

OK

In order to generate XML reports, install unittest-xml-reporting and add the following “main” logic to the bottom of the test case module. The example below will generate the XML report into a directory named “test-reports”.


if __name__ == '__main__':
    import xmlrunner
    unittest.main(
        testRunner=xmlrunner.XMLTestRunner(output='test-reports'),
        failfast=False,
        buffer=False,
        catchbreak=False)


Then, run the test module directly from the command line:

# Run the test module directly
# Do this whenever "main" logic is written to run a test
# Examples: XML results file, custom test suites
> python -m com.automationpanda.tests.test_calc

Pros and Cons

unittest is “Old Reliable”. It is included out-of-the-box with Python, and it provides a basic, universal test class. Many other test frameworks are compatible with unittest. However, unittest is somewhat clunky: it forces class inheritance instead of allowing plain functions as test cases, and the OOP style feels less Pythonic. Test parameterization is also limited – there is no built-in equivalent of a parametrize decorator, although subTest (Python 3.4+) can approximate it.
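
For illustration, here is a minimal sketch (not part of the example project) showing how subTest can approximate parameterization within a single test method:

import unittest


class AdditionTest(unittest.TestCase):

    def test_add_pairs(self):
        # Each tuple is (a, b, expected sum); subTest reports failures per case
        cases = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)]
        for a, b, expected in cases:
            with self.subTest(a=a, b=b):
                self.assertEqual(a + b, expected)


if __name__ == '__main__':
    unittest.main()

Even so, this is noticeably more ceremony than a first-class parameterization feature.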

My recommendation is to use unittest if you need a basic unit test framework with no additional dependencies. Otherwise, there are better test frameworks available, such as pytest.

Python Testing 101: Introduction

Python is an amazing programming language. Loved by beginners and experts alike, it is consistently ranked as one of the most in-demand languages today. At PyData Carolinas 2016, Josh Howes, a senior data science manager at MaxPoint at the time, described Python like this (in rough paraphrase):

Python is a magical tool that easily lets you solve the world’s toughest problems.

I first touched Python back in high school more than a decade ago, but I really started using it and loving it in recent years for test automation. This 101 series will teach how to do testing in Python. This introductory post will give basic orientation, and each subsequent post will focus on a different Python test framework in depth.

Why Use Python for Testing?

As mentioned in another post, The Best Programming Language for Test Automation, Python is concise, elegant, and readable – the precise attributes needed to effectively turn test cases into test scripts. It has richly-supported test packages to deftly handle both white-box and black-box testing. It is also command-line-friendly. Engineers who have never used Python tend to learn it quickly.

The following examples illustrate ways to use Python for test automation:

  • A developer embedding quick checks into function docstrings.
  • A developer writing unit tests for a module or package.
  • A tester writing integration tests for REST APIs.
  • A tester writing end-to-end web tests using Selenium.
  • A data scientist verifying functions in a Jupyter notebook.
  • The Three Amigos writing Given-When-Then scenarios for BDD testing.

Remember, Python can be used for any black-box testing, even if the software product under test isn’t written in Python!
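
As a tiny taste of the first bullet, here is a minimal sketch of a hypothetical function (not from any project in this series) with checks embedded in its docstring; running “python -m doctest -v” against the module verifies the examples:

def square(x):
    """Return x squared.

    >>> square(3)
    9
    >>> square(-2)
    4
    """
    return x * x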

Python Version

Choosing the right Python installation is itself no small decision. For an in-depth analysis, please refer to Which Version of Python Should I Use? TL;DR:

  1. For white-box testing, use the matching Python version.
  2. For black-box testing, use CPython version 3 if not otherwise constrained.

Unless otherwise stated, this 101 series uses CPython 3.

Picking a Framework

There are so many Python test frameworks that choosing one may seem daunting – just look at the Python wiki, The Hitchhiker’s Guide to Python, and pythontesting.net. Despite choice overload, there are a few important things to consider:

  1. Consider the type of testing. Basic unit tests could be handled by unittest or even doctest, but higher-level testing would do better with other frameworks like pytest. BDD testing would require behave, lettuce, or radish.
  2. Consider the supported Python version. Python 2 and 3 are two different languages, with Python 2’s end-of-life slated for 2020. Different frameworks have different levels of version support, which could become especially problematic for white-box testing. Furthermore, some may have different features between Python versions.
  3. Consider support and development. Typically, it is best to choose mature, actively-developed frameworks for future sustainability. For example, the once-popular nose is now deprecated.

Future posts in this series will document many frameworks in detail to empower you, as the reader, to pick the best one for your needs.

Virtual Environments

A virtual environment (VE) is like a local Python installation with a specific package set. Tools like venv (Python 3.3+), virtualenv (Python 2 and 3), and Conda (Python 2 and 3; for data scientists) make it easy to create virtual environments from the command line. Pipenv goes a step further by combining VE management with simple-yet-sophisticated package management. Creating at least one separate VE for each Python project is typically a good practice. VEs are extremely useful for test automation because:

  1. VEs allow engineers to maintain multiple Python environments simultaneously.
    • Engineers can develop and test packages for both versions of Python.
    • Engineers can separate projects that rely on different package versions.
  2. VEs allow users to install Python packages locally without changing global installations.
    • Users may not have permissions to install packages globally.
    • Global changes may disrupt other dependent Python software.
  3. VEs can import and export package lists for easy reconstruction.

VEs become especially valuable in continuous integration and deployment because they can easily provide Python consistency. For example, a Jenkins build job can create a VE, install dependencies from PyPI in the VE, run Python tests, and safely tear down. Once the product under test is ready to be deployed, the same VE configuration can be used.
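
As a minimal command-line sketch (assuming CPython 3 with the built-in venv module on a UNIX-like shell; the package names are only illustrative), a typical VE workflow looks like this:

# Create a virtual environment in a directory named "venv"
> python -m venv venv

# Activate it (on Windows, run venv\Scripts\activate instead)
> source venv/bin/activate

# Install packages locally without touching the global installation
> pip install requests pytest

# Export the package list so the environment can be rebuilt elsewhere
# (rebuild later with "pip install -r requirements.txt")
> pip freeze > requirements.txt

# Leave the virtual environment when done
> deactivate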

Recommended IDEs

Any serious test automation work needs an equally serious IDE. My favorite is JetBrains PyCharm. I really like its slick interface and intuitive nature, and it provides out-of-the-box support for a number of Python test frameworks. PyCharm may be downloaded as a standalone IDE or as a plugin for JetBrains IntelliJ IDEA. The Community Edition is free and meets most automation needs, while the Professional Edition requires a license. PyDev is a nice alternative for those who prefer Eclipse. Eric satisfies the purists, being a Python IDE written in Python. While all three have a plugin framework, PyCharm and PyDev seem to have the advantage in popularity and support. There’s also the classic IDLE, but its use is strongly discouraged nowadays due to bugs and better options.

Lightweight text editors can make small edits easy and fast. Visual Studio Code is a recent favorite. Notepad++ is always a winner on Windows. Atom is a newer, cross-platform editor developed by GitHub that’s gaining popularity. Of course, UNIX platforms typically provide vim or emacs.

Framework Survey

If this series is for you, then install an IDE, set up a virtual environment, and let’s roll! The next posts will each introduce a popular Python test framework. Each post should be used as an introduction for getting started or as a quick reference. Please refer to the official framework documentation for full details – it would be imprudent for this blog to unnecessarily duplicate that information.

The outline for each post will be:

  1. Overview
  2. Installation
  3. Project Structure
  4. Example Code
  5. Test Launch
  6. Pros and Cons

 

Cucumber-JVM Global Hook Workarounds

Almost all BDD automation frameworks have some sort of hooks that run before and after scenarios. However, not all frameworks have global hooks that run once at the beginning or end of a suite of scenarios – and Cucumber-JVM is one of the unlucky few. Cucumber-JVM GitHub Issue #515, which seeks to add @BeforeAll and @AfterAll hooks, has been open and active since 2013, but it is unclear whether it will ever be resolved. Thankfully, there are some workarounds to achieve the same behavior as global hooks.

Workaround #1: Don’t Do It

From a purist’s perspective, each scenario (or test) should be completely independent, meaning it should not share parts with any other tests. Independence provides the following benefits:

  • Safety between tests
  • Consistency across tests
  • The ability to run any tests individually, in any order, or in parallel
  • More sensible, understandable tests

If not handled properly, global hooks can be dangerous because they make tests interdependent. Changes or failures in one test may cascade into others. Global test data would waste memory for tests that don’t use it. Furthermore, the fact that Issue #515 has been open for years indicates the difficulty of properly implementing global hooks.

However, the main cost of independence is runtime. Independent tests often repeat similar setup and cleanup routines. Even a few extra seconds per test can add up tremendously. Google Guava, for example, has over 286,000 tests – adding one second to each test would amount to nearly 80 hours! Performance becomes especially critical for continuous integration, in which wasted time means either delivery delays or coverage gaps. Certain operations like preparing a database or fetching authentication tokens may be pragmatic candidates for global hooks.

The best strategy is to use global hooks only when necessary for time-intensive setup that can be shared safely. Any shared test data should be immutable. Always question the need for global hooks. Most tests probably won’t need them.

Workaround #2: Static Variables

A basic hack for global hooks is actually provided in Issue #515. A static Boolean flag can indicate whether the @Before hook has already run, because static state is not “reset” when a new scenario re-instantiates the step definition classes. The runtime shutdown hook will be called once all tests are done and the program exits. (Note that a static flag cannot be used in an @After hook, because there is no general way to know which scenario will run last – essentially a halting problem.) The example below is adapted from the issue, with an import and a stub afterAllThread added so the snippet is self-contained:

// Cucumber-JVM 1.x/2.x hook import; newer versions use io.cucumber.java.Before
import cucumber.api.java.Before;

public class GlobalHooks {

    private static boolean dunit = false;

    // Runs once when the JVM exits, after all scenarios have finished
    private static final Thread afterAllThread = new Thread(() -> {
        // do the afterAll stuff...
    });

    @Before
    public void beforeAll() {
        if (!dunit) {
            Runtime.getRuntime().addShutdownHook(afterAllThread);
            // do the beforeAll stuff...
            dunit = true;
        }
    }
}

Workaround #3: Singleton Caching

The basic hack is useful for simple setup and cleanup routines, but it becomes inelegant when objects must be shared by scenarios. Rather than polluting the class with static members, a singleton can cache test data between scenarios, and global setup logic may be put into the singleton’s constructor. Furthermore, if the singleton uses lazy initialization, then @Before hooks may not be needed at all. A “lazy” singleton will not be instantiated until the first time its getInstance method is called, meaning it will be skipped entirely if no scenario needs it. This is a huge advantage when selectively running scenarios by name, tag, or feature. (Please refer to the previous post, Static or Singleton, for a deeper explanation of the singleton pattern.)

Consider scenarios that must generate authentication tokens (like OAuth) for API testing. A singleton “token holder” could cache tokens for usernames, rather than doing the authorization dance for every scenario. The snippet below shows how such a singleton could be called within a @When step definition with no @Before method.

public class ExampleSteps {
    ...
    @When("^some API is called$")
    public void whenSomeApiIsCalled() {
        // Get the token from the singleton cache lazily
        String token = TokenHolder.getInstance().getToken("user", "pass");
        // Use the token to call some API (method not shown)
        callSomeApi(token);
    }
    ...
}

And the singleton class could be defined like this:

public class TokenHolder {
    private static volatile TokenHolder instance = null;
    private HashMap<String, String> tokens;

    private TokenHolder() {
        tokens = new HashMap<String, String>();
    }

    public static TokenHolder getInstance() {
        // Lazy and thread-safe
        if (instance == null) {
            synchronized(TokenHolder.class) {
                if (instance == null) {
                    instance = new TokenHolder();
                }
            }
        }

        return instance;
    }
    
    public String getToken(String username, String password) {
        // This check could be extended to handle token expiration
        if (!tokens.containsKey(username)) {
            // Request a fresh authentication token (method not shown)
            String token = requestToken(username, password);
            // Cache the token for later
            tokens.put(username, token);
        }
        
        return tokens.get(username);
    }
    
    ...
}

Workaround #4: JUnit Class Annotations

Another workaround mentioned in Issue #515 and elsewhere is to use JUnit’s @BeforeClass and @AfterClass annotations in the runner class, like this:

@RunWith(Cucumber.class)
@Cucumber.Options(format = {
    "html:target/cucumber-html-report",
    "json-pretty:target/cucumber-json-report.json"})
public class RunCukesTest {

    @BeforeClass
    public static void setup() {
        System.out.println("Ran the before");
    }

    @AfterClass
    public static void teardown() {
        System.out.println("Ran the after");
    }
}

While @BeforeClass and @AfterClass may look like the cleanest solution at first, they are not very practical to use. They work only when Cucumber-JVM is set to use the JUnit runner; other runners, like TestNG, the command line runner, and special IDE runners, won’t pick up these hooks. Their methods must also be static, so they would need static variables or singletons to share test data anyway. Therefore, I personally discourage using these annotations in Cucumber-JVM.

What About Dependency Injection?

Dependency injection is a marvelous technique. As defined by Wikipedia:

In software engineering, dependency injection is a technique whereby one object supplies the dependencies of another object. A dependency is an object that can be used (a service). An injection is the passing of a dependency to a dependent object (a client) that would use it. The service is made part of the client’s state. Passing the service to the client, rather than allowing a client to build or find the service, is the fundamental requirement of the pattern.

Dependency injection can be a powerful alternative to singletons because DI provides finer control over the scope of objects. However, Cucumber-JVM’s dependency injection cannot be applied with global hooks because dependency objects, like step definition objects, are constructed and destroyed for each scenario.

Comparison Table

Ultimately, the best approach for global hooks in Cucumber-JVM is the one that best fits the tests’ needs. Below is a comparison of the workarounds to make the tradeoffs easier to see.

Don’t Do It
  • Pros: Scenarios are completely independent. No complicated or risky workarounds.
  • Cons: Repeated setup and cleanup procedures may add significant execution time.

Static Variables
  • Pros: Simple yet effective implementation.
  • Cons: May need many static variables to share test data.

Singleton Caching
  • Pros: Abstracts test data and setup procedures. Easily handles lazy initialization and evaluation. May not need a @Before hook.
  • Cons: More complicated design.

JUnit Class Annotations
  • Pros: Clean look for basic setup and cleanup routines.
  • Cons: May be used only with the JUnit runner. Requires static variables or singletons to share test data anyway.