Python

Better automation is one of BDD’s hallmark benefits. In fact, the main goal of BDD could be summarized as rapidly turning conceptualized behavior into automatically tested behavior. While the process and the Gherkin are universal, the underlying automation could be built using one of many frameworks.

This post explains how BDD automation frameworks work. It focuses on the general structure of the typical framework – it is not a tutorial on how to use any specific framework. However, I wrote short examples for each piece using Python’s behave framework, since learning is easier with examples. I chose to use Python here simply for its conciseness. (Check the Automation Panda BDD page for the full table of contents.)

Framework Parts

Every BDD automation framework has five major pieces:

#1: Feature Files

Gherkin feature files are very much part of the automation. They act like test scripts – each scenario is essentially a test case. Previous posts covered Gherkin in depth.

Here is an example feature file named google_search.feature:

Feature: Google Searching
  As a web surfer, I want to search Google, so that I can learn new things.
  
  # This scenario should look familiar
  @automated @google-search @panda
  Scenario: Simple Google search
    Given a web browser is on the Google page
    When the search phrase "panda" is entered
    Then results for "panda" are shown

#2: Step Definitions

A step definition is a code block that implements the logic to execute a step. It is typically a method or function with the English-y step phrase as an annotation. Step definitions can take in arguments, doc strings, and step tables. They may also make assertions to pass or fail a scenario. In most frameworks, data can be passed between steps using some sort of context object. When a scenario is executed, the driver matches each scenario step phrase to its step definition. (Most frameworks use regular expressions for phrase matching.) Thus, every step in a feature file needs a step definition.

The step definitions would be written in a Python source file like this:

from behave import *

@given('a web browser is on the Google page')
def step_impl(context):
  context.google_page.load();

@when('the search phrase "{phrase}" is entered')
def step_impl(context, phrase):
  context.google_page.search(phrase)

@then('the results for "{phrase}" are shown')
def step_impl(context, phrase):
  assert context.google_page.has_results(phrase)

#3: Hooks

Certain automation logic cannot be handled by step definitions. For example, scenarios may need special setup and cleanup operations. Most BDD frameworks provide hooks that can insert calls before or after Gherkin sections, typically filterable using tags. Hooks are similar in concept to aspect-oriented programming.

In behave, hooks are written in a Python source file named environment.py:

import page_objects
from selenium import webdriver

def before_all(context):
  context.browser = webdriver.Chrome()

def before_scenario(context):
  context.google_page = page_objects.GooglePage(context.browser)

def after_all(context):
  context.browser.quit()

#4: Support Code

Support code (a.k.a libraries or packages) refers to any code called by step definitions and hooks. Support code could be dependency packages downloaded using managers like Maven (Java), NuGet (.NET), or PyPI (Python). For example, Selenium WebDriver is a well-known package for web browser automation. Support code could also be components to assist automation, such as page objects or other design patterns. As the cliché goes, “Don’t reinvent the wheel.” Step definitions and hooks should not contain all of the logic for running the actions – they should reuse common code as much as possible.

A Python page object class from the page_objects.py module could look like this:

class GooglePage(object):
  """A page object for the Google home page"""
  
  def __init__(self, browser):
    self.browser = browser
  
  def load():
    # put code here
    pass
  
  def search(phrase):
    # put code here
    pass
  
  def has_results(phrase):
    # put code here
    return False

#5: Driver

Every automation framework has a driver that runs tests, and BDD frameworks are no different. The driver executes each scenario in a feature file independently. Whenever a failure happens, the driver reports the failure and aborts the scenario. Drivers typically have discovery mechanisms for selecting scenarios to run based on tag names or file paths.

The behave driver can be launched from the command line like this:

> behave google_search.py --tags @panda

Automation Advantages

Even if a team does not apply behavior-driven practices to its full development process, BDD test frameworks still have some significant advantages over non-BDD test frameworks. First of all, steps make BDD automation very modular and thus reusable. Each step is an independent action, much like how each scenario is an independent behavior. Once a step definition is written, it may be reused by any number of scenarios. This is crucial, since most behaviors for a feature share common actions. And all steps are inherently self-documenting, since they are written in plain language. There is a natural connection between high-level behavior and low-level implementation.

Test execution also has advantages. Tags make it very easy to select tests to run, especially from the command line. Failures are very informative as well. The driver pinpoints precisely which step failed for which scenario. And since behaviors are isolated, a failure for one scenario is less likely to affect other test scenarios than would be the case for procedure-driven tests.

All of this is explained more thoroughly in the Automation Panda article, ‑‑BDD; Automation without Collaboration.

What About Test Data?

Test data is a huge concern for any automation framework. Simple test data values may be supplied directly in Gherkin as step arguments or table values, but larger test data sets require other strategies. Support code can be used to handle test data. Read BDD 101: Test Data for more information.

Available Frameworks

There are many BDD frameworks out there. The next post will introduce a few major frameworks for popular languages.

Dividing Test Layers

First of all, unit tests should always be written in the same language as the product under test. Otherwise, they would definitionally no longer be unit tests! Unit tests are white box and need direct access to the product source code. This allows them to cover functions, methods, and classes.

The question at hand pertains more to higher-layer functional tests. These tests fall into many (potentially overlapping) categories: integration, end-to-end, system, acceptance, regression, and even performance. Since they are all typically black box, higher-layer tests do not necessarily need to be written in the same language as the product under test.

My Opinionated Choices

Personally, I think Python is today’s best all-around language for test automation. Python is wonderful because its conciseness lets the programmer expressively capture the essence of the test case. It also has very rich test support packages. Check out this article: Why Python is Great for Test Automation. Java is a good choice as well – it has a rich platform of tools and packages, and continuous integration with Java is easy with Maven/Gradle/ANT and Jenkins. I’ve heard that Ruby is another good choice for reasons similar to Python, but I have not used it myself.

Some languages are good in specific domains. For example, JavaScript is great for pure web app testing (à la Jasmine, Karma, and Protractor) but not so good for general purposes (despite Node.js running anywhere). A good reason to use JavaScript for testing would be MEAN stack development. TypeScript would be even better because it is safer and scales better. C# is great for Microsoft shops and has great test support, but it lives in the Microsoft bubble. .NET development tools are not always free, and command line operations can be painful.

Other languages are poor choices for test automation. While they could be used for automation, they likely should not be used. C and C++ are inconvenient because they are very low-level and lack robust frameworks. Perl is dangerous because it simply does not provide the consistency and structure for scalable, self-documenting code. Functional languages like LISP and Haskell are difficult because they do not translate well from test case procedures. They may be useful, however, for some lower-level data testing.

8 Criteria for Evaluation

There are eight major points to consider when evaluating any language for automation. These criteria specifically assess the language from a perspective of purity and usability, not necessarily from a perspective of immediate project needs.

Usability. A good automation language is fairly high-level and should handle rote tasks like memory management. Lower learning curves are preferable. Development speed is also important for deadlines.

Elegance. The process of translating test case procedures into code must be easy and clear. Test code should also be concise and self-documenting for maintainability.

Available Test Frameworks. Frameworks provide basic needs such as fixtures, setup/cleanup, logging, and reporting. Examples include Cucumber and xUnit.

Available Packages. It is better to use off-the-shelf packages for common operations, such as web drivers (Selenium), HTTP requests, and SSH.

Powerful Command Line. A good CLI makes launching tests easy. This is critical for continuous integration, where tests cannot be launched manually.

Easy Build Integration. Build automation should launch tests and report results. Difficult integration is a DevOps nightmare.

IDE Support. Because Notepad and vim just don’t cut it for big projects.

Industry Adoption. Support is good. If the language remains popular, then frameworks and packages will be maintained well.

Below, I rated each point for a few popular languages:

Python

Java

JavaScript

C/C++

Perl

Usability

awesome

good

terrible

poor

Elegance

awesome

good

okay

good

poor

Available Test Frameworks

awesome

good

okay

poor

Available Packages

awesome

okay

good

Powerful Command Line

awesome

good

okay

poor

okay

Easy Build Integration

good

poor

IDE Support

good

awesome

good

okay

terrible

Industry Adoption

awesome

good

terrible

Conclusion

I won’t shy away from my preference for Python, but I recognize that they may not be the right choice for all situations. For example, when I worked at LexisNexis, we used C# because management wanted developers, who wrote the app in C#, to contribute to test automation.

Now, a truly nifty idea would be to create a domain-specific language for test automation, but that must be a topic for another post.

UPDATE: I changed some recommendations on 4/18/2018.

Automation Panda

A blog for software development and testing

BDD 101: Automation

Framework Parts

#1: Feature Files

#2: Step Definitions

#3: Hooks

#4: Support Code

#5: Driver

Automation Advantages

What About Test Data?

Available Frameworks

The Best Programming Language for Test Automation

Dividing Test Layers

My Opinionated Choices

8 Criteria for Evaluation

Conclusion