Perl

BDD 101: Frameworks

Every major programming language has a BDD automation framework. Some even have multiple choices. Building upon the structural basics from the previous post, this post provides a survey of the major frameworks available today. Since I cannot possibly cover every BDD framework in depth in this 101 series, my goal is to empower you, the reader, to pick the best framework for your needs. Each framework has support documentation online justifying its unique goodness and detailing how to use it, and I would prefer not to duplicate documentation. Use this post primarily as a reference. (Check the Automation Panda BDD page for the full table of contents.)

Major Frameworks

Most BDD frameworks are Cucumber versions, JBehave derivatives inspired by Dan North, or non-Gherkin spec runners. Some put behavior scenarios into separate files, while others put them directly into the source code.

C# and Microsoft .NET

SpecFlow, created by Gáspár Nagy, is arguably the most popular BDD framework for Microsoft .NET languages. Its tagline is “Cucumber for .NET” – thus fully compliant with Gherkin. SpecFlow also has polished, well-designed hookscontext injection, and parallel execution (especially with test thread affinity). The basic package is free and open source, but SpecFlow also sells licenses for SpecFlow+ extensions. The free version requires a unit test runner like MsTest, NUnit, or xUnit.net in order to run scenarios. This makes SpecFlow flexible but also feels jury-rigged and inelegant. The licensed version provides a slick runner named SpecFlow+ Runner (which is BDD-friendly) and a Microsoft Excel integration tool named SpecFlow+ Excel. Microsoft Visual Studio has extensions for SpecFlow to make development easier.

There are plenty of other BDD frameworks for C# and .NET, too. xBehave.net is an alternative that pairs nicely with xUnit.net. A major difference of xBehave.net is that scenario steps are written directly in the code, instead of in separate text (feature) files. LightBDD bills itself as being more lightweight than other frameworks and basically does some tricks with partial classes to make the code more readable. NSpec is similar to RSpec and Mocha and uses lambda expressions heavily. Concordion offers some interesting ways to write specs, too. NBehave is a JBehave descendant, but the project appears to be dead without any updates since 2014.

Java and JVM Languages

The main Java rivalry is between Cucumber-JVM and JBehave. Cucumber-JVM is the official Cucumber version for Java and other JVM languages (Groovy, Scala, Clojure, etc.). It is fully compliant with Gherkin and generates beautiful reports. The Cucumber-JVM driver can be customized, as well. JBehave is one of the first and foremost BDD frameworks available. It was originally developed by Dan North, the “father of BDD.” However, JBehave is missing key Gherkin features like backgrounds, doc strings, and tags. It was also a pure-Java implementation before Cucumber-JVM existed. Both frameworks are widely used, have plugins for major IDEs, and distribute Maven packages. This popular but older article compares the two in slight favor of JBehave, but I think Cucumber-JVM is better, given its features and support.

The Automation panda article Cucumber-JVM for Java is a thorough guide for the Cucumber-JVM framework.

Java also has a number of other BDD frameworks. JGiven uses a fluent API to spell out scenarios, and pretty HTML reports print the scenarios with the results. It is fairly clean and concise. Spock and JDave are spec frameworks, but JDave has been inactive for years. Scalatest for Scala also has spec-oriented features. Concordion also provides a Java implementation.

JavaScript

Almost all JavaScript BDD frameworks run on Node.js. Jasmine and Mocha are two of the most popular general-purpose JS test frameworks. They differ in that Jasmine has many features included (like assertions and spies) that Mocha does not. This makes Jasmine easier to get started (good for beginners) but makes Mocha more customizable (good for power users). Both claim to be behavior-driven because they structure tests using “describe” and “it-should” phrases in the code, but they do not have the advantage of separate, reusable steps like Gherkin. Personally, I consider Jasmine and Mocha to be behavior-inspired but not fully behavior-driven.

Other BDD frameworks are more true to form. Cucumber provides Cucumber.js for Gherkin-compliant happiness. Yadda is Gherkin-like but with a more flexible syntax. Vows provides a different way to approach behavior using more formalized phrase partitions for a unique form of reusability. The Cucumber blog argues that Cucumber.js is best due to its focus on good communication through plain language steps, whereas other JavaScript BDD frameworks are more code-y. (Keep in mind, though, that Cucumber would naturally boast of its own framework.) Other comparisons are posted here, here, here, and here.

PHP

The two major BDD frameworks for PHP are Behat and Codeception. Behat is the official Cucumber version for PHP, and as such is seen as the more “pure” BDD framework. Codeception is more programmer-focused and can handle other styles of testing. There are plenty of articles comparing the two – here, here, and here (although the last one seems out of date). Both seem like good choices, but Codeception seems more flexible.

Python

Python has a plethora of test frameworks, and many are BDD. behave and lettuce are probably the two most popular players. Feature comparison is analogous to Cucumber-JVM versus JBehave, respectively: behave is practically Gherkin compliant, while lettuce lacks a few language elements. Both have plugins for major IDEs. pytest-bdd is on the rise because it integrates with all the wonderful features of pytestradish is another framework that extends the Gherkin language to include scenario loops, scenario preconditions, and variables. All these frameworks put scenarios into separate feature files. They all also implement step definitions as functions instead of classes, which not only makes steps feel simpler and more independent, but also avoids unnecessary object construction.

Other Python frameworks exist as well. pyspecs is a spec-oriented framework. Freshen was a BDD plugin for Nose, but both Freshen and Nose are discontinued projects.

Ruby

Cucumber, the gold standard for BDD frameworks, was first implemented in Ruby. Cucumber maintains the official Gherkin language standard, and all Cucumber versions are inspired by the original Ruby version. Spinach bills itself as an enhancement to Cucumber by encapsulating steps better. RSpec is a spec-oriented framework that does not use Gherkin.

Which One is Best?

There is no right answer – the best BDD framework is the one that best fits your needs. However, there are a few points to consider when weighing your options:

  • What programming language should I use for test automation?
  • Is it a popular framework that many others use?
  • Is the framework actively supported?
  • Is the spec language compliant with Gherkin?
  • What type of testing will you do with the framework?
  • What are the limitations as compared to other frameworks?

Frameworks that separate scenario text from implementation code are best for shift-left testing. Frameworks that put scenario text directly into the source code are better for white box testing, but they may look confusing to less experienced programmers.

Personally, my favorites are SpecFlow and pytest-bdd. At LexisNexis, I used SpecFlow and Cucumber-JVM. For Python, I used behave at MaxPoint, but I have since fallen in love with pytest-bdd since it piggybacks on the wonderfulness of pytest. (I can’t wait for this open ticket to add pytest-bdd support in PyCharm.) For skill transferability, I recommend Gherkin compliance, as well.

Reference Table

The table below categorizes BDD frameworks by language and type for quick reference. It also includes frameworks in languages not described above. Recommended frameworks are denoted with an asterisk (*). Inactive projects are denoted with an X (x).

Language Framework Type
C Catch In-line Spec
C++ Igloo In-line Spec
C# and .NET Concordion
LightBDD
NBehave x
NSpec
SpecFlow *
xBehave.net
In-line Spec
In-line Gherkin
Separated semi-Gherkin
In-line Spec
Separated Gherkin
In-line Gherkin
Golang Ginkgo In-line Spec
Java and JVM Cucumber-JVM *
JBehave
JDave x
JGiven *
Scalatest
Spock
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Gherkin
In-line Spec
In-line Spec
JavaScript Cucumber.js *
Yadda
Jasmine
Mocha
Vows
Separated Gherkin
Separated semi-Gherkin
In-line Spec
In-line Spec
In-line Spec
Perl Test::BDD::Cucumber Separated Gherkin
PHP Behat
Codeception *
Separated Gherkin
Separated or In-line
Python behave *
freshen x
lettuce
pyspecs
pytest-bdd *
radish
Separated Gherkin
Separated Gherkin
Separated semi-Gherkin
In-line Spec
Separated semi-Gherkin
Separated Gherkin-plus
Ruby Cucumber *
RSpec
Spinach
Separated Gherkin
In-line Spec
Separated Gherkin
Swift / Objective C Quick In-line Spec

 

[4/22/2018] Update: I updated info for C# and Python frameworks.

The Best Programming Language for Test Automation

Which programming languages are best for writing test automation? There are several choices – just look at this list on Wikipedia and this cool decision graphs for choosing languages. While this topic can quickly devolve into a spat over personal tastes, I do believe there are objective reasons for why some languages are better for automating test cases than others.

Dividing Test Layers

First of all, unit tests should always be written in the same language as the product under test. Otherwise, they would definitionally no longer be unit tests! Unit tests are white box and need direct access to the product source code. This allows them to cover functions, methods, and classes.

The question at hand pertains more to higher-layer functional tests. These tests fall into many (potentially overlapping) categories: integration, end-to-end, system, acceptance, regression, and even performance. Since they are all typically black box, higher-layer tests do not necessarily need to be written in the same language as the product under test.

My Opinionated Choices

Personally, I think Python is today’s best all-around language for test automation. Python is wonderful because its conciseness lets the programmer expressively capture the essence of the test case. It also has very rich test support packages. Check out this article: Why Python is Great for Test AutomationJava is a good choice as well – it has a rich platform of tools and packages, and continuous integration with Java is easy with Maven/Gradle/ANT and Jenkins. I’ve heard that Ruby is another good choice for reasons similar to Python, but I have not used it myself.

Some languages are good in specific domains. For example, JavaScript is great for pure web app testing (à la Jasmine, Karma, and Protractor) but not so good for general purposes (despite Node.js running anywhere). A good reason to use JavaScript for testing would be MEAN stack development. TypeScript would be even better because it is safer and scales better. C# is great for Microsoft shops and has great test support, but it lives in the Microsoft bubble. .NET development tools are not always free, and command line operations can be painful.

Other languages are poor choices for test automation. While they could be used for automation, they likely should not be used. C and C++ are inconvenient because they are very low-level and lack robust frameworks. Perl is dangerous because it simply does not provide the consistency and structure for scalable, self-documenting code. Functional languages like LISP and Haskell are difficult because they do not translate well from test case procedures. They may be useful, however, for some lower-level data testing.

8 Criteria for Evaluation

There are eight major points to consider when evaluating any language for automation. These criteria specifically assess the language from a perspective of purity and usability, not necessarily from a perspective of immediate project needs.

  1. Usability.  A good automation language is fairly high-level and should handle rote tasks like memory management. Lower learning curves are preferable. Development speed is also important for deadlines.
  2. Elegance. The process of translating test case procedures into code must be easy and clear. Test code should also be concise and self-documenting for maintainability.
  3. Available Test Frameworks. Frameworks provide basic needs such as fixtures, setup/cleanup, logging, and reporting. Examples include Cucumber and xUnit.
  4. Available Packages. It is better to use off-the-shelf packages for common operations, such as web drivers (Selenium), HTTP requests, and SSH.
  5. Powerful Command Line. A good CLI makes launching tests easy. This is critical for continuous integration, where tests cannot be launched manually.
  6. Easy Build Integration. Build automation should launch tests and report results. Difficult integration is a DevOps nightmare.
  7. IDE Support. Because Notepad and vim just don’t cut it for big projects.
  8. Industry Adoption. Support is good. If the language remains popular, then frameworks and packages will be maintained well.

Below, I rated each point for a few popular languages:

Python Java JavaScript C# C/C++ Perl
Usability  awesome  good  good  good  terrible  poor
Elegance  awesome  good  okay  good  poor  poor
Available Test Frameworks  awesome  awesome  awesome  good  okay  poor
Available Packages  awesome  awesome  okay  good  good  good
Powerful Command Line  awesome  good  good  okay  poor  okay
Easy Build Integration  good  good  good  good  poor  poor
IDE Support  good  awesome  good  good  okay  terrible
Industry Adoption  awesome  awesome  awesome  good  terrible  terrible

Conclusion

I won’t shy away from my preference for Python, but I recognize that they may not be the right choice for all situations. For example, when I worked at LexisNexis, we used C# because management wanted developers, who wrote the app in C#, to contribute to test automation.

Now, a truly nifty idea would be to create a domain-specific language for test automation, but that must be a topic for another post.

UPDATE: I changed some recommendations on 4/18/2018.