Which Version of Python Should I Use?

Which version of Python should I use? Now, that’s a loaded question. While the answer is simple, the explanation is more complicated.

tl;dr

  • Use the latest version of Python 3.
  • Use the CPython implementation.
  • Use venv or virtualenv to manage multiple installations.
  • Use PyCharm or PyDev as the IDE.

Which Version?

Python 2 and Python 3 are actually slightly different languages. The differences go deeper than just print statements. The What’s New in Python page on the official doc site lists all the gory details, and decent articles showcasing the differences are easy to find online. Although Python 3 is newer, Python 2 remains prevalent. Most popular packages use Python packaging tools to support both versions. At the time of writing this article, the current versions of Python are 3.6 and 2.7.
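To make "deeper than just print statements" concrete, here is a small sketch of a few constructs whose behavior changed between the two versions. It is written for Python 3, the comments describe what Python 2 did instead, and the variable names are purely illustrative.

```python
# Python 3 behavior for a few constructs that changed from Python 2.

# print is a function that takes arguments (Python 2 used `print "hi"`).
print("hello", "world", sep=", ")

# The / operator is true division; // is floor division.
# (In Python 2, 7 / 2 between two ints gave 3.)
half = 7 / 2        # 3.5
floored = 7 // 2    # 3

# Text (str) and binary data (bytes) are separate types.
text = "panda"
data = text.encode("utf-8")   # b"panda"

# range() produces values lazily instead of building a list.
numbers = range(3)
```

This is far from a complete list; the What’s New pages remain the authoritative reference.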

The Python Wiki makes it clear that Python 3 is the better choice:

Python 2.x is legacy, Python 3.x is the present and future of the language

Furthermore, Python 2 will reach end-of-life in 2020. The Python team will continue to provide bug fixes for 2.7 until 2020 (PEP 373), but there will be no new language features and no 2.8 (PEP 404). (Originally, end-of-life was planned for 2015, but it was pushed back by 5 years.) There is even a Python 2.7 Countdown clock online.

Which Implementation?

In purest terms, “Python” is a language specification. An implementation provides the language processing tools (compiler, interpreter, etc.) to run Python programs. The Hitchhiker’s Guide to Python has a great article entitled Picking an Interpreter that provides a good summary of available interpreters. Others are listed on python.org and the Python Wiki. The table below provides a quick overview of the big ones.

Implementation Points
CPython
  • most widely used implementation
  • the reference implementation
  • has the most libraries and support
  • implemented in C
  • supports Python 2 and 3
PyPy
  • often much faster than CPython, thanks to its JIT compiler
  • can use less memory for some programs
  • implemented in RPython
  • supports Python 2 and 3
Jython
  • implemented in Java
  • runs on the JVM
  • supports Python 2
  • only a sandbox for Python 3
  • no project updates since May 2015
IronPython
  • implemented for .NET
  • lets Python libs call .NET and vice versa
  • supports Python 2
Python for .NET
  • integrates CPython with .NET/Mono runtime
  • supports Python 2 and 3
Stackless Python
  • branch of CPython that adds microthreads (tasklets)
MicroPython
  • optimized for microcontrollers
  • uses a subset of the standard library

Unless you have a very specific reason, just use CPython. In fact, most people are referring to CPython when they say “Python.” CPython has the most compatibility, the widest package library, and the richest support.
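If you are not sure which implementation and version a given `python` command actually runs, the standard library can tell you. The sketch below uses `platform.python_implementation()` and `sys.version_info`; the helper name `describe_interpreter` is my own, not part of any library.

```python
import platform
import sys


def describe_interpreter():
    """Return the implementation name and (major, minor) version in use."""
    implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    version = (sys.version_info.major, sys.version_info.minor)
    return implementation, version


implementation, version = describe_interpreter()
print(f"{implementation} {version[0]}.{version[1]}")
```

The f-string means this particular sketch needs Python 3.6 or later.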

Managing Installations

The simplest way to install Python is to install it “globally” for the system. In fact, some operating systems like macOS and Ubuntu have Python pre-installed. However, global installation has limitations:

  1. You may want to develop packages for both versions 2 and 3.
  2. You may not have permissions to add new packages globally.
  3. Different projects may require different versions of packages.

These problems can be solved by using “virtual” environments. A virtual environment is like a local Python installation with a specific package set. For example, I have created virtual environments for Python as part of Jenkins build jobs, since I did not have permission to install special automation packages globally on the Jenkins slaves.

The standard virtual environment tool for Python is venv, which has been packaged with (C)Python since 3.3. (venv had a command line wrapper named pyvenv, but this was deprecated in 3.6.) Another older but still popular third-party tool is virtualenv. As explained in this Reddit post, venv is the Python-sanctioned replacement for virtualenv. However, virtualenv supports Python 2, whereas venv does not. Conda is an environment manager popular with the science and data communities, and it can support other languages in addition to Python. My recommendation is to use venv if you use Python 3 exclusively and use virtualenv for switching between Python 2 and 3.
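Most people create environments from the command line with `python3 -m venv <directory>`, but the same `venv` module can be driven from Python, which is handy in build scripts. The following is a minimal sketch under my own assumptions: the directory name `example-env` and the helper `create_environment` are hypothetical, and pip is deliberately left out to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir):
    """Create a virtual environment under parent_dir and return its path."""
    env_dir = Path(parent_dir) / "example-env"  # hypothetical directory name
    # with_pip=False keeps the sketch quick; pass with_pip=True to
    # bootstrap pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


with tempfile.TemporaryDirectory() as scratch:
    env_dir = create_environment(scratch)
    # Every venv records its base interpreter in a pyvenv.cfg file.
    created = (env_dir / "pyvenv.cfg").is_file()
    print("environment created:", created)
```

A real project would keep the environment in a stable location rather than a temporary directory, then activate it or call its interpreter directly.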

Getting Started

After setting up your Python environment, you are ready to start programming! While you could use a simple text editor like Notepad++, I highly recommend an IDE like JetBrains PyCharm, PyDev for Eclipse, or Eric. IDEs provide rich development support, especially for larger apps that use frameworks like Django. They also make testing easier with plugins for test frameworks like pytest, behave, and others. PyCharm and PyDev are particularly nice because they can integrate into their larger IDEs (IntelliJ IDEA and Eclipse, respectively) to handle more languages. Personally, I prefer PyCharm, but advanced features require a paid license. PyDev and Eric, on the other hand, are totally free and open source.

 

This article is dedicated to my good friend Sudarsan, who recently asked me the question in the title.

The Best Programming Language for Test Automation

Which programming languages are best for writing test automation? There are several choices – just look at this list on Wikipedia and these cool decision graphs for choosing languages. While this topic can quickly devolve into a spat over personal tastes, I do believe there are objective reasons for why some languages are better for automating test cases than others.

Dividing Test Layers

First of all, unit tests should always be written in the same language as the product under test. Otherwise, they would definitionally no longer be unit tests! Unit tests are white box and need direct access to the product source code. This allows them to cover functions, methods, and classes.
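As an illustration of a white box unit test, the sketch below tests a function by calling it directly, which is only possible when the test is written in the product's language. The function `normalize_query` is a hypothetical stand-in for product code; in a real project it would be imported from the product's own module.

```python
import unittest


# Hypothetical product code, defined inline only to keep the sketch
# self-contained.
def normalize_query(query):
    """Trim surrounding whitespace and lowercase a search query."""
    return query.strip().lower()


class NormalizeQueryTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_query("  Panda "), "panda")

    def test_empty_query_stays_empty(self):
        self.assertEqual(normalize_query(""), "")


# Run the tests in-process and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeQueryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```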

The question at hand pertains more to higher-layer functional tests. These tests fall into many (potentially overlapping) categories: integration, end-to-end, system, acceptance, regression, and even performance. Since they are all typically black box, higher-layer tests do not necessarily need to be written in the same language as the product under test.

My Choices

Personally, I think Python and Java are today’s best languages for test automation. Python, in particular, is wonderful because its conciseness lets the programmer expressively capture the essence of the test case. Java has a rich platform of tools and packages, and continuous integration with Java is easy with Maven/Gradle/Ant and Jenkins. I’ve heard that Ruby is another good choice for reasons similar to Python, but I have not used it myself. JavaScript is good for pure web app testing (à la Protractor) but not so good for general purposes.

On the other hand, languages like C, C++, C#, and Perl are less suitable for test automation. C and C++ are very low-level and lack robust frameworks. Although C# as a language is similar to Java, it lives in the Microsoft bubble: .NET development tools are not as friendly or as free, and command line operations are painful. Perl simply does not provide the consistency and structure for scalable and self-documenting code. Functional languages like Lisp and Haskell are also poor choices for test automation because they do not translate well from test case procedures. They may be useful, however, for some lower-level data testing.

8 Criteria for Evaluation

There are eight major points to consider when evaluating any language for automation. These criteria specifically assess the language from a perspective of purity and usability, not necessarily from a perspective of immediate project needs.

  1. Usability. A good automation language is fairly high-level and handles rote tasks like memory management. A gentle learning curve is preferable, and development speed matters for deadlines.
  2. Elegance. The process of translating test case procedures into code must be easy and clear. Test code should also be concise and self-documenting for maintainability.
  3. Available Test Frameworks. Frameworks provide basic needs such as assertions, setup/cleanup, logging, and reporting. Examples include Cucumber and xUnit.
  4. Available Packages. It is better to use off-the-shelf packages for common operations, such as web drivers (Selenium), HTTP requests, and SSH.
  5. Powerful Command Line. A good CLI makes launching tests easy. This is critical for continuous integration, where tests cannot be launched manually.
  6. Easy Build Integration. Build automation should launch tests and report results. Difficult integration is a DevOps nightmare.
  7. IDE Support. Because Notepad and vim just don’t cut it for big projects.
  8. Industry Adoption. Support is good. If the language remains popular, then frameworks and packages will be maintained well.
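To show what criteria 3 and 5 look like in practice, here is a small sketch using Python's built-in `unittest` framework: it supplies assertions, setup/cleanup hooks, and a text report, and the run boils down to an exit code that a CI job can act on. The test class, the config file contents, and the `run_suite` helper are all hypothetical.

```python
import os
import tempfile
import unittest


class ConfigFileTest(unittest.TestCase):
    def setUp(self):
        # Setup: create a scratch config file before each test.
        handle, self.path = tempfile.mkstemp(suffix=".cfg")
        with os.fdopen(handle, "w") as config:
            config.write("browser=firefox\n")

    def tearDown(self):
        # Cleanup: runs after each test, even a failing one.
        os.remove(self.path)

    def test_config_names_a_browser(self):
        with open(self.path) as config:
            self.assertIn("browser=", config.read())


def run_suite():
    """Run the tests and return a process exit code for a CI job."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConfigFileTest)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return 0 if result.wasSuccessful() else 1


exit_code = run_suite()
```

From a shell, `python -m unittest` discovers and runs tests in much the same way and sets the exit code itself.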

Below, I rated each point for a few popular languages:

Criterion                  Python   Java     C#        C/C++     Perl
Usability                  awesome  good     good      terrible  poor
Elegance                   awesome  good     good      poor      poor
Available Test Frameworks  awesome  awesome  good      okay      poor
Available Packages         awesome  awesome  good      good      good
Powerful Command Line      awesome  good     terrible  poor      okay
Easy Build Integration     good     awesome  poor      poor      poor
IDE Support                good     awesome  okay      okay      terrible
Industry Adoption          awesome  awesome  good      terrible  terrible

Conclusion

I won’t shy away from my preference for Python and Java, but I recognize that they may not be the right choice for all situations. For example, we use C# at my current job because our app is written in C# and management wants developers and QA to be on the same page.

Now, a truly nifty idea would be to create a domain-specific language for test automation, but that must be a topic for another post.

Should Gherkin Steps Use First-Person or Third-Person?

The Gherkin language allows the tester to write their own steps.  This is a blessing (for flexibility) and a curse (for poor grammar).  Although misspellings and out-of-place capitalization don’t affect the functionality of test scenarios, mixed point of view may cause ambiguity.  Consider the following two examples:

    Given I am at the Google search page
    When I search for “panda”
    Then I see web page links for “panda”

    Given the browser is at the Google search page
    When the user searches for “panda”
    Then web page links for “panda” are shown

Both scenarios do the same thing: they run a basic Google search.   However, the first one is written in first-person narrative, while the second one is written in third-person narrative.  What happens when we mix the steps together?

    Given I am at the Google search page
    When the user searches for “panda”
    Then I see web page links for “panda”

That scenario is confusing.  Am I the user, or is the user a different person?  Should there be a second browser for the user?  Why do I see what the user sees?  The English is poorly written due to the mixed point of view.

This may seem like a trivial example, but consider a project with multiple tests.  Gherkin scenarios will reuse steps.  Steps with different points of view will clash.  Therefore, all Gherkin scenarios for a project should use one point of view.
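A consistency rule like this can be checked mechanically. The sketch below is a rough heuristic of my own, not a feature of any Gherkin tool: it assumes a step is first-person if it contains the standalone word "I", "my", or "me", and treats every other step as third-person.

```python
import re

# Assumption: a step is "first-person" if it contains the standalone word
# "I", "my", or "me"; everything else is treated as third-person.
FIRST_PERSON = re.compile(r"\b(I|my|me)\b")
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But")


def points_of_view(scenario_text):
    """Return the set of points of view used by the steps in a scenario."""
    found = set()
    for line in scenario_text.splitlines():
        step = line.strip()
        if not step.startswith(STEP_KEYWORDS):
            continue  # skip blank lines, tags, and titles
        found.add("first" if FIRST_PERSON.search(step) else "third")
    return found


mixed = """
Given I am at the Google search page
When the user searches for "panda"
Then I see web page links for "panda"
"""
consistent = """
Given the browser is at the Google search page
When the user searches for "panda"
Then web page links for "panda" are shown
"""
print(points_of_view(mixed))
print(points_of_view(consistent))
```

A scenario (or a whole project) whose set contains both values mixes points of view and deserves a second look in code review.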

So, which point of view is better?  There is no definitively correct answer, but my strong conviction is that all Gherkin steps should use third-person perspective.  Third-person perspective is entirely generic and can expressively name any user or system component.  First-person semantically limits the expressive coverage of a step by forcing presumptions of who the speaker is.  For example, if “I” am a user, what profile or privileges do I have?  And are those attributes of who “I” am applicable when the step is used in other contexts?  It may be easier to write Gherkin scenarios in first-person perspective because it helps the author to frame himself or herself in the context of the user, but it makes the steps less reusable.  Even worse, first-person perspective can cause steps to be misunderstood.  As a workaround, scenarios could add an extra “Given” step to explicitly frame the context of the first person (such as, “Given I am an administrator, When …”), but this requires an extra step that would be unnecessary with third-person perspective.  Personally, I just don’t see the advantage to first-person point of view in Gherkin.  And I would definitely reject code reviews that mixed the point of view either way.

As techies, we can look to the humanities for one more reason to use third-person point of view in Gherkin. In middle school, in high school, and in college, every teacher emphasized time and time again that essays must be written in third-person perspective.  Every slip of “I think” and “I believe” and “you know” was dinged.  Why?  Third-person presents a more objective, more formal, and more powerful writing style.  Gherkin is meant to be expressive, so let’s write it like we mean it.

Gherkin Syntax Highlighting in Notepad++

Notepad++ is an excellent text editor for Windows. It is free, lightweight, feature-rich, and extendable. It can handle just about any programming language out there. I use it all the time, especially for config files and quick edits that don’t require a bulky IDE. Seriously, if you don’t have it, download it now. (Not a Windows user? Check out Gherkin Syntax Highlighting in Atom.)

One of the nifty features in Notepad++ is User Defined Language, which allows users to customize the syntax highlighting for any language. This is invaluable if you use an obscure language or even create your own. To access this feature in version 7.2.2, simply navigate to the Language menu option and choose Define your language…. From there, you can create a new user language and set stylers for keywords, operators, and other language facets. Stylers can set font color, size, and style. Users can also import and export UDLs as XML files for sharing. Since the highlighting doesn’t rely upon a context-free grammar, it has its limits. For example, keywords may still be highlighted when not actually being used as keywords in the language. Nevertheless, it’s better than nothing.
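The keyword limitation is easy to reproduce with any highlighter that matches words without parsing the language. The toy sketch below is not how Notepad++ is implemented; it is only a regular-expression stand-in that wraps matches in `[[...]]` instead of coloring them.

```python
import re

# Assumption: this toy highlighter marks every standalone occurrence of a
# keyword, with no knowledge of where keywords are valid in the grammar.
KEYWORDS = re.compile(r"\b(Given|When|Then|And|But)\b")


def highlight(line):
    """Wrap each keyword match in [[...]] to stand in for coloring."""
    return KEYWORDS.sub(r"[[\1]]", line)


step = 'Then the message "When in doubt, ask" is shown'
print(highlight(step))
```

Here "When" sits inside a quoted string, yet it is marked just like the real step keyword "Then".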

Since I do a lot of behavior-driven test automation development, I created a UDL for Gherkin. You can download it from the Automation Panda GitHub repository – the file is named gherkin_npp_udl.xml. Import it into Notepad++, and you’re ready to go!

Below is a screenshot of an example feature file:

[Screenshot: an example feature file using my Notepad++ UDL for Gherkin]