
Why is Automation Full of Duplicate Code?

Copypasta
noun

A block of text that is duplicated repeatedly via "copy-paste",
    often causing annoyance or frustration.

One of the biggest problems (if not the biggest problem) I have seen in test automation is copypasta – the unnecessary duplication of code. It happens at all layers of testing. It happens in any type of project. It happens at companies big and small. And the consequences are stark: test development slows down, mistakes become more common, and maintenance becomes a nightmare. Although duplicate code can happen in any software project, it is especially prevalent in test automation. The reasons may or may not surprise you, but the solutions are clear.

4 Reasons Why Duplicate Code Pervades Automation

#1: Test cases are repetitive. For any given product, tests will share many of the same steps. For example, web app tests must all first navigate to a start page, or API tests might cover a few variations for one call. Testing mechanics such as input parameters, setup/cleanup, logging, and assertions happen frequently. Put all of that together into test suites that have tens, hundreds, thousands, or even more test cases. It’s simply the nature of testing.

#2: Automation frameworks reinforce repetition. Most frameworks structure test cases as a class with methods (like JUnit) or as a collection of functions (like pytest), in which each method or function represents one test. Inherently, this basic structure is a good thing for making tests independent. However, lazy programmers may abuse the structure. Often, they put all test code inside these test methods, instead of extracting repetitive logic into helper methods or design patterns. Then, it becomes easier to simply duplicate an entire test case method and change a few things, rather than to implement a better overall design.
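
As a concrete illustration, here is a minimal pytest sketch of that anti-pattern – two tests that are identical except for one hard-coded value. The endpoint and test names are hypothetical:

    import requests

    def test_search_pandas():
        # All of the logic lives directly inside the test function...
        response = requests.get("https://example.com/api/search", params={"q": "panda"})
        assert response.status_code == 200
        assert "panda" in response.text

    def test_search_koalas():
        # ...so the "easiest" second test is a wholesale copy with one word changed.
        response = requests.get("https://example.com/api/search", params={"q": "koala"})
        assert response.status_code == 200
        assert "koala" in response.text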

#3: Test code takes a backseat to product code. Business needs drive software development in the industry, and since test code is not part of the product delivered to customers, it is often deemed to be less important. Not as much devotion is given to developing good test code. Many best practices are abandoned for expediency.

#4: Testers often have weaker development skills. This is not a condemnation of testers, nor a universal labeling, but rather a distinction between disciplines: developers are developers because they are good at making software, and testers are testers because they are good at exercising software and finding bugs. Of course, I know plenty of testers who do indeed have strong dev skills. However, I also know solid testers who have limited programming experience. When automation responsibility falls upon testers with limited dev skills, poor development practices happen, and code duplication is typically rampant.

How to Avoid Duplicate Code in Automation

Code duplication is code cancer.  -Andy

There are a number of ways to slay the copypasta monster. The first line of defense is to check yourself before you wreck yourself. Always question yourself when you copy-paste blocks of code. Why did you do that? What are you changing in the pasted copy? Should you abstract that logic into a method or a class that can be reused? Can you parameterize it? Override your Ctrl-C, Ctrl-V keyboard shortcut if necessary.
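For example, a copy-pasted pair of tests like the one sketched earlier can collapse into a single parameterized test. Here is a minimal pytest sketch, again with a hypothetical endpoint:

    import pytest
    import requests

    # One test function, many inputs: pytest generates a test case per parameter.
    @pytest.mark.parametrize("query", ["panda", "koala", "python"])
    def test_search(query):
        response = requests.get("https://example.com/api/search", params={"q": query})
        assert response.status_code == 200
        assert query in response.text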

Be a good programmer. Develop packages for reusable actions. Things like assertions, logging, setup, and cleanup should be shared by all test cases. In that shared code, keep action calls short. Long method names with too many parameters inhibit usability. Remember that automated test cases should be self-documenting so that they read like test procedures. Whenever possible, make repetitive actions happen automatically. For example, make library methods do internal logging, and use test framework setup/cleanup routines. For another example, I once wrote code to automatically reconnect SSH sessions whenever they dropped. These auto-actions allow test case code to focus less on the low-level mechanics and more on the high-level features under test.
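
To sketch what that can look like in pytest (assuming Selenium WebDriver; the test and fixture names are hypothetical), a shared fixture handles setup, cleanup, and logging automatically, so test functions stay focused on the feature under test:

    import logging
    import pytest
    from selenium import webdriver

    @pytest.fixture
    def browser():
        # Setup and internal logging happen automatically for every test...
        logging.info("Opening browser")
        driver = webdriver.Chrome()
        yield driver
        # ...and cleanup always runs, even if the test fails.
        logging.info("Closing browser")
        driver.quit()

    def test_google_search_page_loads(browser):
        # The test reads like a procedure: no setup or teardown noise.
        browser.get("https://www.google.com")
        assert "Google" in browser.title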

Finally, be a team player. Use the same development practices for test code as for product code. Automation is a product, and its customers are the team. Use coding standards, design patterns, and revision control. Most importantly, reinforce good practices through code review. Use the review process as a constructive way to learn new tricks and even to mentor less experienced team members. Also, divide testing roles between test formulation, test case automation, and test framework development. “QA” (quality assurance) is a wide discipline, and not everyone is equally skilled. Let people do what they do best. There is strength in diversity and in teamwork.

The Best Programming Language for Test Automation

Which programming languages are best for writing test automation? There are several choices – just look at this list on Wikipedia and this cool decision graph for choosing languages. While this topic can quickly devolve into a spat over personal tastes, I do believe there are objective reasons why some languages are better for automating test cases than others.

Dividing Test Layers

First of all, unit tests should always be written in the same language as the product under test. Otherwise, they would, by definition, no longer be unit tests! Unit tests are white box and need direct access to the product source code, which allows them to cover functions, methods, and classes.
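
For instance, a Python unit test imports the product’s own modules directly, which is only possible when the test is written in the product’s language (module and function names here are hypothetical):

    # calculator.py (product code)
    def add(a, b):
        return a + b

    # test_calculator.py (unit test: white box, same language, direct imports)
    from calculator import add

    def test_add():
        assert add(2, 3) == 5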

The question at hand pertains more to higher-layer functional tests. These tests fall into many (potentially overlapping) categories: integration, end-to-end, system, acceptance, regression, and even performance. Since they are all typically black box, higher-layer tests do not necessarily need to be written in the same language as the product under test.

My Opinionated Choices

Personally, I think Python is today’s best all-around language for test automation. Python is wonderful because its conciseness lets the programmer expressively capture the essence of the test case. It also has very rich test support packages. Check out this article: Why Python is Great for Test Automation. Java is a good choice as well – it has a rich platform of tools and packages, and continuous integration with Java is easy with Maven/Gradle/Ant and Jenkins. I’ve heard that Ruby is another good choice for reasons similar to Python, but I have not used it myself.
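
To illustrate that conciseness, a complete pytest test case can be nothing more than a plain function with a bare assert – no class, no boilerplate – and pytest’s assertion introspection reports the failing values automatically:

    # A complete, runnable test case in pytest.
    def test_word_count():
        words = "the quick brown fox".split()
        assert len(words) == 4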

Some languages are good in specific domains. For example, JavaScript is great for pure web app testing (à la Jasmine, Karma, and Protractor) but not so good for general purposes (despite Node.js running anywhere). A good reason to use JavaScript for testing would be MEAN stack development. TypeScript would be even better because it is safer and scales better. C# is great for Microsoft shops and has great test support, but it lives in the Microsoft bubble. .NET development tools are not always free, and command line operations can be painful.

Other languages are poor choices for test automation. While they could be used for automation, they likely should not be used. C and C++ are inconvenient because they are very low-level and lack robust frameworks. Perl is dangerous because it simply does not provide the consistency and structure for scalable, self-documenting code. Functional languages like LISP and Haskell are difficult because they do not translate well from test case procedures. They may be useful, however, for some lower-level data testing.

8 Criteria for Evaluation

There are eight major points to consider when evaluating any language for automation. These criteria specifically assess the language from a perspective of purity and usability, not necessarily from a perspective of immediate project needs.

  1. Usability.  A good automation language is fairly high-level and should handle rote tasks like memory management. Lower learning curves are preferable. Development speed is also important for deadlines.
  2. Elegance. The process of translating test case procedures into code must be easy and clear. Test code should also be concise and self-documenting for maintainability.
  3. Available Test Frameworks. Frameworks provide basic needs such as fixtures, setup/cleanup, logging, and reporting. Examples include Cucumber and xUnit.
  4. Available Packages. It is better to use off-the-shelf packages for common operations, such as web drivers (Selenium), HTTP requests, and SSH.
  5. Powerful Command Line. A good CLI makes launching tests easy. This is critical for continuous integration, where tests cannot be launched manually.
  6. Easy Build Integration. Build automation should launch tests and report results. Difficult integration is a DevOps nightmare.
  7. IDE Support. Because Notepad and vim just don’t cut it for big projects.
  8. Industry Adoption. Support is good. If the language remains popular, then frameworks and packages will be maintained well.

Below, I rated each point for a few popular languages:

Criterion                 | Python  | Java    | JavaScript | C#   | C/C++    | Perl
--------------------------|---------|---------|------------|------|----------|---------
Usability                 | awesome | good    | good       | good | terrible | poor
Elegance                  | awesome | good    | okay       | good | poor     | poor
Available Test Frameworks | awesome | awesome | awesome    | good | okay     | poor
Available Packages        | awesome | awesome | okay       | good | good     | good
Powerful Command Line     | awesome | good    | good       | okay | poor     | okay
Easy Build Integration    | good    | good    | good       | good | poor     | poor
IDE Support               | good    | awesome | good       | good | okay     | terrible
Industry Adoption         | awesome | awesome | awesome    | good | terrible | terrible

Conclusion

I won’t shy away from my preference for Python, but I recognize that it may not be the right choice for all situations. For example, when I worked at LexisNexis, we used C# because management wanted developers, who wrote the app in C#, to contribute to test automation.

Now, a truly nifty idea would be to create a domain-specific language for test automation, but that must be a topic for another post.

UPDATE: I changed some recommendations on 4/18/2018.

Should Gherkin Steps Use First-Person or Third-Person?

The Gherkin language allows the tester to write their own steps.  This is a blessing (for flexibility) and a curse (for poor grammar).  Although misspellings and out-of-place capitalization don’t affect the functionality of test scenarios, mixed point of view may cause ambiguity.  Consider the following two examples:

    Given I am at the Google search page
    When I search for “panda”
    Then I see web page links for “panda”

    Given the browser is at the Google search page
    When the user searches for “panda”
    Then web page links for “panda” are shown

Both scenarios do the same thing: they run a basic Google search.  However, the first one is written in first-person narrative, while the second one is written in third-person narrative.  What happens when we mix the steps together?

    Given I am at the Google search page
    When the user searches for “panda”
    Then I see web page links for “panda”

That scenario is confusing.  Am I the user, or is the user a different person?  Should there be a second browser for the user?  Why do I see what the user sees?  The English is poorly written due to the mixed point of view.

This may seem like a trivial example, but consider a project with multiple tests.  Gherkin scenarios will reuse steps.  Steps with different points of view will clash.  Therefore, all Gherkin scenarios for a project should use one point of view.

So, which point of view is better?  There is no definitively correct answer, but my strong conviction is that all Gherkin steps should use third-person perspective.  Third-person perspective is entirely generic and can expressively name any user or system component.  First-person semantically limits the expressive coverage of a step by forcing presumptions of who the speaker is.  For example, if “I” am a user, what profile or privileges do I have?  And are those attributes of who “I” am applicable when the step is used in other contexts?  It may be easier to write Gherkin scenarios in first-person perspective because it helps the author to frame himself or herself in the context of the user, but it makes the steps less reusable.  Even worse, first-person perspective can cause steps to be misunderstood.  As a workaround, scenarios could add an extra “Given” step to explicitly frame the context of the first person (such as, “Given I am an administrator, When …”), but this requires an extra step that would be unnecessary with third-person perspective.  Personally, I just don’t see the advantage to first-person point of view in Gherkin.  And I would definitely reject code reviews that mixed the point of view either way.
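
To see why third-person steps are more reusable, here is a minimal step definition sketch in Python with behave (it assumes a context.browser WebDriver is created in environment.py, which is not shown):

    from behave import given, when
    from selenium.webdriver.common.by import By

    @given('the browser is at the Google search page')
    def step_open_google(context):
        context.browser.get('https://www.google.com')

    # The third-person step makes no claim about who "I" am, so any scenario,
    # regardless of user profile or privileges, can reuse it as-is.
    @when('the user searches for "{phrase}"')
    def step_search(context, phrase):
        search_box = context.browser.find_element(By.NAME, 'q')
        search_box.send_keys(phrase)
        search_box.submit()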

As techies, we can look to the humanities for one more reason to use third-person point of view in Gherkin. In middle school, in high school, and in college, every teacher emphasized time and time again that essays must be written in third-person perspective.  Every slip of “I think” and “I believe” and “you know” was dinged.  Why?  Third-person presents a more objective, more formal, and more powerful writing style.  Gherkin is meant to be expressive, so let’s write it like we mean it.

Gherkin Syntax Highlighting in Notepad++

Notepad++ is an excellent text editor for Windows. It is free, lightweight, feature-rich, and extendable. It can handle just about any programming language out there. I use it all the time, especially for config files and quick edits that don’t require a bulky IDE. Seriously, if you don’t have it, download it now. (Not a Windows user? Check out Gherkin Syntax Highlighting in Atom.)

One of the nifty features in Notepad++ is User Defined Language, which allows users to customize the syntax highlighting for any language. This is invaluable if you use an obscure language or even create your own. To access this feature, simply navigate to the Language menu option, go to User Defined Language near the bottom, and choose Define your language…. From there, you can create a new user language and set stylers for keywords, operators, and other language facets. Stylers can set font color, size, and style. Users can also import and export UDLs as XML files for sharing. Since the highlighting doesn’t rely upon a context-free grammar, it has its limits. For example, keywords may still be highlighted when not actually being used as keywords in the language. Nevertheless, it’s better than nothing.

Since I do a lot of behavior-driven test automation development, I created a UDL for Gherkin. You can download it from the Automation Panda GitHub repository – the file is named gherkin_npp_udl.xml. (Make sure to download it as a raw XML file.) Import it into Notepad++ through the User Defined Language window, and you’re ready to go!

Below is a screen shot of an example feature file:

[Image: An example feature file using my Notepad++ UDL for Gherkin]

(Note: These instructions are based on Notepad++ 7.9.1.)

10 Things You Lose Without Automation

Automation has a lot of potential to improve software development. Unfortunately, though, automation is often seen as a luxury. Deadlines in the real world are unforgiving, and since test code isn’t product code, automation tasks are given lower priority and dunked into the black hole of the backlog. Some might argue that this is okay because it is lean or because a new project is just getting started. Once, I even heard it quipped that the first ones cut during a layoff are the automation folks. And it is true that automation requires a nontrivial resource investment.

However, I want to turn the tables. Instead of thinking about automation in terms of the opportunity, think about automation in terms of the opportunity cost. What happens if you don’t automate your tests from the get-go?  There are 10 major things you lose:

#1: Man Hours

Automated tests will automatically run.  Manual tests must be manually run.  That’s ontological.  If you only run a test one time, then automation has no return-on-investment.  But if you run a test more than once, automation saves a tester from repeating themselves. Plus, it’s easy: push the button and wait for results. Automated tests almost always run faster than manual tests, too.  Considering that time is money and engineer salaries aren’t cheap, man hours are a clear opportunity cost.

#2: Coverage

Automated tests can achieve greater coverage than manual tests, particularly for regression testing. As product development progresses, the sheer number of test cases increases. For example, in Agile, new tests will be created every sprint. Older tests must be run periodically to verify that new features don’t break existing features. If regression tests are manual, then testers must burn hours grinding through the same tests repeatedly.  Often, for expediency, this means that they skip some tests – not in the sense of being lazy, but rather as part of a risk-based approach.  Weaker coverage plus risk of missing bugs are accepted for the sake of shorter testing time.  If those regression tests were automated, then there would be no reason to shrink coverage, because they would be easy to run.

#3: Consistency

People make mistakes. It’s human nature – nobody’s perfect. And manual tests are prone to human error because humans run them. I remember how nervous I felt running manual on-call system checks at MaxPoint for the first time, afraid that I would miss a problem that could bring down a million-dollar bidding system.  Automated scripts run the same way every time.

#4: Protection

Continuous integration (CI) protects code against defects by building and testing every code change in real time. A CI system will automatically trigger tests all the time. Tests not running in CI (like manual tests) are effectively dead. At NetApp, failing code changes would immediately be kicked out of the code line, making automated tests act like a vaccine against bugs. On the other hand, I remember a project at MaxPoint that was riddled with bugs and perpetually delayed. When I asked the developers to see their unit tests, they said they never wrote unit tests because “it wasn’t a requirement.”

#5: Delivery Time

Continuous delivery (CD) is the natural extension of continuous integration, in which software products can automatically be delivered (and potentially even deployed) as the final step in a CI pipeline. This is how big companies like Google, Facebook, and Netflix can deliver so rapidly. No automation means no CD.

#6: Results and Metrics

Non-engineers (managers, product owners, scrum masters, oh my!) love to ask questions about tests.  “Are we red or green?” “How many tests do we have for this feature?” “What’s our coverage?” “How often do we run the tests?” Automated tests simply yield more accurate and more comprehensive results. Automation can also generate test reports, so engineers don’t need to waste time drafting emails or updating wiki pages.

#7: Accountability

Numbers don’t lie. Scripts don’t lie. Engineers typically don’t lie, but… results from manual tests can have a fudge factor, or a mistake in reporting, or any other sort of inconsistency. Inaccurate results may lead to poor business decisions. Automated results tell it like it is.

#8: Creativity

Manual testing can devolve into repetitive, menial labor: just follow steps 1-10 again and again and again. It would be much more effective for manual testers to focus on exploratory testing rather than deterministic testing. While automated tests can cover the fixed, repetitive test scenarios, exploratory testing lets testers find creative ways to uncover defects and judge how well a product actually works. Lack of automation ties up human capital.

#9: Peace of Mind

Are you sure that your product is “good”? Can you run enough tests to make sure? I learned the value of peace of mind while I was still in college. In my compiler theory course, I had to develop a simple programming language and build a compiler for it. Every week, we had to add new language features: arithmetic, strings, arrays, functions, etc. And every week, I wrote a slew of mini-programs to test grammar updates to my new language. By the time the project was complete, I had 1000+ automated test cases running through JUnit with 100% coverage, and the entire suite took a mere few minutes to run. And there were many late nights when the tests caught bugs in my language right away, before I committed code. There was no way I could have passed that class without my automated tests.

#10: Quality

The ultimate purpose of test automation is product quality. Having automation doesn’t necessarily mean product quality is good, but not having automation severely limits how quality can be pursued. Anecdotally, I’ve seen much better code quality come out of projects that have good test automation than ones without it. If I were a product owner, I know what I would want.

My Choice to Automate

Why am I an automation engineer? The answer became starkly clear to me one Tuesday morning that should have been like any other. At that time, I was a “Software Quality Engineer” on the QA Framework team of a fairly small software company. I arrived at the office at my normal time around 9am. I spent the following two hours in my daily groove with my headphones on, blissfully unaware that our director had started escorting engineers out the door with boxes in hand. After an interrupted 11am standup, I found myself, heart racing, in a conference room with my manager and the VP of engineering, who had flown in from our other site. On the table, I saw a pile of key fobs and bejeweled company phones. By this point, anyone who knows the software industry could predict what would happen next: the dreaded layoff. However, that’s where the story becomes more interesting.

I wasn’t laid off. Surprise!

In a daring reorganization, the VP eliminated all QA positions in the company. The justification given was to make teams purely Agile with no silos of division between development and quality. Everybody would own quality. Manual testers would be removed in order to prioritize automation. I also suspected that company finances may have encouraged the re-org. As a result, people lost their jobs. However, a few QA engineers, myself included, were spared the layoff and rebranded as regular “software engineers.” In that conference room, as the adrenaline rush subsided, the VP outlined a high-level plan in which I would move to a product development team that desperately needed automation help. Apparently, this team didn’t even have unit tests, and their product was weeks behind schedule. Once the automation effort was up and running, I would become a regular web developer with the others.

So, why me? The survivor’s guilt bothered me, but the answer, apart from God’s grace, was pretty obvious: automation skill. I was the company’s automation champion, and everybody knew it. I consulted with each team to help improve their testing efforts. I gave a lunch-and-learn on behavior-driven development. I wrote several wiki pages and even example projects. Now, that’s not to brag on myself or belittle the skills of others, since many of the engineers who were cut also had automation talent, but my skills had been made highly visible to the people in power. And my skills continued to afford me great opportunity.

However, the last part of the VP’s new plan disconcerted me – did I want to become a standard web UI developer? I pondered the question deeply, but my gut answer never changed: no. Test automation was my specialty. It was the product I made, and I made it well. I knew the ins and outs, the frameworks, the best practices, the fundamental problem of test automation and the lowercase problems that derive from it. And I loved doing automation because it gave me the opportunity to set things right: to find and eliminate bugs, to protect the code line, and to make it look damn good. I could indeed switch over to web dev, but why? I didn’t have interest in web dev. That would be like making a hardwood floor installer do painting instead. With automation, I had both a specialty and an opportunity.

To conclude the story, I didn’t stick around much longer at that company. I quickly found a better opportunity as a senior-level automation engineer on an exciting new project at a different company. That’s the software industry – so it goes! Ironically, that layoff may have impacted me just as much as if I had actually lost my job. It forced me to make a choice. I definitively chose to be a software quality engineer. Automation was no longer merely a skill set but a career identity.