testing

5 Things I Love About SpecFlow

SpecFlow, a.k.a. “Cucumber for .NET,” is a leading BDD test automation framework for .NET. Created by Gáspár Nagy and maintained as a free, open source project on GitHub by TechTalk, SpecFlow presently has almost 3 million total NuGet downloads. I’ve used it myself at a few companies, and, I must say as an automationeer, it’s awesome! SpecFlow shares a lot in common with other Cucumber frameworks like Cucumber-JVM, but it is not a knockoff – it excels in many ways. Below are five features I love about SpecFlow.

#1: Declarative Specification by Example

SpecFlow is a behavior-driven test framework. Test cases are written as Given-When-Then scenarios in Gherkin “.feature” files. For example, imagine testing a cucumber basket:

Feature: Cucumber Basket
  As a gardener,
  I want to carry many cucumbers in a basket,
  So that I don’t drop them all.
  
  @cucumber-basket
  Scenario: Fill an empty basket with cucumbers
    Given the basket is empty
    When "10" cucumbers are added to the basket
    Then the basket is full

Notice a few things:

It is declarative in that steps indicate what should be done at a high level.
It is concise in that a full test case is only a few lines long.
It is meaningful in that the coverage and purpose of the test are intuitively obvious.
It is focused in that the scenario covers only one main behavior.

Gherkin makes it easy to specify behaviors by example. That way, everybody can understand what is happening. C# code will implement each step in lower layers. Even if your team doesn’t do the full-blown BDD process, using a BDD framework like SpecFlow is still great for test automation. Test code naturally abstracts into separate layers, and steps are reusable, too!

#2: Context is King

Safely sharing data (e.g., “context”) between steps is a big challenge in BDD test frameworks. Using static variables is a simple yet terrible solution – any class can access them, but they create collisions for parallel test runs. SpecFlow provides much better patterns for sharing context.

Context injection is SpecFlow’s simple yet powerful mechanism for inversion of control (using BoDi). Any POCOs can be injected into any step definition class, either using default values or using a specific initialization, by declaring the POCO as a step def constructor argument. Those instances will also be shared instances, meaning steps across different classes can share the same objects! For example, steps for Web tests will all need a reference to the scenario’s one WebDriver instance. The context-injected objects are also created fresh for each scenario to protect test case independence.

Another powerful context mechanism is ScenarioContext. Every scenario has a unique context: title, tags, feature, and errors. Arbitrary objects can also be stored in the context object like a Dictionary, which is a simple way to pass data between steps without constructor-level context injection. Step definition classes can access the current scenario context using the static ScenarioContext.Current variable, but a better, thread-safe pattern is to make all step def classes extend the Steps class and simply reference the ScenarioContext instance variable.

#3: Hooks for Any Occasion

Hooks are special methods that insert extra logic at critical points of execution. For example, WebDriver cleanup should happen after a Web test scenario completes, no matter the result. If the cleanup routine were put into a Then step, then it would not be executed if the scenario had a failure in a When step. Hooks are reminiscent of Aspect-Oriented Programming.

Most BDD frameworks have some sort of hooks, but SpecFlow stands out for its hook richness. Hooks can be applied before and after steps, scenario blocks, scenarios, features, and even around the whole test run. (Cucumber-JVM, by contrast, does not support global hooks.) Hooks can be selectively applied using tags, and they can be assigned an order if a project has multiple hooks of the same type. Hook methods will also be picked up from any step definition class. SpecFlow hooks are just awesome!

#4: Thorough Outline Templating

Scenario Outlines are a standard part of Gherkin syntax. They’re very useful for templating scenarios with multiple input combinations. Consider the cucumber basket again:

Feature: Cucumber Basket
  
  Scenario Outline: Add cucumbers to the basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are added to the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | total |
      | 1       | 2    | 3     |
      | 5       | 3    | 8     |

All BDD frameworks can parametrize step inputs (shown in double quotes). However, SpecFlow can also parametrize the non-input parts of a step!

Feature: Cucumber Basket
  
  Scenario Outline: Use the cucumber basket
    Given the basket has "<initial>" cucumbers
    When "<some>" cucumbers are <handled-with> the basket
    Then the basket has "<total>" cucumbers

    Examples: Counts
      | initial | some | handled-with | total |
      | 1       | 2    | added to     | 3     |
      | 5       | 3    | removed from | 2     |

The step definitions for the add and remove steps are separate. The step text for the action is parametrized, even though it is not a step input:

[When(@"""(\d+)"" cucumbers are added to the basket")]
public void WhenCucumbersAreAddedToTheBasket(int count) { /* */ }

[When(@"""(\d+)"" cucumbers are removed from the basket")]
public void WhenCucumbersAreRemovedFromTheBasket(int count) { /* */ }

That’s cool!

#5: Test Thread Affinity

SpecFlow can use any unit test runner (like MsTest, NUnit, and xUnit.net), but TechTalk provides the official SpecFlow+ Runner for a licensed fee. I’m not associated with TechTalk in any way, but the SpecFlow+ Runner is worth the cost for enterprise-level projects. It has a friendly command line, a profile file to customize execution, parallel execution, and nice integrations.

The major differentiator, in my opinion, is its test thread affinity feature. When running tests in parallel, the major challenge is avoiding collisions. Test thread affinity is a simple yet powerful way to control which tests run on which threads. For example, consider testing a website with user accounts. No two tests should use the same user at the same time, for fear of collision. Scenarios can be tagged for different users, and each thread can have the affinity to run scenarios for a unique user. Some sort of parallel isolation management like test thread affinity is absolutely necessary for test automation at scale. Given that the SpecFlow+ Runner can handle up to 64 threads (according to TechTalk), massive scale-up is possible.

But Wait, There’s More!

SpecFlow is an all-around great test automation framework, whether or not your team is doing full BDD. Feel free to add comments below about other features you love (or *gasp* hate) about SpecFlow!

Quality Metrics 101: Test Quality

New to the series? Start from the beginning!

Test quality metrics make sure that testing efforts are worthwhile. Though “testing” and “quality” may be synonymous as organizational titles, testing is only one method of enforcing quality. In software, it just happens to be the most effective one. Testing is expensive, though, because it slows down time-to-market. Some people even devalue testing work because it doesn’t add new features to a product. Below are aspects of test quality to consider measuring to prove and even increase the value of testing efforts.

Coverage

Quality Aspect

How much functionality is covered by tests?

Desired State

High – More coverage means less risk. Note that 100% complete coverage is impossible.

Metrics

Coverage may be measured for both manual and automated tests. However, automated test coverage is usually more important because automated tests are meant to be defensive without gaps.

Code Coverage – Code coverage tools check what paths of code are actually exercised by automated tests. While they cannot tell if tests are good or bad, they are great for exposing gaps in coverage. Unit test code coverage is easy because most frameworks have plugins, but above-unit code coverage requires instrumented builds. Look for tools that track more than just lines of code. Target 90%+ coverage. Add new tests to cover any major gaps.

Feature Coverage – Feature coverage is a manual way to score features on test coverage based on planning and review. For this metric to be successful, a team must consistently specify features well; otherwise, this metric will give useless data. Gherkin scenarios a great way to do this – for example, each scenario can be marked as untested, manual, or automated. Feature coverage is unscientific, but it can give a better picture of functionalities actually covered (instead of just the raw lines of code covered).

Automation Debt – Technical debt increases when tests are not automated and thus lack coverage. Teams are often unable to automate all tests originally planned, and test automation is frequently jettisoned from the Definition of Done. Or, a project may not start automating tests until a large chunk of the project is already complete. The best way to track automation debt is to create a backlog for incomplete automation work. Backlog tasks can be sized, prioritized, and planned according to whatever development process is used (Scrum, Kanban, etc.). Appropriate process metrics can then be used to understand the magnitude of the work and, thus, the lack of automated test coverage.

Warning: Test case count, test length, and test code line count are terrible metrics for coverage because they encourage largeness rather than uniqueness. The goal of testing is to have the greatest coverage with the lowest risk for the least work. Anybody can blindly write tests or variations that add no meaningful value.

Reliability

Quality Aspect

Do automated tests consistently reach completion? And how trustworthy are the results?

Desired State

High – Reliability means less time for failure triage or (horrors) reruns.

Metrics

Failure Reasons – Track the failure reason for each test case run. Ideally, tests should fail only when they discover product bugs. However, tests may also fail when:

an acceptable product change caused an automation error because tests were not updated, indicating poor communication or careless updates
an environmental change or interruption caused an automation error, indicating deployment or sysadmin problems
the automation code itself has a bug

Remember, “successful” test runs either pass with appropriate coverage or fail due to product bugs. “Unsuccessful” test runs fail or crash for reasons other than product bugs. Aim to minimize unsuccessful test runs. Never hack a test just to get it passing – always work to fix the problems behind test failures.

Speed

Quality Aspect

How much time do test runs take?

Desired State

Fast – Tests should complete in the shortest time possible.

Metrics

Test Case Execution Time – Test case execution times indicate the efficiency of the automation code. Track the start-to-end execution time for every individual test case run. Then, analyze the data using common sense. For example, outliers may be inefficient tests that need tuning or should be removed altogether. It may be wise to separate test runs by result type or coverage area. Historical data can also be used as a baseline to determine performance impacts when making cross-cutting automation changes.

Test Suite Execution Time – Test suites are sets of test cases, but their execution times are not merely the sum of their tests’ times. A test suite run may include environmental setup, deployment, parallel execution, reporting, and other things. The purpose of tracking test suite execution time is to determine the start-to-end time of the suite in total, because that indicates the speed of feedback and, in CI, delivery. Tracking test suite execution time will also reveal the effect of adding more test cases to the suite, which then factors into the risk-based decisions of including or excluding tests.

Test Pyramid Balance – The Test Pyramid separates tests between unit (bottom), integration (middle), and end-to-end (top) layers. Ideally, there should be more tests at the bottom than at the top. Why? Higher-level tests are more expensive – they take more time to develop, they are more time consuming to triage, and they have slower execution times. Consider the “Rule of 1’s”: a unit test takes ~1ms, an integration test takes ~1s, and an end-to-end test takes ~1m. When scaled to thousands of tests with continuous integration, end-to-end tests simply take too much time. Tracking the proportion of tests at each layer will give a rough picture of the balance. There’s no perfect ratio between layers, but make sure that the tests form a pyramid and not a cupcake, hourglass, or ice cream cone. Rebalance test efforts as appropriate.

Return on Investment

Quality Aspect

Do the tests add greater value than their cost?

Desired State

High – Tests need to be worth the effort. Don’t test for the sake of testing!

Metrics

Measuring return on investment in terms of hard dollars is objectively impossible. The true cost of bugs can never be fully known: if a bug is caught early, the potential cost to fix it later can merely be estimated. The intangible value of protecting brand reputation may be more important than the tangible value of money saved by finding specific bugs. Better quality practices might prevent developers from causing bugs that would have otherwise happened – and there’s no good way to measure that.

Instead, return on investment is better measured by a collection of metrics that validate both code line protection and defect discovery. Use a weighted scorecard to get a more holistic view of ROI. Scorecards can be used with estimates for planning tests, as well as plugged in with actual values to measure the degree of success. Note that some aspects of ROI may be too difficult to measure accurately – in those cases, a LOW-MID-HIGH grading scale may be best. Others may seem like micromanagement.

Priority – Assign each test a priority for its coverage importance. Core functionalities should have the highest priority, while fringe functionalities should have the lowest priority. Focus on high-priority tests. Another way to look at importance is risk, or the chances that bugs will escape if explicit testing for a feature is not done.
Test Execution Frequency – Track how many times tests are actually run. Higher frequency is better. Tests that are rarely run should either be included in more regular runs or removed/archived. This could easily be tracked by a test management tool or database.
Coverage Uniqueness – Duplicate test coverage wastes resources. Unfortunately, this one is difficult to measure. Tools for code coverage or static analysis might help. Manual review, however, is typically a better approach.
Development Cost and Maintenance Cost – Track how much effort it takes to make and keep tests, including man-hours and resources. Lower costs are better, of course. Planning tools may help with this.
Bug Discovery – Track bugs discovered in terms of severity and when and how they were caught. Ideally, the number of bugs caught by customers after a release (meaning, not caught by tests during development) should be minimal, and their severity should be low. Bug tracking tools should easily provide this data. Be warned, though, that the raw bug count is a poor metric. Consider this question: Is a high bug count good or bad? Trick question – during a release, it indicates good test quality but poor product quality; after a release, it indicates all-around poor quality. What matters is that a minimal number of bugs happen at all, and that most of those bugs are caught and fixed before a release. Plus, keep in mind that bugs happen by accident. Finally, focusing exclusively on bug count to determine test value ignores the positive side of testing – that passing tests give confidence that features work correctly.

Quality Metrics 101: The Good, The Bad, and The Ugly

metric – [me-trik] – (noun) a standard for measuring or evaluating something

(Courtesy of dictionary.com)

When developing software, metrics can be a good way to track progress and evaluate quality. Managers typically love them because they provide insights that could otherwise be hard to see. Come on, who doesn’t love pretty charts with rainbow colors? However, gathering metrics is not easy, especially for quality. Some metrics are downright useless, and others encourage bad behavior when used improperly. It is far more important to focus on the most important aspects of quality than to blindly promulgate numbers. This article will cover quality metrics in depth, giving guidance on what quality aspects matter most and how they can be measured.

What are Quality Metrics?

Quality is the degree of a feature’s excellence. Quality metrics attempt to impartially measure a feature’s excellence. The word “attempt” is notable – quality is inherently relative, and metrics can sometimes be subjective. Take pizza as an example: How would the quality of a pizza be measured? One method could be to analyze the freshness and nutritious value of the ingredients, but, Pizza Hut notoriously fought Papa John’s Pizza over the assertion that better ingredients make better pizza. Another method could be to analyze the cooking process, like bake time or the order of toppings, but that would be better for identifying carelessness than quality. The delivery process could also be considered, like Domino’s delivery robots, but that evaluates customer service and not the pizza itself. Ultimately, what matters are the taste and the visual appearance, which are totally subjective to the consumer. Surveys are unreliable. Taste tests have limited selection. Appearance is an art, not a science. Each of these metrics gives a glimpse into quality but does not fully reveal what actually makes a “good” pizza. Together, though, they provide a reasonable picture when the desired metrics are gathered well.

tony_pepperoni-rochester-ny-pizza-coupon

Is that really high quality pizza? Well, what aspects of quality are we measuring? We won’t get a perfect picture of quality from metrics, but we can get a rough idea. Software quality metrics work the same way.

Software Quality

In software, there are three primary types of quality metrics:

Test Quality
- How effective are tests at enforcing high quality standards?
- Examples: code coverage, test failure reasons.
Process Quality
- How effective are processes at delivering good features?
- Examples: time to fix broken builds, time to discover bugs.
Product Quality
- How good is the software product?
- Examples: test failure rate, up-time, customer satisfaction.

The main purpose of software quality metrics is to validate successes and find areas for improvement in the development process. Metrics expose problems like gaps in coverage or slow feedback loops so that a team knows what to improve. They are meant to be informative but not punitive – they should simply report accurate data. Don’t shoot the messenger! For example, if the test failure rate is high, fix the bugs instead of blaming each other.

However, be warned by W. Edwards Deming‘s red bead experiment: Quality cannot be inspected into a product – it must be built in from the beginning! Metrics alone cannot solve problems – they can merely expose them. It is up to the development team to affect the proper change based on what metrics reveal. Awareness is useless without action. And action should ultimately lead to better features, faster delivery, and higher profits.

Choosing Quality Metrics

Metrics are nothing but tools to improve aspects of quality. Not every job needs the full toolbox! Always pick the quality aspect first, and then find the right measuring stick. Don’t just pick some metrics that others say are good. For example, if build stability is the quality aspect that is deemed important (and it should be), then the metric to track it could be the average time to fix a build after it is broken.

The best process for choosing quality metrics is:

Identify a quality aspect that adds value.
Decide if the aspect is worth measuring.
Determine the desired state for that aspect.
Derive the best way to measure progress toward the desired state impartially.
Implement the metric gathering, storage, and analysis.
Revisit the metric periodically to assert its value.
Stop gathering the metric when it ceases to provide value.

Keep in mind that metrics have a cost: they must be gathered, stored, and analyzed. That’s why it’s important to pick the quality aspects that matter most.

This Series

The articles in this series will cover each of the quality metric types in detail. Each will list major quality aspects with meaningful metrics to track them and advice on how to use them. Remember, metrics should be constructive and not destructive.

Are Gherkin Scenarios with Multiple When-Then Pairs Okay?

Don’t know about Behavior-Driven Development or Gherkin? Start here!

Writing Gherkin is easy, but writing good Gherkin is hard. My post BDD 101: Writing Good Gherkin covers many aspects of good behavior specification, including titles, phrasing, and data. One of the major points I make anytime I discuss good Gherkin is what I call the “Cardinal Rule of BDD.”

The Cardinal Rule of BDD: One Scenario, One Behavior!

A behavior scenario specification should focus on one individual behavior. This is the essence of the BDD mindset – a product’s features can be specified in terms of its behaviors, and the specs should be written as examples of those behaviors in action. Identifying individual behaviors brings clarity to design, development, and testing. Combining behaviors into a single scenario causes ambiguity, miscommunication, and test gaps. Test failure triage also becomes more difficult and time consuming because the root causes for failures are less clear – the culprit could be one of multiple behaviors. There is also a high risk of duplication when scenarios repeat the same sequence of steps instead of isolating behaviors.

One of the dead giveaways to violations of the Cardinal Rule of BDD is when a Gherkin scenario has multiple When-Then pairs, like this:

Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

A When-Then pair denotes a unique behavior. In this example, the behaviors of performing a search and changing the search to images could and should clearly be separated into two scenarios, like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

Despite being so central to BDD philosophy, the Cardinal Rule is the one thing people always try to sidestep. Nobody ever doubts the usefulness of step parameters or the need for good grammar, but people frequently show me scenarios with multiple When-Then pairs and basically ask for an exception from the rule. My gut reaction is always, “NO! Rules don’t change.”

However…

I must first admit that the Cardinal Rule of BDD is “opinionated” – it is the way that I have found BDD to work best for collaboration and automation. Adherence forces people to adopt a behavior-driven mindset, and strictness keeps feature and test quality high. Other experts are more permissive of multiple When-Then pairs, though. Most examples I could find from leading sources such as The Cucumber Book exhibit strict Given-When-Then order for Gherkin scenarios, but other sources such as the online JBehave documentation show scenarios with multiple When-Then pairs boldly on the front page.

I must also begrudgingly admit that there are times when it is simply more convenient for a single scenario to have multiple behaviors (and thus multiple When-Then pairs). This is by no means a best practice but rather a pragmatic alternative for specification dilemmas. (See Purist vs. Pragmatist.) Below are situations in which multiple When-Then pairs may be acceptable.

Lengthy End-to-End Scenarios

End-to-end tests verify execution paths through a live system with all of its parts. Web UI tests frequently fall into this category: Selenium WebDriver interacts with a page in a browser, which then triggers calls to a backend service layer or database. Despite the name, end-to-end tests may still focus on one individual behavior. The example scenarios above, though short, technically count as end-to-end tests.

However, many people use the term “end-to-end” to refer to tests that cover sequences of behaviors. Such a scenario could violate the Cardinal Rule of BDD if it is not handled carefully. My article BDD 101: Unit, Integration, and End-to-End Tests gives strategies for handling lengthy end-to-end scenarios. One strategy is to simply turn a blind eye to multiple When-Then pairs. Ideally, each behavior would already have its own individual scenario, but then a new scenario would explicitly combine the behaviors together to get that full, end-to-end path. The new scenario would be easy to write because the steps could be reused. This isn’t the only strategy, so please be sure to consider the others before writing the tests.

Audits

Software system audits frequently require lengthy end-to-end scenarios. They are quite common in highly-regulated domains. For example, a bank may need to prove that a loan is prepared correctly or that a transaction puts money into the right accounts. Auditors typically require tests to run through entire system paths (e.g., multiple behaviors) using the same records, such as one loan application or one payment. Auditees must not only provide test results for past runs but must also repeat tests on demand. Separating each individual behavior into its own scenario makes each test independent, so during test execution, there will be no guaranteed order and no shared test data, and auditors would not have the end-to-end verification that they require. The simplest way to give the auditors what they need is to write one lengthy scenario with multiple When-Then pairs.

Service Calls

Service call testing is another case for which multiple When-Then pairs may be pragmatically justified. REST, SOAP, and WSDL are examples of service call types. Service layer development is more engineering-centric than business-centric, but many teams nevertheless choose to test service calls with Gherkin-based frameworks like Cucumber. Due to the programmatic nature of services, Gherkin scenarios for service calls tend to be quite imperative: specify a request, make the call, and verify parts of the response. This isn’t so bad for independent service calls, but it becomes a problematic when one request needs another call’s response.

One solution is the classic “pure” scenario split: put any necessary setup, including initial requests to get required response parts, into custom Given steps. This abides by the Cardinal Rule and avoids duplicate When-Then pairs. But, it introduces an unsavory form of code duplication. Many service calls end up being written twice: once as a Gherkin scenario for testing, and once in the underlying automation code to be called by Given steps. This violates the DRY principle.

The alternative “pragmatic” solution is to write scenarios that specify multiple service calls in the Gherkin steps. The Karate project advocates this approach, as shown in their “Hello World” example:

	Feature: karate 'hello world' example
	Scenario: create and retrieve a cat

	Given url 'http://myhost.com/v1/cats'
	And request { name: 'Billie' }
	When method post
	Then status 201
	And match response == { id: '#notnull', name: 'Billie' }

	Given path response.id
	When method get
	Then status 200

view raw

hello-world.feature

hosted with ❤ by GitHub

Take Caution!

There may be other cases when When-Then repetition is useful. Feel free to leave suggestions in the comments below. My examples are meant to be descriptive, not prescriptive. Another aspect to consider is that allowing multiple When-Then pairs per scenario indicates that a team sees more value in BDD’s test framework than in its collaborative spec process. (Refer to ‑‑BDD; Automation without Collaboration and BDD‑‑; Collaboration without Automation.)

Ultimately, you must decide what practices are best for your project. The main reason I uphold the Cardinal Rule of BDD so strongly is that it makes for good specs and good tests. I’ve seen engineers write extremely long, intensive test procedures (and I mean, dozens of duplicate behaviors per test) that are alright for manual testing but do not transition well into automation because they are too fragile and they don’t yield useful information upon failure. The Cardinal Rule is a way to break out of the procedure-driven mindset, and banning multiple When-Then pairs per Gherkin scenario is an effective rule for enforcing it.

To Infinity and Beyond: A Guide to Parallel Testing

Are your automated tests running in parallel? If not, then they probably should be. Together with continuous integration, parallel testing the best way to fail fast during software development and ultimately enforce higher software quality. Switching tests from serial to parallel execution, however, is not a simple task. Tests themselves must be designed to run concurrently without colliding, and extra tools and systems are needed to handle the extra stress. This article is a high-level guide to good parallel testing practices.

What is Parallel Testing?

Parallel testing means running multiple automated tests simultaneously to shorten the overall start-to-end runtime of a test suite. For example, if 10 tests take a total of 10 minutes to run, then 2 parallel processes could execute 5 tests each and cut the total runtime down to 5 minutes. Even better, 10 processes could execute 1 test each to shrink runtime to 1 minute. Parallel testing is usually managed by either a test framework or a continuous integration tool. It also requires more compute resources than serial testing.

Why Go Parallel?

Running automated tests in parallel does require more effort (and potentially cost) than running tests serially. So, why go through the trouble?

The answer is simple: time. It is well documented that software bugs cost more when they are discovered later. That’s why current development practices like Agile and BDD strive to avoid problems from the start through small iterations and healthy collaboration (“shift left“), while CI/CD defensively catches regressions as soon as they happen (“fail fast“). Reducing the time to discover a problem after it has been introduced means higher quality and higher productivity.

Ideally, a developer should be told if a code change is good or bad immediately after committing it. The change should automatically trigger a new build that runs all tests. Unfortunately, tests are not instantaneous – they could take minutes, hours, or even days to complete. A test automation strategy based on the Testing Pyramid will certainly shorten start-to-end execution time but likely still require parallelization. Consider the layers of the Testing Pyramid and their tests’ average runtimes, the Testing Pyramid Rule of 1’s:

The Testing Pyramid with Times — Each layer is listed above with the rough runtime of a typical test. Though actual runtimes will vary, the Rule of 1’s focuses on orders of magnitude. Unit tests typically run in milliseconds because they often exercise product code in memory. Integration tests exercise live products but are limited in scope and often cover low-level areas (like REST service calls). End-to-end tests, however, cover full paths through a live system, which requires extra setup and waiting (like Selenium WebDriver interaction).

Now, consider how many tests from each layer could be run within given time limits, if the tests are run serially:

Test Layer	1 Minute Near-Instant	10 Minutes Coffee Break	1 Hour There Goes Today
Unit	60,000	600,000	3,600,000
Integration	60	600	3,600
End-to-End	1	10	60

Unit test numbers look pretty good, though keep in mind 1 millisecond is often the best-case runtime for a unit test. Integration and end-to-end runtimes, however, pose a more pressing problem. It is not uncommon for a project to have thousands of above-unit tests, yet not even a hundred end-to-end tests could complete within an hour, nor could a thousand integration tests complete within 10 minutes. Now, consider two more facts: (1) tests often run as different phases in a CI pipeline, to total runtimes are stacked, and (2) multiple commits would trigger multiple builds, which could cause a serious backup. Serial test execution would starve engineering feedback in any continuous integration system of scale. A team would need to drastically shrink test coverage or give up on being truly “continuous” in favor of running tests daily or weekly. Neither alternative is acceptable these days. CI needs parallel testing to be truly continuous.

The Danger of Collisions

The biggest danger for parallel testing is collision – when tests interfere with each other, causing invalid test failures. Collisions may happen in the product under test if product state is manipulated by more than one test at a time, or they may happen in the automation code itself if the code is not thread-safe. Collisions are also inherently intermittent, which makes them all the more difficult to diagnose. As a design principle, automated tests must avoid collisions for correct parallel execution.

Making tests run in parallel is not as simple as flipping a switch or adding a new config file. Automated tests must be specifically designed to run in parallel. A team may need to significantly redevelop their automation code to make parallel execution work right.

A train collision in Mannheim, Germany in 2014. Don’t let this happen to your tests!

Handling Product-Level Collisions

Product-level collisions essentially reduce to how environments are set up and handled.

Separate Environments

The most basic way to avoid product-level collisions would be to run each test thread or process against its own instance of the product in an exclusive environment. (In the most extreme case, every single test could have its own product instance.) No collisions would happen in the product because each product instance would be touched by only one test instance at a time. Separate environments are possible to implement using various configuration and deployment tools. Docker containers are quick and easy to spin up. VMs with Vagrant, Puppet, Chef, and/or Ansible can also get it done.

However, it may not always be sensible to make separate environments for each test thread/process:

Creating a new environment is inefficient – it takes extra time to set up that may cancel out any time saved from parallel execution.
Many projects simply don’t have the money or the compute resources to handle a massive scale-out.
Some tests may not cause collisions and therefore may not need total isolation.
Some product environments are extremely large and complicated and would not be practical to replicate for each test individually.

Shared Environments

Environments with a shared product instance are quite common. One could be a common environment that everyone on a team shares, or one could be freshly created during a CI run and accessed by multiple test threads/processes. Either way, product-level collisions are possible, and tests must be designed to avoid clashing product states. Any test covering a persistent state is vulnerable; usually, this is the vast majority of tests. Consider web app testing as an example. Tests to load a page and do some basic interactions can probably run in parallel without extra protection, but tests that use a login to enter data or change settings could certainly collide. In this case, collisions could be avoided by using different logins for each simultaneous test instance – by using either a pool of logins, a unique login per test case, or a unique login per thread/process. Each product is different and will require different strategies for avoiding collisions.

the_earth_seen_from_apollo_17 — We all share certain environments. Take care of them when you do. (Photo: *The Blue Marble*, taken by the Apollo 17 crew on Dec 7, 1972)

Handling Automation-Level Collisions

Automation-level collisions can happen when automation code is not thread-safe, which could mean more than simply locks and semaphores.

#1: Test Independence

Test cases must be completely independent of each other. One test must not require another test to run before it for the sake of setup. A test case should be able to run by itself without any others. A test suite should be able to run successfully in random order.

#2: Proper Variable Scope

If parallel tests will be run in the same memory address space, then it is imperative to properly scope all variables. Global or static mutable variables (e.g., “non-constants”) must not be allowed because they could be changed unexpectedly. The best pattern for handling scope is dependency injection. Thread-safe singletons would be a second choice. (Typically, global or static variables are used to subvert design patterns, so they may reveal further necessary automation rework when discovered.)

#3: External Resources

Automation may sometimes interact with external resources, such as test config files or test result databases/services. Make sure no external interactions collide. For example, make sure test run updates don’t overwrite each other.

#4: Logging

Logs are very difficult to trace when multiple tests are simultaneously printed to the same file. The best practice is to generate separate log files for each test case, thread, or process to make them readable.

#5: Result Aggregation

A test suite is a unified collection of tests, no matter how many threads/processes are used to run its tests in parallel. Make sure test results are aggregated together into one report. Some frameworks will do this automatically, while others will require custom post-processing.

#6: Test Filtering

One strategy to avoid collisions may be to run non-colliding partitions (subsets) of tests in parallel. Test tagging and filtering would make this possible. For example, tests that require a special login could be tagged as such and run together on one thread.

Test Scalability

The previous section on collisions discussed how to handle product environments. It is also important to consider how to handle the test automation environment. These are two different things: the product environment contains the live product under test, while the test environment contains the automation software and resources that run tests against the product. The test environment is where the parallel tests will be executed, and, as such, it must be scalable to handle the parallelization. A common example of a test environment could be a Jenkins master with a few agents for running build pipelines. There are two primary ways to scale the test environment: scale-up and scale-out.

Parallel Scale-Up

Scale-up is when one machine is configured to handle more tests in parallel. For example, scale-up would be when a machine switches from one (serial) thread to two, three, or even more in parallel. Many popular test runners support this type of scale-up by spawning and joining threads in a common memory address space or by forking processes. (For example, the SpecFlow+ Runner lets you choose.)

Scale-up is a simple way to squeeze as much utility out of an existing machine as possible. If tests are designed to handle collisions, and the test runner has out-of-the-box support, then it’s usually pretty easy to add more test threads/processes. However, parallel test scale-up is inherently limited by the machine’s capacity. Each additional test process succumbs to the law of diminishing returns as more memory and processor cycles are used. Eventually, adding more threads will actually slow down test execution because the processor(s) will waste time constantly switching between tests. (Anecdotally, I found the optimal test-thread-to-processor ratio to be 2-to-1 for running C#/SpecFlow/Selenium-WebDriver tests on Amazon EC2 M4 instances.) A machine itself could be upgraded with more threads and processors, but nevertheless, there are limits to a single machine’s maximum capacity. Weird problems like TCP/IP port exhaustion may also arise.

Parallel Scale-Out

Scale-out is when multiple machines are configured to run tests in parallel. Whereas scale-up had one machine running multiple tests, scale-out has multiple machines each running tests. Scale-out can be achieved in a number of ways. A few examples are:

One master test execution machine launches multiple Web UI tests that each use a remote Selenium WebDriver with a service like Selenium Grid, Sauce Labs, LambdaTest, or BrowserStack.
A Jenkins pipeline launches tests across ten agents in parallel, in which each agent executes a tenth of the tests independently.

Scale-out is a better long-term solution than scale-up because scale-out can handle an unlimited number of machines for parallel testing. The limiting factor with scale-out is not the maximum capacity of the hardware but rather the cost of running more machines. However, scale-out is much harder to implement than scale-up. It requires tests to be evenly divided with some sort of balancer and filter. It also requires some sort of test result aggregation for joint reporting – people won’t want to piece together a bunch of separate reports to get an overall snapshot of quality. Plus, the test environment is more complicated to build and maintain (though tools like CloudBees Jenkins Enterprise or Amazon EC2 can make it easier.)

Upwards and Outwards

Of course, scale-up and scale-out are not mutually exclusive. Scaled-out nodes could individually be scaled-up. Consider a test environment with 10 powerful VMs that could each handle 10 tests in parallel – that means 100 tests could run simultaneously. Using the Rule of 1’s, it would take only about a minute to run 100 Web UI tests, which serially would have taken over an hour and a half! Use both strategies to shorten start-to-end runtime as much as possible.

Conclusion

Parallel testing is a worthwhile endeavor. When done properly, it will not only reduce development time but also improve the development experience. For readers who want to start doing parallel testing, I recommend researching the tools and frameworks you want to use. Many popular test frameworks support parallel execution, and even if the one you choose doesn’t, you can always invoke tests in parallel from the command line. Do well!

Missing Error Messages with Angular Testing

Logs are an essential part of test automation – they leave a trace of execution that is indispensable when backtracking through failures. Missing logs can make it much, much harder to figure out problems in the code. Recently, I hit this problem while writing unit tests for an Angular project: neither the console nor Google Chrome’s debugger showed any helpful error messages! Thankfully, there was a pretty easy solution. This article will explain the problem and the solution.

Update (January 18, 2018):

After further research, it appears that this problem was fixed in the @angular/cli 1.3.x release. I updated to 1.3.2, removed the “–sourcemaps=false” option, and verified that the error messages are printed. Furthermore, the source mapping is correct – the errors map to the correct line and column in the sources files!

If you are stuck using a version prior to 1.3.x, then use the workaround detailed below. Otherwise, upgrade the package and avoid the problem altogether!

TL;DR

Disable source maps when running Angular tests:

$ ng test --sourcemaps=false

Angular Project Setup

This article presumes the standard Angular 4 project setup, as automatically generated by the “ng new” command. Jasmine unit tests are written in “*.spec.ts” files and run with Karma using Google Chrome as the browser.

The Problem

The Angular testing utilities provide great support for isolating and exercising parts of Angular code for unit testing. However, programmers need to use them properly, or else they won’t work. When I tried writing some unit tests for ngrx, I quickly hit dependency problems. However, it took me hours to figure it out because the console output was not helpful – all it would print was “ERROR”:

As a newbie, I had no idea what went wrong. I tried debugging with Chrome, but the error message I got there was cryptic and not much more helpful:

The Solution

After googling for a while, I discovered that there is a bug with source maps in the Angular CLI (Issue #7296). The workaround is to add the “–sourcemaps=false” option to the “ng test” command. If the package.json file contains a “test” script that calls “ng test”, the option may be added there. Now, the console prints error messages:

Errors also appear on the Karma page in the browser:

One side effect of this workaround, however, is that the line and column numbers don’t correctly line up to the TypeScript files. I presume that they map to the compiled JavaScript files instead. Nevertheless, error messages with wrong line numbers are better than no error messages at all. There may be a way to fix the source mapping, but that’s a problem for another day. Hopefully, the Angular team will fix this “feature” for us.

Now, time to go fix those test errors!

The Airing of Grievances: Selenium WebDriver

Selenium WebDriver is the de facto standard for Web UI automation. It’s a great tool, but like anything good, it can also be misused. And that’s where I have grievances. I got a lot of problems with Selenium WebDriver abuses, and now you’re gonna hear about it!

WebDriver “Unit Tests”

“WebDriver unit tests” are like square circles – definitionally, they are logical fallacies. In my books, a unit test must be white box, meaning it has direct access to the product code. However, Web UI tests using WebDriver are inherently black box tests because they are interacting with an actively running website. Thus, they must be above-unit tests by definition. Don’t call them unit tests!

Making Every Test a Web Test

NO! The Testing Pyramid is vital to a healthy overall testing strategy. Web tests are great because they test a website in the ways a user would interact with it, but they have a significant cost. As compared to lower-level tests, they are more fragile, they require more development resources, and they take much more time to run. Browser differences may also affect testing. Furthermore, problems in lower level components should be caught at those lower levels! Sure, HTTP 400s and 500s will appear at the web app layer, but they would be much faster to find and fix with service layer tests. Different layers of testing mitigate risk at their optimal returns-on-investment.

No WebDriver Cleanup

Every WebDriver instance spawns a new system process for “driving” web browser interactions. When the test automation process completes, the WebDriver process may not necessary terminate with it. It is imperative that test automation quits the WebDriver instance once testing is complete. Make sure cleanup happens even when abortive exceptions occur! Otherwise, zombie WebDriver processes may continue on the test machine, causing any number of problems: locked files and directories, high memory usage, wasted CPU cycles, and blocked network ports. These problems can cripple a system and even break future test runs, especially on shared testing machines (like Jenkins nodes). Please, only you can stop the zombie apocalypse – always quit WebDriver instances!

Using “Close” Instead of “Quit”

Regardless of programming language, the WebDriver class has both “close” and “quit” methods. “Close” will close the current browser tab or window, while “quit” will close all windows and terminate the WebDriver process. Make sure to quit during final cleanup. Doing only a close may result in zombie WebDriver processes. It’s a rookie mistake.

Not Optimizing Setup/Cleanup with Service Calls

Web tests are notoriously slow. Whenever you can speed them up, do it! Some tests can be optimized by preparing initial state with service calls. For example, let’s say a user visiting a car dealership website needs to have favorite cars pre-selected for a comparison page test. Rather than navigating to a bunch of car pages and clicking a “favorite” icon, make a setup routine that calls a service to select favorites. Not all tests can do this sort of optimization, but definitely do it for those that can!

Web Elements with No ID

Developers, we need to talk – give every significant element a unique ID. PLEASE! WebDriver calls are so much easier to write and so much more robust to run when locator queries can use IDs instead of CSS selectors or XPaths. Let’s pick ID names during our Three Amigos meetings so that I can program the tests while you develop the features. Determining what elements are import should be easy based on our wireframes. You will save us automators so much time and frustration, since we won’t need to dig through HTML and wonder why our XPaths don’t work.

Changing Web Elements Without Warning

Hey, another thing, developers – don’t change the web page structure without telling us! WebDriver locator queries will break if you change the web elements. Even a seemingly innocuous change could wipe out hundreds of tests. Automation effort is non-trivial. Changes must be planned and sized with automation considerations in mind.

Not Using the Page Object Model

The Page Object Model is a widely-used design pattern for modeling a web page (or components on a web page) as an object in terms of its web elements and user interactions with it. It abstracts Web UI interactions into a common layer that can be reused by many different tests. (The Screenplay pattern, also good, is an evolution of the Page Object Model; tutorial here.) Not using the Page Object Model is Selenium suicide. It will result in rampant code duplication.

Demonizing XPath

XPaths have long been criticized for being slower than CSS selectors. That claim is outdated baloney. In many cases, XPaths outperform CSS selectors – see here, here, and here. Another common complaint is that XPath syntax is more complicated than CSS selector syntax. Honestly, I think they’re about the same in terms of learning curve. XPaths are also more powerful that CSS selectors because they can uniquely pinpoint any element on the page.

Inefficient Web Element Access

Web element IDs make access extremely efficient. However, when IDs are not provided, other locator query types are needed. It is always better to use locator queries to pinpoint elements, rather than to get a list of elements (or even a parent/child chain) to traverse using programming code. For example, I often see code reviews in which an XPath returns a list of results with text labels, and then the programming code (C# or Java or whatever) has a for loop that iterates over each element in the list and exits when the element with the desired label is found. Just add “[text()=’desired text’]” or “[contains(text(), ‘desired text’)]” to the XPath! Use locator queries for all they’re worth.

Interacting with Web Elements Before the Page is Ready

Web UI test automation is inherently full of race conditions. Make sure the elements are ready before calling them, or else face a bunch of “element not found” exceptions. Use WebDriver waits for efficient waiting. Do not use hard sleeps (like Java’s Thread.sleep).

Untuned Timeouts

WebDriver calls need timeouts, or else they could hang forever if there is a problem. (Check online docs for default timeout values.) Timeout value ought to be tuned appropriately for different test environments and different websites. Timeouts that are too short will unnecessarily abort tests, while timeouts that are too long will lengthen precious test runtime.

The Airing of Grievances: Test Automation Code

More grievances! Test code ought to be developed with the same high standards as product code, but often it is neglected. That bothers me so much, and it costs teams a significant amount of time and money through its bad consequences. I got a lot of problems with bad test automation code, and now you’re gonna hear about it!

Copypasta

Code duplication is code cancer. It is particularly rampant in test automation because test steps are often repetitive. That’s no excuse, however, to allow it. Use better programming practices, or face my code review rejections!

Hard-Coded Configuration Data

Automation should be able to run on any environment without hassle. Don’t hard-code URLs, usernames, passwords, and other config-specific values! Read them in as inputs or config files. Nothing is more frustrating than switching environments but – surprise! – not running with the correct values.

Re-reading Inputs for Every Single Test

Config files and input values should not change during a test suite run. So, don’t repeatedly read them! That’s inefficient. Read them once, and hold them in memory in a centrally-accessible location.

Leaving Temporary Files or Settings

Clean up your mess! Don’t leave temporary files or settings after a test completes. Not doing proper cleanup can harm future tests. Files can also build up and eventually run the system out of storage space. It’s a Boy Scout principle: Leave No Trace!

Incorrect Test Results

Test results need integrity to be trusted. Watch out for false positives and false negatives. If there’s a problem during a test, handle it – don’t ignore it. Don’t skip assertion calls. Business decisions are made based on test results.

Uninformative Failure Messages

Here’s one: “Failed.” That tells me nothing. Why did the test fail? Was something missing from a web page? Was there an unexpected exception? Did a service call return 500? TELL ME! Otherwise, I need to waste time rerunning and even debugging the test. The failure may also be intermittent. Tell me what the problem is right when it happens.

Making Assertion Calls in Automation Support Code

Every automated test case must make assertions to verify goodness or badness of system conditions. But, as a best practice, those assertion calls should be written only in the test case code – not in support code like page objects or framework-level classes. Assertions are test-specific, while support code is generic. Support code should simply give system state, while test cases use assertions to validate system state. If support code has assertion calls, then it is far less reusable and traceable.

Treating Boolean Methods like Assertions

This is an all-too-common rookie mistake. Boolean methods simply return a true/false value. They do not (or, as least according to the previous grievance, should not) contain assertion calls. However, I’ve seen many code reviews where programmers call a Boolean method but do nothing with the result. Whoops!

Burying Exceptions

Exceptions mean problems. Test automation is meant to expose problems. Don’t blindly catch all exceptions, log a message, and continue on with a test like nothing’s wrong. Let the exception rise! Most modern test frameworks will catch all exceptions at the test case level, automatically report them as failures, and safely move on to the next test. There is no need to catch exceptions yourself if they ought to abort the test, anyway – that adds a lot of unnecessary code. Catch exceptions only if they are recoverable!

No Automatic Recovery

True automation means no manual intervention. An automation framework should have built-in ways to automatically recover from common problems. For example, if a network connection breaks, automatically attempt to reconnect. If a test fails, retry it. If too many failures happen, either end the run early or pause for some time. And make sure recovery mechanisms are built into the framework, rather than appearing as copypasta throughout the codebase. The purpose of retries (especially for tests) is not to blindly run tests until they turn from red to green, but rather to overcome unexpected interruptions and also to gather more data to better understand failure reasons. Interruptions will happen – handle them when they do. And be sure to log any failures that do happen so that the reasons may be investigated.

No Logging

Please leave a helpful trail of log messages. Tracing through automation code can be difficult after a failure, especially when you’re not the original author. Logging helps everyone quickly get to the root cause of failures. It also really helps to create reproduction scenarios. If I can copy the log from a failing tests into a bug report and be done, then AMEN for an easy triage!

Hard Waits

Ain’t nobody got time for that! Automation needs to be fast, because time is money. Forcing Thread.sleep (or other such equivalent in your language of choice) shows either laziness or desperation. It unnecessarily wastes precious runtime. It also creates race conditions: what if the wait time ends before the thing is ready? Always use “smart” waits – actively and repeatedly check the system at short intervals for the thing to be ready, and abort after a healthy timeout value.

The Airing of Grievances: Test Automation Process

Test automation is a big deal for me. It is my chosen specialty within the broad field of software. When I see things done wrong, or when people just don’t get what it’s about, it really grinds my gears. I got a lot of problems with bad test automation processes, and now you’re gonna hear about it!

Saying “They’re Just Test Scripts”

Test automation is not just a bunch of test scripts: it is a full technology stack that requires design, integration, and expertise. Test automation development is a discipline. Saying it is just a bunch of test scripts is derogatory and demeaning. It devalues the effort test automation requires, which can lead to poor work item sizings and an “us vs. them” attitude between developers and QA.

Not Applying the Same Software Development Best Practices

Test automation is software development, and all the same best practices should thus apply. Write clean, well-designed code. Use version control with code reviews. Add comments and doc. Don’t get lazy because “they’re just test scripts” – wrong attitude!

Lip Service

Don’t say automation is important but then never dedicate time or resources to work on it. Don’t leave automation as a task to complete only if there’s time after manual testing is done. Make automation a priority, or else it will never get done! I once worked on an Agile team where automation framework stories were never included into the sprint because there weren’t “enough points to go around.” So, even though this company hired me explicitly to do test automation, I always got shunted into a manual testing scramble every sprint.

Confusing Test Automation with Deployment Automation

Test automation is the automation of test scenarios (for either functional or performance tests). Deployment automation is the automation of product build distribution and installation in a software environment. They are two different concerns. Cucumber is not Ansible.

Forcing 100% Automation

Some people think that automation will totally eliminate the need for any manual testing. That’s simply not true. Automation and manual testing are complementary. Automation should handle deterministic scenarios with a worthwhile return-on-investment to automate, while manual testing should focus on exploratory testing, user experience (UX), and tests that are too complicated to automate properly. Forcing 100% automation will make teams focus on metrics instead of quality and effectiveness.

Downsizing or Eliminating QA

Test automation doesn’t reduce or eliminate the need for testers. On the contrary, test automation requires even more advanced skills than old-school manual testing. There is still a need for testing roles, whether as a dedicated position or as shared collectively by a bunch of developers. The work done by that testing role just becomes more technical.

Saying Product Code and Test Code Must Use the Same Language

For unit tests, this is true, but for above-unit tests, it is simply false. Any general purpose programming language could be used to automate black-box tests. For example, Python tests could run against an Angular web app. A team may choose to use the same language for product and test code for simplicity, but it is not mandatory.

Not Classifying Test Types

Not all tests are the same. You can play buzzword bingo with all the different test type names: unit, integration, end-to-end, functional, performance, system, contract, exploratory, stress, limits, longevity, test-to-break, etc. Different tests need different tools or frameworks. Tests should also be written at the appropriate Testing Pyramid level.

Assuming All Tests Are Equal

Again, not all tests are the same, even within the same test type. Tests vary in development time, runtime, and maintenance time. It’s not accurate to compare individuals or teams merely on test numbers.

Not Prioritizing Tests to Automate

There’s never enough time to automate everything. Pick the most important ones to automate first – typically the core, highest-priority features. Don’t dilly-dally on unimportant tests.

Not Running Tests Regularly

Automated tests need to run at least once daily, if not continuously in Continuous Integration. Otherwise, the return-on-investment is just too low to justify the automation work. I once worked on a QA team that would run an automated test only once or twice during a 2-year release! How wasteful.

HP Quality Center / ALM

This tool f*&@$#! sucks. Don’t use it for automated tests. Pocket the money and just develop a good codebase with decent doc in the code, and rely upon other dashboards (like Jenkins or Kibana) for test reporting.

The Airing of Grievances: Software Development

Let the Airing of Grievances series begin: I got a lot of problems with bad software development practices, and now you’re gonna hear about it!

Duplicate Code

Code duplication is code cancer. It spreads unmaintainable code snippets throughout the code base, making refactoring a nightmare and potentially spreading bugs. Instead, write a helper function. Add parameters to the method signature. Leverage a design pattern. I don’t care if it’s test automation or any other code – Don’t repeat yourself!

Typos

Typos are a hallmark of carelessness. We all make them from time to time, but repeated appearances devalue craftsmanship. Plus, they can be really confusing in code that’s already confusing enough! That’s why I reject code reviews for typos.

Global Non-Constant Variables

Globals are potentially okay if their values are constant. Otherwise, just no. Please no. Absolutely no for parallel or concurrent programming. The side effects! The side effects!

Using Literal Values Instead of Constants

Literal values get buried so easily under lines and lines of code. Hunting down literals when they must be changed can be a terrible pain in the neck. Just put constants in a common place.

Avoiding Dependency Management

Dependency managers like Maven, NuGet, and pip automatically download and link packages. There’s no need to download packages yourself and struggle to set your PATHs correctly. There’s also no need to copy their open source code directly into your project. I’ve seen it happen! Just use a dependable dependency manager.

Breaking Established Design Patterns

When working within a well-defined framework, it is often better to follow the established design patterns than to hack your own way of doing things into it. Hacking will be difficult and prone to breakage with framework updates. If the framework is deficient in some way, then make it better instead of hacking around it.

No Comments or Documentation

Please write something. Even “self-documenting” code can use some context. Give a one-line comment for every “paragraph” of code to convey intent. Use standard doc formats like Javadoc or docstrings that tie into doc tools. Leave a root-level README file (and write it in Markdown) for the project. Nobody will know your code the way you think they should.

Not Using Version Control

With all the risks of bugs and typos in an inherently fragile development process, you would think people would always want code protection, but I guess some just like to live on the edge. A sense of danger must give them a thrill. Or, perhaps they don’t like dwelling on things of the past. That is, until they break something and can’t figure it out, or their machine’s hard drive crashes with no backup. Give that person a Darwin Award.

No Unit Tests

If no version control wasn’t enough, let’s just never test our code and make QA deal with all the problems! Seriously, though, how can you sleep at night without unit tests? If I were the boss, I’d fire developers for not writing unit tests.

Permitting Tests to Fail

Whenever tests fail, people should be on point immediately to open bug reports, find the root cause, and fix the defect. Tests should not be allowed to fail repeatedly day after day without action. At the very least, flag failures in the test report as triaged. Complacency degrades quality.

Blue Balls

Jenkins uses blue balls because the creator is Japanese. I love Japanese culture, but my passing tests need to be green. Get the Green Balls plugin. Nobody likes blue balls.

Skipping Code Reviews

Ain’t nobody got time for that? Ain’t nobody got time when your code done BROKE cuz nobody caught the problem in review! Take the time for code reviews. Teams should also have policy for quick turnaround times.

Not Re-testing Code After Changes

People change code all the time. But people don’t always re-test code after changes are made. Why not? That’s how problems happen. I’ve seen people post updated code to a PR that wouldn’t even compile. Please, check yourself before you wreck yourself.

“It Works on My Machine”

That’s bullfeathers, and you know it! Nobody cares if it works on your machine if it doesn’t work on other machines. Do the right thing and go help the person, instead of blowing them off with this lame excuse.

Arrogance

Nobody wants to work with a condescending know-it-all. Don’t be that person.