
Quality Metrics 101: Test Quality

New to the series? Start from the beginning!

Test quality metrics show whether testing efforts are worthwhile. Though “testing” and “quality” may be synonymous as organizational titles, testing is only one method of enforcing quality – in software, it just happens to be the most effective one. Testing is expensive, though, because it slows down time-to-market. Some people even devalue testing work because it doesn’t add new features to a product. Below are aspects of test quality worth measuring to prove, and even increase, the value of testing efforts.


Coverage

Quality Aspect: How much functionality is covered by tests?
Desired State: High – more coverage means less risk. Note that 100% complete coverage is impossible.
Metrics: Coverage may be measured for both manual and automated tests. However, automated test coverage is usually more important because automated tests are meant to be defensive without gaps.

Code Coverage – Code coverage tools check what paths of code are actually exercised by automated tests. While they cannot tell if tests are good or bad, they are great for exposing gaps in coverage. Unit test code coverage is easy because most frameworks have plugins, but above-unit code coverage requires instrumented builds. Look for tools that track more than just lines of code. Target 90%+ coverage. Add new tests to cover any major gaps.
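For illustration, here is a minimal sketch of how that 90% target could be enforced with Python’s coverage.py API (the threshold check and placement are assumptions, not prescriptions):

import coverage

cov = coverage.Coverage(branch=True)  # branch coverage tracks more than just lines
cov.start()
# ... run the test suite here (e.g., invoke the unit test runner) ...
cov.stop()
cov.save()
total = cov.report()  # prints a coverage report and returns the total percentage
assert total >= 90.0, "Coverage gap found - add new tests to cover it"

Plugins such as pytest-cov wrap this same workflow into command-line options.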

Feature Coverage – Feature coverage is a manual way to score features on test coverage based on planning and review. For this metric to be successful, a team must consistently specify features well; otherwise, this metric will give useless data. Gherkin scenarios are a great way to do this – for example, each scenario can be marked as untested, manual, or automated. Feature coverage is unscientific, but it can give a better picture of functionalities actually covered (instead of just the raw lines of code covered).

Automation Debt – Technical debt increases when tests are not automated and thus lack coverage. Teams are often unable to automate all tests originally planned, and test automation is frequently jettisoned from the Definition of Done. Or, a project may not start automating tests until a large chunk of the project is already complete. The best way to track automation debt is to create a backlog for incomplete automation work. Backlog tasks can be sized, prioritized, and planned according to whatever development process is used (Scrum, Kanban, etc.). Appropriate process metrics can then be used to understand the magnitude of the work and, thus, the lack of automated test coverage.

Warning: Test case count, test length, and test code line count are terrible metrics for coverage because they encourage largeness rather than uniqueness. The goal of testing is to have the greatest coverage with the lowest risk for the least work. Anybody can blindly write tests or variations that add no meaningful value.


Reliability

Quality Aspect: Do automated tests consistently reach completion? And how trustworthy are the results?
Desired State: High – reliability means less time spent on failure triage or (horrors) reruns.
Metrics: Failure Reasons – Track the failure reason for each test case run. Ideally, tests should fail only when they discover product bugs. However, tests may also fail when:

  • an acceptable product change caused an automation error because tests were not updated, indicating poor communication or careless updates
  • an environmental change or interruption caused an automation error, indicating deployment or sysadmin problems
  • the automation code itself has a bug

Remember, “successful” test runs either pass with appropriate coverage or fail due to product bugs. “Unsuccessful” test runs fail or crash for reasons other than product bugs. Aim to minimize unsuccessful test runs. Never hack a test just to get it passing – always work to fix the problems behind test failures.
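As a minimal sketch, failure reasons could be captured automatically in a pytest conftest.py hook (the exception types and labels here are illustrative assumptions):

import pytest

ENVIRONMENT_ERRORS = (ConnectionError, TimeoutError)  # assumed interruption signatures

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed and call.excinfo:
        exception = call.excinfo.value
        if isinstance(exception, ENVIRONMENT_ERRORS):
            reason = "environment problem"   # unsuccessful run
        elif isinstance(exception, AssertionError):
            reason = "possible product bug"  # successful run - it caught something
        else:
            reason = "automation error"      # unsuccessful run
        report.sections.append(("failure reason", reason))

Aggregating these labels over time shows whether failures point to the product, the environment, or the automation itself.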


Speed

Quality Aspect: How much time do test runs take?
Desired State: Fast – tests should complete in the shortest time possible.
Metrics: Test Case Execution Time – Test case execution times indicate the efficiency of the automation code. Track the start-to-end execution time for every individual test case run. Then, analyze the data using common sense. For example, outliers may be inefficient tests that need tuning or should be removed altogether. It may be wise to separate test runs by result type or coverage area. Historical data can also be used as a baseline to determine performance impacts when making cross-cutting automation changes.
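For instance, a pytest suite could record every test’s duration with an autouse fixture in conftest.py (a sketch; the CSV file name is an assumption):

import time
import pytest

@pytest.fixture(autouse=True)
def record_test_time(request):
    """Log each test's start-to-end execution time for later analysis."""
    start = time.perf_counter()
    yield  # the test itself runs here
    elapsed = time.perf_counter() - start
    with open("test_times.csv", "a") as log:
        log.write(f"{request.node.nodeid},{elapsed:.3f}\n")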

Test Suite Execution Time – Test suites are sets of test cases, but their execution times are not merely the sum of their tests’ times. A test suite run may include environmental setup, deployment, parallel execution, reporting, and other things. The purpose of tracking test suite execution time is to determine the start-to-end time of the suite in total, because that indicates the speed of feedback and, in CI, delivery. Tracking test suite execution time will also reveal the effect of adding more test cases to the suite, which then factors into the risk-based decisions of including or excluding tests.

Test Pyramid Balance – The Test Pyramid separates tests between unit (bottom), integration (middle), and end-to-end (top) layers. Ideally, there should be more tests at the bottom than at the top. Why? Higher-level tests are more expensive – they take more time to develop, they are more time consuming to triage, and they have slower execution times. Consider the “Rule of 1’s”: a unit test takes ~1ms, an integration test takes ~1s, and an end-to-end test takes ~1m. When scaled to thousands of tests with continuous integration, end-to-end tests simply take too much time. Tracking the proportion of tests at each layer will give a rough picture of the balance. There’s no perfect ratio between layers, but make sure that the tests form a pyramid and not a cupcake, hourglass, or ice cream cone. Rebalance test efforts as appropriate.
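Back-of-the-envelope math makes the imbalance concrete. Assuming the Rule of 1’s and 1,000 tests per layer:

tests = 1000
unit_seconds = tests * 0.001          # 1 ms each  -> ~1 second total
integration_minutes = tests * 1 / 60  # 1 s each   -> ~16.7 minutes total
e2e_hours = tests * 1 / 60            # 1 min each -> ~16.7 hours total
print(unit_seconds, integration_minutes, e2e_hours)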


Return on Investment

Quality Aspect: Do the tests add greater value than their cost?
Desired State: High – tests need to be worth the effort. Don’t test for the sake of testing!
Metrics: Measuring return on investment in hard dollars is practically impossible. The true cost of bugs can never be fully known: if a bug is caught early, the potential cost to fix it later can merely be estimated. The intangible value of protecting brand reputation may be more important than the tangible value of money saved by finding specific bugs. Better quality practices might prevent developers from causing bugs that would otherwise have happened – and there’s no good way to measure that.

Instead, return on investment is better measured by a collection of metrics that validate both code line protection and defect discovery. Use a weighted scorecard to get a more holistic view of ROI (a sketch follows the list below). Scorecards can be filled in with estimates when planning tests, and then with actual values to measure the degree of success. Note that some aspects of ROI may be too difficult to measure accurately – in those cases, a LOW-MID-HIGH grading scale may work best. Other aspects, if tracked too closely, may feel like micromanagement.

  • Priority – Assign each test a priority for its coverage importance. Core functionalities should have the highest priority, while fringe functionalities should have the lowest priority. Focus on high-priority tests. Another way to look at importance is risk, or the chances that bugs will escape if explicit testing for a feature is not done.
  • Test Execution Frequency – Track how many times tests are actually run. Higher frequency is better. Tests that are rarely run should either be included in more regular runs or removed/archived. This could easily be tracked by a test management tool or database.
  • Coverage Uniqueness – Duplicate test coverage wastes resources. Unfortunately, this one is difficult to measure. Tools for code coverage or static analysis might help. Manual review, however, is typically a better approach.
  • Development Cost and Maintenance Cost – Track how much effort it takes to make and keep tests, including man-hours and resources. Lower costs are better, of course. Planning tools may help with this.
  • Bug Discovery – Track bugs discovered in terms of severity and when and how they were caught. Ideally, the number of bugs caught by customers after a release (meaning, not caught by tests during development) should be minimal, and their severity should be low. Bug tracking tools should easily provide this data. Be warned, though, that the raw bug count is a poor metric. Consider this question: Is a high bug count good or bad? Trick question – during a release, it indicates good test quality but poor product quality; after a release, it indicates all-around poor quality. What matters is that a minimal number of bugs happen at all, and that most of those bugs are caught and fixed before a release. Plus, keep in mind that bugs happen by accident. Finally, focusing exclusively on bug count to determine test value ignores the positive side of testing – that passing tests give confidence that features work correctly.
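As promised above, here is a minimal weighted-scorecard sketch in Python. All of the weights, aspect names, and grades are assumptions for illustration:

WEIGHTS = {
    "priority": 0.3,         # coverage importance
    "frequency": 0.2,        # how often the test actually runs
    "uniqueness": 0.2,       # non-duplicated coverage
    "cost_efficiency": 0.1,  # inverse of development + maintenance cost
    "bug_discovery": 0.2,    # severity-weighted bugs caught
}
GRADES = {"LOW": 1, "MID": 2, "HIGH": 3}

def roi_score(grades_per_aspect):
    """Combine LOW-MID-HIGH grades into one weighted score (max 3.0)."""
    return sum(WEIGHTS[aspect] * GRADES[grade]
               for aspect, grade in grades_per_aspect.items())

# Example: a high-priority, mostly unique test that is cheap to maintain.
print(roi_score({"priority": "HIGH", "frequency": "MID", "uniqueness": "HIGH",
                 "cost_efficiency": "HIGH", "bug_discovery": "MID"}))  # 2.6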

Quality Metrics 101: The Good, The Bad, and The Ugly

metric – [me-trik] – (noun) a standard for measuring or evaluating something

(Courtesy of dictionary.com)

When developing software, metrics can be a good way to track progress and evaluate quality. Managers typically love them because they provide insights that could otherwise be hard to see. Come on, who doesn’t love pretty charts with rainbow colors? However, gathering metrics is not easy, especially for quality. Some metrics are downright useless, and others encourage bad behavior when used improperly. It is far more important to focus on the aspects of quality that matter most than to blindly promulgate numbers. This article will cover quality metrics in depth, giving guidance on which quality aspects matter most and how they can be measured.

What are Quality Metrics?

Quality is the degree of a feature’s excellence. Quality metrics attempt to impartially measure a feature’s excellence. The word “attempt” is notable – quality is inherently relative, and metrics can sometimes be subjective. Take pizza as an example: How would the quality of a pizza be measured? One method could be to analyze the freshness and nutritional value of the ingredients, but Pizza Hut notoriously fought Papa John’s Pizza over the assertion that better ingredients make better pizza. Another method could be to analyze the cooking process, like bake time or the order of toppings, but that would be better for identifying carelessness than quality. The delivery process could also be considered, like Domino’s delivery robots, but that evaluates customer service and not the pizza itself. Ultimately, what matters are the taste and the visual appearance, which are totally subjective to the consumer. Surveys are unreliable. Taste tests have limited selection. Appearance is an art, not a science. Each of these metrics gives a glimpse into quality but does not fully reveal what actually makes a “good” pizza. Together, though, they provide a reasonable picture when the desired metrics are gathered well.

[Image: a “Tony Pepperoni” pizza coupon from Rochester, NY]

Is that really high quality pizza? Well, what aspects of quality are we measuring? We won’t get a perfect picture of quality from metrics, but we can get a rough idea. Software quality metrics work the same way.

Software Quality

In software, there are three primary types of quality metrics:

  1. Test Quality
    • How effective are tests at enforcing high quality standards?
    • Examples: code coverage, test failure reasons.
  2. Process Quality
    • How effective are processes at delivering good features?
    • Examples: time to fix broken builds, time to discover bugs.
  3. Product Quality
    • How good is the software product?
    • Examples: test failure rate, up-time, customer satisfaction.

The main purpose of software quality metrics is to validate successes and find areas for improvement in the development process. Metrics expose problems like gaps in coverage or slow feedback loops so that a team knows what to improve. They are meant to be informative but not punitive – they should simply report accurate data. Don’t shoot the messenger! For example, if the test failure rate is high, fix the bugs instead of blaming each other.

However, be warned by W. Edwards Deming’s red bead experiment: Quality cannot be inspected into a product – it must be built in from the beginning! Metrics alone cannot solve problems – they can merely expose them. It is up to the development team to effect the proper change based on what metrics reveal. Awareness is useless without action. And action should ultimately lead to better features, faster delivery, and higher profits.

Choosing Quality Metrics

Metrics are nothing but tools to improve aspects of quality. Not every job needs the full toolbox! Always pick the quality aspect first, and then find the right measuring stick. Don’t just pick some metrics that others say are good. For example, if build stability is the quality aspect that is deemed important (and it should be), then the metric to track it could be the average time to fix a build after it is broken.

The best process for choosing quality metrics is:

  1. Identify a quality aspect that adds value.
  2. Decide if the aspect is worth measuring.
  3. Determine the desired state for that aspect.
  4. Derive the best way to measure progress toward the desired state impartially.
  5. Implement the metric gathering, storage, and analysis.
  6. Revisit the metric periodically to assert its value.
  7. Stop gathering the metric when it ceases to provide value.

Keep in mind that metrics have a cost: they must be gathered, stored, and analyzed. That’s why it’s important to pick the quality aspects that matter most.

This Series

The articles in this series will cover each of the quality metric types in detail. Each will list major quality aspects with meaningful metrics to track them and advice on how to use them. Remember, metrics should be constructive and not destructive.

 


 

Are Gherkin Scenarios with Multiple When-Then Pairs Okay?

Don’t know about Behavior-Driven Development or Gherkin? Start here!

Writing Gherkin is easy, but writing good Gherkin is hard. My post BDD 101: Writing Good Gherkin covers many aspects of good behavior specification, including titles, phrasing, and data. One of the major points I make anytime I discuss good Gherkin is what I call the “Cardinal Rule of BDD.”

The Cardinal Rule of BDD: One Scenario, One Behavior!

A behavior scenario specification should focus on one individual behavior. This is the essence of the BDD mindset – a product’s features can be specified in terms of its behaviors, and the specs should be written as examples of those behaviors in action. Identifying individual behaviors brings clarity to design, development, and testing. Combining behaviors into a single scenario causes ambiguity, miscommunication, and test gaps. Test failure triage also becomes more difficult and time consuming because the root causes for failures are less clear – the culprit could be one of multiple behaviors. There is also a high risk of duplication when scenarios repeat the same sequence of steps instead of isolating behaviors.

One of the dead giveaways to violations of the Cardinal Rule of BDD is when a Gherkin scenario has multiple When-Then pairs, like this:

Feature: Google Searching

  Scenario: Google Image search shows pictures
    Given the user opens a web browser
    And the user navigates to "https://www.google.com/"
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

A When-Then pair denotes a unique behavior. In this example, the behaviors of performing a search and changing the search to images could and should clearly be separated into two scenarios, like this:

Feature: Google Searching

  Scenario: Search from the search bar
    Given a web browser is at the Google home page
    When the user enters "panda" into the search bar
    Then links related to "panda" are shown on the results page

  Scenario: Image search
    Given Google search results for "panda" are shown
    When the user clicks on the "Images" link at the top of the results page
    Then images related to "panda" are shown on the results page

Despite being so central to BDD philosophy, the Cardinal Rule is the one thing people always try to sidestep. Nobody ever doubts the usefulness of step parameters or the need for good grammar, but people frequently show me scenarios with multiple When-Then pairs and basically ask for an exception from the rule. My gut reaction is always, “NO! Rules don’t change.”

However…

I must first admit that the Cardinal Rule of BDD is “opinionated” – it is the way that I have found BDD to work best for collaboration and automation. Adherence forces people to adopt a behavior-driven mindset, and strictness keeps feature and test quality high. Other experts are more permissive of multiple When-Then pairs, though. Most examples I could find from leading sources such as The Cucumber Book exhibit strict Given-When-Then order for Gherkin scenarios, but other sources such as the online JBehave documentation show scenarios with multiple When-Then pairs boldly on the front page.

I must also begrudgingly admit that there are times when it is simply more convenient for a single scenario to have multiple behaviors (and thus multiple When-Then pairs). This is by no means a best practice but rather a pragmatic alternative for specification dilemmas. (See Purist vs. Pragmatist.) Below are situations in which multiple When-Then pairs may be acceptable.

Lengthy End-to-End Scenarios

End-to-end tests verify execution paths through a live system with all of its parts. Web UI tests frequently fall into this category: Selenium WebDriver interacts with a page in a browser, which then triggers calls to a backend service layer or database. Despite the name, end-to-end tests may still focus on one individual behavior. The example scenarios above, though short, technically count as end-to-end tests.

However, many people use the term “end-to-end” to refer to tests that cover sequences of behaviors. Such a scenario could violate the Cardinal Rule of BDD if it is not handled carefully. My article BDD 101: Unit, Integration, and End-to-End Tests gives strategies for handling lengthy end-to-end scenarios. One strategy is to simply turn a blind eye to multiple When-Then pairs. Ideally, each behavior would already have its own individual scenario, but then a new scenario would explicitly combine the behaviors together to get that full, end-to-end path. The new scenario would be easy to write because the steps could be reused. This isn’t the only strategy, so please be sure to consider the others before writing the tests.

Audits

Software system audits frequently require lengthy end-to-end scenarios. They are quite common in highly-regulated domains. For example, a bank may need to prove that a loan is prepared correctly or that a transaction puts money into the right accounts. Auditors typically require tests to run through entire system paths (e.g., multiple behaviors) using the same records, such as one loan application or one payment. Separating each individual behavior into its own scenario makes each test independent: there would be no guaranteed execution order and no shared test data, so auditors would not get the end-to-end verification they require. The simplest way to give the auditors what they need is to write one lengthy scenario with multiple When-Then pairs.

Service Calls

Service call testing is another case for which multiple When-Then pairs may be pragmatically justified. REST, SOAP, and WSDL are examples of service call types. Service layer development is more engineering-centric than business-centric, but many teams nevertheless choose to test service calls with Gherkin-based frameworks like Cucumber. Due to the programmatic nature of services, Gherkin scenarios for service calls tend to be quite imperative: specify a request, make the call, and verify parts of the response. This isn’t so bad for independent service calls, but it becomes problematic when one request needs another call’s response.

One solution is the classic “pure” scenario split: put any necessary setup, including initial requests to get required response parts, into custom Given steps. This abides by the Cardinal Rule and avoids duplicate When-Then pairs. But, it introduces an unsavory form of code duplication. Many service calls end up being written twice: once as a Gherkin scenario for testing, and once in the underlying automation code to be called by Given steps. This violates the DRY principle.

The alternative “pragmatic” solution is to write scenarios that specify multiple service calls in the Gherkin steps. The Karate project advocates this approach, as shown in their “Hello World” example:
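(The original post embedded the example here. The sketch below is adapted from the Karate project’s README and may not match the current version verbatim – note the two When-Then pairs in one scenario:)

Feature: karate 'hello world' example

  Scenario: create and retrieve a cat

    Given url 'https://myhost.com/v1/cats'
    And request { name: 'Billie' }
    When method post
    Then status 201
    And match response == { id: '#notnull', name: 'Billie' }

    Given path response.id
    When method get
    Then status 200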

Take Caution!

There may be other cases when When-Then repetition is useful. Feel free to leave suggestions in the comments below. My examples are meant to be descriptive, not prescriptive. Another aspect to consider is that allowing multiple When-Then pairs per scenario indicates that a team sees more value in BDD’s test framework than in its collaborative spec process. (Refer to ‑‑BDD; Automation without Collaboration and BDD‑‑; Collaboration without Automation.)

Ultimately, you must decide what practices are best for your project. The main reason I uphold the Cardinal Rule of BDD so strongly is that it makes for good specs and good tests. I’ve seen engineers write extremely long, intensive test procedures (and I mean, dozens of duplicate behaviors per test) that are alright for manual testing but do not transition well into automation because they are too fragile and they don’t yield useful information upon failure. The Cardinal Rule is a way to break out of the procedure-driven mindset, and banning multiple When-Then pairs per Gherkin scenario is an effective rule for enforcing it.

Good Gherkin Scenario Titles

Don’t know about Behavior-Driven Development or Gherkin? Start here!

The Golden Gherkin Rule states:

Treat other readers as you would want to be treated. Write Gherkin so that people who don’t know the feature will understand it.

Part of writing good Gherkin (or any other specification-by-example language) includes writing good behavior scenario titles. The title is the face of the scenario: it summarizes what the behavior is all about. Good titles make collaboration and test triage a breeze, whereas bad titles make it tougher. But what makes a title “good”? Below are some helpful pointers.

One-Liners

Good titles should be short one-liners. One simple statement should be sufficient to concisely capture the intended behavior. Anything longer likely means that either the author doesn’t truly understand the behavior in focus, or that the scenario does not focus on one main behavior. Extra comments may be added to supplement the scenario’s description if necessary to avoid lengthy titles. Also, most BDD test automation frameworks will print scenario titles to logs for traceability.

Bad Example: The user can log into the app, navigate to the profile page, and see their full name, address, phone number, email, and username
Good Example: The profile page displays the user’s personal info

Conjunction Disjunction

Watch out for conjunction words like “and,” “or,” and “but.” Conjunctions typically imply that more than one thing will be done, which for scenario titles implies that more than one behavior will be covered. Or, it may indicate that a Scenario Outline would be appropriate. Don’t break the Cardinal Rule of BDD! Keep each scenario focused on one main behavior.

Avoid other conjunctions like “because,” “since,” and “so” as well. Phrases starting with those words often give an explanation for why the scenario exists. However, for conciseness, scenario titles should focus on what the behavior is. The why can either be deduced from the steps or made plain with comments.

Bad Example: The user can request an insurance quote from the big “Get-A-Quote” button on the home page or from the “Insurance Policies” page
Good Example: Two scenarios – “The user requests an insurance quote from the ‘Get-A-Quote’ button on the home page” and “The user requests an insurance quote from the ‘Insurance Policies’ page” – or one Scenario Outline titled “The user requests an insurance quote”

Bad Example: The last five search phrases are saved so that the user can rerun them from the history page
Good Example: The history page saves the last five search phrases

Avoid Assertion Language

Don’t use the words “verify,” “assert,” or “should” in scenario titles. They put the scenario’s emphasis on the assertion rather than the behavior. Assertions are merely a facet of behavior testing – they verify that something exists or that two values are equal. Behavior scenarios, however, are full software specifications. BDD is a development practice for making better software products – it’s not just a test tool. Don’t reduce the behavior-driven mindset to a test-only mindset.

Furthermore, leading every scenario title with “verify” or “assert” becomes very repetitive. The words just don’t enhance the meaningfulness of the title. They also thwart alphabetical order.

Bad Example: Verify the user can change their address on the profile page
Good Example: Profile page address change

Bad Example: Assert that a stock quote is displayed in green text when its value is higher than its previous closing value
Good Example: A stock quote has green text when its value is higher than its previous closing value

Bad Example: The goodbye page should be displayed after a successful logout
Good Example: Logout displays the goodbye page

 

Do you have any more suggestions? Put them in the comments below!

Please Hang Up and Dial Again: Handling Test Interruptions in CI/CD

This post was originally published by Sealights on December 19, 2017 as part of their article Test Quality in CI/CD – Expert Roundup. I was honored to contribute my thoughts on automatic recovery in test automation, and I reblogged the text of my contribution here for Automation Panda readers. Please check out contributions from other experts in the full article!

Test automation is an essential part of CI/CD, but it must be extremely robust. Unfortunately, tests running in live environments (integration and end-to-end) often suffer rare but pesky “interruptions” that, if unhandled, will cause tests to fail. These interruptions could be network blips, web pages not fully loaded, or temporarily downed services – any environment issues unrelated to product bugs. Interruptive failures are problematic because they (a) are intermittent and thus difficult to pinpoint, (b) waste engineering time, (c) potentially hide real failures, and (d) cast doubt over process/product quality. Furthermore, CI/CD magnifies even rare issues. If an interruption has only a 1% chance of happening during a test, then considering binomial probabilities, there is a 63% chance it will happen after 100 tests, and a 99% chance it will happen after 500 tests. Keep in mind that it is not uncommon for thousands of tests to run daily in CI – Google Guava had over 286K tests back in July 2012!

It is impossible to completely avoid interruptions – they will happen. Therefore, it is imperative to handle interruptions at multiple layers:

  1. First, secure the platform upon which the tests run. Make sure system performance is healthy and that network connections are stable.
  2. Second, add failover logic to the automated tests. Any time an interruption happens, catch it as close to its source as possible, pause briefly, and retry the operation(s). Do not catch every type of error: pinpoint specific interruption signatures to avoid false positives. Build failover logic into the framework rather than implementing it for one-off cases. For example, wrappers around web element or service calls could automatically perform retries (see the sketch after this list). Aspect-oriented programming can help here tremendously. Repeating failed tests in their entirety also works and may be easier to implement, but it takes much more time to run.
  3. Third, log any interruptions and recovery attempts as warnings. Do not neglect to report them, because they could indicate legitimate problems, especially if patterns appear.
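Here is a minimal sketch of such framework-level failover logic in Python (the exception types and retry settings are assumptions for illustration):

import logging
import time

INTERRUPTION_ERRORS = (ConnectionError, TimeoutError)  # assumed signatures - keep the list narrow

def with_retries(operation, max_attempts=3, pause_seconds=1.0):
    """Run an operation, retrying briefly on known interruptions only."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except INTERRUPTION_ERRORS as error:
            if attempt == max_attempts:
                raise  # out of retries: when in doubt, fail the test
            logging.warning("Interruption: %s; retrying...", error)  # report it!
            time.sleep(pause_seconds)

A wrapper like this would sit inside web element or service call helpers so that individual tests never repeat the retry logic.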

It may be difficult to differentiate interruptions from legitimate bugs. Or, certain retry attempts might take too long to be practical. When in doubt, just fail the test – that’s the safer approach.

Django Favicon Setup (including Admin)

Do you want to add a favicon to your Django site the right way? Want to add it to your admin site as well? Read this guide to find out how!

What is a Favicon?

A favicon (a.k.a. a “favorite icon” or a “shortcut icon”) is a small image that appears with the title of a web page in a browser. Typically, it’s a logo. Favicons were first introduced by Internet Explorer 5 in 1999, and they have since been standardized by W3C. Traditionally, a site’s favicon is saved as a 16×16-pixel “favicon.ico” file in the site’s root directory, but many contemporary browsers support other sizes, formats, and locations. There is a plethora of free favicon generators available online. Every serious website should have a favicon.

[Screenshot: the favicon for this blog, circled in red in the browser tab]

Making the Favicon a Static File

Before embedding the favicon in web pages, it must be added to the Django project as a static file. Make sure the favicon is accessible however you choose to set up static files. The simplest approach would be to put the image file under a directory named static/images and use the standard static file settings. However, I strongly recommend reading the official Django documentation on static files.

Embedding the Favicon into HTML

Adding the favicon to a Django web page is really no different than adding it to any other type of web page. Simply add the link for the favicon file to the HTML template file’s header using the static URL. It should look something like this:
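(The original post showed the snippet here. A minimal sketch, assuming the favicon is stored at static/images/favicon.png:)

{% load static %}

<head>
  <link rel="shortcut icon" type="image/png" href="{% static 'images/favicon.png' %}"/>
</head>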

Better Reuse with a Parent Template

Most sites use only one favicon for all pages. Rather than adding the same favicon explicitly to every page, it would be better to write a parent template that adds it automatically for all pages. A basic parent template could look like this:
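(A sketch of such a parent template; the file name and block names are assumptions:)

<!-- templates/base.html -->
{% load static %}
<!DOCTYPE html>
<html>
  <head>
    <link rel="shortcut icon" type="image/png" href="{% static 'images/favicon.png' %}"/>
    {% block head %}{% endblock %}
  </head>
  <body>
    {% block body %}{% endblock %}
  </body>
</html>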

And a child of it could look like this:
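(Again a sketch; the file name and page content are placeholders:)

<!-- templates/hello.html -->
{% extends "base.html" %}

{% block body %}
  <p>Hello, World!</p>
{% endblock %}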

As good practice, other common things like CSS links could also be added to the parent template. Customize parent templates to your project’s needs.

Admin Site Favicon

While the admin site is not the main site most people will see, it is still nice to give it a favicon. The best way to set the favicon is to override admin templates, as explained in this StackOverflow post. This approach is like an extension of the previous one: a new template will be inserted between an existing parent-child inheritance to set the favicon. Create a new template at templates/admin/base_site.html with the contents below, and all admin site pages will have the favicon!
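(A sketch of the override, extending the admin’s base template and injecting the favicon into its extrahead block; a full override would also restore the admin’s branding block:)

<!-- templates/admin/base_site.html -->
{% extends "admin/base.html" %}
{% load static %}

{% block extrahead %}
  <link rel="shortcut icon" type="image/png" href="{% static 'images/favicon.png' %}"/>
{% endblock %}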

Make sure the template directory path is included in the TEMPLATES setting if it is outside of an app:
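(A sketch, assuming a modern pathlib-style BASE_DIR in settings.py:)

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [BASE_DIR / "templates"],  # project-level template overrides
        "APP_DIRS": True,
        "OPTIONS": {
            # context processors omitted for brevity
        },
    },
]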

Django REST Framework Browsable API Favicon

The Django REST Framework is a great extension to Django for creating simple, standard, and seamless REST APIs for a site. It also provides a browsable API so that humans can easily see and use the endpoints. It’s fairly easy to change the browsable API’s favicon using a similar template override. Create a new template at templates/rest_framework/api.html with the following contents:
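(A sketch, assuming the browsable API’s base template exposes a meta block in its head, as recent versions do:)

<!-- templates/rest_framework/api.html -->
{% extends "rest_framework/base.html" %}
{% load static %}

{% block meta %}
  {{ block.super }}
  <link rel="shortcut icon" type="image/png" href="{% static 'images/favicon.png' %}"/>
{% endblock %}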

Favicon URL Redirect

A number of other articles (here, here, and here) suggest adding a URL redirect for the favicon file. Unfortunately, I got mixed results when I attempted this method myself: it worked on Mozilla Firefox and Microsoft Edge but not Google Chrome. (Yes, I tried clearing the cache and all that jazz.)
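For reference, the redirect approach looks something like this (a sketch using Django’s RedirectView; the paths are assumptions):

# urls.py
from django.urls import path
from django.views.generic import RedirectView

urlpatterns = [
    path("favicon.ico",
         RedirectView.as_view(url="/static/images/favicon.png", permanent=True)),
]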

Django Favicon Apps

There are also open-source Django apps for handling favicons more easily. I have not used them personally, but they may be worth investigating.

YAML Comments in Gherkin Feature Files

In Gherkin-based BDD test frameworks, feature files hold behavior scenarios with Given-When-Then steps. Features and scenarios may be categorized by tags for hooks and filtering, and additional comment lines may be added anywhere. However, Gherkin itself may not be sufficient to capture all desired test metadata. Tags are great for simple classification but too crude for richer information. And comments are meaningful only to the reader.

Fraser Scott (zeroXten) came up with a nifty idea for improving Gherkin information while working on the OWASP Cloud Security project: write YAML comments in feature files to provide more formal documentation. As stated on the project home page, “The OWASP Cloud Security project aims to help people secure their products and services running in the cloud by providing a set of easy to use threat and control BDD stories that pool together the expertise and experience of the development, operations and security communities.” It’s a pretty cool idea – use Gherkin to model attacks for both education and automation. The team is writing YAML comments at the top of feature files to provide custom information in a clean, readable format that could also be easily parsed by other tools. Below is an example feature file I copied from the project, with YAML comments at the top:
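(The original file is not reproduced here; below is a representative sketch in the project’s style – the IDs, links, and steps are invented for illustration:)

# id: aws_s3_bucket_public_access
# status: confirmed
# references:
#   - https://docs.aws.amazon.com/AmazonS3/latest/dev/access-control-overview.html

Feature: Amazon S3 bucket access control
  As an engineer responsible for cloud security
  I want S3 buckets to deny anonymous access
  So that sensitive data is not exposed to the public

  Scenario: An anonymous read request is denied
    Given an S3 bucket containing sensitive data
    When an anonymous user requests an object from the bucket
    Then access is denied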

At first, I wasn’t too thrilled by the thought of YAML comments in feature files. Gherkin should provide all specification needs, and tag classification is often needed for automation. However, the YAML comments are quite clean, and for this project, they appear to document aspects of the scenarios that shouldn’t be buried in Gherkin (such as confirmation status and reference links). YAML is a very sensible format for formalized comments, too.

Take this idea as food for thought: YAML comments can be an effective way to add metadata to Gherkin feature files. Just make sure to capture all behavior specification using Gherkin and to still use tags for automation.