Author: Andy Knight

I'm a software engineer who specializes in test automation.

PyDev of the Week Interview

This week, I was featured as the PyDev of the Week on DZone and The Mouse Vs. The Python. Many thanks to Mike Driscoll, who interviewed me about my development experience for the article.

Here are the links to the interview:

PyDev of the Week: Andrew Knight on DZone
PyDev of the Week: Andrew Knight on The Mouse Vs. The Python

Pipenv: Python Packagement for Champions!

While recently deploying a new Python Django app to Heroku, I noticed the documentation mentioned a tool I hadn’t known before: pipenv. I thought to myself, “Great, now I need to learn a new tool. What was so bad about pip and virtualenv?” So, I did my research, and BOOM! Yes. Mind blown. Life changed. This.

What It Is

Pipenv is the Python packaging and environments tool for champions.

It unites pip, Pipfile, and virtualenv into a sophisticated workflow with simple commands.
It automatically creates virtual environments for projects.
It automatically updates package dependencies (and their dependencies).
It locks versions for deterministic builds.

Despite some controversy and limitations, I strongly recommend using pipenv for most new Python projects. The Python Packaging Authority recommends it, too.

What It’s About

Packages and environments (“packagement”) are essential to Python development. Typically, Pythoneers create a virtual environment for each project and install dependent packages into it locally using pip. They then “freeze” the dependencies into a requirements.txt file so that others can easily recreate the environment. Virtual environments thus enable different projects to use different package versions without global conflict.

Unfortunately, this traditional workflow has some problems:

It uses multiple tools instead of one and requires many commands.
Different projects can do the workflow differently, which can be confusing.
The requirements.txt file must be manually generated and can easily fall out of date.
Dev-only dependencies are a hassle to separate.
Uninstalling packages will not remove sub-packages.
Dependencies with version ranges instead of fixed versions cause nondeterministic builds.

Pipenv solves these problems by combining pip, Pipfile, and virtualenv into a standard workflow that automatically handles and locks package updates.

How to Use It

See how simple it is to use pipenv with a Python project:

# Install pipenv
pip install pipenv

# Create a new project directory
mkdir panda_project
cd panda_project
echo "print('hello')" > main.py

# Init pipenv:
# Creates a virtual environment
# Then creates Pipfile and Pipfile.lock files
pipenv install

# Install a package:
# Updates the Pipfiles
pipenv install requests

# Install a dev-only package:
# Updates the Pipfiles
pipenv install --dev pytest

# Run commands in the environment
pipenv run python --version
pipenv run python main.py

More Info

There’s no need for me to repeat what other people have already said:

Official Docs
- Pipenv: Python Dev Workflow for Humans
- Pipenv: The Future of Python Dependency Management (PyCon 2018 talk)
- The pypa/pipenv project on GitHub
- Managing Application Dependencies from PyPA
- Pipenv & Virtual Environments from The Hitchhiker’s Guide to Python
Secondary Articles
- Announcing Pipenv! from Kenneth Reitz, the primary author
- How to manage your Python projects with Pipenv
- Why Python devs should use Pipenv
- Stop everything! Start using Pipenv!
Comparisons
Limitations
- Reddit: Why is pipenv the recommended packaging tool?
- Pipenv: promises a lot, delivers very little

Me, after using pipenv for the first time.

[8/24/2018 Update: Mentioned some of the controversy and limitations of pipenv.]

Quality Metrics 101: Product Quality

New to the series? Start from the beginning!

Product quality metrics measure the excellence of a product and its features. They measure the “goodness” inherent in the product, apart from how the product was developed. High-quality processes and tests contribute to, but do not alone guarantee, high-quality products. That’s why quality must be built into the product from the start and checked throughout all phases of development. Below are metrics for assuring quality in the delivered products.

Functionality

Quality Aspect

Does the product work correctly?

Desired State

True – Features either work, or they don’t.

Metrics

Test Failure Rate – The whole purpose of functional testing is to determine which features work and which don’t. Assuming test quality is high, the test failure rate is the single best indicator of product functionality. Higher failure rates mean more broken features. Teams should target low-to-zero test failures. It may be useful to keep a failure history for each test. For large products, it may also be useful to break down failure rates by feature area.

It is imperative to recognize, however, that the test failure rate is meaningful only if test quality is high – meaning that tests have good coverage and reliability. Poor-quality tests will give untrustworthy results. For example, weak coverage could mean that failure rate is low because functionality is not truly exercised, and poor reliability could mean that failure rate is high because tests always crash. Be sure to back up any reporting on test failure rate with assurance that test quality is high (using test quality metrics).
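
As a rough illustration, the failure rate and its breakdown by feature area could be computed from raw results like this. The record layout, test names, and feature areas below are all hypothetical; a real test framework will have its own report format.

```python
from collections import defaultdict

def failure_rate(results):
    """Return the fraction of results whose outcome is 'fail' (0.0 if empty)."""
    if not results:
        return 0.0
    failures = sum(1 for r in results if r["outcome"] == "fail")
    return failures / len(results)

def failure_rate_by_area(results):
    """Group results by feature area and compute the failure rate of each."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r["area"]].append(r)
    return {area: failure_rate(rs) for area, rs in grouped.items()}

# Hypothetical results from one test run
results = [
    {"test": "test_login", "area": "auth", "outcome": "pass"},
    {"test": "test_logout", "area": "auth", "outcome": "fail"},
    {"test": "test_search", "area": "search", "outcome": "pass"},
    {"test": "test_filter", "area": "search", "outcome": "pass"},
]

overall = failure_rate(results)
by_area = failure_rate_by_area(results)
```

The numbers are only as trustworthy as the tests behind them, so a report like this belongs next to test quality metrics, not in place of them.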

Stability

Quality Aspect

Does the product work reliably?

Desired State

High – Product functionality should be consistently good and available.

Metrics

Build Failure Rate – The build failure rate is the proportion of builds that have failed for whatever reason over a given period of time. While process metrics focus on response times to fix broken builds, the build failure rate itself indicates the health of the product while it is being developed. It does not track how badly a build failed like test failure rate does, but instead it impartially tracks ultimate success or failure. Make sure to limit the history of builds included in the calculation to keep it relevant (such as the last 30 days or so). Occasional build failures are acceptable as long as they are fixed quickly. High build failure rates indicate product instability, which could be due to design flaws, weak pre-check-in testing, tricky bugs, or even pipeline faults.
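
Here is a minimal sketch of a windowed build failure rate. The build records are made up, and the 30-day default simply mirrors the suggestion above; a CI system would be the real source of this data.

```python
from datetime import datetime, timedelta

def build_failure_rate(builds, now, window_days=30):
    """Proportion of failed builds among those finished within the window."""
    cutoff = now - timedelta(days=window_days)
    recent = [b for b in builds if b["finished"] >= cutoff]
    if not recent:
        return 0.0
    failed = sum(1 for b in recent if not b["passed"])
    return failed / len(recent)

now = datetime(2018, 9, 1)
# Hypothetical build history
builds = [
    {"finished": datetime(2018, 7, 1), "passed": False},   # outside the window
    {"finished": datetime(2018, 8, 10), "passed": True},
    {"finished": datetime(2018, 8, 20), "passed": False},
    {"finished": datetime(2018, 8, 25), "passed": True},
    {"finished": datetime(2018, 8, 30), "passed": True},
]

rate = build_failure_rate(builds, now)
```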

Uptime – Uptime refers to the total time a system is usable. For example, consider a website that must go down for a one-hour service window every week – its uptime would be 167/168 = 99.4%. Not all downtime is planned, however. A bad deployment during maintenance could knock that website offline for an additional 3 hours – dragging uptime down to 97.6% for the week. This may not seem bad at first, but it’s quite terrible when considering that (a) lost time is lost money and (b) the goal of Six Sigma is 99.99966%. A product should have near-perfect availability. System monitoring tools can easily measure uptime. Low uptime indicates either poor design or lack of failover redundancy.
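
The arithmetic from the website example can be checked with a few lines of Python. The function name is my own; the hours come straight from the example.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours

def uptime_percent(downtime_hours, total_hours=HOURS_PER_WEEK):
    """Percentage of the period during which the system was usable."""
    return 100 * (total_hours - downtime_hours) / total_hours

planned_only = uptime_percent(1)      # one-hour weekly service window
with_outage = uptime_percent(1 + 3)   # plus three unplanned hours offline
```

Rounded to one decimal place, these come out to 99.4 and 97.6, matching the figures above.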

Performance

Quality Aspect

Does the product work optimally?

Desired State

Optimal – Performance should be at its best in all areas.

Metrics

There are four classic software performance metrics. They may be applied in various ways to aspects of product behavior. Ultimately, software products should have a minimal impact on the system while providing a maximal capacity for work.

Processor Usage – Processor cycles should not be needlessly wasted. Make sure algorithms are efficient in terms of computational complexity (big O) and implementation details.

Memory Usage – Watch out for both memory bloat (when features take up a lot of memory unnecessarily) and memory leaks (when memory is not freed up after it is no longer needed).

Response Time – Response time, or latency, measures the turnaround time from when an action is taken to when the actor receives feedback that the action is completed. Common examples of response time are web page loading, REST API call responses, and database queries. Response time should be as short as possible.

Throughput – Throughput measures how much load a system can handle. It could refer to data I/O bandwidth, transactions per time unit, number of concurrent users, etc. Typically, higher stress on a system will cause other performance metrics to degrade. The “sweet spot” to find is the maximum throughput value that does not unacceptably impact other performance aspects.

Complexity

Quality Aspect

Is the software code unnecessarily complicated?

Desired State

Minimal – Simple is better than complex. Complex is better than complicated. (See The Zen of Python.)

Metrics

There are a number of code metrics that indicate complexity in various ways.

Lines of Code – One of the most rudimentary metrics is to count the lines of code. All things equal, line count indicates the magnitude of the software product, with the assumption that fewer lines will be easier to maintain. Any modern IDE (or, worst case, shell scripting) can yield line counts. However, all things are not equal, and line count alone does not indicate quality or efficiency.

Cyclomatic Complexity – Cyclomatic complexity measures the number of different execution paths the code can take. It is more meaningful than counting sheer lines of code because it indicates the magnitude of testing needed for full coverage. Lower values are better. Cyclomatic complexity is a popular code metric, and many modern analysis tools can measure it.
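
To make the idea concrete, here is a simplified approximation built on Python’s standard ast module: it counts branching constructs and adds one. This is only a sketch under my own assumptions about what counts as a branch (it treats the whole source as one unit and ignores things like comprehensions); dedicated analysis tools are more thorough.

```python
import ast

# Constructs that each add one extra execution path
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def approx_cyclomatic_complexity(source):
    """Approximate complexity as 1 + the number of branching constructs."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and`/`or` adds a short-circuit path
            complexity += len(node.values) - 1
    return complexity

# Hypothetical function to analyze
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in (2, 3):
        if n % d == 0 and n > d:
            return "composite"
    return "other"
'''

score = approx_cyclomatic_complexity(SAMPLE)
```

The sample has an if, an elif, a for loop, a nested if, and one `and`, so the approximation yields 6.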

Depth of Inheritance – For object-oriented languages, the depth of inheritance measures the maximum length of a class inheritance tree from child class to its ultimate root. For example, in the class inheritance tree of Tiger > Cat > Animal > Object, Tiger would have an inheritance depth of 3. Lower values are desirable because they make classes easier to understand.
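
The Tiger example can be reproduced in Python, where every class ultimately inherits from `object`. The helper function is my own sketch, not a standard library call.

```python
class Animal:
    pass

class Cat(Animal):
    pass

class Tiger(Cat):
    pass

def inheritance_depth(cls):
    """Length of the longest path from cls up to the root class `object`."""
    if cls is object:
        return 0
    return 1 + max(inheritance_depth(base) for base in cls.__bases__)

depth = inheritance_depth(Tiger)
```

Taking the maximum over all base classes means the function also handles multiple inheritance by following the longest branch.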

There are countless other code metrics available. For example, Microsoft Visual Studio calculates the metrics above plus a maintainability index and class coupling. Halstead metrics are another way to measure complexity.

Satisfaction

Quality Aspect

Does the product satisfy the end user?

Desired State

High – The product should meet the end user’s needs, and the end user should like using it.

Metrics

Customer satisfaction is inherently subjective, so trying to measure it is difficult. Ultimately, the end users must find compelling value in the product over other alternatives, or else they won’t use it or buy it. There are many ways to attempt to gauge customer satisfaction: surveys, interviews, A/B testing, etc. Statistics and psychology also play a part. Check out articles here, here, and here to get some ideas.

Quality Metrics 101: Process Quality

New to the series? Start from the beginning!

Process quality metrics make sure that software development practices build good, high-quality features. Healthy software processes identify and resolve issues as early as possible because later bug discovery means higher cost to fix. Quality starts at inception, when features are first brainstormed, and it carries through design, implementation, and testing. Every step in the development process should have quality checkpoints: acceptance criteria for planning, reviews for design and implementation, and reports for testing. Process quality metrics primarily focus on delivery speed or the effectiveness of feedback loops to make sure a team is responding appropriately to change.

Note: Standard software development methodologies often come with canned metrics. For example, Agile Scrum focuses heavily on velocity for determining a team’s capacity for work, while Agile Kanban focuses heavily on lead time and cycle time for measuring how fast work gets done. This article will not cover methodology-specific metrics – please refer to external resources to learn more about them. Instead, this article will cover generic aspects of process quality.

Delivery Speed

Quality Aspect

How fast are new features with high quality delivered to the end user?

Desired State

ASAP – Deliver them fast without compromising quality.

Metrics

People are impatient – they always want things as soon as possible. Fast delivery speed is thus crucial for businesses to meet client expectations and respond quickly to change. However, delivery speed is not the sole metric for success: it must be counterbalanced with safety measures. Delivery time could be minimized by committing changes directly to production, but that’s a terrible practice because the risk of damage is too high. The best strategy is to pursue the fastest speed without sacrificing too much coverage.

Time to Production – Time to production focuses on the time it takes for a developer’s checked-in code to become useful to end users. It’s a decent way to judge from a business perspective how quickly new stuff gets out the door. Measure the total time for each code check-in from when it is first committed to when it is deployed to production. Source control logs and deployment histories can be pieced together to measure the total time. It may be beneficial to split check-ins by feature area and to review distributions rather than averages. Short, consistent times are desirable. Long times reveal delays in testing, fixing, and deploying changes.
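
A small sketch shows why distributions matter more than averages here. The timestamps are invented; in practice they would be pieced together from source control logs and deployment histories.

```python
from datetime import datetime
from statistics import mean, median

def hours_to_production(checkins):
    """Hours between commit and production deployment for each check-in."""
    return [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in checkins
    ]

# Hypothetical (committed, deployed) pairs
checkins = [
    (datetime(2018, 8, 1, 9), datetime(2018, 8, 1, 15)),   # 6 hours
    (datetime(2018, 8, 2, 10), datetime(2018, 8, 3, 10)),  # 24 hours
    (datetime(2018, 8, 3, 8), datetime(2018, 8, 3, 12)),   # 4 hours
    (datetime(2018, 8, 6, 9), datetime(2018, 8, 9, 9)),    # 72 hours
]

hours = hours_to_production(checkins)
typical = median(hours)
average = mean(hours)
slowest = max(hours)
```

With this data, the median is 15 hours while the mean is 26.5 hours, because a single 72-hour check-in skews the average.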

Pipeline Speed – Pipeline speed is a DevOps-y metric. Measure the total start-to-end time from triggering the build pipeline to the final deployment, and measure the time taken by each stage. This will give insights into bottlenecks, such as: system resource exhaustion, network delays, being stuck in job queues, tests that are too long, etc. Knowing each stage will indicate where the greatest optimizations can occur. For example, parallel test execution can significantly reduce total pipeline time. Use pipeline speed metrics to find efficiencies, not to justify cutting vital stages. Most modern continuous integration systems should provide time metrics.
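
As a quick sketch, per-stage timings can be turned into a bottleneck report like this. The stage names and durations (in seconds) are hypothetical; most CI systems expose the real values.

```python
def pipeline_report(stage_seconds):
    """Return total time, the slowest stage, and each stage's share of the total."""
    total = sum(stage_seconds.values())
    slowest = max(stage_seconds, key=stage_seconds.get)
    shares = {stage: secs / total for stage, secs in stage_seconds.items()}
    return total, slowest, shares

# Hypothetical stage durations in seconds
stages = {
    "build": 240,
    "unit tests": 90,
    "integration tests": 600,
    "end-to-end tests": 1500,
    "deploy": 180,
}

total, slowest, shares = pipeline_report(stages)
```

Here the end-to-end tests account for more than half of the pipeline time, which makes them the first candidate for parallel execution.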

Test Coverage per Time Period – There is always a tradeoff between test coverage and delivery speed. Assuming tests have optimally efficient execution times, higher coverage means slower delivery. Whenever time periods are fixed (such as CI pipeline limits or release deadlines), the best strategy is to maximize test coverage during the available time. For this purpose, coverage should be heuristically scored in terms of feature coverage priority (or the importance of the behaviors under test), not so much in terms of numerical code coverage. Then, for each test, divide the coverage score by the execution time. Sort tests by this ratio, and select the tests with the greatest ratios until the total test execution time reaches the time limit. This greedy approach does not strictly guarantee the maximum possible coverage, but it gets close with very little effort. It may also be advantageous to determine a threshold score for minimal coverage – if the achievable score for a given time period is below the minimal coverage threshold, then the time period should be increased. This metric is compelling if, for example, a CI pipeline needs more time for tests but managers are hesitant to slow down delivery.
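
The selection procedure can be sketched as a greedy pass over tests sorted by score per second. The test names, scores, and durations are hypothetical. Note two assumptions of mine: the sketch skips any test that no longer fits and keeps trying smaller ones, and, since this is essentially a knapsack problem, a greedy pass is a heuristic rather than a provably optimal selection.

```python
def select_tests(tests, time_limit):
    """Greedily pick tests with the best coverage score per second of runtime."""
    ranked = sorted(tests, key=lambda t: t["score"] / t["seconds"], reverse=True)
    selected, used = [], 0
    for t in ranked:
        # Skip tests that would exceed the limit; later, shorter ones may still fit
        if used + t["seconds"] <= time_limit:
            selected.append(t["name"])
            used += t["seconds"]
    return selected, used

# Hypothetical tests with heuristic coverage scores and runtimes
tests = [
    {"name": "login", "score": 10, "seconds": 20},
    {"name": "checkout", "score": 9, "seconds": 60},
    {"name": "search", "score": 6, "seconds": 15},
    {"name": "profile_edit", "score": 3, "seconds": 30},
    {"name": "tooltips", "score": 1, "seconds": 25},
]

selected, used = select_tests(tests, time_limit=100)
```

With a 100-second limit, the sketch picks login, search, and checkout for 95 seconds of runtime and a total score of 25.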

Note: The metrics here cover speed after code is checked in, focusing on operational excellence. Metrics covering speed before code is checked in are important but are typically already covered by standard processes (like Scrum’s velocity). There are several ways to measure speed before code check-in: development time, backlog age, story completion rate, etc. Slow times before check-in indicate that a team is overloaded with work, lacks focus on priorities, or is being disrupted too frequently. However, one major caution for these metrics is that they are difficult to accurately measure, and they presume artifacts are logged precisely at event times. For example, if a story ticket is not created until a week after a new feature was first inspired, then the actual times measured will be inaccurate.

Feedback Notification

Quality Aspect

How quickly does a team identify problems?

Desired State

Fast – Fast feedback helps teams resolve issues quickly before they become more costly.

Metrics

Software development is the poster child for Murphy’s Law: anything that can go wrong will. Problems will happen. Metrics targeting perfection (such as 100% pass rates or 0-bug counts) are foolishly impossible and hopelessly destructive. Instead, metrics should gauge feedback loops – how well a team handles problems as they arise. Feedback has two parts: (1) notification time to discover and report problems, and (2) response time to fix problems. Ultimately, the sum should be minimal, but separating the parts identifies bottlenecks. This section covers notification.

Code Review Effectiveness – Code reviews are often the second line of defense against bugs (the first line being the author themselves). They grant an opportunity for other experts to inspect code for problems before fully committing changes. However, measuring the effectiveness of code reviews can be tricky. A few metrics to consider are:

Percentage of code check-ins that undergo review, if the team notoriously skips reviews
Average review turnaround time, if reviews are ignored
Code change size in terms of line count or another similar unit, if reviews are too large for teams to handle effectively
Issues caught, whenever a review successfully identifies and resolves an issue

Issue Discovery Time – The sooner issues are discovered, the less costly they are to resolve. “Issues” typically mean defects in the product (e.g., “bugs”), but they could include problems with the environment, deployment, or tests. The simplest form of issue discovery time is the measurement from when a pipeline starts to the time the issue is discovered. More advanced measurements can track time back to the root cause, such as when code containing a bug was committed, but these may be difficult to gather or may be less accurate. Issue types should be analyzed as separate distributions. Look specifically for blocking issues that appear late in the pipeline, such as critical services being down, and add checks early in the pipeline to discover them ASAP.

Bugs per Phase – Raw bug counts, like test counts, are not helpful beyond soundbites, but the proportions of bug counts per phase are useful for determining test effectiveness. A well-engineered pipeline should have meaningful phases (or “stages” or “steps”) with feedback after each one. A typical pipeline could have phases for build, unit tests, integration tests, end-to-end tests, and production deployment. Ideally, bugs should be caught in the shortest time, at the lowest level, and in the earliest phase. For example, if the majority of bugs are caught by end-to-end tests or (gasp!) in production, then the lower-level tests might need stronger coverage.
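
A minimal sketch of the proportions, assuming each bug record has already been tagged with the phase that caught it. The phase names follow the typical pipeline above, and the counts are invented.

```python
from collections import Counter

PHASES = ["build", "unit", "integration", "end-to-end", "production"]

def bug_proportions(bug_phases):
    """Share of bugs caught in each phase, in pipeline order."""
    counts = Counter(bug_phases)
    total = len(bug_phases)
    if total == 0:
        return {phase: 0.0 for phase in PHASES}
    return {phase: counts[phase] / total for phase in PHASES}

# Hypothetical: the phase in which each of 20 bugs was caught
bugs = ["unit"] * 12 + ["integration"] * 5 + ["end-to-end"] * 2 + ["production"] * 1

proportions = bug_proportions(bugs)
```

In this made-up data set, 60% of bugs are caught by unit tests and only 5% escape to production, which is the general shape to aim for.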

Feedback Response

Quality Aspect

How quickly does a team resolve problems once they are found?

Desired State

Fast – Again, resolve issues quickly before they become more costly.

Metrics

Time to Fix a Broken Build – Build health is vital for successful software development, especially in continuous integration. After a build is broken, it must be fixed ASAP so that it does not block progress. “Fixing” a build means that the pipeline can run to completion with an acceptable test passing rate. Fixing a build may mean:

Fixing a bug in the product
Fixing a problem in the environment, deployment, or tests
Reverting a code check-in that caused a bug
Updating tests to somehow flag the failure

Subverting safety checks (like removing tests or skipping phases) is not acceptable because it doesn’t truly fix the build’s underlying problems.

Measure the time it takes from when a pipeline reports a broken build to when the pipeline produces the first subsequent working build. The distribution of these times will reveal the team’s dedication to build stability. Clearly, shorter times are better. When broken builds are caused by code changes, the author should favor reverting check-ins over attempting fixes for faster recovery speed.
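
The measurement can be sketched from a chronological build history: start the clock at the first failing build and stop it at the first subsequent passing build. The history below is invented for illustration.

```python
from datetime import datetime

def fix_times(builds):
    """Hours from each first broken build to the next working build."""
    times = []
    broken_at = None
    for finished, passed in builds:  # must be in chronological order
        if not passed and broken_at is None:
            broken_at = finished     # the build just broke
        elif passed and broken_at is not None:
            times.append((finished - broken_at).total_seconds() / 3600)
            broken_at = None         # the build is healthy again
    return times

# Hypothetical (finished, passed) build history
builds = [
    (datetime(2018, 8, 1, 9), True),
    (datetime(2018, 8, 1, 11), False),
    (datetime(2018, 8, 1, 12), False),
    (datetime(2018, 8, 1, 14), True),
    (datetime(2018, 8, 2, 10), False),
    (datetime(2018, 8, 2, 10, 30), True),
]

hours_to_fix = fix_times(builds)
```

Repeated failures while the build is already broken do not restart the clock, so the first outage above counts as 3 hours and the second as half an hour.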

Time to Resolve Bugs – While the time to fix a broken build focuses on immediate product stability, the time to resolve bugs focuses instead on ultimate correctness. Just because a build is fixed does not mean a bug is necessarily fixed – tests may mark it as an acceptable failure, or the code containing the bug may simply be reverted. The time to resolve a bug is the total time from when the bug was first discovered to when it is fixed or otherwise closed (such as being marked as invalid or won’t fix). Bug tracker tools should easily provide this data. Bugs should be separated by severity when analyzing resolution times. Bugs should be resolved quickly, with priority given to higher-severity bugs. Resolution time metrics indicate if bugs are addressed adequately and in the proper order. Long resolution times may indicate overloaded teams, tolerance of low quality, or the need for redesign/refactoring.

Quality Metrics 101: Test Quality

New to the series? Start from the beginning!

Test quality metrics make sure that testing efforts are worthwhile. Though “testing” and “quality” may be synonymous as organizational titles, testing is only one method of enforcing quality. In software, it just happens to be the most effective one. Testing is expensive, though, because it slows down time-to-market. Some people even devalue testing work because it doesn’t add new features to a product. Below are aspects of test quality to consider measuring to prove and even increase the value of testing efforts.

Coverage

Quality Aspect

How much functionality is covered by tests?

Desired State

High – More coverage means less risk. Note that 100% complete coverage is impossible.

Metrics

Coverage may be measured for both manual and automated tests. However, automated test coverage is usually more important because automated tests are meant to be defensive without gaps.

Code Coverage – Code coverage tools check what paths of code are actually exercised by automated tests. While they cannot tell if tests are good or bad, they are great for exposing gaps in coverage. Unit test code coverage is easy because most frameworks have plugins, but above-unit code coverage requires instrumented builds. Look for tools that track more than just lines of code. Target 90%+ coverage. Add new tests to cover any major gaps.

Feature Coverage – Feature coverage is a manual way to score features on test coverage based on planning and review. For this metric to be successful, a team must consistently specify features well; otherwise, this metric will give useless data. Gherkin scenarios are a great way to do this – for example, each scenario can be marked as untested, manual, or automated. Feature coverage is unscientific, but it can give a better picture of functionalities actually covered (instead of just the raw lines of code covered).

Automation Debt – Technical debt increases when tests are not automated and thus lack coverage. Teams are often unable to automate all tests originally planned, and test automation is frequently jettisoned from the Definition of Done. Or, a project may not start automating tests until a large chunk of the project is already complete. The best way to track automation debt is to create a backlog for incomplete automation work. Backlog tasks can be sized, prioritized, and planned according to whatever development process is used (Scrum, Kanban, etc.). Appropriate process metrics can then be used to understand the magnitude of the work and, thus, the lack of automated test coverage.

Warning: Test case count, test length, and test code line count are terrible metrics for coverage because they encourage largeness rather than uniqueness. The goal of testing is to have the greatest coverage with the lowest risk for the least work. Anybody can blindly write tests or variations that add no meaningful value.

Reliability

Quality Aspect

Do automated tests consistently reach completion? And how trustworthy are the results?

Desired State

High – Reliability means less time for failure triage or (horrors) reruns.

Metrics

Failure Reasons – Track the failure reason for each test case run. Ideally, tests should fail only when they discover product bugs. However, tests may also fail when:

an acceptable product change caused an automation error because tests were not updated, indicating poor communication or careless updates
an environmental change or interruption caused an automation error, indicating deployment or sysadmin problems
the automation code itself has a bug

Remember, “successful” test runs either pass with appropriate coverage or fail due to product bugs. “Unsuccessful” test runs fail or crash for reasons other than product bugs. Aim to minimize unsuccessful test runs. Never hack a test just to get it passing – always work to fix the problems behind test failures.
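
One way to track this is to label every test run with a reason and tally the “unsuccessful” ones. The reason labels and counts below are my own hypothetical choices, loosely mapped to the cases listed above.

```python
from collections import Counter

# Runs that pass or that fail because of a real product bug count as successful
SUCCESSFUL = {"passed", "product bug"}

# Hypothetical reasons recorded for 20 test case runs
runs = (
    ["passed"] * 14
    + ["product bug"] * 2
    + ["stale test"] * 1       # product changed, test was not updated
    + ["environment"] * 2      # environmental change or interruption
    + ["automation bug"] * 1   # bug in the automation code itself
)

reason_counts = Counter(runs)
unsuccessful = sum(n for reason, n in reason_counts.items() if reason not in SUCCESSFUL)
unsuccessful_rate = unsuccessful / len(runs)
```

Here 4 of the 20 runs are unsuccessful, and the per-reason counts show where to start fixing.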

Speed

Quality Aspect

How much time do test runs take?

Desired State

Fast – Tests should complete in the shortest time possible.

Metrics

Test Case Execution Time – Test case execution times indicate the efficiency of the automation code. Track the start-to-end execution time for every individual test case run. Then, analyze the data using common sense. For example, outliers may be inefficient tests that need tuning or should be removed altogether. It may be wise to separate test runs by result type or coverage area. Historical data can also be used as a baseline to determine performance impacts when making cross-cutting automation changes.
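
As one simple way to spot outliers, the sketch below flags tests that run much slower than the median. The durations are invented, and the factor of 3 is an arbitrary threshold I picked for illustration; common sense should still drive the analysis.

```python
from statistics import median

def slow_outliers(durations, factor=3.0):
    """Names of tests slower than `factor` times the median duration."""
    typical = median(durations.values())
    return sorted(name for name, secs in durations.items() if secs > factor * typical)

# Hypothetical execution times in seconds
durations = {
    "test_a": 1.2,
    "test_b": 0.9,
    "test_c": 1.1,
    "test_d": 14.0,
    "test_e": 1.0,
}

outliers = slow_outliers(durations)
```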

Test Suite Execution Time – Test suites are sets of test cases, but their execution times are not merely the sum of their tests’ times. A test suite run may include environmental setup, deployment, parallel execution, reporting, and other things. The purpose of tracking test suite execution time is to determine the start-to-end time of the suite in total, because that indicates the speed of feedback and, in CI, delivery. Tracking test suite execution time will also reveal the effect of adding more test cases to the suite, which then factors into the risk-based decisions of including or excluding tests.

Test Pyramid Balance – The Test Pyramid separates tests between unit (bottom), integration (middle), and end-to-end (top) layers. Ideally, there should be more tests at the bottom than at the top. Why? Higher-level tests are more expensive – they take more time to develop, they are more time consuming to triage, and they have slower execution times. Consider the “Rule of 1’s”: a unit test takes ~1ms, an integration test takes ~1s, and an end-to-end test takes ~1m. When scaled to thousands of tests with continuous integration, end-to-end tests simply take too much time. Tracking the proportion of tests at each layer will give a rough picture of the balance. There’s no perfect ratio between layers, but make sure that the tests form a pyramid and not a cupcake, hourglass, or ice cream cone. Rebalance test efforts as appropriate.
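
The “Rule of 1’s” makes it easy to estimate how much the shape of a suite matters. This sketch uses invented test counts, assumes serial execution, and uses a strict more-than comparison between layers as my own simplification of what counts as a pyramid.

```python
# Approximate seconds per test at each layer, per the "Rule of 1's"
RULE_OF_ONES = {"unit": 0.001, "integration": 1.0, "end-to-end": 60.0}

def is_pyramid(counts):
    """True when each layer has more tests than the layer above it."""
    return counts["unit"] > counts["integration"] > counts["end-to-end"]

def estimated_runtime_seconds(counts):
    """Rough serial runtime for the whole suite."""
    return sum(counts[layer] * RULE_OF_ONES[layer] for layer in RULE_OF_ONES)

# Hypothetical suites with the same total number of tests
pyramid = {"unit": 2000, "integration": 300, "end-to-end": 50}
ice_cream_cone = {"unit": 50, "integration": 300, "end-to-end": 2000}

pyramid_runtime = estimated_runtime_seconds(pyramid)
cone_runtime = estimated_runtime_seconds(ice_cream_cone)
```

Both suites have 2,350 tests, but the pyramid takes roughly 55 minutes to run serially while the ice cream cone takes more than 33 hours.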

Return on Investment

Quality Aspect

Do the tests add greater value than their cost?

Desired State

High – Tests need to be worth the effort. Don’t test for the sake of testing!

Metrics

Measuring return on investment in terms of hard dollars is objectively impossible. The true cost of bugs can never be fully known: if a bug is caught early, the potential cost to fix it later can merely be estimated. The intangible value of protecting brand reputation may be more important than the tangible value of money saved by finding specific bugs. Better quality practices might prevent developers from causing bugs that would have otherwise happened – and there’s no good way to measure that.

Instead, return on investment is better measured by a collection of metrics that validate both code line protection and defect discovery. Use a weighted scorecard to get a more holistic view of ROI. Scorecards can be used with estimates for planning tests, as well as plugged in with actual values to measure the degree of success. Note that some aspects of ROI may be too difficult to measure accurately – in those cases, a LOW-MID-HIGH grading scale may be best. Others may seem like micromanagement.

Priority – Assign each test a priority for its coverage importance. Core functionalities should have the highest priority, while fringe functionalities should have the lowest priority. Focus on high-priority tests. Another way to look at importance is risk, or the chances that bugs will escape if explicit testing for a feature is not done.
Test Execution Frequency – Track how many times tests are actually run. Higher frequency is better. Tests that are rarely run should either be included in more regular runs or removed/archived. This could easily be tracked by a test management tool or database.
Coverage Uniqueness – Duplicate test coverage wastes resources. Unfortunately, this one is difficult to measure. Tools for code coverage or static analysis might help. Manual review, however, is typically a better approach.
Development Cost and Maintenance Cost – Track how much effort it takes to make and keep tests, including man-hours and resources. Lower costs are better, of course. Planning tools may help with this.
Bug Discovery – Track bugs discovered in terms of severity and when and how they were caught. Ideally, the number of bugs caught by customers after a release (meaning, not caught by tests during development) should be minimal, and their severity should be low. Bug tracking tools should easily provide this data. Be warned, though, that the raw bug count is a poor metric. Consider this question: Is a high bug count good or bad? Trick question – during a release, it indicates good test quality but poor product quality; after a release, it indicates all-around poor quality. What matters is that a minimal number of bugs happen at all, and that most of those bugs are caught and fixed before a release. Plus, keep in mind that bugs happen by accident. Finally, focusing exclusively on bug count to determine test value ignores the positive side of testing – that passing tests give confidence that features work correctly.
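
A weighted scorecard with LOW-MID-HIGH grades could be sketched as follows. The weights, the grade points, and the example grades are purely illustrative assumptions; every team should choose its own. Cost is graded as “cost efficiency” so that HIGH is always the favorable grade.

```python
GRADE_POINTS = {"LOW": 1, "MID": 2, "HIGH": 3}

# Hypothetical weights for each scorecard aspect
WEIGHTS = {
    "priority": 0.30,
    "execution_frequency": 0.20,
    "coverage_uniqueness": 0.15,
    "cost_efficiency": 0.15,
    "bug_discovery": 0.20,
}

def roi_score(grades):
    """Weighted grade total, normalized so that all-HIGH grades score 1.0."""
    total = sum(WEIGHTS[aspect] * GRADE_POINTS[grades[aspect]] for aspect in WEIGHTS)
    best = max(GRADE_POINTS.values()) * sum(WEIGHTS.values())
    return total / best

# Hypothetical grades for one test suite
grades = {
    "priority": "HIGH",
    "execution_frequency": "HIGH",
    "coverage_uniqueness": "MID",
    "cost_efficiency": "LOW",
    "bug_discovery": "MID",
}

score = roi_score(grades)
```

The same function works with estimated grades during test planning and with actual grades afterward, which makes the two easy to compare.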

Quality Metrics 101: The Good, The Bad, and The Ugly

metric – [me-trik] – (noun) a standard for measuring or evaluating something

(Courtesy of dictionary.com)

When developing software, metrics can be a good way to track progress and evaluate quality. Managers typically love them because they provide insights that could otherwise be hard to see. Come on, who doesn’t love pretty charts with rainbow colors? However, gathering metrics is not easy, especially for quality. Some metrics are downright useless, and others encourage bad behavior when used improperly. It is far more important to focus on the most important aspects of quality than to blindly promulgate numbers. This article will cover quality metrics in depth, giving guidance on what quality aspects matter most and how they can be measured.

What are Quality Metrics?

Quality is the degree of a feature’s excellence. Quality metrics attempt to impartially measure a feature’s excellence. The word “attempt” is notable – quality is inherently relative, and metrics can sometimes be subjective. Take pizza as an example: How would the quality of a pizza be measured? One method could be to analyze the freshness and nutritional value of the ingredients, but Pizza Hut notoriously fought Papa John’s Pizza over the assertion that better ingredients make better pizza. Another method could be to analyze the cooking process, like bake time or the order of toppings, but that would be better for identifying carelessness than quality. The delivery process could also be considered, like Domino’s delivery robots, but that evaluates customer service and not the pizza itself. Ultimately, what matters are the taste and the visual appearance, which are totally subjective to the consumer. Surveys are unreliable. Taste tests have limited selection. Appearance is an art, not a science. Each of these metrics gives a glimpse into quality but does not fully reveal what actually makes a “good” pizza. Together, though, they provide a reasonable picture when the desired metrics are gathered well.

[Image: Tony Pepperoni pizza coupon, Rochester, NY]

Is that really high quality pizza? Well, what aspects of quality are we measuring? We won’t get a perfect picture of quality from metrics, but we can get a rough idea. Software quality metrics work the same way.

Software Quality

In software, there are three primary types of quality metrics:

Test Quality
- How effective are tests at enforcing high quality standards?
- Examples: code coverage, test failure reasons.
Process Quality
- How effective are processes at delivering good features?
- Examples: time to fix broken builds, time to discover bugs.
Product Quality
- How good is the software product?
- Examples: test failure rate, up-time, customer satisfaction.

The main purpose of software quality metrics is to validate successes and find areas for improvement in the development process. Metrics expose problems like gaps in coverage or slow feedback loops so that a team knows what to improve. They are meant to be informative but not punitive – they should simply report accurate data. Don’t shoot the messenger! For example, if the test failure rate is high, fix the bugs instead of blaming each other.

However, be warned by W. Edwards Deming’s red bead experiment: Quality cannot be inspected into a product – it must be built in from the beginning! Metrics alone cannot solve problems – they can merely expose them. It is up to the development team to effect the proper change based on what metrics reveal. Awareness is useless without action. And action should ultimately lead to better features, faster delivery, and higher profits.

Choosing Quality Metrics

Metrics are nothing but tools to improve aspects of quality. Not every job needs the full toolbox! Always pick the quality aspect first, and then find the right measuring stick. Don’t just pick some metrics that others say are good. For example, if build stability is the quality aspect that is deemed important (and it should be), then the metric to track it could be the average time to fix a build after it is broken.
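As a rough illustration of the build stability example, here is a minimal Python sketch of how “average time to fix a broken build” could be computed. It assumes build results are available as a chronological list of (finish time, passed) pairs; the function name `mean_time_to_fix` and the sample history are made up for illustration and do not come from any particular CI tool.

```python
from datetime import datetime, timedelta


def mean_time_to_fix(builds):
    """Average time from a build first breaking to the next passing build.

    `builds` is a chronological list of (finished_at, passed) tuples
    (a hypothetical format, not tied to any CI server).
    Returns None if no broken build has been fixed yet.
    """
    repair_times = []
    broken_since = None
    for finished_at, passed in builds:
        if not passed and broken_since is None:
            broken_since = finished_at  # first failure of a broken streak
        elif passed and broken_since is not None:
            repair_times.append(finished_at - broken_since)
            broken_since = None  # the build is green again
    if not repair_times:
        return None
    return sum(repair_times, timedelta()) / len(repair_times)


# Made-up sample data: two breakages, fixed after 2 hours and 4 hours.
history = [
    (datetime(2018, 3, 1, 9, 0), True),
    (datetime(2018, 3, 1, 10, 0), False),  # build breaks
    (datetime(2018, 3, 1, 11, 0), False),  # still broken
    (datetime(2018, 3, 1, 12, 0), True),   # fixed
    (datetime(2018, 3, 2, 9, 0), False),   # breaks again
    (datetime(2018, 3, 2, 13, 0), True),   # fixed
]
print(mean_time_to_fix(history))  # 3:00:00
```

Note that the sketch measures from the first failure in a streak, not from each failing build, so a build that stays red for a long time counts as one long outage rather than many short ones.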

The best process for choosing quality metrics is:

Identify a quality aspect that adds value.
Decide if the aspect is worth measuring.
Determine the desired state for that aspect.
Derive the best way to measure progress toward the desired state impartially.
Implement the metric gathering, storage, and analysis.
Revisit the metric periodically to assert its value.
Stop gathering the metric when it ceases to provide value.

Keep in mind that metrics have a cost: they must be gathered, stored, and analyzed. That’s why it’s important to pick the quality aspects that matter most.

This Series

The articles in this series will cover each of the quality metric types in detail. Each will list major quality aspects with meaningful metrics to track them and advice on how to use them. Remember, metrics should be constructive and not destructive.

Andy’s Latest Opportunity

While most of my posts are technical, this one is a personal update:

I have accepted a fantastic new role as a Software Engineer in Test at PrecisionLender! I will be the company’s technical leader for testing and automation: building a strategy, setting up frameworks, writing tests, running tests in a CI/CD pipeline, and educating others. It’s the perfect role for me, and together, we will do great things. PrecisionLender is a very collaborative company that builds a software platform to help banks make smarter loans. They have about a hundred employees right now, and they’re growing. Their Raleigh office is located very close to my home.

With this announcement also comes the bittersweet news of my departure from LexisNexis. After almost a year and a half, it is time to say goodbye. I want to make it very clear that I am not leaving LexisNexis because I am unhappy, but rather to pursue a great new opportunity that providentially found me. My role as a Senior Software Test Engineer at LexisNexis has truly been the greatest opportunity of my career so far. I became a technical leader on one of the strongest test automation teams in Raleigh. I led the development of test frameworks that were shared across the whole company, in addition to writing countless test cases. I did internal consulting with groups across the globe to teach them how to be better testers and automationeers. I even earned the nickname “Reverend BDD” for the many impassioned training sessions I delivered. I grew tremendously in my own professional software skills. I learned from my mistakes along the way with the grace of others. And I found many great, new friends, with whom I will surely miss working. I specifically want to thank my manager, Kalen Howell, Sr., and my team lead, Jeff Wolf, for trusting me to tackle big problems and valuing my expertise. Working for LexisNexis has been a privilege.

My last day at LexisNexis will be Tuesday, April 3. My wife and I will then take a short vacation to Charleston, SC, and I will start my new position at PrecisionLender on Tuesday, April 10. Other than that, I will continue to write this Automation Panda blog and help my wife with her businesses as needed. I will also deliver a talk at PyCon 2018 in Cleveland, Ohio this May entitled, “Behavior-Driven Python.” Be sure to check it out! Connect with me on LinkedIn and Twitter, too.

I am resolute in my career path to continue pursuing testing and automation. Vocationally, we as creatures made in God’s image ought to seek to glorify Him through our creative work. As software engineers, our form of work emulates the creativeness of our Creator. Much in the way that God spoke creation into being, we likewise speak software into being, albeit in a microcosm. The whole discipline of computer science is itself rooted in language, in instruction. The instructions we issue, and the very systems we construct, reflect the logical, rational, orderly nature of God’s creation. Furthermore, as testers, we likewise recognize man’s fallen nature and our need for correction. The systems we implement will never be perfect because we are not equal to God. In testing, we simultaneously assert the wonders of creativity as well as our need for redemption in Christ – both to the glory of the Good Lord. This is what motivates me to pursue test automation. I thank God for this opportunity. Soli Deo Gloria.

BDD Example Mapping

The two major goals of Behavior-Driven Development are better collaboration and automation. Even when the Three Amigos actually get together, collaboration can be tough. Where do we start? What scenarios should we write? What examples should be included?

Well, the Cucumber folks have a practice called “Example Mapping” to make it easier. All you need is a pack of index cards and a big table!

Write the story under discussion on a yellow card at the top of the table.
Write a rule for each known acceptance criterion on a blue card under the story.
Write each example for a rule on a green card.
Write each open question on a red card on the side to discuss later.

Keep writing cards until the team is satisfied with the story. This process provides clear, fast feedback for stories. It should take about 25 minutes per story. A team can quickly see if a story is too big or needs further refinement. Engineers can easily turn example cards into Gherkin scenarios. Remember to assign questions to owners to get answers.
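To give a feel for how example cards can become Gherkin scenarios, here is a small Python sketch. It is only an illustration: the `Story` and `Rule` classes, the `to_gherkin` function, and the password-reset story are all hypothetical, and the output is just a skeleton whose steps still need to be written by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    text: str  # blue card
    examples: List[str] = field(default_factory=list)  # green cards


@dataclass
class Story:
    title: str  # yellow card
    rules: List[Rule] = field(default_factory=list)
    questions: List[str] = field(default_factory=list)  # red cards


def to_gherkin(story):
    """Render the cards as a Gherkin feature skeleton (steps left as TODOs)."""
    lines = [f"Feature: {story.title}"]
    for question in story.questions:
        lines.append(f"  # Open question: {question}")
    for rule in story.rules:
        lines.append("")
        lines.append(f"  # Rule: {rule.text}")
        for example in rule.examples:
            lines.append(f"  Scenario: {example}")
            lines.append("    # TODO: write Given/When/Then steps")
    return "\n".join(lines)


# Hypothetical cards from an Example Mapping session.
story = Story(
    title="Customer resets a forgotten password",
    rules=[
        Rule("Reset links expire", ["Link used within an hour", "Link used after a day"]),
        Rule("New password must differ from the old one", ["Same password is rejected"]),
    ],
    questions=["How long should a reset link stay valid?"],
)
print(to_gherkin(story))
```

Each green card becomes one scenario title, each blue card becomes a comment grouping its scenarios, and red cards stay visible as comments until someone answers them.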

Rather than duplicate documentation here, please read Matt Wynne’s seminal post on the practice, Introducing Example Mapping.

Also, watch the webinar recording from Cucumber about Example Mapping.

Tutoring: A Lifelong Impact

On Saturday, February 17, 2018, I delivered the keynote address at RIT TutorCon 2018 at the Rochester Institute of Technology in Rochester, NY. I was a student tutor at RIT from 2007-2010. The Academic Support Center asked me to speak about my experiences. Below is the transcript of my speech.

It’s good to be back in Ra-cha-cha! Happy Presidents’ Day weekend, and also Happy Chinese New Year! Let me get a good look at our tutors: If you are a tutor, please stand up.

[Wait for tutors to stand up.]

Great! It’s awesome to see so many of you here today. Is anyone in Computer Science?

Now, remain standing if you have been a tutor for at least one year.

[Wait for people to sit down.]

Not bad. What about two years?

[Wait for more people to sit down.]

Three? [Wait.] Four? [Wait.] Five? [Wait.]

What about ten years? Ten years of tutoring? [Give anyone who remains standing a round of applause, and then ask them to sit down.]

Ten years is a long time! A lot can happen; a lot can change. Here’s a question for you today, though: Will your tutoring make an impact in ten years? [Repeat the question for emphasis.]

Ten years ago, I was one of you. I was in my second year at RIT studying computer science, and I worked for the Academic Support Center and TRIO as a tutor for math, physics, and basically anything that was needed. I would have been sitting in your chair if we had these fancy tutoring conferences back then. Things were quite different a decade ago. Let me drop some knowledge bombs on you for the world in February 2008:

We were still on the iPhone 1. iPads did not exist yet.
Barack Obama was still seen as a surprise challenger to Hillary Clinton in the 2008 Democratic primaries.
The Great Recession was looming but had not yet hit.
The Summer Olympics were going to be held in Beijing, China. (Michael Phelps & Usain Bolt)
Lady Gaga had not yet released her debut album.

Now, let me contextualize this for RIT:

Bill Destler was still in his first year as university president.
RIT was still on the quarter system.
Park Point was being built.
The Simon Center (a.k.a. the “Toilet Bowl”) was being built.
The main drop-in study center was the “Math Lab” in Building 1, not Bates.

One thing that looks like it hasn’t changed, though, is Gracie’s. [Assume the audience will laugh.]

By the way, have they knocked down Riverknoll yet? I lived at 232 Kimball Drive. [Assume the audience will laugh or somehow respond.]

A lot happens in ten years. But, will your tutoring have an impact in ten years? Will the tutoring you do today benefit your students years from now? It should.

As college students, life is typically fast-paced. You have classes, you have papers, you have projects; quarters – excuse me, semesters – fly by; and it’s all over after about four years. And, for you, tutoring is just a part of that overall experience. It’s just a part-time job. As we saw earlier, most of you will spend only a few years tutoring before entering your career fields. Personally, I haven’t done any tutoring since 2010. It’s tempting to think that the time you spent tutoring doesn’t matter. So what if you help people finish their homework problems a few times a week? Students come and go anyway. It’s no big deal, right?

Well, if you’re here today at this tutoring conference, I’m pretty sure that tutoring is a big deal to you. You know it’s important. I’d be willing to bet that many of you would do tutoring even if you didn’t get paid – although, the pay is certainly deserved! I want you all to understand that what you do as a tutor will impact your students and will also impact you for the rest of your lives. Tutoring is a vector: I want you to see the line and not just the dot.

Your students come with a myriad of different circumstances. Some are just looking for a healthy environment for doing their homework. Maybe they’re stuck on a tricky physics brain-buster. Others struggle. Some really struggle – and may be one more failure away from academic suspension. But all students have one thing in common: they come to you because they want to do better. Whoever they are, they look to you as tutors to help them succeed. And every question you answer – or rather, every guiding question you turn back to them – puts them further down their paths to success. Today’s practice problems become tomorrow’s degrees. With you, they’ll learn not just the course material but, more importantly, they will learn how to learn. They will learn what questions to ask themselves. They will learn how to find answers using their resources. They will learn to teach themselves. Plutarch once said, “The mind is not a vessel to be filled but a fire to be ignited.”

With my perspective of the line, I want to give you three big ways you can make your tutoring today leave an impact for a lifetime.

First, own your role. As tutors, you have a very unique role with your students: you are peers; you are not professors. That’s a big difference! Professors are experts in their fields with years of experience and dozens of publications. You, as tutors, are students yourselves, just a few more years ahead. You can relate to your students on much more common ground. You’ve taken the same courses. You’ve taken the same tests. You’ve probably even done the very same problems. One of the tutoring tricks is to always work with a student at their level – if they sit at the table, you sit; if they stand at the board, you stand; and unless you’re making a really good example, don’t stand on the table! The equal-level principle also applies to your role as a peer tutor. There’s camaraderie. There’s energy. There’s less embarrassment to ask “stupid” questions. There’s a sense that they can do it because you can do it. So own your role as a peer tutor.

Second, focus on the student and not the problem. The problem is the dot; the student is the line. Tutors aren’t there to solve the world’s problems! Nobody comes to a tutoring center to watch a tutor show off with how much they know or how fast they can solve problems. “Look at how smart I am” – NO! Let’s be real, here: the solution to any given practice problem doesn’t really matter. What does matter is how the student learned to handle problems. Did they make an attempt? Did they look at their formulas? Did they write out their work? Did they persevere when they got stuck? Let me ask you a question: Do you think that I remember specific details to any homework assignments from ten years ago? [Wait for audience response.] Nope! But, I remember that a derivative is a rate of change. And, if I had to solve a derivative again, I’d know exactly where to look in my books to figure it out. That’s how you want your students to be in ten years. Cultivate your students to become independent.

Third, build camaraderie. Your students are already your peers – make them your friends. I don’t have any fancy statistics to share, but I know anecdotally that most students become “repeat customers.” You’ll see them again, and again, and again. Whether intended or not, you will forge relationships with your students. As your tutoring shifts become part of your everyday life, so, too, do the students who show up. Treat every single one of them the way you’d want to be treated. Work to form good relationships. Work to form trust. Be honest when you don’t know something. And furthermore, build camaraderie with your fellow tutors as well! Tutors are a team – each one brings fresh eyes and unique expertise. My specialty? Discrete math and differential equations – what a combo! We, as tutors, are trained in common techniques and share the common burdens to help our students. It’s almost like we have a special, unspoken club. I still keep up with my students and my tutors. I dined with a former student on top of the Space Needle. I partied with another on New Year’s Eve. I’m attending another student’s wedding this summer. A fellow tutor came to mine. So build camaraderie with your students and your fellow tutors.

As I close, I’d like to remind you that you are all in tutoring together. For some of you, this might just be the best job you ever have. I challenge all of you today to make your tutoring count: for now, for ten years from now, and for a lifetime. Tutors don’t make bad students good – tutors make students learn to teach themselves. That is how your tutoring will make a lifelong impact. Thank you.

Django Projects in Visual Studio Code

Visual Studio Code is a free source code editor developed by Microsoft. It feels much more lightweight than traditional IDEs, yet its extensions make it versatile enough to handle just about any type of development work, including Python and the Django web framework. This guide shows how to use Visual Studio Code for Django projects.

Installation

Make sure the latest version of Visual Studio Code is installed. Then, install the following (free) extensions:

Python (published by Microsoft) – for full Python language support
Django Template – for template file source highlighting
Django Snippets – for common Django code
- Alternatively, install Djaneiro – Django Snippets if you prefer

Reload Visual Studio Code after installation.

Editing Code

The VS Code Python editor is really first-class. The syntax highlighting is on point, and the shortcuts are mostly what you’d expect from an IDE. Django template files also show syntax highlighting. The Explorer, which shows the project directory structure on the left, may be toggled on and off using the top-left file icon. Check out Python with Visual Studio Code for more features.

Virtual Environments

Virtual environments with venv or virtualenv make it easy to manage Python versions and packages locally rather than globally (system-wide). A common best practice is to create a virtual environment for each Python project and install only the packages the project needs via pip. Different environments make it possible to develop projects with different version requirements on the same machine.

Visual Studio Code allows users to configure Python environments. Navigate to File > Preferences > Settings and set the python.pythonPath setting to the path of the desired Python executable. Set it as a Workspace Setting instead of a User Setting if the virtual environment will be specific to the project.

Python virtual environment setup is shown as a Workspace Setting. The terminal window shows the creation and activation of the virtual environment, too.
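The workspace setting can also be written by a script instead of through the Settings UI. The Python sketch below is one possible way to do it, under a few assumptions: the helper names `venv_python` and `write_workspace_settings` are made up for illustration, the virtual environment uses the standard venv/virtualenv layout, and any existing .vscode/settings.json is plain JSON without comments.

```python
import json
import os
from pathlib import Path


def venv_python(venv_dir):
    """Return the interpreter path inside a virtual environment directory."""
    venv_dir = Path(venv_dir)
    if os.name == "nt":
        return venv_dir / "Scripts" / "python.exe"  # Windows layout
    return venv_dir / "bin" / "python"  # macOS/Linux layout


def write_workspace_settings(project_root, venv_dir):
    """Merge python.pythonPath into <project_root>/.vscode/settings.json."""
    settings_file = Path(project_root) / ".vscode" / "settings.json"
    settings_file.parent.mkdir(parents=True, exist_ok=True)
    settings = {}
    if settings_file.exists():
        # Assumes plain JSON; a settings file with comments would fail here.
        settings = json.loads(settings_file.read_text())
    settings["python.pythonPath"] = str(venv_python(venv_dir))
    settings_file.write_text(json.dumps(settings, indent=4))
    return settings
```

For example, calling `write_workspace_settings(".", "venv")` from the project root would point the workspace at a virtual environment in a local "venv" directory while leaving any other workspace settings in place.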

Helpful Settings

Visual Studio Code settings can be configured to automatically lint and format code, which is especially helpful for Python. As shown on Ruddra’s Blog, install the following packages:

$ pip install pep8
$ pip install autopep8
$ pip install pylint
$ pip install pylint-django

And then add the following settings:

{
    "team.showWelcomeMessage": false,
    "editor.formatOnSave": true,
    "python.linting.pep8Enabled": true,
    "python.linting.pylintPath": "/path/to/pylint",
    "python.linting.pylintArgs": [
        "--load-plugins",
        "pylint_django"
    ],
    "python.linting.pylintEnabled": true
}

Editor settings may also be language-specific. For example, to limit automatic formatting to Python files only:

{
    "[python]": {
        "editor.formatOnSave": true
    }
}

Make sure to set the pylintPath setting to the real path value. Keep in mind that these settings are optional.

Full settings for automatically formatting and linting the Python code.

Running Django Commands

Django development relies heavily on its command-line utility. Django commands can be run from a system terminal, but Visual Studio Code provides an Integrated Terminal within the app. The Integrated Terminal is convenient because it opens right to the project’s root directory. Plus, it’s in the same window as the code. The terminal can be opened from View > Integrated Terminal or using the “Ctrl-`” shortcut.

Running Django commands from within the editor is delightfully convenient.

Debugging

Debugging is another way Visual Studio Code’s Django support shines. The extensions already provide the launch configuration for debugging Django apps! As a bonus, it should already be set to use the Python path given by the python.pythonPath setting (for virtual environments). Simply switch to the Debug view and run the Django configuration. The config can be edited if necessary. Then, set breakpoints at the desired lines of code. The debugger will stop at any breakpoints as the Django app runs while the user interacts with the site.

The Django extensions provide a default debug launch config. Simply set breakpoints and then run the “Django” config to debug!

Version Control

Version control in Visual Studio Code is simple and seamless. Git has become the dominant tool in the industry, but VS Code supports other tools as well. The Source Control view shows all changes and provides options for all actions (like commits, pushes, and pulls). Clicking changed files also opens a diff. For Git, there’s no need to use the command line!

The Source Control view with a diff for a changed file.

Visual Studio Code creates a hidden “.vscode” directory in the project root directory for settings and launch configurations. Typically, these settings are specific to a user’s preferences and should be kept to the local workspace only. Remember to exclude them from the Git repository by adding the “.vscode” directory to the .gitignore file.

.gitignore setting for the .vscode directory

Editor Comparisons

JetBrains PyCharm is one of the most popular Python IDEs available today. Its Python and Django development features are top-notch: full code completion, template linking and debugging, a manage.py console, and more. PyCharm also includes support for other Python web frameworks, JavaScript frameworks, and database connections. Django features, however, are available only in the (paid) licensed Professional Edition. It is possible to develop Django apps in the free Community Edition, as detailed in Django Projects in PyCharm Community Edition, but the missing features are a significant limitation. Plus, being a full IDE, PyCharm can feel heavy with its load time and myriad of options.

PyCharm is one of the best overall Python IDEs/editors, but there are other good ones out there. PyDev is an Eclipse-based IDE that provides Django support for free. Sublime Text and Atom also have plugins for Django. Visual Studio Code is nevertheless a viable option. It feels fast and simple yet powerful. Here’s my recommended decision table:

What’s Going On	What You Should Do
Do you already have a PyCharm license?	Just use PyCharm Professional Edition.
Will you work on a large-scale Django project?	Strongly consider buying the license.
Do you need something fast, simple, and with basic Django support for free?	Use Visual Studio Code, Atom, or Sublime Text.
Do you really want to stick to a full IDE for free?	Pick PyDev if you like Eclipse, or follow the guide for Django Projects in PyCharm Community Edition.

[Update on 9/30/2018: Check out the official VS Code guide here: Use Django in Visual Studio Code.]