Where to start when fixing tests

My test suite isn't horrible, but it isn't great either...

You have a test suite that runs decently well, but you have some transient failures, and the suite has been taking progressively longer as time goes on. At some point, you realize that you are spending the first 15 minutes of a deploy crossing your fingers hoping the tests pass, and the next hour re-running the suite to get past the "transient" failures. You have a problem that should be addressed, but where do you start?

Generally, you should start by stabilizing your suite. Consistently passing in 1 hour is a better situation than re-running a 45-minute suite 2-3 times, which is 90-135 minutes of waiting before you can ship.

How do you know which tests to tackle first?

Approach

Our testing stack: minitest, Capybara, PhantomJS, Poltergeist, and Jenkins or CircleCI.

We use Jenkins and CircleCI as part of our continuous integration process, which means they are on the critical path to deploying. If our tests pass quickly and consistently locally, but not in our CI environment, we still can't (or shouldn't) ship. "It works on my machine" is rarely a good enough defense. To solve our slow and flaky problem, we want to be sure we are looking at our test performance on servers in our deploy path.

How big of a problem do you have?

How often does your test suite pass? Are there particular suites within the project that fail more frequently? Jenkins and CircleCI can show you this history, but we couldn't find summary-level data like "this suite has passed 75% of the time in the last month".

How do you find your flakiest tests?

We couldn't find an easy way. You can have people document failing tests when they come across them, but manual processes are destined to fail.

How do you find your slowest tests?

There are a few gems that can help you identify your slowest tests locally, like minitest-perf, but we want to know how our tests perform in the continuous integration environment. Jenkins and CircleCI provide some of this data, but it is pretty limited.
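For a rough local read, minitest's verbose flag also prints a per-test runtime (the file and test names below are made up):

    $ ruby -Itest test/models/user_test.rb -v
    UserTest#test_create = 0.04 s = .
    UserTest#test_expensive_report = 3.82 s = .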

Solution

We created JUnit Visualizer to help collect the data we want.

Gathering test data

Jenkins and CircleCI support the JUnit.xml format, which includes test timing, test status, and the number of assertions. With JUnit.xml, we can leverage an industry standard, and CircleCI maintains a gem, minitest-ci, that exports minitest data to the format. The gem can basically be dropped into an existing project using minitest. It creates an XML file per test file that is run, and saves it in the "test/reports" directory by default.
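Setup is about as small as it sounds; the gem registers itself as a minitest plugin, so adding it to the Gemfile is usually all it takes (grouping and version pinning are up to you):

    # Gemfile
    group :test do
      gem 'minitest-ci'
    end

Each report is standard JUnit XML, so any JUnit-aware tool can read it. The suite, test names, and numbers here are illustrative:

    <!-- test/reports/TEST-UserTest.xml -->
    <testsuite name="UserTest" tests="2" failures="1" errors="0" skipped="0">
      <testcase name="test_create" classname="UserTest" assertions="3" time="0.042"/>
      <testcase name="test_login" classname="UserTest" assertions="1" time="1.318">
        <failure message="Expected true, got false">...</failure>
      </testcase>
    </testsuite>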

To standardize our integration with Jenkins and CircleCI, we push the XML files to S3, using a directory per build. We use the following to push the files to S3:

S3 upload configuration
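In outline, it is a sync of the reports directory to a per-build prefix. The bucket name below is a placeholder; Jenkins exposes $BUILD_NUMBER and CircleCI exposes $CIRCLE_BUILD_NUM:

    # Run after the test step in CI; whichever build-number variable
    # the CI server sets becomes the S3 directory for this build.
    BUILD_DIR="${BUILD_NUMBER:-$CIRCLE_BUILD_NUM}"
    aws s3 sync test/reports "s3://our-test-results/$BUILD_DIR/"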

Displaying test data

The main categories of test data we want to view:

  1. Historical information that shows how frequently our tests pass or fail, broken down by suite when a project has more than one. This helps focus our attention on the worst offenders.
  2. Single list of failures that shows all of the test failures, across suites, on one page. This is a convenient way to see every failure without clicking into the details of each suite.
  3. Unstable tests list that shows which tests fail the most. This lets us see our "transient" test failures and identify areas of our code that may be fragile, which is where we should start fixing tests (a rough sketch of this roll-up follows the list).
  4. Slowest tests list that shows which tests are taking the most time. There is no point in speeding up a test that takes 1 second if you have a test that takes 45 seconds.
  5. Duration trends that show how test duration is changing over time. It is helpful to see that we are making progress.
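JUnit Visualizer does this aggregation over the reports in S3; the core roll-up behind the unstable and slowest lists is simple enough to sketch in Ruby. Everything here (local paths, the top-10 cutoff) is illustrative, not the project's actual code:

    require 'rexml/document'

    failure_counts = Hash.new(0)
    slowest_times  = Hash.new(0.0)

    # One directory of JUnit XML reports per build, e.g. builds/123/TEST-UserTest.xml
    Dir.glob('builds/*/*.xml').each do |path|
      doc = REXML::Document.new(File.read(path))
      doc.elements.each('//testcase') do |tc|
        name = "#{tc.attributes['classname']}##{tc.attributes['name']}"
        # A <failure> or <error> child means the test did not pass this build.
        failure_counts[name] += 1 if tc.elements['failure'] || tc.elements['error']
        # Track the worst time we have seen for each test.
        time = tc.attributes['time'].to_f
        slowest_times[name] = time if time > slowest_times[name]
      end
    end

    puts 'Most unstable:'
    failure_counts.sort_by { |_, c| -c }.first(10).each { |n, c| puts "  #{c}x #{n}" }

    puts 'Slowest:'
    slowest_times.sort_by { |_, t| -t }.first(10).each { |n, t| puts format('  %.1fs %s', t, n) }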

For screenshots of how these look in JUnit Visualizer, check out the section at the bottom of this post.

Next Steps

We have made great progress on our stability and speed since starting on JUnit Visualizer; how we addressed the test issues is chronicled here.

Some potential next steps for JUnit Visualizer:

  • Enhance the trend charts to account for outliers
  • Be able to reset the unstable test list when we think we have fixed an unstable test

Check out the code for JUnit Visualizer here: https://github.com/avvo/junit_visualizer

Screenshots

Historical Information

We wanted to show how often our tests pass, broken down by project and by suite within each project.

Project view with suites

Single list of failures

We wanted a better summary view of the tests that failed. In Jenkins v1, you can only see the failures within a suite, which means there is a lot of clicking around.

View of the errors for a specific build, where skips and errors are on top

Failures across suites

Unstable tests

We wanted to find the tests that failed most frequently.

View of the tests, with most frequent failures on top

Unstable Tests

Slowest tests

The tests that take the most time, sorted slowest to fastest.

View of the tests for a specific build, slowest test at the top.

Slowest Tests

Duration trends

As we started to fix slow tests, we wanted to be able to see how our test duration changed over time.

There is a simple graph that shows the duration (in seconds).

Test duration over time