If you've got flaky or very slow UI tests, this is the post for you. Do any of these problems sound familiar?

  • Unexplainable exceptions during tests

  • Capybara is timing out

  • Capybara cannot find elements on the page that are clearly present

  • I hate writing tests this is awful please send help

  • Tests take forever to do things that are fast manually

  • Order of tests is affecting stability

  • PhantomJS raises DeadClient exceptions

    Capybara::Poltergeist::DeadClient: PhantomJS client died while processing

  • None of this is consistently reproducible, if at all

  • Existential dread

Most of the specifics discussed here will be about Rails, Minitest, Capybara, Poltergeist, and PhantomJS. This is a common stack, but the principles here are useful elsewhere.

The first priority in writing a test is that it actually tests the right thing. I can't help you with that part, but after correctness comes stability, then performance. We created a gem that includes most of the things covered here, and most of the code snippets come directly from it.

intransient_capybara

intransient_capybara is a Rails gem that combines all of the ideas presented here (and more). By inheriting from IntransientCapybaraTest in your integration tests, you can write Capybara tests that are far, far less flaky. The README explains how to use it and exactly what it does.
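
For example, an integration test built on the gem looks like an ordinary Capybara test that inherits from the gem's base class (the test body below is illustrative, not taken from the README):

class CheckoutFlowTest < IntransientCapybaraTest
  # Hypothetical example; the path and asserted text are placeholders for your app.
  def test_checkout_page_renders
    visit '/checkout'
    assert_text 'Payment details'
  end
end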

The goals of intransient_capybara are debuggability, correctly configuring and using Minitest, Capybara, Poltergeist, and PhantomJS, and filling the gaps where those tools fall short (most notably with the genius rack request blocker). It combines a ton of helpful stuff into a gem that takes about 10 minutes to set up.

Test stability

Test stability is monstrously difficult to nail down. Flaky tests come from race conditions, test ordering, framework failures, and obscure app-specific issues like class variable usage and setup/teardown hygiene. Almost nothing is reproducible. We can all stop writing tests, or we can try to understand these core issues a little bit and at least alleviate this pain.

Use one type of setup and teardown. Tests in the wild use both setup do blocks and def setup methods, and it matters which you pick, because it affects the order in which things are called. I recommend always using def setup and def teardown in all tests: when you have to call super manually, you can choose to run the parent method before or after your own code. The example below shows the two options.

class MyTest < MyCapybaraBaseClass
  # Option 1
  def setup
    # I can do my setup stuff here, before MyCapybaraBaseClass's setup method
    super # You MUST call this
    # ... or I can call it after
  end

  # Option 2
  setup do
    # I do not have to call super because I am not overriding the parent method...
    # but am I before or after MyCapybaraBaseClass's setup method??
  end
end

Use setup and teardown correctly. Your setup and teardown methods will invariably contain critical test pre- and post-conditions, so they must be called. It is very easy to override one or both in a specific test and forget to call super, which creates frustrating issues that are very hard to track down. Fix these in your app, and add code to your base test class that raises if setup or teardown ran without calling super. intransient_capybara does this for you.
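
A minimal sketch of that guard, using Minitest's before_teardown and after_teardown hooks (this shows the idea only; intransient_capybara's actual implementation may differ):

class MyCapybaraBaseClass < ActionDispatch::IntegrationTest
  def setup
    @base_setup_ran = true
    # ... shared pre-conditions (stubs, Capybara configuration, etc.) ...
  end

  def teardown
    @base_teardown_ran = true
    # ... shared post-conditions (clear mocks, reset sessions, etc.) ...
  end

  # Minitest runs these hooks around every test, so they can verify that a
  # subclass overriding setup or teardown remembered to call super.
  def before_teardown
    raise 'setup was overridden without calling super' unless @base_setup_ran
    super
  end

  def after_teardown
    super
    raise 'teardown was overridden without calling super' unless @base_teardown_ran
  end
end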

Warm up your asset cache. Does the very first integration test fail transiently a lot? That is suspicious. GitLab had the same problem. Use a solution like theirs to warm up your asset cache before running integration tests. intransient_capybara does this for you. Wow!
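
The warm-up itself can be as simple as loading one page before the first test runs, so asset compilation happens outside any individual test's wait budget. A rough sketch under that assumption (the module name and visited path are illustrative):

# Call AssetCacheWarmup.warm! once from your base test class's setup.
module AssetCacheWarmup
  def self.warm!
    return if @warmed
    # The first request compiles the assets; waiting for the body ensures the
    # page (and its compiled assets) finished loading before any real test runs.
    Capybara.current_session.visit('/')
    Capybara.current_session.has_selector?('body')
    @warmed = true
  end
end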

Wait on requests before moving on. Even if a test doesn't end with a "hanging" request, it can leave AJAX requests in flight after it finishes, and these create two issues. First, because you are awesome and your teardown correctly cleans everything up, the mocks and fixtures those straggling requests need are already gone, so you get obscure failures like "missing mock for blahblah" in the next, completely unrelated test. Second, these requests tie up your test server's likely sole connection and produce even more obscure errors:

Capybara::Poltergeist::StatusFailError - Request to <test server URL> failed to reach server, check DNS and/or server status

Wait methods can be very helpful inside a test, but the best approach is to absolutely ensure you are done with all requests in between tests. Rack request blocker is THE way to do this. It is just awesome. Can't get enough of it. intransient_capybara includes rack request blocker.
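
The core idea behind rack request blocker is simple: a piece of Rack middleware counts in-flight requests to the test server, and teardown refuses to proceed until the counter drops back to zero. A condensed sketch of that idea (intransient_capybara bundles a fuller implementation that also holds off new requests while waiting):

require 'timeout'

# Rack middleware that tracks how many app requests are still in flight.
class RackRequestCounter
  @mutex = Mutex.new
  @in_flight = 0

  class << self
    def increment
      @mutex.synchronize { @in_flight += 1 }
    end

    def decrement
      @mutex.synchronize { @in_flight -= 1 }
    end

    def requests_pending?
      @mutex.synchronize { @in_flight > 0 }
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    self.class.increment
    @app.call(env)
  ensure
    self.class.decrement
  end
end

# In the base test class's teardown, after the browser has finished navigating.
# This is a bounded polling wait, not an arbitrary sleep inside a test.
def wait_for_server_requests!
  Timeout.timeout(Capybara.default_max_wait_time) do
    sleep 0.05 while RackRequestCounter.requests_pending?
  end
end

Insert the middleware into the app Capybara boots for tests (for example by wrapping Capybara.app) so every request from the browser passes through the counter.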

Do not have "hanging" requests at the end of a test. If a test ends with click_on X or visit ABC, that request is going to hang around, potentially into the next test, and interfere with it. Don't do this; it is pointless. If it is worth doing, it is worth asserting that it worked. If not, assert the ability to do it instead of doing it (check the presence of a link rather than clicking it, for example). This matters less with intransient_capybara, because it always waits for the previous test's requests to finish before moving on.

Save yourself a headache. Try hard to solve all transient test problems; you'll still get them from time to time. If you've got a tool that tells you which failures were transient, they don't need to fail your whole run for you to find and fix them. Most likely you re-run tests and move on anyway, so why re-run the whole suite when you can automatically retry failed tests? You can use something like Minitest::Retry for this. Retrying failed tests is far from ideal, but so is having to re-run tests when you're trying to ship something. intransient_capybara includes this and has options for configuring or disabling the behavior.
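
One possible configuration in test_helper.rb, retrying up to three times and keeping the retries visible in the output (the retry count and exception list are choices, not requirements):

require 'minitest/retry'

Minitest::Retry.use!(
  retry_count: 3,
  verbose: true,
  exceptions_to_retry: [Capybara::Poltergeist::DeadClient]
)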

Test performance

After stability, improving test performance is the next most important thing. There are a ton of easy mistakes that make tests slow.

Look at your helpers. You have helpers for your tests: they log you in, they assert you have common headers, and all sorts of other things. One of these is probably very slow and you haven't noticed. We were logging in through the UI in nearly every test, and stubbing that method call instead of actually logging in cut test time by anywhere from 40% to 90% across multiple projects.
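
As an illustration only (helper names, form labels, and the mocha-style stubbing are assumptions about your app, not code from the gem), the swap looked essentially like this:

# Before: drives the real login form through the browser in nearly every test.
def sign_in(user)
  visit '/login'
  fill_in 'Email', with: user.email
  fill_in 'Password', with: 'password'
  click_on 'Sign in'
end

# After: stub authentication at the application layer instead (assumes mocha).
def sign_in(user)
  ApplicationController.any_instance.stubs(:current_user).returns(user)
end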

Don't use assert !has_selector?. That waits for the full timeout (Capybara.default_max_wait_time) before it can pass. If you're expecting a selector, use assert has_selector?; if you aren't, use assert has_no_selector?. Learn more from Codeship.
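
Concretely, with a hypothetical .spinner element that should not be on the page:

# Waits the full default_max_wait_time for '.spinner' to appear, then negates the result:
assert !has_selector?('.spinner')

# Passes as soon as '.spinner' is absent (and waits for it to disappear if it is present):
assert has_no_selector?('.spinner')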

Avoid external resources. This is mostly about performance, but is also an important stability improvement. It can help you avoid this:

Capybara::Poltergeist::StatusFailError - Request to <test server URL> failed to reach server, check DNS and/or server status - Timed out with the following resources still waiting for <some external URL>

Almost everyone is susceptible to hitting external resources in tests. You might be loading jQuery from a CDN, or have JavaScript on your checkout page that queries a payment provider with a test key. These calls can time out or get rate limited, and they are properly tested in higher-level system integration tests (acceptance testing), so track them down and eliminate them. The code below can be included in the teardown method of your tests to help you debug your own network traffic; it is included by default in intransient_capybara.

def report_traffic
  if ENV.fetch('DEBUG_TEST_TRAFFIC', false) == 'true'
    puts "Downloaded #{page.driver.network_traffic.map(&:response_parts).flatten.map(&:body_size).compact.sum / 1.megabyte} megabytes"
    puts "Processed #{page.driver.network_traffic.size} network requests"

    grouped_urls = page.driver.network_traffic.map(&:url).group_by { |url| /\Ahttps?:\/\/(?:.*\.)?(?:localhost|127\.0\.0\.1)/.match(url).present? }
    internal_urls = grouped_urls[true]
    external_urls = grouped_urls[false]

    if internal_urls.present?
      puts "Local URLs queried: #{internal_urls}"
    end

    if external_urls.present?
      puts "External URLs queried: #{external_urls}"

      if ENV.fetch('DEBUG_TEST_TRAFFIC_RAISE_EXTERNAL', false) == 'true'
        raise "Queried external URLs!  This will be slow! #{external_urls}"
      end
    end
  end
end
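
Call it from teardown and flip the environment variables when you want the output, for example:

def teardown
  super
  report_traffic
end

# DEBUG_TEST_TRAFFIC=true rake test
# DEBUG_TEST_TRAFFIC=true DEBUG_TEST_TRAFFIC_RAISE_EXTERNAL=true rake test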

Don't repeat yourself. Lots of tests overlap; each test should test one thing. You don't need a copy/pasted opening sequence like visit X, click_on ABC at the start of every test. One test can visit X and click_on ABC, and all the others can skip straight to the page that comes after clicking on ABC. This saves a lot of time - probably 10-20 seconds every time such a pattern is factored out.

Don't revisit links. Prefer asserting that a link is present; clicking it costs a full page load. As with the last point, let one other test assert that the destination page loads correctly, and pay that visit only once, over there. Use assert has_link? instead of click_on.
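
For example (link text is illustrative):

# Pays for a full page load just to end the test:
click_on 'View order history'

# Asserts the same affordance without paying for the navigation:
assert has_link?('View order history')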

Don't use sleep. Sleeps are either too long or too short. Writing sleep 5 might make you look cool to your friends, but it is damaging to your health and should be avoided. Don't get peer pressured into putting sleeps in your tests. Use assert_text to make Capybara wait for the page to load, or write a simple helper method like wait_for_page_load!:

  def wait_for_page_load!
    # current_path forces a round trip to the driver inside Capybara's
    # synchronize block, which retries (up to default_max_wait_time) if the
    # driver raises while the page is still loading.
    page.document.synchronize do
      current_path
      true
    end
  end

You can wait for AJAX too; thoughtbot solved this. intransient_capybara includes these methods, calls them in teardown for you, and makes them available to use inside your own tests.
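
thoughtbot's helper polls jQuery's in-flight request counter, roughly like this (it assumes your pages use jQuery; the gem's bundled version may differ slightly):

require 'timeout'

def wait_for_ajax
  Timeout.timeout(Capybara.default_max_wait_time) do
    loop until finished_all_ajax_requests?
  end
end

def finished_all_ajax_requests?
  page.evaluate_script('jQuery.active').zero?
end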

Avoid visit in your setup method. If you put a visit in a setup method in a file with a bunch of tests, you've done something dangerous. One of our tests was visiting 3 pages in setup before visiting more pages in the test itself. Think through which pages each test actually needs to visit so you can minimize this. Every visit call costs 2-10 seconds, and it is easy for pointless visits to go unnoticed.

Delete all your skipped tests. We had so many that they affected performance, and there was no point to them. Fix (or actually write) these stubbed tests today, or just delete them.

Parallelize! By breaking your tests into suites, you can run them in parallel much more easily: have parallel test harnesses each run SUITE=blah rake test. The matrix configuration in Jenkins makes this a lot easier. If you use something hosted like CircleCI, it can often run things in parallel even without explicit suites (you specify directories for each parallel container to execute). Try to balance the tests across containers to get the fastest times. Our acceptance tests were almost twice as fast after less than an hour of parallelization work, and optimized parallelization with only 3 containers cut our most important project's test time by more than half (again in less than a dev day - this is homerun level stuff).
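
One way to wire up the SUITE=blah entry point is a small Rake task that only globs one directory of tests per container (the directory layout and task name here are illustrative):

require 'rake/testtask'

# SUITE=checkout rake test:suite runs only test/integration/checkout/**/*_test.rb.
Rake::TestTask.new('test:suite') do |t|
  t.libs << 'test'
  t.pattern = "test/integration/#{ENV.fetch('SUITE', '**')}/**/*_test.rb"
end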

Find your slowest tests. Once you've dealt with systemic problems, you're left optimizing test by test, so you need to find or create a tool that monitors performance over many test runs and highlights the slowest tests, letting you tackle the problems in a targeted way. There are gems that help you output test performance, such as minitest-ci and minitest-perf.

Results

We're not perfect yet, but these tips and intransient_capybara have reduced the rate of transient failures in our tests from a whopping 40-50% of all test runs to virtually none (<1%). It only takes one transient failure to fail the whole run, so things have to be really stable before runs start passing consistently. Build time has gone from more than an hour to about 16 minutes in CircleCI (and that is not the best it can be). Acceptance tests used to take 15 minutes with around a 25% transient failure rate in pre-production environments, and 10 minutes with a low transient rate in production. They now take 2.5 minutes in pre-production and 1.5 minutes in production, with a test-caused transient rate of near zero (the transients we see today are due to pre-production environmental issues, not the tests or their framework).