Flaky

A plugin for nosetests or py.test that automatically reruns flaky tests.

pip install flaky

Available on GitHub: https://github.com/box/flaky

Why?

Ever been tempted to do this?

'Fixed' Test

There are lots of wrong reasons to do something like this, but a test that fails only on occasion can really make you want to.

Flaky Tests

We call tests flaky when they:

  • Only fail sometimes
  • Have a failure mode that may be very difficult to fix
  • Often pass when rerun

These tests put us in a bad position, and there's not a perfect fix.

The mathematics of flaky tests

Given 50 flaky tests (99% pass rate), how often will the test run fail?

What about with 100 flaky tests?

The mathematics of flaky tests

Given 50 flaky tests (99% pass rate), how often will the test run fail? 40%

\begin{equation*} 1 - \left(1 - 0.01\right)^{ 50} = 0.4 \end{equation*}

What about with 100 flaky tests? 64%

\begin{equation*} 1 - \left(1 - 0.01\right)^{ 100} = 0.64 \end{equation*}

The mathematics of rerun flaky tests

If we retry those 50 flaky tests, however, the test run fails just 0.5% of the time.

\begin{equation*} 1 - \left(1 - 0.01^2\right)^{ 50} = 0.0005 \end{equation*}

What about with 100 flaky tests? 1%

\begin{equation*} 1 - \left(1 - 0.01^2\right)^{ 100} = 0.001 \end{equation*}
In [1]:
%%file flaky_tests.py

def test_list_length(my_list=[]):
    my_list.append(0)
    assert len(my_list) > 1 # Fails the first time it's run. Passes on subsequent runs.
Overwriting flaky_tests.py
In [2]:
!nosetests flaky_tests.py
F
======================================================================
FAIL: flaky_tests.test_list_length
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/jmeadows/Documents/Projects/pycon-flaky/flaky_tests.py", line 4, in test_list_length
    assert len(my_list) > 1 # Fails the first time it's run. Passes on subsequent runs.
AssertionError

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)

If this test can't be fixed, you have a few options.

This test would be pretty easy to fix, but many of us have encountered tests that rely on external components that might not have a solution that can be reached in the amount of time available.

In [3]:
# TODO: fix this test next month when I have time
#def test_list_length(my_list=[]):
    #my_list.append(0)
    #assert len(my_list) > 1

Comment out the flaky test

Commenting out the flaky test makes sure it won't fail, but it leaves your code uncovered and defeats your tooling. No good!

In [4]:
from unittest import skip

@skip("Fix this text next month when I have time.")
def test_list_length(my_list=[]):
    my_list.append(0)
    assert len(my_list) > 1

Skip the flaky test

Skipping the flaky test also makes sure it won't fail, but it's not really much better than commenting it out. These skipped tests are rarely fixed.

What if we could rerun tests automatically when they fail?

Introducing flaky, a plugin for nosetests or py.test that automatically reruns flaky tests.

In [5]:
%%file flaky_tests.py

from flaky import flaky

@flaky
def test_list_length(my_list=[]):
    my_list.append(0)
    assert len(my_list) > 1
Overwriting flaky_tests.py
In [6]:
!nosetests --with-flaky flaky_tests.py
===Flaky Test Report===

test_list_length failed (1 runs remaining out of 2).
	<type 'exceptions.AssertionError'>
	
	<traceback object at 0x1052b4cb0>
test_list_length passed 1 out of the required 1 times. Success!

===End Flaky Test Report===
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

Flaky

A plugin for nosetests or py.test that automatically reruns flaky tests.

Just mark your flaky tests with @flaky, and run them with py.test or nosetests --with-flaky.

Customizing Flaky

Need more reruns? No problem - just tell flaky how many times it can run the test:

@flaky(max_runs=5)
def test_list_length(my_list=[]):
    my_list.append(0)
    assert len(my_list) > 1

Prefer to make sure the test can pass more than once? Tell flaky how many times it must pass:

@flaky(min_passes=2)
def test_list_length(my_list=[]):
    my_list.append(0)
    assert len(my_list) > 1

Customizing Flaky

Have a lot of flaky tests? Flaky allows you to mark an entire test class @flaky. Still not enough? Simply pass the command line argument --force-flaky to the test runner and all tests will be marked flaky.

The flaky report

The end of each test run includes some information about your flaky tests.

This is important because a failed test won't be reported to the test runner if the rerun passes.

However, if you don't want to see this info, you can pass --no-flaky-report to your test runner.

Examples

In [7]:
%%file flaky_nosetests.py

from flaky import flaky
from unittest import expectedFailure, TestCase

class ExampleTests(TestCase):
    _threshold = -1

    def test_non_flaky_thing(self):
        """Flaky will not interact with this test"""
        pass

    @expectedFailure
    def test_non_flaky_failing_thing(self):
        """Flaky will also not interact with this test"""
        self.assertEqual(0, 1)

    @flaky(3, 2)
    def test_flaky_thing_that_fails_then_succeeds(self):
        """
        Flaky will run this test 3 times.
        It will fail once and then succeed twice.
        """
        self._threshold += 1
        if self._threshold < 1:
            raise Exception("Threshold is not high enough: {0} vs {1}.".format(
                self._threshold, 1),
            )

    @flaky(3, 2)
    def test_flaky_thing_that_succeeds_then_fails_then_succeeds(self):
        """
        Flaky will run this test 3 times.
        It will succeed once, fail once, and then succeed one more time.
        """
        self._threshold += 1
        if self._threshold == 1:
            self.assertEqual(0, 1)

    @flaky(2, 2)
    def test_flaky_thing_that_always_passes(self):
        """Flaky will run this test twice.  Both will succeed."""
        pass
Overwriting flaky_nosetests.py
In [8]:
!nosetests --with-flaky flaky_nosetests.py
../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py:340: RuntimeWarning: TestResult has no addExpectedFailure method, reporting as passes
  RuntimeWarning)
..
===Flaky Test Report===

test_flaky_thing_that_always_passes passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_always_passes passed 2 out of the required 2 times. Success!
test_flaky_thing_that_fails_then_succeeds failed (2 runs remaining out of 3).
	<type 'exceptions.Exception'>
	Threshold is not high enough: 0 vs 1.
	<traceback object at 0x1052c43b0>
test_flaky_thing_that_fails_then_succeeds passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_fails_then_succeeds passed 2 out of the required 2 times. Success!
test_flaky_thing_that_succeeds_then_fails_then_succeeds passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_succeeds_then_fails_then_succeeds failed (1 runs remaining out of 3).
	<type 'exceptions.AssertionError'>
	0 != 1
	<traceback object at 0x1052c4638>
test_flaky_thing_that_succeeds_then_fails_then_succeeds passed 2 out of the required 2 times. Success!

===End Flaky Test Report===
----------------------------------------------------------------------
Ran 5 tests in 0.002s

OK
In [9]:
%%file flaky_nosetests_2.py


from flaky import flaky
from unittest import TestCase

@flaky
class ExampleFlakyTests(TestCase):
    _threshold = -1

    def test_flaky_thing_that_fails_then_succeeds(self):
        """
        Flaky will run this test twice.
        It will fail once and then succeed.
        """
        self._threshold += 1
        if self._threshold < 1:
            raise Exception("Threshold is not high enough: {0} vs {1}.".format(
                self._threshold, 1),
            )


@flaky
def test_flaky_function(param=[]):
    # pylint:disable=dangerous-default-value
    param.append(None)
    assert len(param) == 1
Overwriting flaky_nosetests_2.py
In [10]:
!nosetests --with-flaky flaky_nosetests_2.py
.
===Flaky Test Report===

test_flaky_thing_that_fails_then_succeeds failed (1 runs remaining out of 2).
	<type 'exceptions.Exception'>
	Threshold is not high enough: 0 vs 1.
	<traceback object at 0x1049b7fc8>
test_flaky_thing_that_fails_then_succeeds passed 1 out of the required 1 times. Success!
test_flaky_function passed 1 out of the required 1 times. Success!

===End Flaky Test Report===
----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK
In [11]:
%%file flaky_pytest.py

from flaky import flaky
from unittest import TestCase


@flaky
def test_something_flaky(dummy_list=[]):
    # pylint:disable=dangerous-default-value
    dummy_list.append(0)
    assert len(dummy_list) > 1


@flaky
class TestExampleFlakyTests(object):
    _threshold = -1

    def test_flaky_thing_that_fails_then_succeeds(self):
        """
        Flaky will run this test twice.
        It will fail once and then succeed.
        """
        self._threshold += 1
        assert self._threshold >= 1


@flaky
class TestExampleFlakyTestCase(TestCase):
    _threshold = -1

    def test_flaky_thing_that_fails_then_succeeds(self):
        """
        Flaky will run this test twice.
        It will fail once and then succeed.
        """
        self._threshold += 1
        assert self._threshold >= 1
Overwriting flaky_pytest.py
In [12]:
!py.test flaky_pytest.py
============================= test session starts ==============================
platform darwin -- Python 2.7.9 -- py-1.4.26 -- pytest-2.7.0
rootdir: /Users/jmeadows/Documents/Projects/pycon-flaky, inifile: 
plugins: flaky
collected 3 items 

flaky_pytest.py ...
===Flaky Test Report===

test_something_flaky failed (1 runs remaining out of 2).
	<class '_pytest.assertion.reinterpret.AssertionError'>
	assert 1 > 1
 +  where 1 = len([0])
	[<TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:286>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:131>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:393>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:113>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:138>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:123>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/runner.py:90>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:1174>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:204>, <TracebackEntry /Users/jmeadows/Documents/Projects/pycon-flaky/flaky_pytest.py:10>]
test_something_flaky passed 1 out of the required 1 times. Success!
test_flaky_thing_that_fails_then_succeeds failed (1 runs remaining out of 2).
	<class '_pytest.assertion.reinterpret.AssertionError'>
	assert 0 >= 1
 +  where 0 = <flaky_pytest.TestExampleFlakyTests object at 0x104ba3dd0>._threshold
	[<TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:286>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:131>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:393>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:113>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:138>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:123>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/runner.py:90>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:1174>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:204>, <TracebackEntry /Users/jmeadows/Documents/Projects/pycon-flaky/flaky_pytest.py:23>]
test_flaky_thing_that_fails_then_succeeds passed 1 out of the required 1 times. Success!
test_flaky_thing_that_fails_then_succeeds failed (1 runs remaining out of 2).
	<class '_pytest.assertion.reinterpret.AssertionError'>
	assert 0 >= 1
 +  where 0 = <flaky_pytest.TestExampleFlakyTestCase testMethod=test_flaky_thing_that_fails_then_succeeds>._threshold
	[<TracebackEntry /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/unittest/case.py:329>, <TracebackEntry /Users/jmeadows/Documents/Projects/pycon-flaky/flaky_pytest.py:36>]
test_flaky_thing_that_fails_then_succeeds passed 1 out of the required 1 times. Success!

===End Flaky Test Report===

=========================== 3 passed in 0.03 seconds ===========================
In [13]:
%%file flaky_pytest_2.py

from flaky import flaky
import pytest
from unittest import TestCase

class TestExample(object):
    _threshold = -1

    def test_non_flaky_thing(self):
        """Flaky will not interact with this test"""
        pass

    @pytest.mark.xfail
    def test_non_flaky_failing_thing(self):
        """Flaky will also not interact with this test"""
        assert self == 1

    @flaky(3, 2)
    def test_flaky_thing_that_fails_then_succeeds(self):
        """
        Flaky will run this test 3 times.
        It will fail once and then succeed twice.
        """
        self._threshold += 1
        assert self._threshold >= 1

    @flaky(3, 2)
    def test_flaky_thing_that_succeeds_then_fails_then_succeeds(self):
        """
        Flaky will run this test 3 times.
        It will succeed once, fail once, and then succeed one more time.
        """
        self._threshold += 1
        assert self._threshold != 1

    @flaky(2, 2)
    def test_flaky_thing_that_always_passes(self):
        """Flaky will run this test twice.  Both will succeed."""
        pass
Overwriting flaky_pytest_2.py
In [14]:
!py.test flaky_pytest_2.py
============================= test session starts ==============================
platform darwin -- Python 2.7.9 -- py-1.4.26 -- pytest-2.7.0
rootdir: /Users/jmeadows/Documents/Projects/pycon-flaky, inifile: 
plugins: flaky
collected 5 items 

flaky_pytest_2.py .x...
===Flaky Test Report===

test_flaky_thing_that_fails_then_succeeds failed (2 runs remaining out of 3).
	<class '_pytest.assertion.reinterpret.AssertionError'>
	assert 0 >= 1
 +  where 0 = <flaky_pytest_2.TestExample object at 0x10539f850>._threshold
	[<TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:286>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:131>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:393>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:113>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:138>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:123>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/runner.py:90>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:1174>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:204>, <TracebackEntry /Users/jmeadows/Documents/Projects/pycon-flaky/flaky_pytest_2.py:25>]
test_flaky_thing_that_fails_then_succeeds passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_fails_then_succeeds passed 2 out of the required 2 times. Success!
test_flaky_thing_that_succeeds_then_fails_then_succeeds passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_succeeds_then_fails_then_succeeds failed (1 runs remaining out of 3).
	<class '_pytest.assertion.reinterpret.AssertionError'>
	assert 1 != 1
 +  where 1 = <flaky_pytest_2.TestExample object at 0x10539fed0>._threshold
	[<TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/flaky/flaky_pytest_plugin.py:286>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:1174>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:521>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:528>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/core.py:394>, <TracebackEntry /Users/jmeadows/.virtualenvs/pycon-flaky/lib/python2.7/site-packages/_pytest/python.py:204>, <TracebackEntry /Users/jmeadows/Documents/Projects/pycon-flaky/flaky_pytest_2.py:34>]
test_flaky_thing_that_succeeds_then_fails_then_succeeds passed 2 out of the required 2 times. Success!
test_flaky_thing_that_always_passes passed 1 out of the required 2 times. Running test again until it passes 2 times.
test_flaky_thing_that_always_passes passed 2 out of the required 2 times. Success!

===End Flaky Test Report===

===================== 4 passed, 1 xfailed in 0.03 seconds ======================

Some advice

Flaky, as great as it is, should be used with caution!

Many failing tests are trying to tell you something! Don't just silence them with @flaky.

When applied judiciously, flaky can be a great time-saving tool, but it can also be misused, so apply your best judgment.

Summary

  • Mark your flaky tests @flaky instead of removing or @skipping them.
  • pip install flaky to get started.
  • Run py.test or nosetests --with-flaky to get automatic retrying of failed tests.

This notebook is available for download: http://opensource.box.com/flaky/Flaky.ipynb

Its content is licensed under the Apache License 2.0.

To reproduce the presentation, execute the following commands:

curl -O http://opensource.box.com/flaky/Flaky.ipynb -O http://opensource.box.com/flaky/fixed_test.jpeg
pip install nose pytest flaky ipython[notebook]
ipython notebook Flaky.ipynb