Mutation testing in Python

Łukasz Chojnacki | 2022-09-21

Do you write unit tests for your code? I hope so. Have you ever wondered how good your unit tests are? If so, what techniques did you use to check this?

Good unit tests: the classic approach

There are several characteristics that good unit tests should have:

they should run automatically - for example on a CI runner
they should be isolated - should not affect each other
every test should check exactly one thing
they should be readable and easy to understand
they should have high code coverage

and the question is...

Is that enough?

Well, yes, but actually no

Take a look at the code snippet below:

def fibonacci(n: int) -> int:
    """
    Get the n-th Fibonacci number
    """
    a, b = 0, 1
    for _ in range(n-1):
        a, b = b, a + b
    return b  # BUG! Should be `a`!

This code contains a bug that makes the function return incorrect results. Let's write a test for our function:

def test_fibonacci():
    fibonacci(10)

Can you already see why test coverage is not always the best metric? Let's run our test with pytest-cov:

> pytest --cov --cov-branch

Name          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------
__init__.py       0      0      0      0   100%
fib.py            5      0      2      0   100%
test_fib.py       3      0      0      0   100%
-----------------------------------------------
TOTAL             8      0      2      0   100%

Great, we have 100% coverage!

...but our test doesn't check anything. There isn't a single assert statement; it just calls the function. We have a bug, but none of our tests detected it. If only there were a way to catch it...

Oh, wait...

There is!

Mutation testing

The idea behind mutation testing is to see how your test suite will behave after you make a change to the tested code. If the test suite is good, at least one of the tests should fail after you make a change. In other words: it verifies that your tests check the behavior of the code, not just run it.
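
The idea can be sketched by hand before reaching for any tool. The function `is_adult`, its mutant and the two "tests" below are hypothetical names made up for this illustration; they are not part of the example used later in this post.

```python
def is_adult(age: int) -> bool:
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: `>=` changed to `>`."""
    return age > 18


def weak_test(func) -> bool:
    """Calls the function but checks nothing, so it can never fail."""
    func(18)
    return True


def strong_test(func) -> bool:
    """Checks the boundary value, where the two versions differ."""
    return func(18) is True and func(17) is False


# The weak test passes for both versions: the mutant survives.
print(weak_test(is_adult), weak_test(is_adult_mutant))
# The strong test passes only for the original: the mutant is killed.
print(strong_test(is_adult), strong_test(is_adult_mutant))
```

A mutation testing tool automates exactly this loop: generate a mutant, run the tests, and report whether any test failed.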

In this post, I will demonstrate how mutation testing works using the mutmut tool, but it is not the only option available (see also: MutPy, mutatest).

So, let's get to it!

First, let's install the tool:

> pip install mutmut

What does mutmut do to your code? It makes subtle modifications that change the behavior of the code. Available mutations include:

changing mathematical operators, such as + to -, > to >=
changing logical operators: and to or and the other way around
swapping keywords, such as True/False, in/not in
swapping copy and deepcopy
swapping "" with None
changing number values by adding 1
changing string values by adding "xx"
changing function argument names by adding "xx"
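
To get a feeling for what such a mutation operator looks like, here is a minimal sketch built on Python's standard `ast` module. It is only an illustration of the "+ to -" mutation and is not meant to reflect how mutmut is implemented; the `SwapAddSub` class and the `add` function are names invented for this example. Note that it swaps every `+` at once, whereas a real tool produces one mutant per change.

```python
import ast


class SwapAddSub(ast.NodeTransformer):
    """Replace every binary `+` with `-` (one simple mutation operator)."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # handle nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


source = "def add(x, y):\n    return x + y\n"

tree = SwapAddSub().visit(ast.parse(source))
ast.fix_missing_locations(tree)
mutated_source = ast.unparse(tree)  # ast.unparse requires Python 3.9+
print(mutated_source)

# Execute both versions in separate namespaces and compare their behavior.
original_ns, mutant_ns = {}, {}
exec(source, original_ns)
exec(mutated_source, mutant_ns)
print(original_ns["add"](2, 3), mutant_ns["add"](2, 3))
```

A test asserting `add(2, 3) == 5` would fail against the mutated version, which is what "killing" the mutant means.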

Let's try it out on our previous example:

> mutmut run --paths-to-mutate=.

- Mutation testing starting -

These are the steps:
1. A full test suite run will be made to make sure we
   can run the tests successfully and we know how long
   it takes (to detect infinite loops for example)
2. Mutants will be generated and checked

Results are stored in .mutmut-cache.
Print found mutants with `mutmut results`.

Legend for output:
🎉 Killed mutants.   The goal is for everything to end up in this bucket.
⏰ Timeout.          Test suite took 10 times as long as the baseline so were killed.
🤔 Suspicious.       Tests took a long time, but not long enough to be fatal.
🙁 Survived.         This means your tests need to be expanded.
🔇 Skipped.          Skipped.

1. Using cached time for baseline tests, to run baseline again delete the cache file

2. Checking mutants
⠏ 8/8  🎉 2  ⏰ 0  🤔 0  🙁 6  🔇 0

The output from the above command indicates that mutmut generated 8 mutants. Two of them were detected by our tests and "killed", but six survived.
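
These counts are often condensed into a single ratio of killed mutants to all mutants, commonly called a mutation score. The helper below is a hypothetical sketch for illustration; mutmut itself only prints the counts shown above.

```python
def kill_ratio(killed: int, total: int) -> float:
    """Fraction of generated mutants that the test suite detected."""
    if total == 0:
        raise ValueError("no mutants were generated")
    return killed / total


# Numbers from the run above: 8 mutants, 2 killed, 6 survived.
ratio = kill_ratio(killed=2, total=8)
print(f"{ratio:.0%}")  # 25%
```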

To see the details of the mutants that survived, we can take a look at the results summary:

> mutmut results

To apply a mutant on disk:
    mutmut apply <id>

To show a mutant:
    mutmut show <id>


Survived 🙁 (6)

---- ./fib.py (5) ----

1-2, 4-6

---- ./test_fib.py (1) ----

8

Now we can see that the mutants with IDs 1, 2, 4, 5, 6 and 8 survived. Using a mutant's ID, we can display its details:

> mutmut show 1

--- ./fib.py
+++ ./fib.py
@@ -1,6 +1,6 @@
 def fibonacci(n: int) -> int:
     """Get the n-th Fibonacci number"""
-    a, b = 0, 1
+    a, b = 1, 1
     for _ in range(n-1):
         a, b = b, a + b
     return b  # BUG! should be a!

The presentation is very developer-friendly: it looks like a diff view from git. We can see that mutmut has changed the initial value of the variable a from 0 to 1. What can we do to detect this change? I would start by adding a proper assert statement to the test_fibonacci function:

def test_fibonacci():
    assert fibonacci(10) == 34

Adding the correct assert will make our tests fail - remember the comment with the error note? :)
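
To see why, here is the buggy version again, renamed to `fibonacci_buggy` only for this sketch, evaluated for the argument used in the assert:

```python
def fibonacci_buggy(n: int) -> int:
    """The original version, which returns `b` instead of `a`."""
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b


result = fibonacci_buggy(10)
print(result)        # 55
print(result == 34)  # False, so `assert fibonacci(10) == 34` fails
```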

So let's fix the actual function:

def fibonacci(n: int) -> int:
    """Get the n-th Fibonacci number"""
    a, b = 0, 1
    for _ in range(n-1):
        a, b = b, a + b
    return a

The coverage hasn't changed:

> pytest --cov --cov-branch

Name          Stmts   Miss Branch BrPart  Cover
-----------------------------------------------
__init__.py       0      0      0      0   100%
fib.py            5      0      2      0   100%
test_fib.py       3      0      0      0   100%
-----------------------------------------------
TOTAL             8      0      2      0   100%

But we've killed almost all of our mutants:

> mutmut run --paths-to-mutate=. 

[...]
2. Checking mutants
⠇ 10/10  🎉 9  ⏰ 0  🤔 0  🙁 1  🔇 0

So let's check out what's left to be tested:

> mutmut results

To apply a mutant on disk:
    mutmut apply <id>

To show a mutant:
    mutmut show <id>


Survived 🙁 (1)

---- ./fib.py (1) ----

6

> mutmut show 6

--- ./fib.py
+++ ./fib.py
@@ -2,6 +2,6 @@
     """Get the n-th Fibonacci number, starting with 0 and 1."""
     a, b = 0, 1
     for _ in range(n-1):
-        a, b = b, a + b
+        a, b = b, a - b
     return a

It turns out that for an even argument, it does not matter whether there is a + or - sign between the variables a and b. So let's add another assert, this time for an odd argument:

def test_fibonacci():
    assert fibonacci(3) == 1
    assert fibonacci(10) == 34
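
The parity claim is easy to check directly. The sketch below compares the fixed function with a hand-written copy of mutant 6 (named `fibonacci_minus` here purely for illustration) for the first ten arguments:

```python
def fibonacci(n: int) -> int:
    """Get the n-th Fibonacci number"""
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def fibonacci_minus(n: int) -> int:
    """Hand-written copy of mutant 6: `a + b` replaced with `a - b`."""
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a - b
    return a


# Print n, the correct value, the mutant's value, and whether they agree.
for n in range(1, 11):
    same = fibonacci(n) == fibonacci_minus(n)
    print(n, fibonacci(n), fibonacci_minus(n), same)
```

The two versions agree for every even argument (and for n = 1, where both return 0), but for odd arguments from 3 upwards the mutant returns the negated value, for example -1 instead of 1 for n = 3. That is why an assert for an odd argument such as 3 kills this mutant.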

And run mutmut once again:

> mutmut run --paths-to-mutate=. 

[...]
2. Checking mutants
⠏ 13/13  🎉 13  ⏰ 0  🤔 0  🙁 0  🔇 0

We did it! No mutant survived the meeting with our test suite.

Additional configuration

You might have noticed that the number of mutants generated by mutmut varied over time. This is because the test file was also mutated: as we added more "mutable" code to it (== operators and numbers), the number of potential mutants to kill grew as well.

To make the mutations more meaningful, we can configure the package accordingly. The available options are listed by the mutmut run --help command, and we can use all of them in the setup.cfg or pyproject.toml file. An example configuration in your pyproject.toml might look like this:

[tool.mutmut]
paths_to_mutate = "src/"
tests_dir = "tests/"

After a slight modification of the project structure:

.
├── __init__.py
├── pyproject.toml
├── src
│   ├── fib.py
│   └── __init__.py
└── tests
    ├── __init__.py
    └── test_fib.py

we can drop the --paths-to-mutate parameter from the mutmut run call, and the number of mutants produced decreases (the test file is no longer mutated).

> mutmut run

[...]
2. Checking mutants
⠦ 7/7  🎉 7  ⏰ 0  🤔 0  🙁 0  🔇 0

Final thoughts

The purpose of this post was to introduce the idea of mutation tests and to trigger a reflection on whether the tests you write are any good. The example shown here was written in Python, but other languages also have corresponding tools, such as Stryker (JavaScript) or Mutant (Ruby). If you have any questions about this post or would like to discuss mutation testing, feel free to contact me: lukasz.chojnacki [at] deployed.pl.

May good tests always protect you from bad changes!
