I've always had a hard time describing my approach to automated testing; it tends to be much easier to explain what it isn't. This is usually a hint that I don't fully understand my own principles, but that doesn't stop me from having opinions!

There is a rough continuum of testing philosophy, with "what's testing?" on one end and test-driven development (TDD) on the other. You can think of TDD as the "agile school", since its intellectual groundwork was laid by prominent figures in the agile community: people like Kent Beck, Martin Fowler, and Bob Martin.

In this school, testability is paramount, because making code more testable incidentally improves every other measure of goodness. Testable code is loosely coupled and highly cohesive. It has established, verified contracts between its components. It's correct, because it has good test coverage. Its functions are short, and short code is readable code. It's documented, because the tests themselves are an alternate description of the code's behaviour, one that is verified each time your test suite passes.

Advocates may disagree on minor details; the key idea is that code quality goes up with testability. TDD's popularity has been pretty resilient over the years, perhaps due to the widespread adoption of agile practices (or simulacra named after agile practices) in engineering organizations.

Despite its success, I never really bought TDD's claims. The basic philosophy gets the causal relationship backwards: good code is testable, not the other way around.

For a long time, I couldn't understand how people used TDD and got good results until I realized that there is an implicit assumption of mastery that doesn't map well to a lot of my projects. It doesn't tend to describe the beginners that we wean on these principles, either.

TDD's strongest proponents tend to work in domains (consultancy, web projects) where a good structure for the solution is already known. For this type of work, the approach starts to make more sense. While it's tempting to trivialize "solving a problem you've solved before", this describes a lot of meaningful work.

A lot of engineering, however, isn't that. It's more exploratory. How do you write tests for a system that you don't know how to write? Before you've figured out what the system should do? The start of a project is the time when your understanding of the solution is at its worst.

I'm reminded of Kernighan's famous quote about debugging:

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

You can't write tests with a better understanding of the system than you had when you built it in the first place. If you have very little understanding, then your tests will be worth very little.

This is why I'm really interested in things like property-based testing and fuzz testing, which fill the gaps in your own knowledge by searching for the tests you didn't know to write, but am otherwise skeptical about a lot of other industry practices. When you bring up problems with those practices, their advocates always seem to find a way to blame systematic failures on the user.
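As a minimal sketch of what that search looks like, here's a property-based test using the Hypothesis library; the encode/decode pair is a hypothetical stand-in for whatever system you're actually testing:

```python
from hypothesis import given, strategies as st

# Stand-in system under test: a round-trip encode/decode pair.
def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

# Instead of hand-picking examples, we state a property that should
# hold for any input, and let the framework search for inputs that
# break it -- the tests we didn't know to write.
@given(st.text())
def test_round_trip(s):
    assert decode(encode(s)) == s
```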

One such practice that has been on my mind is mocking.

Like Bloom filters, mocks look powerful and elegant, but they are less generally useful than they first appear.

A typical introduction to mocking involves testing code that makes calls to a database. The standard advice is to mock out those database queries so that they return the expected results.
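Here's a sketch of what that advice looks like in practice, using Python's unittest.mock; the function and the query are hypothetical:

```python
from unittest.mock import MagicMock

# Hypothetical code under test: look a user's name up through a
# database connection object.
def get_user_name(db, user_id):
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None

def test_get_user_name():
    db = MagicMock()
    # The mock returns whatever we already believe the real query
    # would return for this input.
    db.execute.return_value.fetchone.return_value = ("alice",)
    assert get_user_name(db, 1) == "alice"
```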

What are these mocks other than a reflection of the limits of your own understanding? How are they any less fallible than the assumptions of correctness you relied on when building the system you're testing?

It's the same problem as with TDD: writing a good mock requires knowledge of the system being mocked, and you get that knowledge either through experience or through integration testing. This should be a hint that, if you mock at all, you should do it at a much higher level.

Mocking libraries and mock generators typically include features that encourage a kind of coupling that makes it difficult to change code without breaking tests. Are you counting the number of calls made to your mock? What happens if you add memoisation? You've probably caused failures across the entire suite. The most likely "fix" is to change the call count assertions to whatever the new values are and move on.
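Here's a contrived sketch of that failure mode; the fetcher and the load_settings function are hypothetical:

```python
from unittest.mock import MagicMock

def load_settings(fetcher, names):
    # Current implementation: one fetch per requested name, including
    # duplicates.
    return [fetcher.fetch(name) for name in names]

def test_load_settings_fetches_each_name():
    fetcher = MagicMock()
    fetcher.fetch.return_value = "value"
    load_settings(fetcher, ["a", "b", "a"])
    # This assertion pins an implementation detail. Memoise the fetch
    # (so "a" is only fetched once) and the test fails even though the
    # observable behaviour hasn't changed.
    assert fetcher.fetch.call_count == 3
```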

Most of these uses of mocks are an indication that your library code is insufficiently observable. It's common to see mocks used to check whether "invisible" behaviour is occurring: whether a more efficient branch is taken when expected, or whether a new caching layer is actually being used. Why use a mock to count things like cache hits and misses? That's important telemetry your caching library should be exporting anyway, so you can keep track of them. The better solution is to make these behaviours visible through the normal exported APIs, and then use those in your tests.
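A sketch of the observable alternative, with a hypothetical cache that exports its own hit/miss counters:

```python
class Cache:
    """Hypothetical cache that exports its own hit/miss telemetry."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        if key in self._data:
            self.hits += 1
        else:
            self.misses += 1
            self._data[key] = compute(key)
        return self._data[key]

def test_second_lookup_is_served_from_cache():
    cache = Cache()
    cache.get("answer", lambda key: 42)
    cache.get("answer", lambda key: 42)
    # No mock needed: the counters the library exports anyway tell us
    # whether the caching layer is actually being used.
    assert (cache.hits, cache.misses) == (1, 1)
```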

If mocking and dependency injection were truly useless, they'd never have caught on. Of course, there are times when they are appropriate.

Since the main risk is your ignorance of a component's behaviour, you can mitigate most of it by mocking only your own APIs. You're the one who decides what they should do, and you have far more control over when and how they will change. This is why I recommend against mocking a database: build a data access layer for yourself and mock that instead. You can then verify that contract with integration tests, and your high-level code gets a stable, fast interface to test against.
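A sketch of that shape; the UserStore interface and the greeting function are hypothetical:

```python
from unittest.mock import MagicMock

class UserStore:
    """Hypothetical data access layer that we own. The real
    implementation talks to the database, and integration tests
    verify that contract separately."""

    def get_name(self, user_id):
        raise NotImplementedError

def greeting(store, user_id):
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"

def test_greeting_uses_the_store():
    store = MagicMock(spec=UserStore)
    store.get_name.return_value = "alice"
    # We're mocking an interface we defined ourselves, not the
    # database: we decide what it should do, and we control when and
    # how it changes.
    assert greeting(store, 1) == "Hello, alice!"
```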

There are other classic cases where mocks can be useful: controlling time, inducing rare error conditions, and so on. These have their own caveats, but what they tend to share is that the behaviour being mocked is relatively simple, even if the underlying implementation is not.
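Controlling time, for example, can be as simple as injecting a clock; the is_expired function and the fixed timestamps here are hypothetical:

```python
import time

def is_expired(issued_at, ttl_seconds, now=time.time):
    # The clock is injected so tests can substitute a fixed "now"
    # instead of sleeping or patching the time module. The behaviour
    # being faked ("what time is it?") is simple, even though a real
    # clock is not.
    return now() - issued_at > ttl_seconds

def test_expired_token():
    assert is_expired(issued_at=1000, ttl_seconds=60, now=lambda: 2000)

def test_fresh_token():
    assert not is_expired(issued_at=1000, ttl_seconds=60, now=lambda: 1030)
```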

Tests should strive to be simple and obvious, and mocks are neither. Auto-generated mocks are especially complicated. Reserve them for testing behaviour which is simple to describe but complicated to replicate.

Apr 22 2022