Welcome to Software Development on Codidact!
This is more or less equivalent to a long-used testing technique that I've commonly heard referred to as gold filing. The benefit is that it is very cheap to do, and it can catch even minor changes that you may not have covered with a typical unit test.

Two pitfalls I'll focus on are:

* Gold files tend to significantly *over*specify the output.
* Creating a gold file is not always trivial.

You've mentioned some other downsides, such as often being coarser-grained and slightly closer to integration tests. Gold file test failures can also be less useful for diagnosing a problem, e.g. a small error could lead to a diff where everything is different. A downside shared with unit tests is that a gold file test is a point test, and so an inefficient way to express a (partial) specification^[Property-based testing is a much more efficient way to specify properties of your code.]. It's arguably worse than unit tests in this regard, as a unit test usually gives a reasonable indication of the general property it is (poorly) verifying. For example, if the property you want to check is that whenever one key is in the output another key is also required to be in the output, this will probably be fairly obvious in most unit tests, but much less obvious in a collection of gold file tests.

Returning to the above two pitfalls, the first is a bit theoretical, though it definitely has practical consequences. When gold filing, the gold file records *every* aspect of the output. It has no way of knowing, and makes no attempt to distinguish, the "relevant" or "important" aspects from the irrelevant. For example, if you're comparing the actual text strings of JSON, then the order the keys are serialized in will matter to the gold file but not to the JSON object. Even if you compare the gold output to the actual output as JSON objects, you still commonly run into this problem in many places.
For example, if I have a list of results that represents a *set* of things, then the order doesn't matter, but the gold file will still arbitrarily canonize a particular order. This mainly leads to spurious test failures. This problem is especially bad when these irrelevant aspects are effectively non-deterministic, which leads to the next downside.

It's pretty common that simply recording the output will not produce a usable gold file test. In fact, even before that, it's not always simple to record the output at all. The relevant objects you care about may not be serializable (e.g. they include higher-order functions), may require non-trivial techniques to serialize (e.g. cyclic data structures), or, most commonly, may just be inconveniently large, perhaps due to sharing that is lost. This leads to additional code to serialize the result, which can itself be a source of bugs. The serialization process itself can also hide bugs (e.g. numbers that look the same but have different types).

Assuming that you can get reasonable serialized output, it's pretty common for there to be parts of the output that you *don't* expect to stay constant across runs^[This is another manifestation of overspecification.]. Common examples are timestamps and surrogate IDs from databases. Minor numerical differences can easily occur in more number-crunching scenarios. It's also possible that there is some harmless non-determinism arising from concurrency, e.g. you accumulate the results of some asynchronous requests in a list, but the result is conceptually a set, so the ordering in the list doesn't matter and can vary due to timing.

A common way of dealing with this is to post-process the output (both when generating the gold file and in actual tests). This may involve sorting output to force an order, replacing IDs with local identifiers, making dates relative, and likely a myriad of other "patches". The problem is that this is additional code that can itself have bugs and makes gold file tests less cheap.
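As a sketch of that kind of post-processing (the two normalization rules below are hypothetical examples for JSON text with ISO-8601 timestamps and integer surrogate IDs, not a general solution):

```python
import re


def normalize(output: str) -> str:
    """Post-process output before gold comparison.

    Each rule here is one of the "patches" described above, and each
    is extra code that can itself be wrong.
    """
    # Replace ISO-8601 timestamps with a placeholder.
    output = re.sub(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?",
        "<TIMESTAMP>",
        output,
    )

    # Replace database surrogate IDs with local identifiers, assigning
    # them in order of first appearance so re-runs stay stable.
    seen = {}

    def local_id(m):
        key = m.group(1)
        seen.setdefault(key, f"ID_{len(seen)}")
        return f'"id": "{seen[key]}"'

    return re.sub(r'"id": (\d+)', local_id, output)


raw = '{"id": 90125, "created": "2024-05-01T12:34:56Z", "ref": {"id": 90125}}'
print(normalize(raw))
```

Note that the ID rule must map *equal* raw IDs to the *same* local identifier, or it would break cross-references in the output; that's exactly the sort of subtlety where the normalization code itself grows bugs.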
Ultimately, gold filing is just another tool in the testing toolbox. There are plenty of cases where it produces a fairly good quality point test (in terms of catching problems and coverage) with very minimal effort. There are also plenty of cases where it takes a decent amount of effort to get a reasonable gold file while simultaneously degrading the effectiveness of the test.