
What are the drawbacks of using data snapshot testing?

+2
−0

Our team is finally focusing on writing more automated tests, and one of my former colleagues recommended trying out the Verify library.

The tool does the following:

  • runs the test and compares the JSON serialization of the actual result against a file named after the test; the first run always fails because that file does not exist yet
  • on that first run, the actual data is written to a file (also named after the test), which then becomes the expected result
  • subsequent test runs succeed as long as the actual result does not change

This is particularly useful for assertions on complex objects, since it spares the developer from writing lots of individual assertions; a minimal sketch of such a test is shown below.
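
For illustration, a minimal sketch of what such a test could look like with VerifyXunit; the order object is made up, and depending on the Verify version the test class may also need a [UsesVerify] attribute:

    using System.Threading.Tasks;
    using VerifyXunit;
    using Xunit;

    public class OrderSnapshotTests
    {
        [Fact]
        public async Task CreateOrder_MatchesSnapshot()
        {
            // Hypothetical result of the code under test.
            var order = new
            {
                CustomerId = 42,
                Items = new[] { "book", "pen" },
                Total = 12.50m,
            };

            // Verify serializes the object and compares it with a *.verified.* file
            // named after this test; the first run fails and writes a *.received.*
            // file that can be accepted as the snapshot.
            await Verifier.Verify(order);
        }
    }

The accepted snapshot file lives next to the test in source control, so changes to the expected output show up in code review.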

Until now I have avoided comparing large objects, except in rather technical scenarios such as deep-cloning, where I relied on Fluent Assertions' object graph comparison (e.g. Should().BeEquivalentTo).
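
For comparison, the Fluent Assertions approach mentioned above looks roughly like this (the clone helper and the Order type are placeholders, not real project code):

    using FluentAssertions;
    using Xunit;

    public class DeepCloneTests
    {
        private sealed record Order(int Id, string[] Items);

        // Placeholder for the real deep-clone logic under test.
        private static T DeepClone<T>(T source) =>
            System.Text.Json.JsonSerializer.Deserialize<T>(
                System.Text.Json.JsonSerializer.Serialize(source))!;

        [Fact]
        public void DeepClone_PreservesTheWholeObjectGraph()
        {
            var original = new Order(1, new[] { "book", "pen" });

            var clone = DeepClone(original);

            // One structural assertion over the whole graph instead of
            // one assertion per property.
            clone.Should().BeEquivalentTo(original);
        }
    }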

The gain is clear and I think it is a great library, but I am wondering about its downsides. The only one I can think of is that it takes more effort to quickly understand what is wrong when a test fails, since the result is just a partial object-graph mismatch rather than a specific assertion with a human-readable "because" message.

1 answer

+4
−0

This is more or less equivalent to a long-used testing technique that I've commonly heard referred to as gold filing. The benefit is that it is very cheap to do, and it can catch even minor changes that you may not have covered with a typical unit test.

Two pitfalls I'll focus on are:

  • Gold files tend to significantly overspecify the output.
  • Creating a gold file is not always trivial.

You've mentioned some other downsides, such as often being coarser-grained and slightly closer to integration tests. Gold file test failures can also be less useful for diagnosing a problem, e.g. a small error could lead to a diff where everything is different. A downside shared with unit tests is that it is a point test, and so it is an inefficient way to express a (partial) specification[1]. It's arguably worse than unit tests in this regard, as a unit test usually gives a reasonable indication of the general property it is (poorly) verifying. For example, if the property you want to check is that whenever one key is in the output another key is also required to be in the output, then this will probably be fairly obvious in most unit tests, but will be much less obvious in a collection of gold file tests.
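
As a small, hypothetical illustration of that last point, the property stays visible when it is written as an explicit assertion (the Render stand-in and the key names are invented for the sketch):

    using System.Collections.Generic;
    using Xunit;

    public class OutputPropertyTests
    {
        // Hypothetical stand-in for the real code under test.
        private static Dictionary<string, string> Render(decimal discount)
        {
            var output = new Dictionary<string, string> { ["total"] = "10.00" };
            if (discount > 0)
            {
                output["discount"] = discount.ToString();
                output["discountReason"] = "promo";
            }
            return output;
        }

        [Fact]
        public void DiscountKey_ImpliesDiscountReasonKey()
        {
            var output = Render(discount: 2.50m);

            // The rule "discount implies discountReason" is stated directly here;
            // spread across a set of gold files the same rule is only implicit.
            if (output.ContainsKey("discount"))
            {
                Assert.True(output.ContainsKey("discountReason"));
            }
        }
    }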

Returning to the above two pitfalls, the first is a bit theoretical, though it definitely has practical consequences. When gold filing, the gold file records every aspect of the output. It has no way of knowing, and makes no attempt to distinguish, the "relevant" or "important" aspects from the irrelevant. For example, if you're comparing the actual text strings of JSON, then the order the keys are serialized in matters to the gold file but doesn't matter to the JSON object. Even if you compare the gold output to the actual output as JSON objects, you still commonly run into this problem in many places. For example, if I have a list of results that represents a set of things, then the order doesn't matter, but the gold file will still arbitrarily canonize a particular order. This mainly leads to spurious test failures. This problem is especially bad when these irrelevant aspects are effectively non-deterministic, which leads to the next downside.
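
A quick way to see the key-order point, assuming the comparison happens at the text level (System.Text.Json is used only for the demonstration, and it relies on Dictionary enumerating in insertion order, which holds in practice):

    using System.Collections.Generic;
    using System.Linq;
    using System.Text.Json;
    using Xunit;

    public class KeyOrderTests
    {
        [Fact]
        public void SameObject_DifferentTextSnapshot()
        {
            var a = new Dictionary<string, int> { ["x"] = 1, ["y"] = 2 };
            var b = new Dictionary<string, int> { ["y"] = 2, ["x"] = 1 };

            // The two dictionaries hold exactly the same key/value pairs...
            Assert.True(a.Count == b.Count && a.All(kv => b[kv.Key] == kv.Value));

            // ...but they serialize to different text, so a text-level gold
            // file records a difference that carries no meaning.
            Assert.NotEqual(JsonSerializer.Serialize(a), JsonSerializer.Serialize(b));
        }
    }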

It's pretty common that simply recording the output will not produce a usable gold file test. In fact, even before that, it's not always simple to record the output at all. The relevant objects may not be serializable (e.g. they include higher-order functions), may require non-trivial techniques to serialize (e.g. cyclic data structures), or, most commonly, may simply be inconveniently large, perhaps because sharing is lost in serialization. This leads to additional code to serialize the result, which can itself be a source of bugs. The serialization process can also hide bugs (e.g. numbers that look the same but are different types).
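
For example, a cycle in the object graph already makes the naive "just serialize the result" step fail; a sketch using System.Text.Json (Verify and other serializers have their own ways of dealing with this):

    using System.Text.Json;
    using Xunit;

    public class CyclicGraphTests
    {
        private sealed class Node
        {
            public Node? Parent { get; set; }
            public Node? Child { get; set; }
        }

        [Fact]
        public void NaiveSerialization_FailsOnCycles()
        {
            var parent = new Node();
            parent.Child = new Node { Parent = parent };   // cycle

            // Plain serialization rejects the cycle; producing a snapshot
            // needs either reference handling or a custom, acyclic shape.
            Assert.ThrowsAny<JsonException>(() => JsonSerializer.Serialize(parent));
        }
    }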

Assuming you can get reasonable serialized output, it's pretty common for there to be parts of the output that you don't expect to stay constant across runs[2]. Common examples are timestamps and surrogate IDs from databases. Minor numerical differences can easily occur in more number-crunching scenarios. It's also possible that there is some harmless non-determinism arising from concurrency, e.g. you accumulate the results of some asynchronous requests in a list that is conceptually a set, so the ordering doesn't matter and can vary with timing. A common way of dealing with this is to post-process the output (both when generating the gold file and in actual tests). This may involve sorting the output to force an order, replacing IDs with local identifiers, making dates relative, and likely a myriad of other "patches". The problem is that this is additional code that can itself have bugs, and it makes gold file tests less cheap.
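
A sketch of that kind of post-processing, with made-up types and normalization rules (Verify itself also ships scrubbing helpers for common cases such as dates and GUIDs):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical result type with values that differ between runs.
    public sealed record AuditEntry(Guid Id, DateTime CreatedAt, string Message);

    public static class SnapshotNormalizer
    {
        // Replaces run-dependent values with stable placeholders and forces a
        // deterministic order, so the gold file only records what matters.
        public static IReadOnlyList<AuditEntry> Normalize(IEnumerable<AuditEntry> entries) =>
            entries
                .OrderBy(e => e.Message, StringComparer.Ordinal)         // conceptually a set
                .Select((e, i) => e with
                {
                    Id = Guid.Empty,                                     // surrogate ID
                    CreatedAt = new DateTime(2000, 1, 1).AddMinutes(i),  // stable, relative timestamp
                })
                .ToList();
    }

The snapshot is then produced from the normalized output rather than the raw result, at the cost of exactly the extra code this paragraph warns about.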

Ultimately, gold filing is just another tool in the testing toolbox. There are plenty of cases where it produces a fairly good-quality point test (in terms of catching problems and coverage) with very minimal effort. There are also plenty of cases where it takes a decent amount of effort to get a reasonable gold file while simultaneously degrading the effectiveness of the test.


  1. Property-based testing is a much more efficient way to specify properties of your code.

  2. This is another manifestation of overspecification.
