When is it OK for duplication of information between message header and payload in a distributed software application?

Nowadays DRY is usually applied to code, not data. Regardless, even for data, DRY does not outlaw duplication; it just requires a single "authoritative" copy. There are certainly similar ideas for data in database normalization. I don't think either is terribly applicable here.

I would say separation of concerns would be the driving factor. This corresponds to the layering of the OSI model, which has analogous situations.

As you've described the situation (which admittedly has been fairly vague and broad so far), it sounds like there are various pieces of data that "just so happen" to be in the metadata for logging purposes and which occasionally are also in the payload. I'm using "logging" as representative of any payload-agnostic processing. It also sounds like the logging code doesn't actually understand the format of the payload.
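As a rough illustration of payload-agnostic processing, here is a minimal Python sketch of a logger that reads only the metadata and treats the payload as opaque bytes. The `Message` type and the header names (`message-id`, `user-id`, `message-type`) are assumptions made up for this example, not anything from your system.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Message:
    headers: Mapping[str, str]  # metadata, readable by every layer
    payload: bytes              # opaque to payload-agnostic code


def log_line(message: Message) -> str:
    """Build a log line from metadata only; the payload is never parsed."""
    # Hypothetical header names, chosen purely for illustration.
    fields = ("message-id", "user-id", "message-type")
    parts = [f"{name}={message.headers.get(name, '-')}" for name in fields]
    # The only thing the logger learns about the payload is its size.
    parts.append(f"payload-bytes={len(message.payload)}")
    return " ".join(parts)
```

The logger keeps working whatever the payload format is, which is exactly why the fields it needs have to be in the metadata, whether or not some payloads happen to carry them too.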

In this context, you couldn't have the logging code look into the payload for the data instead because 1) the logging code doesn't understand the payload, and 2) the data may simply not be in a particular payload. Going the other way, the application also has problem 2 but at the level of system architecture. That is, while the application code can be sure that certain fields are present in the metadata in any particular iteration, it can't be sure that they will remain present as the system architecture evolves. Of course, having the application depend on the metadata this way is a blatant layering violation / failure to separate concerns.

The above doesn't mean there is no reason to reconsider the message design. It may be that the line between payload and metadata can be drawn in a different place to reduce this redundancy. There may also be fields that really shouldn't be duplicated. A simple but strong example is an authenticated user ID, which will almost certainly live in the metadata. There should not be a field in the payload that (logically) represents the current user ID because, presumably, that payload field would not be authenticated. The application code, however, would likely not explicitly fetch the user ID from the metadata but use some GetCurrentUserId() to access it (which may well just fetch it from the metadata, but might do additional cryptographic checks or other authentication-related tasks).
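To make the GetCurrentUserId() idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RequestContext` class, the `user-id` and `user-id-signature` header names, the hard-coded secret, and the choice of an HMAC as the "additional cryptographic check". A real system might use a completely different authentication mechanism.

```python
import hashlib
import hmac
from typing import Mapping

# Hypothetical shared secret; a real system would load this from secure configuration.
SECRET_KEY = b"example-secret"


def sign_user_id(user_id: str) -> str:
    """Return a hex HMAC-SHA256 tag for the user ID (what a gateway might attach)."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


class RequestContext:
    """Application-facing view of a request; hides where its data comes from."""

    def __init__(self, headers: Mapping[str, str]) -> None:
        self._headers = headers

    def get_current_user_id(self) -> str:
        # Hypothetical header names, chosen purely for illustration.
        user_id = self._headers.get("user-id")
        tag = self._headers.get("user-id-signature")
        if user_id is None or tag is None:
            raise PermissionError("request is not authenticated")
        # Constant-time comparison of the supplied tag with the expected one.
        if not hmac.compare_digest(tag, sign_user_id(user_id)):
            raise PermissionError("user ID signature does not match")
        return user_id
```

Application code calls `get_current_user_id()` and never touches the headers, so the header layout or the verification scheme can change without the application noticing.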

I don't really understand the concern for serialization. You should be serializing entire messages, which shouldn't take any extra work, and/or none of the header data should be relevant. There may be data carried by headers, like the current user ID, that you do want to include. Basically, you want to serialize the payload plus some additional application (request) context, some of which, as an implementation detail, is maintained by data passed via headers.

This gives a way of deciding on the payload contents. Per DRY for data, the application (request) context should be the authoritative source for the data it contains and thus should not be duplicated in the payload. As an implementation detail, part of that context may be maintained by data stored in headers. That context, however, should only expose application-level concepts which will probably not reflect everything in the metadata and may include things outside of the metadata. That the request has some header is not an application-level concept, but that the current user is X is.
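One way to picture "payload plus application context" is the Python sketch below. The `AppContext` type, the `user-id` header name, and the JSON envelope layout are all assumptions invented for this example; authentication checks are omitted to keep it short.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class AppContext:
    """Application-level concepts only; no raw header names leak through."""
    current_user_id: str


def context_from_headers(headers: Mapping[str, str]) -> AppContext:
    # Hypothetical header name; other headers are deliberately ignored.
    return AppContext(current_user_id=headers["user-id"])


def serialize(payload: Mapping[str, Any], context: AppContext) -> str:
    """Serialize the payload next to the context, without copying one into the other."""
    envelope = {"context": asdict(context), "payload": dict(payload)}
    return json.dumps(envelope, sort_keys=True)
```

Here the context is the authoritative source for the current user, so the payload has no user ID field of its own, and headers that are not application-level concepts (a trace ID, say) never reach the serialized form.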

posted almost 2 years ago

CC BY-SA 4.0

Derek Elkins‭
