When is it OK for duplication of information between message header and payload in a distributed software application?

+4
−0

A friend is involved in rewriting a distributed software application and while discussing the architecture, we noticed that in many cases the messages had duplication between headers and payload.

The headers mostly include metadata such as a correlation ID, a phone number identifier, and a client identifier, which are mostly used for logging purposes.

Depending on business logic, the payload may also include one or several properties also found in the headers.

Due to some limitations of the queuing technology, both headers and payload are always included in the HTTP message (i.e. the queue's own message metadata is not used).

The arguments discussed and researched so far include:

  • DRY violation - payload should not include information already in the headers, as it might lead to errors (what happens if the same property has different values in the same message?)
  • separation of concerns/client convenience - consumers' business logic should not bother reading headers and rely on the fact that all business-related information is provided in the payload
  • possible serialization issues - if a use case requires payload serialization, the client needs to do extra work to also include relevant header information
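To make the first concern concrete, here is a minimal sketch of what detecting such a mismatch could look like. The function name `find_conflicts` and the field names are hypothetical and only serve as an illustration; they are not taken from the actual application.

```python
def find_conflicts(headers: dict, payload: dict, shared_keys: set) -> dict:
    """Return {key: (header_value, payload_value)} for duplicated keys that disagree.

    `shared_keys` is the (hypothetical) set of properties that may appear
    in both the headers and the payload.
    """
    conflicts = {}
    for key in shared_keys:
        # Only keys present on both sides can contradict each other.
        if key in headers and key in payload and headers[key] != payload[key]:
            conflicts[key] = (headers[key], payload[key])
    return conflicts


# Hypothetical message where the client identifier differs between the two places.
example_headers = {"correlation_id": "abc", "client_id": "42"}
example_payload = {"client_id": "43", "amount": 10}
example_conflicts = find_conflicts(
    example_headers, example_payload, {"correlation_id", "client_id"}
)
```

A check like this only reports the disagreement; it does not say which copy should win, which is exactly the open question.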

I am wondering if there are any best practices defined for such cases, as it is not clear how to correctly decide if a DRY violation makes sense in this case or not.


1 answer

+3
−0

Nowadays DRY is usually applied to code, not data. Regardless, even for data, DRY does not outlaw duplication; it just requires a single "authoritative" copy. There are similar ideas for data in database normalization. I don't think either is terribly applicable here.

I would say separation of concerns should be the driving factor. This corresponds to the layering of the OSI model, which has analogous situations.

As you've described the situation (admittedly fairly vaguely and broadly so far), it sounds like there are various pieces of data that "just so happen" to be in the metadata for logging purposes and that occasionally are also in the payload. I'm using "logging" as a stand-in for any payload-agnostic processing. It also sounds like the logging code doesn't actually understand the format of the payload.

In this context, you couldn't have the logging code look into the payload for the data instead because 1) the logging code doesn't understand the payload, and 2) the data may simply not be in a particular payload. Going the other way, the application has the same problem 2, but at the level of system architecture: while the application code can be sure that certain fields are present in the metadata in any particular iteration of the system, it can't be sure that they will remain present as the architecture evolves. And, of course, having the application code read the metadata directly would be a blatant layering violation / failure to separate concerns.
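As a rough illustration of payload-agnostic processing, the sketch below logs a message using only its headers and treats the payload as opaque bytes. The header names in `LOGGED_HEADERS` and the function `log_message` are assumptions made up for this example.

```python
import logging

logger = logging.getLogger("transport")

# Hypothetical header names; the real ones depend on the system.
LOGGED_HEADERS = ("correlation_id", "client_id")


def log_message(headers: dict, payload: bytes) -> str:
    """Log a message without ever parsing its payload.

    Missing headers are rendered as "-", so the logger works for every
    message type, whatever the payload happens to contain.
    """
    fields = {name: headers.get(name, "-") for name in LOGGED_HEADERS}
    line = " ".join(f"{name}={value}" for name, value in fields.items())
    # The payload is opaque here: only its size is recorded.
    line += f" payload_bytes={len(payload)}"
    logger.info(line)
    return line
```

Because the logger never inspects the payload, the payload format can change freely without touching this layer.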

The above doesn't mean there is no reason to reconsider the message design. It may be that the line between payload and metadata can be drawn in a different place to reduce this redundancy. There may also be fields that really shouldn't be duplicated. A simple but strong example is an authenticated user ID, which will almost certainly live in the metadata. There should not be a field in the payload that (logically) represents the current user ID because, presumably, that payload field would not be authenticated. The application code, however, would likely not fetch the user ID from the metadata explicitly but would call something like GetCurrentUserId() to access it (which may well just fetch it from the metadata, but might also do additional cryptographic checks or other authentication-related tasks).
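A minimal sketch of such an accessor might look like the following. Everything here is an assumption for illustration: the header names `x-user-id` and `x-user-sig`, the HMAC-based check, and the hard-coded key stand in for whatever authentication scheme the real system uses.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real system would load this from secure config.
SECRET_KEY = b"demo-secret"


def sign_user_id(user_id: str) -> str:
    """Compute the signature the (hypothetical) gateway attaches to the user ID."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()


def get_current_user_id(headers: dict) -> str:
    """Return the authenticated user ID carried in the message metadata.

    Application code calls this instead of reading headers (or a payload
    field) directly, so the verification step cannot be skipped by accident.
    """
    user_id = headers["x-user-id"]
    signature = headers.get("x-user-sig", "")
    # Constant-time comparison to avoid leaking signature prefixes via timing.
    if not hmac.compare_digest(signature, sign_user_id(user_id)):
        raise PermissionError("user ID header failed verification")
    return user_id
```

The application never learns where the ID came from, so moving it to a different header or token later only changes this one function.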

I don't really understand the concern about serialization. Either you are serializing entire messages, which shouldn't take any extra work, or none of the header data should be relevant. There may be data carried by headers, like the current user ID, that you do want to include. Basically, you want to serialize the payload plus some additional application (request) context, some of which (as an implementation detail) is maintained by data passed via headers.

This gives a way of deciding on the payload contents. Per DRY for data, the application (request) context should be the authoritative source for the data it contains, and that data should therefore not be duplicated in the payload. As an implementation detail, part of that context may be maintained by data stored in headers. That context, however, should only expose application-level concepts, which will probably not reflect everything in the metadata and may include things from outside the metadata. That the request has some header is not an application-level concept, but that the current user is X is.
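One way to picture this is a small request-context type that is built from the headers but exposes only application-level concepts, and that is serialized together with the payload when needed. This is a sketch under assumptions: `RequestContext`, `context_from_headers`, `serialize`, and the header names are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RequestContext:
    """Application-level view of the request; headers are an implementation detail."""
    current_user_id: str
    client_id: str


def context_from_headers(headers: dict) -> RequestContext:
    """Build the context from (hypothetical) header names.

    Transport-only metadata such as a correlation ID is deliberately
    not exposed to the application.
    """
    return RequestContext(
        current_user_id=headers["x-user-id"],
        client_id=headers["x-client-id"],
    )


def serialize(context: RequestContext, payload: dict) -> str:
    """Serialize the payload plus the application context, not the raw headers."""
    return json.dumps(
        {"context": asdict(context), "payload": payload}, sort_keys=True
    )
```

With this split, the payload never needs its own copy of the user or client ID: the context is the single authoritative source, and it travels alongside the payload whenever a use case requires serialization.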
