Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Post History

71%

+3 −0

Meta Are questions about (abstract) algorithms and datastructures considered on-topic?

As the author, this is my defense. Design and modeling are on-topic The site topicality documentation explicitly includes "Software design, architecture, or modeling" as on topic. There doesn't s...

posted 1y ago by Karl Knechtel‭

Answer

#1: Initial revision by

Karl Knechtel‭ · 2023-09-14T19:10:51Z (over 1 year ago)

Copy Link

Raw

Markdown

As the author, this is my defense.

## Design and modeling are on-topic

The [site topicality documentation](https://software.codidact.com/help/on-topic) explicitly includes "Software design, architecture, or modeling" as on topic. There [doesn't seem to have been any objection](https://software.codidact.com/posts/278648) in the original feedback process.

I formulated the question with the intent of **framing it as a design/modeling question**: choosing a representation of data, in order to solve a practical problem (efficiency, in the context of taking advantage of restricted input).

My interpretation is that the lack of a separate CS Codidact community is a deliberate choice, and I would probably vote against a proposal to add one. That split doesn't seem to have gone well for Stack Overflow, and nowadays most things that would be on topic for CS SE are accepted on SO anyway (which also gets endless "how do I find the big-O complexity of {code dump}?" spam, but I digress). I would only want to separate questions if there were somehow a major swell in interest about truly theoretical topics.

The same documentation does disallow "Asking for implementations", but I understand this more as a homework filter. Aside from that, the point of my post is to share an *idea about* implementation - i.e., a *design*. That said, data structures go hand in hand with algorithms; so I would expect that **Software is also the right place** for Q&A like "Q. How can I optimize sorting when the values come from a tightly restricted range of integers? A. Just tally how many times each value appears and output those values in order the right number of times (counting sort)".

## Artificiality and abstraction

This is the second time I've had pushback for posting something that doesn't reflect an actual problem that I encountered as the OP (and if we don't get this settled, it will happen many more times, as I intend to keep sharing knowledge this way). I want to emphasize that **it really shouldn't matter**, but: in point of fact I *have* had this problem before - in Java, more than 15 years ago.

In fact, the issue **applies to any set implementation that allows sets to store arbitrary objects** (or even just arbitrary values of a type with a wide value range). I tried to make this clear in the question, including by explicitly bringing up tree-based set implementations. (I also mentioned that such implementations could be either built-in, as in Python, or come from the standard library, as in Java.) If multiple sets have the same value as an element, the implementation would have to either re-point at an existing object, or copy the value (taking up at least a separate word of memory either way), not to mention the overhead of the actual data structure.

It's easy to imagine the same problem in many other languages. In particular, representing a set this way is very much analogous to *using a bit-flag enumeration type* - it's something that old-school C and C++ programmers should be familiar with. (And indeed, I've seen that old games often use a bunch of flags to keep track of e.g. what cutscenes have been shown, and might plausibly use bit arithmetic to check multiple flags at once - this is *fundamentally a set operation*, although it's easy to *not* think of it that way.)

By writing the question specifically about (e.g.) Python, **I would needlessly alienate others, and possibly cause drama later** down the road when someone else wants to close a (say) Java question as a "duplicate". Stack Overflow has an obsession with narrow focus in order to allow answers to talk about a single specific issue. However, I feel strongly that it's often important to *widen* focus in order to "cleave the problem space at the joints". That doesn't just go for language-agnosticism: if we don't preemptively *relax unnecessary* requirements, the end result is people whining about their tuple question being closed as a duplicate of a list question or vice-versa (when they just need to iterate and the differences don't actually matter).

Communities

Post History