Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Meta

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Comments on Are questions about (abstract) algorithms and datastructures considered on-topic?

Parent

Are questions about (abstract) algorithms and datastructures considered on-topic?

+3
−0

I just noticed this question about data structures in the Q&A. Although algorithms and data structures are often (part of) a solution to a problem when writing software, this question is constructed in a way that aims to leave out the software part entirely. Especially, because I have a strong feeling that this could just have been a question about a problem in Python. If it would have been a Python question, I would have had no doubts that it fits in this community, but in its current form it feels a bit misplaced.

I checked the help and the on-topic list seems to have a strong focus on the software aspect, indicating that this question might not quite belong here. On the other hand, there is nothing in the off-topic list that would indicate that this question does not belong here.

To me, this is more of a computer science question than a programming question. Therefore, I would expect something like this on cs.stackexchange.com rather than stackoverflow.com. Currently, there is no computer science community, so these questions could also be included as on-topic here (until this community is created). On the other hand, it could also be reasonable to require that these kind of questions are to be asked in a concrete software context.

Does anybody have thoughts or opinions about this?

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.
Why should this post be closed?

2 comment threads

Related Meta discussion (1 comment)
I don't really see how it "leaves out the software part entirely". It's a problem that can arise duri... (2 comments)
Post
+3
−0

As the author, this is my defense.

Design and modeling are on-topic

The site topicality documentation explicitly includes "Software design, architecture, or modeling" as on topic. There doesn't seem to have been any objection in the original feedback process.

I formulated the question with the intent of framing it as a design/modeling question: choosing a representation of data, in order to solve a practical problem (efficiency, in the context of taking advantage of restricted input).

My interpretation is that the lack of a separate CS Codidact community is a deliberate choice, and I would probably vote against a proposal to add one. That split doesn't seem to have gone well for Stack Overflow, and nowadays most things that would be on topic for CS SE are accepted on SO anyway (which also gets endless "how do I find the big-O complexity of {code dump}?" spam, but I digress). I would only want to separate questions if there were somehow a major swell in interest about truly theoretical topics.

The same documentation does disallow "Asking for implementations", but I understand this more as a homework filter. Aside from that, the point of my post is to share an idea about implementation - i.e., a design. That said, data structures go hand in hand with algorithms; so I would expect that Software is also the right place for Q&A like "Q. How can I optimize sorting when the values come from a tightly restricted range of integers? A. Just tally how many times each value appears and output those values in order the right number of times (counting sort)".

Artificiality and abstraction

This is the second time I've had pushback for posting something that doesn't reflect an actual problem that I encountered as the OP (and if we don't get this settled, it will happen many more times, as I intend to keep sharing knowledge this way). I want to emphasize that it really shouldn't matter, but: in point of fact I have had this problem before - in Java, more than 15 years ago.

In fact, the issue applies to any set implementation that allows sets to store arbitrary objects (or even just arbitrary values of a type with a wide value range). I tried to make this clear in the question, including by explicitly bringing up tree-based set implementations. (I also mentioned that such implementations could be either built-in, as in Python, or come from the standard library, as in Java.) If multiple sets have the same value as an element, the implementation would have to either re-point at an existing object, or copy the value (taking up at least a separate word of memory either way), not to mention the overhead of the actual data structure.

It's easy to imagine the same problem in many other languages. In particular, representing a set this way is very much analogous to using a bit-flag enumeration type - it's something that old-school C and C++ programmers should be familiar with. (And indeed, I've seen that old games often use a bunch of flags to keep track of e.g. what cutscenes have been shown, and might plausibly use bit arithmetic to check multiple flags at once - this is fundamentally a set operation, although it's easy to not think of it that way.)

By writing the question specifically about (e.g.) Python, I would needlessly alienate others, and possibly cause drama later down the road when someone else wants to close a (say) Java question as a "duplicate". Stack Overflow has an obsession with narrow focus in order to allow answers to talk about a single specific issue. However, I feel strongly that it's often important to widen focus in order to "cleave the problem space at the joints". That doesn't just go for language-agnosticism: if we don't preemptively relax unnecessary requirements, the end result is people whining about their tuple question being closed as a duplicate of a list question or vice-versa (when they just need to iterate and the differences don't actually matter).

History
Why does this post require attention from curators or moderators?
You might want to add some details to your flag.

1 comment thread

No need to defend anything, I am just trying to understand (3 comments)
No need to defend anything, I am just trying to understand
mr Tsjolder‭ wrote over 1 year ago

First of all, I hope it is clear that I did not intend to "attack" your question. I just found the topicality documentation a little ambiguous about this kind of question.

To me, software design/architecture/modelling is one level above the actual algorithm/data structure considerations. On a software design level, I would rather look for libraries that implement a set that does the things that I need before thinking about implementing something myself.

To make this point clear: in Java, I would probably end up with BitSet. In Python, on the other hand, I might consider the bitarray package, but it seems like it needs some extra work for it to be used as a set and therefore, it might just be easier to use set, because Python is typically not something you would use for performance-critical stuff.

If there is no place for a CS community, it might make sense to think about tags for questions like this.

Karl Knechtel‭ wrote over 1 year ago

I hadn't actually thought about using a helper class for interpreting the bits of the integer used to represent universe subsets - you might consider adding an answer to refine this idea, or proposing an edit :) (Although it's not exactly clear to me that this makes things any easier than just using the integer - and despite the name, BitSet doesn't seem to actually implement the set logic, either.)

FWIW, the point isn't just about the performance/ease-of-use tradeoff. Pythonistas often have to care about memory usage as well. Python's set has a very liberal reallocation strategy and a lot of spatial overhead: set(range(18)).__sizeof__() gives 712 on both 3.8 and 3.11 for me, and set(range(19)).__sizeof__() gives 2248.

mr Tsjolder‭ wrote over 1 year ago

The problem is that it does not really make sense to write an answer about BitSet in this general context, since e.g. Python does not really have a data structure like that. This is probably also the reason why I assumed this was a Python and not a Java question.

My point is that by abstracting the question, it is not really meaningful to give an answer on the level of software.