

Comments on Optimized representation for sets?

Parent

Optimized representation for sets?

+1
−0

I need to do a lot of calculations involving sets. There are relatively few values in the "universe" of candidates that could appear in any of the sets, but potentially very many such sets (they might not initially be distinct, either).

My language has a built-in (or standard library) representation for sets, but it's designed to be general-purpose - a set could contain any object (or at least any hashable object, for hash-based set representations). This makes it very inefficient: it takes a lot of space to store an internal structure (tree or hash table) along with individual objects (or at least pointers thereto), and a simple element membership test needs to either traverse a tree or check a hash table and then also compare an object for equality. To say nothing of basic union and intersection operations.

I don't need this flexibility and do need more efficiency. Is there a simple way to optimize this, taking advantage of the fact that the universe of values I need in my sets is fixed (and small)?


Post
+1
−0

First store the universe of potential elements as a sequence, then encode each set as an unsigned integer interpreted as follows: if the 1s bit in binary is set (1), the set contains the 0th element in the universe sequence; if the 2s bit is set, the set contains the 1st element; if the 4s bit, the 2nd element; and so on. (If the universe needs to contain more elements than there are bits in the machine word size, big-integer support will be needed; or the relevant aspects can be emulated with a simple array of integers.)
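A minimal Python sketch of this encoding follows. The `UNIVERSE` contents and the `encode` helper are hypothetical names chosen for illustration; Python integers are arbitrary-precision, so the machine word size is not a concern in this particular language.

```python
# Hypothetical universe; any fixed sequence of distinct values would do.
UNIVERSE = ["ant", "bee", "cat", "dog"]

def encode(elements):
    """Return the integer whose bit i is set when UNIVERSE[i] is in elements."""
    bits = 0
    for element in elements:
        # list.index is a linear search; acceptable for a small universe.
        bits |= 1 << UNIVERSE.index(element)
    return bits

print(bin(encode(["ant", "cat"])))  # 0b101
```

Here "ant" is element 0 (the 1s bit) and "cat" is element 2 (the 4s bit), so the set is stored as the integer 5.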

To identify the contents of a set, iterate over the sequence while testing bits from the integer, and output the appropriate ones.
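As a rough sketch in Python, decoding might look like the following. `UNIVERSE` and `decode` are illustrative names, not part of any library.

```python
# Hypothetical universe, in a fixed order.
UNIVERSE = ["ant", "bee", "cat", "dog"]

def decode(bits):
    """List the elements whose corresponding bit is set in bits."""
    return [
        element
        for i, element in enumerate(UNIVERSE)
        if (bits >> i) & 1  # test bit i
    ]

print(decode(0b1010))  # ['bee', 'dog']
```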

To check whether a set contains a given element, just check the corresponding bit. (In order to do this starting with the actual element, rather than an index into the universe sequence, a lookup will be needed; it may therefore be a good idea to also build a dictionary mapping from elements to indices.)
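One way this could look in Python, assuming the elements are hashable so that they can serve as dictionary keys (`INDEX` and `contains` are hypothetical names):

```python
# Hypothetical universe, in a fixed order.
UNIVERSE = ["ant", "bee", "cat", "dog"]

# Element -> bit position, built once so that lookups avoid a linear search.
INDEX = {element: i for i, element in enumerate(UNIVERSE)}

def contains(bits, element):
    """Report whether the set encoded by bits includes element."""
    return bool((bits >> INDEX[element]) & 1)

s = 0b0101  # encodes {ant, cat}
print(contains(s, "cat"))  # True
print(contains(s, "bee"))  # False
```

An element outside the universe raises `KeyError` in this sketch; whether that should be an error or simply `False` is a design choice left open here.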

Set unions turn into bitwise-or operations, and set intersections into bitwise-and operations; similarly for (symmetric) differences and so on.
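For illustration, here is how those operations might look in Python for a four-element universe. The variable names are arbitrary, and the comments assume bit 0 is "ant", bit 1 is "bee", bit 2 is "cat" and bit 3 is "dog".

```python
a = 0b0011  # {ant, bee}
b = 0b0110  # {bee, cat}

union = a | b                 # 0b0111 -> {ant, bee, cat}
intersection = a & b          # 0b0010 -> {bee}
symmetric_difference = a ^ b  # 0b0101 -> {ant, cat}
difference = a & ~b           # 0b0001 -> {ant}

# A complement needs a mask of the whole universe, since ~a alone
# would set every higher bit as well (a negative number in Python).
FULL = (1 << 4) - 1           # 0b1111
complement = FULL & ~a        # 0b1100 -> {cat, dog}

is_subset = (a & b) == a      # False: ant is not in b
size = bin(a).count("1")      # 2 elements in a
```

Each operation is a single integer instruction when the universe fits in a machine word, which is where the speed-up over a general-purpose set comes from.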

As an added bonus, enumerating the powerset of the universe of elements is trivial: just count upwards from 0 to 2^n - 1, where n is the number of elements in the universe.
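A short Python sketch of that enumeration, using a hypothetical three-element universe. `range(1 << n)` yields every integer from 0 through 2^n - 1, one per subset.

```python
# Hypothetical universe, in a fixed order.
UNIVERSE = ["ant", "bee", "cat"]

def powerset():
    """Yield every subset of UNIVERSE, one per integer in [0, 2**n)."""
    n = len(UNIVERSE)
    for bits in range(1 << n):
        yield [e for i, e in enumerate(UNIVERSE) if (bits >> i) & 1]

subsets = list(powerset())
print(len(subsets))  # 8
print(subsets[0])    # []
print(subsets[-1])   # ['ant', 'bee', 'cat']
```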


Array of boolean
Lundin wrote about 1 year ago

You could expand this further than the machine word by simply using an array of booleans, which is "language agnostic" and might even boil down to an actual bit-field if you are lucky. Also, pretty much all languages support bool, but not all languages support bitwise arithmetic.

Karl Knechtel wrote about 1 year ago

Of course, that will get into language-dependent details. The languages that do support large integers tend to represent booleans as separate objects and do not necessarily have a bit-vector abstraction, so that can incur a lot of overhead again. But the main idea here is the representation of "subsets of the universe" by specifying whether each candidate is included, rather than directly specifying the elements.