Optimized representation for sets?
I need to do a lot of calculations involving sets. There are relatively few values in the "universe" of candidates that could appear in any of the sets, but potentially very many such sets (they might not initially be distinct, either).
My language has a built-in (or standard library) representation for sets, but it's designed to be general-purpose - a set could contain any object (or at least any hashable object, for hash-based set representations). This makes it very inefficient: it takes a lot of space to store an internal structure (tree or hash table) along with individual objects (or at least pointers thereto), and a simple element membership test needs to either traverse a tree or check a hash table and then also compare an object for equality. To say nothing of basic union and intersection operations.
I don't need this flexibility and do need more efficiency. Is there a simple way to optimize this, taking advantage of the fact that the universe of values I need in my sets is fixed (and small)?
The following users marked this post as Works for me:
User | Comment | Date |
---|---|---|
Karl Knechtel | (no comment) | Sep 14, 2023 at 04:54 |
First store the universe of potential elements as a sequence, then encode each set as an unsigned integer interpreted as follows: if the 1s bit in binary is set (1), the set contains the 0th element in the universe sequence; if the 2s bit is set, the set contains the 1st element; if the 4s bit, the 2nd element; and so on. (If the universe needs to contain more elements than there are bits in the machine word size, big-integer support will be needed; or the relevant aspects can be emulated with a simple array of integers.)
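As a minimal Python sketch of this encoding (the universe of colour names and the names `UNIVERSE`, `INDEX` and `encode` are made up for illustration): Python's built-in integers are arbitrary-precision, so the machine word size is not a concern there; other languages may need a big-integer type or an array of words.

```python
# Hypothetical universe; its order fixes which bit stands for which element.
UNIVERSE = ["red", "green", "blue", "yellow"]

# Lookup from element to bit position.
INDEX = {element: i for i, element in enumerate(UNIVERSE)}


def encode(elements):
    """Pack an iterable of universe elements into a single integer bitmask."""
    mask = 0
    for element in elements:
        mask |= 1 << INDEX[element]
    return mask


print(bin(encode(["red", "blue"])))  # 0b101
```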
To identify the contents of a set, iterate over the sequence while testing bits from the integer, and output the appropriate ones.
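A possible sketch of that decoding step in Python, again with a hypothetical universe and an illustrative function name `decode`:

```python
# Hypothetical universe; bit i of a mask corresponds to UNIVERSE[i].
UNIVERSE = ["red", "green", "blue", "yellow"]


def decode(mask):
    """Return the universe elements whose bits are set in mask."""
    return [
        element
        for i, element in enumerate(UNIVERSE)
        if (mask >> i) & 1  # test bit i
    ]


print(decode(0b1010))  # ['green', 'yellow']
```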
To check whether a set contains a given element, just check the corresponding bit. (In order to do this starting with the actual element, rather than an index into the universe sequence, a lookup will be needed; it may therefore be a good idea to also build a dictionary mapping from elements to indices.)
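For example, a membership test might look like this in Python, assuming the element-to-index dictionary is built once up front (`INDEX` and `contains` are illustrative names):

```python
# Hypothetical universe and the element -> bit position dictionary.
UNIVERSE = ["red", "green", "blue", "yellow"]
INDEX = {element: i for i, element in enumerate(UNIVERSE)}


def contains(mask, element):
    """True if the set encoded by mask contains element."""
    return bool(mask & (1 << INDEX[element]))


colors = 0b0101  # encodes {"red", "blue"}
print(contains(colors, "blue"))   # True
print(contains(colors, "green"))  # False
```

An element outside the universe raises `KeyError` in this sketch; whether that or returning `False` is preferable depends on the application.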
Set unions turn into bitwise-or operations, and set intersections into bitwise-and operations. Similarly, symmetric difference is bitwise-xor, and plain difference is bitwise-and with the complement of the other set.
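A short Python illustration of these operations on two masks over an assumed 4-element universe (the variable names are illustrative only):

```python
a = 0b0011  # contains elements 0 and 1
b = 0b0110  # contains elements 1 and 2

FULL = (1 << 4) - 1  # mask with every element of a 4-element universe

union = a | b                 # 0b0111
intersection = a & b          # 0b0010
symmetric_difference = a ^ b  # 0b0101
difference = a & ~b           # 0b0001: in a but not in b
complement = FULL & ~a        # 0b1100: masked so only universe bits remain
is_subset = (a & b) == a      # False: a is not a subset of b

print(bin(union), bin(intersection), bin(symmetric_difference))
```

Each operation is a single integer instruction when the universe fits in a machine word, regardless of how many elements the sets contain.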
As an added bonus, enumerating the powerset of the universe of elements is trivial: just count upwards from 0 to 2^n - 1, where n is the number of elements in the universe.
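A rough Python sketch of the powerset enumeration, using a hypothetical 3-element universe; `range(1 << n)` yields every integer from 0 up to one less than 2 to the power n, and each integer is one subset:

```python
# Hypothetical universe of n = 3 elements.
UNIVERSE = ["a", "b", "c"]


def powerset():
    """Yield every subset of UNIVERSE, one per integer mask."""
    n = len(UNIVERSE)
    for mask in range(1 << n):
        yield [e for i, e in enumerate(UNIVERSE) if (mask >> i) & 1]


subsets = list(powerset())
print(len(subsets))  # 8
print(subsets[5])    # ['a', 'c']  (5 is 0b101)
```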