Welcome to Software Development on Codidact!

Search tree supporting efficient bulk sequential insert

For holding ordered sets of keys, there are well-known data structures (the red-black tree, for example) that support O(log(n)) lookup and insertion. Trivially, then, a sequence of k keys can be inserted in O(k log(n + k)) time. But suppose the keys to insert are given in sequential (sorted) order, and it is known that they will form a contiguous run in the final set: they are all greater than, or all less than, every key already in the structure, or they all fall between two consecutive keys already in the structure. Is there a data structure that can insert them more efficiently, say in O(k + log(n)) time, while still supporting O(log(n)) lookup?
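To make the "contiguous in the final set" precondition concrete, here is a minimal Python sketch. It assumes the existing keys are available as a sorted list, a stand-in for the tree (a balanced tree can answer the same two rank queries in O(log(n))). The helper name `is_contiguous_batch` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def is_contiguous_batch(existing: Sequence[int], batch: Sequence[int]) -> bool:
    """Check the precondition against a sorted `existing` sequence."""
    # The new keys must arrive in strictly increasing order.
    if any(a >= b for a, b in zip(batch, batch[1:])):
        return False
    if not batch:
        return True
    # First slot whose key is >= the smallest new key ...
    lo = bisect_left(existing, batch[0])
    # ... and first slot whose key is > the largest new key.
    hi = bisect_right(existing, batch[-1])
    # Equal only if no existing key lies in [batch[0], batch[-1]], i.e. the
    # whole batch fits into a single gap (this also rules out duplicates).
    return lo == hi
```

For example, with existing keys `[10, 20, 30]`, the batch `[21, 22, 25]` qualifies, while `[15, 25]` (spans an existing key) and `[20, 21]` (duplicates one) do not.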

Intuitively, after inserting the first key, we already know a lot about where in the tree the remaining keys need to go. Perhaps whatever rebalancing operations are needed can also be batched, so that the entire operation only has to walk up or down the tree a constant number of times; that is how I arrived at O(k + log(n)) as a target. But I haven't yet found a way to realize these intuitions.
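One known technique that appears to match this intuition is the join-based approach to balanced trees (described, for example, by Blelloch, Ferizovic and Sun for AVL trees). It is sketched here as an assumption, not as the asker's method. The idea: `split` the tree at the first new key in O(log(n)), build a perfectly balanced tree from the inner new keys in O(k), then `join` the pieces back. Because `join(L, x, R)` costs O(|h(L) - h(R)| + 1), each of the two final joins costs O(log(n) + log(k)), for O(k + log(n)) overall. Lookups stay O(log(n)) because the result is still an AVL tree. All names (`Node`, `join`, `split`, `build`, `bulk_insert`) are hypothetical.

```python
from typing import Iterator, List, Optional


class Node:
    """Immutable AVL node; the subtree height is cached on construction."""
    __slots__ = ("left", "key", "right", "height")

    def __init__(self, left: Optional["Node"], key: int, right: Optional["Node"]):
        self.left, self.key, self.right = left, key, right
        self.height = 1 + max(height(left), height(right))


def height(t: Optional[Node]) -> int:
    return 0 if t is None else t.height


def rotate_left(t: Node) -> Node:
    r = t.right
    return Node(Node(t.left, t.key, r.left), r.key, r.right)


def rotate_right(t: Node) -> Node:
    l = t.left
    return Node(l.left, l.key, Node(l.right, t.key, t.right))


def join_right(tl: Node, k: int, tr: Optional[Node]) -> Node:
    # tl is more than one level taller: walk down its right spine.
    l, k2, c = tl.left, tl.key, tl.right
    if height(c) <= height(tr) + 1:
        t2 = Node(c, k, tr)
        if height(t2) <= height(l) + 1:
            return Node(l, k2, t2)
        return rotate_left(Node(l, k2, rotate_right(t2)))
    t2 = join_right(c, k, tr)
    t3 = Node(l, k2, t2)
    return t3 if height(t2) <= height(l) + 1 else rotate_left(t3)


def join_left(tl: Optional[Node], k: int, tr: Node) -> Node:
    # Mirror image of join_right: walk down tr's left spine.
    c, k2, r = tr.left, tr.key, tr.right
    if height(c) <= height(tl) + 1:
        t2 = Node(tl, k, c)
        if height(t2) <= height(r) + 1:
            return Node(t2, k2, r)
        return rotate_right(Node(rotate_left(t2), k2, r))
    t2 = join_left(tl, k, c)
    t3 = Node(t2, k2, r)
    return t3 if height(t2) <= height(r) + 1 else rotate_right(t3)


def join(tl: Optional[Node], k: int, tr: Optional[Node]) -> Node:
    """Requires keys(tl) < k < keys(tr); costs O(|height(tl) - height(tr)| + 1)."""
    if height(tl) > height(tr) + 1:
        return join_right(tl, k, tr)
    if height(tr) > height(tl) + 1:
        return join_left(tl, k, tr)
    return Node(tl, k, tr)


def split(t: Optional[Node], k: int):
    """Return (keys < k, keys > k); assumes k is not in t. O(log n) overall."""
    if t is None:
        return None, None
    if k < t.key:
        l, r = split(t.left, k)
        return l, join(r, t.key, t.right)
    l, r = split(t.right, k)
    return join(t.left, t.key, l), r


def build(keys: List[int], lo: int, hi: int) -> Optional[Node]:
    """Perfectly balanced tree from sorted keys[lo:hi] in O(hi - lo) time."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(build(keys, lo, mid), keys[mid], build(keys, mid + 1, hi))


def bulk_insert(t: Optional[Node], batch: List[int]) -> Optional[Node]:
    """Insert a sorted batch that falls into one gap of t, in O(k + log n)."""
    if not batch:
        return t
    l, r = split(t, batch[0])  # O(log n)
    if len(batch) == 1:
        return join(l, batch[0], r)
    middle = build(batch, 1, len(batch) - 1)  # O(k), already balanced
    # Two joins; batch[0] and batch[-1] serve as the separator keys.
    return join(join(l, batch[0], middle), batch[-1], r)


def in_order(t: Optional[Node]) -> Iterator[int]:
    if t is not None:
        yield from in_order(t.left)
        yield t.key
        yield from in_order(t.right)
```

The nodes are immutable only to keep the sketch short (each call returns a new root). The batch is assumed to already satisfy the contiguity precondition; nothing here checks it.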

General comments (2 comments)
Lundin wrote about 4 years ago

Isn't the biggest problem here indeed the rebalancing of the tree after each insertion? Algorithm theory typically fails to take that into account and relies on rebalancing happening instantly through "magic". I suppose the most efficient approach would be to add a whole branch of a contiguous sequence and then rebalance the whole tree once after that. There might be big cache benefits if the whole branch is allocated with a single heap allocation call.
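The first half of this suggestion, building the new branch, is cheap on its own. The following minimal sketch builds a perfectly balanced branch from k sorted keys in O(k) time by recursively picking the median as the root. It uses a hypothetical `(left, key, right)` tuple representation, and the names `build_branch`, `height` and `in_order` are illustrative. Attaching the branch and restoring balance in the rest of the tree is the harder part and is not shown.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a node is (left, key, right); None is empty.
Tree = Optional[Tuple["Tree", int, "Tree"]]


def build_branch(keys: Sequence[int], lo: int = 0, hi: Optional[int] = None) -> Tree:
    """Perfectly balanced BST from sorted keys[lo:hi]; each key is visited once."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2  # the median becomes the root of this branch
    return (build_branch(keys, lo, mid), keys[mid], build_branch(keys, mid + 1, hi))


def height(t: Tree) -> int:
    return 0 if t is None else 1 + max(height(t[0]), height(t[2]))


def in_order(t: Tree) -> list:
    return [] if t is None else in_order(t[0]) + [t[1]] + in_order(t[2])
```

A branch of k keys built this way has height ceil(log2(k + 1)), e.g. 3 for 7 keys and 7 for 100 keys.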

r~~ wrote about 4 years ago

The magic of self-balancing trees isn't that the rebalancing is assumed to be instant, just that it's fast enough that it doesn't change the big-O of the insertion (and this is true for, e.g., red-black trees). Running the self-rebalancing k times for k insertions is still usually cheaper than rebalancing the whole tree naively, unless k is comparable to the original size of the tree. This is ignoring things like CPU caching, though; I don't know much about that.
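A rough cost comparison, up to constant factors and ignoring caching. It assumes "rebalancing the whole tree naively" means flattening the tree to a sorted sequence and rebuilding it perfectly balanced:

```latex
\begin{aligned}
T_{\text{repeated}}(n,k) &= \sum_{i=1}^{k} O\bigl(\log(n+i)\bigr) = O\bigl(k \log(n+k)\bigr) \\
T_{\text{rebuild}}(n,k)  &= O(n+k) \\
T_{\text{repeated}} \lesssim T_{\text{rebuild}}
  &\iff k \log(n+k) \lesssim n+k
  \iff k \lesssim \frac{n}{\log n} \qquad (\text{for } k \le n)
\end{aligned}
```

So under this model the crossover sits roughly around k on the order of n / log n, below the original size of the tree but approaching it.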