Search tree supporting efficient bulk sequential insert


For holding ordered sets of keys, there are well-known data structures (the red-black tree, for example) that support O(log(n)) lookup and insertion. This trivially gives an algorithm for inserting a sequence of k keys in O(k log(n + k)) time. But if the keys to insert are given in sorted order, and it is known that they will form a contiguous run in the final set (i.e., they are all greater than or all less than every key already in the structure, or they all fall between two consecutive keys already in the structure), is there a data structure that can insert them more efficiently, say in O(k + log(n)) time, while still supporting O(log(n)) lookup?

Intuitively, it seems like after inserting the first key, we already know a lot about where in the tree the remaining keys need to go. And perhaps whatever rebalancing operations are needed can be batched so that the entire operation only has to walk up or down the tree a constant number of times—that's how I came to O(k + log(n)) as a target. But I haven't yet found a way to realize these intuitions.

Isn't the biggest problem here the rebalancing of the tree after each insertion? Algorithm theory typically fails to take that into account and relies on rebalancing happening instantly through "magic". I suppose the most efficient approach would be to add the contiguous sequence as a whole branch and then rebalance the whole tree once after that. There might be big cache benefits if the whole branch is allocated with a single heap allocation call. Lundin 19 days ago

The magic of self-balancing trees isn't that the rebalancing is assumed to be instant, just that it's fast enough that it doesn't change the big-O of the insertion (and this is true for, e.g., red-black trees). Running the self-rebalancing k times for k insertions is still usually cheaper than rebalancing the whole tree naively, unless k is comparable to the original size of the tree. This is ignoring things like CPU caching, though; I don't know much about that. r~~ 19 days ago
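
One way to explore the batching idea from the question and comments is to build the k new keys into their own balanced tree in O(k) and splice it into the existing tree with a split and two merges. The sketch below swaps in a treap (a randomized search tree) instead of a red-black tree, so the bounds are expected rather than worst-case: roughly O(k + log(n)) expected for the bulk insert and O(log(n)) expected for lookup. All names here (`Node`, `split`, `merge`, `build_sorted`, `bulk_insert`, `contains`, `inorder`) are hypothetical, and the code assumes without checking that the new keys are sorted, distinct from the existing keys, and contiguous in the final set.

```python
import random


class Node:
    __slots__ = ("key", "prio", "left", "right")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random heap priority keeps the tree balanced in expectation
        self.left = None
        self.right = None


def split(t, key):
    """Split treap t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        a, b = split(t.right, key)
        t.right = a
        return t, b
    a, b = split(t.left, key)
    t.left = b
    return a, t


def merge(a, b):
    """Merge two treaps, assuming every key in a is less than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b


def build_sorted(keys):
    """Build a treap from already-sorted keys in O(k) with a stack (Cartesian tree construction)."""
    stack = []  # right spine of the tree built so far, root at index 0
    for k in keys:
        node = Node(k)
        last = None
        while stack and stack[-1].prio < node.prio:
            last = stack.pop()
        node.left = last  # everything popped has smaller keys and lower priority
        if stack:
            stack[-1].right = node
        stack.append(node)
    return stack[0] if stack else None


def bulk_insert(root, keys):
    """Insert a sorted, contiguous run of new keys; contiguity is assumed, not verified."""
    if not keys:
        return root
    left, right = split(root, keys[0])  # one walk down the existing tree
    middle = build_sorted(keys)         # O(k), independent of n
    return merge(merge(left, middle), right)


def contains(t, key):
    """Ordinary binary-search-tree lookup."""
    while t is not None:
        if key == t.key:
            return True
        t = t.left if key < t.key else t.right
    return False


def inorder(t):
    """Return the keys in sorted order (used here only for checking)."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

For example, `bulk_insert(build_sorted([10, 20, 30, 40]), [21, 22, 23])` yields a tree whose in-order traversal is `[10, 20, 21, 22, 23, 30, 40]`. Whether a deterministic structure such as a red-black tree can meet the same bound in the worst case is a separate question that this sketch does not settle.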

0 answers
