What is the correct cost formula for the average case of the insertion sort algorithm?
I have been trying to learn about the cost of the insertion sort algorithm lately. I already understand the best case, worst case and average case formulas (namely $n-1$, $\frac{n(n-1)}{2}$ and $\frac{n(n-1)}{4}$, respectively). But I noticed that these formulas don't seem to work for small values of $n$. If we take, for example, an array of length $n=3$, with $T(n)$ denoting the number of comparisons needed, we get $T_{\text{best}}(3)=3-1=2$, $T_{\text{worst}}(3)=\frac{3(2)}{2}=3$, and $T_{\text{avg}}(3)=\frac{3(2)}{4}=\frac{3}{2}=1.5$. This is very strange, since the average case comes out more efficient than the best case, which is impossible. That means that $\frac{n(n-1)}{4}$ doesn't hold in this situation. So I looked at the graphs of these functions:
It seems like these formulas only work for $n\ge4$. Is that correct? Does that basically mean that, whenever we have a cost formula for an algorithm, it becomes more accurate as $n$ grows larger?
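To sanity-check the formulas, here is a small brute-force sketch (assuming "cost" means the number of key comparisons made by the textbook insertion sort; the page I link below may count slightly differently):

```python
from itertools import permutations

def insertion_sort_comparisons(values):
    """Count key comparisons made by the textbook insertion sort."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1        # one comparison of a[j] against key
            if a[j] > key:
                a[j + 1] = a[j]     # shift the larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

for n in range(2, 7):
    counts = [insertion_sort_comparisons(p) for p in permutations(range(n))]
    average = sum(counts) / len(counts)
    print(f"n={n}: best={min(counts)}, worst={max(counts)}, "
          f"exact average={average:.3f}, n(n-1)/4={n * (n - 1) / 4:.3f}")
```

For every $n$ the exact average lands between the best and the worst case (for $n=3$ it is about $2.67$, not $1.5$), which makes me suspect that $\frac{n(n-1)}{4}$ is only an approximation.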
I also came across this page, which says that the formula $\frac{n(n-1)}{4}$ is actually wrong. To start with, I didn't understand the part where he came up with the first formula:
$\frac{1}{2}(1_{\text{element is in place}} + i_{\text{element is smallest yet}})$
According to the answer it should have been $$\sum_{i=1}^{n-1} \frac{i+1}{2} = \frac{(n-1)n}{4} + \frac{n-1}{2} = \frac{(n-1)(n+2)}{4}$$ instead. Can someone please explain to me how he came up with this? And why is the original one, $\frac{n(n-1)}{4}$, wrong?
1 answer
The difference (and the page that you cite does mention it, if only broadly) is that algorithmic analysis has interests and (arguably) traditions that most people find counter-intuitive. Specifically, an algorithm's complexity describes how its resource usage (usually time, but not always) grows asymptotically, which is why a lot of textbooks call it "asymptotic notation" as a reminder.
This means a few things.
- We only care about the fastest-growing expression. This has theoretical reasons in terms of simplifying the analysis to whatever part of the algorithm dominates the conversation, but also practical applications, in that optimizing the piddly logarithmic part of an algorithm doesn't matter when another part runs in factorial time. I'll call this out at the end, because this answers your "bigger" question.
- The complexity only "matters" where it's monotonic, only moving in one direction. If adding another input element sometimes increases the resource usage and sometimes decreases it, none of the analytical methods can deal with that.
- We pretend that we have infinite input, because (a) that's where the function is most likely to be monotonic, and (b) algorithmic analysis (and optimization) matters most in streaming situations, where you can either keep up with the input or fail miserably. Also, for small input sizes, you could just preprocess everything and not care at all, no matter how long it takes. This is so ingrained that weirdness like what you noticed for small values gets called (and dismissed as) a "startup anomaly": something mildly interesting if you only ever deal with those small amounts, but forgotten quickly as the inputs get bigger.
- Finally, we mostly ignore constants (addends and factors), because we can resolve those by spending money on better hardware.
When somebody says that an analysis is "wrong" because it overlooks a term in the sum (as the answer that you pointed to does), they mean that they'd rather be more precise. But as that answer indicates, it probably doesn't matter, because for most purposes in theory and practice we only care about the dominant term, the quadratic-growth part.
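To make the "dominant term" point concrete, here is a throwaway comparison (nothing algorithm-specific, just the two closed-form formulas from your question):

```python
# Compare the formula from the question with the corrected one from the
# linked answer. Both grow like n^2/4, so the relative gap vanishes.
for n in (3, 10, 100, 10_000, 1_000_000):
    rough = n * (n - 1) / 4           # n(n-1)/4, the formula in question
    fuller = (n - 1) * (n + 2) / 4    # (n-1)(n+2)/4, with the linear correction
    print(f"n={n:>9}: rough={rough:.1f}, fuller={fuller:.1f}, "
          f"relative gap={(fuller - rough) / fuller:.2%}")
```

At $n=3$ the gap is a hefty 40%, which is exactly the small-input weirdness you noticed; by $n=100$ it is already under 2%, and it keeps shrinking like $\frac{2}{n+2}$.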
As for where that extra term comes from, think of summation like integration, as in calculus.
$$\sum_{i=1}^{n-1} \frac{i + 1}{2} \approx \int \frac{i + 1}{2}\,di = \int \frac{i}{2}\,di + \int \frac{1}{2}\,di = \frac{i^2}{4} + \frac{i}{2}$$

Evaluating that at roughly $i = n$ gives $\frac{n^2}{4} + \frac{n}{2}$, which matches the shape of the corrected sum from your question. Don't actually do this on an exam, by the way. It will give you the right answer, but summation and integration are technically distinct: calculus only works on continuous functions, while we treat computers as discrete and cost measurements as isolated data points, so mixing the operations will make any serious instructor extremely uncomfortable...
And technically, we should add $+\,c$ to the end of that result, too, an "arbitrary constant," because if we take the derivative of the result, any added constant (anything from $-\infty$ to $\infty$) disappears and we get back the same function that we integrated/summed. But again, we drop that, because (a) we didn't really want to integrate in the first place, and (b) a constant-time term wouldn't matter for any serious algorithmic analysis.
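And if you would rather avoid the calculus hand-waving entirely, splitting the sum term by term gives exactly the expression quoted in your question, with no arbitrary constant to argue about:

$$\sum_{i=1}^{n-1} \frac{i+1}{2} = \frac{1}{2}\sum_{i=1}^{n-1} i + \frac{1}{2}\sum_{i=1}^{n-1} 1 = \frac{1}{2}\cdot\frac{(n-1)n}{2} + \frac{n-1}{2} = \frac{(n-1)n}{4} + \frac{n-1}{2} = \frac{(n-1)(n+2)}{4}$$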