
Welcome to Software Development on Codidact!


What is the correct cost formula for the average case in insertion sort algorithm?


I have been trying to learn about the cost of the insertion sort algorithm lately. I already understand the best-case, worst-case, and average-case formulas ($n-1$, $\frac{n(n-1)}{2}$, and $\frac{n(n-1)}{4}$ respectively). But I noticed that these formulas don't work for small values of $n$. Take, for example, an array of length $n=3$, with $T(n)$ denoting the number of comparisons needed: we get $T_{\text{best}}(3)=3-1=2$, $T_{\text{worst}}(3)=\frac{3(2)}{2}=3$, and $T_{\text{avg}}(3)=\frac{3(2)}{4}=\frac{3}{2}=1.5$. This is very strange, since the average case comes out more efficient than the best case, which is impossible. That means $\frac{n(n-1)}{4}$ doesn't hold in this situation. So I looked at the graphs of these functions:

[Graph of the three cost formulas]

It seems that these formulas only work for $n\ge4$. Is that correct? Does it basically mean that, whenever we have a cost formula for an algorithm, it becomes more accurate as $n$ grows larger?
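To double-check the small-$n$ behaviour, I counted comparisons by brute force over all $3! = 6$ permutations (a quick sketch; it counts each evaluation of the inner while-loop guard as one comparison, and other counting conventions give slightly different numbers):

```python
from itertools import permutations
from fractions import Fraction

def insertion_sort_comparisons(a):
    """Insertion-sort a copy of `a`; return the number of key
    comparisons (evaluations of the while-loop guard)."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison: a[j] > key?
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons

n = 3
counts = [insertion_sort_comparisons(p) for p in permutations(range(n))]
best, worst = min(counts), max(counts)
avg = Fraction(sum(counts), len(counts))
print(best, worst, avg)  # 2 3 8/3
```

This gives best $=2$, worst $=3$, and an exact average of $8/3\approx2.67$, which does lie between the best and worst cases. So the average really is between the extremes, and the impossible-looking $1.5$ comes from the formula being an approximation, not from the algorithm itself.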

I also came across this page, which says that the formula $\frac{n(n-1)}{4}$ is actually wrong. To begin with, I didn't understand the part where he came up with the first formula:

$\frac{1}{2}(1_{\text{element is in place}} + i_{\text{element is smallest yet}})$

According to the answer it should have been $$\sum_{i=1}^{n-1} \frac{i+1}2 = \frac{(n-1)n}4 + \frac{n-1}2 = \frac{(n-1)(n+2)}{4}$$ instead. Can someone please explain to me how he came up with this? And why is the original one, $\frac{n(n-1)}{4}$, wrong?


1 answer


The difference (and the page that you cite actually does mention it, if only broadly) is that algorithmic analysis has interests and (arguably) traditions that most people find counter-intuitive. Specifically, an algorithm's complexity describes the asymptotic behavior of how its resource usage (usually time, but not always) grows, and many textbooks call this "asymptotic notation" as a reminder.

This means a few things.

  • We only care about the fastest-growing expression. This has theoretical reasons in terms of simplifying the analysis to whatever part of the algorithm dominates the conversation, but also practical applications, in that optimizing the piddly logarithmic part of an algorithm doesn't matter when another part runs in factorial time. I'll call this out at the end, because this answers your "bigger" question.
  • The complexity only "matters" where it's monotonic, only moving in one direction. If adding another input element sometimes increases the resource usage and sometimes decreases it, none of the analytical methods can deal with that.
  • We pretend that we have infinite input, because (a) that's where the function is definitely monotonic, and (b) algorithmic analysis (and optimization) matters most in streaming situations, where you either keep up with the input or fail miserably. Also, for small input sizes, you could just preprocess everything and not care at all, no matter how long it takes. This is so ingrained that weirdness like you noticed for small values gets called (and dismissed as) a "startup anomaly": something mildly interesting if you only deal with those small amounts, but forgotten quickly as the inputs get bigger.
  • Finally, we mostly ignore constants (addends and factors), because we can resolve those by spending money on better hardware.

When somebody says that an analysis is "wrong" because it overlooks a term in the sum (as the answer that you pointed to does), they mean that they'd rather be more precise. But as that answer indicates, it probably doesn't matter, because for most purposes in theory and practice we only care about the dominant term: the quadratic-growth part.
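You can see how fast the overlooked term stops mattering by comparing the two closed forms directly (a small sketch; the ratio simplifies to $n/(n+2)$, which tends to $1$):

```python
from fractions import Fraction

def approx(n):
    """The commonly quoted average-case count: n(n-1)/4."""
    return Fraction(n * (n - 1), 4)

def refined(n):
    """The more precise version from the linked answer: (n-1)(n+2)/4."""
    return Fraction((n - 1) * (n + 2), 4)

for n in (4, 10, 100, 10_000):
    # ratio climbs toward 1.0 as n grows
    print(n, float(approx(n) / refined(n)))
```

Already at $n=100$ the two formulas differ by about 2%, and the gap keeps shrinking; that is the sense in which both are "the same" asymptotically.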

As for where that extra term comes from, think of summation like integration, as in calculus.

$$\sum_{i=1}^{n-1} \frac{i+1}{2} \approx \int_0^n \frac{i+1}{2}\,di = \int_0^n \frac{i}{2}\,di + \int_0^n \frac{1}{2}\,di = \frac{n^2}{4} + \frac{n}{2}$$

Don't actually do this on an exam, by the way. It will give you the right answer, but summation and integration are technically distinct operations: calculus only works on continuous functions, while we treat computers as discrete and cost measurements as isolated data points, so mixing the two will make any serious instructor extremely uncomfortable...

And technically, we should add + c to the end of that result, too, an "arbitrary constant," because taking the derivative of the resulting function gives back the same function that we integrated/summed no matter which constant we add. But again, we drop that, because (a) we didn't really want to integrate, and (b) a constant-time term wouldn't matter for any serious algorithmic analysis.
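If you want to convince yourself that the integral really does track the sum, compare exact values (a sketch using the closed forms above; Fraction keeps the arithmetic exact):

```python
from fractions import Fraction

def exact_sum(n):
    """The discrete sum: sum_{i=1}^{n-1} (i+1)/2 = (n-1)(n+2)/4."""
    return sum(Fraction(i + 1, 2) for i in range(1, n))

def integral_approx(n):
    """The integral stand-in: n^2/4 + n/2."""
    return Fraction(n * n, 4) + Fraction(n, 2)

for n in (10, 100, 1000):
    s, a = exact_sum(n), integral_approx(n)
    # the ratio a/s heads to 1 as n grows
    print(n, float(s), float(a), float(a / s))
```

The two differ by exactly $(n+2)/4$, a linear term, so their ratio goes to $1$ even though neither value ever equals the other: the dominant-term story again.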
