Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Q&A

Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Is an ORDER BY required when looping through MySQL records with LIMIT?

+5
−0

Let's say I have a table with 10,000 rows and instead of selecting all of the rows at once I select 1,000 at a time like

LIMIT 0,1000
LIMIT 1000,1000

etc.

Without an ORDER BY statement the order of results in MySQL is not guaranteed, could I possibly miss records if I do not specify a column to ORDER BY?

History
Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comment threads

1 answer

+8
−0

Any time an SQL SELECT query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".

If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?

So in principle, you could add an ORDER BY to that SELECT which will sort the records in a random order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.

In MySQL, that would be something like ORDER BY RAND()

In MS SQL Server, something like ORDER BY NEWID()

In Oracle, something like ORDER BY dbms_random.value()

Now consider how that will affect this query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for LIMIT in particular,

With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.

(quoted from the MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement)

Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in ORDER BY RAND(). Now the LIMIT clause explicitly applies to rows that have been sorted randomly.

A sadistic RDBMS could, in principle, satisfy either of those queries by sorting the rows in such a way that for each consecutive SELECT with a LIMIT clause, you get the exact same set of n rows back. Unexpected and unlikely, but perfectly permissible considering what you asked for; a set (by row number) of rows out of a set of rows sorted in either random order or an unspecified order.

Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, you need to include an ORDER BY clause.

You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So better to just put the ORDER BY there and avoid the problem.

If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an order by the primary key column at the end. It's likely to be cheap, and while the resulting ordering of the rows might be non-obvious to a user, it will at least guarantee consistent ordering regardless of how the database engine decides to otherwise satisfy the query.

If for some reason you can't afford the processing cost of having the ORDER BY clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.

If you are using LIMIT to implement pagination, beware of the underlying tables being updated in between the various queries being executed.

History
Why does this post require moderator attention?
You might want to add some details to your flag.

0 comment threads

Sign up to answer this question »