Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Is an ORDER BY required when looping through MySQL records with LIMIT?
Let's say I have a table with 10,000 rows and instead of selecting all of the rows at once I select 1,000 at a time like
LIMIT 0,1000
LIMIT 1000,1000
etc.
Without an ORDER BY statement the order of results in MySQL is not guaranteed, could I possibly miss records if I do not specify a column to ORDER BY?
1 answer
Any time an SQL SELECT
query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".
If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?
So in principle, you could add an ORDER BY to that SELECT which will sort the records in a random order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.
In MySQL, that would be something like ORDER BY RAND()
In MS SQL Server, something like ORDER BY NEWID()
In Oracle, something like ORDER BY dbms_random.value()
Now consider how that will affect this query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for LIMIT
in particular,
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
(quoted from the MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement)
Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in ORDER BY RAND()
. Now the LIMIT
clause explicitly applies to rows that have been sorted randomly.
A sadistic RDBMS could, in principle, satisfy either of those queries by sorting the rows in such a way that for each consecutive SELECT
with a LIMIT
clause, you get the exact same set of n rows back. Unexpected and unlikely, but perfectly permissible considering what you asked for; a set (by row number) of rows out of a set of rows sorted in either random order or an unspecified order.
Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT
, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, you need to include an ORDER BY
clause.
You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So better to just put the ORDER BY
there and avoid the problem.
If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an order by the primary key column at the end. It's likely to be cheap, and while the resulting ordering of the rows might be non-obvious to a user, it will at least guarantee consistent ordering regardless of how the database engine decides to otherwise satisfy the query.
If for some reason you can't afford the processing cost of having the ORDER BY
clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.
If you are using LIMIT
to implement pagination, beware of the underlying tables being updated in between the various queries being executed.
0 comment threads