Welcome to Software Development on Codidact!

Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.

Post History

83%

+8 −0

Q&A Is an ORDER BY required when looping through MySQL records with LIMIT?

Any time an SQL SELECT query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query". If you...

posted 4y ago by Canina‭ · edited 3y ago by Canina‭

Answer

#3: Post edited by

Canina‭ · 2021-11-09T15:14:23Z (over 3 years ago)

Copy Link

Raw

Markdown

Any time an SQL `SELECT` query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".
If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?
So in principle, you could add an ORDER BY to that SELECT which will sort the records in a *random* order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.
In MySQL, that would be something like `ORDER BY RAND()`
In MS SQL Server, something like `ORDER BY NEWID()`
In Oracle, something like `ORDER BY dbms_random.value()`
Now consider how that will affect *this* query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for `LIMIT` in particular,
> With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
(quoted from the [MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement](https://dev.mysql.com/doc/refman/8.0/en/select.html))
Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in `ORDER BY RAND()`. Now the `LIMIT` clause explicitly applies to rows that have been sorted randomly.
**A sadistic RDBMS could, in principle, satisfy that query** by sorting the rows in such a way that for each consecutive `SELECT` with a `LIMIT` clause, you get the exact same set of *n* rows back. Unexpected, but legal considering what you asked for; a set (by row number) of rows out of a set of rows sorted in a random order.
Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, **you need that ORDER BY**.
You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So **better to just put the ORDER BY there and avoid the problem.**
If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an **order by the primary key column at the end**. It's likely to be cheap, and it will guarantee consistent ordering regardless of how the database engine otherwise decides to satisfy the query.
If you can't afford the processing cost of having the ORDER BY clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.

Any time an SQL `SELECT` query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".
If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?
So in principle, you could add an ORDER BY to that SELECT which will sort the records in a *random* order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.
In MySQL, that would be something like `ORDER BY RAND()`
In MS SQL Server, something like `ORDER BY NEWID()`
In Oracle, something like `ORDER BY dbms_random.value()`
Now consider how that will affect *this* query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for `LIMIT` in particular,
> With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
(quoted from the [MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement](https://dev.mysql.com/doc/refman/8.0/en/select.html))
Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in `ORDER BY RAND()`. Now the `LIMIT` clause explicitly applies to rows that have been sorted randomly.
**A sadistic RDBMS could, in principle, satisfy either of those queries** by sorting the rows in such a way that for each consecutive `SELECT` with a `LIMIT` clause, you get the exact same set of *n* rows back. Unexpected and unlikely, but perfectly permissible considering what you asked for; a set (by row number) of rows out of a set of rows sorted in either random order or an unspecified order.
Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive `SELECT`, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, **you need to include an `ORDER BY` clause**.
You *might* get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So **better to just put the `ORDER BY` there and avoid the problem.**
If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an **order by the primary key column at the end**. It's likely to be cheap, and while the resulting ordering of the rows might be non-obvious to a user, it will at least guarantee *consistent* ordering regardless of how the database engine decides to otherwise satisfy the query.
If for some reason you can't afford the processing cost of having the `ORDER BY` clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.
If you are using `LIMIT` to implement pagination, beware of the underlying tables being updated in between the various queries being executed.

#2: Post edited by

Canina‭ · 2020-10-23T21:28:49Z (over 4 years ago)

Copy Link

Raw

Markdown

~~Any time your SQL SELECT query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".~~
If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?
So in principle, you could add an ORDER BY to that SELECT which will sort the records in a *random* order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.
In MySQL, that would be something like `ORDER BY RAND()`
In MS SQL Server, something like `ORDER BY NEWID()`
In Oracle, something like `ORDER BY dbms_random.value()`
Now consider how that will affect *this* query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for `LIMIT` in particular,
> With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
(quoted from the [MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement](https://dev.mysql.com/doc/refman/8.0/en/select.html))
Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in `ORDER BY RAND()`. Now the `LIMIT` clause explicitly applies to rows that have been sorted randomly.
**A sadistic RDBMS could, in principle, satisfy that query** by sorting the rows in such a way that for each consecutive `SELECT` with a `LIMIT` clause, you get the exact same set of *n* rows back. Unexpected, but legal considering what you asked for; a set (by row number) of rows out of a set of rows sorted in a random order.
Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, **you need that ORDER BY**.
You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So **better to just put the ORDER BY there and avoid the problem.**
If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an **order by the primary key column at the end**. It's likely to be cheap, and it will guarantee consistent ordering regardless of how the database engine otherwise decides to satisfy the query.
If you can't afford the processing cost of having the ORDER BY clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.

Any time an SQL `SELECT` query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".
If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?
So in principle, you could add an ORDER BY to that SELECT which will sort the records in a *random* order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.
In MySQL, that would be something like `ORDER BY RAND()`
In MS SQL Server, something like `ORDER BY NEWID()`
In Oracle, something like `ORDER BY dbms_random.value()`
Now consider how that will affect *this* query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for `LIMIT` in particular,
> With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.
(quoted from the [MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement](https://dev.mysql.com/doc/refman/8.0/en/select.html))
Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in `ORDER BY RAND()`. Now the `LIMIT` clause explicitly applies to rows that have been sorted randomly.
**A sadistic RDBMS could, in principle, satisfy that query** by sorting the rows in such a way that for each consecutive `SELECT` with a `LIMIT` clause, you get the exact same set of *n* rows back. Unexpected, but legal considering what you asked for; a set (by row number) of rows out of a set of rows sorted in a random order.
Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, **you need that ORDER BY**.
You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So **better to just put the ORDER BY there and avoid the problem.**
If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an **order by the primary key column at the end**. It's likely to be cheap, and it will guarantee consistent ordering regardless of how the database engine otherwise decides to satisfy the query.
If you can't afford the processing cost of having the ORDER BY clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.

#1: Initial revision by

Canina‭ · 2020-10-23T21:28:06Z (over 4 years ago)

Copy Link

Raw

Markdown

Any time your SQL SELECT query does not have any explicit ORDER BY clause, I personally find it useful to mentally read it as saying "I don't care about ordering of the output of this query".

If you make the decision that you don't care about ordering, then nothing will be affected if the ordering of the rows in the output of that query changes, right?

So in principle, you could add an ORDER BY to that SELECT which will sort the records in a *random* order, and nobody and nothing should be any the wiser for the change, except perhaps the query taking slightly longer to execute.

In MySQL, that would be something like `ORDER BY RAND()`

In MS SQL Server, something like `ORDER BY NEWID()`

In Oracle, something like `ORDER BY dbms_random.value()`

Now consider how that will affect *this* query, and how the data returned by it is being used. Maybe it really won't have any effect at all. But for `LIMIT` in particular,

> With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return.

(quoted from the [MySQL 8.0 reference manual, section 13.2.10 on the SELECT statement](https://dev.mysql.com/doc/refman/8.0/en/select.html))

Nothing about this guarantees any ordering of the results that the limit applies to. So without any ORDER BY clause, we might as well put in `ORDER BY RAND()`. Now the `LIMIT` clause explicitly applies to rows that have been sorted randomly.

**A sadistic RDBMS could, in principle, satisfy that query** by sorting the rows in such a way that for each consecutive `SELECT` with a `LIMIT` clause, you get the exact same set of *n* rows back. Unexpected, but legal considering what you asked for; a set (by row number) of rows out of a set of rows sorted in a random order.

Therefore, if you want to be certain that the ordering of the rows is the same for each consecutive SELECT, and consequently that by iterating over the rows by row number in the output set you are guaranteed to receive each row exactly once, in the general case, **you need that ORDER BY**.

You might get by without it, until something changes and you no longer do, at which point you'll be getting screwy results and very likely some weird bug reports that you can't seem to reproduce. So **better to just put the ORDER BY there and avoid the problem.**

If you don't have anything else that is known to be unique and thus useful as a sort key (possibly as a final tie-breaker for more natural keys to sort by), then consider just sticking an **order by the primary key column at the end**. It's likely to be cheap, and it will guarantee consistent ordering regardless of how the database engine otherwise decides to satisfy the query.

If you can't afford the processing cost of having the ORDER BY clause, then you will need to deal with the fact that rows could be returned in any order at all, and write your code in such a way that it can cope with that.

Communities

Post History