Iterating through a MySQL table with a LIMIT clause and PHP gets progressively slower
I have a very large table that I need to iterate through with a PHP script. I can't process all of the results at once, so I fetch them in sections with a LIMIT clause, like this:
for ($x = 0; $x < 10000000; $x += 10000) {
    // Fetch the next 10,000 rows, skipping the $x rows already handled
    $sql = "
        SELECT * FROM example_table
        ORDER BY primary_key_column
        LIMIT " . $x . ",10000";
    $results = $connection->results($sql);
    // Do stuff with results
}
At first each SELECT returns its data very quickly, but as the loop progresses the queries take more and more time, plateauing at around 12 to 15 times longer than the first one.
Is there a way to do this so that the subsequent iterations don't take so much longer?
1 answer
LIMIT/OFFSET, fundamentally, is slow for large datasets. With a clause like LIMIT 10000 OFFSET 150000, you're asking the query engine to find 160,000 rows, then throw away 150,000 of them. Hardly efficient. It works okay for small page sizes and not-too-huge datasets, but when you're asking for lots of data in a large dataset, you're going to hit problems fast.
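To make that concrete, here are the two extremes written out as queries (table and column names taken from the question); the second one has to walk past all 150,000 skipped rows before it can return anything:

SELECT * FROM example_table ORDER BY primary_key_column LIMIT 10000 OFFSET 0;       -- reads 10,000 rows, returns 10,000
SELECT * FROM example_table ORDER BY primary_key_column LIMIT 10000 OFFSET 150000;  -- reads 160,000 rows, returns only the last 10,000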
The most common way around this is simply to use a WHERE condition to remove the need for the OFFSET. Your data does need to have a unique, indexed column for this to work, which I'm assuming here is called id. When you query for the first 10k rows, use LIMIT 10000 and save the greatest ID from those 10k rows. On the next query for the next 10k, use the same LIMIT 10000, but add WHERE id > your_previous_max_id to the query. This is faster because you're only fetching 10k rows rather than fetching 20k and throwing away 10k, and the query engine can use the unique index to seek straight to the start of the relevant data.
So, your first query looks like this:
SELECT *
FROM example_table
ORDER BY id
LIMIT 10000
then, assuming the greatest ID in that set is 10000, your second query:
SELECT *
FROM example_table
WHERE id > 10000
ORDER BY id
LIMIT 10000
and so on.
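For completeness, here is a rough sketch of how the loop from the question could be rewritten to use this approach. It keeps the question's $connection->results() helper and primary_key_column name, and assumes results() returns an array of associative rows; in real code you would normally bind the ID through a prepared statement rather than concatenating it into the SQL:

$lastId = 0;
do {
    // Fetch the next batch, starting just after the last ID already processed
    $sql = "
        SELECT * FROM example_table
        WHERE primary_key_column > " . (int)$lastId . "
        ORDER BY primary_key_column
        LIMIT 10000";
    $results = $connection->results($sql);

    foreach ($results as $row) {
        // Do stuff with the row, then remember its ID for the next batch
        $lastId = $row['primary_key_column'];
    }
} while (count($results) === 10000);

Tracking the last seen ID also means the loop naturally stops when a batch comes back smaller than the page size, instead of relying on a hard-coded upper bound like 10000000.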