Notifications
Mark all as read
Q&A

Iterating through a MySQL table with a LIMIT clause and PHP gets progressively slower

+3
−0

I have a very large table that I need to iterate through with a PHP script and I can't do all of the results at once so I do it in sections with a LIMIT like

for ($x = 0; $x < 10000000; $x += 10000) {
    $sql ="
    SELECT * FROM
    example_table 
    LIMIT ".$x.",10000
    ORDER BY primary_key_column";
    $results = $connection->results($sql);
    //Do stuff with results
}

At first the select returns the data very quickly but as time goes on they take more and more time until it plateaus at around 12x15 times longer than the first select.

Would there be a way to do this where the subsequent loops don't take so much longer?

Why does this post require moderator attention?
You might want to add some details to your flag.
Why should this post be closed?

0 comments

1 answer

+4
−0

LIMIT/OFFSET, fundamentally, is slow for large datasets. With a clause like LIMIT 10000 OFFSET 150000, you're asking the query engine to find 160,000 rows, then throw away 150,000 of them. Hardly efficient. It works... okay, for small page sizes and not-too-huge datasets, but when you're asking for lots of data in a large dataset, you're going to hit problems fast.

The most common way past is simply to use a WHERE condition to remove the need for the OFFSET. Your data does need to have a unique index column for this to work, which I'm assuming here is called id. When you query for the first 10k rows, use LIMIT 10000, and save the greatest ID from those 10k rows. On the next query for the next 10k, use the same LIMIT 10000, but add a WHERE id > your_previous_max_id to the query. This is faster, because you're only fetching 10k rows not 20k-then-throw-away-10k, and the query engine can use the unique index to scan straight to the start of the relevant data.

So, your first query looks like this:

SELECT *
FROM example_table
LIMIT 10000

then, assuming the greatest ID in that set is 10000, your second query:

SELECT *
FROM example_table
WHERE id > 10000
LIMIT 10000

and so on.

Why does this post require moderator attention?
You might want to add some details to your flag.

0 comments

Sign up to answer this question »