Alternatives to `EXPLAIN ANALYZE` for queries that won't complete

I have a large and complex PostgreSQL SELECT query that I would like to make faster. EXPLAIN suggests it should run quickly, with the worst parts being scans of a few thousand rows. When run, however, it does not complete in any reasonable amount of time. (With statement_timeout disabled, it eventually still gives up, complaining about having exceeded temporary file size limits, which suggests something is loading far more data than expected.)

Usually, this would suggest to me that EXPLAIN's estimates are horribly inaccurate in some way, and I would try EXPLAIN ANALYZE to see what's really happening. But since this particular query is so bad I can't run it at all, I also can't run it with EXPLAIN ANALYZE.

What other tools are at my disposal for this sort of situation? Can I ask PostgreSQL for some sort of partial or time-limited EXPLAIN ANALYZE, as in "run this for five minutes, then stop and tell me what you spent those five minutes doing"? If I start commenting out bits of the query until it goes fast again, can I rely on the results being accurate, or does PostgreSQL's optimizer work more globally than that?

(Query itself omitted because I've run into this situation a few times, and would like general strategies rather than an answer for this specific query.)

postgresql database-performance

posted over 1 year ago

CC BY-SA 4.0

Emily‭

1 comment thread

Would running the query on a new table -- same DDL but with, say, 10 rows copied into it from the rea... (4 comments)

2 answers

You can try to break up the query into CTEs, and then see if any of the individual CTEs are unusually slow.

I am guessing the query is not just one select, but probably has subqueries, window functions, aggregations, joins and so on. All of these can be split into CTEs pretty easily (if you have questions about specific syntax, like "how do I move aggregation to a CTE" that's worth a separate question). Some code editors can even do that refactor automatically.

Binary search is a good approach here. Start by splitting off half your query into a CTE, and leave the rest as the final query. Run the CTE on its own and see whether it is much faster than the full query. If so, extract half of the remaining query as a second CTE. If not, split the CTE itself in half instead. Repeating this should eventually identify the specific part that is slowing things down.
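A minimal sketch of one bisection step, assuming a hypothetical query over `orders`, `customers`, and `payments` (none of these names come from the question). The candidate half is run on its own under a timeout; if it finishes, `EXPLAIN ANALYZE` shows where the row estimates diverge from the actual counts.

```sql
-- Hypothetical schema: orders(id, customer_id, created_at),
-- customers(id, region), payments(order_id, amount).
-- Cap the runtime so a bad half fails fast instead of filling temp space.
SET statement_timeout = '5min';

-- Step 1: run the candidate half by itself and compare estimated vs. actual rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.customer_id, c.region
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= DATE '2023-01-01';

-- Step 2: if that half is fast, put it back as a CTE and add the next piece.
-- MATERIALIZED (PostgreSQL 12+) stops the planner from inlining the CTE,
-- so it is computed separately from the outer query.
EXPLAIN (ANALYZE, BUFFERS)
WITH recent_orders AS MATERIALIZED (
    SELECT o.id, o.customer_id, c.region
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= DATE '2023-01-01'
)
SELECT r.region, count(*) AS paid_orders
FROM recent_orders r
JOIN payments p ON p.order_id = r.id
GROUP BY r.region;

RESET statement_timeout;
```

If a step hits the timeout, the statement is cancelled and no plan is printed, which marks that piece as the one to split further. Note that without `MATERIALIZED`, PostgreSQL 12 and later may fold a CTE back into the outer query, so the plan for the combined statement can differ from the plans of the pieces.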

posted over 1 year ago

CC BY-SA 4.0

matthewsnyder‭

Note: I have limited experience with PostgreSQL but extensive experience with SQL Server, so not everything below may apply to PostgreSQL.

> I have a large and complex PostgreSQL query that I would like to make faster. (..) When run, it does not complete in any reasonable amount of time

I assume we are talking about a SELECT statement. One quick change to try is adding a LIMIT and seeing whether the query completes when it returns only a small number of rows.
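A rough sketch of the LIMIT test, assuming a hypothetical two-table query in place of the real one. Wrapping the original SELECT in a subquery keeps its text unchanged, and the timeout keeps a bad run from dragging on.

```sql
-- Hypothetical query; substitute the real SELECT for the inner subquery.
SET statement_timeout = '60s';

-- EXPLAIN (ANALYZE) executes the statement and reports actual row counts
-- for the plan nodes that ran before the LIMIT was satisfied.
EXPLAIN (ANALYZE)
SELECT *
FROM (
    SELECT o.id, c.region
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
) AS q
LIMIT 10;

RESET statement_timeout;
```

Two caveats apply. The planner takes the LIMIT into account, so the plan may differ from the one chosen for the unlimited query. And if the query contains a sort or aggregate over the whole input, all of that input still has to be processed before the first row is returned, so a LIMIT may not make it finish any sooner.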

However, I think the real issue is that the query has become large and complex. It should be broken into multiple statements with the help of temporary tables. These statements can also be encapsulated in a stored procedure.

The code structure can look like the following:

1. Create the temporary table (i.e. empty, containing the output structure).
2. Minimally populate the temporary table, for example with only a few columns populated with values (the rest remain NULL).
3. Add UPDATE statements to deal with the rest of the columns. Define as many UPDATEs as are needed to get good enough performance.
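The three steps above could be sketched as follows in PostgreSQL. All table and column names (`report_stage`, `orders`, `customers`, `payments`) are hypothetical and only illustrate the shape of the approach.

```sql
-- 1. Create the empty temporary table with the output structure.
CREATE TEMPORARY TABLE report_stage (
    order_id    bigint PRIMARY KEY,
    customer_id bigint NOT NULL,
    region      text,      -- filled in later
    total_paid  numeric    -- filled in later
);

-- 2. Populate only the key columns; the rest stay NULL for now.
INSERT INTO report_stage (order_id, customer_id)
SELECT o.id, o.customer_id
FROM orders o
WHERE o.created_at >= DATE '2023-01-01';

-- Autovacuum cannot access temporary tables, so collect statistics by hand
-- before the planner has to join against this table.
ANALYZE report_stage;

-- 3. Fill the remaining columns, one UPDATE per concern.
UPDATE report_stage s
SET region = c.region
FROM customers c
WHERE c.id = s.customer_id;

UPDATE report_stage s
SET total_paid = p.total
FROM (
    SELECT order_id, sum(amount) AS total
    FROM payments
    GROUP BY order_id
) AS p
WHERE p.order_id = s.order_id;

SELECT * FROM report_stage;
```

Because each statement runs on its own, each one can be timed or prefixed with `EXPLAIN ANALYZE` separately, which makes it easier to see which step is the expensive one.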

Another advantage of this approach is readability (smaller queries) and maintainability (e.g. easier to change when a column is added as this affects a small query).

Other things to consider:

- Historical data: if the query deals with historical data aggregates, these can be precomputed in persisted tables.
- Indexes: consider adding covering indexes.
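As an illustration of both points, here is a sketch under assumed names (an `orders` table with `customer_id`, `created_at`, and `total` columns, none of which come from the question). It uses a materialized view as one possible way to persist precomputed aggregates; an ordinary table filled by a scheduled job would serve the same purpose.

```sql
-- Covering index (PostgreSQL 11+): INCLUDE stores extra columns in the index,
-- so an index-only scan can often answer the query without visiting the table.
CREATE INDEX orders_customer_created_idx
    ON orders (customer_id, created_at)
    INCLUDE (total);

-- Precomputed historical aggregate, persisted as a materialized view.
CREATE MATERIALIZED VIEW monthly_order_totals AS
SELECT date_trunc('month', created_at) AS month,
       customer_id,
       sum(total) AS month_total
FROM orders
GROUP BY 1, 2;

-- Re-run whenever the underlying history changes.
REFRESH MATERIALIZED VIEW monthly_order_totals;
```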

posted over 1 year ago

CC BY-SA 4.0

Alexei‭


1 comment thread

Edited `SELECT` into the question for clarity, thanks. The query in question is mostly autogenerat... (2 comments)
