Welcome to Software Development on Codidact!
Are there best practices for sticking conditions in WHERE clauses vs the JOIN statement?
Let's say I have two tables, A and B, and I need to join a subset of them.
Are there best practices for putting the conditions in the WHERE clause like this,
SELECT *
FROM A
JOIN B on a.fk_b = b.pk
WHERE a.pk <10000
versus sticking the condition in the JOIN like this,
SELECT *
FROM A
JOIN B on a.fk_b = b.pk
AND a.pk <10000
For these, it doesn't make any difference in speed or results, but are there best practices for where to put the conditions?
3 answers
@BruceAlderman gave a good answer that covers most aspects of this. I'm not very good at SQL, so my answer is more general.
When I have to choose between two options that are equivalent in performance and functionality, readability is the only thing left to decide on. I then try to describe the result of the operation in plain English (or any other spoken language) and pick whichever code most accurately describes the intention of the operation.
I like it when programming languages have features that make it easier to express your intention. One of my favorites is the keyword unless in Ruby. AFAIK it's completely equivalent to if not. But in many cases it sounds far more natural. if not <condition> tends to give the message "if this condition is not met" in a very dry manner. unless <condition> tends to give the message "Always do this. Well, unless this very unlikely event has happened."
0 comment threads
SQL is a declarative language, and the form of the query does not dictate the form of the query plan that actually retrieves the data. So these two queries might be not only the same speed, but actually map to the exact same query plan to be executed.
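One way to check this claim on a particular engine is to run both forms and compare the rows and the reported plans. The following is a minimal sketch using Python's built-in sqlite3 module; the table definitions and sample rows are made up for illustration, and SQLite is only a stand-in for whatever database you actually use, so the plans it reports say nothing about other engines.

```python
import sqlite3

# In-memory database with hypothetical tables shaped like the question's A and B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (pk INTEGER PRIMARY KEY);
    CREATE TABLE A (pk INTEGER PRIMARY KEY, fk_b INTEGER REFERENCES B(pk));
    INSERT INTO B (pk) VALUES (1), (2);
    INSERT INTO A (pk, fk_b) VALUES (1, 1), (2, 2), (20000, 1);
""")

# The two forms from the question (explicit columns and ORDER BY added
# so the results are easy to compare).
WHERE_FORM = (
    "SELECT a.pk, b.pk FROM A JOIN B ON a.fk_b = b.pk "
    "WHERE a.pk < 10000 ORDER BY a.pk"
)
ON_FORM = (
    "SELECT a.pk, b.pk FROM A JOIN B ON a.fk_b = b.pk "
    "AND a.pk < 10000 ORDER BY a.pk"
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

rows_where = conn.execute(WHERE_FORM).fetchall()
rows_on = conn.execute(ON_FORM).fetchall()

print("same rows:", rows_where == rows_on)
print("plan (WHERE form):", plan(WHERE_FORM))
print("plan (ON form):   ", plan(ON_FORM))
```

For an inner join the two result sets match. Whether the plans also match depends on the engine and its version, which is why it is worth printing them rather than assuming.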
So if speed and results don't give an advantage to one form, the best practice is to go with the one that is easier to read. In this case, I'd go with the first one.
In the second query, the JOIN handles both joining and filtering the data. By using a separate WHERE clause, you separate the actions of joining and filtering into different clauses.
If a second filter needs to be added later, it's a little less intuitive to add it to the JOIN. It's possible a later developer might create a WHERE clause, and then you'd have the JOIN filtering and joining, and the WHERE filtering as well.
But if you're already using a WHERE for filtering, it's simple and intuitive to add another condition to the WHERE.
1 comment thread
Some queries require you to put conditions in the join clause.
For example, if you're doing a LEFT OUTER JOIN, and you want to match all rows from table A with only the rows from table B with the corresponding id and another condition, then you must put the condition in the join clause.
SELECT *
FROM A
LEFT OUTER JOIN B on a.fk_b = b.pk
AND b.pk <10000
Because if you had put that condition in the WHERE clause, then b.pk < 10000 would exclude every row where the outer join found no match (b.pk is NULL in those rows, so the comparison is not true), and the query would function as an inner join.
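The difference in the outer join case can be seen with a small experiment. This is a sketch using Python's built-in sqlite3 module with invented sample data (three rows in A, two in B); the table layout is an assumption made for the example, not something given in the question.

```python
import sqlite3

# Hypothetical sample data: A row 1 matches a small b.pk, A row 2 matches
# a large b.pk, and A row 3 has no matching B row at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE B (pk INTEGER PRIMARY KEY);
    CREATE TABLE A (pk INTEGER PRIMARY KEY, fk_b INTEGER);
    INSERT INTO B (pk) VALUES (1), (20000);
    INSERT INTO A (pk, fk_b) VALUES (1, 1), (2, 20000), (3, NULL);
""")

# Condition in the join clause: every A row is kept, B is filled in
# only when the match also satisfies b.pk < 10000.
on_rows = conn.execute(
    "SELECT a.pk, b.pk FROM A LEFT OUTER JOIN B ON a.fk_b = b.pk "
    "AND b.pk < 10000 ORDER BY a.pk"
).fetchall()

# Condition in the WHERE clause: rows with a NULL b.pk fail the filter.
where_rows = conn.execute(
    "SELECT a.pk, b.pk FROM A LEFT OUTER JOIN B ON a.fk_b = b.pk "
    "WHERE b.pk < 10000 ORDER BY a.pk"
).fetchall()

# Plain inner join with the same filter, for comparison.
inner_rows = conn.execute(
    "SELECT a.pk, b.pk FROM A JOIN B ON a.fk_b = b.pk "
    "WHERE b.pk < 10000 ORDER BY a.pk"
).fetchall()

print("condition in ON:   ", on_rows)
print("condition in WHERE:", where_rows)
print("inner join:        ", inner_rows)
```

With the condition in the join clause all three A rows come back, two of them with a NULL b.pk. With the condition in the WHERE clause only the first row survives, which is the same result as the inner join.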
Aside from those cases where the logic demands it, I don't think there is a "best practice." It's up to personal preference.
As you noted, the MySQL query optimizer should behave the same for an inner join, regardless of whether you put the condition in the join clause or the where clause.
Not all SQL implementations behave this way, though. There could be some that optimize differently depending on the syntax you use. That could be considered a design flaw, but nevertheless, you should be aware of it and test to make sure the implementation you use behaves the way you expect.
My personal preference given the choice is to put conditions in the JOIN clause only if they pertain to the join itself. If there are other conditions that are simply row restrictions, I put them in the WHERE clause. To me, this is more clear and intention-revealing.
When there is no functional reason to prefer one style over the other, it's best to adopt a style and use it as consistently as you can within a given project. The "worst practice" is to flip-flop arbitrarily between different code styles, because this confuses anyone who needs to read or maintain the code. They won't know why you have two different styles, and whether it's important.
0 comment threads