Welcome to Software Development on Codidact!
Will you help us build our independent community of developers helping developers? We're small and trying to grow. We welcome questions about all aspects of software development, from design to code to QA and more. Got questions? Got answers? Got code you'd like someone to review? Please join us.
Post History
Facts In relational theory, each row in a relation is uniquely identified by a primary key. That's why some purists say that every table should have a primary key. Foreign keys usually reference a ...
Answer
#2: Post edited
- In relational theory, each row in a relation is uniquely identified by a primary key. That's why some purists say that every table should have a primary key.
- Foreign keys usually reference a primary key. In particular, if you do not specify a column when creating a foreign key, the primary key of the referenced table is assumed. That's why primary keys should be immutable, and benefit from being short, so they're easy to compare and to store in indexes.
- Primary keys must not contain NULL values.
- There can only be a single primary key per table. Most databases create a clustered index for a primary key, which means that the table is stored in index order, making table accesses in index order sequential and therefore faster.
- That is, you should only consider a primary key on columns that are immutable and not-null.
- In that case, the only technical difference between a primary and a unique key is that the index is clustered. This only matters if the table contains additional columns not present in the constraint. Most m:n join tables do not have additional columns, so it doesn't matter whether the index is clustered or not. (And even if you do benefit some from a clustered index, you can often achieve the same effect by creating a non-clustered index with all relevant columns, so it's questionable whether we should let this drive our decision).
- That leaves maintainability as our only concern. What makes the schema easier to understand for our fellow developers? What captures our intent better?
- To me, a primary key constraint designates the primary way of referencing and accessing rows in that table. That is, if I had no need to reference that individual row anywhere else in my schema or application, for instance because it is just a "part" of another object (UML calls this *composition*), or because it represents a mere value rather than an entity with changing state (i.e. a *value type* rather than a *reference type*), I'd not bother with a primary key and use a plain unique constraint instead.
- On the other hand, if I did want to reference that row, I'd probably prefer an *easy* way to reference it, so I could say "comment #5421 was flagged" rather than "Johns 3rd comment on the 4th answer to question #874 was flagged". That is, I'd create a new primary key column in addition to the unique foreign key pair.
Off hand, I can't think of a case where I'd use a composite primary key - though in part, this might be because my favorite object relational mapper makes composite primary keys quite painful to use.
- # Facts
- In relational theory, each row in a relation is uniquely identified by a primary key. That's why some purists say that every table should have a primary key.
- Foreign keys usually reference a primary key. In particular, if you do not specify a column when creating a foreign key, the primary key of the referenced table is assumed. That's why primary keys should be immutable, and benefit from being short, so they're easy to compare and to store in indexes.
- Primary keys must not contain NULL values.
- There can only be a single primary key per table. Most databases create a clustered index for a primary key, which means that the table is stored in index order, making table accesses in index order sequential and therefore faster.
- # Recommendation
- That is, you should only consider a primary key on columns that are immutable and not-null.
- In that case, the only technical difference between a primary and a unique key is that the index is clustered. This only matters if the table contains additional columns not present in the constraint. Most m:n join tables do not have additional columns, so it doesn't matter whether the index is clustered or not. (And even if you do benefit some from a clustered index, you can often achieve the same effect by creating a non-clustered index with all relevant columns, so it's questionable whether we should let this drive our decision).
- That leaves maintainability as our only concern. What makes the schema easier to understand for our fellow developers? What captures our intent better?
- To me, a primary key constraint designates the primary way of referencing and accessing rows in that table. That is, if I had no need to reference that individual row anywhere else in my schema or application, for instance because it is just a "part" of another object (UML calls this *composition*), or because it represents a mere value rather than an entity with changing state (i.e. a *value type* rather than a *reference type*), I'd not bother with a primary key and use a plain unique constraint instead.
- On the other hand, if I did want to reference that row, I'd probably prefer an *easy* way to reference it, so I could say "comment #5421 was flagged" rather than "Johns 3rd comment on the 4th answer to question #874 was flagged". That is, I'd create a new primary key column in addition to the unique foreign key pair.
- Off hand, I can't think of a case where I'd use a composite primary key - though in part, this might be because my favorite object relational mapper makes composite primary keys quite painful to use.
- # TL;DR
- Use constraints to constrain your data, use primary keys to identify rows to keep track of them even as their data changes.
#1: Initial revision
In relational theory, each row in a relation is uniquely identified by a primary key. That's why some purists say that every table should have a primary key. Foreign keys usually reference a primary key. In particular, if you do not specify a column when creating a foreign key, the primary key of the referenced table is assumed. That's why primary keys should be immutable, and benefit from being short, so they're easy to compare and to store in indexes. Primary keys must not contain NULL values. There can only be a single primary key per table. Most databases create a clustered index for a primary key, which means that the table is stored in index order, making table accesses in index order sequential and therefore faster. That is, you should only consider a primary key on columns that are immutable and not-null. In that case, the only technical difference between a primary and a unique key is that the index is clustered. This only matters if the table contains additional columns not present in the constraint. Most m:n join tables do not have additional columns, so it doesn't matter whether the index is clustered or not. (And even if you do benefit some from a clustered index, you can often achieve the same effect by creating a non-clustered index with all relevant columns, so it's questionable whether we should let this drive our decision). That leaves maintainability as our only concern. What makes the schema easier to understand for our fellow developers? What captures our intent better? To me, a primary key constraint designates the primary way of referencing and accessing rows in that table. That is, if I had no need to reference that individual row anywhere else in my schema or application, for instance because it is just a "part" of another object (UML calls this *composition*), or because it represents a mere value rather than an entity with changing state (i.e. a *value type* rather than a *reference type*), I'd not bother with a primary key and use a plain unique constraint instead. On the other hand, if I did want to reference that row, I'd probably prefer an *easy* way to reference it, so I could say "comment #5421 was flagged" rather than "Johns 3rd comment on the 4th answer to question #874 was flagged". That is, I'd create a new primary key column in addition to the unique foreign key pair. Off hand, I can't think of a case where I'd use a composite primary key - though in part, this might be because my favorite object relational mapper makes composite primary keys quite painful to use.