One of my favorite sayings from Ben Franklin is "Experience is a dear teacher, but fools will learn at no other." True enough, but why, with the cornucopia of programming books, how-to articles, and blogs out there is experience still such the dear teacher?
Mistakes. Books, blogs, and best practices might tell us how to perform a task, but sadly, little but experience shows us why we shouldn't do something else. Someone recently asked me to review a database design, and I responded to this person with an admonition to "don't do that" and a friendly scowl. Of course, the response was, "Why not?". Here's the answer.
The Temptation of Surrogate Keys
If you do enough database programming, you'll probably see that developers like IDENTITY (AutoNumber, for you Access people) columns as table primary keys. A lot. In database language, this is referred to as a surrogate key. Why? Because they seem to simplify database design and they seem to make it easy to query for single values on a table. But if you aren't careful, you'll get into a heap of trouble, and it'll be hell to dig yourself out.
Surrogate Key Fields on Many-to-Many Relationship Tables
Let's say we're designing a budgeting system for a company. Remember that this is all for the sake of argument, I'm not trying to show you how to design a budgeting system. My apologies to all you CPAs out there...
There are three entities in the system, and a couple of simple business rules:
Our Entities:
- Account: If you don't know what an account is, I suggest reading someone else's blog.
- Department: A grouping of people within a company, such as "Accounts Receivable", "Software Development", "HR"...
- Budget: A grouping of the amounts assigned to each account in each department for a period of time (say, a fiscal year).
Our Rules, for the sake of argument:
- Only certain Accounts can be used in certain Departments. For example, "HR" isn't allowed to budget for "Sales Travel", but are allowed to budget for things like "Staffing Advertisements".
- Only certain Accounts are valid for certain Budgets. Maybe certain accounts are retired as the company restructures its budgeting process, and others are introduced.
Let's Start Designing (Badly)!
So let's start designing! For this first round, we'll design the system such that every table in the database has a surrogate key. To keep the diagrams simple, we'll leave off the attributes of each entity and just show the keys. First off, let's implement Business Rule #1 with our Account and Department entities. We'll do this by using a relationship table called AllowableAccounts and two foreign keys:
Now for Business Rule #2. We'll implement this one the same way:
Now, for the final relationship -- the "Line Item" entry portion. This is where we enter in the amount budgeted for each account, for each budget. You get a gift if you can already see where I'm going with this:
There are a couple of things to note here. There is a foreign key relationship between LineItem and the DepartmentAccount and BudgetableAccount tables. Notice that there are no references to BudgetID or AccountID in the LineItem table. This design seems to implement our business rules. Unfortunately, there's a fatal flaw.
There's no Relationship Between Budget, Account, and Department
Here's the problem: LineItem does not enforce the rule that the AccountID referred to in BudgetableAccount is the same as the AccountID referred to for a given DepartmentAccount. It's therefore possible to run these queries against the database and return different results:
SELECT AccountID c
FROM LineItem li
INNER JOIN BudgetableAccount bi
ON bi.BudgetAccountID = li.BudgetAccountID
WHERE li.LineItemID = 5;
and:
SELECT AccountID c
FROM LineItem li
INNER JOIN DepartmentAccount di
ON di.DepartmentAccountID = li.DepartmentAccountID
WHERE li.LineItemID = 5;
This is possible because you don't have direct relationship between Budget, Account, and Department. What would happen if the data were up-to-date and someone were to update the DepartmentAccount or BudgetAccount tables to a different AccountID? What if your business logic did not check this constraint before inserting?
There are Other Reasons, too
First off, consider how you're going to be using the database. Will you ever be querying off of DepartmentAccountID? BudgetAccountID? These fields have no business meaning, so you'll probably have to use a join in order to look up these columns. This will probably mean when asking questions like, "What's the total amount budgeted for Budget X?" -- That requires a join across 2 tables -- and all for such a basic question. Yuck.
A Better Design
Instead of the previous design, let's eliminate the surrogate keys on the BudgetableAccount and DepartmentAccount tables, and create a relationship table for our LineItems based solely on the Budget, Department, and Account IDs. Note that this design requires the use of 2-way composite foreign keys -- you can't just use one column to create your relationship, you must create a foreign key using both the Department/Account and Budget/Account key combinations. This is the key point that most beginning database programmers miss. These special foreign key relationships are shown in blue:
Conclusion
Watch out for surrogate keys in relationship tables. Avoid them when at all possible entirely -- they're meaningless from the context of your business. Some object-relational frameworks force you to have a surrogate key on every table. Perhaps you work in an environment where your DBA's or your "senior developers" mandate this practice, and you can't stop the madness. Fine. Just do yourself a favor -- don't define relationships based on a surrogate key in your relationship tables. It's a recipe for data integrity nightmares.