Rantings and Ruminations on Software: 2007

One of my favorite sayings from Ben Franklin is "Experience is a dear teacher, but fools will learn at no other." True enough, but why, with the cornucopia of programming books, how-to articles, and blogs out there is experience still such the dear teacher?

Mistakes. Books, blogs, and best practices might tell us how to perform a task, but sadly, little but experience shows us why we shouldn't do something else. Someone recently asked me to review a database design, and I responded to this person with an admonition to "don't do that" and a friendly scowl. Of course, the response was, "Why not?". Here's the answer.

The Temptation of Surrogate Keys

If you do enough database programming, you'll probably see that developers like IDENTITY (AutoNumber, for you Access people) columns as table primary keys. A lot. In database language, this is referred to as a surrogate key. Why? Because they seem to simplify database design and they seem to make it easy to query for single values on a table. But if you aren't careful, you'll get into a heap of trouble, and it'll be hell to dig yourself out.

Surrogate Key Fields on Many-to-Many Relationship Tables

Let's say we're designing a budgeting system for a company. Remember that this is all for the sake of argument, I'm not trying to show you how to design a budgeting system. My apologies to all you CPAs out there...

There are three entities in the system, and a couple of simple business rules:

Our Entities:

Account: If you don't know what an account is, I suggest reading someone else's blog.
Department: A grouping of people within a company, such as "Accounts Receivable", "Software Development", "HR"...
Budget: A grouping of the amounts assigned to each account in each department for a period of time (say, a fiscal year).

Our Rules, for the sake of argument:

Only certain Accounts can be used in certain Departments. For example, "HR" isn't allowed to budget for "Sales Travel", but are allowed to budget for things like "Staffing Advertisements".
Only certain Accounts are valid for certain Budgets. Maybe certain accounts are retired as the company restructures its budgeting process, and others are introduced.

Let's Start Designing (Badly)!

So let's start designing! For this first round, we'll design the system such that every table in the database has a surrogate key. To keep the diagrams simple, we'll leave off the attributes of each entity and just show the keys. First off, let's implement Business Rule #1 with our Account and Department entities. We'll do this by using a relationship table called AllowableAccounts and two foreign keys:

Now for Business Rule #2. We'll implement this one the same way:

Now, for the final relationship -- the "Line Item" entry portion. This is where we enter in the amount budgeted for each account, for each budget. You get a gift if you can already see where I'm going with this:

There are a couple of things to note here. There is a foreign key relationship between LineItem and the DepartmentAccount and BudgetableAccount tables. Notice that there are no references to BudgetID or AccountID in the LineItem table. This design seems to implement our business rules. Unfortunately, there's a fatal flaw.

There's no Relationship Between Budget, Account, and Department

Here's the problem: LineItem does not enforce the rule that the AccountID referred to in BudgetableAccount is the same as the AccountID referred to for a given DepartmentAccount. It's therefore possible to run these queries against the database and return different results:

SELECT AccountID c
FROM LineItem li
INNER JOIN BudgetableAccount bi
ON bi.BudgetAccountID = li.BudgetAccountID
WHERE li.LineItemID = 5;

and:

SELECT AccountID c
FROM LineItem li
INNER JOIN DepartmentAccount di
ON di.DepartmentAccountID = li.DepartmentAccountID
WHERE li.LineItemID = 5;

This is possible because you don't have direct relationship between Budget, Account, and Department. What would happen if the data were up-to-date and someone were to update the DepartmentAccount or BudgetAccount tables to a different AccountID? What if your business logic did not check this constraint before inserting?

There are Other Reasons, too

First off, consider how you're going to be using the database. Will you ever be querying off of DepartmentAccountID? BudgetAccountID? These fields have no business meaning, so you'll probably have to use a join in order to look up these columns. This will probably mean when asking questions like, "What's the total amount budgeted for Budget X?" -- That requires a join across 2 tables -- and all for such a basic question. Yuck.

A Better Design

Instead of the previous design, let's eliminate the surrogate keys on the BudgetableAccount and DepartmentAccount tables, and create a relationship table for our LineItems based solely on the Budget, Department, and Account IDs. Note that this design requires the use of 2-way composite foreign keys -- you can't just use one column to create your relationship, you must create a foreign key using both the Department/Account and Budget/Account key combinations. This is the key point that most beginning database programmers miss. These special foreign key relationships are shown in blue:

Conclusion

Watch out for surrogate keys in relationship tables. Avoid them when at all possible entirely -- they're meaningless from the context of your business. Some object-relational frameworks force you to have a surrogate key on every table. Perhaps you work in an environment where your DBA's or your "senior developers" mandate this practice, and you can't stop the madness. Fine. Just do yourself a favor -- don't define relationships based on a surrogate key in your relationship tables. It's a recipe for data integrity nightmares.

Hugo Kornelis just brought up a very interesting point about SQL Server 2005, and its behavior when using views with ORDER BY clauses in them.

You may be aware that SQL Server 2000 (and previous versions) allowed you to essentially create an ordered view -- something that's not really supposed to be allowed -- simply by using TOP 100 PERCENT:

CREATE VIEW myView AS
SELECT TOP 100 PERCENT LastName
FROM dbo.Stones
ORDER BY LastName;

In fact, this is exactly what Enterprise Manager, and as Hugo points out, Management Studio , does when you define a sorted view with the query designer (truly a tool of the Devil himself).

When I do a
SELECT * FROM dbo.myView;
in SQL Server 2005 I get the following results:

Wyman
Jones
Watts
Richards
Jagger

As you can imagine, this will break any query you have in your databases which rely on this "feature". The truly amazing thing is that SQL Server Management Studio still allows users to choose a sort order in views without so much as a warning dialog box!

So I wonder... Would this be considered a bug in the query parser, if it allows syntax that can't possibly work? Is it a bug in the engine? Should this evil trick still work? All of this got me to thinking (uh oh) -- how could one work around this issue if one is upgrading a legacy database?

And, what if you were to create the view like this?

CREATE VIEW myView AS
SELECT TOP (99.9999999999999) PERCENT LastName
FROM dbo.Stones
ORDER BY LastName;

Lo and behold, it "works" -- when you query the view, the results are shown in order. But add just one more nine:

CREATE VIEW myView AS
SELECT TOP (99.99999999999999) PERCENT LastName
FROM dbo.Stones
ORDER BY LastName;

and you're back to the unordered list. Strange. I suppose that the SQL Server engine is doing a bit of rounding internally and throwing away the sort order altogether as an optimization so it won't have to perform a final sort on the resultset for no reason. I really have no way to know exactly, though.

Think setting the compatibility mode to 80 will save you? Think again. This behavior is independent of the compatibility mode you use.

The moral of the story:
Before migrating your databases to SQL Server 2005, make sure you have no TOP 100 PERCENT ... ORDER BYs in your views.

Analyze every query you run which consumes views with TOP 100 PERCENT ... ORDER BYs, and make sure they're ready for the new behavior.

(This behavior is consistent both in SQL 2005 SP1 as well as the SQL 2005 SP2 Community Technology Preview)

(Photo by FRANKIEB)

Rantings and Ruminations on Software

Sunday, August 19, 2007

Why yes, I do a bit of Modeling...