The Agile Architect

Home » All Articles » Enforcing Unique Keys with NULL values

Enforcing Unique Keys with NULL values

Users of MySQL and some other databases often find themselves in this situation: You need to create a unique key as a combination of table columns but some of them are nullable. The problem is that uniqueness is not enforced in the way you would expect.

For example, suppose you have a table called called SONG and it has columns TITLE, ARTIST, ALBUM. You postulate that the combination of these 3 columns should uniquely identify a song and you want to enforce this constraint in your database model. Suppose however, that sometimes the ALBUM column does not have a value because some songs were never released in an album (say). In this case MySQL would let you insert two columns such that TITLE=’song’, ARTIST=’Band’, ALBUM=null. This is not the desired behavior as you would like to have only one record for a particular song and artist and no album.

In a simple situation like this, an easy workaround would be to alter ALBUM so that it is not nullable and supply some special value that represents the absence of an album. The empty string for example would work in this case. A unique key containing these columns produces the desired effect.

Consider however the trickier situation where the columns participating in the unique key are foreign keys to other tables. In the case the SONG table has columns (TITLE, ARTIST_ID, ALBUM_ID) with foreign keys to the ARTIST and ALBUM tables. As before, ALBUM_ID is nullable. But in this case, the technique of populating a “dummy” value for ALBUM_ID no longer works because it would break the integrity of the foreign key.

Discussion of Unsatisfactory Solutions

In this section I discuss some solutions that have been proposed and I did not find satisfactory. If you are anxious to get to the ultimate solution, you can skip to the next section.

Solution 1: Drop the foreign key constraint so you can populate a dummy value such as 0 or -1.

This solution would clearly work as far as the enforcement of the unique constraint is concerned. I find this solution unsatisfactory because it opens up some other, possibly harder, problems. The first problem is that you are trading one aspect of database integrity with another. Allowing references to other tables to point to non-existent rows can create problems that are much harder to debug than a duplicate entry. Another serious problem, that is very hard to work around, is that this solution will not work (without significant effort) with some ORM layers such as Hibernate. With or without a foreign key, when you define an association with Hibernate, it expects to retrieve an Album value for every value in ALBUM_ID. Missing values will cause an error.

Solution 2: Create a dummy row in the ALBUM table corresponding to an otherwise invalid value of the ALBUM_ID, such as -1.

The immediate problem with this solution is that, if you adopt it, then you must create special code in your application to deal with these dummy values. Furthermore, if ALBUM has foreign keys to other tables, then you will find yourself in a situation where you have to populate and manager more dummy rows in several tables across the database. The ideal solution should not require introducing any special logic in the code.

Solution 3: Do not enforce the unique key in the database, but rather enforce the uniqueness in the application logic (which is to say, the code).

This solution is obviously a last resort, if you cannot find a way to enforce uniqueness in the database. The obvious problems with this approach is that it is always possible for a codepath to bypass the uniqueness check. It is also possible for someone to insert a duplicate record in the database directly.

A Satisfactory Solution

Here is a solution that works and I find superior to the other proposals for reasons that I will soon discuss. The solution involves a certain degree of “hackiness” but it is mostly benign. Here it is:

For each nullable foreign key that you need to include in a unique index, create a dummy column that is not nullable. In our example, this would look something like this:

ALTER TABLE SONG ADD COLUMN ALBUM_ID_REQ int(11) NOT NULL;

The idea is that this column is going to mirror the foreign key ALBUM_ID but it is going to have a zero where the foreign key has a NULL. To make sure the new column is properly populated, the existing data (if any) needs to be updated as follows:

UPDATE SONG SET ALBUM_ID_REQ = 0 WHERE ALBUM_ID is NULL;
UPDATE SONG SET ALBUM_ID_REQ = ALBUM_ID where ALBUM_ID is NOT NULL;
There is one other step that’s necessary: We need to make sure the new column is correctly populated whenever a new row is inserted. We could enforce this at the application (code) level, but this solution would suffer from some of the drawbacks of the previous solutions. In particular, it is open to the possibility that some code path will simply forget to do it. It is also possible to insert a record in the database with the wrong values for the dummy column.

To make sure that our little hack stays in the database and does not leak into the application code, we need an insert trigger that will populate the dummy column with the correct value. This trigger is straightforward and inexpensive:

CREATE TRIGGER SONG_ALBUM_TRIGGER BEFORE INSERT ON SONG
FOR EACH ROW BEGIN 
  IF NEW.ALBUM_ID is null THEN SET NEW.ALBUM_ID_REQ = 0; 
  ELSE SET NEW.ALBUM_ID_REQ = NEW.ALBUM_ID; 
  END IF; 
END;

At this point, we can go ahead and create a unique index:
CREATE UNIQUE INDEX SONG_U1 ON SONG (TITLE, ARTIST_ID, ALBUM_ID_REQ);

Note that it is the required dummy column ALBUM_ID_REQ and not the original nullable column that participates in the unique index.

And voila! We have a unique constraint that works exactly as expected without having to change a single line of application code.

Solution Justification

Here is why I consider this solution a better solution than the ones discussed earlier:

1. No code changes are necessary: You do not need to introduce the dummy columns into your SQL queries or in your ORM layer. Any existing code will work as is. All the other solutions we discussed require code changes to some extent.

2. Low failure probability: Any solution that tries to enforce uniqueness in the code suffers from two problems: a) It is always possible to create code that bypasses the solution. b) You can always bypass the constraint by inserting directly in the database. Our solution does not suffer from these drawbacks. The only way to fail is for someone to manually update the records in the database, and this is pretty hard to do by mistake.

3. No superfluous data: Unlike solution 2, you don’t need to maintain dummy data in your database.

4. It is simple: The solution consists of four steps comprised of very straightforward SQL.

 


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Architected By:

%d bloggers like this: