Saturday, May 8, 2010

Database Objects: Constraints


In previous tutorials on tables, I explained primary and foreign keys and the use of NULL values. I'll continue the table discussion here with an explanation of constraints.

It's fairly simple to describe a table constraint. It means much the same thing as the English word, "to restrict or limit." In a SQL Server database table, constraints are rules you create to specify what kind of data is allowed in a table, usually in a specific column of that table.

Whether you design databases or program against them, database integrity is absolutely essential. It's not just about making sure a certain table has data in it; it's about ensuring that the data throughout the system doesn't cause a program to crash or a result to be ambiguous. In fact, it's better for data not to make it into a table than for that data to make it into the database but be incorrect. Think about a transaction at your local Automatic Teller Machine (ATM). You would rather a transaction fail (and tell you that it failed) than for the wrong amount of money to be withdrawn.

Constraints enable SQL Server to implement this integrity. There are several classes of constraints, which I'll explain in a moment. By layering them intelligently, you can maintain the integrity of the database as a whole.

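Most constraints can be declared on a single column, as part of that column's definition. Primary key, foreign key, unique, and check constraints can also be declared at the table level, which lets a single constraint cover more than one column (for example, a primary key made up of two columns). NOT NULL is the exception: it always belongs to one column.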

SQL Server uses several classes of constraints. I've already shown you the basics of three of them, but I'll explain them a bit further in this article. The primary classes are:

  • Primary Key
  • Foreign Key
  • Not Null
  • Unique
  • Check

I will briefly explain each of these constraints.

Primary Key

I explained the primary key constraint in a previous article. A primary key is a column (or set of columns) whose value uniquely identifies each row. That is itself a constraint: you're saying that only one row can have any given value, which "constrains" the values you can enter in that column.

To declare the primary key constraint, you use the PRIMARY KEY keyword after the column definition when you create a table, or the ALTER TABLE command if the table already exists. Here's an example of each:

CREATE TABLE test
(
PrimaryKeyColumnName smallint
IDENTITY(1,1)
PRIMARY KEY
)
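And here's the ALTER TABLE version. It assumes the table test already exists but has no primary key and no IDENTITY column yet; a table can have only one of each:
ALTER TABLE test ADD
PrimaryKeyColumnName INT IDENTITY
CONSTRAINT PrimaryKeyName PRIMARY KEY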

Keep in mind that if the table already has data, that data has to satisfy the primary key rules. In the example above, I created a new IDENTITY column for the key, so SQL Server numbers the existing rows for you. If you use an existing column as the key instead, its values must be unique, and none of them can be NULL.
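To see the rule about existing data in action, here's a minimal sketch you could run in a scratch database on SQL Server 2008 or later. The table and constraint names (PkDemo, PK_PkDemo) are hypothetical. The first attempt to add the key fails because the Code column holds a repeated value; once one duplicate is removed, the same statement succeeds.

```sql
-- Hypothetical scratch table; the Code column already holds a duplicate (2).
CREATE TABLE PkDemo (Code int NOT NULL);
INSERT INTO PkDemo (Code) VALUES (1), (2), (2);

BEGIN TRY
    -- Fails: SQL Server can't build the unique index behind the primary key.
    ALTER TABLE PkDemo ADD CONSTRAINT PK_PkDemo PRIMARY KEY (Code);
END TRY
BEGIN CATCH
    PRINT 'First attempt failed: ' + ERROR_MESSAGE();
END CATCH;

-- Remove one of the duplicate rows, then try again; this time it works.
DELETE TOP (1) FROM PkDemo WHERE Code = 2;
ALTER TABLE PkDemo ADD CONSTRAINT PK_PkDemo PRIMARY KEY (Code);
```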

There's a qualifier you can use when you add a foreign key or check constraint to a table that already has data, called WITH NOCHECK. It applies the rule to new changes without first verifying that the existing rows follow it, so you have to be careful here.

It won't get you around a primary key, though. SQL Server enforces a primary key with a unique index, and it can't build that index over a column with repeated values, so adding the primary key fails whether or not you specify WITH NOCHECK. With a foreign key or check constraint, WITH NOCHECK does let the statement succeed (and quickly, I might add, since the existing data isn't scanned), but any rows that already break the rule stay in the table, and SQL Server marks the constraint as "not trusted."

If that happens, you'll have to find and fix the bad rows yourself, and then re-validate the constraint with ALTER TABLE ... WITH CHECK CHECK CONSTRAINT. So the short story is that you shouldn't use the WITH NOCHECK qualifier unless you're certain the data already works with the constraint. As for primary keys, they enforce entity integrity, meaning that every row can be uniquely identified. The system simply won't let you enter duplicate data in a primary key column.
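Here's a minimal sketch of WITH NOCHECK on a check constraint, assuming SQL Server 2008 or later; the table and constraint names are hypothetical. It shows the bad row surviving, the constraint being flagged in the sys.check_constraints catalog view, and the clean-up followed by re-validation.

```sql
-- Hypothetical table that already contains a row breaking the rule (-1).
CREATE TABLE NoCheckDemo (Qty int NOT NULL);
INSERT INTO NoCheckDemo (Qty) VALUES (5), (-1);

-- Without WITH NOCHECK (WITH CHECK is the default for a new constraint),
-- this ALTER would fail because of the existing -1.
ALTER TABLE NoCheckDemo WITH NOCHECK
    ADD CONSTRAINT CK_NoCheckDemo_Qty CHECK (Qty >= 0);

-- The bad row is still there, and SQL Server flags the constraint as untrusted.
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE name = 'CK_NoCheckDemo_Qty';          -- is_not_trusted = 1

-- Fix the bad data, then ask SQL Server to validate the constraint again.
UPDATE NoCheckDemo SET Qty = 0 WHERE Qty < 0;
ALTER TABLE NoCheckDemo WITH CHECK CHECK CONSTRAINT CK_NoCheckDemo_Qty;
```

After the last statement, the constraint is trusted again, and any new row with a negative Qty is rejected as usual.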

Foreign Key

I also explained foreign key constraints in the last tutorial. A foreign key "points" to the primary key of another table, guaranteeing that you can't enter a value in the foreign key column unless that value already exists in the referenced table.

You can use this constraint in situations such as an order-entry system, where the item being sold in a transaction must exist in inventory first, or have a price in a pricing table, or both. It helps you enforce relationships between tables.

Primary keys must be unique, but foreign keys don't have to be. In fact, if you think about the situations I mentioned above, you'll realize that a foreign key column almost always contains repeated values. Think about a purchase order. You might have several items on a single purchase order. So you might create one table to hold the purchase order itself, and another table to hold the line items that belong to a particular purchase order. To do that, you create a primary key in the purchase order table, and a foreign key column in the line item table pointing back to that primary key. If you implement the constraints this way, nobody can create a line item for a purchase order that doesn't exist yet. That matches the usual business rule.
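The purchase order design described above might look like this minimal sketch, assuming SQL Server 2008 or later. The table names, column names, and sample values are all hypothetical. Notice that the line item table's own primary key is declared at the table level because it spans two columns.

```sql
-- Hypothetical purchase order header table.
CREATE TABLE PurchaseOrders (
    PurchaseOrderID int NOT NULL PRIMARY KEY,
    OrderDate date NOT NULL
);

-- Hypothetical line item table pointing back to the header.
CREATE TABLE PurchaseOrderLines (
    PurchaseOrderID int NOT NULL
        FOREIGN KEY REFERENCES PurchaseOrders (PurchaseOrderID),
    LineNumber int NOT NULL,
    ItemName varchar(50) NOT NULL,
    -- Each line is identified by its order plus its line number.
    PRIMARY KEY (PurchaseOrderID, LineNumber)
);

INSERT INTO PurchaseOrders (PurchaseOrderID, OrderDate) VALUES (1, '20100508');

-- The same foreign key value (1) repeats across lines; that's expected.
INSERT INTO PurchaseOrderLines (PurchaseOrderID, LineNumber, ItemName)
VALUES (1, 1, 'Widget'), (1, 2, 'Gadget');

BEGIN TRY
    -- Fails: there is no purchase order 2 yet.
    INSERT INTO PurchaseOrderLines (PurchaseOrderID, LineNumber, ItemName)
    VALUES (2, 1, 'Sprocket');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();  -- reports a conflict with the FOREIGN KEY constraint
END CATCH;
```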

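You can create a foreign key either when you create the table or after the fact. As with the primary key, data that's already in the table has to follow the rule (unless you add the constraint WITH NOCHECK): every existing value in the foreign key column must either be NULL or match a value in the referenced table.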

Here is the way to create a foreign key when you create a table:

CREATE TABLE test
(
foreignkeycolumnname smallint
FOREIGN KEY REFERENCES tablename(PrimaryKeyColumnName)
)
And here's the syntax example for adding the foreign key constraint after the fact:
ALTER TABLE Orders ADD CONSTRAINT
FK_Orders_Clients FOREIGN KEY
(
ClientID
) REFERENCES Clients
(
ClientID
) ON UPDATE CASCADE
ON DELETE CASCADE

Notice the qualifiers at the end of the statement. They tell SQL Server to carry changes to a parent row through to its child rows: if a ClientID changes in Clients, the matching Orders rows are updated, and if a client is deleted, that client's orders are deleted too. Without these qualifiers (the default is NO ACTION), you can't delete a row from a table if rows in another table have a foreign key reference to its primary key, and you can't change that primary key value either. You need to decide fairly early in your design process whether you'll allow child records to be deleted when a parent is deleted.
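Here's a minimal sketch of both cascade options, assuming SQL Server 2008 or later. The DemoClients and DemoOrders tables are hypothetical stand-ins for the Clients and Orders tables above, wired up with the same kind of ALTER TABLE statement.

```sql
-- Hypothetical parent and child tables.
CREATE TABLE DemoClients (
    ClientID int NOT NULL PRIMARY KEY,
    ClientName varchar(50) NOT NULL
);

CREATE TABLE DemoOrders (
    OrderID int NOT NULL PRIMARY KEY,
    ClientID int NOT NULL
);

ALTER TABLE DemoOrders ADD CONSTRAINT FK_DemoOrders_DemoClients
    FOREIGN KEY (ClientID) REFERENCES DemoClients (ClientID)
    ON UPDATE CASCADE
    ON DELETE CASCADE;

INSERT INTO DemoClients (ClientID, ClientName) VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO DemoOrders (OrderID, ClientID) VALUES (10, 1), (11, 1), (12, 2);

-- ON UPDATE CASCADE: renumbering client 2 renumbers its order as well.
UPDATE DemoClients SET ClientID = 20 WHERE ClientID = 2;

-- ON DELETE CASCADE: deleting client 1 also deletes orders 10 and 11.
DELETE FROM DemoClients WHERE ClientID = 1;

SELECT OrderID, ClientID FROM DemoOrders;   -- one row left: 12, 20
```

Nothing warns you when the delete removes the two orders, which is exactly why the decision deserves some thought.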

You may think that this is a great idea, but you may not want to remove records automatically like that. For instance, recall the order-entry example I mentioned earlier. If you delete an item from inventory because you no longer carry it, you certainly don't want to delete all the sales you've recorded for that item in the past! In that case, it's better for the delete to fail. Instead of deleting the inventory item, a better approach might be to add a column to the parent table that indicates whether the item is current.
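A minimal sketch of that approach, assuming SQL Server 2008 or later: the foreign key keeps its default behavior (no CASCADE), so the delete is refused, and a hypothetical IsCurrent flag marks the item as discontinued instead. All table and column names here are illustrative.

```sql
-- Hypothetical inventory (parent) and sales (child) tables.
CREATE TABLE DemoInventory (
    ItemID int NOT NULL PRIMARY KEY,
    ItemName varchar(50) NOT NULL,
    IsCurrent bit NOT NULL DEFAULT (1)   -- 1 = still carried, 0 = discontinued
);

-- No CASCADE option, so deletes of referenced items are refused.
CREATE TABLE DemoSales (
    SaleID int NOT NULL PRIMARY KEY,
    ItemID int NOT NULL REFERENCES DemoInventory (ItemID)
);

INSERT INTO DemoInventory (ItemID, ItemName) VALUES (1, 'Discontinued widget');
INSERT INTO DemoSales (SaleID, ItemID) VALUES (100, 1), (101, 1);

BEGIN TRY
    -- Fails: sales history still references item 1.
    DELETE FROM DemoInventory WHERE ItemID = 1;
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

-- Instead of deleting, flag the item as no longer current.
UPDATE DemoInventory SET IsCurrent = 0 WHERE ItemID = 1;
```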

To recap, a foreign key enforces referential integrity by guaranteeing the relationship between tables.

Not Null

I've explained NULL values several times, but the definition bears repeating: a NULL is not zero, or empty, or blank. A NULL is a special marker that means the value is unknown or missing (at least for now).

So how does that fit in with constraints? If you declare a column NOT NULL when you create or alter a table, every row must hold a real value in that column; a program or user can't leave it NULL. This is usually a good idea. It doesn't make a lot of sense to have a database that's filled with "I don't know" values.

If you do constrain values to NOT NULL, it's often a good practice to provide a default value for that column. Here's the syntax for that:

CREATE TABLE test
(
ColumnName VARCHAR(30)
NOT NULL
DEFAULT('Value Not Entered')
)
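A minimal sketch of how the default and NOT NULL interact, assuming SQL Server 2008 or later; NotNullDemo is a hypothetical table using the same column definition. The default only fills in a value when the column is left out of the INSERT; an explicit NULL is still rejected.

```sql
-- Hypothetical table using the column definition above.
CREATE TABLE NotNullDemo (
    ColumnName varchar(30) NOT NULL DEFAULT ('Value Not Entered')
);

-- Column omitted: SQL Server fills in the default.
INSERT INTO NotNullDemo DEFAULT VALUES;
INSERT INTO NotNullDemo (ColumnName) VALUES ('Supplied value');

BEGIN TRY
    -- Explicit NULL: the default does NOT apply, and NOT NULL rejects the row.
    INSERT INTO NotNullDemo (ColumnName) VALUES (NULL);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

SELECT ColumnName FROM NotNullDemo;
-- two rows: 'Value Not Entered' and 'Supplied value'
```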

Use nulls wisely!

Unique

The unique constraint forces the values in a column to be unique. But isn't that what a primary key does? Yes, but there are two important distinctions. The first is that a table can have more than one unique constraint, but only one primary key. The second is that a unique column can accept NULL, which a primary key column can't. SQL Server treats NULLs as equal for this purpose, though, so only one row can have a NULL in that column.

Here's the syntax for a unique constraint:

CREATE TABLE test
(
ColumnName VARCHAR(30)
UNIQUE
)

And if you already have the table in place:

ALTER TABLE test ADD NewColumnName VARCHAR(20) NULL 
CONSTRAINT ConstraintName UNIQUE
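Here's a minimal sketch of how a SQL Server unique constraint handles NULL, assuming SQL Server 2008 or later; the table, column, and constraint names are hypothetical. One NULL is accepted, a second NULL is rejected just like a repeated non-NULL value.

```sql
-- Hypothetical table with a nullable column under a UNIQUE constraint.
CREATE TABLE UniqueDemo (
    Email varchar(50) NULL CONSTRAINT UQ_UniqueDemo_Email UNIQUE
);

INSERT INTO UniqueDemo (Email) VALUES ('a@example.com');
INSERT INTO UniqueDemo (Email) VALUES (NULL);   -- first NULL: accepted

BEGIN TRY
    -- Second NULL: rejected, because NULLs count as duplicates here.
    INSERT INTO UniqueDemo (Email) VALUES (NULL);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

BEGIN TRY
    -- Repeated non-NULL value: rejected as well.
    INSERT INTO UniqueDemo (Email) VALUES ('a@example.com');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```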

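Notice that I've named the last constraint. You can name any constraint this way, as I did with the primary and foreign keys earlier. If you don't, SQL Server generates a name for you. A name you choose makes the constraint much easier to refer to later, for instance when you need to drop or disable it, and it shows up in the error message when a change breaks the rule. Constraint names must be unique within a schema, though, so you can't reuse the same name on another table.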
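Here's a minimal sketch of what a name buys you in practice, assuming SQL Server 2008 or later; the table and constraint names are hypothetical. The catalog query lists both constraints, including the system-generated name for the unnamed one, and the constraint we named can be dropped without looking anything up first.

```sql
-- Hypothetical table: one UNIQUE constraint named by us, one left unnamed.
CREATE TABLE NamingDemo (
    Code varchar(10) NULL CONSTRAINT UQ_NamingDemo_Code UNIQUE,
    Tag  varchar(10) NULL UNIQUE
);

-- Both constraints, including the name SQL Server generated for the second.
SELECT name
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID('NamingDemo');

-- Because we chose this name, dropping the constraint is straightforward.
ALTER TABLE NamingDemo DROP CONSTRAINT UQ_NamingDemo_Code;
```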

Check

The check constraint is very useful, because it lets you restrict the values that can be stored in a column. The set of allowed values is called the column's "domain," so check constraints enforce domain integrity.

Take a look at the following syntax:

CREATE TABLE test
(
ColumnName int
CHECK (ColumnName < 30)
)

Notice that this check constraint isn't named, and that it ensures the value of ColumnName will be less than 30. Here's another example:
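One detail worth knowing: a check constraint only rejects a row when its condition is false. A NULL makes the comparison unknown rather than false, so NULL gets through. Here's a minimal sketch, assuming SQL Server 2008 or later; CheckDemo is a hypothetical table, and the column is declared NULL explicitly so the result doesn't depend on session settings.

```sql
-- Hypothetical table with an unnamed CHECK constraint like the one above.
CREATE TABLE CheckDemo (
    ColumnName int NULL CHECK (ColumnName < 30)
);

INSERT INTO CheckDemo (ColumnName) VALUES (29);    -- passes: 29 < 30
INSERT INTO CheckDemo (ColumnName) VALUES (NULL);  -- passes: NULL < 30 is unknown, not false

BEGIN TRY
    INSERT INTO CheckDemo (ColumnName) VALUES (30);  -- fails: 30 < 30 is false
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

If NULL shouldn't be allowed either, combine the check with NOT NULL.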

CONSTRAINT CK_emp_id
CHECK (emp_id LIKE
'[A-Z][A-Z][A-Z][1-9][0-9][0-9][0-9][0-9][FM]' OR
emp_id LIKE '[A-Z]-[A-Z][1-9][0-9][0-9][0-9][0-9][FM]')

This clause, from an employee table example in Books Online, shows the complexity you can build using the CHECK constraint. Each pair of brackets matches one character from the listed range or set, and the OR adds an alternate pattern. In other words, an employee ID has nine characters. In the first pattern, the first three must be letters from A to Z, the fourth must be a digit from 1 to 9, the next four can be any digit from 0 to 9, and the ninth must be F or M. The alternate pattern is the same, except that the second character is a hyphen instead of a letter.
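To try the pattern out, here's a minimal sketch that attaches the same CHECK condition to a hypothetical EmpIdDemo table, assuming SQL Server 2008 or later. The constraint is renamed so it won't clash with an existing CK_emp_id, and the sample IDs are made up to match (or deliberately break) the pattern.

```sql
-- Hypothetical employee table using the Books Online pattern above.
CREATE TABLE EmpIdDemo (
    emp_id char(9) NOT NULL
        CONSTRAINT CK_EmpIdDemo_emp_id CHECK (emp_id LIKE
            '[A-Z][A-Z][A-Z][1-9][0-9][0-9][0-9][0-9][FM]' OR
            emp_id LIKE '[A-Z]-[A-Z][1-9][0-9][0-9][0-9][0-9][FM]')
);

INSERT INTO EmpIdDemo (emp_id) VALUES ('PMA42628M');  -- matches the first pattern
INSERT INTO EmpIdDemo (emp_id) VALUES ('A-C71970F');  -- matches the alternate pattern

BEGIN TRY
    -- Fails both patterns: the fourth character can't be 0.
    INSERT INTO EmpIdDemo (emp_id) VALUES ('PMA02628M');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

Under a case-insensitive collation, a range such as [A-Z] may also accept lowercase letters, so test the pattern against your own collation before relying on it.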

