SQL Insider

Database Design 101

Queries work well if the databases are designed correctly. Here's what you need to know to build the foundations for your SQL Server databases.

Most IT professionals and developers understand the basic purpose of databases - to store data. But there's more to it than that. For example, you need to decide how the data should be stored. Good decisions will lead to an extensible database. Poor decisions may meet immediate requirements can lead to many headaches in the long run.

If you're strictly a database administrator (or a systems administrators who's responsible for the database servers), there's a good chance that the contents of the databases you administer are "black boxes." You understand the importance of database management issues - performing backups, testing restores, managing database file growth and monitoring and optimizing performance - but you're probably not familiar with the basics of how table structures are designed.

On the other hand, if you're a database developer, it's critically important that you understand how to design tables, views and other database objects. A seemingly simple decision about how to arrange Customers and Order information within a database can have severe impact on usability, manageability and performance. Regardless of your background with relational databases and how they work, a solid understanding of database design issues can form a great foundation for successful database design and management. As you might have guessed from the title, that's the topic of this article - the basics of database design.

Note
In the previous three articles in this series, I covered the basics of database objects and how SQL queries can be used to retrieve information. Although this month's topic is somewhat independent of the subject of those articles, it may help if you understand their content before reading this column.

In this column, I'll provide an overview of database design concepts. Instead of focusing on the specifics of such concepts as referential integrity and database normalization, I'll provide some basic info and point you to resources. The goal will be to understand the factors that make for good design decisions.

Determining Business Requirements
As with any technology-related project, you should clearly define the types of business problems you're trying to solve before you even think about solutions. The design and implementation of databases is no exception. The first step is to determine whether you need a database at all. All too often, it seems that when people want to collect information, this is the way to go. Most of the time, it probably is.

What reasons would compel you to house your important information in a relational database? Some reasons might include support for concurrency (that is, multiple people can access and change the data at the same time), data security (allowing people permissive access) and data manageability (modifying and reporting on large quantities of data). If you determine these features to be the basic requirements of your solution, chances are that a relational database is the best choice.

Before you begin the design of a database, ask these questions:

  • What is the purpose of the database? Often, the answer to this high-level question is obvious. For example, someone might request a database to track customer information such as names and addresses. You might need a method to keep track of inventory. In any case, you should understand whether the database will be used primarily for the storage of transactional information, or whether it will be used mainly for reporting purposes.
  • Who are the subject matter experts? In the ideal world, the developer would be the world's foremost expert on the business usage of the planned database. But it's the real world, so before you embark on a quest to design the perfect database, be sure you know which individuals within your company can answer questions that you might have. For example, if you're designing a financial database, you might need to know about the smallest and largest acceptable values for an "Amount" column. Knowing to whom the question should be asked is half of the solution.
  • What are the detailed requirements for the system? Be sure you can identify all of the entities (such as Customers, Orders and Invoices) that must be tracked within the database. You should also understand how these entities are related. It's important to be specific when you answer this question as a few small changes in the requirements can call for significant rework of a database design.
  • What types of users and applications will access this database? It's important to understand how the data you choose to track will be used. For example, will sales people frequently be viewing Customer information by region? Will hundreds of data entry clerks be entering thousands of Order records per day? The more information you have about the usage of your data, the more appropriate will be your database design.
  • What are the plans for the future of this database? Is it expected that users will be happy with the types of information we are planning to collect? Or, is it likely that they're want to add more to the database and/or integrate this information with other systems within the company.

If you have the above information, you're probably in good shape. So far, you know what you need. Now, let's move on the look at some considerations for designing a database.

The Basics of Database Design
When faced with "designing" a database, novice database implementers often try to think only about the storage of data. However, the method in which information will be retrieved from your database is just as important. Given a set of business constraints, I may be able to come up with multiple database designs. Some of these, however, will make it difficult to retrieve data in the format in which I need it. When you're designing a database, it's important to first think about the types of entities that you want to track. Let's continue to use the example of a Sales Order Entry system (so far, I've mentioned Customers and Orders). In such a system, you should be able to identify the characteristics of Customers, the characteristics of Orders, and how these entities interact.

A typical statement that describes entity relationships might be something like the following: "A Customer can have zero or more Orders, and all Orders must be linked to an existing Customer. An Order can contain one or more Line Items. Each line item includes information about exactly one Product that has been ordered. Each of these Products is identified by a unique number."

Figure 1 shows an example of a basic database schema that shows these relationships. The schema was created in SQL Server 2000 and is provided in the "pubs" sample database. However, many different database design tools allow you to easily create tables and define their relationships. For now, don't worry about the details of the lines between the various tables that I've created.

Table Relationships, on the next
Figure 1. Viewing table relationships in the pubs database schema in SQL Server 2000's Enterprise Manager. (Click image to view larger version.)

Now that we've covered a simple example of how a database can be designed, let's build upon this concept.

Dealing with Relationships
OK, so this title might sound like something out of a self-help book. Well, it sort of is - self-help for database developers. It's important to define and understand the relationships between objects in your database. If you've been following the examples thus far, you know that it's important to be able to enforce the rules that the database objects must follow. For example, you may want to make sure that a column named ZIPCode contains only valid United States ZIP code formats.

Note
This article primarily focuses on database design issues, in general. In some cases, SQL Server 2000 tools and terminology will be used to illustrate specific concepts. If your environment does not use SQL Server 2000, rest assured that the same information applies to most relational database systems that are available today.

In this section, we'll look at ways to make sure that only the right types of information are entered into your database. We'll focus on ways to ensure that your information is kept consistent and the relationships within and between tables are retained.

Defaults, Rules, and Constraints
When you define a column within a table, there's a good chance that you know the types of values that are acceptable. A PhoneNumber field might allow only integers, whereas an Address field might allow alphanumeric characters. The first step in determining acceptable types of data is in the table definition. When you create a table, you must choose from among the data types that are available in the table designer (see Figure 2). In SQL Server, for example, you might choose an int (an integer data type), a varchar (a variable-length character field), a text (long text fields), or a boolean (true/false) value for a column. Your choices here will determine what types of information can be stored in these columns.

SQL Server Table Designer
Figure 2. You create the table definition using the SQL Server 2000 table designer. (Click image to view larger version.)

Choosing a data type limits the types of data that can be stored in a column, but sometimes you want to take it even further. In addition to restrictions defined by data types, you can place other constraints on the types of information that are acceptable in a column of data. SQL Server provides several different types of constraints:

  • NOT NULL - A value must be specific for this column.
  • CHECK - The values supplied must meet the criteria specified in the constraint. A database developer can define a CHECK constraint to ensure that an entered integer is an even, positive value.
  • UNIQUE - No values in this column may be duplicates of another. This might be used, for example, in an employee information table to prevent duplicate employee numbers from being entered.
  • PRIMARY KEY - Defines which column or columns uniquely identify each row in the database. No two rows can have the same values for the primary key.
  • FOREIGN KEY - Enforces data integrity between tables by creating a reference for specific types of information.

We'll discuss the PRIMARY KEY and FOREIGN KEY constraints in the next section. CHECK constraints place limitations on the types of information that can be stored in a specific column of data. For example, I might want to restrict the value in a phone number column to a 10-digit format (with no dashes or other characters). A constraint can ensure that information is entered in numeric format. If the information does not meet the criteria, an error is returned to the user.

Constraints can be placed on one or more columns and can be quite complex, if that's what your business rules call for:

  • Defaults are settings placed on a table that specify which values should be used if none is specified. This type of constraint is commonly used in situations where the database assumes that certain values should be used unless they are otherwise specified by a user or an application. For example, I might want to create default responses for the columns that represent questions in a simple customer survey. If no value is provided for these columns, I may want the value to default to "N/A".
  • Rules function similarly to constraints but have the added benefit of existing as database objects. In contrast to constraints-which are defined as part of a column's definition-rules can be "bound" or "unbound" to columns. This allows the flexibility of disabling a rule without losing its definition. However, only one rule may apply to a column's definition. Rules are provided mainly for backward-compatibility with SQL Server applications. Microsoft recommends that, wherever possible, CHECK constraints be used instead of rules.

Now that we have a good idea of the types of column-level constraints that are available, let's look at the various types of referential integrity that are possible.

Domain, Entity, and Referential Integrity
It is possible to create a database in which all the information stored in tables is completely unstructured and unrelated. This would lead to many problems, however, as is often learned by those who do not take the time to adequately plan the structure of their databases. Generally, difference pieces of information stored in your database objects relate to each other in some way. Again, let's revisit the commonly used example of a sales database. Each sale might be tied to a customer, but the actual information about the customer (including shipping address and purchase history) might be stored in other tables. In this case, a clear relationship between the two tables must be kept intact. Additionally, business rules might require that each customer have a unique customer number that should never be reused.

Part of playing the database game is to ensure that the rules are followed; that is, you must be able to enforce the relationships between entities in your database. Consider the case in which you have a simple Customer - Orders relationship. Suppose that in your system, a Customer can have zero or more orders, and all orders must be attached to a customer. This might seem like an easy rule to define, and you can reasonably code for it within your application. But what happens if another developer forgets the rule and allows users to delete a customer who has several orders without first deleting all of the orders themselves? In this case, you'll have violated the integrity of your database by creating "orphan" records in the Orders table. The simple answer is that this shouldn't be allowed to happen, and the developer or end user should receive an error. Stopping short of this, you run the risk of entering invalid information into your database (something that can be a very serious problem and can be difficult to correct after it occurs). So what's a good database designer to do? Fortunately, there's a relatively easy solution.

Integrity constraints are created to ensure that these relationships are maintained in a consistent manner. There are three major types of integrity that database designers must keep in mind:

  • Domain integrity - Ensures that values stored within a column are consistent and in accordance with business rules. Domain integrity can be based on constraints such as UNIQUE and CHECK constraints that specify what values are acceptable for each column.
  • Entity integrity -- Refers to information stored in rows (remember that each row in a table stores information about one entity of the type that the table describes). This type of constraint makes sure that the information stored in rows within a table is consistent and follows the rules specified. For example, each row must contain the same number of columns (although some values may be left blank).
  • Referential integrity - Applies across tables and ensures that information between these objects is consistent. Referential integrity includes relationships between tables. The actual columns that match between the tables are known as foreign keys and primary keys, and they can be defined using PRIMARY KEY and FOREIGN KEY constraints (mentioned earlier). Referential integrity ensures that related information remains consistent. It solves, for example, the problem I mentioned earlier: ensuring that only valid customers are used for all orders placed in the database and avoids the problem of "orphan rows". Orphans might occur when a customer row is deleted from the database, but the customer still has orders. In this case, the orders are orphans since their parent row (the customer information) no longer exists. When it comes time to fulfill the orders, users will find that they do not have enough information.

Figure 3 illustrates the three main types of database integrity.

I think it's mirrors...
Figure 3. How domain, entity, and referential database integrity types compare.

With the basic ideas of database integrity out of the way, let's move on to another important topic: database normalization.

Understanding Database Normalization
Remember earlier in this article when I referred to the fact that there can be several good database designs that meet a given set of requirements? There's definitely a subjective aspect to database design. One important concept is that of normalization. Normalization refers to the level of separation of data into multiple tables. There are many rules of normalization, and they have been described in various ways. To remain at a high-level, I'll just provide the basics. One rule of normalization is that every column within a table should refer only to one entity. For example, if I have a Customer table, every piece of information within that table (like the customer's name and address) should correspond only to that customer. Additionally, each row within the Customers table should have a unique identifier of some sort (such as a Customer Number value that is unique for each customer).

A highly normalized database schema is characterized by having many "narrow" tables. By narrow, I mean the tables tend to have few columns. A denormalized database schema, on the other hand, tends to have fewer, wide tables. In general, a normalized database is easier to understand since tables refer to very specific entities.

So what should you consider when you decide how normalized your database should be? Well, performance is an important issue that is addressed by normalization. Highly normalized tables can improve performance in transaction processing systems as they limit the performance impacts of locking. If your database experiences a large number of data modification queries (INSERT, UPDATE and DELETE statements), then normalization is the way to go. On the other end of the spectrum, a denormalized database schema provides better performance for reporting functionality (SELECT statements). Since you'll have to join with fewer tables in your queries, reporting, especially with large sets of data, can be much quicker. Of course, denormalization comes at a cost: You may have redundant data, which requires additional storage space.

Remember that the concept of normalization is a continuum. That is, it's not like a database is either normalized or it's not. Rather, it can be "somewhat denormalized", or "highly normalized". Books have been written about database normalization techniques and recommendations. We don't have room in this article to dive into the details. Remember, this is an overview!

Summary
So, there you have it - the basics of database design. In this article, we looked at the basic design goals that you should keep in mind when you are developing a database solution. We covered the all-important concepts of referential integrity and database normalization. Both of these aspects form the basis for much longer discussions about how databases should be designed. The good news is that, if you understand these concepts, you're well on your way to designing efficient databases that meet your business requirements. The bad news is that we only scratched the surface within this article. Nevertheless, be sure you take the time determine your business requirements and design an appropriate database. Sometime soon, someone will be glad you did!

Additional Information
Microsoft wants to test your ability to design real-world database solutions and to take advantage of all of the many features available in SQL Server 2000. With exams that are related to database design and implementation, it can be difficult to judge when you're really ready. I recommend having at least several months of experience in working with SQL queries and designing databases, to give you a strong foundation.

With this information in mind, look at all of the new features in SQL Server 2000 that you have NOT used: don't forget about XML features, materialized views, partitioning and other features, just because you don't use them. The following resources should help you obtain more information about SQL Server 2000:

SQL Server 7.0 and SQL Server 2000 exams are available concurrently. If you're an MCPmag.com Premier member, you can check out the following articles:

I also review the 70-229 SQL Server 2000 Implementation exam in an upcoming issue.

comments powered by Disqus

Reader Comments:

Sat, Apr 29, 2006 Anonymous Anonymous

well done

Sun, Feb 19, 2006 Amadou Koita Mali

Very good article

Tue, Aug 16, 2005 Abubakar Saudi Arabia

It's so great

Wed, Feb 16, 2005 Anonymous Anonymous

Very good

Thu, Feb 10, 2005 Anonymous Anonymous

good basic summary of database design

Tue, Aug 17, 2004 Anonymous Anonymous

nice

Sat, Jul 10, 2004 Anonymous Anonymous

Valuable basic information. Highly recommended reading.

Thu, Apr 29, 2004 c k wadhwa india

i found the article very much useful.but i need that there should be something for newcomer.

Tue, Feb 24, 2004 Carol Atlanta

Excellent and timely for my Database Security class for my MISM degree at DeVry/Keller. Thank You!

Thu, Oct 2, 2003 David Tulsa

Good article. Been in industry for over 7 years. Good advise for new and old. Just need to ensure that us old timers listen and pickup good habits.

Wed, Sep 17, 2003 Anonymous Anonymous

Good document

Wed, Sep 3, 2003 John NY

Good intro article (which is why the title is DB Design 101). Prior comments about it being too general missed the title; its 101 not 201, 301, etc.

Mon, Aug 18, 2003 Anonymous Anonymous

This article is a very good introduction to what a database really is, without getting into too much detail

Wed, Jul 30, 2003 ro aus

Really cleared things up. Thank-You.

Can you reccomend any more articles regarding Relationships.

Sun, Jun 8, 2003 Ontolojim Phoenix

The issues with redundant data are much more significant than "storage". They are a major integrity issue in operational systems. Less of a problem in building a data mart or data warehouse.

Fri, May 23, 2003 Anonymous Anonymous

good

Mon, May 19, 2003 Anonymous Anonymous

great

Mon, Apr 7, 2003 Anonymous Anonymous

used in teaching a DBMS class, great overview and good tool to spook students into understanding this is serious stuff. Great article.

Fri, Mar 14, 2003 ibrahim bangalore

its was more informative ,small & beautiful explaination.

Mon, Feb 10, 2003 Uma India

For beginners it's a good teaching material to understand the concepts.A series of article expnading the concepts explained in the previous article would be more useful.

Wed, Jan 29, 2003 Edwin1608 London

Unfortunately this article doesn't fulfill what it claims to in it's opening paragraphs - it tells you more what you COULD do [as a start...] in database design, but not HOW to do it WELL. It wouldn't take a novice long to find out everything that is available in this article in terms of constraints, DRI and relationships within RDBMS, but I was hoping this article would go some way in suggesting best practice, pitfalls, what not to do, and how to approach database design from the point of data usage rather than simply compiling a subsection of SQL Server Book On-Line. This is meant to be a forum for Microsoft Professionals - not 'a beginners guide to...' - a more professional level of knowledge and discussion would be appreciated.

Thu, Jan 9, 2003 Vinayaka INDIA

to good A first Step Towards Normalizing with SQL Basics

Wed, Jan 8, 2003 Anonymous Anonymous

this is a the best one-page description on efficient database design ever written.

Wed, Jan 8, 2003 Dr. Phil Mclister USA

I fully agree with Norm. Without foundational knowledge one will simply display ignorance. And that is prevasive and is obvious in this article and subsequent comments from the 'Newbies'

Wed, Jan 8, 2003 Norm USA

First I would suggest the author understand the principles of set theory and predicate logic which is the foundation of relational theory. Then he should understand why SQL and proprietory products like SQL Server cannot fully conform to the relational theory. Then he should venture into writing articles based on database design. One should not base his article on relational design based on a non-relational and vendor product like SQL Server. Get informed.

Thu, Dec 19, 2002 mehuel Anonymous

excellent info, im on a school project, about databases, this info cleared my mind up in how i should approach the project.....A grade coming my way!

Fri, Nov 8, 2002 hecbor usa

Good, solid article

Thu, Nov 7, 2002 suresh s Hyderabad

I am luckey to read this article on "Basics for Database design"

Thu, Oct 31, 2002 Anonymous Anonymous

Very nice article discussing the basics of database design... I don't understand the previous posts that complained about needing more in-depth information - what do they want from an article? If you need in-depth database design education take a course or buy a book!

Thu, Sep 5, 2002 Anonymous Anonymous

x

Tue, Aug 13, 2002 yassersaleh egypt

Ifound it good but not in full examples so that peaple coud make samples likeit

Tue, Aug 6, 2002 Anonymous Anonymous

This article is an excellent primer for more detailed concepts concerning database design.

Thu, Jul 25, 2002 Robert Anonymous

Excellent! I agree with MrKets, so it would be great if you could drill down in some stages of the database desing process in futher articles.

Thu, Jul 11, 2002 Chris Cape Breton

Nice database development overview. would like to see more indepth stuff, say database integration with a third party document management system.

Tue, Jun 18, 2002 raj hegde india

Good article for newbies,
need more indepth refrences

Mon, Jun 17, 2002 Luis (MCSD, MCDBA) Mexico

Good article for newbies, let's get deeper.

Mon, May 27, 2002 Nick London

Totally agree - a really useful article, but I'm 17 and have done this with Access at school. I'd love to see some really in-depth stuff to get my teeth into!

Thu, Mar 28, 2002 Ashwani Sharma USA

A very good article for the beginners.

Sat, Mar 16, 2002 AndreM Repentigny

Interesting, full of good information, needs to be detailed further.
New to SQL server

Mon, Feb 4, 2002 Anonymous Anonymous

good one...for starters..

Sat, Jan 5, 2002 Nader Egypt

Great overall concept, I really have benefited from reading it, even I'm new to SQL Server.

Sun, Dec 30, 2001 mcr13 Anonymous

Good point to start, but a more detailed explanation would be great as much as necessary to cover this important topic.

Thu, Dec 27, 2001 MichaelJ California

This is a good starting point for beginners, however, SQL database design and implementation is a critical step in creating and maintaining an extensible database structure. Would like to see more in-depth and detailed information in a future article.

Mon, Dec 10, 2001 Anonymous Anonymous

a good primer to people who know just what database is!

Sun, Nov 25, 2001 Gac MN

I'm COMPLETELY new to SQL server, and this article was an excellent overview of what it takes to use it effectively. The generalities that you left are easily inumerated (I hope) in some other literature.

Sun, Nov 25, 2001 Garrydene Australia

When students asked me to define normalization I always take a deep breath. Your article explains it so clearly and with ease. I'll find it easy to explain from now on.

Fri, Nov 16, 2001 ken Edmonton

DB Design is am important subject, a lot of developers do not put too much serious thought in it, but this article is too general, i expect a much more in depth discussion in MCSD central.

Thu, Nov 8, 2001 sunil india

i found the article very much useful.but i need that there should be something for newcomer, as he need to know the like what are company review over the products....

Thu, Nov 1, 2001 Anonymous Anonymous

Very good article...news you can use!

Wed, Oct 3, 2001 epop Anonymous

I agree with Tosin and MrKets: need to expand this base article into series that actually go deeper in this subject.

Wed, Sep 5, 2001 Anonymous Anonymous

Good info!

Thu, Aug 16, 2001 MrKets India

It was a good article all in all. But if you can chain these article in further deep, i mean to say that in the coming articles if you can drill down to the techniques of normalization and what third party tools are available for normalizing the databases, it would be very much helpful to programmers who have just started their career.

Wed, Aug 8, 2001 Tosin Anonymous

All of these.....but what are your recommendations? Null values on columns degrade perfomance,use Unicode characters only when necessary,
Physical database design issues.... (table Partitioning, check constraints prefered over Referential constraints whenever possible.........

Add Your Comment Now:

Your Name:(optional)
Your Email:(optional)
Your Location:(optional)
Comment:
Please type the letters/numbers you see above

Redmond Tech Watch

Sign up for our newsletter.

I agree to this site's Privacy Policy.