Designing Surrogate Keys for BI Fact Tables
By Tom Nonmacher
In the world of Business Intelligence (BI), fact tables are the cornerstone of any data warehouse. They store the measurable, quantitative data about a business. When designing these fact tables, it's essential to consider the use of surrogate keys. Surrogate keys are unique identifiers that are not derived from the application data and carry no business meaning. They are typically auto-generated by the database management system (DBMS), and each platform covered here (SQL Server 2016, SQL Server 2017, MySQL 5.7, DB2 11.1, and Azure SQL) has a built-in mechanism for generating them.
Surrogate keys provide several advantages. Because they carry no business meaning, they stay stable when changes to business rules or source schemas modify the natural keys. They also offer performance benefits, especially in large fact tables: joining or filtering on a single integer column is usually cheaper than doing so on a multi-field natural key, because the key and the indexes built on it are narrower. These benefits are why surrogate keys play a crucial role in the design of BI fact tables.
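To make the stability and join arguments concrete, here is a minimal sketch. The tables DimProduct and FactProductSales, their columns, and the product codes are hypothetical and exist only for this illustration. Key generation is left out so that the statements stay portable across the platforms discussed here.

```sql
-- Hypothetical dimension: ProductKey is the surrogate key,
-- ProductCode is the natural key that comes from the source system.
CREATE TABLE DimProduct (
    ProductKey  int          NOT NULL PRIMARY KEY,
    ProductCode varchar(20)  NOT NULL,
    ProductName varchar(100) NOT NULL
);

-- Hypothetical fact table: it stores only the integer surrogate key.
CREATE TABLE FactProductSales (
    ProductKey  int           NOT NULL,
    DateKey     int           NOT NULL,
    SalesAmount decimal(10,2) NOT NULL,
    FOREIGN KEY (ProductKey) REFERENCES DimProduct (ProductKey)
);

-- If the business renumbers a product, only the dimension row changes.
-- Existing fact rows still point at the same ProductKey.
UPDATE DimProduct
SET ProductCode = 'WIDGET-002'
WHERE ProductCode = 'WIDGET-001';

-- The join uses one integer column instead of a multi-field natural key.
SELECT p.ProductName, SUM(f.SalesAmount) AS TotalSales
FROM FactProductSales AS f
JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY p.ProductName;
```

Had the fact table stored ProductCode directly, the renumbering would have required updating every matching fact row as well.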
Creating a surrogate key in SQL Server 2016 and SQL Server 2017 is straightforward. Use the IDENTITY property to auto-generate an incrementing integer for each row. IDENTITY on its own does not enforce uniqueness, so pair it with a PRIMARY KEY (or UNIQUE) constraint. Here is a simple example:
CREATE TABLE FactSales (
    SalesKey int IDENTITY(1,1) NOT NULL,
    ProductID int NOT NULL,
    DateKey int NOT NULL,
    SalesAmount money NOT NULL,
    CONSTRAINT PK_FactSales PRIMARY KEY (SalesKey)
);
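As a usage sketch for the table above, the following T-SQL inserts rows without supplying SalesKey and reads back the generated values. The product IDs, amounts, and the yyyymmdd-style DateKey values are made up for illustration.

```sql
-- SalesKey is omitted from the column list; SQL Server assigns it.
INSERT INTO FactSales (ProductID, DateKey, SalesAmount)
VALUES (101, 20170315, 249.99);

-- SCOPE_IDENTITY() returns the last identity value generated
-- in the current session and scope.
SELECT SCOPE_IDENTITY() AS NewSalesKey;

-- For a multi-row insert, the OUTPUT clause returns every generated key.
INSERT INTO FactSales (ProductID, DateKey, SalesAmount)
OUTPUT inserted.SalesKey
VALUES (102, 20170315, 19.50),
       (103, 20170316, 75.00);
```

For very large fact tables, it may be worth declaring the key as bigint instead of int so the identity range is not exhausted.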
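In MySQL 5.7, the AUTO_INCREMENT attribute serves a similar role to SQL Server's IDENTITY. MySQL requires an AUTO_INCREMENT column to be indexed, which the PRIMARY KEY clause in the example takes care of. Here's how you might create a surrogate key in a fact table in MySQL: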
CREATE TABLE FactSales (
SalesKey int NOT NULL AUTO_INCREMENT,
ProductID int NOT NULL,
DateKey int NOT NULL,
SalesAmount decimal(10,2) NOT NULL,
PRIMARY KEY (SalesKey)
);
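A short usage sketch for the MySQL table above, again with made-up values: the insert leaves out SalesKey, and LAST_INSERT_ID() reads back what MySQL generated.

```sql
-- SalesKey is omitted; MySQL assigns the next AUTO_INCREMENT value.
INSERT INTO FactSales (ProductID, DateKey, SalesAmount)
VALUES (101, 20170315, 249.99);

-- LAST_INSERT_ID() is tracked per connection, so other sessions
-- inserting at the same time do not affect the result.
SELECT LAST_INSERT_ID() AS NewSalesKey;

-- Note: after a multi-row INSERT, LAST_INSERT_ID() returns the value
-- generated for the first row of that statement, not the last.
```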
For DB2 11.1, the GENERATED ALWAYS AS IDENTITY clause is used to create a surrogate key. The clause only generates the values; the primary key constraint in the example is what enforces uniqueness. Here's an example:
CREATE TABLE FactSales (
    SalesKey int NOT NULL GENERATED ALWAYS AS IDENTITY,
    ProductID int NOT NULL,
    DateKey int NOT NULL,
    SalesAmount decimal(10,2) NOT NULL,
    PRIMARY KEY (SalesKey)
);
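The following sketch shows one way to load the DB2 table above and retrieve the generated key. The values are illustrative, and the statements assume DB2 for Linux, UNIX, and Windows.

```sql
-- SalesKey is omitted; with GENERATED ALWAYS, DB2 supplies the value
-- and an INSERT that provides its own SalesKey is rejected.
INSERT INTO FactSales (ProductID, DateKey, SalesAmount)
VALUES (101, 20170315, 249.99);

-- IDENTITY_VAL_LOCAL() returns the most recently assigned identity
-- value for a single-row INSERT ... VALUES on this connection.
SELECT IDENTITY_VAL_LOCAL() AS NewSalesKey
FROM SYSIBM.SYSDUMMY1;

-- Alternatively, insert and read the generated key in one statement.
SELECT SalesKey
FROM FINAL TABLE (
    INSERT INTO FactSales (ProductID, DateKey, SalesAmount)
    VALUES (102, 20170315, 19.50)
);
```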
Azure SQL, being a part of the SQL Server family, uses the same IDENTITY property to create surrogate keys as SQL Server 2016 and 2017. The syntax for creating a fact table with a surrogate key is identical to our previous SQL Server example.
The key takeaway here is the importance of designing your BI fact tables with surrogate keys in mind. Whichever DBMS you're using (SQL Server 2016, SQL Server 2017, MySQL 5.7, DB2 11.1, or Azure SQL), there's a mechanism in place to generate these compact, stable identifiers. By leveraging them in your fact table design, you can make your BI operations more robust, resilient, and efficient.