Data Vault Modeling Techniques for Historical Tracking Note from the Data Whisperer

By Tom Nonmacher

Hello Data Whisperers! Today we're going to delve into the world of Data Vault modeling techniques specifically designed for historical tracking. This modeling approach is central to the design of modern data warehouses, especially when working with SQL Server 2012, SQL Server 2014, MySQL 5.6, DB2 10.5, or Azure SQL. With Data Vault, we can capture and store every piece of data along with the context in which it was captured, thus allowing us to keep a record of data throughout its entire lifecycle.

Data Vault modeling consists of three primary components: Hubs, Links, and Satellites. Hubs represent the business keys, Links represent the associations or relationships between Hubs, and Satellites contain descriptive attributes related to Hubs and Links. This granular level of detail facilitates deep-dive historical analysis, trend identification, and pattern discovery.

Here's an example of how we might create a Hub table in SQL Server. The Hub table represents a unique list of business keys with an associated system-generated surrogate key. Note the use of the HashBytes function for hash key generation. This is a unique identifier for each record.

CREATE TABLE Hub_Customer (
Hub_Customer_HK BIGINT PRIMARY KEY,
Customer_BK INT,
Record_Source VARCHAR(100),
Load_Date_Time DATETIME
);
INSERT INTO Hub_Customer (Hub_Customer_HK, Customer_BK, Record_Source, Load_Date_Time)
VALUES (HashBytes('SHA2_256', CAST(NEWID() AS NVARCHAR(36))), 123, 'Source System', GETDATE());

Link tables in Data Vault are very similar to association tables in a traditional relational model. They represent the relationships between different Hubs. Here is how we might create a Link table in MySQL 5.6. Each Link table has a surrogate key, the hash keys of the associated Hubs, and additional metadata.

CREATE TABLE Link_Order_Customer (
Link_Order_Customer_HK BIGINT PRIMARY KEY,
Hub_Order_HK BIGINT,
Hub_Customer_HK BIGINT,
Record_Source VARCHAR(100),
Load_Date_Time DATETIME
);
INSERT INTO Link_Order_Customer (Link_Order_Customer_HK, Hub_Order_HK, Hub_Customer_HK, Record_Source, Load_Date_Time)
VALUES (SHA2('SHA256', UUID()), 123, 456, 'Source System', NOW());

Finally, let's look at Satellites. These tables store all the descriptive, contextual, and temporal data associated with a Hub or Link. Here's an example of a Satellite table creation in DB2 10.5. Note the use of the GENERATED ALWAYS AS IDENTITY clause to create a unique identifier for each record.

CREATE TABLE Satellite_Customer (
Satellite_Customer_SK INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
Hub_Customer_HK BIGINT,
Customer_Name VARCHAR(100),
Customer_Address VARCHAR(200),
Load_Date_Time TIMESTAMP,
End_Date_Time TIMESTAMP,
Record_Source VARCHAR(100),
PRIMARY KEY (Satellite_Customer_SK)
);

As you can see, Data Vault modeling provides a robust framework for capturing and maintaining historical data in a structured and query-able format. Whether you're using SQL Server, MySQL, DB2, or Azure SQL, these principles remain the same. Happy data vaulting!




0486B9
Please enter the code from the image above in the box below.