Building ETL Restartability in SSIS

By Tom Nonmacher

Building ETL (Extract, Transform, Load) restartability into SSIS (SQL Server Integration Services) is a key factor in ensuring the robustness and reliability of your data integration processes. In today’s blog post, we will focus on building ETL restartability using technologies from SQL Server 2016, SQL Server 2017, MySQL 5.7, DB2 11.1, and Azure SQL.

Restartability in ETL processes is crucial for handling failures and exceptions which may occur during the data integration pipeline. It allows for the continuation of the ETL process from the point of failure, rather than having to re-run the entire process. This is especially important for large scale ETL operations where a full re-run could be time-consuming and costly.
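The idea of resuming from the point of failure can be pictured with a small control table, independent of any particular SSIS feature. The following is only a sketch under assumptions: the table dbo.EtlStepLog, the procedure dbo.RunEtlStep, and the step names are hypothetical and are not part of SSIS or of any product mentioned here. It also assumes the procedure is called outside an explicit transaction.

```sql
-- Hypothetical control table: one row per batch and step.
CREATE TABLE dbo.EtlStepLog (
    BatchId    INT         NOT NULL,
    StepName   SYSNAME     NOT NULL,
    Status     VARCHAR(10) NOT NULL,  -- 'Running', 'Succeeded', or 'Failed'
    StartedAt  DATETIME2   NOT NULL,
    FinishedAt DATETIME2   NULL,
    CONSTRAINT PK_EtlStepLog PRIMARY KEY (BatchId, StepName)
);
GO

-- Hypothetical wrapper: runs a step only if it has not already
-- succeeded for the given batch.
CREATE PROCEDURE dbo.RunEtlStep
    @BatchId  INT,
    @StepName SYSNAME,
    @StepSql  NVARCHAR(MAX)
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip steps that completed in an earlier run of the same batch.
    IF EXISTS (SELECT 1 FROM dbo.EtlStepLog
               WHERE BatchId = @BatchId
                 AND StepName = @StepName
                 AND Status = 'Succeeded')
        RETURN;

    -- Record the attempt, or reset the row left by a failed attempt.
    MERGE dbo.EtlStepLog AS t
    USING (SELECT @BatchId AS BatchId, @StepName AS StepName) AS s
       ON t.BatchId = s.BatchId AND t.StepName = s.StepName
    WHEN MATCHED THEN
        UPDATE SET Status = 'Running',
                   StartedAt = SYSUTCDATETIME(),
                   FinishedAt = NULL
    WHEN NOT MATCHED THEN
        INSERT (BatchId, StepName, Status, StartedAt)
        VALUES (s.BatchId, s.StepName, 'Running', SYSUTCDATETIME());

    BEGIN TRY
        EXEC sys.sp_executesql @StepSql;

        UPDATE dbo.EtlStepLog
        SET Status = 'Succeeded', FinishedAt = SYSUTCDATETIME()
        WHERE BatchId = @BatchId AND StepName = @StepName;
    END TRY
    BEGIN CATCH
        -- Undo any transaction the step left open, then log the failure.
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

        UPDATE dbo.EtlStepLog
        SET Status = 'Failed', FinishedAt = SYSUTCDATETIME()
        WHERE BatchId = @BatchId AND StepName = @StepName;

        THROW;
    END CATCH
END;
GO

-- Usage: rerunning the same batch after a failure skips LoadStaging
-- if it already succeeded and goes straight to LoadFact.
EXEC dbo.RunEtlStep @BatchId = 42, @StepName = N'LoadStaging', @StepSql = N'EXEC dbo.LoadStaging;';
EXEC dbo.RunEtlStep @BatchId = 42, @StepName = N'LoadFact',    @StepSql = N'EXEC dbo.LoadFact;';
```

Keying the log on the batch and step name is a design choice made for this illustration. It only works if each step is safe to rerun from its own beginning.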

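One approach to implementing ETL restartability is through the use of checkpoints in SSIS. Checkpoints allow a package to restart from the task that failed rather than from the beginning. They are configured on the package itself, through the CheckpointFileName, CheckpointUsage, and SaveCheckpoints properties, with FailPackageOnFailure set to True on each task that should act as a restart point. When a package is configured this way, SSIS writes a checkpoint file that records which tasks completed. If the package fails, the next run looks for that file and, if it finds one, skips the completed tasks and resumes at the failed one. Note that the unit of restart is a control flow task: a Data Flow task that fails partway through is rerun from its start.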


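-- Note: SSIS checkpoints are package properties and cannot be enabled with T-SQL.
-- The statements below only enable xp_cmdshell, which is needed solely when a
-- package is launched (or relaunched after a failure) from T-SQL by calling dtexec.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
GO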

On SQL Server 2017 and Azure SQL Database, a related building block is the built-in Resumable Online Index Rebuild (ROIR) feature; it is not available in SQL Server 2016. It lets you pause an online index rebuild and resume it later, and a rebuild that fails partway through (for example, after a failover) can be resumed from where it stopped instead of starting over. This helps when a large ETL load is followed by heavy index maintenance.


-- T-SQL code to use ROIR (RESUMABLE = ON requires ONLINE = ON)
ALTER INDEX IX_SalesOrderHeader_SalesOrderID
ON Sales.SalesOrderHeader
REBUILD WITH (ONLINE = ON, RESUMABLE = ON);
GO
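To illustrate how pausing and resuming might look in practice, the following sketch reuses the index name from the example above and assumes a rebuild was started with RESUMABLE = ON. The MAXDOP value is arbitrary and chosen only for illustration.

```sql
-- Pause the running resumable rebuild (issued from another session).
ALTER INDEX IX_SalesOrderHeader_SalesOrderID
ON Sales.SalesOrderHeader
PAUSE;
GO

-- Check the state and progress of resumable index operations.
SELECT name, state_desc, percent_complete, last_pause_time
FROM sys.index_resumable_operations;
GO

-- Resume from where the rebuild stopped.
ALTER INDEX IX_SalesOrderHeader_SalesOrderID
ON Sales.SalesOrderHeader
RESUME WITH (MAXDOP = 2);
GO
```

If the rebuild is no longer wanted, the same statement with ABORT in place of RESUME discards the paused operation.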

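On MySQL 5.7, bulk loads into InnoDB tables with AUTO_INCREMENT columns can be tuned by setting innodb_autoinc_lock_mode to 2 (interleaved). In this mode, INSERT-like statements do not hold the table-level AUTO-INC lock, so concurrent inserts scale better. This improves throughput rather than restartability as such, and it comes with two caveats: auto-increment values assigned within a single statement may not be consecutive, and the mode is not safe with statement-based replication. The variable is not dynamic, so it must be set in the server configuration file or on the command line, and it takes effect after a server restart.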


# MySQL configuration file (my.cnf or my.ini); this variable is read-only at runtime, so SET GLOBAL fails
[mysqld]
innodb_autoinc_lock_mode = 2

In DB2 11.1, the REORG TABLE command reorganizes a table to reclaim space and improve performance. When run with the INPLACE option, the reorganization can be paused and later resumed from where it stopped, which gives the operation a measure of restartability. A classic (offline) reorganization has no pause and resume options.


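-- DB2 command to start an inplace (online) reorganization
REORG TABLE table_name INPLACE ALLOW WRITE ACCESS START;
-- After the reorganization has been paused, continue from where it stopped
REORG TABLE table_name INPLACE ALLOW WRITE ACCESS RESUME;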

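Finally, in Azure SQL, Elastic Jobs can be used to schedule and execute T-SQL scripts across a group of databases. Elastic Jobs provide a degree of resiliency and restartability because failed job steps are retried automatically, with configurable retry counts and intervals. Since a step may run more than once against the same database, the T-SQL it executes should be idempotent.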
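As a rough sketch of how retry behavior might be configured, the T-SQL below would be run in the job database of an Elastic Job agent. The job name, step name, target group EtlDatabases, and table dbo.LoadAudit are hypothetical, and the target group is assumed to exist already. Depending on how the job agent authenticates, a @credential_name argument may also be required on the job step.

```sql
-- Run in the Elastic Job agent's job database.
EXEC jobs.sp_add_job
    @job_name    = N'NightlyLoadAudit',
    @description = N'Records a load marker in every target database';

-- The command is written to be idempotent, so a retried step does no harm.
EXEC jobs.sp_add_jobstep
    @job_name          = N'NightlyLoadAudit',
    @step_name         = N'InsertMarker',
    @command           = N'
        IF NOT EXISTS (SELECT 1 FROM dbo.LoadAudit
                       WHERE LoadDate = CAST(SYSUTCDATETIME() AS date))
            INSERT INTO dbo.LoadAudit (LoadDate)
            VALUES (CAST(SYSUTCDATETIME() AS date));',
    @target_group_name = N'EtlDatabases',
    @retry_attempts    = 5,                   -- retries after the first failure
    @initial_retry_interval_seconds = 10;     -- delay before the first retry

-- Start the job immediately instead of waiting for a schedule.
EXEC jobs.sp_start_job @job_name = N'NightlyLoadAudit';
```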

In conclusion, implementing ETL restartability in SSIS is a vital component in building strong and reliable data integration processes. By employing the techniques and technologies discussed in this post, you can greatly enhance the robustness and resilience of your ETL operations.



