SSIS Parallel Execution Patterns with ForEach Loop

By Tom Nonmacher

In the realm of SQL Server Integration Services (SSIS), parallel execution patterns have become a crucial tool for optimizing data flow and reducing processing time. SSIS, which ships with SQL Server (including SQL Server 2022), is a powerful ETL platform that enables data extraction, transformation, and loading across different data sources. The ForEach Loop container is one of the most versatile elements in SSIS: it iterates through a list of items and performs a set of operations for each one. This blog post will explore how to leverage this container to implement parallel execution patterns.

Parallel execution means running multiple tasks simultaneously, making fuller use of system resources and improving performance. This is particularly beneficial when dealing with large volumes of data, as it can significantly reduce overall processing time. In SSIS, parallel execution patterns are typically built from ForEach Loop containers and Sequence containers arranged so that independent branches of work can run at the same time.

Consider a scenario where you want to load data from multiple tables in a database into Azure SQL Database. Each table's data is independent of the others, so these operations are good candidates for parallel execution. Here's how a simple, sequential T-SQL script for this scenario might look:


-- Create a variable to hold table names
DECLARE @tableName NVARCHAR(128);
-- Cursor to iterate through the tables
DECLARE tableCursor CURSOR FOR
SELECT name FROM sys.tables WHERE type = 'U';
-- Open the cursor
OPEN tableCursor;
-- Fetch the first table name
FETCH NEXT FROM tableCursor INTO @tableName;
-- Loop through all the tables
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Load data from the current table to Azure SQL
    EXEC sp_loadData @tableName;
    -- Fetch the next table name
    FETCH NEXT FROM tableCursor INTO @tableName;
END;
-- Close and deallocate the cursor
CLOSE tableCursor;
DEALLOCATE tableCursor;

In this script, a cursor iterates through the user tables in the database, and the stored procedure sp_loadData is called for each table to load its data into Azure SQL. Although this script achieves the required task, it processes the tables strictly one at a time, so it gains nothing from parallel execution and can be improved upon.
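
One way to introduce parallelism from the T-SQL side is to hand each table off to the SSIS catalog as its own asynchronous package execution. The sketch below is only an illustration: the folder name ETL, project name ParallelLoads, package name LoadTable.dtsx, and the package parameter TableName are hypothetical and would need to match your own catalog objects.


-- Start one asynchronous catalog execution per table; the executions run concurrently
DECLARE @tableName NVARCHAR(128), @execution_id BIGINT;
DECLARE tableCursor CURSOR FOR
SELECT name FROM sys.tables WHERE type = 'U';
OPEN tableCursor;
FETCH NEXT FROM tableCursor INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Create an execution of the (hypothetical) LoadTable.dtsx package
    EXEC [SSISDB].[catalog].[create_execution]
        @folder_name = N'ETL',
        @project_name = N'ParallelLoads',
        @package_name = N'LoadTable.dtsx',
        @use32bitruntime = 0,
        @execution_id = @execution_id OUTPUT;
    -- Pass the current table name to the package (object_type 30 = package parameter)
    EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,
        @object_type = 30, @parameter_name = N'TableName', @parameter_value = @tableName;
    -- SYNCHRONIZED = 0 (the default) means start_execution does not wait for the package to finish
    EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,
        @object_type = 50, @parameter_name = N'SYNCHRONIZED', @parameter_value = 0;
    EXEC [SSISDB].[catalog].[start_execution] @execution_id;
    FETCH NEXT FROM tableCursor INTO @tableName;
END;
CLOSE tableCursor;
DEALLOCATE tableCursor;

Because SYNCHRONIZED is left at 0, start_execution returns immediately, so the per-table executions run side by side under the SSIS catalog rather than one after another.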

To implement parallel execution in SSIS, you can combine the ForEach Loop container with Sequence containers. A single ForEach Loop processes its iterations sequentially, so a common pattern is to place several ForEach Loop (or Sequence) containers side by side, each handling its own partition of the tables, and let the package's MaxConcurrentExecutables property (a value greater than 1, or -1 for the default of logical processors plus two) control how many of them run at once. The same pattern can also be expressed outside SSIS. Here's how the Databricks notebook code for this scenario might look, using a thread pool to run the per-table copies concurrently:


# Import necessary libraries
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.getOrCreate()
# List of table names
tableNames = ["table1", "table2", "table3"]
# Function to copy one table from the source SQL Server to Azure SQL
def loadData(tableName):
    # Read the source table over JDBC
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://[sourceServerName];databaseName=[sourceDatabaseName];")
          .option("dbtable", tableName).option("user", "[username]").option("password", "[password]").load())
    # Write the data to Azure SQL; mode("append") assumes the target table already exists
    (df.write.format("jdbc")
       .option("url", "jdbc:sqlserver://[targetServerName];databaseName=[targetDatabaseName];")
       .option("dbtable", tableName).option("user", "[username]").option("password", "[password]").mode("append").save())
# Run the per-table copies on a thread pool; max_workers caps the degree of parallelism
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(loadData, tableNames))

In this script, a list of table names is created and a function is defined that copies one table from the source server to Azure SQL. The per-table copies are submitted to a thread pool, so Spark schedules several JDBC reads and writes at the same time rather than one after another; the max_workers value controls how many run concurrently.
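
Whichever route you take, it pays to keep an eye on how many loads are actually running at once so the target database is not overwhelmed. If the packages are executed from the SSIS catalog, a quick query like the sketch below (assuming the standard SSISDB catalog views) shows the executions currently in flight:


-- Executions with status 2 are currently running in the SSIS catalog
SELECT COUNT(*) AS running_executions
FROM [SSISDB].[catalog].[executions]
WHERE status = 2;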

In conclusion, parallel execution patterns with the ForEach Loop container in SSIS provide a robust and efficient method for data processing. By leveraging these patterns, you can greatly improve the performance of your ETL processes, especially when dealing with large volumes of data. Remember that the degree of parallelism should be adjusted according to the resources available to avoid overloading the system.



