SSIS Parallel Execution Patterns with ForEach Loop

By Tom Nonmacher

In the realm of SQL Server Integration Services (SSIS), parallel execution patterns have become a crucial tool for optimizing data flow and reducing processing time. SSIS, which ships with SQL Server (including SQL Server 2022), is a powerful ETL platform that enables data extraction, transformation, and loading across different data sources. The ForEach Loop container is one of the most versatile elements in SSIS: it iterates through a list of items and performs a set of operations for each one. This blog post will explore how to leverage this container to implement parallel execution patterns.

Parallel execution means running multiple tasks simultaneously, making fuller use of system resources and improving performance. This is particularly beneficial when dealing with large volumes of data, as it can significantly reduce overall processing time. In SSIS, parallel execution patterns are typically built from ForEach Loop containers and Sequence containers arranged so that independent branches of work can run at the same time.

Consider a scenario where you want to load data from multiple tables in a database into Azure SQL Database. Each table's data is independent of the others, so these operations are good candidates for parallel execution. Here's how a simple, sequential T-SQL script for this scenario might look:


-- Create a variable to hold table names
DECLARE @tableName NVARCHAR(128);
-- Cursor to iterate through the tables
DECLARE tableCursor CURSOR FOR
SELECT name FROM sys.tables WHERE type = 'U';
-- Open the cursor
OPEN tableCursor;
-- Fetch the first table name
FETCH NEXT FROM tableCursor INTO @tableName;
-- Loop through all the tables
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Load data from the current table to Azure SQL
    EXEC sp_loadData @tableName;
    -- Fetch the next table name
    FETCH NEXT FROM tableCursor INTO @tableName;
END;
-- Close and deallocate the cursor
CLOSE tableCursor;
DEALLOCATE tableCursor;

In this script, a cursor iterates through the user tables in the database, and the stored procedure sp_loadData is called for each table to load its data into Azure SQL. Although this script achieves the required task, it processes the tables strictly one at a time, so it gains nothing from parallel execution and can be improved upon.
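
One way to introduce parallelism from the T-SQL side is to hand each table off to the SSIS catalog as its own asynchronous package execution. The sketch below is only an illustration: the folder name ETL, project name ParallelLoads, package name LoadTable.dtsx, and the package parameter TableName are hypothetical and would need to match your own catalog objects.


-- Start one asynchronous catalog execution per table; the executions run concurrently
DECLARE @tableName NVARCHAR(128), @execution_id BIGINT;
DECLARE tableCursor CURSOR FOR
SELECT name FROM sys.tables WHERE type = 'U';
OPEN tableCursor;
FETCH NEXT FROM tableCursor INTO @tableName;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Create an execution of the (hypothetical) LoadTable.dtsx package
    EXEC [SSISDB].[catalog].[create_execution]
        @folder_name = N'ETL',
        @project_name = N'ParallelLoads',
        @package_name = N'LoadTable.dtsx',
        @use32bitruntime = 0,
        @execution_id = @execution_id OUTPUT;
    -- Pass the current table name to the package (object_type 30 = package parameter)
    EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,
        @object_type = 30, @parameter_name = N'TableName', @parameter_value = @tableName;
    -- SYNCHRONIZED = 0 (the default) means start_execution does not wait for the package to finish
    EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,
        @object_type = 50, @parameter_name = N'SYNCHRONIZED', @parameter_value = 0;
    EXEC [SSISDB].[catalog].[start_execution] @execution_id;
    FETCH NEXT FROM tableCursor INTO @tableName;
END;
CLOSE tableCursor;
DEALLOCATE tableCursor;

Because SYNCHRONIZED is left at 0, start_execution returns immediately, so the per-table executions run side by side under the SSIS catalog rather than one after another.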

To implement parallel execution in SSIS, you can combine the ForEach Loop container with Sequence containers. A single ForEach Loop processes its iterations sequentially, so a common pattern is to place several ForEach Loop (or Sequence) containers side by side, each handling its own partition of the tables, and let the package's MaxConcurrentExecutables property (a value greater than 1, or -1 for the default of logical processors plus two) control how many of them run at once. The same pattern can also be expressed outside SSIS. Here's how the Databricks notebook code for this scenario might look, using a thread pool to run the per-table copies concurrently:


# Import necessary libraries
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.getOrCreate()
# List of table names
tableNames = ["table1", "table2", "table3"]
# Function to copy one table from the source SQL Server to Azure SQL
def loadData(tableName):
    # Read the source table over JDBC
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://[sourceServerName];databaseName=[sourceDatabaseName];")
          .option("dbtable", tableName).option("user", "[username]").option("password", "[password]").load())
    # Write the data to Azure SQL; mode("append") assumes the target table already exists
    (df.write.format("jdbc")
       .option("url", "jdbc:sqlserver://[targetServerName];databaseName=[targetDatabaseName];")
       .option("dbtable", tableName).option("user", "[username]").option("password", "[password]").mode("append").save())
# Run the per-table copies on a thread pool; max_workers caps the degree of parallelism
with ThreadPoolExecutor(max_workers=4) as executor:
    list(executor.map(loadData, tableNames))

In this script, a list of table names is created and a function is defined that copies one table from the source server to Azure SQL. The per-table copies are submitted to a thread pool, so Spark schedules several JDBC reads and writes at the same time rather than one after another; the max_workers value controls how many run concurrently.
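
Whichever route you take, it pays to keep an eye on how many loads are actually running at once so the target database is not overwhelmed. If the packages are executed from the SSIS catalog, a quick query like the sketch below (assuming the standard SSISDB catalog views) shows the executions currently in flight:


-- Executions with status 2 are currently running in the SSIS catalog
SELECT COUNT(*) AS running_executions
FROM [SSISDB].[catalog].[executions]
WHERE status = 2;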

In conclusion, parallel execution patterns with the ForEach Loop container in SSIS provide a robust and efficient method for data processing. By leveraging these patterns, you can greatly improve the performance of your ETL processes, especially when dealing with large volumes of data. Remember that the degree of parallelism should be adjusted according to the resources available to avoid overloading the system.



