SSIS Lookup Cache Modes Compared

By Tom Nonmacher

The Lookup transformation (often called the Lookup Component) in SQL Server Integration Services (SSIS) correlates rows in your data flow with a reference dataset, and it offers three cache modes: Full, Partial, and No Cache. In this post, we'll compare the three modes, with example reference queries written for SQL Server 2019, MySQL 8.0, and DB2 11.5. The same trade-offs should apply when the reference data lives in Azure SQL or Azure Synapse.

Full Cache mode, as the name implies, loads the entire reference dataset into memory before the data flow starts processing rows. This mode is memory-intensive, but it offers the fastest lookups: once the cache is loaded, every lookup is answered from memory and no further queries are sent to the reference database. Because every selected column of every row is cached, select only the columns you need for matching and output. Below is an example of a query in T-SQL that could be used in a Full Cache mode Lookup Component:


SELECT CustomerID, CustomerName
FROM Customers;
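
To make the behavior concrete, here is a minimal Python sketch of what Full Cache mode amounts to. It is not how SSIS is implemented; it only imitates the pattern. The in-memory SQLite database is a hypothetical stand-in for the reference database, and the sample rows and variable names are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the reference database (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)

# Full cache: a single query loads every reference row up front.
full_cache = dict(conn.execute("SELECT CustomerID, CustomerName FROM Customers"))
conn.close()  # the lookups below never touch the database again

incoming_customer_ids = [2, 3, 99]  # 99 has no match in the reference data
matched = [(cid, full_cache.get(cid)) for cid in incoming_customer_ids]
print(matched)
```

The whole table sits in the dictionary whether or not a given key is ever looked up, which is where the memory cost comes from.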

Partial Cache mode is a more memory-efficient alternative to Full Cache mode. Instead of loading the entire reference dataset up front, it starts with an empty cache and fills it as the data flow runs. When an incoming row's key is not in the cache, the component queries the reference data for that key and stores the result, so later rows with the same key are answered from memory. The following is a MySQL example of a query that could be used in a Partial Cache mode Lookup Component:


SELECT order_id, order_date, customer_id, product_id
FROM orders;
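
The query-on-miss behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the SSIS implementation: the `PartialCacheLookup` class is hypothetical, SQLite stands in for MySQL, only matching rows are cached, and the cache has no size limit or eviction policy.

```python
import sqlite3


class PartialCacheLookup:
    """Hypothetical illustration: query the reference table only on a cache miss."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # order_id -> (customer_id, product_id)
        self.queries = 0  # number of queries sent to the reference table

    def lookup(self, order_id):
        if order_id in self.cache:
            return self.cache[order_id]  # cache hit: no query
        self.queries += 1
        row = self.conn.execute(
            "SELECT customer_id, product_id FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        if row is not None:
            self.cache[order_id] = row  # assumption: only matches are cached
        return row


# Hypothetical stand-in for the reference database (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 100), (11, 2, 200)])

partial = PartialCacheLookup(conn)
results = [partial.lookup(order_id) for order_id in [10, 11, 10, 10, 11]]
print(partial.queries, "queries for", len(results), "lookups")
```

Five lookups cost only as many queries as there are distinct keys, which is why this mode pays off when the input keeps returning to a small part of the reference data.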

No Cache mode does not build up a cache of reference rows in memory. Every lookup operation results in a query against the reference data. This mode is the most memory-efficient but also the slowest, because every incoming row pays for a round trip to the reference database. Here is a DB2 example of a query that could be used in a No Cache mode Lookup Component:


SELECT PRODUCT_ID, PRODUCT_NAME
FROM PRODUCTS;
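
As a rough Python sketch of the same idea, the hypothetical `NoCacheLookup` class below sends one query per incoming row and keeps nothing between calls. SQLite stands in for DB2, and the sample data is invented for illustration.

```python
import sqlite3


class NoCacheLookup:
    """Hypothetical illustration: every lookup goes to the reference table."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0  # number of queries sent to the reference table

    def lookup(self, product_id):
        self.queries += 1
        row = self.conn.execute(
            "SELECT PRODUCT_NAME FROM PRODUCTS WHERE PRODUCT_ID = ?",
            (product_id,),
        ).fetchone()
        return row[0] if row is not None else None


# Hypothetical stand-in for the reference database (SQLite, not DB2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCTS (PRODUCT_ID INTEGER PRIMARY KEY, PRODUCT_NAME TEXT)")
conn.executemany("INSERT INTO PRODUCTS VALUES (?, ?)", [(7, "Widget"), (8, "Gadget")])

no_cache = NoCacheLookup(conn)
names = [no_cache.lookup(product_id) for product_id in [7, 7, 7, 8]]
print(no_cache.queries, "queries for", len(names), "lookups")
```

Repeated keys get no discount here: the query count always equals the number of incoming rows.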

In conclusion, the cache mode you choose for the SSIS Lookup Component directly affects both the speed and the memory consumption of your data flows. Full Cache mode gives the fastest lookups at the cost of holding the whole reference dataset in memory. Partial Cache mode strikes a balance between memory consumption and speed, and it works best when the input keeps returning to a limited subset of the reference data. No Cache mode is the most memory-efficient but the slowest, because every row triggers a query. Weigh the size of your reference data, the number of incoming rows, and the memory available to the package before choosing a mode, whether the reference data lives in SQL Server 2019, MySQL 8.0, DB2 11.5, Azure SQL, or Azure Synapse.



