SQL Server 2019: Big Data Clusters Introduction
By Tom Nonmacher
The release of SQL Server 2019 brought a substantial shift in the way we handle big data. With the introduction of Big Data Clusters (BDC), SQL Server 2019 provides a combined AI and big data platform. The BDC feature lets you deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side so that you can read, write, and process big data from Transact-SQL or Spark, and combine your high-value relational data with high-volume big data in a single analysis.
Let's take a look at an example of how to query external data in a SQL Server Big Data Cluster. This example uses Transact-SQL (T-SQL), Microsoft's proprietary extension to the SQL language. Once an external table has been defined over the data, you query it like any other table, with no special keyword in the FROM clause. Here the three-part name refers to the database, schema, and table.
SELECT * FROM [Hadoop].[Hdfs].[BigDataTable] WHERE [Year] = 2020;
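The query above only works after the external table exists. As a rough sketch of how such a table might be defined, the script below assumes it runs on the SQL Server master instance of a Big Data Cluster, in a database named Hadoop that already has a schema named Hdfs. The column list, the HDFS path '/example/bigdata', the file format name csv_file, and the local table dbo.RegionManagers are all hypothetical and chosen only for illustration. The storage pool location string follows the pattern used in Microsoft's Big Data Cluster tutorials, so check it against your own deployment.

```sql
-- Assumption: database Hadoop and schema Hdfs already exist.
USE [Hadoop];
GO

-- Data source that points at the cluster's HDFS storage pool.
IF NOT EXISTS (SELECT 1 FROM sys.external_data_sources WHERE name = N'SqlStoragePool')
    CREATE EXTERNAL DATA SOURCE SqlStoragePool
    WITH (LOCATION = 'sqlhdfs://controller-svc/default');
GO

-- File format describing comma-delimited text files (hypothetical name).
CREATE EXTERNAL FILE FORMAT csv_file
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',',
        STRING_DELIMITER = '"',
        USE_TYPE_DEFAULT = TRUE
    )
);
GO

-- External table over the files in a hypothetical HDFS directory.
-- The columns are illustrative; they must match the layout of your files.
CREATE EXTERNAL TABLE [Hdfs].[BigDataTable] (
    [Year]   INT            NOT NULL,
    [Region] NVARCHAR(50)   NOT NULL,
    [Amount] DECIMAL(18, 2) NOT NULL
)
WITH (
    DATA_SOURCE = SqlStoragePool,
    LOCATION    = '/example/bigdata',
    FILE_FORMAT = csv_file
);
GO

-- Combine the HDFS data with an ordinary relational table.
-- dbo.RegionManagers (Region, ManagerName) is a hypothetical local table.
SELECT m.ManagerName,
       SUM(b.Amount) AS TotalAmount
FROM [Hdfs].[BigDataTable] AS b
JOIN dbo.RegionManagers AS m
    ON m.Region = b.Region
WHERE b.[Year] = 2020
GROUP BY m.ManagerName;
```

The final query shows the point of the feature: the external table participates in joins and aggregates just like a local table, while the underlying files stay in HDFS.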
The SQL Server 2019 Big Data Clusters feature provides a unified data platform for integrating and managing both structured and unstructured data. You can deploy Big Data Clusters on Azure Kubernetes Service (AKS), where you can scale compute and storage independently to match your workload demands.
Azure SQL Data Warehouse, now part of Azure Synapse Analytics, integrates with Big Data Clusters, offering a highly scalable analytics service that brings together big data and data warehousing. This makes it easier to explore, analyze, and visualize big data.
While SQL Server 2019 offers Big Data Clusters, it's worth mentioning that other databases, such as MySQL 8.0 and IBM Db2 11.5, also offer features aimed at large and varied data sets. MySQL 8.0, for instance, has a native JSON data type, allowing you to store, search, and manipulate JSON documents using SQL. IBM Db2 11.5, on the other hand, offers BLU Acceleration, which combines columnar storage with in-memory optimization for high-speed analysis of large data volumes.
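To make the MySQL comparison concrete, here is a minimal sketch of storing, searching, and updating JSON documents. It uses MySQL 8.0 syntax rather than T-SQL, and the table name, column names, and sample documents are hypothetical.

```sql
-- MySQL 8.0 syntax (not T-SQL). All names and values are illustrative.
CREATE TABLE device_events (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    payload JSON NOT NULL
);

-- Store: JSON text is validated when it is inserted into a JSON column.
INSERT INTO device_events (payload)
VALUES ('{"device": "sensor-1", "year": 2020, "reading": 21.5}'),
       ('{"device": "sensor-2", "year": 2019, "reading": 19.0}');

-- Search: filter on a value inside the document and
-- return another value as plain text with the ->> operator.
SELECT id,
       payload->>'$.device' AS device
FROM device_events
WHERE JSON_EXTRACT(payload, '$.year') = 2020;

-- Manipulate: change a single key without rewriting the document by hand.
UPDATE device_events
SET payload = JSON_SET(payload, '$.reading', 22.0)
WHERE payload->>'$.device' = 'sensor-1';
```

Note that this is document handling inside a single database server. It addresses semi-structured data, not the distributed storage and compute that Big Data Clusters provide.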
As we can see, SQL Server 2019's Big Data Clusters represent a major step forward in big data management, offering a comprehensive solution for managing and analyzing big data. With BDC, you can now store and analyze your data in one place, using the tools and platforms that best meet your needs.