Massively parallel processing architecture
SQL Data Warehouse is a massively parallel processing (MPP) distributed database system. Behind the scenes, SQL Data Warehouse spreads your data across many shared-nothing storage and processing units. The data is stored in a Premium locally redundant storage layer on top of which dynamically linked Compute nodes execute queries. SQL Data Warehouse takes a “divide and conquer” approach to running loads and complex queries. Requests are received by a Control node, optimized for distribution, and then passed to Compute nodes to do their work in parallel.
Optimized for data warehouse workloads
The MPP approach is aided by several data warehousing specific performance optimizations, including:
- A distributed query optimizer and set of complex statistics across all data. Using information on data size and distribution, the service is able to optimize queries by assessing the cost of specific distributed query operations.
- Advanced algorithms and techniques integrated into the data movement process to efficiently move data among computing resources as necessary to perform the query. These data movement operations are built in, and all optimizations to the Data Movement Service happen automatically.
- Clustered columnstore indexes by default. By using column-based storage, SQL Data Warehouse gets on average 5x compression gains over traditional row-oriented storage, and up to 10x or more query performance gains. Analytics queries that need to scan a large number of rows work better with columnstore indexes.
Predictable and scalable performance With Data Warehouse Units
SQL Data Warehouse is built with similar technologies as SQL Database, which means that users can expect consistent and predictable performance for analytical queries. Users should expect to see performance scale linearly as they add or subtract Compute nodes. Allocation of resources to your SQL Data Warehouse is measured in Data Warehouse Units (DWUs). DWUs are a measure of underlying resources like CPU, memory, IOPS, which are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance. Specifically, DWUs help ensure that:
- You are able to scale your data warehouse without worrying about the underlying hardware or software.
- You can predict performance improvement for a DWU level before changing the compute of your data warehouse.
- The underlying hardware and software of your instance can change or move without affecting your workload performance.
- Microsoft can improve the underlying architecture of the service without affecting the performance of your workload.
- Microsoft can rapidly improve performance in SQL Data Warehouse, in a way that is scalable and evenly effects the system.
Built on SQL Server
SQL Data Warehouse is based on the SQL Server relational database engine, and includes many of the features you expect from an enterprise data warehouse. If you already know T-SQL, it’s easy to transfer your knowledge to SQL Data Warehouse. Whether you are advanced or just getting started, the examples across the documentation will help begin. Overall, you can think about the way that we’ve constructed the language elements of SQL Data Warehouse as follows:
- SQL Data Warehouse uses T-SQL syntax for many operations. It also supports a broad set of traditional SQL constructs, such as stored procedures, user-defined functions, table partitioning, indexes, and collations.
- SQL Data Warehouse also contains various newer SQL Server features, including: clustered columnstore indexes, PolyBase integration, and data auditing (complete with threat assessment).
Data protection
SQL Data Warehouse stores all data in Azure Premium locally redundant storage. Multiple synchronous copies of the data are maintained in the local data center to guarantee transparent data protection against localized failures. In addition, SQL Data Warehouse automatically backs up your active (unpaused) databases at regular intervals using Azure Storage Snapshots.
Integrated with Microsoft tools
SQL Data Warehouse also integrates many of the tools that SQL Server users may be familiar with. These tools include: