SQL Data Warehouse
What is Azure SQL Data Warehouse?
Azure SQL Data Warehouse, a truly elastic cloud data warehousing service, is your go-to SQL-based view across all your data, making it simple and cost-effective to transform business insights. At approximately 1/10th of the cost of traditional appliance solutions, SQL Data Warehouse delivers more freedom to scale as your business needs grow and change — you only pay for the storage you use and only the compute you need, when you need it.
Benefits of SQL Data Warehouse:
> Decouples storage from compute <
> Integrates across the Azure platform <
> Utilizes SQL Server Transact-SQL (T-SQL) and tools <
> Enables increasing, decreasing, pausing, or resuming compute <
> Combines the SQL Server relational database with Azure cloud scale-out capabilities <
> Complies with various legal and business security requirements such as SOC and ISO <
SQL Data Warehouse is a massively parallel processing (MPP) distributed database system. Behind the scenes, SQL Data Warehouse spreads your data across many shared-nothing storage and processing units. The data is stored in a Premium locally redundant storage layer on top of which dynamically linked Compute nodes execute queries. SQL Data Warehouse takes a “divide and conquer” approach to running loads and complex queries. Requests are received by a Control node, optimized for distribution, and then passed to Compute nodes to do their work in parallel.
The MPP approach is aided by several data warehousing specific performance optimizations, including:
A distributed query optimizer and set of complex statistics across all data. Using information on data size and distribution, the service is able to optimize queries by assessing the cost of specific distributed query operations.
Advanced algorithms and techniques integrated into the data movement process to efficiently move data among computing resources as necessary to perform the query. These data movement operations are built in, and all optimizations to the Data Movement Service happen automatically.
Clustered columnstore indexes by default. By using column-based storage, SQL Data Warehouse gets on average 5x compression gains over traditional row-oriented storage, and up to 10x or more query performance gains. Analytics queries that need to scan a large number of rows work better with columnstore indexes.
SQL Data Warehouse is built with similar technologies as SQL Database, which means that users can expect consistent and predictable performance for analytical queries. Users should expect to see performance scale linearly as they add or subtract Compute nodes. Allocation of resources to your SQL Data Warehouse is measured in Data Warehouse Units (DWUs). DWUs are a measure of underlying resources like CPU, memory, IOPS, which are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance. Specifically, DWUs help ensure that:
You are able to scale your data warehouse without worrying about the underlying hardware or software.
You can predict performance improvement for a DWU level before changing the compute of your data warehouse.
The underlying hardware and software of your instance can change or move without affecting your workload performance.
Microsoft can improve the underlying architecture of the service without affecting the performance of your workload.
Microsoft can rapidly improve performance in SQL Data Warehouse, in a way that is scalable and evenly effects the system.
SQL Data Warehouse is based on the SQL Server relational database engine, and includes many of the features you expect from an enterprise data warehouse. If you already know T-SQL, it’s easy to transfer your knowledge to SQL Data Warehouse. Whether you are advanced or just getting started, the examples across the documentation will help begin. Overall, you can think about the way that we’ve constructed the language elements of SQL Data Warehouse as follows:
SQL Data Warehouse uses T-SQL syntax for many operations. It also supports a broad set of traditional SQL constructs, such as stored procedures, user-defined functions, table partitioning, indexes, and collations.
SQL Data Warehouse also contains various newer SQL Server features, including: clustered columnstore indexes, PolyBase integration, and data auditing (complete with threat assessment).
SQL Data Warehouse stores all data in Azure Premium locally redundant storage. Multiple synchronous copies of the data are maintained in the local data center to guarantee transparent data protection against localized failures. In addition, SQL Data Warehouse automatically backs up your active (unpaused) databases at regular intervals using Azure Storage Snapshots.
SQL Data Warehouse also integrates many of the tools that SQL Server users may be familiar with. These tools include:
Polybase allows you to leverage your data from different sources by using familiar T-SQL commands. Polybase enables you to query non-relational data held in Azure Blob storage as though it is a regular table. Use Polybase to query non-relational data, or to import non-relational data into SQL Data Warehouse.+
PolyBase uses external tables to access non-relational data. The table definitions are stored in SQL Data Warehouse, and you can access them by using SQL and tools like you would access normal relational data.
Polybase is agnostic in its integration. It exposes the same features and functionality to all the sources that it supports. The data read by Polybase can be in various formats, including delimited files or ORC files.
PolyBase can be used to access blob storage that is also being used as storage for an HDInsight cluster. This gives you access to the same data with relational and non-relational tools.