
Hands-on Data Virtualization with Polybase
Administer Big Data, SQL Queries and Data Accessibility Across Hadoop, Azure, Spark, Cassandra, MongoDB, CosmosDB, MySQL and PostgreSQL (English Edition)
About this book
Run queries and analysis on big data clusters across relational and non-relational databases
Description
This book covers how to establish and manage data virtualization using PolyBase. It teaches you how to configure PolyBase against a wide range of relational and non-relational databases, and how to set up a test environment for a tool or software quickly. You will practice designing and building high-performing data warehousing solutions in a short time. You will gain experience connecting to databases including Hadoop, Cassandra, MySQL, PostgreSQL, MariaDB, and Oracle. The book also covers how to build data clusters on Azure and how to use Azure Synapse Analytics. By the end of this book, you will be able not only to administer PolyBase for managing big data clusters, but also to optimize its performance for data analytics and easier data access.
What you will learn
- Learn to configure PolyBase and process Transact-SQL (T-SQL) queries with ease.
- Create a Docker container with SQL Server 2019 on Windows and PolyBase.
- Connect a SQL Server instance to other software or tools using PolyBase.
- Connect with Cassandra, MongoDB, MySQL, PostgreSQL, MariaDB, and IBM Db2.
Who this book is for
This book is for database developers and administrators familiar with the SQL language and command prompt. Managers and decision-makers will also find this book useful. No prior knowledge of any other technology or language is required.
Table of Contents
1. What is Data Virtualization (Polybase)
2. History of Polybase
3. Polybase current state
4. Differences with other technologies
5. Usage
6. Future
7. SQL Server
8. Hadoop Cloudera and Hortonworks
9. Windows Azure Storage Blob
10. Spark
11. From Azure Synapse Analytics
12. From Big Data Clusters
13. Oracle
14. Teradata
15. Cassandra
16. MongoDB
17. CosmosDB
18. MySQL
19. PostgreSQL
20. MariaDB
21. SAP HANA
22. IBM DB2
23. Excel
CHAPTER 1
Data Virtualization
Structure
- Filtering the information
- Link relational data with storage/file system data
- What you would have to do without data virtualization
- How data virtualization simplifies querying external data
- How learning PolyBase can help you irrespective of your role
Objectives
- Identify on which side of a computer communication network the information should be filtered
- Understand the importance of relational data
- Understand the importance of storage and file system data
- Understand the benefits of data virtualization
- Understand how PolyBase can help different roles
Filtering the information
- If it has more memory,
- If it has additional or faster CPUs,
- If it has additional or faster disks, or
- If it is a distributed system.
Link relational data with storage/file system data
What you would have to do without data virtualization
How data virtualization simplifies querying external data
How learning PolyBase can help you irrespective of your role
- As a database administrator (DBA), you have long-running processes that move information from one place to another, and that information is critical for the business decision support systems. When there is a delay, or the process fails, your customer starts losing money and won't be willing to wait for the process to be restarted or to lose a whole day of work. PolyBase can accelerate this process thanks to the parallel processing it offers.
- As a data engineer, you divide and sample the information from all data stores, which requires you to learn each data store system's basics and then gather the required information. PolyBase doesn't require you to know anything other than SQL Server and T-SQL, thus simplifying your job.
- As a data scientist, you perform exploratory data analysis before working on the whole data set, which requires you to work on large amounts of information using a large number of resources. PolyBase allows you to easily work on subsets of data using only SQL Server.
- As a developer, your main goal is to develop fast and efficient programs irrespective of where the data is located. PolyBase allows you to avoid using a linked server, which is slow.
- In a business intelligence (BI) role, you're more interested in the external data being available than in the details of how it works. PolyBase allows you to query the external data without moving all of it, and before all of it has been moved from one point to the other.
- In a machine learning (ML...
Table of contents
- Cover Page
- Title Page
- Copyright Page
- Dedication Page
- About the Author
- About the Reviewer
- Acknowledgement
- Preface
- Errata
- Table of Contents
- 1. Data Virtualization
- 2. History of PolyBase
- 3. PolyBase Current State
- 4. Difference between PolyBase and Other Technologies
- 5. Usage
- 6. Future of PolyBase
- 7. SQL Server
- 8. Hadoop Cloudera and Hortonworks
- 9. Azure Storage
- 10. Spark
- 11. Azure Synapse Analytics
- 12. Big Data Clusters
- 13. Oracle
- 14. Teradata
- 15. Cassandra
- 16. MongoDB
- 17. Cosmos DB
- 18. MySQL
- 19. PostgreSQL
- 20. MariaDB
- 21. SAP HANA
- 22. IBM Db2
- 23. Excel
- Index