eBook - ePub

Hands-On Data Science with SQL Server 2017

Name: Hands-On Data Science with SQL Server 2017
Author: Marek Chmel, Vladimír Mužný

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Buch teilen

506 Seiten
English
ePUB (handyfreundlich)
Über iOS und Android verfügbar

eBook - ePub

Hands-On Data Science with SQL Server 2017

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Angaben zum Buch

Buchvorschau

Inhaltsverzeichnis

Quellenangaben

Über dieses Buch

Find, explore, and extract big data to transform into actionable insights

Key Features

Perform end-to-end data analysis—from exploration to visualization
Real-world examples, tasks, and interview queries to be a proficient data scientist
Understand how SQL is used for big data processing using HiveQL and SparkSQL

Book Description

SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features.

Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples.

By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs.

What you will learn

Understand what data science is and how SQL Server is used for big data processing
Analyze incoming data with SQL queries and visualizations
Create, train, and evaluate predictive models
Make predictions using trained models and establish regular retraining courses
Incorporate data source querying into SQL Server
Enhance built-in T-SQL capabilities using SQLCLR
Visualize data with Reporting Services, Power View, and Power BI
Transform data with R, Python, and Azure

Who this book is for

Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their skills learning SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with SQL language will aid with understanding the concepts covered in this book.

Häufig gestellte Fragen

Wie kann ich mein Abo kündigen?

Gehe einfach zum Kontobereich in den Einstellungen und klicke auf „Abo kündigen“ – ganz einfach. Nachdem du gekündigt hast, bleibt deine Mitgliedschaft für den verbleibenden Abozeitraum, den du bereits bezahlt hast, aktiv. Mehr Informationen hier.

(Wie) Kann ich Bücher herunterladen?

Derzeit stehen all unsere auf Mobilgeräte reagierenden ePub-Bücher zum Download über die App zur Verfügung. Die meisten unserer PDFs stehen ebenfalls zum Download bereit; wir arbeiten daran, auch die übrigen PDFs zum Download anzubieten, bei denen dies aktuell noch nicht möglich ist. Weitere Informationen hier.

Welcher Unterschied besteht bei den Preisen zwischen den Aboplänen?

Mit beiden Aboplänen erhältst du vollen Zugang zur Bibliothek und allen Funktionen von Perlego. Die einzigen Unterschiede bestehen im Preis und dem Abozeitraum: Mit dem Jahresabo sparst du auf 12 Monate gerechnet im Vergleich zum Monatsabo rund 30 %.

Was ist Perlego?

Wir sind ein Online-Abodienst für Lehrbücher, bei dem du für weniger als den Preis eines einzelnen Buches pro Monat Zugang zu einer ganzen Online-Bibliothek erhältst. Mit über 1 Million Büchern zu über 1.000 verschiedenen Themen haben wir bestimmt alles, was du brauchst! Weitere Informationen hier.

Unterstützt Perlego Text-zu-Sprache?

Achte auf das Symbol zum Vorlesen in deinem nächsten Buch, um zu sehen, ob du es dir auch anhören kannst. Bei diesem Tool wird dir Text laut vorgelesen, wobei der Text beim Vorlesen auch grafisch hervorgehoben wird. Du kannst das Vorlesen jederzeit anhalten, beschleunigen und verlangsamen. Weitere Informationen hier.

Ist Hands-On Data Science with SQL Server 2017 als Online-PDF/ePub verfügbar?

Ja, du hast Zugang zu Hands-On Data Science with SQL Server 2017 von Marek Chmel, Vladimír Mužný im PDF- und/oder ePub-Format sowie zu anderen beliebten Büchern aus Informatique & Programmation en SQL. Aus unserem Katalog stehen dir über 1 Million Bücher zur Verfügung.

Information

Verlag

Packt Publishing

Jahr

2018

ISBN

9781788996433

Auflage

Thema

Informatique

Thema

Programmation en SQL

Data Exploration and Statistics with T-SQL

When creating our data science solutions, the data that we want to use in these tasks should be explored carefully. As we learned in Chapter 5, Data Transformation and Cleansing with T-SQL, when data is loaded into a desired format, we need to find the distributions and patterns within it. We should also use data exploration during the staging process to check and improve data quality.

In this chapter, we will learn how to use T-SQL language to get statistical results from our data. To do this, we will use the following techniques:

T-SQL aggregate queries: This section explains what the aggregate query is and which statistical measures it can show.
Ranking, framing, and windowing with T-SQL: Using framing and windowing helps to obtain results enriched by sorting or ranking. In this section, we will play with framing and windowing from the perspective of data exploration.
Running aggregates with T-SQL: This section will join knowledge from the aggregation queries section and the framing and windowing section to help us to create running aggregates or comparisons of values between rows.

Technical requirements

To play with all the scripts and projects in this chapter, SQL Server 2017 is required.

For all SQL scripts provided within this chapter, Management Studio version 17.5 or higher is recommended.

All scripts and other files are provided at https://github.com/PacktPublishing/Hands-On-Data-Science-with-SQL-Server-2017/tree/master/CH6_Data_Exploration_and_Statistics_with_T_SQL. Scripts on GitHub are also ordered accordingly with scripts reported throughout the chapter.

T-SQL aggregate queries

Data exploration and descriptive or comparative statistics are very important tasks that have to be done repeatedly and iteratively during every data science project. This gives us better insight into the data that we want to process throughout all projects. T-SQL aggregate queries are an important part of data exploration.

A T-SQL aggregate query is a kind of query that basically summarizes groups of records from underlying datasets and typically provides aggregated numeric values for each group of records generated from the dataset. Groups of records are not needed for every case or every assignment. Such aggregation queries give an aggregation of summarized values over whole underlying datasets.

The simplest aggregation query does not require grouping. With or without grouping, aggregate queries use special kinds of functions, called aggregate functions. Almost every aggregate function takes a column of values as a parameter, and a result of the function is a number. The only exception to this is a function called SUM(*), which simply counts records. SQL Server provides several aggregate functions, which will be covered in this chapter.

Common properties of aggregate functions

All aggregate functions have the same common behaviors:

Every aggregate function has one parameter of a typically numeric column that computes one value over that column. Two exceptions are made through the following functions:
- COUNT(*): This function computes an amount of records.
- STRING_AGG: This function concatenates strings from a column into one string so that it accepts the varchar or nvarchar columns as a parameter.
Almost every aggregate function except COUNT(*) ignores records in which the aggregated column contains NULL. This is because the COUNT(*) function does not work with certain columns, but with whole records.
Every aggregate function can be used in SELECT and HAVING clauses. In a SELECT clause, it provides scalar results. The HAVING clause serves as a conditional clause similar to the WHERE clause, but for the result of aggregations that are not yet known in a WHERE clause. We will cover more about the HAVING clause in the dedicated section later in this chapter.

Ignoring columns with NULL values by aggregate functions is a very important property of aggregate functions, and we must always consider this behavior. If we miss it, we can obtain inaccurate results from our queries.

Now it's time to introduce all aggregate functions using examples.

Aggregate functions

Now that we know the common properties of aggregate functions, let's start to describe and explore the aggregate functions one by one.

COUNT, COUNT(*), and COUNT_BIG

The simplest aggregate function is probably COUNT(*). This function just computes an amount of records. The following script sample demonstrates how to use this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail

The preceding statement goes over all records from the Sales.SalesOrderDetail table and counts how many rows there are. As seen in the sample, the aggregate function (or functions) can be written into a SELECT clause in the same way as any other expression. The rest of the query can be combined in the usual manner. For example, this may be the case if we want to compute the amount of records belonging only to product ID 710. The following sample statement shows this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE ProductID = 710

The result of the preceding query is 44 rows. This is because the query-only filters records when ProductID is equal to 710. In addition, the aggregation function only works with records for which the condition is true.

All aggregation functions work in the same way. Only records that meet whole WHERE predicates are processed using the aggregation function.

COUNT(*) is probably the simplest aggregate function provided by SQL Server. Sometimes it's changed with other aggregate functions, called COUNT. Even though both functions have the same name, the difference lies in the parameter. COUNT(*) uses the star (*) symbol to say that it just counts records, but the COUNT function uses certain columns to calculate a number of valid values. In other words, the COUNT fun...