eBook - ePub

Hands-On Data Science with SQL Server 2017

Name: Hands-On Data Science with SQL Server 2017
Author: Marek Chmel, Vladimír Mužný

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Partager le livre

506 pages
English
ePUB (adapté aux mobiles)
Disponible sur iOS et Android

eBook - ePub

Hands-On Data Science with SQL Server 2017

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Détails du livre

Aperçu du livre

Table des matières

Citations

À propos de ce livre

Find, explore, and extract big data to transform into actionable insights

Key Features

Perform end-to-end data analysis—from exploration to visualization
Real-world examples, tasks, and interview queries to be a proficient data scientist
Understand how SQL is used for big data processing using HiveQL and SparkSQL

Book Description

SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features.

Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples.

By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs.

What you will learn

Understand what data science is and how SQL Server is used for big data processing
Analyze incoming data with SQL queries and visualizations
Create, train, and evaluate predictive models
Make predictions using trained models and establish regular retraining courses
Incorporate data source querying into SQL Server
Enhance built-in T-SQL capabilities using SQLCLR
Visualize data with Reporting Services, Power View, and Power BI
Transform data with R, Python, and Azure

Who this book is for

Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their skills learning SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with SQL language will aid with understanding the concepts covered in this book.

Foire aux questions

Comment puis-je résilier mon abonnement ?

Il vous suffit de vous rendre dans la section compte dans paramètres et de cliquer sur « Résilier l’abonnement ». C’est aussi simple que cela ! Une fois que vous aurez résilié votre abonnement, il restera actif pour le reste de la période pour laquelle vous avez payé. Découvrez-en plus ici.

Puis-je / comment puis-je télécharger des livres ?

Pour le moment, tous nos livres en format ePub adaptés aux mobiles peuvent être téléchargés via l’application. La plupart de nos PDF sont également disponibles en téléchargement et les autres seront téléchargeables très prochainement. Découvrez-en plus ici.

Quelle est la différence entre les formules tarifaires ?

Les deux abonnements vous donnent un accès complet à la bibliothèque et à toutes les fonctionnalités de Perlego. Les seules différences sont les tarifs ainsi que la période d’abonnement : avec l’abonnement annuel, vous économiserez environ 30 % par rapport à 12 mois d’abonnement mensuel.

Qu’est-ce que Perlego ?

Nous sommes un service d’abonnement à des ouvrages universitaires en ligne, où vous pouvez accéder à toute une bibliothèque pour un prix inférieur à celui d’un seul livre par mois. Avec plus d’un million de livres sur plus de 1 000 sujets, nous avons ce qu’il vous faut ! Découvrez-en plus ici.

Prenez-vous en charge la synthèse vocale ?

Recherchez le symbole Écouter sur votre prochain livre pour voir si vous pouvez l’écouter. L’outil Écouter lit le texte à haute voix pour vous, en surlignant le passage qui est en cours de lecture. Vous pouvez le mettre sur pause, l’accélérer ou le ralentir. Découvrez-en plus ici.

Est-ce que Hands-On Data Science with SQL Server 2017 est un PDF/ePUB en ligne ?

Oui, vous pouvez accéder à Hands-On Data Science with SQL Server 2017 par Marek Chmel, Vladimír Mužný en format PDF et/ou ePUB ainsi qu’à d’autres livres populaires dans Informatique et Programmation en SQL. Nous disposons de plus d’un million d’ouvrages à découvrir dans notre catalogue.

Informations

Éditeur

Packt Publishing

Année

2018

ISBN

9781788996433

Édition

Sujet

Informatique

Sous-sujet

Programmation en SQL

Data Exploration and Statistics with T-SQL

When creating our data science solutions, the data that we want to use in these tasks should be explored carefully. As we learned in Chapter 5, Data Transformation and Cleansing with T-SQL, when data is loaded into a desired format, we need to find the distributions and patterns within it. We should also use data exploration during the staging process to check and improve data quality.

In this chapter, we will learn how to use T-SQL language to get statistical results from our data. To do this, we will use the following techniques:

T-SQL aggregate queries: This section explains what the aggregate query is and which statistical measures it can show.
Ranking, framing, and windowing with T-SQL: Using framing and windowing helps to obtain results enriched by sorting or ranking. In this section, we will play with framing and windowing from the perspective of data exploration.
Running aggregates with T-SQL: This section will join knowledge from the aggregation queries section and the framing and windowing section to help us to create running aggregates or comparisons of values between rows.

Technical requirements

To play with all the scripts and projects in this chapter, SQL Server 2017 is required.

For all SQL scripts provided within this chapter, Management Studio version 17.5 or higher is recommended.

All scripts and other files are provided at https://github.com/PacktPublishing/Hands-On-Data-Science-with-SQL-Server-2017/tree/master/CH6_Data_Exploration_and_Statistics_with_T_SQL. Scripts on GitHub are also ordered accordingly with scripts reported throughout the chapter.

T-SQL aggregate queries

Data exploration and descriptive or comparative statistics are very important tasks that have to be done repeatedly and iteratively during every data science project. This gives us better insight into the data that we want to process throughout all projects. T-SQL aggregate queries are an important part of data exploration.

A T-SQL aggregate query is a kind of query that basically summarizes groups of records from underlying datasets and typically provides aggregated numeric values for each group of records generated from the dataset. Groups of records are not needed for every case or every assignment. Such aggregation queries give an aggregation of summarized values over whole underlying datasets.

The simplest aggregation query does not require grouping. With or without grouping, aggregate queries use special kinds of functions, called aggregate functions. Almost every aggregate function takes a column of values as a parameter, and a result of the function is a number. The only exception to this is a function called SUM(*), which simply counts records. SQL Server provides several aggregate functions, which will be covered in this chapter.

Common properties of aggregate functions

All aggregate functions have the same common behaviors:

Every aggregate function has one parameter of a typically numeric column that computes one value over that column. Two exceptions are made through the following functions:
- COUNT(*): This function computes an amount of records.
- STRING_AGG: This function concatenates strings from a column into one string so that it accepts the varchar or nvarchar columns as a parameter.
Almost every aggregate function except COUNT(*) ignores records in which the aggregated column contains NULL. This is because the COUNT(*) function does not work with certain columns, but with whole records.
Every aggregate function can be used in SELECT and HAVING clauses. In a SELECT clause, it provides scalar results. The HAVING clause serves as a conditional clause similar to the WHERE clause, but for the result of aggregations that are not yet known in a WHERE clause. We will cover more about the HAVING clause in the dedicated section later in this chapter.

Ignoring columns with NULL values by aggregate functions is a very important property of aggregate functions, and we must always consider this behavior. If we miss it, we can obtain inaccurate results from our queries.

Now it's time to introduce all aggregate functions using examples.

Aggregate functions

Now that we know the common properties of aggregate functions, let's start to describe and explore the aggregate functions one by one.

COUNT, COUNT(*), and COUNT_BIG

The simplest aggregate function is probably COUNT(*). This function just computes an amount of records. The following script sample demonstrates how to use this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail

The preceding statement goes over all records from the Sales.SalesOrderDetail table and counts how many rows there are. As seen in the sample, the aggregate function (or functions) can be written into a SELECT clause in the same way as any other expression. The rest of the query can be combined in the usual manner. For example, this may be the case if we want to compute the amount of records belonging only to product ID 710. The following sample statement shows this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE ProductID = 710

The result of the preceding query is 44 rows. This is because the query-only filters records when ProductID is equal to 710. In addition, the aggregation function only works with records for which the condition is true.

All aggregation functions work in the same way. Only records that meet whole WHERE predicates are processed using the aggregation function.

COUNT(*) is probably the simplest aggregate function provided by SQL Server. Sometimes it's changed with other aggregate functions, called COUNT. Even though both functions have the same name, the difference lies in the parameter. COUNT(*) uses the star (*) symbol to say that it just counts records, but the COUNT function uses certain columns to calculate a number of valid values. In other words, the COUNT fun...