eBook - ePub

Hands-On Data Science with SQL Server 2017

Name: Hands-On Data Science with SQL Server 2017
Author: Marek Chmel, Vladimír Mužný

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Compartir libro

506 páginas
English
ePUB (apto para móviles)
Disponible en iOS y Android

eBook - ePub

Hands-On Data Science with SQL Server 2017

Perform end-to-end data analysis to gain efficient data insight

Marek Chmel, Vladimír Mužný

Detalles del libro

Vista previa del libro

Índice

Citas

Información del libro

Find, explore, and extract big data to transform into actionable insights

Key Features

Perform end-to-end data analysis—from exploration to visualization
Real-world examples, tasks, and interview queries to be a proficient data scientist
Understand how SQL is used for big data processing using HiveQL and SparkSQL

Book Description

SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features.

Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples.

By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs.

What you will learn

Understand what data science is and how SQL Server is used for big data processing
Analyze incoming data with SQL queries and visualizations
Create, train, and evaluate predictive models
Make predictions using trained models and establish regular retraining courses
Incorporate data source querying into SQL Server
Enhance built-in T-SQL capabilities using SQLCLR
Visualize data with Reporting Services, Power View, and Power BI
Transform data with R, Python, and Azure

Who this book is for

Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their skills learning SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with SQL language will aid with understanding the concepts covered in this book.

Preguntas frecuentes

¿Cómo cancelo mi suscripción?

Simplemente, dirígete a la sección ajustes de la cuenta y haz clic en «Cancelar suscripción». Así de sencillo. Después de cancelar tu suscripción, esta permanecerá activa el tiempo restante que hayas pagado. Obtén más información aquí.

¿Cómo descargo los libros?

Por el momento, todos nuestros libros ePub adaptables a dispositivos móviles se pueden descargar a través de la aplicación. La mayor parte de nuestros PDF también se puede descargar y ya estamos trabajando para que el resto también sea descargable. Obtén más información aquí.

¿En qué se diferencian los planes de precios?

Ambos planes te permiten acceder por completo a la biblioteca y a todas las funciones de Perlego. Las únicas diferencias son el precio y el período de suscripción: con el plan anual ahorrarás en torno a un 30 % en comparación con 12 meses de un plan mensual.

¿Qué es Perlego?

Somos un servicio de suscripción de libros de texto en línea que te permite acceder a toda una biblioteca en línea por menos de lo que cuesta un libro al mes. Con más de un millón de libros sobre más de 1000 categorías, ¡tenemos todo lo que necesitas! Obtén más información aquí.

¿Perlego ofrece la función de texto a voz?

Busca el símbolo de lectura en voz alta en tu próximo libro para ver si puedes escucharlo. La herramienta de lectura en voz alta lee el texto en voz alta por ti, resaltando el texto a medida que se lee. Puedes pausarla, acelerarla y ralentizarla. Obtén más información aquí.

¿Es Hands-On Data Science with SQL Server 2017 un PDF/ePUB en línea?

Sí, puedes acceder a Hands-On Data Science with SQL Server 2017 de Marek Chmel, Vladimír Mužný en formato PDF o ePUB, así como a otros libros populares de Informatique y Programmation en SQL. Tenemos más de un millón de libros disponibles en nuestro catálogo para que explores.

Información

Editorial

Packt Publishing

Año

2018

ISBN

9781788996433

Edición

Categoría

Informatique

Categoría

Programmation en SQL

Data Exploration and Statistics with T-SQL

When creating our data science solutions, the data that we want to use in these tasks should be explored carefully. As we learned in Chapter 5, Data Transformation and Cleansing with T-SQL, when data is loaded into a desired format, we need to find the distributions and patterns within it. We should also use data exploration during the staging process to check and improve data quality.

In this chapter, we will learn how to use T-SQL language to get statistical results from our data. To do this, we will use the following techniques:

T-SQL aggregate queries: This section explains what the aggregate query is and which statistical measures it can show.
Ranking, framing, and windowing with T-SQL: Using framing and windowing helps to obtain results enriched by sorting or ranking. In this section, we will play with framing and windowing from the perspective of data exploration.
Running aggregates with T-SQL: This section will join knowledge from the aggregation queries section and the framing and windowing section to help us to create running aggregates or comparisons of values between rows.

Technical requirements

To play with all the scripts and projects in this chapter, SQL Server 2017 is required.

For all SQL scripts provided within this chapter, Management Studio version 17.5 or higher is recommended.

All scripts and other files are provided at https://github.com/PacktPublishing/Hands-On-Data-Science-with-SQL-Server-2017/tree/master/CH6_Data_Exploration_and_Statistics_with_T_SQL. Scripts on GitHub are also ordered accordingly with scripts reported throughout the chapter.

T-SQL aggregate queries

Data exploration and descriptive or comparative statistics are very important tasks that have to be done repeatedly and iteratively during every data science project. This gives us better insight into the data that we want to process throughout all projects. T-SQL aggregate queries are an important part of data exploration.

A T-SQL aggregate query is a kind of query that basically summarizes groups of records from underlying datasets and typically provides aggregated numeric values for each group of records generated from the dataset. Groups of records are not needed for every case or every assignment. Such aggregation queries give an aggregation of summarized values over whole underlying datasets.

The simplest aggregation query does not require grouping. With or without grouping, aggregate queries use special kinds of functions, called aggregate functions. Almost every aggregate function takes a column of values as a parameter, and a result of the function is a number. The only exception to this is a function called SUM(*), which simply counts records. SQL Server provides several aggregate functions, which will be covered in this chapter.

Common properties of aggregate functions

All aggregate functions have the same common behaviors:

Every aggregate function has one parameter of a typically numeric column that computes one value over that column. Two exceptions are made through the following functions:
- COUNT(*): This function computes an amount of records.
- STRING_AGG: This function concatenates strings from a column into one string so that it accepts the varchar or nvarchar columns as a parameter.
Almost every aggregate function except COUNT(*) ignores records in which the aggregated column contains NULL. This is because the COUNT(*) function does not work with certain columns, but with whole records.
Every aggregate function can be used in SELECT and HAVING clauses. In a SELECT clause, it provides scalar results. The HAVING clause serves as a conditional clause similar to the WHERE clause, but for the result of aggregations that are not yet known in a WHERE clause. We will cover more about the HAVING clause in the dedicated section later in this chapter.

Ignoring columns with NULL values by aggregate functions is a very important property of aggregate functions, and we must always consider this behavior. If we miss it, we can obtain inaccurate results from our queries.

Now it's time to introduce all aggregate functions using examples.

Aggregate functions

Now that we know the common properties of aggregate functions, let's start to describe and explore the aggregate functions one by one.

COUNT, COUNT(*), and COUNT_BIG

The simplest aggregate function is probably COUNT(*). This function just computes an amount of records. The following script sample demonstrates how to use this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail

The preceding statement goes over all records from the Sales.SalesOrderDetail table and counts how many rows there are. As seen in the sample, the aggregate function (or functions) can be written into a SELECT clause in the same way as any other expression. The rest of the query can be combined in the usual manner. For example, this may be the case if we want to compute the amount of records belonging only to product ID 710. The following sample statement shows this:

SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE ProductID = 710

The result of the preceding query is 44 rows. This is because the query-only filters records when ProductID is equal to 710. In addition, the aggregation function only works with records for which the condition is true.

All aggregation functions work in the same way. Only records that meet whole WHERE predicates are processed using the aggregation function.

COUNT(*) is probably the simplest aggregate function provided by SQL Server. Sometimes it's changed with other aggregate functions, called COUNT. Even though both functions have the same name, the difference lies in the parameter. COUNT(*) uses the star (*) symbol to say that it just counts records, but the COUNT function uses certain columns to calculate a number of valid values. In other words, the COUNT fun...