Data Science with SQL Server Quick Start Guide
eBook - ePub

Integrate SQL Server with data science

Dejan Sarka

  1. 206 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Get unique insights from your data by combining the power of SQL Server, R and Python

Key Features

  • Use the features of SQL Server 2017 to implement the data science project life cycle
  • Leverage the power of R and Python to design and develop efficient data models
  • Find unique insights from your data with powerful techniques for data preprocessing and analysis

Book Description

SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you.

This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment.

You will learn to use the engines and languages that come with SQL Server, including ML Services with the R and Python languages, and Transact-SQL. You will also learn how to choose the right algorithm for each task and how each algorithm works.

What you will learn

  • Use the popular programming languages, T-SQL, R, and Python, for data science
  • Understand your data with queries and introductory statistics
  • Create and enhance the datasets for ML
  • Visualize and analyze data using basic and advanced graphs
  • Explore ML using unsupervised and supervised models
  • Deploy models in SQL Server and perform predictions

Who this book is for

SQL Server professionals who want to start with data science, and data scientists who would like to start using SQL Server in their projects, will find this book useful. Prior exposure to SQL Server will be helpful.

Information

Year
2018
ISBN
9781789537130

Unsupervised Machine Learning

Finally, we are there—we are going to do some real data science now. In the last two chapters, I am going to introduce some of the most popular advanced data mining and machine learning algorithms. I will show you how to use them to get in-depth knowledge from your data.
Algorithms are most commonly divided into two groups: unsupervised, or undirected, and supervised, or directed. Unsupervised algorithms have no target variable. You simply try to find interesting patterns in your data, for example, distinctive groups of cases. You then need to analyze the results in order to interpret them. With groups of cases, or clusters, you don't know the cluster labels in advance. Once you have determined the clusters, you need to check the characteristics of the input variables in each cluster to understand what the clusters mean.
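To make the idea of finding and then interpreting clusters more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the six cases, the variable names, and the hand-written `kmeans` helper are all hypothetical and do not come from the book's examples, and k-means is just one simple clustering technique chosen for brevity.

```python
# Minimal k-means sketch in pure Python; data and helper names are hypothetical.
from statistics import mean

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Return one cluster index per point, plus the final centroids."""
    # Assumption for reproducibility: the first k points seed the centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(pt, centroids[c]))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return assignments, centroids

# Hypothetical cases: (age, yearly income in thousands). No target variable.
cases = [(25, 30), (62, 90), (27, 35), (60, 85), (23, 28), (65, 95)]
labels, centers = kmeans(cases, k=2)

# Interpretation step: profile the input variables inside each cluster.
profile = {}
for c in sorted(set(labels)):
    members = [pt for pt, a in zip(cases, labels) if a == c]
    profile[c] = {
        "size": len(members),
        "mean_age": mean(m[0] for m in members),
        "mean_income": mean(m[1] for m in members),
    }
print(labels)
print(profile)
```

The algorithm only returns cluster numbers. The per-cluster averages are what let you describe one cluster as the younger, lower-income cases and the other as the older, higher-income cases.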
Before starting with the advanced algorithms, I will make a quick detour. I will just show you how you can install additional R and Python packages on the server side, for ML Services (In-Database).
This chapter covers the following:
  • Installing ML Services (In-Database) packages
  • Performing market-basket analysis
  • Finding clusters of similar cases
  • Dimensionality-reduction with principal-component analysis
  • Extracting underlying factors from variables

Installing ML Services (In-Database) packages

For security reasons, you cannot just call the install.packages() R function from the sys.sp_execute_external_script system procedure on the server side. There are, however, several other ways to install packages. You can find the complete list of options for installing R packages in the article Install new R packages on SQL Server at https://docs.microsoft.com/en-us/sql/advanced-analytics/r/install-additional-r-packages-on-sql-server?view=sql-server-2017. I will show just one option here, the one I used while writing this book. I have my SQL Server installed on a virtual machine, and I can enable a web connection for the machine. The process of installing an additional R package is then simple. You just need to run the R console, R.exe, from the ML Services (In-Database) installation, which is located in the C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\bin folder for the default instance installation. You need to run R.exe as an administrator:
Running R.exe with administrative permissions
Before starting to install a package, I check the installed packages with the following T-SQL code:
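USE AdventureWorksDW2017;
-- List the R packages installed for ML Services (In-Database)
EXECUTE sys.sp_execute_external_script
  @language = N'R',
  @script = N'
    # Show the structure of the default output data frame
    str(OutputDataSet);
    # installed.packages() returns a matrix; column 1 holds the package names
    instpack <- installed.packages();
    NameOnly <- instpack[,1];
    OutputDataSet <- as.data.frame(NameOnly);'
WITH RESULT SETS (
  ( PackageName nvarchar(20) )
);
GO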
Initially, I had 57 packages installed. Then I used the install.packages("dplyr") command in R.exe to install the dplyr library. After installation, I closed the R.exe console with the q() function. Then I used...
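Because this section concerns Python packages as well as R packages, a comparable check on the Python side might look like the following sketch. It is an illustration only: it assumes Python 3.8 or later, where the standard `importlib.metadata` module is available, and the Python runtime bundled with ML Services may be older than that. The `installed_package_names` helper is a hypothetical name, not part of SQL Server or of the book's code.

```python
# Sketch: list installed Python packages, similar in spirit to installed.packages() in R.
# Assumes Python 3.8+ (importlib.metadata is part of the standard library there).
from importlib import metadata

def installed_package_names():
    """Return a sorted list of unique distribution names visible to this interpreter."""
    names = {dist.metadata["Name"] for dist in metadata.distributions()}
    # Broken installations can report no name; drop those entries.
    return sorted(n for n in names if n)

package_names = installed_package_names()
print(len(package_names), "packages found")
for name in package_names[:5]:
    print(name)
```

Running the same check before and after an installation lets you confirm that the new package is visible to that interpreter.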
