eBook - ePub

Hands-On Data Warehousing with Azure Data Factory

Author: Christian Cote, Michelle Kamrat Gutzait, Giuseppe Ciaburro

ETL techniques to load and transform data from various sources, both on-premises and on cloud

284 pages
English
ePUB (mobile friendly)

About this book

Leverage the power of Microsoft Azure Data Factory v2 to build hybrid data solutions

Key Features

Combine the power of Azure Data Factory v2 and SQL Server Integration Services
Design and enhance performance and scalability of a modern ETL hybrid solution
Interact with the loaded data in data warehouse and data lake using Power BI

Book Description

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL remains a vital process for handling data from different sources.

Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn the key components of an ETL solution and how Azure Data Factory and SSIS are used to build them. With the help of practical examples, you will go through the different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Spark on Databricks. You will explore, step by step, how to design and implement hybrid ETL solutions using different integration services. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights.

By the end of this book, you will not only learn how to build your own ETL solutions but also address the key challenges that are faced while building them.

What you will learn

Understand the key components of an ETL solution using Azure Data Factory and Integration Services
Design the architecture of a modern ETL hybrid solution
Implement ETL solutions for both on-premises and Azure data
Improve the performance and scalability of your ETL solution
Gain thorough knowledge of new capabilities and features added to Azure Data Factory and Integration Services

Who this book is for

This book is for you if you are a software professional who develops and implements ETL solutions using Microsoft SQL Server or Azure cloud. It will be an added advantage if you are a software engineer, DW/ETL architect, or ETL developer, and know how to create a new ETL implementation or enhance an existing one with ADF or SSIS.

Information

Publisher: Packt Publishing
Year: 2018
ISBN: 9781789130096
Topics: Computer Science, Data Warehousing

Machine Learning on the Cloud

Machine learning is the ability of a machine to expand its knowledge without human intervention. The concept is used in software engineering, in data mining, and, in particular, in artificial intelligence. Starting from a knowledge base rich in information, an automatic learning system searches for and extracts regularities in the data through data mining techniques. Machine learning algorithms use mathematical-computational methods to learn information directly from the data, without relying on a predetermined equation as a model.

The applications of machine learning are already numerous, and some have entered our daily life without us realizing it. For example, search engines return ranked lists of results from one or more keywords. Email spam filters continuously learn both to intercept suspicious or fraudulent messages and to act accordingly. Finally, there are speech recognition and handwriting recognition systems.

This chapter introduces the basic concepts of machine learning and then takes a tour of the different types of algorithms. It also covers an introduction to, and some background on, the Microsoft Azure Machine Learning Studio environment. Finally, we will explore some practical applications to understand the amazing world of machine learning.

In this chapter, we will cover the following topics:

Discovering the machine learning capabilities for classification, regression, clustering, and dimensionality reduction, including apps for automated model training and code generation
A tour of the most popular machine learning algorithms to choose the right one for our needs
Exploring the Azure Machine Learning Studio environment

By the end of this chapter, you will be able to recognize the different machine learning algorithms and the tools that Microsoft Azure Machine Learning Studio provides to handle them.

Machine learning overview

Machine learning is a multidisciplinary field born from the intersection of computer science, statistics, neurobiology, and control theory. Its emergence has played a key role in several fields and has fundamentally changed the vision of software programming. If the question before was "How can we program a computer?", the question has now become "How will computers program themselves?"

Thus, it is clear that machine learning is a basic method that allows a computer to have its own intelligence.

Machine learning algorithms

The power of machine learning is due to the quality of its algorithms, which have been improved and updated over the years; these are divided into several main types depending on the nature of the signal used for learning or the type of feedback adopted by the system.

They are:

Supervised learning: The algorithm generates a function that links input values to a desired output by observing a set of examples in which each input is paired with its corresponding output; it is used to construct predictive models.
Unsupervised learning: The algorithm tries to derive knowledge from a general input without the help of a set of pre-classified examples; it is used to build descriptive models. A typical example of the application of these algorithms is search engines.
Reinforcement learning: The algorithm is able to learn depending on the changes that occur in the environment in which it operates. Since every action has some effect on that environment, the algorithm is driven by the feedback the environment returns. Some of these algorithms are used in speech or text recognition.

Supervised learning

Supervised learning is a machine learning technique that aims to train a computer system so that it can solve the relevant tasks automatically. To do this, the input data is collected in a set I (typically of vectors), the output data is fixed as a set O, and a function f is defined that associates each input with its correct answer. This collection of input-output pairs is called a training set.
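The idea of a training set and a learned function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-feature examples, the labels "positive" and "negative", and the nearest-centroid rule are all hypothetical choices made here, not the method used by the book or by Azure Machine Learning Studio.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

# One training example: (input vector from I, correct answer from O).
Example = Tuple[Sequence[float], str]

def train(training_set: List[Example]) -> Dict[str, List[float]]:
    """Learn one mean vector (centroid) per output label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Approximate f: answer with the label whose centroid is closest to x."""
    return min(model, key=lambda label: dist(model[label], x))

# Hypothetical training set: the "teacher" supplies the right answer for each input.
training_set: List[Example] = [
    ((1.0, 1.0), "positive"),
    ((1.2, 0.8), "positive"),
    ((-1.0, -1.0), "negative"),
    ((-0.8, -1.2), "negative"),
]

model = train(training_set)
print(predict(model, (0.9, 1.1)))    # an input that was never seen in training
print(predict(model, (-2.0, -0.5)))
```

The training phase only sees the labelled pairs; prediction then applies what was learned to new inputs.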

These types of algorithms are based on learning by example theory: knowledge is gained by starting from a set of positive examples, which are instances of the concept to be learned, and negative examples, which are non-instances of the concept. In other words, there is a teacher who shows what is right and what is wrong; based on these teachings (training phase), the algorithm will learn to recognize new instances of the problem automatically, as shown in the following diagram:

All supervised learning algorithms are based on the following thesis:

If an algorithm is given an adequate number of examples, it will be able to create a derived function B that approximates the desired function A.

If the approximation is adequate, then when input data is presented to the derived function, it should produce outputs similar to those of the desired function, and therefore acceptable ones. These algorithms are based on the concept that "similar inputs correspond to similar outputs."

Generally, in the real world, this assumption is not valid; however, some situations exist in which it is acceptable. Clearly, the proper functioning of such algorithms depends significantly on the input data. If there are only a few training inputs, the algorithm might not have enough experience to provide a correct output. Conversely, many inputs may make it excessively slow, since the derived function generated from a large number of inputs could be very complicated.
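The relationship between the desired function A, the derived function B, and the number of examples can be illustrated with a small sketch. Everything here is an assumption made for the illustration: A is taken to be the square function on [0, 1], and B is built with a nearest-neighbour rule, which is one direct reading of "similar inputs correspond to similar outputs."

```python
from typing import Callable, List, Tuple

def desired_a(x: float) -> float:
    """Hypothetical desired function A; in a real problem it is unknown."""
    return x * x

def make_examples(n: int) -> List[Tuple[float, float]]:
    """Sample n evenly spaced inputs in [0, 1] and pair each with A's output."""
    return [(i / (n - 1), desired_a(i / (n - 1))) for i in range(n)]

def derive_b(examples: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Build the derived function B: return the output of the closest training input."""
    def b(x: float) -> float:
        _, nearest_output = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest_output
    return b

def max_error(b: Callable[[float], float], probes: int = 1000) -> float:
    """Largest gap between B and A over a grid of probe inputs in [0, 1]."""
    return max(abs(b(i / probes) - desired_a(i / probes)) for i in range(probes + 1))

few = max_error(derive_b(make_examples(3)))
many = max_error(derive_b(make_examples(101)))
print(f"max error with 3 examples:   {few:.4f}")
print(f"max error with 101 examples: {many:.4f}")
```

With only three examples, B is a crude staircase; with 101 it follows A much more closely. The cost side of the trade-off is also visible: every call to `b` scans all stored examples, so a larger training set makes each prediction slower.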

Moreover, experience shows that this type of algorithm is very sensitive to noise; even a few pieces of incorrect data can make the entire system unreliable and lead to wrong decisions. In supervised learning, it's possible to split problems based on the nature of the data. If the output value is categorical, such as membership/non-membership of a certain class, it is a classification problem. If the output is a continuous real value in a certain range, then it is a regression problem.
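The split between classification and regression can be sketched as follows. The data (hours studied, exam score, pass/fail) and the two fitting rules, a least-squares line and a single cut-off, are hypothetical choices made for this illustration only.

```python
from typing import List, Tuple

def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Regression: least-squares slope and intercept for a continuous output."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_threshold(points: List[Tuple[float, bool]]) -> float:
    """Classification: pick the cut-off that misclassifies the fewest examples."""
    def errors(threshold: float) -> int:
        return sum((x >= threshold) != label for x, label in points)
    return min(sorted(x for x, _ in points), key=errors)

# Hypothetical inputs: hours studied.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]

# Continuous output (exam score): a regression problem.
scores = [55.0, 60.0, 65.0, 70.0, 75.0]
slope, intercept = fit_line(list(zip(hours, scores)))
print(f"predicted score for 6 hours: {slope * 6 + intercept:.1f}")

# Categorical output (passed or not): a classification problem.
passed = [False, False, True, True, True]
threshold = fit_threshold(list(zip(hours, passed)))
print(f"predicted to pass with 2.5 hours: {2.5 >= threshold}")
```

The inputs are the same in both cases; only the nature of the output decides which kind of problem it is.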

Unsupervised learning

The aim of unsupervised learning is to extract information from databases automatically. This process occurs without prior knowledge of the contents to be analyzed. Unlike supervised learning, there is no information on membership classes of the examples or generally on the output corresponding to a certain input. The goal is to get a model that is able to discover interesting properties, groups with similar characteristics (clustering) for instance,...