eBook - ePub

Hands-On Data Warehousing with Azure Data Factory

Name: Hands-On Data Warehousing with Azure Data Factory
ISBN: 9781789130096

ETL techniques to load and transform data from various sources, both on-premises and on cloud

Christian Cote, Michelle Kamrat Gutzait, Giuseppe Ciaburro

284 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Leverage the power of Microsoft Azure Data Factory v2 to build hybrid data solutions

Key Features

Combine the power of Azure Data Factory v2 and SQL Server Integration Services
Design and enhance performance and scalability of a modern ETL hybrid solution
Interact with the loaded data in data warehouse and data lake using Power BI

Book Description

ETL is one of the essential techniques in data processing. Because data is everywhere, ETL will always be a vital process for handling data from different sources.

Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through different services offered by Azure that can be used by ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore how to design and implement ETL hybrid solutions using different integration services with a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights.

By the end of this book, you will not only have learned how to build your own ETL solutions but also how to address the key challenges faced while building them.

What you will learn

Understand the key components of an ETL solution using Azure Data Factory and Integration Services
Design the architecture of a modern ETL hybrid solution
Implement ETL solutions for both on-premises and Azure data
Improve the performance and scalability of your ETL solution
Gain thorough knowledge of new capabilities and features added to Azure Data Factory and Integration Services

Who this book is for

This book is for you if you are a software professional who develops and implements ETL solutions using Microsoft SQL Server or Azure cloud. It will be an added advantage if you are a software engineer, DW/ETL architect, or ETL developer, and know how to create a new ETL implementation or enhance an existing one with ADF or SSIS.

Information

Topic: Computer Science
Subtopic: Data Modelling and Design

Machine Learning on the Cloud

Machine learning is the ability of a machine to expand its knowledge without human intervention. The concept of machine learning is used in software engineering, in data mining, and, in particular, in artificial intelligence. Starting from a knowledge base rich in information, an automatic learning system searches for and extracts regularities in the data through data mining techniques. Machine learning algorithms use mathematical-computational methods to learn information directly from the data, without relying on mathematical models or predetermined equations.

The applications of machine learning are already numerous, and some have entered our daily lives without us realizing it. For example, search engines return lists of results from one or more keywords. Email spam filters continuously learn both to intercept suspicious or fraudulent messages and to act accordingly. Finally, there are speech recognition and handwriting recognition systems.

In this chapter, we will introduce the basic concepts of machine learning and then take a tour of the different types of algorithms. We will also cover an introduction to the Microsoft Azure Machine Learning Studio environment, along with some background information and basic knowledge. Finally, we will explore some practical applications to understand the amazing world of machine learning.

In this chapter, we will cover the following topics:

Discovering the machine learning capabilities for classification, regression, clustering, and dimensionality reduction, including apps for automated model training and code generation
A tour of the most popular machine learning algorithms to choose the right one for our needs
Exploring the Azure Machine Learning Studio environment

By the end of this chapter, you will be able to recognize the different machine learning algorithms and the tools that Microsoft Azure Machine Learning Studio provides to handle them.

Machine learning overview

Machine learning is a multidisciplinary field created by the intersection and synergy of computer science, statistics, neurobiology, and control theory. Its emergence has played a key role in several fields and has fundamentally changed the vision of software programming. If the question before was "How can we program a computer?", the question has now become "How will computers program themselves?"

Thus, it is clear that machine learning is a basic method that allows a computer to have its own intelligence.

Machine learning algorithms

The power of machine learning is due to the quality of its algorithms, which have been improved and updated over the years; these are divided into several main types depending on the nature of the signal used for learning or the type of feedback adopted by the system.

They are:

Supervised learning: The algorithm generates a function that links input values to a desired output by observing a set of examples in which each input is paired with its corresponding output. It is used to construct predictive models.
Unsupervised learning: The algorithm tries to derive knowledge from a general input without the help of a set of pre-classified examples. It is used to build descriptive models. A typical example of the application of these algorithms is search engines.
Reinforcement learning: The algorithm is able to learn depending on the changes that occur in the environment in which it operates. In fact, since every action has some effect on the environment concerned, the algorithm is driven by the feedback from that same environment. Some of these algorithms are used in speech or text recognition.
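The feedback loop that drives reinforcement learning can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two actions, their fixed rewards, and the simple "try each action once, then repeat the best one" strategy are hypothetical and are not taken from the book.

```python
# Hypothetical environment: each action always yields the same reward.
REWARDS = {"left": 0.2, "right": 1.0}
ACTIONS = list(REWARDS)

def environment(action):
    """Return the feedback (reward) the environment gives for an action."""
    return REWARDS[action]

def run_agent(steps=10):
    """Try every action once, then keep choosing the best average reward."""
    totals = {a: 0.0 for a in ACTIONS}
    counts = {a: 0 for a in ACTIONS}
    history = []
    for _ in range(steps):
        untried = [a for a in ACTIONS if counts[a] == 0]
        if untried:
            action = untried[0]  # explore: no feedback for this action yet
        else:
            # exploit: pick the action with the highest average reward so far
            action = max(ACTIONS, key=lambda a: totals[a] / counts[a])
        reward = environment(action)  # feedback from the environment
        totals[action] += reward
        counts[action] += 1
        history.append(action)
    return history

history = run_agent()
print(history)
```

After one trial of each action, the agent settles on the action with the higher reward, because the only signal it uses is the feedback returned by the environment.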

Supervised learning

Supervised learning is a machine learning technique that aims to program a computer system so that it can resolve the relevant tasks automatically. To do this, the input data is collected in a set I (typically vectors), the output data is fixed as a set O, and finally a function f is defined that associates each input with the correct answer. This information is called a training set.

These types of algorithms are based on learning by example theory: knowledge is gained by starting from a set of positive examples, which are instances of the concept to be learned, and negative examples, which are non-instances of the concept. In other words, there is a teacher who shows what is right and what is wrong; based on these teachings (training phase), the algorithm will learn to recognize new instances of the problem automatically, as shown in the following diagram:

All supervised learning algorithms are based on the following thesis:

If an algorithm is provided with an adequate number of examples, it will be able to create a derived function B that will approximate the desired function A.

If the approximation of the desired function is adequate, then when input data is offered to the derived function, it should provide output responses similar to those of the desired function, and these responses are therefore acceptable. These algorithms are based on the "similar inputs correspond to similar outputs" concept.
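One way to make the "similar inputs correspond to similar outputs" idea concrete is a nearest-neighbor lookup, sketched below. This is only an illustration under stated assumptions: the desired function A (here 2x + 1), the training inputs, and the names `desired_function` and `derived_function` are hypothetical choices, and nearest-neighbor lookup is just one of many supervised techniques.

```python
def desired_function(x):
    """Function A: unknown in practice; a hypothetical choice for this sketch."""
    return 2 * x + 1

# Training set: each input is paired with its correct output.
training_set = [(x, desired_function(x)) for x in range(0, 11)]

def derived_function(x):
    """Function B: answer with the output of the most similar training input."""
    _, nearest_output = min(training_set, key=lambda pair: abs(pair[0] - x))
    return nearest_output

for x in (2.0, 4.3, 7.6):
    print(x, desired_function(x), derived_function(x))
```

For inputs that appear in the training set, B reproduces A exactly; for inputs in between, B returns the output of a nearby example, which is close to A only because A changes smoothly.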

Generally, in the real world, this assumption is not valid; however, some situations exist in which it is acceptable. Clearly, the proper functioning of such algorithms depends significantly on the input data. If there are only a few training inputs, the algorithm might not have enough experience to provide a correct output. Conversely, many inputs may make it excessively slow, since the derived function generated by a large number of inputs could be very complicated.
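The trade-off between too few and too many training inputs can be sketched with a small experiment. The setup is assumed purely for illustration: the desired function (x squared), the interval [0, 10], the nearest-neighbor learner, and the helper names are all hypothetical.

```python
def desired_function(x):
    """Function A (a hypothetical choice for this sketch)."""
    return x * x

def make_derived_function(n_examples):
    """Build function B from n evenly spaced training examples on [0, 10]."""
    inputs = [10 * i / (n_examples - 1) for i in range(n_examples)]
    training_set = [(x, desired_function(x)) for x in inputs]

    def derived_function(x):
        # Each prediction scans the whole training set,
        # so the cost of one answer grows with the number of examples.
        _, output = min(training_set, key=lambda pair: abs(pair[0] - x))
        return output

    return derived_function

def max_error(n_examples):
    """Largest gap between A and B over a grid of test inputs."""
    derived = make_derived_function(n_examples)
    test_inputs = [i / 4 for i in range(41)]  # 0, 0.25, ..., 10
    return max(abs(derived(x) - desired_function(x)) for x in test_inputs)

error_few = max_error(3)
error_many = max_error(21)
print(error_few, error_many)
```

With only three examples the worst-case error is much larger than with twenty-one, while each prediction with twenty-one examples requires more comparisons. This mirrors the tension described above between limited experience and a slower, more complicated derived function.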

Moreover, experience shows that this type of algorithm is very sensitive to noise; even a few pieces of incorrect data can make the entire system unreliable and lead to wrong decisions. In supervised learning, it's possible to split problems based on the nature of the data. If the output value is categorical, such as membership/non-membership of a certain class, it is a classification problem. If the output is a continuous real value in a certain range, then it is a regression problem.
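The difference between the two problem types lies only in the kind of output being predicted, as the following sketch suggests. The data (hours studied, pass/fail labels, exam scores) and both learners (a midpoint threshold and a least-squares line) are hypothetical choices made for illustration, not methods prescribed by the book.

```python
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]  # categorical output
scores = [35.0, 45.0, 55.0, 65.0, 75.0, 85.0]              # continuous output

def fit_classifier(xs, labels):
    """Classification: learn a threshold halfway between the two class means."""
    fails = [x for x, label in zip(xs, labels) if label == "fail"]
    passes = [x for x, label in zip(xs, labels) if label == "pass"]
    threshold = (sum(fails) / len(fails) + sum(passes) / len(passes)) / 2
    return lambda x: "pass" if x > threshold else "fail"

def fit_regressor(xs, ys):
    """Regression: fit a straight line by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

classify = fit_classifier(hours, passed)
regress = fit_regressor(hours, scores)
print(classify(4.5), regress(4.5))
```

The same input is given to both models, but the classifier answers with a class label and the regressor answers with a real number.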

Unsupervised learning

The aim of unsupervised learning is to extract information from databases automatically. This process occurs without prior knowledge of the contents to be analyzed. Unlike supervised learning, there is no information on membership classes of the examples or generally on the output corresponding to a certain input. The goal is to get a model that is able to discover interesting properties, groups with similar characteristics (clustering) for instance,...

Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
The Modern Data Warehouse
Getting Started with Our First Data Factory
SSIS Lift and Shift
Azure Data Lake
Machine Learning on the Cloud
Introduction to Azure Databricks
Reporting on the Modern Data Warehouse
