Hands-On Data Warehousing with Azure Data Factory

ETL techniques to load and transform data from various sources, both on-premises and on cloud

Christian Cote, Michelle Kamrat Gutzait, Giuseppe Ciaburro

  1. 284 pages
  2. English

About This Book

Leverage the power of Microsoft Azure Data Factory v2 to build hybrid data solutions

Key Features

  • Combine the power of Azure Data Factory v2 and SQL Server Integration Services
  • Design and enhance performance and scalability of a modern ETL hybrid solution
  • Interact with the loaded data in data warehouse and data lake using Power BI

Book Description

ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources.

Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through the different Azure services that ADF and SSIS can use, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore how to design and implement hybrid ETL solutions using different integration services with a step-by-step approach. Once you have got to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights.

By the end of this book, you will not only have learned how to build your own ETL solutions but also how to address the key challenges faced while building them.

What you will learn

  • Understand the key components of an ETL solution using Azure Data Factory and Integration Services
  • Design the architecture of a modern ETL hybrid solution
  • Implement ETL solutions for both on-premises and Azure data
  • Improve the performance and scalability of your ETL solution
  • Gain thorough knowledge of new capabilities and features added to Azure Data Factory and Integration Services

Who this book is for

This book is for you if you are a software professional who develops and implements ETL solutions using Microsoft SQL Server or Azure cloud. It will be an added advantage if you are a software engineer, DW/ETL architect, or ETL developer, and know how to create a new ETL implementation or enhance an existing one with ADF or SSIS.


Information

Year
2018
ISBN
9781789130096
Edition
1

Machine Learning on the Cloud

Machine learning is the ability of a machine to expand its knowledge without human intervention. The concept of machine learning is used in software engineering, in data mining, and, in particular, in artificial intelligence. Starting from a knowledge base rich in information, an automatic learning system searches for and extracts regularities in the data through data mining techniques. Machine learning algorithms use mathematical-computational methods to learn information directly from the data, without relying on a predetermined equation as a model.
The applications of machine learning are already numerous today, and some of them have entered our daily lives without us realizing it. For example, search engines return lists of results in response to one or more keywords. Email spam filters continuously learn both to intercept suspicious or fraudulent messages and to act accordingly. Finally, we have speech recognition and handwriting recognition systems.
In this chapter, we will introduce the basic concepts of machine learning, and then we will take a tour of the different types of algorithms. In addition, we will cover some background information and the basics of the Microsoft Azure Machine Learning Studio environment. Finally, we will explore some practical applications to understand the amazing world of machine learning.
In this chapter, we will cover the following topics:
  • Discovering the machine learning capabilities for classification, regression, clustering, and dimensionality reduction, including apps for automated model training and code generation
  • A tour of the most popular machine learning algorithms to choose the right one for our needs
  • Exploring the Azure Machine Learning Studio environment
By the end of this chapter, you will be able to recognize the different machine learning algorithms and the tools that Microsoft Azure Machine Learning Studio provides to handle them.

Machine learning overview

Machine learning is a multidisciplinary field created by the intersection and synergy of computer science, statistics, neurobiology, and control theory. Its emergence has played a key role in several fields and has fundamentally changed the vision of software programming. If the question before was, "How can we program a computer?", the question has now become, "How will computers program themselves?"
Machine learning can thus be seen as a basic method for giving a computer an intelligence of its own.

Machine learning algorithms

The power of machine learning is due to the quality of its algorithms, which have been improved and updated over the years; these are divided into several main types depending on the nature of the signal used for learning or the type of feedback adopted by the system.
They are:
  • Supervised learning: The algorithm generates a function that links input values to a desired output by observing a set of examples in which each input is paired with its corresponding output; it is used to construct predictive models.
  • Unsupervised learning: The algorithm tries to derive knowledge from a general input without the help of a set of pre-classified examples; it is used to build descriptive models. A typical example of the application of these algorithms is search engines.
  • Reinforcement learning: The algorithm is able to learn depending on the changes that occur in the environment in which it operates. In fact, since every action has some effect on the environment concerned, the algorithm is driven by the feedback it receives from that same environment. Some of these algorithms are used in speech or text recognition.
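The difference between the three types lies mainly in the feedback the algorithm receives. The following is a minimal Python sketch of that difference, not a set of production algorithms: the data values, the function names, and the toy `environment` are all hypothetical, and each learner is deliberately crude.

```python
from statistics import mean

# Supervised: every input comes with its desired output (a label).
labeled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]

def fit_threshold(examples):
    """Learn a cut-off halfway between the means of the two labeled groups."""
    low = mean(x for x, y in examples if y == "low")
    high = mean(x for x, y in examples if y == "high")
    return (low + high) / 2

# Unsupervised: inputs only; any structure must be found without labels.
unlabeled = [1.0, 1.5, 8.0, 9.0]

def split_at_largest_gap(values):
    """Group sorted values into two clusters at the widest gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

# Reinforcement: no labels, only a reward returned by the environment
# after each action.
def environment(action):
    """Hypothetical environment in which action 'b' pays more than 'a'."""
    return {"a": 0.2, "b": 1.0}[action]

def learn_best_action(actions, rounds=3):
    """Try every action a few times, then keep the one with the highest total reward."""
    totals = {a: 0.0 for a in actions}
    for _ in range(rounds):
        for a in actions:
            totals[a] += environment(a)
    return max(totals, key=totals.get)

threshold = fit_threshold(labeled)
groups = split_at_largest_gap(unlabeled)
best_action = learn_best_action(["a", "b"])
```

In this sketch the supervised learner needs the labels, the unsupervised one recovers the same two groups from the raw values alone, and the reinforcement learner never sees a correct answer, only the rewards that follow its own actions.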

Supervised learning

Supervised learning is a machine learning technique that aims to train a computer system so that it can solve the relevant tasks automatically. To do this, the input data is collected in a set I (typically vectors), the output data is fixed as a set O, and a function f is defined that associates each input with the correct answer. This information, the inputs paired with their correct outputs, is called a training set.
These types of algorithms are based on the learning-by-example theory: knowledge is gained by starting from a set of positive examples, which are instances of the concept to be learned, and negative examples, which are non-instances of the concept. In other words, there is a teacher who shows what is right and what is wrong; based on these teachings (training phase), the algorithm will learn to recognize new instances of the problem automatically, as shown in the following diagram:
All supervised learning algorithms are based on the following thesis:
If an algorithm is provided with an adequate number of examples, it will be able to create a derived function B that will approximate the desired function A.
If the approximation of the desired function is adequate, then when input data is offered to the derived function, it should provide output responses similar to those of the desired function, and these responses can therefore be considered acceptable. These algorithms are based on the "similar inputs correspond to similar outputs" concept.
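The "similar inputs correspond to similar outputs" idea can be sketched with a nearest-neighbor rule. This is only an illustration chosen for its simplicity; the chapter does not name a specific algorithm at this point, and the training set and function name below are hypothetical.

```python
def nearest_neighbor_predict(training_set, x):
    """Return the output of the training example whose input is closest to x."""
    _, closest_output = min(training_set, key=lambda pair: abs(pair[0] - x))
    return closest_output

# Hypothetical training set: (input, output) pairs supplied by the "teacher".
training_set = [
    (1.0, "negative"),
    (2.0, "negative"),
    (7.0, "positive"),
    (8.0, "positive"),
]

prediction_a = nearest_neighbor_predict(training_set, 1.6)  # nearest input is 2.0
prediction_b = nearest_neighbor_predict(training_set, 7.4)  # nearest input is 7.0
```

Here the derived function is nothing more than a lookup of the most similar known input, so it is only as good as the examples it was given.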
Generally, in the real world, this assumption is not valid; however, some situations exist in which it is acceptable. Clearly, the proper functioning of such algorithms depends significantly on the input data. If there are only a few training inputs, the algorithm might not have enough experience to provide a correct output. Conversely, too many inputs may make it excessively slow, since the derived function generated from a large number of inputs could be very complicated.
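The effect of the number of training inputs can be made concrete with a small experiment. In the sketch below, `desired` is a stand-in for the unknown function A (an assumption made purely for the demonstration), and the derived function B simply answers with the output of the closest stored example.

```python
def desired(x):
    """Stand-in for the unknown desired function A (assumed for this demo)."""
    return x * x

def make_derived(n_examples):
    """Build a derived function B from n evenly spaced examples of A on [0, 1]."""
    xs = [i / (n_examples - 1) for i in range(n_examples)]
    examples = [(x, desired(x)) for x in xs]

    def derived(x):
        # Answer with the output of the closest training input.
        return min(examples, key=lambda pair: abs(pair[0] - x))[1]

    return derived

def max_error(derived, n_points=201):
    """Largest gap between A and B over a dense grid on [0, 1]."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    return max(abs(desired(x) - derived(x)) for x in grid)

error_few = max_error(make_derived(3))    # only 3 training examples
error_many = max_error(make_derived(30))  # 30 training examples
```

With 3 examples the worst-case error is large, and with 30 it shrinks considerably. The cost side of the trade-off is visible too: in this particular sketch every prediction scans all stored examples, so the work per query grows with the size of the training set.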
Moreover, experience shows that this type of algorithm is very sensitive to noise; even a few pieces of incorrect data can make the entire system unreliable and lead to wrong decisions. In supervised learning, it's possible to split problems based on the nature of the data. If the output value is categorical, such as membership/non-membership of a certain class, it is a classification problem. If the output is a continuous real value in a certain range, then it is a regression problem.
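The split between classification and regression can be illustrated on the same kind of input. The sketch below uses ordinary least squares for the continuous case and a nearest-class-mean rule for the categorical case; both are stand-ins picked for brevity, and the data (hours studied against an exam score or a pass/fail label) is invented for the example.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def classify(examples, x):
    """Assign x to the class whose mean input is closest (nearest class mean)."""
    by_class = {}
    for value, label in examples:
        by_class.setdefault(label, []).append(value)
    means = {label: sum(vals) / len(vals) for label, vals in by_class.items()}
    return min(means, key=lambda label: abs(means[label] - x))

# Regression: the output is a continuous value (hypothetical exam scores).
scores = [(1, 52.0), (2, 54.0), (3, 56.0), (4, 58.0)]
slope, intercept = fit_line(scores)
predicted_score = slope * 5 + intercept

# Classification: the output is a category (hypothetical pass/fail labels).
outcomes = [(1, "fail"), (2, "fail"), (3, "pass"), (4, "pass")]
predicted_class = classify(outcomes, 5)
```

The input is the same in both cases; only the nature of the output decides whether the problem is one of regression or classification.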

Unsupervised learning

The aim of unsupervised learning is to extract information from databases automatically. This process occurs without prior knowledge of the contents to be analyzed. Unlike supervised learning, there is no information on membership classes of the examples or generally on the output corresponding to a certain input. The goal is to get a model that is able to discover interesting properties, groups with similar characteristics (clustering) for instance,...
