ETL with Azure Cookbook
eBook - ePub

ETL with Azure Cookbook

Practical recipes for building modern ETL solutions to load and transform data from any source

  1. 446 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android
eBook - ePub

ETL with Azure Cookbook

Practical recipes for building modern ETL solutions to load and transform data from any source

About this book

Explore the latest Azure ETL techniques both on-premises and in the cloud using Azure services such as SQL Server Integration Services (SSIS), Azure Data Factory, and Azure Databricks

Key Features

  • Understand the key components of an ETL solution using Azure Integration Services
  • Discover the common and not-so-common challenges faced while creating modern and scalable ETL solutions
  • Program and extend your packages to develop efficient data integration and data transformation solutions

Book Description

ETL is one of the most common and tedious procedures for moving and processing data from one database to another. With the help of this book, you will be able to speed up the process by designing effective ETL solutions using the Azure services available for handling and transforming any data to suit your requirements.

With this cookbook, you'll become well versed in all the features of SQL Server Integration Services (SSIS) to perform data migration and ETL tasks that integrate with Azure. You'll learn how to transform data in Azure and understand how legacy systems perform ETL on-premises using SSIS. Later chapters will get you up to speed with connecting and retrieving data from SQL Server 2019 Big Data Clusters, and even show you how to extend and customize the SSIS toolbox using custom-developed tasks and transforms. This ETL book also contains practical recipes for moving and transforming data with Azure services, such as Data Factory and Azure Databricks, and lets you explore various options for migrating SSIS packages to Azure. Toward the end, you'll find out how to profile data in the cloud and automate service creation with Business Intelligence Markup Language (BIML).

By the end of this book, you'll have developed the skills you need to create and automate ETL solutions on-premises as well as in Azure.

What you will learn

  • Explore ETL and how it is different from ELT
  • Move and transform various data sources with Azure ETL and ELT services
  • Use SSIS 2019 with Azure HDInsight clusters
  • Discover how to query SQL Server 2019 Big Data Clusters hosted in Azure
  • Migrate SSIS solutions to Azure and solve key challenges associated with it
  • Understand why data profiling is crucial and how to implement it in Azure Databricks
  • Get to grips with BIML and learn how it applies to SSIS and Azure Data Factory solutions

Who this book is for

This book is for data warehouse architects, ETL developers, or anyone who wants to build scalable ETL applications in Azure. Those looking to extend their existing on-premise ETL applications to use big data and a variety of Azure services or others interested in migrating existing on-premise solutions to the Azure cloud platform will also find the book useful. Familiarity with SQL Server services is necessary to get the most out of this book.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access ETL with Azure Cookbook by Christian Coté,Matija Lah,Madina Saitakhmetova in PDF and/or ePUB format, as well as other popular books in Computer Science & Data Modelling & Design. We have over one million books available in our catalogue for you to explore.

Information

Chapter 1: Getting Started with Azure and SSIS 2019

In this cookbook, we are going to cover Extract, Transform, Load (ETL) development with Microsoft Azure. We will start with Microsoft SQL Server 2019 Integration Services (SSIS) and then gradually move on to data management capabilities in Azure. Throughout this cookbook we will use hands-on examples, which will not only provide you with genuine first-hand experience in SSIS development but also demonstrate how to design a working ETL solution or a module that could be built into an ETL solution.
Let's start by establishing your development environment. We are going to guide you through the installation of a new SQL Server instance, including all the components needed for on-premises as well as cloud-based SSIS solution development. You are also going to install the tools used in administering and maintaining SQL Server instances – either on-premises or in the cloud. You are also going to install the tools used in developing, testing, and deploying your SSIS solutions. You will finish this chapter by creating an Azure subscription, which will allow your on-premises development environment to extend into the cloud.
This chapter covers the following recipes that will help you get started with SSIS development on-premises as well as in the cloud:
  • Installing Microsoft SQL Server 2019 Integration Services
  • Installing SQL Server Management Studio
  • Installing SQL Server Data Tools
  • Installing Azure Data Studio
  • Creating an Azure subscription

Technical requirements

In order to install the necessary components, you will need a workstation – either a physical personal computer or a virtual one – running a compatible Microsoft Windows operating system. SQL Server 2019 can be installed on any edition of the Microsoft Windows Server operating system, starting with Windows Server 2016, or even on its desktop edition, Windows 10, starting with version TH1, build 1507.
Depending on the features installed, SQL Server is going to require between 6 and roughly 8 GB of free space on the hard drive, and at least 1 GB of available system memory, though around 4 GB is the recommended minimum. Practically any 64-bit CPU is supported.
Important note
You can find more detailed information about SQL Server 2019 hardware and software requirements in the online vendor documentation (SQL docs) article entitled SQL Server 2019: Hardware and software requirements, at https://docs.microsoft.com/en-us/sql/sql-server/install/hardware-and-software-requirements-for-installing-sql-server-ver15.
Speaking of online documentation, your workstation is going to need internet access – even to perform the installations.

Installing Microsoft SQL Server 2019 Integration Services

You are going to install a new instance of SQL Server on your workstation; if you prefer, you can also upgrade an existing SQL Server instance, as noted in the recipe. The installation will include all the features necessary to design SSIS solutions generally, not just to perform the work presented in this book.
The Client Tools Software Development Kit (SDK), an otherwise optional SQL Server component, must be installed on the workstation used in SSIS development. It contains the .NET managed assemblies with design-time access to the SSIS runtime, without which SSIS development simply cannot take place.
The installation of the SQL Client Connectivity SDK, another optional component, is recommended, though not necessary. It contains SQL Server Native Client connectivity resources that you might need in database application development.
Important note
None of the SDKs mentioned would ever need to be installed on a server used for hosting data, as such a server would normally not be used for SSIS development.

Getting ready

You need access to the SQL Server 2019 installation media. Since this is going to be a typical development workstation installation, you can use the free, specialized SQL Server 2019 Developer Edition, available for download at https://www.microsoft.com/en-us/sql-server/sql-server-downloads. Alternatively, you can also use a free trial edition of SQL Server 2019 Standard, or the Data Center editions, but please note that their use is limited to 180 days.
Important note
The Developer edition of SQL Server 2019 is a full-featured free edition, licensed for use as a development and test database in a non-production environment.
Either download the installation media from the website or use the installation media provided by the administrator in your organization. If you decide to use the Developer edition of SQL Server 2019, download the executable from the website, and save it to the local drive on the workstation that you will be using for the recipes in this book. If you prefer to use the Evaluation edition of SQL Server 2019, you will have to sign up on the website by providing some of your personal information before being able to access the installation media.

How to do it…

Start your workstation and log in with an account with administrative operating system privileges. Make sure the workstation has access to the SQL Server 2019 installation media – for instance, make sure that the Developer edition executable file is available on the local drive:
  1. Locate the installation media, and double-click the installation executable (for instance, SQL2019-SSEI-Dev.exe for SQL Server 2019 Developer Edition) to start the installation.
    When prompted by the operating system, which will ask whether you want to allow the application to make changes to your device, as shown in the following screenshot, click Yes to continue:
    Figure 1.1 – The SQL Server installation User Account Control dialog
    Figure 1.1 – The SQL Server installation User Account Control dialog
  2. At the beginning of the installation, you are asked whether you would like to use the basic settings, customize them, or simply download the rest of the installation media.
    Tip
    Generally, you will only download the installation media if you plan to install SQL Server on additional devices. On the other hand, a very good reason for having the media available locally would be in case you need to repair the installation later.
    Select Custom to continue.
  3. You now have to select the location of the installation files. Use the Default folder unless you prefer to use an alternative location.
    When ready, click Install to continue. Depending on your internet connection, it should take just a few minutes to download and extract the files.
  4. After the installation files have been prepared, the SQL Server Installation Center window will open, as shown in the following screenshot:
    Figure 1.2 – SQL Server Installation Center
    Figure 1.2 – SQL Server Installation Center
    On the Install...

Table of contents

  1. ETL with Azure Cookbook
  2. Why subscribe?
  3. Preface
  4. Chapter 1: Getting Started with Azure and SSIS 2019
  5. Chapter 2: Introducing ETL
  6. Chapter 3: Creating and Using SQL Server 2019 Big Data Clusters
  7. Chapter 4: Azure Data Integration
  8. Chapter 5: Extending SSIS with Custom Tasks and Transformations
  9. Chapter 6: Azure Data Factory
  10. Chapter 7: Azure Databricks
  11. Chapter 8: SSIS Migration Strategies
  12. Chapter 9: Profiling data in Azure
  13. Chapter 10: Manage SSIS and Azure Data Factory with Biml
  14. Other Books You May Enjoy