The Data Science Workshop
eBook - ePub


Learn how you can build machine learning models and create your own real-world data science projects, 2nd Edition

Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare

  • 824 pages
  • English
  • ePUB

About the book

Gain expert guidance on how to successfully develop machine learning models in Python and build your own unique data platforms

Key Features

  • Gain a full understanding of the model production and deployment process
  • Build your first machine learning model in just five minutes and get a hands-on machine learning experience
  • Understand how to deal with common challenges in data science projects

Book Description

Where there's data, there's insight. With so much data being generated, there is immense scope to extract meaningful information that'll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you'll open new career paths and opportunities.

The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You'll learn to select a relevant metric and even assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you'll get hands-on with approaches such as grid search and random search.

Next, you'll learn dimensionality reduction techniques to easily handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. To help you automatically create new features that improve your model, the book demonstrates how to use an automated feature engineering tool. You'll also learn how to use orchestration and scheduling workflows to deploy machine learning models in batch.

By the end of this book, you'll have the skills to start working on data science projects confidently.

What you will learn

  • Explore the key differences between supervised learning and unsupervised learning
  • Manipulate and analyze data using the scikit-learn and pandas libraries
  • Understand key concepts such as regression, classification, and clustering
  • Discover advanced techniques to improve the accuracy of your model
  • Understand how to speed up the process of adding new features
  • Simplify your machine learning workflow for production

Who this book is for

This is one of the most useful data science books for aspiring data analysts, data scientists, database engineers, and business analysts. It is aimed at those who want to kick-start their careers in data science by quickly learning data science techniques without going through all the mathematics behind machine learning algorithms. Basic knowledge of the Python programming language will help you easily grasp the concepts explained in this book.


Book details

Year: 2020
ISBN: 9781800569409
Edition: 2

1. Introduction to Data Science in Python

Overview
This very first chapter will introduce you to the field of data science and walk you through an overview of Python's core concepts and their application in the world of data science.
By the end of this chapter, you will be able to explain what data science is and distinguish between supervised and unsupervised learning. You will also be able to explain what machine learning is and distinguish between regression, classification, and clustering problems. You'll have learnt to create and manipulate different types of Python variables, including core variables, lists, and dictionaries. You'll be able to build a for loop, print results using f-strings, define functions, import Python packages, and load data in different formats using pandas. You will also have had your first taste of training a model using scikit-learn.
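As a quick preview of the core Python concepts this chapter covers, the following minimal sketch creates a few core variables, a list, and a dictionary, defines a function, loops with f-strings, imports pandas, and builds a small DataFrame. The values (customer tenure in months and churn flags) are hypothetical toy data chosen only for illustration, and `mean` is a helper defined here rather than a library function.

```python
import pandas as pd

# Core variables: a string and an integer (hypothetical values)
project_name = "churn analysis"
n_customers = 3

# A list holds an ordered collection of values (toy tenures in months)
tenures = [3, 12, 28]

# A dictionary maps keys to values
customer = {"id": 101, "tenure": 12, "churned": False}


# A hypothetical helper function returning the arithmetic mean of a list
def mean(values):
    return sum(values) / len(values)


# A for loop combined with an f-string to print each element
for i, tenure in enumerate(tenures):
    print(f"Customer {i}: tenure = {tenure} months")

# The result of mean() is a float; :.1f formats it to one decimal place
average_tenure = mean(tenures)
print(f"{project_name}: {n_customers} customers, "
      f"average tenure {average_tenure:.1f} months")

# pandas builds a DataFrame from a dictionary of lists;
# functions such as pd.read_csv() load data from files instead
df = pd.DataFrame({"tenure": tenures, "churned": [True, False, False]})
print(df.shape)  # (3, 2)
```

Building the DataFrame from an in-memory dictionary keeps the sketch self-contained; with real data you would typically load a file with one of pandas' `read_*` functions instead.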

Introduction

Welcome to the fascinating world of data science! We are sure you are excited to start your journey and learn interesting techniques and algorithms. This is exactly what this book is intended for.
But before diving in, let's define what data science is: it is a combination of multiple disciplines, including business, statistics, and programming, that aims to extract meaningful insights from data by running controlled experiments similar to scientific research.
The objective of any data science project is to derive valuable knowledge for the business from data in order to make better decisions. It is the responsibility of data scientists to define the goals to be achieved for a project. This requires business knowledge and expertise. In this book, you will be exposed to examples of data science tasks using real-world datasets.
Statistics is a mathematical field used for analyzing data and finding patterns in it. Many of the newest and most advanced techniques still rely on core statistical approaches. This book will present the basic statistical techniques required to understand the concepts we will be covering.
With data generation increasing exponentially, more computational power is required to process it efficiently. This is why programming is a required skill for data scientists. You may wonder why we chose Python for this Workshop. That's because Python is one of the most popular programming languages for data science. It is easy to learn how to code in Python thanks to its simple and readable syntax. It also has a large number of packages available to anyone for free, such as pandas, scikit-learn, TensorFlow, and PyTorch. Its community is expanding rapidly, adding new functionality and improving Python's performance and reliability. It's no wonder companies such as Facebook, Airbnb, and Google use it as part of their main technology stacks. No prior knowledge of Python is required for this book. If you do have some experience with Python or other programming languages, this will be an advantage, but all concepts will be fully explained, so don't worry if you are new to programming.

Application of Data Science

As mentioned in the introduction, data science is a multidisciplinary approach to analyzing and identifying complex patterns and extracting valuable insights from data. Running a data science project usually involves multiple steps, including the following:
  1. Defining the business problem to be solved
  2. Collecting or extracting existing data
  3. Analyzing, visualizing, and preparing data
  4. Training a model to spot patterns in data and make predictions
  5. Assessing a model's performance and making improvements
  6. Communicating and presenting findings and gained insights
  7. Deploying and maintaining a model
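The middle steps of this process (preparing data, training a model, and assessing its performance) can be sketched in a few lines of scikit-learn. This is a minimal illustration under our own assumptions, not the book's method: a synthetic dataset from `make_classification` stands in for collected data, and `RandomForestClassifier` with accuracy as the metric are arbitrary choices made for the example. Steps 1, 6, and 7 involve stakeholders and infrastructure and are not represented in code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2 (stand-in): generate a synthetic binary classification dataset
# instead of collecting real data; random_state makes it reproducible
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# Step 3 (partly): hold out 20% of the rows to evaluate the model later
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 4: train a model to learn patterns from the training set
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 5: assess performance on data the model has not seen
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Evaluating on held-out rows rather than the training data gives a more honest estimate of how the model would behave on new cases, which is what step 5 is meant to measure.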
As its name implies, data science projects require data, but it is actually more important to first define a clear business problem to solve. If the problem is not framed correctly, a project may lead to incorrect results because you may have used the wrong information, not prepared the data properly, or led a model to learn the wrong patterns. So, it is absolutely critical to properly define the scope and objective of a data science project with your stakeholders.
There are many data science applications in real-world situations and business environments. For example, healthcare providers may train a model to predict a medical outcome or its severity based on medical measurements, or a high school may want to predict which students are at risk of dropping out within a year based on their historical grades and past behaviors. Corporations may be interested in knowing the likelihood of a customer buying a certain product based on their past purchases. They may also need to better understand which customers are more likely to stop using existing services and churn. These are examples where data science can be used to achieve a clearly defined goal, such as increasing the number of patients detected with a heart...
