The Data Science Workshop
eBook - ePub

The Data Science Workshop

Learn how you can build machine learning models and create your own real-world data science projects, 2nd Edition

Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare

Partager le livre
  1. 824 pages
  2. English
  3. ePUB (adapté aux mobiles)
  4. Disponible sur iOS et Android
eBook - ePub

The Data Science Workshop

Learn how you can build machine learning models and create your own real-world data science projects, 2nd Edition

Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare

DĂ©tails du livre
Aperçu du livre
Table des matiĂšres
Citations

À propos de ce livre

Gain expert guidance on how to successfully develop machine learning models in Python and build your own unique data platforms

Key Features

  • Gain a full understanding of the model production and deployment process
  • Build your first machine learning model in just five minutes and get a hands-on machine learning experience
  • Understand how to deal with common challenges in data science projects

Book Description

Where there's data, there's insight. With so much data being generated, there is immense scope to extract meaningful information that'll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you'll open new career paths and opportunities.

The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You'll learn to select a relevant metric and even assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you'll get hands-on with approaches such as grid search and random search.

Next, you'll learn dimensionality reduction techniques to easily handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. In a bid to help you automatically create new features that improve your model, the book demonstrates how to use the automated feature engineering tool. You'll also understand how to use the orchestration and scheduling workflow to deploy machine learning models in batch.

By the end of this book, you'll have the skills to start working on data science projects confidently. By the end of this book, you'll have the skills to start working on data science projects confidently.

What you will learn

  • Explore the key differences between supervised learning and unsupervised learning
  • Manipulate and analyze data using scikit-learn and pandas libraries
  • Understand key concepts such as regression, classification, and clustering
  • Discover advanced techniques to improve the accuracy of your model
  • Understand how to speed up the process of adding new features
  • Simplify your machine learning workflow for production

Who this book is for

This is one of the most useful data science books for aspiring data analysts, data scientists, database engineers, and business analysts. It is aimed at those who want to kick-start their careers in data science by quickly learning data science techniques without going through all the mathematics behind machine learning algorithms. Basic knowledge of the Python programming language will help you easily grasp the concepts explained in this book.

Foire aux questions

Comment puis-je résilier mon abonnement ?
Il vous suffit de vous rendre dans la section compte dans paramĂštres et de cliquer sur « RĂ©silier l’abonnement ». C’est aussi simple que cela ! Une fois que vous aurez rĂ©siliĂ© votre abonnement, il restera actif pour le reste de la pĂ©riode pour laquelle vous avez payĂ©. DĂ©couvrez-en plus ici.
Puis-je / comment puis-je télécharger des livres ?
Pour le moment, tous nos livres en format ePub adaptĂ©s aux mobiles peuvent ĂȘtre tĂ©lĂ©chargĂ©s via l’application. La plupart de nos PDF sont Ă©galement disponibles en tĂ©lĂ©chargement et les autres seront tĂ©lĂ©chargeables trĂšs prochainement. DĂ©couvrez-en plus ici.
Quelle est la différence entre les formules tarifaires ?
Les deux abonnements vous donnent un accĂšs complet Ă  la bibliothĂšque et Ă  toutes les fonctionnalitĂ©s de Perlego. Les seules diffĂ©rences sont les tarifs ainsi que la pĂ©riode d’abonnement : avec l’abonnement annuel, vous Ă©conomiserez environ 30 % par rapport Ă  12 mois d’abonnement mensuel.
Qu’est-ce que Perlego ?
Nous sommes un service d’abonnement Ă  des ouvrages universitaires en ligne, oĂč vous pouvez accĂ©der Ă  toute une bibliothĂšque pour un prix infĂ©rieur Ă  celui d’un seul livre par mois. Avec plus d’un million de livres sur plus de 1 000 sujets, nous avons ce qu’il vous faut ! DĂ©couvrez-en plus ici.
Prenez-vous en charge la synthÚse vocale ?
Recherchez le symbole Écouter sur votre prochain livre pour voir si vous pouvez l’écouter. L’outil Écouter lit le texte Ă  haute voix pour vous, en surlignant le passage qui est en cours de lecture. Vous pouvez le mettre sur pause, l’accĂ©lĂ©rer ou le ralentir. DĂ©couvrez-en plus ici.
Est-ce que The Data Science Workshop est un PDF/ePUB en ligne ?
Oui, vous pouvez accĂ©der Ă  The Data Science Workshop par Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare en format PDF et/ou ePUB ainsi qu’à d’autres livres populaires dans Computer Science et Programming in Python. Nous disposons de plus d’un million d’ouvrages Ă  dĂ©couvrir dans notre catalogue.

Informations

Année
2020
ISBN
9781800569409

1. Introduction to Data Science in Python

Overview
This very first chapter will introduce you to the field of data science and walk you through an overview of Python's core concepts and their application in the world of data science.
By the end of this chapter, you will be able to explain what data science is and distinguish between supervised and unsupervised learning. You will also be able to explain what machine learning is and distinguish between regression, classification, and clustering problems. You'll have learnt to create and manipulate different types of Python variable, including core variables, lists, and dictionaries. You'll be able to build a for loop, print results using f-strings, define functions, import Python packages and load data in different formats using pandas. You will also have had your first taste of training a model using scikit-learn.

Introduction

Welcome to the fascinating world of data science! We are sure you must be pretty excited to start your journey and learn interesting and exciting techniques and algorithms. This is exactly what this book is intended for.
But before diving into it, let's define what data science is: it is a combination of multiple disciplines, including business, statistics, and programming, that intends to extract meaningful insights from data by running controlled experiments similar to scientific research.
The objective of any data science project is to derive valuable knowledge for the business from data in order to make better decisions. It is the responsibility of data scientists to define the goals to be achieved for a project. This requires business knowledge and expertise. In this book, you will be exposed to some examples of data science tasks from real-world datasets.
Statistics is a mathematical field used for analyzing and finding patterns from data. A lot of the newest and most advanced techniques still rely on core statistical approaches. This book will present to you the basic techniques required to understand the concepts we will be covering.
With an exponential increase in data generation, more computational power is required for processing it efficiently. This is the reason why programming is a required skill for data scientists. You may wonder why we chose Python for this Workshop. That's because Python is one of the most popular programming languages for data science. It is extremely easy to learn how to code in Python thanks to its simple and easily readable syntax. It also has an incredible number of packages available to anyone for free, such as pandas, scikit-learn, TensorFlow, and PyTorch. Its community is expanding at an incredible rate, adding more and more new functionalities and improving its performance and reliability. It's no wonder companies such as Facebook, Airbnb, and Google are using it as one of their main stacks. No prior knowledge of Python is required for this book. If you do have some experience with Python or other programming languages, then this will be an advantage, but all concepts will be fully explained, so don't worry if you are new to programming.

Application of Data Science

As mentioned in the introduction, data science is a multidisciplinary approach to analyzing and identifying complex patterns and extracting valuable insights from data. Running a data science project usually involves multiple steps, including the following:
  1. Defining the business problem to be solved
  2. Collecting or extracting existing data
  3. Analyzing, visualizing, and preparing data
  4. Training a model to spot patterns in data and make predictions
  5. Assessing a model's performance and making improvements
  6. Communicating and presenting findings and gained insights
  7. Deploying and maintaining a model
As its name implies, data science projects require data, but it is actually more important to have defined a clear business problem to solve first. If it's not framed correctly, a project may lead to incorrect results as you may have used the wrong information, not prepared the data properly, or led a model to learn the wrong patterns. So, it is absolutely critical to properly define the scope and objective of a data science project with your stakeholders.
There are a lot of data science applications in real-world situations or in business environments. For example, healthcare providers may train a model for predicting a medical outcome or its severity based on medical measurements, or a high school may want to predict which students are at risk of dropping out within a year's time based on their historical grades and past behaviors. Corporations may be interested to know the likelihood of a customer buying a certain product based on his or her past purchases. They may also need to better understand which customers are more likely to stop using existing services and churn. These are examples where data science can be used to achieve a clearly defined goal, such as increasing the number of patients detected with a heart...

Table des matiĂšres