The Data Science Workshop

Learn how you can build machine learning models and create your own real-world data science projects, 2nd Edition

Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare

  1. 824 pages
  2. English
  3. ePUB (mobile-friendly)

Book Information

Gain expert guidance on how to successfully develop machine learning models in Python and build your own unique data platforms

Key Features

  • Gain a full understanding of the model production and deployment process
  • Build your first machine learning model in just five minutes and gain hands-on machine learning experience
  • Understand how to deal with common challenges in data science projects

Book Description

Where there's data, there's insight. With so much data being generated, there is immense scope to extract meaningful information that'll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you'll open new career paths and opportunities.

The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You'll learn to select a relevant metric and use it to assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you'll get hands-on with approaches such as grid search and random search.

Next, you'll learn dimensionality reduction techniques to easily handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. To help you automatically create new features that improve your model, the book demonstrates how to use an automated feature engineering tool. You'll also learn how to use orchestration and scheduling workflows to deploy machine learning models in batch.

By the end of this book, you'll have the skills to start working on data science projects confidently.

What you will learn

  • Explore the key differences between supervised learning and unsupervised learning
  • Manipulate and analyze data using scikit-learn and pandas libraries
  • Understand key concepts such as regression, classification, and clustering
  • Discover advanced techniques to improve the accuracy of your model
  • Understand how to speed up the process of adding new features
  • Simplify your machine learning workflow for production

Who this book is for

This book is designed for aspiring data analysts, data scientists, database engineers, and business analysts. It is aimed at those who want to kick-start their careers in data science by quickly learning data science techniques without going through all the mathematics behind machine learning algorithms. Basic knowledge of the Python programming language will help you easily grasp the concepts explained in this book.


Information

Year: 2020
ISBN: 9781800569409
Edition: 2

1. Introduction to Data Science in Python

Overview
This very first chapter will introduce you to the field of data science and walk you through an overview of Python's core concepts and their application in the world of data science.
By the end of this chapter, you will be able to explain what data science is and distinguish between supervised and unsupervised learning. You will also be able to explain what machine learning is and distinguish between regression, classification, and clustering problems. You'll have learnt to create and manipulate different types of Python variables, including core variables, lists, and dictionaries. You'll be able to build a for loop, print results using f-strings, define functions, import Python packages, and load data in different formats using pandas. You will also have had your first taste of training a model using scikit-learn.
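As a quick preview of the core Python constructs listed above (variables, lists, dictionaries, for loops, f-strings, and functions), here is a minimal sketch in plain Python. The variable names, values, and the `average` helper are hypothetical examples chosen for illustration; they are not taken from the chapter's own exercises.

```python
# Core variables: Python infers the type from the assigned value
name = "Alice"          # str
age = 29                # int
height_m = 1.68         # float
is_student = False      # bool

# A list holds an ordered sequence of values
scores = [72, 85, 90, 64]

# A dictionary maps keys to values (here, a hypothetical student record)
student = {"name": name, "age": age, "scores": scores}


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# A for loop visits each element of the list in order;
# f-strings embed expressions directly inside a string literal
for i, score in enumerate(scores, start=1):
    print(f"Test {i}: {score}")

mean_score = average(student["scores"])
# The :.2f format specifier rounds the printed value to two decimal places
print(f"{student['name']} has an average score of {mean_score:.2f}")
```

Running this prints one line per test score followed by the average, which for these hypothetical scores is 77.75.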

Introduction

Welcome to the fascinating world of data science! You are probably excited to start your journey and learn interesting techniques and algorithms, and this is exactly what this book is intended for.
But before diving in, let's define what data science is: it is a combination of multiple disciplines, including business, statistics, and programming, that aims to extract meaningful insights from data by running controlled experiments similar to scientific research.
The objective of any data science project is to derive valuable knowledge for the business from data in order to make better decisions. It is the responsibility of data scientists to define the goals to be achieved for a project. This requires business knowledge and expertise. In this book, you will be exposed to some examples of data science tasks from real-world datasets.
Statistics is a mathematical field used for analyzing data and finding patterns in it. Many of the newest and most advanced techniques still rely on core statistical approaches. This book presents the basic techniques you need to understand the concepts we will be covering.
With an exponential increase in data generation, more computational power is required to process it efficiently. This is why programming is a required skill for data scientists. You may wonder why we chose Python for this Workshop. That's because Python is one of the most popular programming languages for data science. Learning to code in Python is relatively easy thanks to its simple, readable syntax. It also has an incredible number of packages that are freely available to anyone, such as pandas, scikit-learn, TensorFlow, and PyTorch. Its community is growing rapidly, continually adding new functionality and improving performance and reliability. It's no wonder companies such as Facebook, Airbnb, and Google use it as one of their main technology stacks. No prior knowledge of Python is required for this book. If you do have some experience with Python or other programming languages, this will be an advantage, but all concepts will be fully explained, so don't worry if you are new to programming.

Application of Data Science

As mentioned in the introduction, data science is a multidisciplinary approach to analyzing and identifying complex patterns and extracting valuable insights from data. Running a data science project usually involves multiple steps, including the following:
  1. Defining the business problem to be solved
  2. Collecting or extracting existing data
  3. Analyzing, visualizing, and preparing data
  4. Training a model to spot patterns in data and make predictions
  5. Assessing a model's performance and making improvements
  6. Communicating and presenting findings and gained insights
  7. Deploying and maintaining a model
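Steps 2 to 5 of this workflow can be compressed into a few lines of scikit-learn code. The sketch below is illustrative only: it assumes the small Iris sample dataset bundled with scikit-learn as a stand-in for collected business data, and it uses logistic regression and accuracy as example choices of model and metric, not the book's own. Steps 1, 6, and 7 (framing the problem, communicating results, and deployment) happen largely outside the code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Step 2: collect data (assumption: scikit-learn's bundled Iris dataset)
X, y = load_iris(return_X_y=True)

# Step 3: prepare the data by holding out a test set, so the model
# can later be assessed on examples it never saw during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Step 4: train a model to learn patterns from the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 5: assess performance on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Holding out a test set before training is the key design choice here: a score computed on the training data alone would overstate how well the model generalizes, which would undermine the improvements made in step 5.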
As its name implies, data science projects require data, but it is even more important to first define a clear business problem to solve. If the problem is not framed correctly, a project may lead to incorrect results, as you may have used the wrong information, not prepared the data properly, or led a model to learn the wrong patterns. So, it is absolutely critical to properly define the scope and objective of a data science project with your stakeholders.
There are many data science applications in real-world situations and business environments. For example, healthcare providers may train a model to predict a medical outcome or its severity based on medical measurements, or a high school may want to predict which students are at risk of dropping out within a year based on their historical grades and past behaviors. Corporations may be interested in knowing the likelihood of a customer buying a certain product based on their past purchases. They may also need to better understand which customers are more likely to stop using existing services and churn. These are examples where data science can be used to achieve a clearly defined goal, such as increasing the number of patients detected with a heart...
