eBook - ePub

Data Science Bookcamp

Name: Data Science Bookcamp
Author: Leonard Apeltsin

Five real-world Python projects

Leonard Apeltsin

704 pages
English
ePUB (mobile-friendly)
Available on iOS and Android

Book information

Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science.

In Data Science Bookcamp you will learn:
- Techniques for computing and plotting probabilities
- Statistical analysis using SciPy
- How to organize datasets with clustering algorithms
- How to visualize complex multi-variable datasets
- How to train a decision tree machine learning algorithm

In Data Science Bookcamp you'll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable data sets and thoroughly explained solutions help you lock in what you've learned, building your confidence and making you ready for an exciting new data science career.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology

A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data.

About the book

Data Science Bookcamp doesn't stop with surface-level theory and toy examples. As you work through each project, you'll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don't quite fit the model you're building. You'll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you'll be confident in your skills because you can see the results.

What's inside
- Web scraping
- Organize datasets with clustering algorithms
- Visualize complex multi-variable datasets
- Train a decision tree machine learning algorithm

About the reader

For readers who know the basics of Python. No prior data science or machine learning skills required.

About the author

Leonard Apeltsin is the Head of Data Science at Anomaly, where his team applies advanced analytics to uncover healthcare fraud, waste, and abuse.

Table of Contents
CASE STUDY 1 FINDING THE WINNING STRATEGY IN A CARD GAME
1 Computing probabilities using Python
2 Plotting probabilities using Matplotlib
3 Running random simulations in NumPy
4 Case study 1 solution
CASE STUDY 2 ASSESSING ONLINE AD CLICKS FOR SIGNIFICANCE
5 Basic probability and statistical analysis using SciPy
6 Making predictions using the central limit theorem and SciPy
7 Statistical hypothesis testing
8 Analyzing tables using Pandas
9 Case study 2 solution
CASE STUDY 3 TRACKING DISEASE OUTBREAKS USING NEWS HEADLINES
10 Clustering data into groups
11 Geographic location visualization and analysis
12 Case study 3 solution
CASE STUDY 4 USING ONLINE JOB POSTINGS TO IMPROVE YOUR DATA SCIENCE RESUME
13 Measuring text similarities
14 Dimension reduction of matrix data
15 NLP analysis of large text datasets
16 Extracting text from web pages
17 Case study 4 solution
CASE STUDY 5 PREDICTING FUTURE FRIENDSHIPS FROM SOCIAL NETWORK DATA
18 An introduction to graph theory and network analysis
19 Dynamic graph theory techniques for node ranking and social network analysis
20 Network-driven supervised machine learning
21 Training linear classifiers with logistic regression
22 Training nonlinear classifiers with decision tree techniques
23 Case study 5 solution

Information

Publisher: Manning
Year: 2021
ISBN: 9781638352303
Categories: Computer Science; Python Programming

Part 1. Case study 1: Finding the winning strategy in a card game

Problem statement

Would you like to win a bit of money? Let’s wager on a card game for minor stakes. In front of you is a shuffled deck of cards. All 52 cards lie face down. Half the cards are red, and half are black. I will proceed to flip over the cards one by one. If the last card I flip over is red, you’ll win a dollar. Otherwise, you’ll lose a dollar.

Here’s the twist: you can ask me to halt the game at any time. Once you say “Halt,” I will flip over the next card and end the game. That next card will serve as the final card. You will win a dollar if it’s red, as shown in figure CS1.1.

Figure CS1.1 The card-flipping game. We start with a shuffled deck. I repeatedly flip over the top card from the deck. (A) I have just flipped the fourth card. You instruct me to stop. (B) I flip over the fifth and final card. The final card is red. You win a dollar.

We can play the game as many times as you like. The deck will be reshuffled every time. After each round, we’ll exchange money. What is your best approach to winning this game?
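The rules above can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the solution to the case study: the function name `play_round`, the parameter `halt_after`, and the encoding of red cards as 1 and black cards as 0 are all choices made here for illustration.

```python
import random

def play_round(halt_after, rng=random):
    """Play one round, saying "Halt" after `halt_after` cards have been flipped.

    Returns +1 if the next (final) card is red and -1 otherwise.
    Red cards are encoded as 1 and black cards as 0 (an assumption of this sketch).
    """
    if not 0 <= halt_after <= 51:
        raise ValueError('halt_after must be between 0 and 51')
    deck = [1] * 26 + [0] * 26     # 52 cards: half red, half black
    rng.shuffle(deck)              # the deck is reshuffled every round
    final_card = deck[halt_after]  # the card flipped right after "Halt"
    return 1 if final_card == 1 else -1

# Example: always halt after the fourth card, as in figure CS1.1.
rng = random.Random(0)             # seeded generator so reruns are repeatable
winnings = sum(play_round(4, rng) for _ in range(1000))
print(f'Net winnings after 1,000 rounds: {winnings} dollars')
```

Passing a different `halt_after` value, or replacing the fixed stopping point with a rule that inspects the cards already flipped, is one way to compare candidate strategies empirically.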

Overview

To address the problem at hand, we will need to know how to

- Compute the probabilities of observable events using sample space analysis.
- Plot the probabilities of events across a range of interval values.
- Simulate random processes, such as coin flips and card shuffling, using Python.
- Evaluate our confidence in decisions drawn from simulations using confidence interval analysis.

1 Computing probabilities using Python

This section covers

- What are the basics of probability theory?
- Computing probabilities of a single observation
- Computing probabilities across a range of observations

Few things in life are certain; most things are driven by chance. Whenever we cheer for our favorite sports team, or purchase a lottery ticket, or make an investment in the stock market, we hope for some particular outcome, but that outcome cannot ever be guaranteed. Randomness permeates our day-to-day experiences.

Fortunately, that randomness can still be mitigated and controlled. We know that some unpredictable events occur more rarely than others and that certain decisions carry less uncertainty than other much-riskier choices. Driving to work in a car is safer than riding a motorcycle. Investing part of your savings in a retirement account is safer than betting it all on a single hand of blackjack. We can intrinsically sense these trade-offs in certainty because even the most unpredictable systems still show some predictable behaviors. These behaviors have been rigorously studied using probability theory.

Probability theory is an inherently complex branch of math. However, aspects of the theory can be understood without knowing the mathematical underpinnings. In fact, difficult probability problems can be solved in Python without needing to know a single math equation. Such an equation-free approach to probability requires a baseline understanding of what mathematicians call a sample space.

1.1 Sample space analysis: An equation-free approach for measuring uncertainty in outcomes

Certain actions have measurable outcomes. A sample space is the set of all the possible outcomes an action could produce. Let’s take the simple action of flipping a coin. The coin will land on either heads or tails. Thus, the coin flip will produce one of two measurable outcomes: heads or tails. By storing these outcomes in a Python set, we can create a sample space of coin flips.

Listing 1.1 Creating a sample space of coin flips

# Storing elements in curly brackets creates a Python set.
# A Python set is a collection of unique, unordered elements.
sample_space = {'Heads', 'Tails'}

Suppose we choose an element of sample_space at random. What fraction of the time will the chosen element equal Heads? Well, our sample space holds two possible elements. Each element occupies an equal fraction of the space within the set. Therefore, we expect Heads to be selected with a frequency of 1/2. That frequency is formally defined as the probability of an outcome. All outcomes within sample_space share an identical probability, which is equal to 1 / len(sample_space).

Listing 1.2 Computing the probability of heads

probability_heads = 1 / len(sample_space)
print(f'Probability of choosing heads is {probability_heads}')

Probability of choosing heads is 0.5

The probability of choosing Heads equals 0.5. This relates directly to the action of flipping a coin. We’ll assume the coin is unbiased, which means the coin is equally likely to fall on either heads or tails. Thus, a coin flip is conceptually equivalent to choosing a random element from sample_space. The probability of the coin landing on heads is therefore 0.5; the probability of it landing on tails is also equal to 0.5.
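One way to sanity-check this value is to choose elements from the sample space many times and measure how often Heads comes up. The sketch below assumes an unbiased coin and uses Python's built-in `random` module; the names `num_flips` and `frequency_heads` are illustrative, not taken from the text.

```python
import random

sample_space = {'Heads', 'Tails'}
outcomes = sorted(sample_space)  # random.choice needs a sequence, not a set

random.seed(0)                   # fixed seed so reruns give the same flips
num_flips = 10_000
head_count = sum(random.choice(outcomes) == 'Heads' for _ in range(num_flips))
frequency_heads = head_count / num_flips
print(f'Observed frequency of heads: {frequency_heads}')
```

The observed frequency should land close to 0.5 without necessarily equaling it, since any finite number of random choices fluctuates around the underlying probability.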

We’ve assigned probabilities to our two measurable outcomes. However, there are additional questions we could ask. What is the probability that the coin lands on either heads or tails? Or, more exotically, what is the probability that the coin will spin forever in the air, landing on neither heads nor tails? To find rigorous answers, we need to define the concept of an event. An event is the subset of those elements within sample_space that satisfy some event condition (as shown in figure 1.1). An event condition is a simple Boolean function whose input is a single sample_space element. The function returns True only if the element satisfies our condition constraints.
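These definitions translate fairly directly into code. As a minimal sketch (the function names below are chosen here for illustration), an event condition is a function that takes one outcome and returns a Boolean, and the event is the set of outcomes for which that function returns True.

```python
sample_space = {'Heads', 'Tails'}

def is_heads_or_tails(outcome):
    # Event condition: the coin lands on either heads or tails.
    return outcome in {'Heads', 'Tails'}

def is_neither(outcome):
    # Event condition: the coin lands on neither heads nor tails.
    return not is_heads_or_tails(outcome)

def get_event(event_condition, sample_space):
    # The event is the subset of outcomes that satisfy the condition.
    return {outcome for outcome in sample_space if event_condition(outcome)}

print(get_event(is_heads_or_tails, sample_space))  # both outcomes, in arbitrary order
print(get_event(is_neither, sample_space))         # prints set()
```

Under this sketch, the "spins forever" question corresponds to an empty event: no element of the sample space satisfies `is_neither`.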

Figure 1.1 Four event conditions applied to a sample space. The sample space contains two outcomes: heads and tails. Arrows represent the event conditions. Ever...