eBook - ePub

Mastering Python for Data Science

Name: Mastering Python for Data Science
Author: Samir Madhavan

294 pages
English
ePUB (available in the app)
Available on iOS and Android

About the book

Explore the world of data science through Python and learn how to make sense of data

About This Book

Master data science methods using Python and its libraries
Create data visualizations and mine for patterns
Learn advanced techniques for the four fundamentals of data science with Python: data mining, data analysis, data visualization, and machine learning

Who This Book Is For

If you are a Python developer who wants to master data science, this book is for you. Some knowledge of data science is assumed.

What You Will Learn

Manage data and perform linear algebra in Python
Derive inferences from the analysis by performing inferential statistics
Solve data science problems in Python
Create high-end visualizations using Python
Evaluate and apply the linear regression technique to estimate the relationships among variables
Build recommendation engines with various collaborative filtering algorithms
Apply ensemble methods to improve your predictions
Work with big data technologies to handle data at scale

In Detail

Data science is a relatively new knowledge domain that organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and derive value from it. The Python programming language, having conquered the scientific community over the last decade, is now an indispensable tool for the data science practitioner and a must-know for every aspiring data scientist. Python offers a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.

This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.

Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and to mining data to reveal hidden patterns and trends. You will use the Matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression and logistic regression techniques to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions with ensemble methods.

Finally, you will perform k-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.

Style and approach

This book is an easy-to-follow, comprehensive guide to data science using Python. The topics covered in the book can all be used in real-world scenarios.

Frequently asked questions

How do I cancel my subscription?

It's simple: go to the Account section in Settings and click "Cancel subscription". After cancellation, your subscription remains active for the remainder of the period you have already paid for.

Can books be downloaded? If so, how?

At present, all of our mobile-friendly ePub books can be downloaded through the app. Most of our PDFs are also downloadable, and we are working to make the remaining files available for download as soon as possible.

What is the difference between the plans?

Both plans give you unlimited access to the library and to all Perlego features. The only differences are the price and the subscription period: with the annual plan you save about 30% compared with 12 monthly payments.

What is Perlego?

Perlego is a subscription service for academic texts that gives you access to an entire online library for less than you would pay to buy a single book per month. With more than 1 million texts across more than 1,000 categories, you are sure to find what you are looking for.

Does Perlego support text-to-speech?

Look for the text-to-speech icon in the next book you read to check whether audio playback is available. This tool reads the text aloud, highlighting it as the reading progresses. You can speed up or slow down the speech, or pause playback.

Is Mastering Python for Data Science available online in PDF/ePub format?

Yes, you can access Mastering Python for Data Science by Samir Madhavan in PDF and/or ePub format, as well as other popular books in Computer Science and Data Visualisation. Discover more than 1 million books available in our catalogue.

Information

Publisher: Packt Publishing
Year: 2015
ISBN: 9781784390150
Edition:
Topic: Computer Science
Category: Data Visualisation

Mastering Python for Data Science

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Raw Data

The world of arrays with NumPy

Creating an array

Mathematical operations

Array subtraction

Squaring an array

A trigonometric function performed on the array

Conditional operations

Matrix multiplication

Indexing and slicing

Shape manipulation

Empowering data analysis with pandas

The data structure of pandas

Series

DataFrame

Panel

Inserting and exporting data

CSV

XLS

JSON

Database

Data cleansing

Checking the missing data

Filling the missing data

String operations

Merging data

Data operations

Aggregation operations

Joins

The inner join

The left outer join

The full outer join

The groupby function

Summary

2. Inferential Statistics

Various forms of distribution

A normal distribution

A normal distribution from a binomial distribution

A Poisson distribution

A Bernoulli distribution

A z-score

A p-value

One-tailed and two-tailed tests

Type 1 and Type 2 errors

A confidence interval

Correlation

Z-test vs T-test

The F distribution

The chi-square distribution

Chi-square for the goodness of fit

The chi-square test of independence

ANOVA

Summary

3. Finding a Needle in a Haystack

What is data mining?

Presenting an analysis

Studying the Titanic

Which passenger class has the maximum number of survivors?

What is the distribution of survivors based on gender among the various classes?

What is the distribution of nonsurvivors among the various classes who have family aboard the ship?

What was the survival percentage among different age groups?

Summary

4. Making Sense of Data through Advanced Visualization

Controlling the line properties of a chart

Using keyword arguments

Using the setter methods

Using the setp() command

Creating multiple plots

Playing with text

Styling your plots

Box plots

Heatmaps

Scatter plots with histograms

A scatter plot matrix

Area plots

Bubble charts

Hexagon bin plots

Trellis plots

A 3D plot of a surface

Summary

5. Uncovering Machine Learning

Different types of machine learning

Supervised learning

Unsupervised learning

Reinforcement learning

Decision trees

Linear regression

Logistic regression

The naive Bayes classifier

The k-means clustering

Hierarchical clustering

Summary

6. Performing Predictions with a Linear Regression

Simple linear regression

Multiple regression

Training and testing a model

Summary

7. Estimating the Likelihood of Events

Logistic regression

Data preparation

Creating training and testing sets

Building a model

Model evaluation

Evaluating a model based on test data

Model building and evaluation with SciKit

Summary

8. Generating Recommendations with Collaborative Filtering

Recommendation data

User-based collaborative filtering

Finding similar users

The Euclidean distance score

The Pearson correlation score

Ranking the users

Recommending items

Item-based collaborative filtering

Summary

9. Pushing Boundaries with Ensemble Models

The census income dataset

Exploring the census data

Hypothesis 1: People who are older earn more

Hypothesis 2: Income bias based on working class

Hypothesis 3: People with more education earn more

Hypothesis 4: Married people tend to earn more

Hypothesis 5: There is a bias in income based on race

Hypothesis 6: There is a bias in the income based on occupation

Hypothesis 7: Men earn more

Hypothesis 8: People who clock in more hours earn more

Hypothesis 9: There is a bias in income based on the country of origin

Decision trees

Random forests

Summary

10. Applying Segmentation with k-means Clustering

The k-means algorithm and its working

A simple example

The k-means clustering with countries

Determining the number of clusters

Clustering the countries

Summary

11. Analyzing Unstructured Data with Text Mining

Preprocessing data

Crea...
