Mastering Python for Data Science
eBook - ePub

Samir Madhavan

  1. 294 pages
  2. English
  3. ePUB (available in the app)
  4. Available on iOS and Android

About the book

Explore the world of data science through Python and learn how to make sense of data

About This Book

  • Master data science methods using Python and its libraries
  • Create data visualizations and mine for patterns
  • Apply advanced techniques to the four fundamentals of data science with Python: data mining, data analysis, data visualization, and machine learning

Who This Book Is For

If you are a Python developer who wants to master the world of data science, then this book is for you. Some knowledge of data science is assumed.

What You Will Learn

  • Manage data and perform linear algebra in Python
  • Derive inferences from the analysis by performing inferential statistics
  • Solve data science problems in Python
  • Create high-end visualizations using Python
  • Evaluate and apply the linear regression technique to estimate the relationships among variables
  • Build recommendation engines with the various collaborative filtering algorithms
  • Apply the ensemble methods to improve your predictions
  • Work with big data technologies to handle data at scale

In Detail

Data science is a relatively new knowledge domain that various organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving.

This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science.

Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression and logistic regression techniques to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying ensemble methods.

Finally, you will perform k-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.

Style and approach

This book is an easy-to-follow, comprehensive guide to data science using Python. The topics covered in the book can all be used in real-world scenarios.

Information

Year
2015
ISBN
9781784390150
Edition
1


Table of Contents

Mastering Python for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with Raw Data
The world of arrays with NumPy
Creating an array
Mathematical operations
Array subtraction
Squaring an array
A trigonometric function performed on the array
Conditional operations
Matrix multiplication
Indexing and slicing
Shape manipulation
Empowering data analysis with pandas
The data structure of pandas
Series
DataFrame
Panel
Inserting and exporting data
CSV
XLS
JSON
Database
Data cleansing
Checking the missing data
Filling the missing data
String operations
Merging data
Data operations
Aggregation operations
Joins
The inner join
The left outer join
The full outer join
The groupby function
Summary
2. Inferential Statistics
Various forms of distribution
A normal distribution
A normal distribution from a binomial distribution
A Poisson distribution
A Bernoulli distribution
A z-score
A p-value
One-tailed and two-tailed tests
Type 1 and Type 2 errors
A confidence interval
Correlation
Z-test vs T-test
The F distribution
The chi-square distribution
Chi-square for the goodness of fit
The chi-square test of independence
ANOVA
Summary
3. Finding a Needle in a Haystack
What is data mining?
Presenting an analysis
Studying the Titanic
Which passenger class has the maximum number of survivors?
What is the distribution of survivors based on gender among the various classes?
What is the distribution of nonsurvivors among the various classes who have family aboard the ship?
What was the survival percentage among different age groups?
Summary
4. Making Sense of Data through Advanced Visualization
Controlling the line properties of a chart
Using keyword arguments
Using the setter methods
Using the setp() command
Creating multiple plots
Playing with text
Styling your plots
Box plots
Heatmaps
Scatter plots with histograms
A scatter plot matrix
Area plots
Bubble charts
Hexagon bin plots
Trellis plots
A 3D plot of a surface
Summary
5. Uncovering Machine Learning
Different types of machine learning
Supervised learning
Unsupervised learning
Reinforcement learning
Decision trees
Linear regression
Logistic regression
The naive Bayes classifier
The k-means clustering
Hierarchical clustering
Summary
6. Performing Predictions with a Linear Regression
Simple linear regression
Multiple regression
Training and testing a model
Summary
7. Estimating the Likelihood of Events
Logistic regression
Data preparation
Creating training and testing sets
Building a model
Model evaluation
Evaluating a model based on test data
Model building and evaluation with SciKit
Summary
8. Generating Recommendations with Collaborative Filtering
Recommendation data
User-based collaborative filtering
Finding similar users
The Euclidean distance score
The Pearson correlation score
Ranking the users
Recommending items
Item-based collaborative filtering
Summary
9. Pushing Boundaries with Ensemble Models
The census income dataset
Exploring the census data
Hypothesis 1: People who are older earn more
Hypothesis 2: Income bias based on working class
Hypothesis 3: People with more education earn more
Hypothesis 4: Married people tend to earn more
Hypothesis 5: There is a bias in income based on race
Hypothesis 6: There is a bias in the income based on occupation
Hypothesis 7: Men earn more
Hypothesis 8: People who clock in more hours earn more
Hypothesis 9: There is a bias in income based on the country of origin
Decision trees
Random forests
Summary
10. Applying Segmentation with k-means Clustering
The k-means algorithm and its working
A simple example
The k-means clustering with countries
Determining the number of clusters
Clustering the countries
Summary
11. Analyzing Unstructured Data with Text Mining
Preprocessing data
Crea...