Principles of Data Science
eBook - ePub

Sinan Ozdemir

  1. 388 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book details

Learn the techniques and math you need to start making sense of your data

About This Book

  • Enhance your knowledge of coding with data science theory for practical insight into data science and analysis
  • More than just a math class: learn how to perform real-world data science tasks with R and Python
  • Create actionable insights and transform raw data into tangible value

Who This Book Is For

You should be fairly well acquainted with basic algebra and comfortable reading snippets of R and Python as well as pseudocode. You should have the urge to learn and apply the techniques put forth in this book, either on your own datasets or on those provided to you. If you have basic math skills and want to apply them in data science, or you have good programming skills but lack the math, then this book is for you.

What You Will Learn

  • Get to know the five most important steps of data science
  • Use your data intelligently and learn how to handle it with care
  • Bridge the gap between mathematics and programming
  • Learn about probability, calculus, and how to use statistical models to control and clean your data and drive actionable results
  • Build and evaluate baseline machine learning models
  • Explore the most effective metrics to determine the success of your machine learning models
  • Create data visualizations that communicate actionable insights
  • Read and apply machine learning concepts to your problems and make actual predictions

In Detail

Need to turn your skills at programming into effective data science skills? Principles of Data Science is created to help you join the dots between mathematics, programming, and business analysis. With this book, you'll feel confident about asking—and answering—complex and sophisticated questions of your data to move from abstract and raw statistics to actionable ideas.

With a unique approach that bridges the gap between mathematics and computer science, this book takes you through the entire data science pipeline. Beginning with cleaning and preparing data and with effective data mining strategies and techniques, you'll move on to build a comprehensive picture of how every piece of the data science puzzle fits together. Learn the fundamentals of computational mathematics and statistics, as well as some of the pseudocode used today by data scientists and analysts. You'll get to grips with machine learning, discover the statistical models that help you take control of and navigate even the densest datasets, and find out how to create powerful visualizations that communicate what your data means.

Style and approach

This is an easy-to-understand and accessible tutorial. It is a step-by-step guide with use cases, examples, and illustrations to get you well-versed in the concepts of data science. Along with explaining the fundamentals, the book introduces you to slightly more advanced concepts later on and helps you implement these techniques in the real world.

Table of Contents

Principles of Data Science
About the Author
About the Reviewers
eBooks, discount offers, and more
Why subscribe?
What this book covers
What you need for this book
Who this book is for
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
1. How to Sound Like a Data Scientist
What is data science?
Basic terminology
Why data science?
Example – Sigma Technologies
The data science Venn diagram
The math
Example – spawner-recruit models
Computer programming
Why Python?
Python practices
Example of basic Python
Example – parsing a single tweet
Domain knowledge
Some more terminology
Data science case studies
Case study – automating government paper pushing
Fire all humans, right?
Case study – marketing dollars
Case study – what's in a job description?
2. Types of Data
Flavors of data
Why look at these distinctions?
Structured versus unstructured data
Example of data preprocessing
Word/phrase counts
Presence of certain special characters
Relative length of text
Picking out topics
Quantitative versus qualitative data
Example – coffee shop data
Example – world alcohol consumption data
Digging deeper
The road thus far

The four levels of data
The nominal level
Mathematical operations allowed
Measures of center
What data is like at the nominal level
The ordinal level
Mathematical operations allowed
Measures of center
Quick recap and check
The interval level
Mathematical operations allowed
Measures of center
Measures of variation
Standard deviation
The ratio level
Measures of center
Problems with the ratio level
Data is in the eye of the beholder
3. The Five Steps of Data Science
Introduction to data science
Overview of the five steps
Ask an interesting question
Obtain the data
Explore the data
Model the data
Communicate and visualize the results
Explore the data
Basic questions for data exploration
Dataset 1 – Yelp
Exploration tips for qualitative data
Nominal level columns
Filtering in Pandas
Ordinal level columns
Dataset 2 – Titanic
4. Basic Mathematics
Mathematics as a discipline
Basic symbols and terminology
Vectors and matrices
Quick exercises
Arithmetic symbols
Dot product
Set theory
Linear algebra
Matrix multiplication
How to multiply matrices
5. Impossible or Improbable – A Gentle Introduction to Probability
Basic definitions
Bayesian versus Frequentist
Frequentist approach
The law of large numbers
Compound events
Conditional probability
The rules of probability
The addition rule
Mutual exclusivity
The multiplication rule
Complementary events
A bit deeper
6. Advanced Probability
Collectively exhaustive events
Bayesian ideas revisited
Bayes theorem
More applications of Bayes theorem
Example – Titanic
Example – medical studies
Random variables
Discrete random variables
Types of discrete random variables
Binomial random variables
Poisson random variable
Continuous random variables
7. Basic Statistics
What are statistics?
How do we obtain and sample data?
Obtaining data
Sampling data
Probability sampling
Random sampling
Unequal probability sampling
How do we measure statistics?
Measures of center
Measures of variation
Example – employee salaries
Measures of relative standing
The insightful part – correlations in data
The Empirical rule
8. Advanced Statistics
Point estimates
Sampling distributions
Confidence intervals
Hypothesis tests
Conducting a hypothesis test
One sample t-tests
Example of a one sample t-test
Assumptions of the one sample t-test
Type I and type II errors
Hypothesis test for categorical variables
Chi-square goodness of fit test
Assumptions of the chi-square goodness of fit test
Example of a chi-square test for goodness of fit
Chi-square test for association/independence
Assumptions of the chi-square independence test
9. Communicating Data
Why does communication matter?
Identifying effective and ineffective visualizations
Scatter plots
Line graphs
Bar charts
Box plots
When graphs and statistics lie
Correlation versus causation
Simpson's paradox
If correlation doesn't imply causation, then what does?
Verbal communication
It's about telling a story
On the more formal side of things
The why/how/what strategy of presenting
10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials
What is machine learning?
Machine learning isn't perfect
How does machine learning work?
Types of machine learning
Supervised learning
It's not only about predictions
Types of supervised learning
Data is in the eye of the beholder
Unsupervised learning
Reinforcement learning
Overview of the types of machine learning
How does statistical modeling fit into all of this?
Linear regression
Adding more predictors
Regression metrics
Logistic regression
Probability, odds, and log odds
The math of logistic regression
Dummy variables
11. Predictions Don't Grow on Trees – or Do They?
Naïve Bayes classification
Decision trees
How does a computer build a regression tree?
How does a computer fit a classification tree?
Unsupervised learning
When to use unsupervised learning
K-means clustering
Illustrative example – data points
Illustrative example – beer!
Choosing an optimal number for K and cluster validation
The Silhouette Coefficient
Feature extraction and principal component analysis