Probability and Statistics for Data Science
Math + R + Data

Norman Matloff

412 pages, English

About This Book

Probability and Statistics for Data Science: Math + R + Data covers "math stat" (distributions, expected value, estimation, etc.) but takes the phrase "Data Science" in the title quite seriously:

* Real datasets are used extensively.

* All data analysis is supported by R coding.

* Includes many Data Science applications, such as PCA, mixture distributions, random graph models, Hidden Markov models, linear and logistic regression, and neural networks.

* Leads the student to think critically about the "how" and "why" of statistics, and to "see the big picture."

* Not "theorem/proof"-oriented, but concepts and models are stated in a mathematically precise manner.

Prerequisites are calculus, some matrix algebra, and some experience in programming.

Norman Matloff is a professor of computer science at the University of California, Davis, and was formerly a statistics professor there. He is on the editorial boards of the Journal of Statistical Software and The R Journal. His book Statistical Regression and Classification: From Linear Models to Machine Learning was the recipient of the Ziegel Award for the best book reviewed in Technometrics in 2017. He is a recipient of his university's Distinguished Teaching Award.

Information

Year: 2019
ISBN: 9780429687112
Edition: 1

Part I

Fundamentals of Probability

Chapter 1

Basic Probability Models

This chapter introduces the general notions of probability. Most of it will seem intuitive to you, and intuition is indeed crucial in the field of probability and statistics. However, do not rely on intuition alone; pay careful attention to the general principles this chapter develops. In more complex settings intuition may not be enough, or may even mislead you. The tools discussed here will be essential and will be cited frequently throughout the book.
In this book, we will be discussing both "classical" probability examples involving coins, cards and dice, and examples involving real-world applications. The latter will involve diverse fields such as data mining, machine learning, computer networks, bioinformatics, document classification, medicine and so on. Applied problems require a bit more work to fully absorb, but you will derive more benefit from them than from examples involving coins, cards and dice.¹
Let's start with one concerning transportation.

1.1 Example: Bus Ridership

Consider the following analysis of bus ridership, which (in more complex form) could be used by the bus company/agency to plan the number of buses, frequency of stops and so on. Again, in order to keep things easy, it will be quite oversimplified, but the principles will be clear.
Here is the model:
• At each stop, each passenger alights from the bus, independently of the actions of others, with probability 0.2.
• Either 0, 1 or 2 new passengers get on the bus, with probabilities 0.5, 0.4 and 0.1, respectively. Passengers at successive stops act independently.
• Assume the bus is so large that it never becomes full, so the new passengers can always board.
• Suppose the bus is empty when it arrives at its first stop.
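To make the model concrete, here is a minimal Monte Carlo sketch in Python (the book itself works in R). It returns the number of passengers on board as the bus leaves each stop. It assumes, as the model appears to intend, that at each stop passengers alight before new ones board, so someone who boards at a stop cannot leave at that same stop. The function name `simulate_loads` and the constant names are illustrative choices, not taken from the book.

```python
import random

# Model parameters taken from the text; the constant names are illustrative.
ALIGHT_PROB = 0.2              # each passenger on board alights w.p. 0.2, independently
BOARD_VALUES = [0, 1, 2]       # possible numbers of new passengers at a stop
BOARD_PROBS = [0.5, 0.4, 0.1]  # P(0 board), P(1 boards), P(2 board)


def simulate_loads(num_stops, rng=random):
    """Simulate one trip and return the load leaving stops 1..num_stops.

    Assumption: at each stop, passengers alight before new ones board.
    """
    load = 0  # the bus is empty when it arrives at its first stop
    loads = []
    for _ in range(num_stops):
        # Each current passenger independently stays with probability 0.8.
        stayers = sum(1 for _ in range(load) if rng.random() >= ALIGHT_PROB)
        # The number of boarders is drawn independently of everything else.
        boarders = rng.choices(BOARD_VALUES, weights=BOARD_PROBS)[0]
        load = stayers + boarders
        loads.append(load)
    return loads


# Usage: estimate the probability that the bus leaves stop 2 empty,
# as the fraction of simulated trips whose second load is 0.
rng = random.Random(1)
trials = 100_000
hits = sum(simulate_loads(2, rng)[1] == 0 for _ in range(trials))
print(hits / trials)
```

Any probability about the model can be estimated this way, as the fraction of simulated trips in which the event occurs; the Monte Carlo error shrinks roughly like 1/√trials.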
Here and throughout the book, it will be greatly helpful to first name the quantities or events involved. Let $L_i$ denote the number of passengers on the bus as it leaves its $i$th stop, $i = 1, 2, 3, \ldots$. Let $B_i$ denote the number of new passengers who board the bus at the $i$th stop.
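Because the bus starts empty and each stop depends only on the load carried in from the previous stop, the distribution of $L_i$ can also be computed exactly by conditioning on $L_{i-1}$ (the law of total probability). The Python sketch below does this by dynamic programming. Writing $L_0 = 0$ for the empty arriving bus, and the names `load_distribution` and `stay_pmf`, are our own conventions; it again assumes alighting happens before boarding.

```python
from math import comb

ALIGHT_PROB = 0.2                     # each passenger alights w.p. 0.2, independently
BOARD_PMF = {0: 0.5, 1: 0.4, 2: 0.1}  # P(B_i = b)


def stay_pmf(n, k, p_stay=1 - ALIGHT_PROB):
    """P(exactly k of n passengers stay on): a binomial(n, p_stay) probability."""
    return comb(n, k) * p_stay**k * (1 - p_stay) ** (n - k)


def load_distribution(num_stops):
    """Return {k: P(L_i = k)} for i = num_stops.

    P(L_i = k) = sum over j, s, b with s + b = k of
                 P(L_{i-1} = j) * P(s of the j passengers stay) * P(B_i = b).
    Assumes alighting happens before boarding at each stop.
    """
    dist = {0: 1.0}  # L_0 = 0: the bus arrives at stop 1 empty (our notation)
    for _ in range(num_stops):
        new_dist = {}
        for j, p_j in dist.items():
            for s in range(j + 1):  # s = number of the j passengers who stay on
                p_s = stay_pmf(j, s)
                for b, p_b in BOARD_PMF.items():
                    k = s + b
                    new_dist[k] = new_dist.get(k, 0.0) + p_j * p_s * p_b
        dist = new_dist
    return dist


print(load_distribution(1))     # distribution of L_1
print(load_distribution(2)[0])  # P(L_2 = 0)
```

For $i = 1$ this reproduces the boarding probabilities 0.5, 0.4 and 0.1, since nobody can alight from an empty bus. For $i = 2$ it gives $P(L_2 = 0) = 0.5 \times (0.5 + 0.4 \times 0.2 + 0.1 \times 0.2^2) = 0.292$, up to floating-point rounding.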
We will be interested in various probabilities, such as the probability that no passengers board the bus at the first three stops, i.e.,
$P(B_1 = B_2 = B_3 = 0)$
The reader may correctly guess that the answer is $0.5^3 = 0.125$. But again, we need to do this properly. In order to make such calculations, we must first set up ...
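Under the model's statement that passengers at successive stops act independently, $B_1$, $B_2$ and $B_3$ are independent, so the joint probability factors as $P(B_1 = 0)\,P(B_2 = 0)\,P(B_3 = 0) = 0.5^3$. A short Python sketch (our own illustration, not code from the book) computes this and checks it by simulation; the seed and trial count are arbitrary choices.

```python
import random

P_NO_BOARD = 0.5  # P(B_i = 0) from the model

# Exact: independence lets the joint probability factor into a product.
exact = P_NO_BOARD ** 3

# Monte Carlo check: draw (B_1, B_2, B_3) many times and count all-zero triples.
rng = random.Random(2019)
trials = 100_000
hits = 0
for _ in range(trials):
    boards = rng.choices([0, 1, 2], weights=[0.5, 0.4, 0.1], k=3)
    if boards == [0, 0, 0]:
        hits += 1

print(exact)          # 0.125
print(hits / trials)  # close to 0.125; the exact value depends on the seed
```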
