Mastering Python Data Analysis
eBook - ePub

  1. 284 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About this book

Become an expert at using Python for advanced statistical analysis of data using real-world examples

About This Book

  • Clean, format, and explore data using graphical and numerical summaries
  • Leverage the IPython environment to efficiently analyze data with Python
  • Packed with easy-to-follow examples to develop advanced computational skills for the analysis of complex data

Who This Book Is For

If you are a competent Python developer who wants to take your data analysis skills to the next level by solving complex problems, then this advanced guide is for you. Familiarity with the basics of applying Python libraries to data sets is assumed.

What You Will Learn

  • Read, sort, and map various data into Python and Pandas
  • Recognise patterns so you can understand and explore data
  • Use statistical models to discover patterns in data
  • Review classical statistical inference using Python, Pandas, and SciPy
  • Detect similarities and differences in data with clustering
  • Clean your data to make it useful
  • Work in Jupyter Notebook to produce publication-ready figures to be included in reports

In Detail

Python, a multi-paradigm programming language, has become the language of choice for data scientists for data analysis, visualization, and machine learning. Have you ever wondered how to become an expert at effectively approaching data analysis problems, solving them, and extracting all of the available information from your data? Look no further: this is the book you want!

Through this comprehensive guide, you will explore data and present results and conclusions from statistical analysis in a meaningful way. You'll be able to quickly and accurately perform the hands-on sorting, reduction, and subsequent analysis, and fully appreciate how data analysis methods can support business decision-making.

You'll start off by learning about the tools available for data analysis in Python and will then explore the statistical models that are used to identify patterns in data. Gradually, you'll move on to review statistical inference using Python, Pandas, and SciPy. After that, we'll focus on performing regression using computational tools and you'll get to understand the problem of identifying clusters in data in an algorithmic way. Finally, we delve into advanced techniques to quantify cause and effect using Bayesian methods and you'll discover how to use Python's tools for supervised machine learning.

Style and approach

This book takes a step-by-step approach to reading, processing, and analyzing data in Python using various methods and tools. Rich in examples, each topic connects to real-world examples and retrieves data directly online where possible. With this book, you are given the knowledge and tools to explore any data on your own, encouraging a curiosity befitting all data scientists.

Mastering Python Data Analysis

Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Month: June 2016
Production reference: 1230616
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78355-329-7
www.packtpub.com

Credits

  • Authors: Magnus Vilhelm Persson, Luiz Felipe Martins
  • Copy Editor: Tasneem Fatehi
  • Reviewers: Hang (Harvey) Yu, Laurie Lugrin, Chris Morgan, Michele Pratusevich
  • Project Coordinator: Ritika Manoj
  • Commissioning Editor: Akram Hussain
  • Proofreader: Safis Editing
  • Acquisition Editor: Vinay Argekar
  • Indexer: Monica Ajmera Mehta
  • Content Development Editor: Arun Nadar
  • Graphics: Kirk D'Penha, Jason Monteiro
  • Technical Editors: Bharat Patil, Pranil Pathare
  • Production Coordinator: Nilesh Mohite

About the Authors

Magnus Vilhelm Persson is a scientist with a passion for Python and open source software usage and development. He obtained his PhD in Physics/Astronomy from Copenhagen University’s Centre for Star and Planet Formation (StarPlan) in 2013. Since then, he has continued his research in Astronomy at various academic institutes across Europe. In his research, he uses various types of data and analysis to gain insights into how stars are formed. He has participated in radio shows about Astronomy and also organized workshops and intensive courses about the use of Python for data analysis.
You can check out his web page at http://vilhelm.nu.
This book would not have been possible without the great work that all the people at Packt are doing. I would like to highlight Arun, Bharat, Vinay, and Pranil's work. Thank you for your patience during the whole process. Furthermore, I would like to thank Packt for giving me the opportunity to develop and write this book; it was really fun and I learned a lot. There were times when the work was a little overwhelming, but at those times, my colleague and friend Alan Heays always had some supporting words to say. Finally, my wife, Mihaela, is the most supportive partner anyone could ever have. For all the late evenings and nights where you pushed me to continue working on this to finish it, thank you. You are the most loving wife and best friend anyone could ever ask for.
Luiz Felipe Martins holds a PhD in applied mathematics from Brown University and has worked as a researcher and educator for more than 20 years. His research is mainly in the field of applied probability. He has been involved in developing code for the open source homework system WeBWorK, for which he wrote a library for the visualization of systems of differential equations. He was supported by an NSF grant for this project. Currently, he is an associate professor in the department of mathematics at Cleveland State University, Cleveland, Ohio, where he has developed several courses in applied mathematics and scientific computing. His current duties include coordinating all first-year calculus sessions.

About the Reviewer

Hang (Harvey) Yu is a data scientist in Silicon Valley. He works on search engine development and model optimization. He has ample experience in big data and machine learning. He graduated from the University of Illinois at Urbana-Champaign with a background in data mining and statistics. Besides this book, he has also reviewed several other books and papers, including Mastering Python Data Visualization and R Data Analysis Cookbook, both by Packt Publishing. When Harvey is not coding, he is playing soccer, reading fiction books, or listening to classical music. You can get in touch with him at [email protected] or on LinkedIn at http://www.linkedin.com/in/hangyu1.

Preface

The use of Python for data analysis and visualization has steadily increased in popularity in the last few years. One reason for this is the availability and continued development of a number of excellent tools for conducting advanced data analysis and visualization. Another reason is the possibility of rapid and easy development, deployment, and sharing of code. For these reasons, Python has become one of the most widely used programming and scripting languages for data analysis in many industries.
The aim of this book is to develop the skills needed to effectively approach almost any data analysis problem and extract all of the available information. This is done by introducing a range of techniques and methods such as univariate and multivariate linear regression, cluster finding, Bayesian analysis, machine learning, and time series analysis. Exploratory data analysis is key to getting a sense of what can be done and to maximizing the insights that are gained from the data. Additionally, emphasis is put on presentation-ready figures that are clear and easy to interpret.
Knowing how to explore data and present results and conclusions from data analysis in a meaningful way is an important skill. While the theory behind statistical analysis is important to know, the ability to quickly and accurately perform hands-on sorting, reduction, and analysis, and then present the insights gained, is make-or-break in today's quickly evolving business and academic sectors.

What this book covers

Chapter 1, Tools of the Trade, provides an overview of the tools available for data analysis in Python and details the packages and libraries that will be used i...

Mastering Python Data Analysis by Magnus Vilhelm Persson and Luiz Felipe Martins is available in PDF and/or ePUB format and is catalogued under Computer Science & Data Modelling & Design.