Python Data Analysis
eBook - ePub

Ivan Idris

  1. 348 pages
  2. English
  3. ePUB (mobile friendly)

About This Book

Python is a multi-paradigm programming language well suited to both object-oriented application development and functional design patterns. It has become the language of choice for data scientists working on data analysis, visualization, and machine learning, and it lets you develop quickly and work productively.

This book will teach novices about data analysis with Python in the broadest sense possible, covering everything from data retrieval, cleaning, manipulation, visualization, and storage to complex analysis and modeling. It focuses on a wide range of open source Python modules, such as NumPy, SciPy, matplotlib, pandas, IPython, Cython, scikit-learn, and NLTK. Later chapters cover topics such as data visualization, signal processing and time-series analysis, databases, predictive analytics, and machine learning. This book will turn you into an ace data analyst in no time.


Information

Year: 2014
ISBN: 9781783553358
Edition: 1


Table of Contents

Python Data Analysis
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Started with Python Libraries
Software used in this book
Installing software and setup
On Windows
On Linux
On Mac OS X
Building NumPy, SciPy, matplotlib, and IPython from source
Installing with setuptools
NumPy arrays
A simple application
Using IPython as a shell
Reading manual pages
IPython notebooks
Where to find help and references
Summary
2. NumPy Arrays
The NumPy array object
The advantages of NumPy arrays
Creating a multidimensional array
Selecting NumPy array elements
NumPy numerical types
Data type objects
Character codes
The dtype constructors
The dtype attributes
One-dimensional slicing and indexing
Manipulating array shapes
Stacking arrays
Splitting NumPy arrays
NumPy array attributes
Converting arrays
Creating array views and copies
Fancy indexing
Indexing with a list of locations
Indexing NumPy arrays with Booleans
Broadcasting NumPy arrays
Summary
3. Statistics and Linear Algebra
NumPy and SciPy modules
Basic descriptive statistics with NumPy
Linear algebra with NumPy
Inverting matrices with NumPy
Solving linear systems with NumPy
Finding eigenvalues and eigenvectors with NumPy
NumPy random numbers
Gambling with the binomial distribution
Sampling the normal distribution
Performing a normality test with SciPy
Creating a NumPy-masked array
Disregarding negative and extreme values
Summary
4. pandas Primer
Installing and exploring pandas
pandas DataFrames
pandas Series
Querying data in pandas
Statistics with pandas DataFrames
Data aggregation with pandas DataFrames
Concatenating and appending DataFrames
Joining DataFrames
Handling missing values
Dealing with dates
Pivot tables
Remote data access
Summary
5. Retrieving, Processing, and Storing Data
Writing CSV files with NumPy and pandas
Comparing the NumPy .npy binary format and pickling pandas DataFrames
Storing data with PyTables
Reading and writing pandas DataFrames to HDF5 stores
Reading and writing to Excel with pandas
Using REST web services and JSON
Reading and writing JSON with pandas
Parsing RSS and Atom feeds
Parsing HTML with Beautiful Soup
Summary
6. Data Visualization
matplotlib subpackages
Basic matplotlib plots
Logarithmic plots
Scatter plots
Legends and annotations
Three-dimensional plots
Plotting in pandas
Lag plots
Autocorrelation plots
Plot.ly
Summary
7. Signal Processing and Time Series
statsmodels subpackages
Moving averages
Window functions
Defining cointegration
Autocorrelation
Autoregressive models
ARMA models
Generating periodic signals
Fourier analysis
Spectral analysis
Filtering
Summary
8. Working with Databases
Lightweight access with sqlite3
Accessing databases from pandas
SQLAlchemy
Installing and setting up SQLAlchemy
Populating a database with SQLAlchemy
Querying the database with SQLAlchemy
Pony ORM
Dataset – databases for lazy people
PyMongo and MongoDB
Storing data in Redis
Apache Cassandra
Summary
9. Analyzing Textual Data and Social Media
Installing NLTK
Filtering out stopwords, names, and numbers
The bag-of-words model
Analyzing word frequencies
Naive Bayes classification
Sentiment analysis
Creating word clouds
Social network analysis
Summary
10. Predictive Analytics and Machine Learning
A tour of scikit-learn
Preprocessing
Classification with logistic regression
Classification with support vector machines
Regression with ElasticNetCV
Support vector regression
Clustering with affinity propagation
Mean Shift
Genetic algorithms
Neural networks
Decision trees
Summary
11. Environments Outside the Python Ecosystem and Cloud Computing
Exchanging information with MATLAB/Octave
Installing rpy2
Interfacing with R
Sending NumPy arrays to Java
Integrating SWIG and NumPy
Integrating Boost and Python
Using Fortran code through f2py
Setting up Google App Engine
Running programs on PythonAnywhere
Working with Wakari
Summary
12. Performance Tuning, Profiling, and Concurrency
Profiling the code
Installing Cython
Calling C code
Creating a process pool with multiprocessing
Speeding up embarrassingly parallel for loops with Joblib
Comparing Bottleneck to NumPy functions
Performing MapReduce with Jug
Installing MPI for Python
IPython Parallel
Summary
A. Key Concepts
B. Useful Functions
matplotlib
NumPy
pandas
Scikit-learn
SciPy
scipy.fftpack
scipy.signal
scipy.stats
C. Online Resources
Index

Python Data Analysis

Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the auth...
