Practical Data Analysis
eBook - ePub

Hector Cuesta

  1. 360 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

In Detail

Many small businesses face large amounts of data but lack the in-house skills to support quantitative analysis. Understanding how to harness data analysis with the latest open source technology can help them provide better customer service, visualize customer needs, or gain fresh insights into the performance of previous products. Practical Data Analysis is ideal for home and small business users who want to slice and dice the data they have on hand with minimum hassle.

Practical Data Analysis is a hands-on guide to understanding the nature of your data and turning it into insight. It introduces machine learning techniques, social network analytics, and econometrics to help your clients get insights from the pool of data they have at hand. It also covers data preparation and processing for several kinds of data, such as text, images, graphs, documents, and time series.

Practical Data Analysis presents a detailed exploration of current work in data analysis through self-contained projects. First you will explore the basics of data preparation and transformation with OpenRefine. Then you will get started with exploratory data analysis using the D3.js visualization framework. You will also be introduced to machine learning techniques such as classification, regression, and clustering through practical projects such as spam classification, predicting gold prices, and finding clusters in your Facebook friends network. You will learn how to solve problems in text classification, simulation, time series forecasting, social media, and MapReduce through detailed projects. Finally, you will work with large amounts of Twitter data using MapReduce to perform a sentiment analysis implemented in Python and MongoDB.

Practical Data Analysis combines carefully selected algorithms and data scrubbing techniques that enable you to turn your data into insight.

Approach

Practical Data Analysis is a practical, step-by-step guide that empowers small businesses to manage and analyze their data and extract valuable information from it.

Who this book is for

This book is for developers, small business users, and analysts who want to implement data analysis and visualization for their company in a practical way. You need no prior experience with data analysis or data processing; however, basic knowledge of programming, statistics, and linear algebra is assumed.

Information

Year
2013
ISBN
9781783280995

Table of Contents

Practical Data Analysis
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Started
Computer science
Artificial intelligence (AI)
Machine Learning (ML)
Statistics
Mathematics
Knowledge domain
Data, information, and knowledge
The nature of data
The data analysis process
The problem
Data preparation
Data exploration
Predictive modeling
Visualization of results
Quantitative versus qualitative data analysis
Importance of data visualization
What about big data?
Sensors and cameras
Social networks analysis
Tools and toys for this book
Why Python?
Why mlpy?
Why D3.js?
Why MongoDB?
Summary
2. Working with Data
Datasource
Open data
Text files
Excel files
SQL databases
NoSQL databases
Multimedia
Web scraping
Data scrubbing
Statistical methods
Text parsing
Data transformation
Data formats
CSV
Parsing a CSV file with the csv module
Parsing a CSV file using NumPy
JSON
Parsing a JSON file using json module
XML
Parsing an XML file in Python using xml module
YAML
Getting started with OpenRefine
Text facet
Clustering
Text filters
Numeric facets
Transforming data
Exporting data
Operation history
Summary
3. Data Visualization
Data-Driven Documents (D3)
HTML
DOM
CSS
JavaScript
SVG
Getting started with D3.js
Bar chart
Pie chart
Scatter plot
Single line chart
Multi-line chart
Interaction and animation
Summary
4. Text Classification
Learning and classification
Bayesian classification
Naïve Bayes algorithm
E-mail subject line tester
The algorithm
Classifier accuracy
Summary
5. Similarity-based Image Retrieval
Image similarity search
Dynamic time warping (DTW)
Processing the image dataset
Implementing DTW
Analyzing the results
Summary
6. Simulation of Stock Prices
Financial time series
Random walk simulation
Monte Carlo methods
Generating random numbers
Implementation in D3.js
Summary
7. Predicting Gold Prices
Working with the time series data
Components of a time series
Smoothing the time series
The data – historical gold prices
Nonlinear regression
Kernel ridge regression
Smoothing the gold prices time series
Predicting in the smoothed time series
Contrasting the predicted value
Summary
8. Working with Support Vector Machines
Understanding the multivariate dataset
Dimensionality reduction
Linear Discriminant Analysis
Principal Component Analysis
Getting started with support vector machine
Kernel functions
Double spiral problem
SVM implemented on mlpy
Summary
9. Modeling Infectious Disease with Cellular Automata
Introduction to epidemiology
The epidemiology triangle
The epidemic models
The SIR model
Solving ordinary differential equation for the SIR model with SciPy
The SIRS model
Modeling with cellular automata
Cell, state, grid, and neighborhood
Global stochastic contact model
Simulation of the SIRS model in CA with D3.js
Summary
10. Working with Social Graphs
Structure of a graph
Undirected graph
Directed graph
Social Networks Analysis
Acquiring my Facebook graph
Using Netvizz
Representing graphs with Gephi
Statistical analysis
Male to female ratio
Degree distribution
Histogram of a graph
Centrality
Transforming GDF to JSON
Graph visualization with D3.js
Summary
11. Sentiment Analysis of Twitter Data
The anatomy of Twitter data
Tweet
Followers
Trending topics
Using OAuth to access Twitter API
Getting started with Twython
Simple search
Working with timelines
Working with followers
Working with places and trends
Sentiment classification
Affective Norms for English Words
Text corpus
Getting started with Natural Language Toolkit (NLTK)
Bag of words
Naive Bayes
Sentiment analysis of tweets
Summary
12. Data Processing and Aggregation with MongoDB
Getting started with MongoDB
Database
Collection
Document
Mongo shell
Insert/Update/Delete
Queries
Data preparation
Data transformation with OpenRefine
Inserting documents with PyMongo
Group
The aggregation framework
Pipelines
Expressions
Summary
13. Working with MapReduce
MapReduce overview
Programming model
Using MapReduce with MongoDB
The map func...
