Mastering Text Mining with R
eBook - ePub

Ashish Kumar, Avinash Paul

  1. 258 pages
  2. English
  3. ePUB (mobile-friendly)

Master text-taming techniques and build effective text-processing applications with R

About This Book

  • Develop the skills you need to build text-mining apps in R with this easy-to-follow guide
  • Gain an in-depth understanding of the text mining process through lucid implementations in the R language
  • An example-rich guide that shows you how to extract high-quality information from text data

Who This Book Is For

If you are an R programmer, analyst, or data scientist who wants to gain experience in performing text data mining and analytics with R, then this book is for you. Exposure to working with statistical methods and language processing would be helpful.

What You Will Learn

  • Get acquainted with efficient R packages such as OpenNLP and RWeka for performing the various steps of the text mining process
  • Access and manipulate data from different sources such as JSON and HTTP
  • Process text using regular expressions
  • Get to know the different approaches to tagging text, such as POS tagging, to get started with text analysis
  • Explore different dimensionality reduction techniques, such as Principal Component Analysis (PCA), and understand their implementation in R
  • Discover the underlying themes or topics that are present in an unstructured collection of documents, using common topic models such as Latent Dirichlet Allocation (LDA)
  • Build a baseline sentence completion application
  • Perform entity extraction and named entity recognition using R

In Detail

Text mining (also called text data mining or text analytics) is the process of extracting useful, high-quality information from text by discovering patterns and trends. R provides an extensive ecosystem for mining text through its many frameworks and packages.

Starting with the basic statistical concepts used in text mining, this book will teach you how to access, cleanse, and process text using R, and will equip you with the tools and knowledge needed to apply different tagging, chunking, and entailment approaches in natural language processing. Moving on, it covers various dimensionality reduction techniques and their implementation in R. Next, we will use classification mechanisms for pattern recognition in text data, perform entity recognition, and develop an ontology learning framework.

By the end of the book, you will have developed a practical application from the concepts learned and will understand how text mining can be leveraged to analyze the vast amounts of data available on social media.

Style and approach

This book takes a hands-on, example-driven approach to the text mining process with lucid implementation in R.

Information

Year: 2016
ISBN: 9781783551811
Edition: 1

Table of Contents

Mastering Text Mining with R
Credits
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Statistical Linguistics with R
Probability theory and basic statistics
Probability space and event
Theorem of compound probabilities
Conditional probability
Bayes' formula for conditional probability
Independent events
Random variables
Discrete random variables
Continuous random variables
Probability frequency function
Probability distributions using R
Cumulative distribution function
Joint distribution
Binomial distribution
Poisson distribution
Counting occurrences
Zipf's law
Heaps' law
Lexical richness
Lexical variation
Lexical density
Lexical originality
Lexical sophistication
Language models
N-gram models
Markov assumption
Hidden Markov models
Quantitative methods in linguistics
Document term matrix
Inverse document frequency
Words similarity and edit-distance functions
Euclidean distance
Cosine similarity
Levenshtein distance
Damerau-Levenshtein distance
Hamming distance
Jaro-Winkler distance
Measuring readability of a text
Gunning fog index
R packages for text mining
OpenNLP
RWeka
RcmdrPlugin.temis
tm
languageR
koRpus
RKEA
maxent
lsa
Summary
2. Processing Text
Accessing text from diverse sources
File system
PDF documents
Microsoft Word documents
HTML
XML
JSON
HTTP
Databases
Processing text using regular expressions
Tokenization and segmentation
Word tokenization
Operations on a document-term matrix
Sentence segmentation
Normalizing texts
Lemmatization and stemming
Stemming
Lemmatization
Synonyms
Lexical diversity
Analyse lexical diversity
Calculate lexical diversity
Readability
Automated readability index
Language detection
Summary
3. Categorizing and Tagging Text
Parts of speech tagging
POS tagging with R packages
Hidden Markov Models for POS tagging
Basic definitions and notations
Implementing HMMs
Viterbi underflow
Forward algorithm underflow
OpenNLP chunking
Chunk tags
Collocation and contingency tables
Extracting co-occurrences
Surface co-occurrence
Textual co-occurrence
Syntactic co-occurrence
Co-occurrence in a document
Quantifying the relation between words
Contingency tables
Detailed analysis on textual collocations
Feature extraction
Synonymy and similarity
Multiwords, negation, and antonymy
Concept similarity
Path length
Resnik similarity
Lin similarity
Jiang–Conrath distance
Summary
4. Dimensionality Reduction
The curse of dimensionality
Distance concentration and computational infeasibility
Dimensionality reduction
Principal component analysis
Using R for PCA
Understanding the FactoMineR package
Amap package
Proportion of variance
Scree plot
Reconstruction error
Correspondence analysis
Canonical correspondence analysis
Pearson's Chi-squared test
Multiple correspondence analysis
Implementation of SVD using R
Summary
5. Text Summarization and Clustering
Topic modeling
Latent Dirichlet Allocation
Correlated topic model
Model selection
R package for topic modeling
Fitting the LDA model with the VEM algorithm
Latent semantic analysis
R package for latent semantic analysis
Illustrative example of LSA
Text clustering
Document clustering
Feature selection for text clustering
Mutual information
Chi-square statistic feature selection
Frequency-based feature selection
Sentence completion
Summary
6. Text Classification
Text classification
Document representation
Feature hashing
Classifiers – inductive learning
Tree-based learning
Bayesian classifiers: Naive Bayes classification
K-Nearest neighbors
Kernel methods
Support vector machines
Kernel trick
How to apply SVM to a real-world example?
Number of instances is significantly larger than the number of dimensions
Maximum entropy classifier
Maxent implementation in R
RTextTools: a text classification framework
Model evaluation
Confusion matrix
ROC curve
Precision-recall
Bias–variance trade-off and learning curve
Bias-variance decomposition
Learning curve
Dealing with reducible error components
Cross validation
Leave-one-out
k-Fold
Bootstrap
Stratified
Summary
7. Entity Recognition
Entity extraction
The rule-based approach
Machine learning
Sentence boundary detection
Word token annotator
Named entity recognition
Training a model with new features
Summary
Index

Mastering Text Mining with R

Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any fo...