Mastering Text Mining with R
eBook - ePub

Ashish Kumar, Avinash Paul

  1. 258 pages
  2. English
  3. ePUB (mobile-friendly)
  4. Available on iOS and Android

Book information

Master text-taming techniques and build effective text-processing applications with R

About This Book

  • Develop all the relevant skills for building text-mining apps with R with this easy-to-follow guide
  • Gain in-depth understanding of the text mining process with lucid implementation in the R language
  • Example-rich guide that lets you gain high-quality information from text data

Who This Book Is For

If you are an R programmer, analyst, or data scientist who wants to gain experience in performing text data mining and analytics with R, then this book is for you. Exposure to working with statistical methods and language processing would be helpful.

What You Will Learn

  • Get acquainted with some of the highly efficient R packages, such as openNLP and RWeka, to perform various steps in the text mining process
  • Access and manipulate data from different sources such as JSON and HTTP
  • Process text using regular expressions
  • Get to know the different approaches of tagging texts, such as POS tagging, to get started with text analysis
  • Explore different dimensionality reduction techniques, such as Principal Component Analysis (PCA), and understand their implementation in R
  • Discover the underlying themes or topics that are present in an unstructured collection of documents, using common topic models such as Latent Dirichlet Allocation (LDA)
  • Build a baseline sentence-completion application
  • Perform entity extraction and named entity recognition using R

In Detail

Text mining (also called text data mining or text analytics) is the process of extracting useful, high-quality information from text by identifying patterns and trends. R provides an extensive ecosystem of frameworks and packages for mining text.

Starting with the basic statistical concepts used in text mining, this book teaches you how to access, cleanse, and process text using R, and equips you with the tools and knowledge to apply different tagging, chunking, and entailment approaches in natural language processing. It then covers dimensionality reduction techniques and their implementation in R. Next, it covers pattern recognition in text data using classification mechanisms, entity recognition, and the development of an ontology learning framework.

By the end of the book, you will have developed a practical application from the concepts learned and will understand how text mining can be used to analyze the vast amount of data available on social media.

Style and approach

This book takes a hands-on, example-driven approach to the text mining process with lucid implementation in R.

Information

Year: 2016
ISBN: 9781783551811
Edition: 1
Category: Computer Science
Category: Data mining

Table of Contents

Mastering Text Mining with R
Credits
About the Authors
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Statistical Linguistics with R
Probability theory and basic statistics
Probability space and event
Theorem of compound probabilities
Conditional probability
Bayes' formula for conditional probability
Independent events
Random variables
Discrete random variables
Continuous random variables
Probability frequency function
Probability distributions using R
Cumulative distribution function
Joint distribution
Binomial distribution
Poisson distribution
Counting occurrences
Zipf's law
Heaps' law
Lexical richness
Lexical variation
Lexical density
Lexical originality
Lexical sophistication
Language models
N-gram models
Markov assumption
Hidden Markov models
Quantitative methods in linguistics
Document term matrix
Inverse document frequency
Words similarity and edit-distance functions
Euclidean distance
Cosine similarity
Levenshtein distance
Damerau-Levenshtein distance
Hamming distance
Jaro-Winkler distance
Measuring readability of a text
Gunning fog index
R packages for text mining
openNLP
RWeka
RcmdrPlugin.temis
tm
languageR
koRpus
RKEA
maxent
lsa
Summary
2. Processing Text
Accessing text from diverse sources
File system
PDF documents
Microsoft Word documents
HTML
XML
JSON
HTTP
Databases
Processing text using regular expressions
Tokenization and segmentation
Word tokenization
Operations on a document-term matrix
Sentence segmentation
Normalizing texts
Lemmatization and stemming
Stemming
Lemmatization
Synonyms
Lexical diversity
Analyse lexical diversity
Calculate lexical diversity
Readability
Automated readability index
Language detection
Summary
3. Categorizing and Tagging Text
Parts of speech tagging
POS tagging with R packages
Hidden Markov Models for POS tagging
Basic definitions and notations
Implementing HMMs
Viterbi underflow
Forward algorithm underflow
OpenNLP chunking
Chunk tags
Collocation and contingency tables
Extracting co-occurrences
Surface co-occurrence
Textual co-occurrence
Syntactic co-occurrence
Co-occurrence in a document
Quantifying the relation between words
Contingency tables
Detailed analysis on textual collocations
Feature extraction
Synonymy and similarity
Multiwords, negation, and antonymy
Concept similarity
Path length
Resnik similarity
Lin similarity
Jiang-Conrath distance
Summary
4. Dimensionality Reduction
The curse of dimensionality
Distance concentration and computational infeasibility
Dimensionality reduction
Principal component analysis
Using R for PCA
Understanding the FactoMineR package
Amap package
Proportion of variance
Scree plot
Reconstruction error
Correspondence analysis
Canonical correspondence analysis
Pearson's Chi-squared test
Multiple correspondence analysis
Implementation of SVD using R
Summary
5. Text Summarization and Clustering
Topic modeling
Latent Dirichlet Allocation
Correlated topic model
Model selection
R Package for topic modeling
Fitting the LDA model with the VEM algorithm
Latent semantic analysis
R Package for latent semantic analysis
Illustrative example of LSA
Text clustering
Document clustering
Feature selection for text clustering
Mutual information
Chi-square statistic feature selection
Frequency-based feature selection
Sentence completion
Summary
6. Text Classification
Text classification
Document representation
Feature hashing
Classifiers – inductive learning
Tree-based learning
Bayesian classifiers: Naive Bayes classification
K-Nearest neighbors
Kernel methods
Support vector machines
Kernel Trick
How to apply SVM on a real-world example?
Number of instances is significantly larger than the number of dimensions
Maximum entropy classifier
Maxent implementation in R
RTextTools: a text classification framework
Model evaluation
Confusion matrix
ROC curve
Precision-recall
Bias–variance trade-off and learning curve
Bias-variance decomposition
Learning curve
Dealing with reducible error components
Cross validation
Leave-one-out
k-Fold
Bootstrap
Stratified
Summary
7. Entity Recognition
Entity extraction
The rule-based approach
Machine learning
Sentence boundary detection
Word token annotator
Named entity recognition
Training a model with new features
Summary
Index

Mastering Text Mining with R

Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any fo...