Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences

John J. McArdle, Gilbert Ritschard

474 pages. English.

About the book

This book reviews the latest techniques in exploratory data mining (EDM) for the analysis of data in the social and behavioral sciences to help researchers assess the predictive value of different combinations of variables in large data sets. Methodological findings and conceptual models that explain reliable EDM techniques for predicting and understanding various risk mechanisms are integrated throughout. Numerous examples illustrate the use of these techniques in practice. Contributors provide insight through hands-on experiences with their own use of EDM techniques in various settings. Readers are also introduced to the most popular EDM software programs. A related website at http://mephisto.unige.ch/pub/edm-book-supplement/ offers color versions of the book's figures, a supplemental paper to chapter 3, and R commands for some chapters.

The results of EDM analyses can be perilous – they are often taken as predictions with little regard for cross-validating the results. This carelessness can be catastrophic in terms of money lost or patients misdiagnosed. This book addresses these concerns and advocates for the development of checks and balances for EDM analyses. Both the promises and the perils of EDM are addressed.

Editors McArdle and Ritschard taught the "Exploratory Data Mining" Advanced Training Institute of the American Psychological Association (APA). All contributors are top researchers from the US and Europe. The book is organized into two parts, methodology and applications. The techniques covered include decision, regression, and SEM tree models, growth mixture modeling, and time-based categorical sequential analysis. Some of the applications of EDM (and the corresponding data) explored include:

selection to college based on risky prior academic profiles
the decline of cognitive abilities in older persons
global perceptions of stress in adulthood
predicting mortality from demographics and cognitive abilities
risk factors during pregnancy and the impact on neonatal development

This book is intended as a reference for researchers, methodologists, and advanced students in the social and behavioral sciences, including psychology, sociology, business, econometrics, and medicine, who are interested in learning to apply the latest exploratory data mining techniques. Prerequisites include a basic class in statistics.


Publication details

Publisher: Routledge
Year: 2013
ISBN: 9781135044084

Part I: Methodological Aspects
1. Exploratory Data Mining Using Decision Trees in the Behavioral Sciences

John J. McArdle
Introduction
This first chapter starts off with a discussion of confirmatory versus exploratory analyses in behavioral research, and exploratory approaches are considered most useful. Decision Tree Analysis (DTA) is defined in historical and technical detail. Four real-life examples are presented to give a flavor of what is now possible with DTA: (1) Predicting Coronary Heart Disease from Age; (2) Some New Approaches to the Classification of Alzheimer’s Disease; (3) Exploring Predictors of College Academic Performances from High School; and (4) Exploring Patterns of Changes in Longitudinal WISC Data. In each case, current questions regarding DTA are raised. The discussion that follows considers the benefits and limitations of this exploratory approach, and the author concludes that confirmatory analyses should always be done first, but that they should always be followed by exploratory analyses.
The term “exploratory” is considered by many to be less an approach to data analysis than a confession of guilt: a dishonest act has been performed with one’s data. This becomes obvious when we reflexively recoil at the thought of exploratory methods, or when immediate rejections occur when one proposes exploration in a research grant application, or when one tries to publish new results found by exploration. We need to face up to the fact that we now have a clear preference for confirmatory and a priori testing of well-formulated research hypotheses in psychological research. One radical interpretation of this explicit preference is that we simply do not yet trust one another.
Unfortunately, as many researchers know, quite the opposite is actually the truth. That is, it can be said that exploratory analyses predominate in our actual research activities. To be more extreme, we can assert there is actually no such thing as a true confirmatory analysis of data, nor should there be. Either way, we can try to be clearer about this problem. We need better responses when well-meaning students and colleagues ask, “Is it OK to do procedure X?” I assume they are asking, “Is there a well-known probability basis for procedure X, and will I be able to publish it?” Fear of rejection is strong among many good researchers, and one side effect is that rejection leaves scientific creativity only to the bold. As I will imply several times here, the only real requirement for a useful data analysis is that we remain honest (see McArdle, 2010).
When I was searching around for materials on this topic I stumbled upon the informative work by Berk (2009) where he starts out by saying:
As I was writing my recent book on regression analysis (Berk, 2003), I was struck by how few alternatives to conventional regression there were. In the social sciences, for example, one either did causal modeling econometric style, or largely gave up quantitative work … The life sciences did not seem quite as driven by causal modeling, but causal modeling was a popular tool. As I argued at length in my book, causal modeling as commonly undertaken is a loser.
There also seemed to be a more general problem. Across a range of scientific disciplines there was often too little interest in statistical tools emphasizing induction and description. With the primary goal of getting the “right” model and its associated p-values, the older and more interesting tradition of exploratory data analysis had largely become an under-the-table activity: the approach was in fact commonly used, but rarely discussed in polite company. How could one be a real scientist, guided by “theory” and engaged in deductive model testing, while at the same time snooping around in the data to determine which models to test? In the battle for prestige, model testing had won.
At the same time, I became aware of some new developments in applied mathematics, computer sciences, and statistics making data exploration a virtue. And with this virtue came a variety of new ideas and concepts, coupled with the very latest in statistical computing. These new approaches, variously identified as “data mining,” “statistical learning,” “machine learning,” and other names, were being tried in a number of natural and biomedical sciences, and the initial experience looked promising.
As I started to read more deeply, however, I was struck by how difficult it was to work across writings from such disparate disciplines. Even when the material was essentially the same, it was very difficult to tell if it was. Each discipline brought its own goals, concepts, naming conventions, and (maybe worst of all) notation to the table. Finally, there is the matter of tone. The past several decades have seen the development of a dizzying array of new statistical procedures, sometimes introduced with the hype of a big-budget movie. Advertising from major statistical software providers has typically made things worse. Although there have been genuine and useful advances, none of the techniques have ever lived up to their original billing. Widespread misuse has further increased the gap between promised performance and actual performance. In this book, the tone will be cautious, some might even say dark …
(p. xi)
The problems raised by Berk (2009) are pervasive and we need new ways to overcome them. In my own view, the traditional use of the simple independent groups t-test should have provided our first warning message that something was wrong about the standard “confirmatory” mantras. For example, we know it is fine to calculate the classic test of the mean difference between two groups and calculate the “probability of equality” or “significance of the mean difference” under the typical assumptions (i.e., random sampling of persons, random assignment to groups, equal variance within cells). But we also know it is not appropriate to achieve significance by: (a) using another variable when the first variable fails to please, (b) getting data on more people until the observed difference is significant, (c) using various transformations of the data until we achieve significance, (d) tossing out outliers until we achieve significance, (e) examining possible differences in the variance instead of the means when we do not get what we want, (f) accepting a significant difference in the opposite direction to that we originally thought. I assume all good researchers do these kinds of things all the time. In my view, the problem is not with us but with the way we are taught to revere the apparent objectivity of the t-test approach. It is bound to be even more complex when we use this t-test procedure over and over again in hopes of isolating multivariate relationships.
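To make practice (b) concrete, the following is a minimal simulation sketch, not something taken from the chapter. It draws both groups from the same population, so every "significant" difference is a false positive, and compares a single test at a fixed sample size with repeated testing as more people are added. The function names, the sample sizes (30 to 100 per group in steps of 10), and the number of simulations are illustrative assumptions. To stay within the Python standard library, the p-value uses a normal approximation to the t distribution; an exact version could use `scipy.stats.ttest_ind` instead.

```python
import math
import random
from statistics import NormalDist


def pooled_t(x, y):
    """Classic independent-groups t statistic with a pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def approx_p(t):
    """Two-sided p-value from a normal approximation (assumes n >= 30 per group)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def false_positive_rate(peek, n_sims=2000, start=30, step=10, max_n=100,
                        alpha=0.05, seed=1):
    """Share of null simulations that are declared significant.

    peek=False: one test at max_n per group (the textbook procedure).
    peek=True:  test at start, start + step, ..., max_n and stop at the
                first p < alpha, i.e. practice (b) in the text.
    """
    rng = random.Random(seed)
    sizes = list(range(start, max_n + 1, step)) if peek else [max_n]
    hits = 0
    for _ in range(n_sims):
        # Both groups come from the same population: any "effect" is noise.
        x = [rng.gauss(0, 1) for _ in range(max_n)]
        y = [rng.gauss(0, 1) for _ in range(max_n)]
        if any(approx_p(pooled_t(x[:n], y[:n])) < alpha for n in sizes):
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    print("fixed n:", false_positive_rate(peek=False))
    print("peeking:", false_positive_rate(peek=True))
```

Because both calls use the same seed, they see identical simulated data, and the peeking schedule includes the final sample size, so the peeking rate can never fall below the fixed-sample rate. Under these assumed settings the fixed-sample rate should sit near the nominal 5%, while the peeking rate should be noticeably higher.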
For similar reasons, the one-way analysis of variance (ANOVA) should have been our next warning sign about the overall statistical dilemma. When we have three or more groups and perform a one-way ANOVA we can consider the resulting F-ratio as an indicator of “any group difference.” In practice, we can calculate ...
