Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences
eBook - ePub

John J. McArdle, Gilbert Ritschard

  1. 474 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

About this book

This book reviews the latest techniques in exploratory data mining (EDM) for the analysis of data in the social and behavioral sciences to help researchers assess the predictive value of different combinations of variables in large data sets. Methodological findings and conceptual models that explain reliable EDM techniques for predicting and understanding various risk mechanisms are integrated throughout. Numerous examples illustrate the use of these techniques in practice. Contributors provide insight through hands-on experiences with their own use of EDM techniques in various settings. Readers are also introduced to the most popular EDM software programs. A related website at http://mephisto.unige.ch/pub/edm-book-supplement/ offers color versions of the book's figures, a supplemental paper to chapter 3, and R commands for some chapters.

The results of EDM analyses can be perilous – they are often taken as predictions with little regard for cross-validating the results. This carelessness can be catastrophic in terms of money lost or patients misdiagnosed. This book addresses these concerns and advocates for the development of checks and balances for EDM analyses. Both the promises and the perils of EDM are addressed.

Editors McArdle and Ritschard taught the "Exploratory Data Mining" Advanced Training Institute of the American Psychological Association (APA). All contributors are top researchers from the US and Europe. The book is organized into two parts, methodology and applications. The techniques covered include decision, regression, and SEM tree models, growth mixture modeling, and time-based categorical sequential analysis. Some of the applications of EDM (and the corresponding data) explored include:

selection to college based on risky prior academic profiles
the decline of cognitive abilities in older persons
global perceptions of stress in adulthood
predicting mortality from demographics and cognitive abilities
risk factors during pregnancy and the impact on neonatal development

This book is intended as a reference for researchers, methodologists, and advanced students in the social and behavioral sciences (including psychology, sociology, business, econometrics, and medicine) who are interested in learning to apply the latest exploratory data mining techniques. Prerequisites include a basic class in statistics.

Information

Publisher
Routledge
Year
2013
ISBN
9781135044084
Part I
Methodological Aspects
1
Exploratory Data Mining Using Decision Trees in the Behavioral Sciences

John J. McArdle
Introduction
This first chapter starts off with a discussion of confirmatory versus exploratory analyses in behavioral research, and exploratory approaches are considered most useful. Decision Tree Analysis (DTA) is defined in historical and technical detail. Four real-life examples are presented to give a flavor of what is now possible with DTA: (1) Predicting Coronary Heart Disease from Age; (2) Some New Approaches to the Classification of Alzheimer’s Disease; (3) Exploring Predictors of College Academic Performances from High School; and (4) Exploring Patterns of Changes in Longitudinal WISC Data. In each case, current questions regarding DTA are raised. The discussion that follows considers the benefits and limitations of this exploratory approach, and the author concludes that confirmatory analyses should always be done first, but that they should at all times be followed by exploratory analyses.
The term “exploratory” is considered by many as less than an approach to data analysis and more a confession of guilt—a dishonest act has been performed with one’s data. This becomes obvious when we reflexively recoil at the thought of exploratory methods, or when immediate rejections occur when one proposes research exploration in a research grant application, or when one tries to publish new results found by exploration. We need to face up to the fact that we now have a clear preference for confirmatory and a priori testing of well-formulated research hypotheses in psychological research. One radical interpretation of this explicit preference is that we simply do not yet trust one another.
Unfortunately, as many researchers know, quite the opposite is actually the truth. That is, it can be said that exploratory analyses predominate in our actual research activities. To be more extreme, we can assert there is actually no such thing as a true confirmatory analysis of data, nor should there be. Either way, we can try to be clearer about this problem. We need better responses when well-meaning students and colleagues ask, “Is it OK to do procedure X?” I assume they are asking, “Is there a well-known probability basis for procedure X, and will I be able to publish it?” Fear of rejection is strong among many good researchers, and one side effect is that rejection leaves scientific creativity only to the bold. As I will imply several times here, the only real requirement for a useful data analysis is that we remain honest (see McArdle, 2010).
When I was searching around for materials on this topic, I stumbled upon the informative work by Berk (2009), where he starts out by saying:
As I was writing my recent book on regression analysis (Berk, 2003), I was struck by how few alternatives to conventional regression there were. In the social sciences, for example, one either did causal modeling econometric style, or largely gave up quantitative work … The life sciences did not seem quite as driven by causal modeling, but causal modeling was a popular tool. As I argued at length in my book, causal modeling as commonly undertaken is a loser.
There also seemed to be a more general problem. Across a range of scientific disciplines there was often too little interest in statistical tools emphasizing induction and description. With the primary goal of getting the “right” model and its associated p-values, the older and more interesting tradition of exploratory data analysis had largely become an under-the-table activity: the approach was in fact commonly used, but rarely discussed in polite company. How could one be a real scientist, guided by “theory” and engaged in deductive model testing, while at the same time snooping around in the data to determine which models to test? In the battle for prestige, model testing had won.
At the same time, I became aware of some new developments in applied mathematics, computer sciences, and statistics making data exploration a virtue. And with this virtue came a variety of new ideas and concepts, coupled with the very latest in statistical computing. These new approaches, variously identified as “data mining,” “statistical learning,” “machine learning,” and other names, were being tried in a number of natural and biomedical sciences, and the initial experience looked promising.
As I started to read more deeply, however, I was struck by how difficult it was to work across writings from such disparate disciplines. Even when the material was essentially the same, it was very difficult to tell if it was. Each discipline brought its own goals, concepts, naming conventions, and (maybe worst of all) notation to the table. Finally, there is the matter of tone. The past several decades have seen the development of a dizzying array of new statistical procedures, sometimes introduced with the hype of a big-budget movie. Advertising from major statistical software providers has typically made things worse. Although there have been genuine and useful advances, none of the techniques have ever lived up to their original billing. Widespread misuse has further increased the gap between promised performance and actual performance. In this book, the tone will be cautious, some might even say dark …
(p. xi)
The problems raised by Berk (2009) are pervasive and we need new ways to overcome them. In my own view, the traditional use of the simple independent groups t-test should have provided our first warning message that something was wrong about the standard “confirmatory” mantras. For example, we know it is fine to calculate the classic test of the mean difference between two groups and calculate the “probability of equality” or “significance of the mean difference” under the typical assumptions (i.e., random sampling of persons, random assignment to groups, equal variance within cells). But we also know it is not appropriate to achieve significance by: (a) using another variable when the first variable fails to please, (b) getting data on more people until the observed difference is significant, (c) using various transformations of the data until we achieve significance, (d) tossing out outliers until we achieve significance, (e) examining possible differences in the variance instead of the means when we do not get what we want, or (f) accepting a significant difference in the direction opposite to the one we originally expected. I assume all good researchers do these kinds of things all the time. In my view, the problem is not with us but with the way we are taught to revere the apparent objectivity of the t-test approach. It is bound to be even more complex when we use this t-test procedure over and over again in hopes of isolating multivariate relationships.
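To make practice (b) concrete, here is a minimal simulation sketch. It is not taken from the chapter. To stay dependency-free it uses a two-sample z-test with known unit variance as a stand-in for the t-test, and the batch size, maximum sample size, and number of simulated studies are arbitrary assumptions. Both groups are drawn from the same population, so every "significant" result is a false positive. The sketch compares testing once at the planned sample size with testing after every new batch of participants.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
STD_NORMAL = NormalDist()

def z_test_p(a, b):
    """Two-sided p-value for a mean difference, assuming known unit variance."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def simulate(n_sims=2000, batch=10, max_n=100, seed=1):
    """Return (fixed-n false positive rate, peek-after-each-batch rate)."""
    rng = random.Random(seed)
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_sims):
        a, b = [], []
        any_look_significant = False
        while len(a) < max_n:
            # Both groups come from the same N(0, 1) population (null is true).
            a.extend(rng.gauss(0, 1) for _ in range(batch))
            b.extend(rng.gauss(0, 1) for _ in range(batch))
            if z_test_p(a, b) < ALPHA:
                # A researcher who peeks would stop and report here.
                any_look_significant = True
        peek_hits += any_look_significant
        # The honest analysis tests once, at the planned sample size.
        fixed_hits += z_test_p(a, b) < ALPHA
    return fixed_hits / n_sims, peek_hits / n_sims

fixed_rate, peeking_rate = simulate()
print(f"single test at planned n: {fixed_rate:.3f}")
print(f"test after every batch:   {peeking_rate:.3f}")
```

Under these assumptions the single-test rate should stay near the nominal 5%, while the rate for repeated looks comes out well above it. The stated probability therefore no longer describes the procedure that was actually followed.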
For similar reasons, the one-way analysis of variance (ANOVA) should have been our next warning sign about the overall statistical dilemma. When we have three or more groups and perform a one-way ANOVA we can consider the resulting F-ratio as an indicator of “any group difference.” In practice, we can calculate ...
