Part I
Methodological Aspects
1 Exploratory Data Mining Using Decision Trees in the Behavioral Sciences
John J. McArdle
Introduction
This first chapter starts off with a discussion of confirmatory versus exploratory analyses in behavioral research, and exploratory approaches are considered most useful. Decision Tree Analysis (DTA) is defined in historical and technical detail. Four real-life examples are presented to give a flavor of what is now possible with DTA: (1) Predicting Coronary Heart Disease from Age; (2) Some New Approaches to the Classification of Alzheimer's Disease; (3) Exploring Predictors of College Academic Performances from High School; and (4) Exploring Patterns of Changes in Longitudinal WISC Data. In each case, current questions regarding DTA are raised. The discussion that follows considers the benefits and limitations of this exploratory approach, and the author concludes that confirmatory analyses should always be done first, but should at all times be followed by exploratory analyses.
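To give a concrete sense of what a fitted decision tree looks like before the formal definition below, here is a minimal sketch using scikit-learn on simulated data. The single predictor ("age"), the outcome, and all parameter values are illustrative assumptions for this sketch; they are not the chapter's actual coronary heart disease data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Simulated predictor and binary outcome (illustrative only, not real data)
age = rng.uniform(30, 80, size=200)
outcome = (age + rng.normal(0, 10, size=200) > 60).astype(int)

# A shallow tree: DTA searches for optimal cut points on the predictor
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(age.reshape(-1, 1), outcome)

# Print the induced decision rules (splits on "age")
print(export_text(tree, feature_names=["age"]))
```

The printed rules show the defining feature of DTA: the data themselves suggest the cut points, rather than the analyst specifying a functional form in advance.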
The term "exploratory" is considered by many as less than an approach to data analysis and more a confession of guilt: a dishonest act has been performed with one's data. This becomes obvious when we reflexively recoil at the thought of exploratory methods, or when immediate rejections occur when one proposes research exploration in a research grant application, or when one tries to publish new results found by exploration. We need to face up to the fact that we now have a clear preference for confirmatory and a priori testing of well-formulated research hypotheses in psychological research. One radical interpretation of this explicit preference is that we simply do not yet trust one another.
Unfortunately, as many researchers know, quite the opposite is actually the truth. That is, it can be said that exploratory analyses predominate in our actual research activities. To be more extreme, we can assert there is actually no such thing as a true confirmatory analysis of data, nor should there be. Either way, we can try to be clearer about this problem. We need better responses when well-meaning students and colleagues ask, "Is it OK to do procedure X?" I assume they are asking, "Is there a well-known probability basis for procedure X, and will I be able to publish it?" Fear of rejection is strong among many good researchers, and one side effect is that rejection leaves scientific creativity only to the bold. As I will imply several times here, the only real requirement for a useful data analysis is that we remain honest (see McArdle, 2010).
When I was searching around for materials on this topic I stumbled upon the informative work by Berk (2009) where he starts out by saying:
As I was writing my recent book on regression analysis (Berk, 2003), I was struck by how few alternatives to conventional regression there were. In the social sciences, for example, one either did causal modeling econometric style, or largely gave up quantitative work ... The life sciences did not seem quite as driven by causal modeling, but causal modeling was a popular tool. As I argued at length in my book, causal modeling as commonly undertaken is a loser.
There also seemed to be a more general problem. Across a range of scientific disciplines there was often too little interest in statistical tools emphasizing induction and description. With the primary goal of getting the "right" model and its associated p-values, the older and more interesting tradition of exploratory data analysis had largely become an under-the-table activity: the approach was in fact commonly used, but rarely discussed in polite company. How could one be a real scientist, guided by "theory" and engaged in deductive model testing, while at the same time snooping around in the data to determine which models to test? In the battle for prestige, model testing had won.
At the same time, I became aware of some new developments in applied mathematics, computer sciences, and statistics making data exploration a virtue. And with this virtue came a variety of new ideas and concepts, coupled with the very latest in statistical computing. These new approaches, variously identified as "data mining," "statistical learning," "machine learning," and other names, were being tried in a number of natural and biomedical sciences, and the initial experience looked promising.
As I started to read more deeply, however, I was struck by how difficult it was to work across writings from such disparate disciplines. Even when the material was essentially the same, it was very difficult to tell if it was. Each discipline brought its own goals, concepts, naming conventions, and (maybe worst of all) notation to the table. Finally, there is the matter of tone. The past several decades have seen the development of a dizzying array of new statistical procedures, sometimes introduced with the hype of a big-budget movie. Advertising from major statistical software providers has typically made things worse. Although there have been genuine and useful advances, none of the techniques have ever lived up to their original billing. Widespread misuse has further increased the gap between promised performance and actual performance. In this book, the tone will be cautious, some might even say dark ...
(p. xi)
The problems raised by Berk (2009) are pervasive and we need new ways to overcome them. In my own view, the traditional use of the simple independent groups t-test should have provided our first warning message that something was wrong about the standard "confirmatory" mantras. For example, we know it is fine to calculate the classic test of the mean difference between two groups and calculate the "probability of equality" or "significance of the mean difference" under the typical assumptions (i.e., random sampling of persons, random assignment to groups, equal variance within cells). But we also know it is not appropriate to achieve significance by: (a) using another variable when the first variable fails to please, (b) getting data on more people until the observed difference is significant, (c) using various transformations of the data until we achieve significance, (d) tossing out outliers until we achieve significance, (e) examining possible differences in the variance instead of the means when we do not get what we want, (f) accepting a significant difference in the opposite direction to the one we originally thought. I assume all good researchers do these kinds of things all the time. In my view, the problem is not with us but with the way we are taught to revere the apparent objectivity of the t-test approach. It is bound to be even more complex when we use this t-test procedure over and over again in hopes of isolating multivariate relationships.
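The classic calculation referred to above can be reproduced in a few lines. This is a minimal sketch using SciPy's independent-groups t-test on simulated scores; the group means, standard deviations, and sample sizes are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated scores for two independent groups (illustrative values only)
group_a = rng.normal(loc=100.0, scale=15.0, size=40)
group_b = rng.normal(loc=108.0, scale=15.0, size=40)

# Classic test of the mean difference, under the usual
# equal-variance-within-cells assumption
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Note that each of the questionable practices (a) through (f) amounts to rerunning a calculation like this one until p falls below the criterion, at which point the stated probability no longer means what it claims to mean.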
For similar reasons, the one-way analysis of variance (ANOVA) should have been our next warning sign about the overall statistical dilemma. When we have three or more groups and perform a one-way ANOVA we can consider the resulting F-ratio as an indicator of "any group difference." In practice, we can calculate ...
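The omnibus F-ratio for "any group difference" can likewise be computed directly. The following is a minimal sketch with three simulated groups using SciPy's one-way ANOVA; the group parameters are arbitrary assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three simulated independent groups (illustrative values only)
g1 = rng.normal(loc=100.0, scale=15.0, size=30)
g2 = rng.normal(loc=105.0, scale=15.0, size=30)
g3 = rng.normal(loc=110.0, scale=15.0, size=30)

# One-way ANOVA: the F-ratio indexes "any group difference" among the means
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

A significant F says only that some difference exists somewhere among the group means; which groups differ, and in what pattern, requires further (and typically exploratory) probing.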