Part 1
The Analysis of Data
Introduction to Part 1
The first part of this book discusses standard methods of statistical data analysis: hypothesis tests, regression, cluster and factor analysis, and time series analysis. They have been chosen for their importance in the field of tourism studies, even though general tourism textbooks rarely treat them in any depth.
We have avoided highly sophisticated methods that can usually be applied well only in special circumstances, but we have included some extensions to the standard techniques. These extensions (e.g. non-linear analysis techniques for time series) are well established in other disciplines but have not been widely used in tourism studies. Their effectiveness has been demonstrated many times in other fields and we think they will prove useful in this area too.
The content of this part is organised as follows.
The Nature of Data in Tourism
Data are the main ingredient of all the methods discussed in this book, and this chapter examines them from a general perspective. The various types of data are described. The quality of data is then discussed, and practical suggestions are given for assessing the suitability of data in relation to the objective of an investigation. Finally, a list of electronic sources of tourism data is provided.
Testing Hypotheses and Comparing Samples
This chapter reviews the main concepts and techniques connected with statistical hypothesis testing. Issues regarding the power of tests and the effects of sample size are discussed. Bootstrap and meta-analysis are also presented as methods to improve the reliability of the outcomes. A summary of the most commonly used statistical tests is included. The chapter closes with a description of different methods to assess similarity (or diversity) within and between samples.
Data Reduction
This chapter presents the analysis of multivariate data. Factor analysis, cluster analysis and multidimensional scaling techniques are described and discussed, along with their main issues, advantages, disadvantages and applicability.
Model Building
The chapter discusses regression models and structural equation modelling. Focusing on the tourism field, the chapter highlights the issues related to computational techniques and the reliability of the results in different conditions.
Time-Dependent Phenomena and Forecasting
This chapter contains a quick overview of time series analysis methods and their use for forecasting purposes. In addition, other uses of time series are discussed, such as simple non-linear analysis techniques that offer different ways of studying the basic characteristics of the structure and behaviour of a tourism system.
1 The Nature of Data in Tourism
This chapter contains a brief review of the nature of data as used in tourism and hospitality, and discusses the main quality characteristics needed to obtain useful and reliable outcomes from data analysis. A list of the main sources of tourism data is provided.
The protagonist in the adventures described in this book is the datum, better known in its plural form, data. The original Latin meaning, something given (and accepted as true), defines it well. It is (usually) a number, the result of some observation or measurement process, objectively¹ representing concepts or other entities, put in a form suitable for communication, interpretation or processing by humans or automated systems. By themselves, and out of a specified context, data have no meaning at all; they are merely strings of symbols. Once organised or processed in some way, and associated with some other concepts or entities, they become useful information, assuming relevance and purpose, providing insights into phenomena, and allowing judgements to be made and decisions to be taken (for a discussion of these concepts, the review by Zins [2007] is a good starting point). All statistical techniques have exactly this objective: turning data into useful information.
Many disciplines, and tourism is no exception, require large quantities of data. The main challenge facing a researcher today is to manage data of great quantity, variety and complexity while still being sure of obtaining useful and valid outcomes.
Data: A Taxonomy
It is possible to categorise data in several ways. One distinction is between primary and secondary data. Another classifies data by their level of measurement or measurement scale. Yet another is based on the medium or form from which the data are derived. We provide a brief overview of the key issues associated with data of each type here.
The distinction between primary and secondary data is made on the basis of the source of the data and their specificity to the study for which they are gathered. Each type of source has strengths and weaknesses, which are the focus of our discussion here.
Primary data
Primary data are those collected directly from the original or ‘primary’ source by researchers through methods such as direct observation (both human observation and automatic collection of data, such as clicks on links in websites or data gathered through other information and communications technology), questionnaire surveys (online, printed or administered by telephone or computer), structured or unstructured interviews² and case studies. To be classified as primary data, the data elements collected using any one of these techniques must be unique and tailored to the specific purposes of the study conducted. The most commonly used techniques and their strengths and limitations are well described in many books (Babbie, 2010; Creswell, 2003; Hair et al., 2005; Neuman, 2006; Phillimore & Goodson, 2004; Veal, 2006; Yin, 1994). Here, we concentrate on recent developments and issues of particular relevance to tourism research.
The main disadvantages of primary data are well known: cost and time. Collecting tailored information tends to be expensive in terms of the resources needed (money and people), and it may take a long time to design the research properly and process the results. Recently, use of the internet and the World Wide Web has reduced the cost and time required to conduct surveys. However, unless they are used carefully, online surveys can hide problems with the representativeness of the sample, and both the technical characteristics of the medium and individual differences among respondents can bias results. Of course, these concerns are not unique to electronic media, but they can be exacerbated by the seductive ease and speed of online data collection. Indeed, many survey experts consider that internet surveying (provided the sample is representative) gives valid, reliable and relatively error-free results, among other reasons because data are captured directly from the respondent without the need for an interviewer or assistant to enter the data separately into a database for analysis (Dillman, 2007).
Regardless of the method used to capture primary data, the researcher should consider and thoroughly understand all issues associated with sampling (representativeness and sample size) and with obtaining data of suitable quality. From a practical point of view, it is advisable to start any study by surveying a pilot sample and studying the responses obtained. Participants in the pilot study can be asked to identify any questions that they found difficult to understand or to answer and, using a technique known as cognitive interviewing, they can also be asked how they interpreted specific questions. The data collected from a pilot study can be used to estimate population parameters for the statistical models that will be used to draw conclusions from the final survey, information that can be ...