Part 1
1 Numbers, Data and Analysis
Contents
Collecting data
Analysis
Induction and deduction
Variables
Variables and cases
Variable-centred analysis
The data sets
The GHS data set
The WDI data set
Concepts and indicators
Kinds of data
Individual and aggregate data
Continuous and discrete
Levels of measurement
Overview of book
Summary
Exercises
This chapter introduces the basic terminology of statistical data analysis. After reading it, you should understand:
- the stages of analysis
- what variables and cases are
- the idea of concepts and their indicators
- the four levels of measurement
and have become familiar with the two data sets used for examples throughout the book: the World Development Indicators (WDI) and the General Household Survey (GHS) data sets.
‘More women go out to work … Then do chores too’ said a Daily Mirror headline.1 To back up the point, it quoted a report from the Office for National Statistics which showed that 72 per cent of women have jobs (up from 57 per cent in 1971), and that women with full-time jobs spend about two hours a day doing housework, compared with an average of 32 minutes for men. Nevertheless, 62 per cent of couples think that household tasks should be shared equally.
Being at ease with figures and charts is an essential part of becoming a good social scientist. This book is about how to calculate and interpret percentages and other similar statistics. In this introductory chapter, we shall consider where quantitative data come from, how they differ from qualitative data and how quantitative data can be organized and compared.
Throughout the book, we shall be using examples from two data sets. The GHS is run by the Office for National Statistics (part of the UK Government) and questions a large sample of British households every year. The WDI data set is collected by the World Bank and consists of data on 208 countries of the world. In this chapter, we shall also briefly introduce these two data sets.
Collecting data
Most social science data are collected by interviewing people. Sociologists either ask a set of pre-set questions (a structured interview) or conduct an interview which is more like a conversation, often recording the answers on a tape recorder for later transcription and analysis. In the former case, every respondent is asked essentially the same set of questions and the answers can be quantified relatively easily. A large number of interviews can be carried out quickly. In the latter case, the interviews often last much longer and are much harder to compare with one another. Nevertheless, the insights these qualitative interviews give can be much deeper than those offered by standardized quantitative interviews.
There are other ways of gathering data. You have probably filled out questionnaires which arrived through the post, where you tick boxes according to how you want to answer and post back the form (a ‘mail survey’). Data of this type are also easy to quantify. Other social scientists gather data through observation of social settings, or through examining administrative records or documents. In each of these cases, there is a choice about whether to quantify the data – that is, express them in terms of numbers – or leave them as qualitative data, using terms such as ‘larger’, ‘more frequent’ and so on, without putting a numerical value on them. Often, a sociologist will want to express some data quantitatively and some qualitatively, because this is best for the topic at hand. Neither kind of data is intrinsically better than the other. It all depends on what you are trying to achieve, and often a mixture is better than either alone.
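To make the idea of quantifying tick-box answers concrete, here is a minimal sketch in Python. The question wording and the eight answers are invented for illustration and are not taken from the GHS or any other survey; the names `answers` and `percentages` are likewise our own.

```python
from collections import Counter

# Hypothetical tick-box answers to the question
# "Should household tasks be shared equally?" (invented data).
answers = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no"]

# Count how many respondents ticked each box.
counts = Counter(answers)

# Express each count as a percentage of all respondents.
total = len(answers)
percentages = {answer: 100 * n / total for answer, n in counts.items()}

for answer, pct in sorted(percentages.items()):
    print(f"{answer}: {pct:.1f} per cent")
```

Once the answers are counted in this way, statements such as ‘most respondents agreed’ can be replaced by a figure, which is the step from qualitative to quantitative description.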
Not all sociologists find that they need to collect data themselves. Often it is quicker, cheaper and better to analyse data that have already been collected. For example, if you wanted to find out about the proportion of women who are now working full-time, it would be a waste of time and money to undertake a survey yourself. The job has already been done by the GHS (and by other large government surveys) at considerable expense and with great attention to the accuracy of the data collected. Fortunately, the data can be obtained at little or no cost from a Data Archive, a type of library which holds data from previous surveys.
Analysing data that were collected for some other reason or by some other organization is called secondary analysis, as contrasted with the primary analysis which you carry out on data you collect yourself. Secondary analysis as a form of research is increasingly popular as more and more high-quality data sets covering a very wide range of topics become available.
This book is only about the analysis of quantitative data. Analysing qualitative data requires somewhat different techniques and tools and is therefore left to other texts (e.g. Atkinson et al., 2001; May, 2002; Silverman, 2004). The book also focuses specifically on the analysis of data, saying little about how to collect them. Again, other texts will help you with data collection (e.g. de Vaus, 2002; Robson, 2002). Guidance on the overall process of research, of which statistical analysis is but one part, can be found in Gilbert (2001).
Analysis
The numbers which you collect from a survey tell you very little by themselves. Sociologists are much more interested in patterns and regularities: the features which are common to groups of people in different contexts and situations. To find these patterns, you need to engage in analysis. For example, the government survey which the Daily Mirror quoted from and which we summarized at the beginning of this chapter asked several thousand people throughout Britain how much time they spent doing domestic tasks. Their actual answers, taken one by one, are not of much interest to a sociologist. Put the answers from all the men and all the women together, however, and it becomes plain that women say that they still do many more of the household chores than men.
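The step of putting the answers from all the men and all the women together can be sketched in a few lines of Python. This is only an illustration: the six responses are invented (chosen so that the averages echo the figures quoted at the start of the chapter), and the function name `mean_minutes_by_group` is hypothetical rather than part of any survey software.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical answers: (sex of respondent, minutes of housework per day).
# These values are invented for illustration; they are not survey data.
responses = [
    ("female", 130), ("male", 25), ("female", 110),
    ("male", 40), ("female", 120), ("male", 31),
]

def mean_minutes_by_group(rows):
    """Return the mean number of minutes for each group label."""
    grouped = defaultdict(list)
    for group, minutes in rows:
        grouped[group].append(minutes)
    return {group: mean(values) for group, values in grouped.items()}

averages = mean_minutes_by_group(responses)
for group, value in sorted(averages.items()):
    print(f"{group}: {value:.0f} minutes per day")
```

No single response shows the pattern, but the two group averages do: this is the sense in which analysis turns individual answers into a generalization.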
Analysis consists partly of constructing generalizations: for example, the generalization that women do more household work than men. Another important element of analysis is explanation. As sociologists, we not only want to know about the social world, but also about why it is like that. So, for example, we might come to the data believing that UK society remains patriarchal, that is, with men still dominating women. We might therefore not be surprised to find that women are still doing the majority of the housework. Alternatively, we might think that as the proportion of women in employment increases, the distribution of housework in dual-earner couples would tend to equalize and therefore be surprised that it is still so unequal. In either case, we would be approaching the data with a prior theory and then testing that theory against generalizations derived from the data. The data either support or cast doubt on the theory.
Induction and deduction
Where does such theory come from? There are two sources. Theory can be generated by comparing lots of examples and finding the features they have in common, a process called induction. Alternatively, theory can come from deriving consequences from a more wide-ranging, ‘grander’ theory, a process called deduction.
For example, suppose that you noticed that among couples you knew, the women were doing a lot more of the cooking and cleaning and men much more of the repairs and decorating even though the men said that they were in favour of equality in doing domestic chores. You might wonder about this disparity between what your friends say and what they actually do. In a small way you are building a theory by induction. You have made an observation and generalized it. The next step would be to se...