Background
Within the social sciences the survey method has become one of the most widely used methods of collecting information about a large and varied range of topics (Marsh 1982; Bryman 1988; Goldthorpe 2007, Chap. 4). The methodological features that distinguish the survey method from other data collection methods are (a) standardized measurement that is consistent across all respondents and which ensures that comparable information is obtained about everyone who is described; and (b) universal coverage of the target population (i.e. a census survey) or the use of probability sampling to select a representative sample (a sample survey) (Fowler 2014, p. 3). While it is convenient at times to refer to the ‘survey method’ as if it were a single uniform method, over time the method has become quite heterogeneous in the ways that it has been used in practice (de Leeuw et al. 2008, p. 2). Today, surveys are conventionally seen to include national censuses of population, a wide variety of sample surveys conducted within states (e.g. social surveys, market research surveys and opinion polls) and, in ever-increasing numbers, a range of cross-national survey programmes. All of these survey types share common operational features, but there are also significant differences between them.
Historically, censuses of population were the first form of social survey to be conducted (Hakim 1985). Although there are earlier examples, censuses in their modern form mostly date from the early 1800s (Baffour et al. 2013, p. 408). Their purpose was, and is, to provide governments with reliable and accurate information about the size and geographic distribution of their citizens, and other demographic and socioeconomic information. Census questionnaires began to include questions about the languages spoken by census respondents as early as the mid-nineteenth century (Lieberson 1966; Kominski 2012; Ó Gliasáin 1996). Currently, the inclusion of questions in official censuses about the language or languages of respondents is widespread, although by no means universal. Questions about languages have been part of the US decennial census for over 100 years (Stevens 1999), while all Soviet censuses included questions about ‘native’ and ‘other’ languages (Silver 1986) and most post-Soviet countries have continued this practice (Silver 2001). Christopher (2011, p. 536), in his review of the censuses of 71 countries that were or are part of the British Commonwealth, found that 37 (52%) included one or more questions about language. Among these 37 countries are the UK, Ireland, India, Canada and South Africa.
Sample surveys were not regularly used for data-collection purposes until the mid-twentieth century, when statistical procedures and sampling methods had been developed to the point where it was possible to reliably estimate the social and demographic characteristics of a population by interviewing relatively small samples of respondents (Marsh 1982; Bulmer et al. 1991). For practical reasons, census questionnaires typically contain questions that are short and straightforward and cover only the basic structural characteristics of the population. By contrast, the interviewing techniques used in sample surveys allowed for longer questionnaires, with a greater number and range of questions. Therefore, since about 1940 (Bulmer et al. 1991, p. 42), state administrations and academic researchers have increasingly supplemented census information by using sample surveys to collect more complex information about a larger range of social, economic and cultural characteristics. Once the methods of sample surveys became an established feature of national systems of data-collection in the second half of the twentieth century, they were also used to measure and describe the languages spoken and/or used by respondents (for general overviews see Cooper 1980; de Vries 2006; Baker 2007).
Most of the surveys that included questions about language, however, were and are primarily designed to explore other economic, social and political topics. Language issues are often of marginal interest to those who commission and/or conduct the surveys. It is quite common to find social surveys that, like censuses of population, include only one or two language-related questions. Nonetheless, it is also important to note the emergence, in the later part of the twentieth century (i.e. after 1960), of large-scale surveys which included longer modules of questions, and sometimes entire questionnaires, about the languages spoken and used by respondents and their attitudes to these languages and related government policies. One of the first such language surveys in the developed world was conducted in Canada in 1965 for the Royal Commission on Bilingualism and Biculturalism (Pool 1973). Shortly afterwards, in 1973, a major sociolinguistic survey was undertaken in Ireland (Ó Riagáin 1997). Examples of more recent language surveys include those conducted in the Basque Country (Eusko Jaurlaritza 1991), Friesland (Gorter and Jonkman 1995), Wales (Williams and Morris 2000) and Scotland (Paterson et al. 2014). It is reasonable to describe such surveys as ‘sociolinguistic’ surveys (Cooper 1980), while other surveys that offer a limited coverage of sociolinguistic items are best described as sociological or political surveys depending on their primary objectives.
At a somewhat later stage in the historical development of survey research, large multi-national, multi-cultural survey programmes began to appear (Lagos 2008; Smith 2010). Two important programmes of comparative international research, oriented toward replication, began during the last three decades of the twentieth century with the establishment of the Eurobarometer (EB) and the International Social Survey Programme (ISSP). (For operational details of these two programmes see Signorelli 2012 and Skjak 2010, respectively.) Although they were followed by others, e.g. the European Social Survey and the European Values Survey, the EB and ISSP surveys are the longest-running governmental and academic international survey programmes. These international survey programmes use reliable sampling procedures, and the sample in each participating country is usually comparable in size and quality to those of national surveys undertaken within the same territories. While the objective of the two survey programmes is to measure and compare social, political and economic patterns and trends between countries, they have also included language related questions, albeit only to a limited extent and at irregular intervals.
Taken together, the output from these three types of surveys has created an enormous bank of language related data which has an impressive geographical range and which also, in many instances, demonstrates considerable historical depth. The empirical results of census and sample surveys frequently form the basis for analysis, arguments and proposals in governmental and academic publications. Many state policy areas have a linguistic dimension (see Chiswick 2008; Chiswick and Miller 2007) and, depending on the circumstances of individual countries and regions, these may include educational programmes, the provision of social and other public services such as television and radio, and labour market policies. In all policy areas where language concerns are at issue, language related survey and census data have become primary inputs within the process of policy formulation, implementation or evaluation. In academic research, language related census and survey data are drawn upon most typically in those research areas that Fishman has termed ‘macro-sociolinguistic’ (Fishman 1985). These include studies of societal bilingualism and multilingualism, language contact and language spread, educational linguistics, language maintenance and shift, language attitudes and language planning generally.
Furthermore, censuses and surveys not only provide the most widely used language related statistics of government operations, but they are also the statistics most widely known by the general public (Alonso and Starr 1987). To paraphrase Starr (1987, p. 530), society ‘thinks collectively’ about social phenomena ‘in the way that statistical agencies have settled upon’. Census categories thus ‘constitute or divide’ groups and thereby ‘illuminate or obscure their problems and achievements’ (see also Kertzer and Arel 2002). There is, therefore, a political economy of statistics, quantification and categorization, and these issues have emerged in recent decades as an important sub-discipline within the social sciences (Desrosières 1998; Diaz-Bone an...