Introduction: Situating Data Science: Exploring How Relationships to Data Shape Learning
The emerging field of Data Science has had a large impact on science and society. This has led to over a decade of calls to establish a corresponding field of Data Science Education. There is still a need, however, to more deeply conceptualize what a field of Data Science Education might entail in terms of scope, responsibility, and execution. This special issue explores how one distinguishing feature of Data Science (its focus on data collected from social and environmental contexts within which learners often find themselves deeply embedded) suggests serious implications for learning and education. The learning sciences is uniquely positioned to investigate how such contextual embeddings impact learners' engagement with data, including along conceptual, experiential, communal, racialized, spatial, and political dimensions. This special issue demonstrates the richly layered relationships learners build with data and reveals them to be not merely utilitarian mechanisms for learning about data, but a critical part of navigating data as social text and understanding Data Science as a discipline. Together, the contributions offer a vision of how the learning sciences can contribute to a more expansive, agentive and socially aware Data Science Education.
The emerging field of Data Science has had a large impact on science and society. This has led to over a decade of calls to establish a corresponding field of Data Science Education (Berman et al., 2018; Cleveland, 2001; Finzer, 2013). Data Science, the argument goes, prepares students for high-paying jobs, fuels scientific advancement, and provides communities with new tools for expression and empowerment. There is still a need, however, to more deeply conceptualize what a field of Data Science Education might entail in terms of scope, responsibility, and execution. In particular, it is important to understand what makes learning Data Science sufficiently different from mathematics, computer science, or statistics that it requires new approaches to research and instructional design, and to explore the theoretical and practical implications of these differences for constructing an ethical and effective Data Science Education.
This special issue explores how one distinguishing feature of Data Science, the relational nature of data involved, suggests serious implications for learning and education. In contrast to data that are constructed to answer a particular question, Data Science is concerned with data collected in an incidental or automated manner from extensive social and environmental contexts (Donoho, 2017), contexts within which learners often find themselves deeply embedded. The learning sciences is uniquely positioned to investigate how such contextual embeddings impact learners' engagement with data, including along conceptual, experiential, social, racialized, spatial, and political dimensions (Philip, Olivares-Pasillas, & Rocha, 2016; Rubel, Lim, Hall-Wieckert, & Sullivan, 2016). The authors in this issue explore how learners' situatedness relative to data, to the contemporary and emerging field of data science as well as other disciplinary domains, and to the social histories interwoven with data necessitate new lines of research, new theoretical and methodological development, and new approaches to educational design and practice.
CURRENT CONVERSATIONS IN DATA SCIENCE AND DATA SCIENCE EDUCATION
Colloquially, the term data science refers to the use of computational tools and methods to collect, process, analyze, store, and visualize large quantities of data. It is broadly associated with the emergence of a growing number of data visualizations, open data repositories, and infographics intended for public consumption (McGhee, 2010), as well as shifts in how professional disciplines, from the sciences to the arts, make use of data and computing (Hey, Tansley, & Tolle, 2009). Data Science as a formal field of study, however, is still not well defined; the diversity of perspectives regarding its identity has led some to avoid defining it altogether (Cassel & Topi, 2015). Instead of attempting to create our own definition, we highlight some distinguishing characteristics that emerge as points of agreement in discussions of Data Science and Data Science Education.
One of the most commonly cited characteristics of Data Science is that it is concerned with a new class of data that is not only big in the traditional sense of scale, but "... pervasive, tacit, and often collected without a specific [or explicit] intent" (Cassel & Topi, 2015, p. 10). Social network and clickstream data are recorded from popular websites such as Facebook. Large-scale civic data about legislation, policy opinions, and municipal demographics are captured within historical census and voting data; weather, climate, and air or water quality benchmark data are collected multiple times per day from satellites, tide gauges, and other automated devices. Such datasets are encompassing in scope, and capture details about ourselves, our behavior, and our place in a shared world.
Such data can then be repurposed, or exploited, by students and practitioners alike for a variety of purposes (Donoho, 2017; Wilkerson & Laina, 2018). Individual preferences can be inferred from website interaction data to target advertisements, and predictive climate models can be constructed from environmental data to aid in policy development and city planning. This process requires not only computational and statistical knowledge, but also a deep knowledge of both the target domain and the original context in which the data were collected. As a result, Data Science is intensely interdisciplinary. New tools and statistical techniques are constantly in development that incorporate and innovate on methods from multiple disciplines to visualize, store, organize, process, and interpret data.
These shifts in how, and why, data are constructed and used have led to the development of new programs, primarily in higher education, that focus on helping students learn to "think with data" (Baumer, 2015; Hardin et al., 2015) and learn computational methods for working with and communicating about data (Nolan & Temple Lang, 2010). Others focus on data science competencies as part of the civic and information literacies needed to navigate a data-rich world (e.g., Bergstrom & West, 2017, "calling bullshit" course and initiative; Grawe, 2011). Recommendations across these efforts (e.g., De Veaux et al., 2017) highlight two core pedagogical commitments. The first is that educators must attend not only to technical skill (e.g., programming languages such as R or Python, statistical methods and machine learning) but also to the flexibility to learn and develop novel tools and methods for working with data. The second is that Data Science must be grounded in consequential investigations in which learners pose questions, obtain data, and communicate findings within meaningful disciplinary contexts. Some go so far as to suggest that Data Science courses should always be offered concurrently with, and even embedded within, content courses in relevant disciplines. While these pedagogical commitments are important contributions, they often do not address the challenges and possibilities that repurposable, exploitable, and contextually embedded data present for learners.
THE NEED FOR LEARNING SCIENCES PERSPECTIVES ON DATA SCIENCE EDUCATION
At the same time, emerging literature from the learning sciences highlights a number of ways in which skills- and technology-driven approaches to Data Science Education often fall short. One early such challenge was put forth by Philip, Schuler-Brown, and Way (2013), who surfaced tensions between the purported goals of Data Science Education efforts to support equity, power, and democratic participation on one hand and the status of Data Science as a discourse of power developed to advance economic and national interests on the other. These authors suggested that developing student proficiency with tools and techniques for working with data should be only one strand of a much broader Data Science Education project. Another goal must be to develop students' identities as agentive data practitioners who recognize the historical and political dimensions of data as social texts, and of Data Science as a disciplinary discourse.
The role of data as historical and political has become especially apparent in recent efforts to introduce Data Science Education at precollegiate levels (e.g., Gould, Machado, Ong, Johnson, & Molyneux, 2016). A major reason for this is that whereas undergraduate programs situate data science within students' disciplinarily relevant areas of study (e.g., ecology, genetics, demography), precollegiate efforts have sought to do the same through the use of socially relevant datasets (e.g., popular movies or music, students' local communities). In one case, students resisted units that leveraged spatial data about the lottery and alternative banking institutions to teach statistics, a curricular approach that, the researchers reflectively noted, pathologized communities of color (Rubel, Hall-Wieckert, & Lim, 2016). In a study to explore how new features of the Scratch programming language could introduce young learners to Data Science, children warned that allowing code that takes others' user statistics as input could create exclusionary programs that only "popular" programmers could access (Hautea, Dasgupta, & Hill, 2017). In other cases, high school students' reasoning about racial and economic dimensions of spatial datasets was dismissed (Philip et al., 2016) or not well motivated given students' familiarity with the neighborhoods of study (Enyedy & Mukhopadhyay, 2007). All of these instances demonstrate how learners find themselves embedded within, and impacted by, the environments, histories, and social narratives woven into the datasets they investigate.
That context impacts learning is no surprise to learning scientists, who have well-developed theoretical and methodological tools for understanding how experiences, tools and materials, environments, communities, and social positioning impact learning (Brown, Collins, & Duguid, 1989; Lave & Wenger, 1991). It is for these reasons we dedicate this issue to situated perspectives toward learning Data Science. Some progress has already been made, for example, in exploring the role of embodiment and mobility in learners' reasoning with and about data, leading to the development of successful interventions that engage learners deeply with data analysis cycles about self and movement relative to history and health in formal and...