PART I
Making the Case for Data Information Literacy
CHAPTER 1
DETERMINING DATA INFORMATION LITERACY NEEDS
A Study of Students and Research Faculty
Jake Carlson, University of Michigan
Michael Fosmire, Purdue University
C. C. Miller, Purdue University
Megan Sapp Nelson, Purdue University
INTRODUCTION
The nature and practice of research and scholarship is undergoing dramatic change with the advent of ready access to high-bandwidth networks, the capacity to store massive amounts of data, and a robust and growing suite of advanced informational and computational data analysis and visualization tools. The practice of technology-driven research, known as e-science, or more broadly as e-research, has had a transformative effect in the science and engineering fields. E-research applications are growing within the humanities and social science disciplines as well, where e-research is poised to have similar effects on the nature and practice of research.
The complexity and scale of e-research in turn requires an evolution of traditional models of scholarly communication, library services, and the role of librarians themselves. In response, librarians are initiating discussions and projects to situate themselves in those areas of e-research most in need of library science expertise (Jones, Lougee, Rambo, & Celeste, 2008). In light of the federal expectation that grant proposals have a data management plan (DMP; NSF, 2011), libraries are starting conversations in their universities to negotiate a role in the management of research outputs.
Data management skills also provide the opportunity for an evolution of instruction in libraries. Academic libraries offer information literacy courses and programs as part of the educational mission of the institution. Extending information literacy to include programs on data management and curation provides a logical entry point into increasing the role of libraries in supporting e-research. A successful education program, however, must be based on a firm understanding of current practice and standards as well as the needs of the target audience. There is a lack of research on the needs of both the researchers and the students grappling with these issues in the classroom and in the laboratory. The authors attempted to address this knowledge gap by gathering data from interviews with faculty researchers and from the authors’ own Geoinformatics course. With this information, the authors proposed a model set of outcomes for data information literacy (DIL).
BACKGROUND
E-Research and Implications for Libraries
E-research has had a tremendous impact on a number of fields, increasing the capabilities of researchers to ask new questions and reducing the barriers of time and geography to forming new collaborations. In astronomy, for example, the National Virtual Observatory (NVO) makes it possible for anyone from professional astronomers to the general public to find, retrieve, and analyze vast quantities of data collected from telescopes all over the world (Gray, Szalay, Thakar, Stoughton, & vandenBerg, 2002; National Virtual Observatory, 2010). For scholars of literature, the HathiTrust Digital Library provides a tremendous collection of scanned and digitized texts, and its Research Center provides tools and computational access to scholars seeking to apply data mining, visualization, and other techniques toward the discovery of new patterns and insights (HathiTrust Research Center, n.d.). It should be no surprise, of course, that such projects simultaneously produce and feed upon large amounts of data. The capture, dissemination, stewardship, and preservation of digital data are critical issues in the development and sustainability of e-research.
Funding organizations and professional societies identified a need for educational initiatives to support a workforce capable of e-research initiatives. The National Science Foundation (NSF) first described the connection between e-research and education. The 2003 Atkins Report highlighted the need for coordinated, large-scale investments in several areas, including developing skilled personnel and facilities to provide operational support and services (Atkins et al., 2003). In 2005 the National Science Board produced a report that articulated existing and needed roles and responsibilities required for stewarding data collections, followed by a series of recommendations for technical, financial, and policy strategies to guide the continued development and use of data collections (National Science Board, 2005). The American Council of Learned Societies issued a report in 2006 calling for similar attention and investments in developing infrastructure and services for e-research in the humanities fields (Welshons, 2006). More recently, the National Academy of Sciences issued a report advocating the stewardship of research data in ways that ensured research integrity and data accessibility. The recommendations issued in the report included the creation of systems for the documentation and peer review of data, data management training for all researchers, and the development of standards and policies regarding the dissemination and management of data (National Research Council, 2009).
While the rich, collaborative, and challenging paradigm of e-research promises to produce important, even priceless, cultural and scientific data, librarians are determining their role in the curation, preservation, and dissemination of these assets. In examining how e-research may affect libraries, Hey and Hey argued that e-research “is intended to empower scientists to do their research in faster, better and different ways,” (Hey & Hey, 2006, para. 10). They particularly emphasized that information and social technologies made e-research a more communal and participatory exercise, one that will see scientists, information technology (IT) staff, and librarians working more closely together. A particular challenge looming with the rise of e-research is the “data deluge”: that is, the need to store, describe, organize, track, preserve, and interoperate data generated by a multitude of researchers to make the data accessible and usable by others for the long term. The sheer quantity of data being generated and our current lack of tools, infrastructure, standardized processes, shared workflows, and personnel who are skilled in managing and curating these data pose a real threat to the continued development of e-research.
Gold (2007) provided an outline of the issues and opportunities for librarians in e-science. Starting from the familiar ground of GIS (geographic information systems), bioinformatics, and social science data, Gold argued that librarians working in e-science will develop relationships, both upstream and downstream of data generation, and the effort may be “both revitalizing and transformative for librarianship” (Sec. 2.2, para. 6). Similarly, the Agenda for Developing E-Science in Research Libraries outlined five main outcomes that focused on capacity building and service development in libraries for supporting e-science (Lougee et al., 2007). Walters (2009) further asserted that libraries taking “entrepreneurial steps” toward becoming data curation centers are on the right track, reasoning that “a profound role for the university research library in research data curation is possible. If the role is not developed, then a significant opportunity and responsibility to care for unique research information is being lost” (p. 85). In other words, the academic library community seems reasonably sure that supporting e-research is not so novel that it falls outside of the mission and founding principles under which libraries operate.
Educational Preparation for E-Research
Ogburn (2010) predicted that e-science will quite certainly fail if future generations of scholars are not savvy with both the consumption and production of data and tools. “To prepare the next generation of scholars the knowledge and skills for managing data should become part of an education process that includes opportunities for students to contribute to the creation and the preservation of research in their fields” (p. 244). It is not enough to teach students about handling incoming data; they must also know, and practice, how to develop and manage their own data with an eye toward the next scientist down the line. The Association of Research Libraries reported to the NSF in 2006 that because
many scientists continue to use traditional approaches to data, i.e., developing custom datasets for their own use with little attention to long-term reuse, dissemination, and curation, a change of behavior is in order. … [This change] will require a range of efforts, including … perhaps most important of all, concerted efforts to educate current and future scientists to adopt better practices. (Friedlander & Adler, 2006, p. 122)
The inspiration for the authors’ own work on instructional components to e-science comes from the NSF’s Cyberinfrastructure Vision for 21st Century Discovery, in which the dramatic rhetoric of revolution and recreation does indeed trickle down to education:
Curricula must also be reinvented to exploit emerging cyberinfrastructure capabilities. The full engagement of students is vitally important since they are in a special position to inspire future students with the excitement and understanding of cyberinfrastructure-enabled scientific inquiry and learning. Ongoing attention must be paid to the education of the professionals who will support, deploy, develop, and design current and emerging cyberinfrastructure. (National Science Foundation Cyberinfrastructure Council, 2007, p. 38)
Although many have articulated the need for educating a workforce that understands the importance of managing and curating data in ways that support broad dissemination, use by others, and preservation beyond the life of its original research project, there has been very little examination of what such a program would contain. We believe that librarians have a role in developing these education programs and will need to actively engage in these discussions.
Gabridge (2009) notes that institutions experience
a constantly revolving community of students who arrive with … uneven skills in data management. … Librarian subject liaisons already teach students how to be self-sufficient, independent information consumers. This role can be easily extended to include instruction on data management and planning. (p. 17)
With the respectful elision of “easily,” we argue in the remainder of this chapter that there are indeed gaps in the knowledge of current e-researching faculty and students (both as producers and consumers of data) that librarians may address by developing DIL curricula.
Environmental Scan of Related Literacies
For the sake of clarity, it is important to distinguish DIL from other literacies such as data literacy, statistical literacy, and information literacy. Typically, data literacy involves understanding what data mean, including how to read graphs and charts appropriately, draw correct conclusions from data, and recognize when data are being used in misleading or inappropriate ways (Hunt, 2004). Statistical literacy is “the ability to read and interpret summary statistics in the everyday media: in graphs, tables, statements, surveys and studies,” (Schield, 2010, p. 135). Schield finds common ground in data, statistical, and information literacy, stating that information literate students must be able to “think critically about concepts, claims, and arguments: to read, interpret and evaluate information.” Furthermore, statistically literate students must be able to “think critically about basic descriptive statistics, analyzing...