Data and its technologies now play a large and growing role in humanities research and teaching. This book addresses the needs of humanities scholars who seek deeper expertise in the area of data modeling and representation. The authors, all experts in digital humanities, offer a clear explanation of key technical principles, a grounded discussion of case studies, and an exploration of important theoretical concerns. The book opens with an orientation, giving the reader a history of data modeling in the humanities and a grounding in the technical concepts necessary to understand and engage with the second part of the book. The second part of the book is a wide-ranging exploration of topics central for a deeper understanding of data modeling in digital humanities. Chapters cover data modeling standards and the role they play in shaping digital humanities practice, traditional forms of modeling in the humanities and how they have been transformed by digital approaches, ontologies which seek to anchor meaning in digital humanities resources, and how data models inhabit the other analytical tools used in digital humanities research. It concludes with a glossary chapter that explains specific terms and concepts for data modeling in the digital humanities context. This book is a unique and invaluable resource for teaching and practising data modeling in a digital humanities context.

The Shape of Data in Digital Humanities by Julia Flanders and Fotis Jannidis.

Information

Publisher: Routledge
Year: 2018
eBook ISBN: 9781317016144
Topic: Languages & Linguistics
Subtopic: Data Modelling & Design

Part I
Orientation

1 Data modeling in a digital humanities context

An introduction

Julia Flanders and Fotis Jannidis

1 Modeling in the humanities

Despite persistent ambivalence about the concept of “data” in humanities research,¹ there is a long and rich tradition of gathering and modeling information as part of humanities research practice. From the perspective of the digital humanities, that tradition now appears in retrospect like important prehistory for an understanding of data modeling. And that prehistory is significant not only because it shows how integral such activities have been to humanities research, but also because it reminds us of the distinctive complexities and challenges that humanities data poses. While the terms “data” and “modeling” may be new, many of the activities and intellectual frameworks they entail are familiar and deep-rooted. In a general sense, we understand intuitively that specific theoretical approaches rely on concepts and terms that divide the universe of ideas in specific ways. For instance, literary periodization constitutes a model of history in which spans of time are associated with distinct stylistic patterns and, indirectly, with cultural, economic, and historical phenomena that are presumed to influence those patterns and their evolution. The literary-historical approach is in itself a kind of general model, within whose terms more specific models could be framed and debated (for instance, concerning whether and how one might distinguish the medieval and Renaissance periods, and where the boundary falls in different national traditions). And we might reject the literary-historical way of modeling culture altogether, in favor of a model that disregards periodization, or that is uninterested in historical change, or that denies the existence of “literature” as a category. Debates about method are ultimately debates about our models.

In a more specific sense, our models represent the shaping choices we make in representing and analyzing the materials we study. As Michael Sperberg-McQueen put it in his keynote to the 2012 workshop on Knowledge Organization and Data Modeling, “modeling is a way to make explicit our assumptions about the nature of a text/artefact,” and this statement is importantly agnostic with respect to medium. Although the digital medium has brought these choices and representational systems into heightened visibility, they have been at the heart of scholarship since the beginning. A classic example is the critical apparatus in a scholarly edition, a form of knowledge management that might be said to originate with humanism itself. As pure content, the critical apparatus is simply an account of the variations among the witnesses to a particular text, which could be communicated through a footnote or a prose essay. As information, however, the critical apparatus has taken its current structured shape through two closely related processes. The first of these is the formalization of the information it contains, placing it under regulation so that all of the components are verifiably present: the lemma or base reading, the variant readings and their sources, and so forth. The second, and closely related, is the development of standard notation systems that enable that formalized information to be processed efficiently and consistently. The use of punctuation, standardized abbreviations, and other notational conventions to group, delimit, and document each variant makes it possible for a reader to process this information quickly and systematically, and to perceive patterns: in effect, to do with the human mind what we now consider the hallmark outcome of good data modeling in digital systems. Digital scholarly editions emerged so early in the history of humanities computing in part because they were able to build on a clear existing model deriving from a long-standing tradition of practice.

The more recent history of data modeling builds on this trajectory. It draws on the insights generated by informal models (such as the difference between types of variant readings), which offer a descriptive language and an underlying set of ideas, but not at a level of precision that would support the creation of a formal model. It realizes the informational potential represented by existing formalizable models, such as the critical apparatus, or the structure of a dictionary entry, or the organization of a financial ledger, which possess all of the qualities requisite for formalization: a clearly defined set of informational items with clearly defined relationships. Research on data modeling has sought to express this information in ways that support computational reasoning, as a formal model: one that rests on a logical or mathematical basis, whose definitions are expressed using some formal constraint notation (such as a schema), such that the information being modeled can be processed and analyzed with reference to the model.
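To make the idea of a formal model concrete, the critical apparatus described above can be sketched as a small typed structure with explicit constraints. This is only an illustration under stated assumptions, not a rendering of any existing standard: the class names `Reading` and `ApparatusEntry`, the `validate` function, the location scheme, and the witness sigla are all hypothetical, and the readings are placeholder strings rather than real textual variants.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One reading of a passage, with the witnesses (sigla) that attest it."""
    text: str
    witnesses: tuple[str, ...]


@dataclass(frozen=True)
class ApparatusEntry:
    """A lemma (base reading) plus its variant readings at one location."""
    location: str  # hypothetical reference scheme, e.g. "1.1"
    lemma: Reading
    variants: tuple[Reading, ...] = ()


def validate(entry: ApparatusEntry, known_witnesses: set[str]) -> list[str]:
    """Check an entry against the model's constraints.

    Returns a list of violations; an empty list means the entry conforms.
    The three constraints are illustrative assumptions: every reading cites
    a witness, every witness is known, and no witness attests two readings.
    """
    problems: list[str] = []
    seen: set[str] = set()
    for reading in (entry.lemma, *entry.variants):
        if not reading.witnesses:
            problems.append(f"{entry.location}: reading {reading.text!r} cites no witness")
        for siglum in reading.witnesses:
            if siglum not in known_witnesses:
                problems.append(f"{entry.location}: unknown witness {siglum!r}")
            if siglum in seen:
                problems.append(f"{entry.location}: witness {siglum!r} attests more than one reading")
            seen.add(siglum)
    return problems


# Placeholder data: sigla "A", "B", "C" and the reading texts are invented.
witnesses = {"A", "B", "C"}
good = ApparatusEntry("1.1", Reading("base reading", ("A",)),
                      (Reading("variant reading", ("B", "C")),))
bad = ApparatusEntry("1.2", Reading("base reading", ("A",)),
                     (Reading("variant reading", ("A", "D")),))

print(validate(good, witnesses))  # no violations
for problem in validate(bad, witnesses):
    print(problem)
```

The point of the sketch is the last clause of the paragraph above: once the components and their relationships are declared, each entry can be processed and checked with reference to the model rather than by a reader's eye.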

This chapter will explore the significance of this shift for our research methods, for the tools of scholarship, and for our understanding of the relationship between our models and our theories of the world. We will first consider in more detail what the digital turn has done for modeling approaches in the humanities and digital humanities. Then we will discuss the kinds of intellectual traction formal modeling can provide for researchers (a point that is picked up more fully in the next chapter, and in Michael Sperberg-McQueen’s contribution to this volume) and the complex relationship between those formal models and the tools through which we express and work with them. Next we will consider the relationship between our models and the intellectual scenarios they seek to represent, the relationship between models and the tools we use to manipulate and process humanities data, the tension between models and data, and the forms of critical engagement we must exercise in using digital models in a humanities context. We will conclude this chapter with some proposals for a research and pedagogical agenda in the domain of data modeling in digital humanities.

1.1 The digital turn: modeling in digital humanities

It is often assumed that the affordances of the digital medium have brought into being new ways of thinking about data and new kinds of questions that were not previously thinkable. But in fact historical examples reveal a long tradition of attempts to analyze and represent data, often representing great ingenuity in the face of limitations in the medium of print. A regularly cited example is the attempt by Teena Rochfort Smith in 1883 to present a four-column edition of Hamlet, in which the Folio and the first and second Quartos are shown in parallel, together with a conflated text, with complex typography through which the reader can apprehend the specific passages that differ between versions.² Isabel Meirelles’s Design for Information (2013) offers numerous samples of complex visualizations representing analysis by hand of mortality data, historical imports and exports, agricultural production, and attendance at the Paris Universal Exhibition. The members of the New Shakspeare (sic) Society in the 1870s developed notation systems for displaying metrical patterns in poetry, aimed at supporting a large-scale analysis of prosody to assist in dating Shakespeare’s plays. And concordances were a common form of humanities research data (and one of the earliest forms of digital humanities data) until they gave way to widespread use of dynamic searching.
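The concordance mentioned above is a compact example of strongly structured research data: each occurrence of a word is recorded together with its immediate context. A minimal keyword-in-context sketch follows, assuming a deliberately naive tokenizer (lowercased runs of word characters); the function name `kwic`, the `width` parameter, and the sample sentence are all invented for illustration.

```python
import re


def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each occurrence.

    Tokenization is an assumption of this sketch: the text is lowercased and
    split into runs of word characters, so punctuation is discarded.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample sentence, not a quotation from any source.
sample = "The model shapes the data, and the data tests the model."
for left, word, right in kwic(sample, "data", width=2):
    print(f"{left:>12} | {word} | {right}")
```

A printed concordance fixes these choices (what counts as a word, how much context to show) once for all readers, whereas dynamic searching lets each query set them anew.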

Although more or less formal information models can be found in a variety of humanities contexts, there are some environments in which their operation is particularly visible to humanities scholarship. One of these is (naturally enough) in the domain of information science, where it impinges on humanities research practice: in the controlled vocabularies and information systems of the research library. Reference works such as dictionaries, bibliographies, concordances, and catalogues represent another long tradition of strongly modeled information. Still another is to be found in certain kinds of paratexts: title pages, colophons, footnotes, indexes, tables of contents, running heads, and other systematic apparatus through which publishers frame the cultural intelligibility of the text. These are particularly interesting examples since some of these formal systems exist not as an aid to the reader, but as an artifact of the work processes of publication itself: for instance, the printed signatures that assist in the ordering and assembly of the book, or the verbal formulae associated with the license to publish (“Cum privilegio” and equivalents), which are a standard component of title pages during periods when this kind of oversight was in place.

With this long history in mind, what does data modeling mean in a digital humanities context? The landscape is characterized by complexity and challenge. We inherit from the humanistic tradition a set of modeling practices and concepts that, while foundational, are often unsystematic, poorly understood by nonspecialists, and invisible through their very familiarity. Complicating this relationship is the fact that, as Scott Weingart observes, “humanists care more about the differences than the regularities”;³ even in the domains where formalisms and “regularities” are well-established, we are inclined to treat exceptions and variations as the phenomena of greatest interest. Furthermore, humanistic data is strongly layered: the artifacts modeled in digital humanities are created with a purpose by identifiable agents and have a history that is part of their identity, and they then undergo further processes of curation whose intentions and methods need to be kept visible. Museum and cultural heritage institutions have developed ontologies—notably the CIDOC Conceptual Reference Model (CRM)—in which concepts like provenance and purpose are explicitly represented. Our models thus in many cases need to represent not only the history of the artifact itself, but also the history of the ways in which it has been described and contextualized. Alongside this humanistic legacy we also inherit from the history of digital technology a deep, thoroughly elaborated understanding of data modeling that has found a home in some specific domains of the digital humanities: notably, in areas including markup languages, network analysis, ontologies, and game studies. These are all spaces in which an understanding of the models themselves, and a critical and theoretical atte...