Daniel Veidlinger
Computational Linguistics and the Buddhist Corpus
The process of reading and interpreting religious texts has been going on for millennia, and in fact many of the hermeneutical techniques that scholars throughout the humanities use today were developed over the centuries in attempts to get at the meaning – hidden, metaphorical or otherwise – of various religious texts. Many of the greatest advances in communication technology have also taken place in the effort to preserve and transmit religious texts, from the astonishingly accurate oral transmission of the Hindu Vedas to the legends of Egyptian deities depicted in hieroglyphs, and on through the block printing of the Buddhist sutras and later the movable type of Gutenberg’s Bible. Many people’s first encounter with the radio was through hearing a preacher’s voice emanating from the speaker, and in contemporary times over a quarter of Americans regularly use the Internet to find information about religion (Pew Foundation 2001). It is now possible to use computers and other related digital technologies to help in the hermeneutical enterprise. This chapter will focus on some of the more popular computational language processing and text mining techniques and explain how they can be used to further our understanding of Buddhist texts and reveal new perspectives on their meaning.
A human scholar might read all of the words in a passage and examine their individual meanings, then consider the context in which the words occur, what is known about the author, the historical circumstances surrounding the creation of the text, and perform many other intellectual maneuvers in order to understand the passage. A computer, on the other hand, is not able at its current stage of development to understand the passage in the same way. However, computers are able to digest enormous amounts of text – millions upon millions of words that would take many lifetimes for a human to read – and apply various algorithms to that text in order to find relationships between words, hidden patterns, and stylistic features that are not immediately evident to a human reader. As John Burrows, an important pioneer of this kind of analysis, states,
Statistical analysis is necessary for the management of words that occur too frequently to be studied one by one… they constitute the underlying fabric of a text, a barely visible web that gives shape to whatever is being said… An appropriate analogy, perhaps, is with the contrast between handwoven rugs where the russet tones predominate and those where they give way to the greens and blues. The principal point of interest is neither a single stitch, a single thread, nor even a single color, but the overall effect (Burrows 2004, 323–324).
These digital techniques are not intended to replace human readers; rather, they are best used in tandem with the insights gained by close human reading, for the human scholar is invariably forced to draw conclusions about a corpus based on only a sampling of the texts within it. Ultimately, of course, all literary analysis depends upon massive processing of data and detection of trends. The traditional way, however, relies upon years of research collected in the head of the human critic, who over time develops the ability to detect meaningful patterns reliably, whereas a computer does this explicitly and in an instant. A human critic, in other words, is never just reading one document in isolation, but is processing that document through a neural net constructed in her own brain from previous readings of hundreds or thousands of documents that have left a latent impression. As Burrows puts it,
literary analysis often rests upon seemingly intuitive insights and discriminations, processes that may seem remote from the gathering and combining and classifying on which [digital humanities] have concentrated and in which computational stylistics is usually engaged. But those insights and discriminations are not ultimately intuitive because they draw, albeit covertly, upon data gathered in a lifetime’s reading, stored away in a subconscious memory bank, and put to use, as Samuel Johnson reminds us, through processes of comparison and classification, whether tacit or overt (Burrows 2004, 344).
Digital techniques can help scholars expand the range of data they are able to examine beyond what close reading alone allows. Insights gained from an initial close reading can be confirmed or contradicted by statistical analysis of the entire corpus, and new insights gained from the mechanical reading process can in turn be cycled back and checked against the texts through further close reading. Ideally, therefore, the two approaches should be used to complement each other. In this chapter, I will examine a few of the more popular statistically based methods and provide some examples of how these techniques can be used to discover new insights about Buddhist texts.
The techniques that will be examined in this chapter have been used for some time in the fields of Digital Humanities, Machine Learning and Natural Language Processing, and there are several publicly available systems that can be used to deploy them. These techniques are Term Frequency-Inverse Document Frequency (TF-IDF), Collocation Analysis and Vector Space Semantic Mapping.1 Each of these techniques is able to process very large amounts of text and look for relations between words that can tell us a great deal about the overall topic of a text and the different ways words are used within it.
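As a brief preview of the kind of computation these techniques involve, the following sketch calculates a TF-IDF score by hand for a single word. The three toy “documents” are loose English paraphrases supplied purely for illustration, and the corpus, document names and helper functions are my own assumptions rather than part of any particular system.

```python
import math

# Three tiny toy "documents": loose English paraphrases used only
# to illustrate the arithmetic, not actual corpus files.
docs = {
    "metta": "may all beings be happy may all beings be free from suffering",
    "fire_sermon": "all is burning the eye is burning forms are burning",
    "dhammapada_1": "mind precedes all mental states mind is their chief",
}
tokenized = {name: text.split() for name, text in docs.items()}

def tf(term, tokens):
    # Term frequency: how often the term occurs in one document,
    # normalized by that document's length.
    return tokens.count(term) / len(tokens)

def idf(term, corpus):
    # Inverse document frequency: rarer terms (appearing in fewer
    # documents) receive a higher weight.
    containing = sum(1 for toks in corpus.values() if term in toks)
    return math.log(len(corpus) / containing)

# "burning" is frequent in the Fire Sermon paraphrase and absent from
# the other documents, so its TF-IDF score there is relatively high,
# flagging it as a distinctive topic word for that text.
score = tf("burning", tokenized["fire_sermon"]) * idf("burning", tokenized)
print(f"TF-IDF of 'burning' in fire_sermon: {score:.3f}")
```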
Of course, for any of these techniques to work, the text that one wishes to examine must be machine-readable and properly formatted. The first task, then, is to identify a good machine-readable version of the text one wishes to analyze. Ideally, the text should be in a raw text form, such as a file ending in .txt. Various transformations must then be performed on the text during the preprocessing phase, including sentence boundary detection, punctuation cleansing, stemming and normalization of spelling. Punctuation marks can cause a great deal of confusion for the algorithms and skew the results significantly if they are not dealt with properly. For example, in the sentence “The Buddha taught the Dharma, and the Dharma lives on today in many forms” we would want the computer to recognize that “Dharma,” (note the comma) and “Dharma” are the same term. Although this might seem straightforward, a number of complicated issues arise that must be resolved, because the punctuation may sometimes carry important semantic meaning, as in a hyphenated word, so that removing it will lead the computer down the wrong path. One of the benefits of working with an extremely large corpus, however, is that in many cases these issues resolve themselves, as the number of correct hits far outweighs the number of improperly parsed terms. Stemming involves associating the different forms of a word with the same stem or lemma, which, again, can greatly skew the results if not done correctly. Should plural and singular forms of the same noun be associated with each other, for example, so that three occurrences of the word “ox” and two of “oxen” would count as five occurrences of the lemma “ox”? What about different tenses of the same verb? It is also important to associate contractions with the correct long form, for example “isn’t” with “is not.” These are all questions that need to be resolved, although the answers may differ depending on the nature of the text one is dealing with and the kinds of questions one wishes to ask.2
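To make these preprocessing steps concrete, here is a minimal sketch of how they might be implemented for an English translation. The tiny lemma table, the contraction list and the function name are illustrative assumptions; a real project would use a full stemmer or lemmatizer suited to the language of the corpus.

```python
import re

# A tiny hand-made lemma table standing in for a real stemmer or
# lemmatizer: "oxen" maps to "ox" so both forms count toward one lemma.
LEMMAS = {"oxen": "ox", "taught": "teach", "lives": "live"}

# A few contractions expanded to their long forms, e.g. "isn't" -> "is not".
CONTRACTIONS = {"isn't": "is not", "don't": "do not"}

def preprocess(text):
    text = text.lower()
    # Expand contractions before stripping punctuation, so the
    # apostrophe is not lost prematurely.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove punctuation so that "dharma," and "dharma" become the same
    # token; hyphens are kept here because they can carry semantic meaning.
    text = re.sub(r"[^\w\s-]", " ", text)
    tokens = text.split()
    # Map each token to its lemma if the table has one, otherwise keep it.
    return [LEMMAS.get(tok, tok) for tok in tokens]

print(preprocess("The Buddha taught the Dharma, and the Dharma lives on today."))
# ['the', 'buddha', 'teach', 'the', 'dharma', 'and', 'the', 'dharma',
#  'live', 'on', 'today']
```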
Associated with these issues is the question of determining the size of the units that one wants to examine in the analysis. One may wish to process each word separately, or one may wish to process 2, 3, 4 or more words together in order to retain the meaning of phrases, as Chris Handy discusses in his chapter herein. For example, the phrase “the four noble truths” would obviously be processed very differently by an algorithm that allows for 4-gram phrases than by one that just looks at each word individually. There is no single “correct” way to process texts, and the determination of the number of words to be examined as a unit is up to the researcher, with trial and error often being the best or even the only way of knowing which works better. Much depends on exactly what the purpose of the analysis in question is. For some lines of research, a uni-gram parse might be best, and for others, a multi-gram parse might fare better. The results, as any responsible Digital Humanities scholar will admit, always have to be judged and tweaked in light of the learned opinion of the researcher. There will almost always be results in any language processing or data mining project that do not seem to make any sense and can be discarded. It is important, however, at least to try to understand why the system produced the problematic output, because therein might lie some of the most useful insights, precisely because they go against what was previously held to be the case.
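The following short sketch illustrates the difference such a choice makes by extracting n-grams of a chosen size from a token list; the function name and the example sentence are my own, supplied only to show how a uni-gram parse and a 4-gram parse treat the same phrase.

```python
def ngrams(tokens, n):
    """Return all n-grams in a token sequence as tuples,
    sliding a window of size n one token at a time."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the four noble truths were taught at sarnath".split()

# A uni-gram parse treats each word as a separate unit...
print(ngrams(tokens, 1))  # [('the',), ('four',), ('noble',), ('truths',), ...]

# ...while a 4-gram parse keeps "the four noble truths" together
# as a single unit of analysis.
print(ngrams(tokens, 4))  # [('the', 'four', 'noble', 'truths'), ...]
```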
For Buddhist studies, there are many sources of digitized texts that can be used...