eBook - ePub

Indexing

ISBN: 9781780633411

From Thesauri to the Semantic Web

Piet de Keyser

272 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About this book

Indexing consists of both novel and more traditional techniques. Cutting-edge indexing techniques, such as automatic indexing, ontologies, and topic maps, were developed independently of older techniques such as thesauri, but it is now recognized that these older methods also hold expertise. Indexing describes various traditional and novel indexing techniques, giving information professionals and students of library and information sciences a broad and comprehensible introduction to indexing. This title consists of twelve chapters: an Introduction to subject headings and thesauri; Automatic indexing versus manual indexing; Techniques applied in automatic indexing of text material; Automatic indexing of images; The black art of indexing moving images; Automatic indexing of music; Taxonomies and ontologies; Metadata formats and indexing; Tagging; Topic maps; Indexing the web; and The Semantic Web.

- Makes difficult and complex techniques understandable
- Contains many links to and illustrations from websites where new indexing techniques can be experienced
- Provides references for further reading

Publisher

Chandos Publishing

Year

2012

eBook ISBN

9781780633411

Topic

Design

Subtopic

Computer Science General

Introduction to subject headings and thesauri

Abstract:

This chapter provides an introduction to traditional controlled vocabularies, i.e. subject headings and thesauri. Both are widely used in libraries, but only for thesauri are the standards still being updated. The main difference between the two is that subject headings are precoordinate and thesauri are postcoordinate. Notwithstanding this, basic rules can be formulated that apply to both types. This chapter also deals with some practical aspects of controlled vocabularies, i.e. where they can be found and how they can be created or maintained. The purpose of this chapter is not to treat controlled vocabularies in depth, but to give the reader a general overview as a reference point for the next chapters.

Key words

controlled vocabularies

thesauri

subject headings

thesaurus software

precoordinate

postcoordinate

Finally, the thesaurus is like a taxonomy on steroids.

(Gene Smith [1])

Introduction

Libraries use more than one system to tell their patrons what a document is about – and they mostly use a mix of different instruments. A traditional library, whose main activity consists of collecting books and keeping them at the disposal of the public, will classify them according to a classification scheme, e.g. the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC) or the Library of Congress Classification (LCC). In a classification each subject is represented by a code; complex subjects may be expressed by a combination of codes. In principle this should be enough to express the contents of a document, and a flexible classification, e.g. UDC, allows each subject to be expressed adequately, no matter how specific it may be.

The reality, however, is that libraries see classification mainly as an instrument to arrange their books on the shelves and as the basis for the call number system. As a consequence, a rich and very detailed classification like UDC is reduced to a scheme with broad classes, for the simple reason that the long string of numbers and characters of a detailed UDC code does not fit onto a relatively small book label. Moreover, every librarian knows that only a few readers have any idea what is hidden behind the notations of the library classification. Frankly, the readers do not care; they just want to know where to find the book they need.

In order to convey what a document is about, most libraries also describe its content in words, which they find in a list of ‘subject headings’ or in a ‘thesaurus’. Both are called ‘controlled vocabularies’, as opposed to ‘non-controlled’ vocabularies, i.e. keywords assigned to documents that are based neither on a predefined list nor on any standard.
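The distinction between controlled and non-controlled terms can be illustrated with a minimal sketch. The vocabulary below is a hypothetical three-term list invented for illustration, not an excerpt from any real subject heading list or thesaurus, and the function name `split_terms` is likewise an assumption.

```python
# Hypothetical controlled vocabulary: a tiny list of approved subject terms.
# A real list of subject headings or a thesaurus would be far larger.
CONTROLLED_VOCABULARY = {
    "data protection",
    "electronic data processing",
    "data integrity",
}


def split_terms(assigned_terms, vocabulary=CONTROLLED_VOCABULARY):
    """Separate assigned terms into controlled and non-controlled ones."""
    controlled, uncontrolled = [], []
    for term in assigned_terms:
        normalized = term.strip().lower()
        if normalized in vocabulary:
            controlled.append(normalized)  # found in the predefined list
        else:
            uncontrolled.append(normalized)  # free keyword, no list behind it
    return controlled, uncontrolled


controlled, uncontrolled = split_terms(
    ["Data integrity", "privacy", "Data protection"]
)
```

In this sketch a term counts as controlled only if it appears in the predefined list; everything else is treated as a free keyword.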

Online databases also use more than one instrument to tell their public something about the subject of the documents they contain. Let us look at an example from the LISTA (Library, Information Science and Technology Abstracts) database, a database about library science (http://www.libraryresearch.com). An entry for an article by David Erdos entitled ‘Systematically handicapped: social research in the data protection framework’, published in the journal Information & Communications Technology Law in 2011, is enriched with four controlled subject terms:

Data protection–Law and legislation

Electronic data processing

Data integrity

Computer security software

It also gets one ‘geographic term’ (‘Great Britain’) and no fewer than 14 ‘author-supplied keywords’: ‘academic freedom’, ‘covert research’, ‘data export’, ‘data minimization’, ‘data protection’, ‘ethical review’, ‘freedom of expression’, ‘historical research’, ‘informational self-determination’, ‘personal data’, ‘privacy’, ‘regulation’, ‘research governance’, ‘subject access’. Moreover, the database contains an abstract of more than ten lines of text. A lot could be said about the relations between these four kinds of indexing for one and the same article, and they have indeed been the subject of a few studies. From an economic point of view we could easily jump to the conclusion that articles with abstracts can do with fewer added index terms, because the abstracts would contain many significant words or word combinations which would make excellent search keys. Mohammad Tavakolizadeh-Ravari, an Iranian scholar who presented his dissertation at the Humboldt University in Berlin, showed that the opposite is true. For Medline, the world’s leading medical database, he calculated that articles with abstracts received more subject headings than those without abstracts [2]. The indexers probably regard those articles as more important, and the abstracts help them in finding suitable index terms. The possible economic arguments seem to be of no importance.
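The different kinds of indexing attached to this one entry can be pictured as a simple record. The sketch below is only an illustration: the class `IndexedRecord`, its field names and the label `"Erdos 2011"` are assumptions and do not reflect the actual LISTA data model, and the abstract itself is not reproduced, only flagged as present.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexedRecord:
    """One database entry with its separate kinds of subject indexing."""
    label: str  # hypothetical short identifier for the entry
    subject_terms: List[str] = field(default_factory=list)
    geographic_terms: List[str] = field(default_factory=list)
    author_keywords: List[str] = field(default_factory=list)
    has_abstract: bool = False  # the abstract text itself is not stored here

    def term_counts(self) -> Dict[str, int]:
        """Count the terms supplied by each kind of indexing."""
        return {
            "subject_terms": len(self.subject_terms),
            "geographic_terms": len(self.geographic_terms),
            "author_keywords": len(self.author_keywords),
        }


record = IndexedRecord(
    label="Erdos 2011",
    subject_terms=[
        "Data protection–Law and legislation",
        "Electronic data processing",
        "Data integrity",
        "Computer security software",
    ],
    geographic_terms=["Great Britain"],
    author_keywords=[
        "academic freedom", "covert research", "data export",
        "data minimization", "data protection", "ethical review",
        "freedom of expression", "historical research",
        "informational self-determination", "personal data", "privacy",
        "regulation", "research governance", "subject access",
    ],
    has_abstract=True,
)
counts = record.term_counts()
```

Keeping each kind of term in its own field makes it possible to compare, for example, how many controlled subject terms an entry receives with and without an abstract.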

In this era of social networking, libraries offer their public the possibility to add their personal ‘tags’ to the description of documents in the catalogue. This opens the door for uncontrolled indexing, and although libraries embrace the interaction with the public and although they complain about the costs of controlled subject indexing, they still feel the need to provide controlled subject terms in their catalogues. Social tagging is – at this moment – just one more additional indexing method. In this book we will deal with uncontrolled indexing in more than one way, bu...

Cover image
Title page
Table of Contents
Copyright
List of figures
List of abbreviations
Preface
About the author
Chapter 1: Introduction to subject headings and thesauri
Chapter 2: Automatic indexing versus manual indexing
Chapter 3: Techniques applied in automatic indexing of text material
Chapter 4: Automatic indexing of images
Chapter 5: The black art of indexing moving images
Chapter 6: Automatic indexing of music
Chapter 7: Taxonomies and ontologies
Chapter 8: Metadata formats and indexing
Chapter 9: Tagging
Chapter 10: Topic Maps
Chapter 11: Indexing the web
Chapter 12: The Semantic Web
Index
