Indexing
eBook - ePub

Indexing

From Thesauri to the Semantic Web

Piet de Keyser

  1. 272 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Indexing consists of both novel and more traditional techniques. Cutting-edge indexing techniques, such as automatic indexing, ontologies, and topic maps, were developed independently of older techniques such as thesauri, but it is now recognized that these older methods also hold expertise. Indexing describes various traditional and novel indexing techniques, giving information professionals and students of library and information sciences a broad and comprehensible introduction to indexing. This title consists of twelve chapters: Introduction to subject headings and thesauri; Automatic indexing versus manual indexing; Techniques applied in automatic indexing of text material; Automatic indexing of images; The black art of indexing moving images; Automatic indexing of music; Taxonomies and ontologies; Metadata formats and indexing; Tagging; Topic maps; Indexing the web; and The Semantic Web.

  • Makes difficult and complex techniques understandable
  • Contains many links to and illustrations from websites where new indexing techniques can be experienced
  • Provides references for further reading

Information

Year: 2012
ISBN: 9781780633411
Topic: Design
Subtopic: UI/UX Design
1 Introduction to subject headings and thesauri

Abstract:

This chapter provides an introduction to traditional controlled vocabularies, i.e. subject headings and thesauri. Both are widely used in libraries, but standards are still being updated only for thesauri. The main difference between the two is that subject headings are precoordinate and thesauri are postcoordinate. Notwithstanding this, basic rules can be formulated that apply to both types. This chapter also deals with some practical aspects of controlled vocabularies, i.e. where they can be found and how they can be created or maintained. The purpose of this chapter is not to treat controlled vocabularies in depth, but to give the reader a general overview as a reference point for the next chapters.
Key words: controlled vocabularies, thesauri, subject headings, thesaurus software, precoordinate, postcoordinate

Finally, the thesaurus is like a taxonomy on steroids.
(Gene Smith [1])

Introduction

Libraries use more than one system to tell their patrons what a document is about – and they mostly use a mix of different instruments. A traditional library, whose main activity consists of collecting books and keeping them at the disposal of the public, will classify them according to a classification scheme, e.g. the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC), or the Library of Congress Classification (LCC). In a classification each subject is represented by a code; complex subjects may be expressed by a combination of codes. In principle this should be enough to express the contents of a document, and a flexible classification, e.g. UDC, allows each subject to be expressed adequately, no matter how specific it may be.
The reality, however, is that libraries see classification mainly as an instrument to arrange their books on the shelves and as the basis for the call number system. As a consequence, a rich and very detailed classification like UDC is reduced to a scheme with broad classes, for the simple reason that the long string of numbers and characters of a detailed UDC code does not fit onto a relatively small book label. Moreover, every librarian knows that only a few readers have any idea what is hidden behind the notations of the library classification. Frankly, the readers do not care; they just want to know where to find the book they need.
In order to convey what a document is about, most libraries also describe its content in words, which they find in a list of ‘subject headings’ or in a ‘thesaurus’. Both are called ‘controlled vocabularies’, as opposed to ‘non-controlled’ vocabularies, i.e. keywords assigned to documents that are neither drawn from a predefined list nor based on any standard.
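As a rough illustration of this distinction, a controlled vocabulary can be pictured as a predefined list against which candidate terms are checked. The Python sketch below is hypothetical: the vocabulary contents and the function name `split_terms` are assumptions made for the example and do not correspond to any actual library system.

```python
# Hypothetical controlled vocabulary: a small predefined list of approved terms.
CONTROLLED_VOCABULARY = {
    "data protection",
    "electronic data processing",
    "data integrity",
}


def split_terms(candidate_terms, vocabulary=CONTROLLED_VOCABULARY):
    """Separate candidate terms into controlled and non-controlled ones.

    A term counts as controlled only if its normalized form (trimmed,
    lower-cased) appears in the predefined vocabulary.
    """
    controlled, uncontrolled = [], []
    for term in candidate_terms:
        normalized = term.strip().lower()
        if normalized in vocabulary:
            controlled.append(normalized)
        else:
            uncontrolled.append(normalized)
    return controlled, uncontrolled


controlled, uncontrolled = split_terms(
    ["Data integrity", "privacy", "Data protection"]
)
```

Terms that fall outside the list, such as ‘privacy’ here, are exactly what a non-controlled keyword looks like: nothing constrains their form or choice.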
Online databases also use more than one instrument to tell their public something about the subject of the documents they contain. Let us look at an example from the LISTA (Library, Information Science and Technology Abstracts) database, a database about library science (http://www.libraryresearch.com). An entry for an article by David Erdos entitled ‘Systematically handicapped: social research in the data protection framework’, published in the journal Information & Communications Technology Law in 2011, is enriched with four controlled subject terms:
Data protection–Law and legislation
Electronic data processing
Data integrity
Computer security software
It also gets one ‘geographic term’ (‘Great Britain’) and no fewer than 14 ‘author-supplied keywords’: ‘academic freedom’, ‘covert research’, ‘data export’, ‘data minimization’, ‘data protection’, ‘ethical review’, ‘freedom of expression’, ‘historical research’, ‘informational self-determination’, ‘personal data’, ‘privacy’, ‘regulation’, ‘research governance’, ‘subject access’. Moreover, the database contains an abstract of more than ten lines of text. A lot could be said about the relations between these four kinds of indexing for one and the same article, and they have indeed been the subject of a few studies. From an economic point of view we could easily jump to the conclusion that articles with abstracts can do with fewer added index terms, because the abstracts would contain many significant words or word combinations which would make excellent search keys. Mohammad Tavakolizadeh-Ravari, an Iranian scholar who presented his dissertation at the Humboldt University in Berlin, showed that the opposite is true. For Medline, the world’s leading medical database, he calculated that articles with abstracts received more subject headings than those without abstracts [2]. The indexers probably regard those articles as more important, and the abstracts help them in finding suitable index terms. The possible economic arguments seem to be of no importance.
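The LISTA entry described above can be pictured as a single record carrying several separate kinds of subject information. The following Python sketch is only an illustration: the class name `IndexedRecord` and its field names are hypothetical and do not reflect the database's actual schema, and the abstract text is left empty because it is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedRecord:
    """Hypothetical model of one database entry and its kinds of indexing."""
    author: str
    year: int
    subject_terms: list = field(default_factory=list)     # controlled terms
    geographic_terms: list = field(default_factory=list)
    author_keywords: list = field(default_factory=list)   # uncontrolled terms
    abstract: str = ""                                     # not reproduced here

    def term_counts(self):
        """Return how many terms each kind of indexing contributes."""
        return {
            "subject_terms": len(self.subject_terms),
            "geographic_terms": len(self.geographic_terms),
            "author_keywords": len(self.author_keywords),
        }


record = IndexedRecord(
    author="David Erdos",
    year=2011,
    subject_terms=[
        "Data protection–Law and legislation",
        "Electronic data processing",
        "Data integrity",
        "Computer security software",
    ],
    geographic_terms=["Great Britain"],
    author_keywords=[
        "academic freedom", "covert research", "data export",
        "data minimization", "data protection", "ethical review",
        "freedom of expression", "historical research",
        "informational self-determination", "personal data", "privacy",
        "regulation", "research governance", "subject access",
    ],
)
counts = record.term_counts()
```

Keeping the controlled terms and the author-supplied keywords in separate fields makes it easy to compare how much each kind of indexing contributes to a record.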
In this era of social networking, libraries offer their public the possibility of adding their personal ‘tags’ to the description of documents in the catalogue. This opens the door to uncontrolled indexing. Although libraries embrace the interaction with the public and complain about the costs of controlled subject indexing, they still feel the need to provide controlled subject terms in their catalogues. Social tagging is – at this moment – just one more indexing method. In this book we will deal with uncontrolled indexing in more than one way, bu...

Citation styles for Indexing

APA 6 Citation

Keyser, P. (2012). Indexing ([edition unavailable]). Elsevier Science. Retrieved from https://www.perlego.com/book/1835212/indexing-from-thesauri-to-the-semantic-web-pdf (Original work published 2012)

Chicago Citation

Keyser, Piet. (2012) 2012. Indexing. [Edition unavailable]. Elsevier Science. https://www.perlego.com/book/1835212/indexing-from-thesauri-to-the-semantic-web-pdf.

Harvard Citation

Keyser, P. (2012) Indexing. [edition unavailable]. Elsevier Science. Available at: https://www.perlego.com/book/1835212/indexing-from-thesauri-to-the-semantic-web-pdf (Accessed: 15 October 2022).

MLA 7 Citation

Keyser, Piet. Indexing. [edition unavailable]. Elsevier Science, 2012. Web. 15 Oct. 2022.