Introduction
Libraries use more than one system to tell their patrons what a document is about, and most of them use a mix of different instruments. A traditional library, whose main activity consists of collecting books and keeping them at the disposal of the public, will classify them according to a classification scheme, e.g. the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC) or the Library of Congress Classification (LCC). In a classification each subject is represented by a code; complex subjects may be expressed by a combination of codes. In principle this should be enough to express the contents of a document, and a flexible classification such as UDC allows each subject to be expressed adequately, no matter how specific it may be.
The reality, however, is that libraries see classification mainly as an instrument to arrange their books on the shelves and as the basis for the call number system. As a consequence, a rich and very detailed classification like UDC is reduced to a scheme with broad classes, for the simple reason that the long string of numbers and characters of a detailed UDC code does not fit onto a relatively small book label. Moreover, every librarian knows that only a few readers have any idea what is hidden behind the notations of the library classification. Frankly, the readers do not care; they just want to know where to find the book they need.
In order to convey what a document is about, most libraries also describe its content in words, which they take from a list of ‘subject headings’ or from a ‘thesaurus’. Both are called ‘controlled vocabularies’, as opposed to ‘non-controlled’ vocabularies, i.e. keywords assigned to documents which are not taken from any predefined list and do not follow any standard.
Online databases also use more than one instrument to tell their public something about the subject of the documents they contain. Let us look at an example from LISTA (Library, Information Science and Technology Abstracts), a database covering library and information science (http://www.libraryresearch.com). An entry for an article by David Erdos entitled ‘Systematically handicapped: social research in the data protection framework’, published in the journal Information & Communications Technology Law in 2011, is enriched with four controlled subject terms:
Data protection--Law and legislation
Electronic data processing
Data integrity
Computer security software
It also gets one ‘geographic term’ (‘Great Britain’) and no fewer than 14 ‘author-supplied keywords’: ‘academic freedom’, ‘covert research’, ‘data export’, ‘data minimization’, ‘data protection’, ‘ethical review’, ‘freedom of expression’, ‘historical research’, ‘informational self-determination’, ‘personal data’, ‘privacy’, ‘regulation’, ‘research governance’, ‘subject access’. Moreover, the database contains an abstract of more than ten lines of text. A lot could be said about the relations between these four kinds of indexing for one and the same article, and they have indeed been the subject of a few studies. From an economic point of view we could easily jump to the conclusion that articles with abstracts can do with fewer added index terms, because the abstracts contain many significant words or word combinations which would make excellent search keys. Mohammad Tavakolizadeh-Ravari, an Iranian scholar who presented his dissertation at the Humboldt University in Berlin, showed that the opposite is true. For Medline, the world’s leading medical database, he calculated that articles with abstracts received more subject headings than those without abstracts [2]. The indexers probably regard those articles as more important, and the abstracts help them to find suitable index terms. The possible economic arguments seem to be of no importance.
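The four kinds of indexing attached to the LISTA entry above can be pictured as a simple record. The following is only an illustrative sketch: the class name IndexedRecord, its field names and the two helper functions are assumptions made for this example and do not reflect how LISTA actually stores its data. It holds the terms quoted above and checks which author-supplied keywords coincide with the main heading of a controlled subject term.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IndexedRecord:
    """Hypothetical container for the four kinds of indexing of one entry."""
    subject_terms: List[str] = field(default_factory=list)     # controlled vocabulary
    geographic_terms: List[str] = field(default_factory=list)  # controlled place names
    author_keywords: List[str] = field(default_factory=list)   # uncontrolled keywords
    abstract: str = ""                                         # free text (not reproduced here)


def main_headings(record: IndexedRecord) -> Set[str]:
    """Lower-cased main heading of each subject term (the part before '--')."""
    return {term.split("--")[0].strip().lower() for term in record.subject_terms}


def keywords_matching_headings(record: IndexedRecord) -> List[str]:
    """Author keywords that are identical to a controlled main heading."""
    headings = main_headings(record)
    return [kw for kw in record.author_keywords if kw.lower() in headings]


# Terms as quoted in the text for the article by David Erdos.
record = IndexedRecord(
    subject_terms=[
        "Data protection--Law and legislation",
        "Electronic data processing",
        "Data integrity",
        "Computer security software",
    ],
    geographic_terms=["Great Britain"],
    author_keywords=[
        "academic freedom", "covert research", "data export",
        "data minimization", "data protection", "ethical review",
        "freedom of expression", "historical research",
        "informational self-determination", "personal data", "privacy",
        "regulation", "research governance", "subject access",
    ],
)

print(keywords_matching_headings(record))  # ['data protection']
```

In this entry only one of the 14 author-supplied keywords is identical to a controlled main heading, which shows how little the two vocabularies overlap for this article. The exact-match comparison is deliberately crude; a real study of the relation between the indexing methods would need a more careful notion of similarity.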
In this era of social networking, libraries offer their public the possibility of adding personal ‘tags’ to the description of documents in the catalogue. This opens the door to uncontrolled indexing. Although libraries embrace the interaction with the public, and although they complain about the costs of controlled subject indexing, they still feel the need to provide controlled subject terms in their catalogues. Social tagging is, at this moment, just one more indexing method. In this book we will deal with uncontrolled indexing in more than one way, bu...