Part I:
Semantic Web Foundations, Standards, and Tools
The Birth of the New Web: A Foucauldian Reading of the Semantic Web
D. Grant Campbell
SUMMARY. Foucault’s The Birth of the Clinic serves as a pattern for understanding the paradigm shifts represented by the Semantic Web. Foucault presents the history of medical practice as a 3-stage sequence of transitions: from classificatory techniques to clinical strategies, and then to anatomico-pathological strategies. In this paper, the author removes these three stages both from their medical context and from Foucault’s historical sequence, to produce a model for understanding information organization in the context of the Semantic Web. We can extract from Foucault’s theory a triadic relationship between three interpretive strategies, all of them defined by their different relationships to a textual body: classification, description, and analysis. doi:10.1300/J104v43n03_02
[Article copies available for a fee from The Haworth Document Delivery Service: 1-800-HAWORTH. Web-site: <http://www.HaworthPress.com> © 2007 by The Haworth Press, Inc. All rights reserved.]
KEYWORDS. Semantic Web, Foucault, classification, discourse analysis
Introduction
The emergence of file-sharing networks, folksonomies, weblogs, instant messaging, RSS feeds, and other next-generation Web tools presents fresh challenges to libraries and other organizations devoted to organizing and providing access to information. These tools, increasingly classified as the “Web 2.0,” promise to create a new environment in which the Web serves as a platform offering services rather than software: services that place the user in control of the data through an architecture of participation, data mixing, and the harnessing of collective intelligence (O’Reilly 2005). Libraries, with their traditional emphasis on documents organized according to internationally-produced and centrally-administered standards of description and classification, seem to be the antithesis of this exciting new user-centered environment. Instead of controlled vocabularies and classification schemes, the Web 2.0 offers folksonomies: user-centered tagging systems that classify tags into data “clouds.” Instead of documents, the Web 2.0 offers BitTorrent, which breaks files into pieces and then reassembles them, and file sharing systems that enable one to download individual data files and shuffle them according to individual needs and whims.
This has happened before. Online databases initially promised to do away with controlled vocabularies; metadata initiatives initially promised to do away with library cataloguing. In these earlier cases, libraries survived not by ignoring each innovation but by negotiating a fresh role for themselves in relation to it. Online databases generated a need for rigorously designed thesauri (Williamson 1996, 156); metadata systems gave rise to the adaptation of cataloguing principles to serve metadata application profiles, such as the Dublin Core’s Library Application Profile (Clayphan & Guenther 2004); information architects found a new use for principles of faceted classification (Rosenfeld & Morville 2002). As libraries face a new generation of Web users and Web tools, the questions remain the same as for previous changes:
- What facets of the new information environment would be most receptive to collaboration with libraries and information professionals?
- What specific skills and tools in the library environment would be most useful, if reinvented for this new context?
The author has argued elsewhere (Campbell & Fast 2004, 382) that the Semantic Web initiatives of the World Wide Web Consortium offer the closest and most reasonable link between the emerging Web environment and library services. Envisioned by Tim Berners-Lee as the next step in the Web’s evolution, the Semantic Web offers a pyramid of information standards that would enable information to be machine-understandable as well as machine-readable. The Semantic Web will utilize semantic markup (XML), a standard for describing resources (Resource Description Framework), methods of reconciling differences in namespaces (ontologies), and methods of certifying authenticity through digital signatures. If fully realized, this new network will enable intelligent agents to extract documents on-the-fly that directly answer users’ individual questions: “Now we can imagine the world of people with active machines forming part of the infrastructure. … Search engines, from looking for pages containing interesting words, will start indexes of assertions that might be useful for answering questions or finding justifications” (Berners-Lee 2002, xvii).
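The resource-description layer mentioned above can be illustrated with a minimal sketch. It assumes nothing beyond the Python standard library and is not a real RDF implementation: statements are modelled as plain subject-predicate-object triples, the resource URI under example.org is hypothetical, and the `match` helper is an illustrative name introduced here. Only the Dublin Core element namespace is a real identifier.

```python
# Illustrative toy triple store; a real application would use an RDF library.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Dublin Core element namespace (real); the resource URI below is hypothetical.
DC = "http://purl.org/dc/elements/1.1/"
DOC = "http://example.org/docs/birth-of-the-clinic"

triples = [
    Triple(DOC, DC + "title", "The Birth of the Clinic"),
    Triple(DOC, DC + "creator", "Michel Foucault"),
    Triple(DOC, DC + "date", "1963"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# An "index of assertions" in miniature: ask who created a resource,
# rather than which pages contain a given word.
creators = [t.obj for t in match(triples, predicate=DC + "creator")]
```

The point of the sketch is only that each statement is an assertion that can be queried by its structure; this is the sense in which the data become machine-understandable rather than merely machine-readable.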
The presence of W3C representatives at important conferences such as the Dublin Core Workshop and the meetings of the American Society for Information Science and Technology suggests that the skills of librarianship and information management are a key component of the Semantic Web. Creating machine-understandable data that enables agents to draw inferences and extract data correctly requires disambiguation, recognition of context, vocabulary control, and categorization. Librarians know how to do this, and the World Wide Web is eager to draw on the library community’s skills.
Nonetheless, we know from experience that when skills and tools migrate into new environments, they change. Post-coordinate thesauri designed for use in online databases are far different from the Library of Congress Subject Headings; the Dublin Core metadata set is a far cry from the Anglo-American Cataloguing Rules. Libraries are already finding ways to adapt their powers of description to Semantic Web needs; OCLC’s Connexion service transforms MARC records into RDF Dublin Core records. But we need to understand how these adaptations will change the basic concepts underlying information organization and description, and how those changes will have an impact on our traditional practices of bibliographic control.
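The kind of transformation described above, from MARC fields to Dublin Core elements, can be sketched as a simple crosswalk. This is not OCLC Connexion's actual mapping: the `CROSSWALK` table is a simplified, hypothetical one, the record is a made-up minimal example with invented subject headings, and the function name `to_dublin_core` is introduced here for illustration.

```python
# Hypothetical, simplified crosswalk: (MARC tag, subfield code) -> Dublin Core element.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}

# Made-up minimal record; each field is (tag, {subfield code: value}).
marc_record = [
    ("100", {"a": "Foucault, Michel"}),
    ("245", {"a": "The birth of the clinic"}),
    ("260", {"c": "1963"}),
    ("650", {"a": "Medicine"}),
    ("650", {"a": "Clinical medicine"}),
]


def to_dublin_core(record):
    """Collect mapped subfield values under their Dublin Core element names."""
    dc = {}
    for tag, subfields in record:
        for code, value in subfields.items():
            element = CROSSWALK.get((tag, code))
            if element is not None:  # unmapped subfields are silently dropped
                dc.setdefault(element, []).append(value)
    return dc


dc_record = to_dublin_core(marc_record)
```

Even this toy version shows what the migration costs: anything the crosswalk does not name is discarded, and the fine distinctions carried by MARC tags and indicators collapse into a handful of broad elements.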
This paper attempts to establish a theoretical frame for comprehending this transition, by drawing on the archaeological theory of Michel Foucault. In particular, I will argue that Foucault’s analysis of clinical diagnosis in The Birth of the Clinic (1963) provides a triadic model of three different principles of description and diagnosis; these three principles, when extracted both from the historical context and from Foucault’s insistence on a historical progression, provide a useful understanding of the relationship between bibliographic control in libraries and resource description on the Semantic Web.
Background: Foucault’s Birth of the Clinic
The Birth of the Clinic marks Foucault’s first major work in his “archaeological” period: a period that also produced The Order of Things (1966) and The Archaeology of Knowledge (1969). LIS researchers frequently use The Archaeology of Knowledge, which questions, on a more general level, the ways in which discourse is produced within communities according to surfaces of emergence and institutional places which sanction authoritative statements. This is very useful for analyzing how libraries get trapped in their own discursive formations (Radford 2003, 16), or for analyzing how our conceptions of knowledge are based on pre-conceptual underpinnings (Hannabuss 1996, 87). The Birth of the Clinic, however, offers special advantages, in that it analyzes changes in knowledge development within a specific relationship–that of doctor and patient–which can translate more easily to the relationship between the information professional and the information user.
In The Birth of the Clinic, Foucault uses a close analysis of French medical texts in the late eighteenth and early nineteenth centuries to demonstrate a transition in the conceptualizations of medical diagnosis: a transition from diagnosis based on classification to diagnosis based on patient observation, and then to diagnosis based on dissection and anatomo-clinical theory. At the risk of missing Foucault’s entire point (that discourse is historically and culturally situated), I propose to extract this transition both from its historical context and from Foucault’s sequence. Instead, I will argue that information organization can be classified according to the simultaneous presence of these three factors in a triadic relationship of classification, description, and analysis, each of which maps to a different facet of the Semantic Web.
Knowledge Through Classification
In the eighteenth century, Foucault argues, doctors diagnosed disease by abstracting it from the patient’s body and assigning it a place in a grid of resemblances and differences. Understanding emerges from the spatial dimensions of the classification as “picture,” similar to that of the genealogical tree:
This organization … defines a fundamental system of relations involving envelopments, subordinations, divisions, resemblances. This space involves: a ‘vertical,’ in which the implications are drawn up–fever, ‘a successive struggle between cold and heat,’ may occur in a single episode or in several; …; and a ‘horizontal,’ in which the homologies are transferred– … what catarrh is to the throat, dysentery is to the intestines. (Foucault, 5)
As with traditional classification schemes, medicine in this stage relies for its work upon an external construct, similar to a classification schedule, which takes subjects and sets them into relationships of hierarchy and differentiation, relationships that exist apart from the appearance of the subjects in documents.
Like library classification schemes, with their principles of order, their standard subdivisions, tables of wide application and defined facets, the taxonomies of disease in the eighteenth century produce “a space in which analogies define essences. The pictures resemble things, but they also resemble each other” (Foucault, 6). And in both library and medical classifications, “the form of the similarity uncovers the rational order” (Foucault, 7). The structure of the ordering system enables knowledge discovery by revealing meaningful relationships.
On the Semantic Web, the classification principle exists most prominently in the need for ontologies: web-based tools for resolving different conceptions and usages of terms. Ranging in complexity from simple glossaries to complex tools capable of representing numerous types of relationship, ontologies serve the conventional purposes of controlled vocabularies and classification schemes: they can resolve different namespaces, recognize synonymous terms in different domains, and provide a basis for site navigation and support (McGuinness, 179). They also provide opportunities for other, more sophisticated actions, such as consistency checking, completing a user’s query in a logical fashion, interoperability support, validation of data, and supporting structured, comparative, and customized search (McGuinness, 181-84). Most of these uses imply that knowledge can be removed from its specific context, to some degree, and classified, mapped to other domains, and used to facilitate information and knowledge discovery.
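The simplest of these functions, recognizing synonymous terms across namespaces and using them to complete a query, can be sketched as follows. This is an assumption-laden toy rather than an ontology language: the namespaces `dc` and `lib`, the concept labels, and the helper names `concept_of`, `same_concept`, and `expand` are all hypothetical.

```python
# Hypothetical mapping from (namespace, term) to a shared concept label.
EQUIVALENT = {
    ("dc", "creator"): "concept:author",
    ("lib", "author"): "concept:author",
    ("dc", "subject"): "concept:topic",
    ("lib", "subject_heading"): "concept:topic",
}


def concept_of(namespace, term):
    """Return the shared concept for a term, or None if it is unknown."""
    return EQUIVALENT.get((namespace, term))


def same_concept(a, b):
    """True when two (namespace, term) pairs resolve to the same known concept."""
    ca, cb = concept_of(*a), concept_of(*b)
    return ca is not None and ca == cb


def expand(namespace, term):
    """Return every known (namespace, term) pair sharing the term's concept.

    Unknown terms are returned unchanged so a query still runs.
    """
    concept = concept_of(namespace, term)
    if concept is None:
        return [(namespace, term)]
    return sorted(key for key, value in EQUIVALENT.items() if value == concept)


# A query phrased in one vocabulary is completed with its equivalents.
query_terms = expand("dc", "creator")
```

The mapping lives outside any particular document, which is the classificatory assumption at work: terms are lifted out of their contexts and placed in a grid of equivalences before any individual resource is consulted.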
Knowledge Through Symptoms
In the early nineteenth century, Foucault argues, a new clinical approach to diagnosis and treatment supplanted the traditional classificatory approach. This clinical method places a much greater emphasis on symptoms as they occur in the patient: “There is no l...