February 14, 2005

Should I use LCSH terms in my metadata records?

Librarians usually think of authority control in the context of the library catalog, where an elaborate set of rules and a long history of consensus building and professional discipline have resulted in a relatively high degree of uniformity in practice and understanding. Other communities, perceiving the value ascribed to authorized vocabularies, have shown an interest in using these vocabularies and, presumably, in adding the value inherent in them to other resource discovery tools and databases. But where exactly does the “added value” of the terms in a controlled vocabulary reside? (What follows focuses on subject vocabularies, but could also be applied to other kinds of controlled vocabularies.)

One simple notion is that the added value resides in the terms themselves. There is some logic to this. The terms in an authorized list like Library of Congress Subject Headings (LCSH) have been selected from a number of synonyms as being the “best” in some sense (e.g., most used by practitioners of a field). Automatic validation of a database’s terms against the source list can ensure formal consistency with the approved terminology. Declaring that a database’s terms adhere to a known authorized list can in principle enable searchers looking across databases to use a single controlled vocabulary. It would appear that all of these good things can be achieved simply by picking terms from an authorized list. This could be called formal adherence to a controlled vocabulary.
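
The kind of automatic validation described above might look something like the following minimal sketch. It assumes the authorized list is available as a plain set of strings; the set shown here is a tiny illustrative excerpt, and the function names and the unauthorized variant "Aeroplanes" are made up for the example, not taken from any real tool.

```python
# Hypothetical excerpt of an authorized list. A real check would load
# the full vocabulary from its source; this set is for illustration only.
AUTHORIZED_HEADINGS = {
    "Airplanes",
    "Airplanes, Military",
    "Bombers",
    "Jet bombers",
    "B-1 bomber",
}


def formally_valid(heading):
    """Return True if the heading exactly matches an authorized form."""
    return heading.strip() in AUTHORIZED_HEADINGS


def validate_record(headings):
    """Split a record's headings into (valid, invalid) by form alone."""
    valid = [h for h in headings if formally_valid(h)]
    invalid = [h for h in headings if not formally_valid(h)]
    return valid, invalid


if __name__ == "__main__":
    # "Aeroplanes" is an invented unauthorized variant for this example.
    print(validate_record(["Airplanes", "B-1 bomber", "Aeroplanes"]))
```

A check like this tests only the form of a term. It says nothing about whether the term was the right one to assign to the resource.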

The problem with formal adherence is that it sidesteps the question of what terms mean. Only the simplest controlled term lists use terms that are fully mutually exclusive. In most cases, the relationships between terms are more complex. In LCSH, two main devices are used to provide logical hierarchies of terms. Broad terms are assigned one or more subdivisions to narrow the meaning of a heading. Related terms are linked by cross references in broader/narrower hierarchies, again to specify a range of broad to narrow meanings. The specific meaning and correct application of a term in this kind of system depend on an understanding of the system’s principle of using the most specific terminology appropriate for describing a particular resource, and of how each term relates to the larger list. This could be called semantic adherence. Semantic adherence to a controlled vocabulary goes beyond formal adherence in that it seeks consistency with the meaning of terms as defined in the authorizing source, as well as with their forms.

Semantic adherence is more complicated than formal adherence. The specific meanings of terms are not routinely expressed in the authority records which make up a controlled vocabulary, and are rarely conveyed in a simple list of the terms. They are not typically available to automatic term matching algorithms, and cannot typically be validated by machine tests. They may not be understood by users of the list who are not informed about the rules and discipline of the community that prepared the list. For all these reasons, it is doubtful that simple formal adherence to a controlled vocabulary will equate with semantic adherence to the same vocabulary.

The question then becomes, how much does the added value of a controlled term depend simply on its form, and how much does it depend on both the term’s form and its semantics? Consider two examples:

The diary of an American Civil War Confederate soldier might be assigned any of the following controlled LCSH headings, all of which are formally valid:
 United States
 United States—History
 United States—History—Civil War, 1861-1865
 United States—History—Civil War, 1861-1865—Personal narratives
 United States—History—Civil War, 1861-1865—Personal narratives, Confederate

A collection of photos of the B-1 bomber might be assigned any of the following controlled LCSH headings, all formally valid:
 Airplanes
 Government aircraft
 Airplanes, Military
 Bombers
 Strategic bombers
 Jet bombers
 Supersonic bombers
 B-1 bomber

In the context of a particular database, any of these levels of specificity might seem more or less appropriate for describing the specified content. However, to be consistent with LCSH semantics and rules of application, only the last and most specific heading in each list would be appropriate. If semantic adherence is not undertaken by the creators of a set of databases, how much of the value of the controlled terms will be preserved for users attempting to search across those databases with the controlled vocabulary?
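
To make the gap between formal and semantic adherence concrete, here is a rough sketch of a specificity check. The broader-term links below are a simplified, hypothetical chain loosely based on the B-1 bomber example; they are not the actual LCSH reference structure, and the function names are invented for illustration.

```python
# Hypothetical, simplified broader-term links (narrower -> broader).
# This is NOT the real LCSH reference structure, and it assumes the
# links contain no cycles.
BROADER = {
    "Airplanes, Military": "Airplanes",
    "Bombers": "Airplanes, Military",
    "Jet bombers": "Bombers",
    "B-1 bomber": "Jet bombers",
}


def ancestors(heading):
    """Return every heading broader than the given one, nearest first."""
    chain = []
    while heading in BROADER:
        heading = BROADER[heading]
        chain.append(heading)
    return chain


def too_broad(assigned, most_specific):
    """True if the assigned heading is broader than the most specific
    heading that fits the resource."""
    return assigned in ancestors(most_specific)


if __name__ == "__main__":
    print(ancestors("B-1 bomber"))
    print(too_broad("Bombers", "B-1 bomber"))
    print(too_broad("B-1 bomber", "B-1 bomber"))
```

Note that the check needs the most specific fitting heading as an input. Deciding what that heading is for a given resource remains a matter of cataloger judgment, which is the part a machine test cannot easily supply.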

User satisfaction with searching across databases which share only formal adherence to a particular controlled vocabulary is a question that can only be properly settled by research. However, in principle the value of controlled terms does not reside simply in their “valid” form. It depends also and crucially on their semantics. The added value of authority control thus depends on:
 acknowledgment of an authoritative source for the form and meaning of terms
 discipline in the application of terms

As agencies and database creators seek to expand the utility of controlled vocabularies for users outside their source communities, there is an increased need to assign explicit definitions to authority records and to inform outside users of their rules of application. This need for more explicit semantics in authority records was recognized by Francoise Bourdon in her book International cooperation in the field of authority data (Saur, 1993) in relation to name authority files; it is just as true for subjects, and for other kinds of controlled vocabularies.

Posted by s-hear at February 14, 2005 12:10 PM