Librarians usually think of authority control in the context of the library catalog, where an elaborate set of rules and a long history of consensus building and professional discipline have resulted in a relatively high uniformity of practice and understanding. Other communities, perceiving the value ascribed to authorized vocabularies, have shown an interest in making use of these vocabularies and presumably of adding the value inherent in them to other resource discovery tools and databases. But where exactly does the “added value” of the terms in a controlled vocabulary reside? (What follows focuses on subject vocabularies, but could also be applied to other kinds of controlled vocabularies.)
One simple notion is that the added value resides in the terms themselves. There is some logic to this. The terms in an authorized list like Library of Congress Subject Headings (LCSH) have been selected from a number of synonyms as being the “best” in some sense (e.g., most used by practitioners of a field). Automatic validation of a database’s terms against the source list can ensure formal consistency with the approved terminology. Declaration of a database’s terms’ adherence to a known authorized list can in principle enable searchers looking across databases to use a single controlled vocabulary. It would appear that all of these good things can be achieved simply by picking terms from an authorized list. This could be called formal adherence to a controlled vocabulary.
The problem with formal adherence is that it sidesteps the question of what terms mean. Only the simplest controlled term lists use terms that are fully and mutually exclusive of one another. In most cases, the relationships between terms are more complex. In LCSH, two main devices are used to provide logical hierarchies of terms. Broad terms are assigned one or more subdivisions to narrow the meaning of a heading. Related terms are linked by cross references in broader/narrower hierarchies, again to specify a range of broad to narrow meanings. The specific meaning and correct application of a term in this kind of system depend on an understanding of the system’s principle of using the most specific terminology appropriate for describing a particular resource, and of how each term relates to the larger list. This could be called semantic adherence. Semantic adherence to a controlled vocabulary goes beyond formal adherence in that it seeks consistency with the meaning of terms as defined in the authorizing source, as well as with their forms.
Semantic adherence is more complicated than formal adherence. The specific meanings of terms are not routinely expressed in the authority records which make up a controlled vocabulary, and are rarely conveyed in a simple list of the terms. They are not typically available to automatic term matching algorithms, and typically cannot be validated by machine tests. They may not be understood by users of the list who are not informed about the rules and discipline of the community that prepared the list. For all these reasons, it is doubtful that simple formal adherence to a controlled vocabulary will equate with semantic adherence to the same vocabulary.
The question then becomes, how much does the added value of a controlled term depend simply on its form, and how much does it depend on both the term’s form and its semantics? Consider two examples:
The diary of an American Civil War Confederate soldier might be assigned any of the following controlled LCSH headings, all of which are formally valid:
United States
United States—History
United States—History—Civil War, 1861-1865
United States—History—Civil War, 1861-1865—Personal narratives
United States—History—Civil War, 1861-1865—Personal narratives, Confederate
A collection of photos of the B-1 bomber might be assigned any of the following controlled LCSH headings, all formally valid:
Airplanes
Government aircraft
Airplanes, Military
Bombers
Strategic bombers
Jet bombers
Supersonic bombers
B-1 bomber
In the context of a particular database, any of these levels of specificity might seem more or less appropriate for describing the specified content. However, to be consistent with LCSH semantics and rules of application, only the last and most specific terms would be appropriate. If semantic adherence is not undertaken by the creators of a set of databases, how much of the value of the controlled terms will be preserved for users attempting to search across those databases with the controlled vocabulary?
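The gap between the two kinds of adherence can be illustrated with a small sketch. It assumes a toy authorized list containing only the Civil War headings from the first example, written with "--" between subdivisions. The names CHAIN, NARROWER, is_formally_valid and most_specific are hypothetical, and nothing here reflects how LCSH is actually distributed or validated.

```python
# Toy authorized list: the Civil War headings from the example,
# ordered from broadest to narrowest. Subdivisions are joined with "--".
CHAIN = [
    "United States",
    "United States--History",
    "United States--History--Civil War, 1861-1865",
    "United States--History--Civil War, 1861-1865--Personal narratives",
    "United States--History--Civil War, 1861-1865--Personal narratives, Confederate",
]
AUTHORIZED = set(CHAIN)

# Hypothetical broader -> narrower links. A real vocabulary branches;
# this toy list is a single chain.
NARROWER = dict(zip(CHAIN, CHAIN[1:]))


def is_formally_valid(heading):
    """Formal adherence: the string appears in the authorized list."""
    return heading in AUTHORIZED


def most_specific(applicable):
    """Specificity rule: drop any heading whose narrower heading also applies.

    `applicable` is the list of headings a cataloger has judged to fit the
    resource. That judgment is the semantic step and is not computed here.
    """
    pool = set(applicable)
    return [h for h in applicable if NARROWER.get(h) not in pool]


if __name__ == "__main__":
    # Every candidate passes the formal test...
    print(all(is_formally_valid(h) for h in CHAIN))
    # ...but only one survives the specificity rule.
    print(most_specific(CHAIN))
```

The formal check accepts all five headings for the diary, while the specificity rule leaves only the last one. The input to most_specific, the decision about which headings actually describe the resource, still has to come from someone who understands the vocabulary.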
User satisfaction with searching across databases which share only formal adherence to a particular controlled vocabulary is a question that can only be properly settled by research. However, in principle the value of controlled terms does not reside simply in their “valid” form. It depends also and crucially on their semantics. The added value of authority control thus depends on:
acknowledgment of an authoritative source for the form and meaning of terms
discipline in the application of terms
As agencies and database creators seek to expand the utility of controlled vocabularies for users outside their source communities, there is an increased need to assign explicit definitions to authority records and to inform outside users of their rules of application. This need for more explicitly specific semantics in authority records was recognized by Francoise Bourdon in her book International cooperation in the field of authority data (Saur, 1993) in relation to name authority files; it is just as true for subjects, and for other kinds of controlled vocabularies.
The index regeneration that has done so much good for MNCAT (authorizing many additional headings, correcting problems from the first index gen for v14, finally making thousands of romanized Arabic, Russian, Indic and Icelandic headings searchable) has also in a few cases gone haywire. Aleph is not sensitive to tag differences; for example, a 400 or 430 reference can match to a 150 heading, resulting in the replacement of the initial subfield(s) of a topical heading with the 100 or 130 data from the mismatched authority. Not good; but fixable.
Some of these bad heading flips were present but undetected in the v14 MNCAT. Others may have newly occurred because the updated authority file included new authority records, or because the internal sequence numbers of the authority records changed. Aleph's heading checker essentially accepts the first authority guidance it encounters. Hence, in v16 Aleph may have encountered the wrong authority first, where in v14 it happened to encounter the correct authority first.
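A toy model may help show why record order matters. The sketch below is not Aleph's algorithm; it only imitates the two behaviors described above (matching that ignores tag differences, and accepting the first match encountered). The Authority type, the check_heading function, and the sample records are all invented for illustration.

```python
from typing import NamedTuple, Tuple


class Authority(NamedTuple):
    tag: str                                # tag of the established heading, e.g. "150"
    heading: str                            # text of the 1XX heading
    references: Tuple[Tuple[str, str], ...] = ()  # (4XX tag, text) pairs


def check_heading(text, kind, authorities, tag_sensitive=False):
    """Return the heading imposed by the first authority that matches `text`.

    `kind` is the last two digits of the expected tag ("50" for topical).
    With tag_sensitive=False, a 400 or 430 reference can match a topical
    heading, which is the failure described in the text.
    """
    for auth in authorities:
        for tag, value in ((auth.tag, auth.heading),) + tuple(auth.references):
            if value != text:
                continue
            if tag_sensitive and tag[1:] != kind:
                continue
            return auth.heading  # first match wins
    return text  # no authority matched; heading left as-is


# Invented records: a topical authority, and a name authority whose
# 400 reference happens to have the same text as the topical heading.
topical = Authority("150", "Sample topic")
name = Authority("100", "Sample, Author", (("400", "Sample topic"),))

old_order = [topical, name]   # correct authority encountered first
new_order = [name, topical]   # wrong authority encountered first

if __name__ == "__main__":
    print(check_heading("Sample topic", "50", old_order))
    print(check_heading("Sample topic", "50", new_order))
    print(check_heading("Sample topic", "50", new_order, tag_sensitive=True))
```

With the old ordering the topical heading is left alone; with the new ordering the same heading is replaced by the name heading. In this model, a tag-sensitive comparison gives the correct result regardless of order.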
Broadly speaking, these changes fall into two types. In some cases, a 4XX reference flips all instances of an unrelated heading to match its 1XX, and there are no correct uses of the 1XX heading in MNCAT. These are relatively easy to fix using either the Correct button or an Aleph batch job. More problematic are cases where the flipped headings get merged into a larger batch of correct MNCAT headings. For example, the LC children's subject authorities simplify some LCSH headings by replacing them with a single heading, which also happens to be an LCSH heading. The LC children's subject authority combines the LCSH headings "Evolution" and "Human evolution" into the single heading "Evolution." Heading flips of this kind can be very hard to untangle.
Fortunately, LEO has been able to run a job comparing the v16 headings with their v14 counterparts and listing all those which were flipped, including each bib record number. This list will be of great aid in identifying which headings need to be corrected, and which bib records were affected in those cases where headings were incorrectly merged. LEO and TS are also working on measures to ensure that these kinds of flips will not continue to be a problem.
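The details of LEO's job are not described here, but the general idea of such a comparison can be sketched as follows. It assumes each version's headings are available as a mapping from (bib record number, field occurrence) to heading text; that layout, the function names, and the bib numbers are hypothetical.

```python
from collections import defaultdict


def list_flips(v14, v16):
    """List headings whose text differs between the two versions.

    Both arguments map (bib_number, occurrence) -> heading text.
    Returns (bib_number, occurrence, old_heading, new_heading) tuples.
    """
    flips = []
    for key in sorted(v14.keys() & v16.keys()):
        if v14[key] != v16[key]:
            bib, occurrence = key
            flips.append((bib, occurrence, v14[key], v16[key]))
    return flips


def bibs_by_new_heading(flips):
    """Group affected bib numbers under the heading they were flipped to."""
    grouped = defaultdict(list)
    for bib, _occurrence, _old, new in flips:
        grouped[new].append(bib)
    return dict(grouped)


# Invented sample data based on the "Evolution" example in the text.
v14 = {
    ("bib-001", 1): "Human evolution",
    ("bib-002", 1): "Evolution",
    ("bib-003", 1): "Human evolution",
}
v16 = {
    ("bib-001", 1): "Evolution",
    ("bib-002", 1): "Evolution",
    ("bib-003", 1): "Evolution",
}

if __name__ == "__main__":
    flips = list_flips(v14, v16)
    for row in flips:
        print(row)
    print(bibs_by_new_heading(flips))
```

In the merged case, the v16 index alone cannot distinguish bib-002 (correctly under "Evolution") from bib-001 and bib-003 (flipped into it). The comparison against v14 is what recovers that distinction.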
So, what should you do? For now, if you encounter an out-of-place heading, please report it to fixit@umn.edu. If you want to understand how things went wrong in more detail and fix a problem yourself, look for an authority record matching the base of the out-of-place heading. Check its 4XX fields--could one of them have matched the base of a more appropriate heading for the bib record? If so, that's the culprit.
Two changes are usually necessary in the authority record before the bib records can be fixed. Call up the problem authority in the Cataloging Record display. Extend the 4XX with some reasonable text (e.g., add a parenthetical qualifier) to get past a mandatory block on updating an authority with a 4XX matching another authority's 1XX; and add a UPD field to block future use of the record for updating, e.g., "UPD __ $$a NO". Once the new update status is reflected in authority link text in the UMN01 index, the bib headings can be safely corrected.
The purpose of the on-screen "Correct Display" button, new in Aleph version 16, was a bit obscure to me, but now I think I understand it.
In v14, Aleph was hypersensitive to small differences in the display form of its headings, to the point that minor differences in capitalization or punctuation could split a heading into two entries. In v16, this sensitivity is reduced, which is good on the whole and relieves us of lots of work to merge split headings. But a consequence of this reduction in sensitivity has been a break in the link between the heading form on the record and the display form in the index entry as regards these minor differences. This comes through most clearly when one is correcting a minor discrepancy in a single entry, for example:
"Biology--Polar Regions" should be "Biology--Polar regions."
There is only one hit for this heading. The Correct Heading button lets me change the bib heading to "regions", but the display form of the heading is not changed--it still shows "Regions." To complete the correction, I use the Correct Display button. It lets me doctor the display text in the index record to the form I want. Having taught Aleph not to regard differences like R/r as significant enough to result in a separate display, we also have to accept that when we make such a change, Aleph will ignore it unless we intervene with the Correct Display button.
So my guess is that we'll use the Correct Display button mainly to tidy up the index after using the Correct Heading button to correct the bib data. If we used only the Correct Display button, the entry would look OK, but the underlying bib data would be uncorrected.
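The split between bib data and display form can be pictured with a toy index. This is only a model of the behavior described above, not Aleph's implementation: the HeadingIndex class, its method names, and the normalization rule (ignore case and trailing punctuation) are assumptions made for the sketch.

```python
import string


def normalize(heading):
    """Collapse minor differences: case and trailing punctuation."""
    return heading.casefold().rstrip(string.punctuation + " ")


class HeadingIndex:
    def __init__(self):
        self.bib_headings = {}   # bib id -> heading text on the record
        self.display = {}        # normalized key -> display form in the index

    def add(self, bib_id, heading):
        self.bib_headings[bib_id] = heading
        # The first form seen becomes the display form for this key.
        self.display.setdefault(normalize(heading), heading)

    def correct_heading(self, bib_id, new_heading):
        """Change the bib data. An existing display form is left untouched
        when the new heading normalizes to the same key."""
        self.add(bib_id, new_heading)

    def correct_display(self, new_display):
        """Overwrite the display form stored for this heading's key."""
        self.display[normalize(new_display)] = new_display

    def display_for(self, bib_id):
        return self.display[normalize(self.bib_headings[bib_id])]


if __name__ == "__main__":
    index = HeadingIndex()
    index.add(1, "Biology--Polar Regions")
    index.correct_heading(1, "Biology--Polar regions")
    print(index.bib_headings[1], "|", index.display_for(1))
    index.correct_display("Biology--Polar regions")
    print(index.bib_headings[1], "|", index.display_for(1))
```

After correct_heading alone, the record carries "regions" while the index entry still shows "Regions". Only the second step brings the two back in line, which matches the two-button workflow described above.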
Please note that only a few staff are authorized to use the Correct Heading and Correct Display buttons. If you have corrections which would best be done with these buttons, please report them to fixit@umn.edu.