
April 21, 2005

DLF Session: "METS Profiles"

Second in a series of reports from the Digital Library Federation 2005 Spring Forum: a session entitled "METS Profiles."

Metadata Encoding and Transmission Standard (METS) is a metadata schema designed to capture structural information about complex information resources. The Library of Congress (LC), which manages the standard, has been moving aggressively to develop “profiles” for certain classes of materials. These profiles specify how the schema should be applied to a given class of material. Cundiff’s presentation focused mainly on a current LC application of profiles to the problem of describing compact discs (CDs); Keith’s presentation was dedicated to how to make such profiles “machine-actionable.”

Cundiff described how the Metadata Object Description Schema (MODS), used to provide descriptive metadata within the METS framework, can convey structural information of its own. LC has decided to conceptually divide the hierarchies expressed within CDs (and, one would presume, other formats) into two baskets: a logical hierarchy and a physical hierarchy. MODS covers the logical hierarchy, and the METS structural map (structMap) section covers the physical. LC’s profile links the physical tracks of the CD to the intellectual content (song/album/performer) in the MODS record for that piece of music (or inherited from the record for the parent disc as a whole).

For the profile, rules are created to link the two hierarchies. For example, the lowest-level div elements in the METS hierarchy for a CD are equated with elements in the logical (MODS) hierarchy that are themselves parents of other elements. This allows extremely flexible display and linking options, and LC is putting it to use in its “I Hear America Singing” project. However, this type of description is extremely resource-intensive and would likely be suitable only for collections undergoing “selective enhancement.”

Keith’s presentation began by pointing out some of the weaknesses of METS profiles in their current form. They do not support validating documents against the profile, and a human must read them to decode their requirements. The flexibility of METS works against the use of profiles as currently designed. He expects an eventual workflow in which a subject expert writes a prose profile, and a developer then uses that document to create a machine-readable version. He demonstrated some early attempts at a solution using XSLT transformations.
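Keith’s demonstration used XSLT; the Python below is not his code but a sketch of the same general idea: turning prose requirements into executable rules that can report violations. Both rules are hypothetical examples of what a CD profile might require: a structMap of TYPE "physical", and a descriptive-metadata link (DMDID) on every lowest-level div, either directly or through an ancestor.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def check_physical_structmap(root):
    """Hypothetical rule: a structMap with TYPE='physical' must exist."""
    maps = [m for m in root.findall("mets:structMap", METS_NS)
            if m.get("TYPE") == "physical"]
    return [] if maps else ["no structMap with TYPE='physical'"]


def check_leaf_descriptions(root):
    """Hypothetical rule: every lowest-level div must be linked to
    descriptive metadata, directly or through an ancestor div."""
    errors = []

    def walk(div, described):
        described = described or div.get("DMDID") is not None
        children = div.findall("mets:div", METS_NS)
        if not children and not described:
            errors.append(f"div {div.get('ID', '?')} has no descriptive metadata")
        for child in children:
            walk(child, described)

    for top in root.findall("mets:structMap/mets:div", METS_NS):
        walk(top, False)
    return errors


# In this sketch a "profile" is simply an ordered list of rule functions.
PROFILE_RULES = [check_physical_structmap, check_leaf_descriptions]


def validate(xml_text, rules=PROFILE_RULES):
    """Parse a METS document and collect messages from every rule."""
    root = ET.fromstring(xml_text)
    return [msg for rule in rules for msg in rule(root)]


GOOD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="physical">
    <mets:div ID="disc" DMDID="dmd-disc">
      <mets:div ID="track1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

BAD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="logical">
    <mets:div ID="disc">
      <mets:div ID="track1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

print(validate(GOOD))  # []
print(validate(BAD))
```

Keeping each requirement as a separate function mirrors the workflow Keith described: the subject expert’s prose profile lists requirements, and the developer translates each one into a check.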

DLF Session: "Fasten Your Seatbelts..."

First in a series of reports from the Digital Library Federation 2005 Spring Forum: a session entitled "Fasten Your Seatbelts: We Are Approaching a Period of Turbulence…"

This session was designed to look at controversial topics. The first speaker, Mark Sandler, is Director of Collections at the University of Michigan Library. Since his library is one of those currently under contract with Google to digitize its holdings, he wanted to describe what has changed and what hasn’t.

He started by reminding the listeners that Google co-founder Larry Page is a UMich graduate and has been discussing the idea with UMich since 2002. The project is focusing on 7 million volumes from the stacks, excluding Special Collections (this may change), Business, and Law. The images are high-resolution color JPEGs, which undergo optimization at another location. They are combined with optical character recognition (OCR) data and UMich or OCLC metadata. UMich receives back bitonal images at a resolution of 600 dots per inch (dpi) in Tagged Image File Format (TIFF). The system can also sense when pages consist largely of images; these pages are returned to the UMich folks as JPEG2000 files. Carlson described several ongoing projects that continue despite the Google project, and noted that the project was having the welcome effect of freeing up staff time to focus on selective enhancement of worthy collections. The digital library operation has 32 FTE and reformats 5,000 volumes a year (text encoding work). Carlson reiterated that this is a time in history to “get in the game” and not fall into passivity as technology evolves.

Joseph Esposito is a technical consultant who focuses on the publishing sector. His presentation addressed the future of scholarly publishing. He framed his talk by asking, “If journals did not exist, would we invent them?” His answer was that we certainly would not create something like a journal in its present form; instead, he imagined, we would create something like a wiki or blog. He considers open access journals irrelevant in the long view, seeing them as a short-term backlash against the excesses of scholarly publishing under the dominant Elsevierian model. Peer review, he believes, is an anachronism, born of a time when printing was an expensive proposition. Its effect is to “create a monument to an idea,” when many scholarly disciplines are more fluid and open to debate than that model allows. Hence his preference for the model of a wiki or blog, in which interested parties would comment on and shape a debate.

The most interesting part of the discussion was how he presented market trends in publishing as instances of a lifecycle common to many industries, in which strongly branded but functionally similar products, created by thousands of companies, eventually fall under the control of a few conglomerates, which can afford the research and development to bring new features to market before customers request them. This process results in higher prices, supported by a market with a high barrier to entry for competitors, allowing the conglomerates to dictate terms. [Compare the auto industry, or the PC operating system industry.] In this model, libraries move from being ancillary to primary consumers of such journals (due to cost, audience, volume, etc.) and bear the full brunt of pricing pressure. The interesting part was not just the application of the model to the publishing industry, but that he fully expects whatever market emerges to host the new form of communication (e.g., blog service providers, fancy webboards or wikis, etc.) to undergo exactly the same transformation. Despite many of our ideas about the social “good”-ness of Google, the democracy of the internet, etc., in 20 years (or whenever the market matures) our communication lanes will be dominated by gatekeepers as powerful as Elsevier et al.

The last talk was by Bernie Frischer, who is devoted to exploring three-dimensional (3D) virtual reality (VR) technology in libraries. He demonstrated a VR installation that creates an immersive model of the Roman Forum. The model is specific to date and time, so much so that not only are the buildings correct in form and condition for the year specified (c. 400 C.E.), but even the light and shadows are correct to the second for that date at that location. Speaking of location, every point can be associated with a precise latitude and longitude, which is especially interesting for the Global Positioning System (GPS) gearheads. One could imagine an interface in which a GPS unit controls the movement and perspective of the VR environment, perhaps along with one of those PowerBooks with the motion sensor.

Two interesting points were raised in the discussion. First, this sort of historical 3D archaeology brings what Frischer termed a new “rigor” to research. It is no longer possible to fudge what a floor, ceiling or roof looked like: a 3D representation gives everything a gloss of truth that a drawing cannot, so it is crucial to make sure that every detail is justified by research and rendered to the best of current knowledge.

The second interesting point is that now that 3D models of many buildings are available, new approaches to history are opening up. He detailed a project in which a 3D model of the Roman Colosseum was populated by thousands of intelligent software agents representing Roman citizens, to determine whether the oft-repeated legends about the building’s efficiency in handling crowds were legitimate. Among the findings was that nearly 80% of the attendees would have had to pass through a particularly dank hallway that served as an open latrine. The modeling in this project could be used to analyze ventilation, strength, lighting, or many other aspects of a building. Frischer foresees the use of VR in libraries; while this is a minority view, it is nonetheless an interesting one, and it reminds me of the connection between the Planetarium and the Central Library here in Minneapolis.

April 20, 2005

Google Print Searching

Just a brief entry to let people know that you can easily search just the "Print" version of Google (scanned books) by one of a few methods...

The main "Print" home page (print.google.com) does not provide a search box, so entering URLs into the address bar is necessary.

The first is to tailor a URL like so:

The "print" in the host name can be replaced by "www":

Note the "as_q" syntax. These are query parameters, which can be chained together with ampersands (&).
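As a minimal sketch of this URL pattern, the Python below assembles a query with the standard library’s urlencode, which joins each name=value pair with an & and encodes spaces as +. The /print path and the sample query are assumptions for illustration, and these 2005-era URLs may no longer resolve.

```python
from urllib.parse import urlencode

# Assumption: illustrative base path for Google Print searches.
BASE_PATH = "/print"


def print_search_url(host, **params):
    """Build a search URL: each keyword argument becomes name=value,
    pairs are joined with '&', and spaces are encoded as '+'."""
    return f"http://{host}{BASE_PATH}?{urlencode(params)}"


print(print_search_url("print.google.com", as_q="moby dick"))
# http://print.google.com/print?as_q=moby+dick
print(print_search_url("www.google.com", as_q="moby dick"))
# http://www.google.com/print?as_q=moby+dick
```

Letting urlencode do the escaping avoids hand-typing the + and & separators, which is where most address-bar mistakes creep in.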

You can build Boolean queries like so:


Add "+" to include another word:


Create an "as_oq=" statement, linking any "or" terms with a +:


To exclude words, use "as_eq=", appended to the main query with an &:
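The three Boolean pieces can be combined in a single query string. This is a sketch under the same assumptions as above (an illustrative /print path and made-up search terms): as_q holds words that must all appear, as_oq holds alternatives, and as_eq holds words to exclude, following Google’s advanced-search parameter names.

```python
from urllib.parse import urlencode


def boolean_query(all_words=(), any_words=(), exclude_words=()):
    """Return a query string using as_q (all of these words),
    as_oq (any of these words) and as_eq (none of these words).
    Empty groups are left out of the result."""
    params = {}
    if all_words:
        params["as_q"] = " ".join(all_words)
    if any_words:
        params["as_oq"] = " ".join(any_words)
    if exclude_words:
        params["as_eq"] = " ".join(exclude_words)
    # urlencode joins the pairs with '&' and turns spaces into '+'.
    return urlencode(params)


qs = boolean_query(["whale"], ["ahab", "ishmael"], ["film"])
print("http://print.google.com/print?" + qs)
# http://print.google.com/print?as_q=whale&as_oq=ahab+ishmael&as_eq=film
```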