040430 (Friday)

Googlization

I had a nice day today at our local ARLD (Academic and Research Libraries Division of the Minnesota Library Association) Day conference at the Arboretum in Chanhassen. Most interesting to me was a presentation on the "Googlization of Library Values" by librarians from St. Cloud State, St. Catherine's, and Carleton College. I was expecting the usual library lament about how we have to resist the dominion of Google, which is teaching our patrons that search is simple and everything is on the net. Surprise! Every one of the presenters spent their time sharing positive lessons we need to learn from Google. Robin Ewing asked us to learn from three core values demonstrated by Google: vision, usability, and whimsy. We all know Google thinks big and keeps stuff simple, but I had not really valued their sense of whimsy. Upon some reflection I find it true: Google takes the time to make things a little fun, from their name and logo to things like allowing their interface to be translated into Klingon and Elmer Fudd. Can we make library services anything other than deadly dull?

Posted by efc at 11:00 PM

040427 (Tuesday)

Powerless without Video

It is the smallest things! A server we use in Digital Collections had its video card go bad. The disk array on the server has a 4-hour service plan, but it turns out that the University and one local shop were quoting a four-day turnaround for the video card. Luckily a genius (Scott) at the Apple Store at the Mall of America was able to swap the card out while I waited there. Only one morning lost to shuttling the machine around campus and town. We are back up again. Almost. I forgot to tell the server not to go to sleep!

Posted by efc at 4:40 PM

040423 (Friday)

A Rose by any other name

John clued me in to SRW today. I thought I had not heard of it, but have just found that it is the name for Z39.50 Next Generation, something I had heard of. Do you think Z39.50's reputation is so poor this group had to choose a new name?

Posted by efc at 9:48 PM

Censorship?

There has been a controversy swirling around the net about some pictures taken by a (now fired) Kuwait airport worker of the coffins of US soldiers being placed aboard transport home. We sometimes imagine that the net makes information free (though in this case it had something to do with the Freedom of Information Act), but I wonder how many people will be able to reach the http://www.thememoryhole.org site which was distributing similar pictures. It seems the US Government has (maybe) shut it down. A chilling sign of the times, and a reminder of just how fragile even the net can be? Or not? The NYT is running the story, with photos.

Posted by efc at 11:02 AM

040422 (Thursday)

The Many Uses of Creative Commons

A nice mainstream article on the many uses of Creative Commons licenses is available at Business2.0. If you don't know what the Creative Commons is, you should. This article may make for a gentle intro alongside real-life stories of how it is making a difference. [Source: OAN]

Posted by efc at 3:24 PM

040421 (Wednesday)

OAI on Apache

In a promising new twist, mod_oai will bring the Open Archives Initiative protocol to Apache. What sort of services will this make possible?
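For anyone who has not looked at the protocol itself, here is a minimal Python sketch of what an OAI harvesting request looks like. The verb and metadataPrefix arguments are standard OAI-PMH; the base URL is purely hypothetical (I have no idea yet what address a mod_oai-enabled Apache would expose), and the helper name is my own.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL, used for illustration only.
BASE_URL = "http://www.example.org/modoai"

# Ask for all records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A harvester would fetch that URL and parse the XML that comes back; that part is left out here.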

Posted by efc at 11:41 PM

Simson on Google and Akamai

I've been a fan of Simson Garfinkel since my days in the NeXT Users Group at MIT. Simson was an active NeXT user and has since gone on to become a tech writer with a clear point of view, defending privacy as he engages the future. Today Simson wrote about Google and Akamai as competitors, or at least fellow travelers on the path to distributed terascale computing. What should Libraries be buying from Akamai and learning from Google?

Posted by efc at 10:43 PM

040420 (Tuesday)

Upscale

We talk a lot about "scale" and "robust" when discussing library systems. This posting about Google reminded me just how much the scale has shifted in the last 15 years. Amazing.

Posted by efc at 11:02 PM

040416 (Friday)

Public Access to the Public Domain

Brewster Kahle gave the talk at the closing plenary of the CNI Spring Task Force meeting. Brewster just keeps on doing; he never seems to be daunted by the scope of large tasks. The amazing thing is that it works! He set out to capture the web, and the Internet Archive (IA) does that better than any other entity. He called on us to "put the best we have to offer within the reach of our children." Within reach, to Brewster (and to our children), means "on the web." He then walked us through a back-of-the-napkin calculation of what it would take, concluding that the goal is within reach of us today and within our budgets to boot. Are we ready to answer the call?

Books. The Library of Congress = 20M volumes = 26TB = $60,000 disk space. At 2 hours/book (without destroying the books) this is doable. Output back to book form costs $1/book. This print-on-demand solution is being demonstrated today by the BookMobile the Internet Archive has put on the streets not just of the USA, but also India, Egypt, and most recently rural Uganda.

Audio. 2M "saleable objects" of audio exist, but much of it is behind IP regs that make it hard to deal with. The IA approached the "taper" community of people who have taken advantage of performance-oriented rock bands who followed the Grateful Dead's lead in allowing fans to tape their music and exchange it for non-commercial use. "How would you like infinite bandwidth and infinite storage for free?" the IA asked the tapers. Guess what? They love the idea. 500 rock bands have given the IA permission to archive this material and share it for free. The tapers have already produced 10-20TB of concerts available on the IA.

Moving Images. Don't just consider the 100-200,000 mainstream films (half of them from India). Consider the 2M films created in the 20th century that document daily life. Some of these may be in your very own basement. One hour of film costs about $100 to convert. One hour of video costs only $15. The IA is also now capturing 20 channels of video from around the world 24/7 for about $500,000. It is estimated there may be about 400 channels around the world.

Software. The IA has received a DMCA exception to circumvent copy protection for the purpose of ripping some of the 50,000 software packages that exist to date. They are only allowed to rip titles from no-longer-supported operating systems.

Web. The IA now captures 20TB/month of web content. The WayBackMachine holds over 30B (yes, billion) pages from 50M sites on 15M hosts. Anna Patterson's search engine based on this corpus searches 4 times the number of sites covered by Google.

The Internet Archive does all this on a budget of about $4M or $5M each year. I don't know about you, but this leaves me breathless.
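For anyone who wants to redo the napkin math on the book figures above, here is a minimal Python sketch. The inputs are the numbers from Brewster's talk (20M volumes, 26TB, $60,000 of disk, 2 hours per book); the decimal conversion of 1TB = 1,000,000MB and the variable names are my own assumptions.

```python
# Back-of-the-napkin check of the book figures quoted in the talk.
VOLUMES = 20_000_000          # Library of Congress, as quoted
TOTAL_TB = 26                 # storage for all volumes, as quoted
DISK_COST_USD = 60_000        # disk cost for that storage, as quoted
SCAN_HOURS_PER_BOOK = 2       # non-destructive scanning time, as quoted

MB_PER_TB = 1_000_000         # assumption: decimal units

mb_per_book = TOTAL_TB * MB_PER_TB / VOLUMES
usd_per_tb = DISK_COST_USD / TOTAL_TB
disk_usd_per_book = DISK_COST_USD / VOLUMES
total_scan_hours = VOLUMES * SCAN_HOURS_PER_BOOK

print(f"{mb_per_book:.1f} MB per book")
print(f"${usd_per_tb:,.0f} per TB of disk")
print(f"${disk_usd_per_book:.3f} of disk per book")
print(f"{total_scan_hours:,} hours of scanning")
```

The storage works out to roughly 1.3MB and a third of a cent of disk per book. The disk is the cheap part; the 40 million hours of scanning is where the real work lies.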

In order to preserve this growing corpus (libraries, Brewster notes, traditionally burn eventually) the IA seeks out partners around the world who can host copies of the data. The more different they are from the US the better. Right now a copy is held at the new library in Alexandria and negotiations are under way with a Northern European country. Brewster estimates that the resources needed to maintain a mirror of IA are a PB of disk (that's petabyte), a GB of bandwidth, and $100M to set up an appropriate endowment for continued operation.

But if the "Universal Access to All Human Knowledge" goal articulated by Raj Reddy of the Million Book Project is too vast, and even the "All Published Knowledge Available to the Kid in Uganda" is a bit far out, how about something easy, asks Brewster. What if we just tried to attack what we already have every right to collect? Let's go for "Public Access to the Public Domain."

In the USA the public domain is pre-1923 publications. In fact, Brewster points out, with the aid of Mike Klezman's (?) recently completed electronic version of the copyright registry, it is now easy to find out which works from 1923-1964 did not have their copyright renewed and are now also in the public domain. Let's go get this material! His proposal: give the IA a book and $10 and the IA will return to you the book unharmed plus a digital copy. Will we accept the offer? Oh, and by the way, the IA is also happy to accept video and $15/hour for the conversion of that to digital format. Oh, and did I mention that the IA will also host the digital documents on their servers "forever"?

I think we should take Brewster up on this offer. How much material do we have in the University of Minnesota collections which we could part with for a bit to let the IA digitize and store it? We should seriously consider a project to pump this material and the limited dollars required to the IA as fast as we can. This is a crazy idea at a crazy price point; let's try to sink Brewster under our enthusiastic response! The great thing is, we probably won't. He has not sunk yet.

P.S. Brewster also tossed off an idea about how to archive blogs in response to a question. His thought was that we should be able to subscribe to blog RSS feeds and simply archive everything we see announced via that mechanism. I wonder if we could auto-harvest RSS from UThink.
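Brewster's idea is simple enough to sketch in a few lines of Python. This is only a rough illustration under my own assumptions: it uses the standard library's XML parser on a made-up RSS 2.0 feed held in a string, keys the archive on each item's link, and skips the actual fetching of feeds and linked pages that a real harvester would need. The function name and sample feed are hypothetical; I have not looked at what the UThink feeds actually contain.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed, standing in for one fetched from a blog.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/blog/2</link>
      <description>World</description>
    </item>
  </channel>
</rss>"""


def harvest(feed_xml, archive):
    """Store every item not yet in the archive (keyed by link); return the new links."""
    root = ET.fromstring(feed_xml)
    new_links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link is None or link in archive:
            continue  # nothing to key on, or already archived
        archive[link] = {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        }
        new_links.append(link)
    return new_links


archive = {}
first_pass = harvest(SAMPLE_FEED, archive)   # both items are new
second_pass = harvest(SAMPLE_FEED, archive)  # nothing new the second time
print(first_pass, second_pass)
```

Run on a schedule against each feed, something like this would pick up only what has been announced since the last visit.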

Posted by efc at 11:58 PM

Our Role in P2P

I am concerned that the work of the Joint Committee of the Higher Education and Entertainment Communities may do more harm than good by legitimizing some role for higher ed in killing off P2P file sharing. I don't think we have a role. I think this is a fight between the RIAA and MPAA and American society, and we will just get trampled in the middle. Still, a session updating us on the P2P issue at CNI was interesting. It is clear that EDUCAUSE is finding little workable technology to help satisfy industry demands (tools like Audible Magic and ICARUS are throwing out the legitimate baby with the illegal bathwater). Brewster Kahle was in the audience and asked us to please remember that the Internet Archive depends on P2P for distribution of its legitimate content. If we need an example of real life content dependent on P2P distribution, he welcomes us to point his way.

Posted by efc at 3:55 PM

Powerful Points

I am a pretty visual person and appreciate a well laid out graphical representation of an issue. I find one of the masters of our field to be Herbert Van de Sompel. I didn't attend his session today on Federations of Institutional Repositories, but I see the handouts in the CNI packet and am struck again by what lean, direct, and illuminating illustrations he comes up with. I don't know whether he makes this stuff up himself or employs some graphic talent on the back end, but his touch has been so consistent over the years in many contexts that I suspect the former. I hear many people laud the interface of SFX, few of whom realize just how much it is the vision of Herbert, who showed "rough" versions of SFX many years before it became a commercial product with virtually the same interface it still enjoys. If you want to see what I consider PowerPoint well-used, take a look at a presentation by Herbert some day.

By the way, his work on new roles for MPEG-21 & OAI & OpenURL in federating repositories is quite interesting, thinking way outside the box. Take a look at the D-Lib article he and a few colleagues wrote for a taste.

Posted by efc at 3:46 PM

Preservation via LOCKSS?

After lunch a few of us retired to a quieter corner of the hotel to discuss whether it would be worth our time and effort to try to make LOCKSS more of a preservation tool. There was a clear consensus among this group that LOCKSS is not preservation today, and that the project (though it claims a preservation role) is really not doing much (beyond its NSF grant attempt, anyway) to make accommodations in the software for preservation issues. These would include things like issue level manifests with metadata, file format recognition and metadata (perhaps via JHOVE, which I saw was announced today), or picking up formats other than HTML (maybe an OAI harvest of metadata followed by a harvest of the related deeper-web items). Right now LOCKSS is, in essence, a "bit store"; it is a backup mechanism. In some ways, building up LOCKSS installations might also remove some of the wins the system brings in terms of ease of setup and maintenance.

An interesting experiment might be to use the WayBackMachine to figure out how many of the current Humanities titles are captured in the Internet Archive.

Posted by efc at 3:33 PM

Gather, Create, Share

A group funded by the Mellon Foundation is trying to define the bounds of interaction between course management systems (CMS) and repositories. Their report should be available on the DLF web site by the end of May. In today's presentation to CNI they made three fundamental points: (1) users will be getting to repository content through a broad set of "course management" tools that extend well beyond CMS into PowerPoint, Weblogs, Citation Managers and the like; (2) repositories need to attend to a Checklist of requirements and desirables in order to interoperate with this layer of tools; and (3) the process used to build course content can be expressed as "Gather-Create-Share".

This "Gather-Create-Share" seems like a weak echo of Apple's "Rip, Mix, and Burn" campaign a few years ago. It is also the process that Lessig warns us is under threat given the intellectual property regime our country is putting into force. The session really didn't touch on the impediments that copyright puts in the way of the "Gather" step, but I was told that IP issues will be part of the Checklist when the group reports out to the DLF.

Another mention of Chandler and its higher-ed alter-ego Westwood. This is something I should pay some attention to. Chandler is an open source personal information management tool under development.

Posted by efc at 3:12 PM

Privacy Policy

One of the commitments the IT Council has made is to revise our Libraries' privacy policy before the next school year begins. An overcommitted Sue Hallgren is leading this effort for the IT Council. I attended a CNI session on Security and Privacy to try to gather info for Sue and the small group working with her on this. SPEC Kits 277 and 278 came up again in this context, and 278 looks particularly helpful for developing a refreshed policy. A few tools were also mentioned that we might want to peek at, though I'm not sure any of them are actually appropriate for our context. Check out the privacy proxy they mentioned, and the public workstation privacy info and tool mentioned.

Random thought... Could we cut off much of the unwanted workstation traffic by limiting Public Browser in a new way? What would happen if public browser refused to allow more than 100 characters in any text field of any form? Would that be enough to kill use for email, but still allow research use?

Posted by efc at 9:32 AM

040415 (Thursday)

Considering Computer Support

Ralph Quarles (IU) and I found each other at the reception. Ralph has offered to help us evaluate our computer support and seek an appropriate model for future support. He noted that he is ready for some ongoing contact with his colleagues at other CIC institutions. The Library IT Directors have that kind of forum in the CIC, but staff at his level, those actually running technology support operations, really don't have many opportunities to reach out to each other. I wonder if we should plan a day or two of professional "shoot the breeze" time at Minnesota for all the folks in these positions? We could do it as part of our investigative effort. This could both help this cohort build connections to one another and serve as a font of wisdom and warning for our own planning effort.

We ask our technology staff to do the seemingly impossible. Our staff is not nearly large enough to manage the kind of deployment we've got around the Libraries. How can only six staff manage 600 machines? On the other hand, could it be a failure of imagination? When I arrived at the U the first significant decision I made was to kill our attempt to use Sun Ray "appliance" computers to replace public workstations in the Libraries. We had good reasons for that decision, but the fundamental problem was staring us in the face then and remains at the core of our troubles in ITS: we cannot support a deployment of 600 Windows workstations with so few staff. Why can't we change the rules? At MIT I watched an organization deploy and maintain thousands of workstations with fewer staff than we have available to us.

I believe we need to think outside the box. It may not be Sun Ray, but we must recognize our situation (a budget even more limited than it was in 2001) and devise creative solutions to meet our needs within those bounds. I am certain this means compromises, but not necessarily the ones that run our staff ragged without the reward of a computing infrastructure they can take pride in, tell the world about, and share with our community.

My frustration with our current situation expresses itself as a frustration with ugly machinery, and I do believe that computers should be in the process of fading out of sight, but that's a red herring. My real frustration is that I've allowed our expectations to be diminished by accepting the limits we've imposed on ourselves. I wonder if we shouldn't get the CIC equivalents of Directors of ITS together to share their frustrations and triumphs. We could certainly use some inspiration and, who knows, we might even be able to do something inspiring ourselves!

Posted by efc at 11:03 PM

Storage Resource Broker

Reagan Moore from the San Diego Supercomputer Center discussed their SRB development. A very dense presentation left me with the basic impression that I need to understand this approach to data storage being developed as part of the NSF grid infrastructure. SRB "provides a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets." An alternative to LOCKSS? I did hear from one colleague who has already been reviewing SRB that it is a very complex bit of software to install and maintain.

Posted by efc at 10:53 PM

Digital Video to Last

Jerome McDonough, the Digital Library Development Team Leader at NYU, gave a very informative presentation on video standards and preservation. The bottom line was that though 4:4:4 (truly uncompressed, an Apple codec and MJPEG2000 can provide this) video would be the right thing to capture and archive, unfortunately it is much too expensive to store. At NYU they've stepped back to the compromise of capturing 4:2:2 video instead. He noted that anything other than 4:4:4 capture and storage was, in fact, allowing for lossy compression. This is fine until migration has to happen, at which point artifacts will creep in due to the recompression of images. Note that NYU uses TripWire to create and check up on MD5 checksums (can I use that on Thomas?) and UC Berkeley's OceanStore project is taking a stab at very large, high performing distributed storage solutions.
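The checksum idea is easy to try out by hand. Here is a minimal Python sketch of it, using only the standard library's hashlib. To be clear, this is not how TripWire works or is configured; it is just the bare create-then-recheck routine, and the function names and the throwaway demo file are my own.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file at the time it is archived."""
    return {path: md5_of(path) for path in paths}


def changed_files(manifest):
    """Return the files whose current checksum no longer matches the manifest."""
    return [path for path, recorded in manifest.items() if md5_of(path) != recorded]


# Demo on a throwaway file standing in for a video master.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "frame.bin")
with open(sample, "wb") as handle:
    handle.write(b"original video bytes")

manifest = build_manifest([sample])
before = changed_files(manifest)   # nothing has changed yet

with open(sample, "wb") as handle:
    handle.write(b"corrupted video bytes")
after = changed_files(manifest)    # the altered file is reported

print(before, after)
```

In practice the manifest would be saved somewhere apart from the files it describes and the recheck would run on a schedule.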

Posted by efc at 10:49 PM

Culture Clash

Ed Ayers gave a wonderfully funny and somewhat touching plenary address about the tensions between "Academic Culture and Computer Culture". His own work includes the well regarded Valley of the Shadow. He described the "communal autonomy" of the university as a defining characteristic. The heart of our institutions is the "mysterious exchange between student and teacher," that "intimate bubble" in which learning happens. He described this activity as a flame, both intense and vulnerable, and our universities as "massive structures to protect those flames." His suggestion was that we build lighter, smaller things that "simplify the vastness," things like instant local class nets that don't rely on the broader campus network. His mantra was "scale down." As he spoke of the intimate bubble of interaction between student and teacher, I began to wonder whether technology is not beginning to pierce that bubble with tools that allow for action in the world from the classroom and feedback from the world into the classroom.

I was really struck by Ed's flickering flame. I get awfully frustrated by the scale and scope of the University of Minnesota. I lament that its mission seems to be: "Everything to everyone!" The bureaucracy can seem endless, the commitment to excellence often lacking, the message muddled, on and on. But I stay here, and this flickering flame reminded me why. This academic enterprise is rather counter to most of American culture; it is rather precious. In a culture where profit and individual heroism are prized, we operate an enterprise which spends every penny we are given in the name of creating moments of intimate, hidden victory. People discover who they are on our campus, they encounter mentors, they open their minds to one another. Sure, there is a lot of bureaucracy, not to mention a whole lot of drinking, backstabbing, and just getting by... but what if those things are the cover needed to protect the flickering flame? If our culture actually realized how radical an enterprise this was, would we be allowed to get away with it? We all break the wind so that the flame of learning has a chance to move from one candle to another; maybe not every time we fire up the PowerPoint slides for another class, but maybe enough. Maybe this behemoth of an institution is what it takes to make this opportunity available to more of our neighbors.

Then again, that's pretty dreamy stuff. Quite the rationalization. I still want to stir up our University enough that we don't settle for less than excellence. The Libraries is where this starts for me. And more specifically our IT division and the work we do to put appropriate technology into the hands of our staff, students, and faculty.

Yikes! It's a good thing I'm not doing this blogging stuff more often!

Posted by efc at 10:41 PM

Identity, Portfolio, and Turning Data Models on their Heads

Steve Cawley and I attended a valuable "executive roundtable" bringing CIOs and University Librarians together to discuss "identity." As we discussed the challenges and successes of authentication and authorization in today's academic environment, I began to wonder if we are not focussing a bit too close to home. I note that while libraries felt very secure in their database and searching expertise, tools like Google snuck up outside our borders and transformed user expectations of searching and research so that now we are strangers in our own territory. What will the identity landscape look like five and ten years from now? Will users have an expectation that they can carry an identity into our organization that was credentialed beyond our borders and control?

Some hint of that future may have appeared in the form of a discussion about e-portfolios and their impact on our data models. A move toward portfolios is a move toward users asserting control of their own data (a user as holding the copy of record of their transcript, for example). This turns on its head our current data model where institutions bear the responsibility for holding and managing the continuity of that sort of data. I wonder whether the solution to the buy-in problem for institutional repositories, for example, might be an individual repository model sewn together by metadata harvesting like OAI? Who will be the "portfolio banks" of the future who (for a small fee) manage the physical systems on which your e-portfolio resides and ensure that the policies and permissions you specify for your information are actually carried out when sharing your portfolio?

Some additional notes: Who assigns identities? Who decides their scope? Some discussion of OKI's concept of "authN" (authentication) and "authZ" (authorization). The role of UIN (numbers) vs. NetID (typically names). Distinguishing between the deed of authentication and the trails and logs kept about that deed and subsequent actions (librarians are loath to keep any trail, but doing the deed might be fine). See SPEC Kits 277 and 278 from the ARL and "Mirage of Continuity" by Brian Hawkins. Credit Brad with the notion "sustainable economies in tension with the frontiers of innovation" (all you really have to do to make technology sustainable is stop changing) and Beth with "the economics of compromise" (the notion that organizations are much more willing to work with you after they have experienced a compromise and its costs than before). If setting up a portfolio banking business, what would be your "free as in beer" service lure and what would you charge for? Would password management be part of the package?

Posted by efc at 10:35 PM

At CNI

Today and tomorrow I'm at the CNI Spring Task Force meeting in Alexandria, Virginia. The Coalition for Networked Information holds these meetings twice a year and I've been lucky enough to work at two institutions that value the CNI's work and think this is a place worth being. I always find CNI meetings very meaty. Most of the sessions are in small breakout rooms, the best of these involving a brief presentation followed by vigorous discussion by a knowledgeable group. As an experiment, I've decided to try to capture my thoughts on the meeting as it proceeds via this blog, so the next few entries will be about my experience of CNI. I hope this helps me actually follow up on some of the ideas sparked here.

Posted by efc at 10:19 PM

040406 (Tuesday)

Linux Desktop

This Slashdot story about a Dan Gillmor article was the second story I've read today about Linux on the desktop appearing more viable. Earlier I'd seen a Chad Dickerson column in InfoWorld. Both were about the ease with which they had installed and used the Xandros Linux distribution. Come to think of it, a few days back I'd read this story about ZeroInstall on top of a Linux desktop/filer named ROX. ROX seems to have application bundles done right and used that to create an outstandingly simple method of installing software. All of these stories are impressing me with how functional Linux is becoming on the desktop. It kind of makes me want to install Linux on an old laptop some time and see what it is like. I love my MacOS X machine, but I also like the idea of strong software on our existing Intel hardware base in the Libraries. I wonder how far we are from a viable choice to move our public and staff machines to Linux?

Posted by efc at 10:08 PM

040403 (Saturday)

Dare to Be Brave

I like my brother Christopher's vision of leadership. He calls us to dare to be brave. Easier said than done, though he has been pretty good at doing it lately. Luckily, Christopher is working on a book that may add a few more hints for us mere mortals!

Posted by efc at 7:52 AM

040402 (Friday)

Nature's Open Access Debate

Nature hosted an interesting debate on open access a few years back. Now it is at it again with a new web focus on access to the literature. This discussion got going last month, and even the University of Minnesota's own Andrew Odlyzko has chimed in with a piece. The content of this discussion should be available even to non-subscribers. Give it a try!

Posted by efc at 5:16 PM

Open Source OpenURL

Cool! We were pleased to launch our own open source project a few months back (see LibData), so I'm always happy to see new open source library projects on the block. Today I got word about OLinks from OhioLINK.

Tom Sanville wrote:

Some of you already know that OhioLINK has created its own URL resolver for journals and other materials. OLinks is an open-source OpenURL resolver intended for use by library consortia, individual libraries, and other organizations with a need to manage citation linking using the OpenURL standard. Introductory information can be found at [this site]. As noted on the site, you are advised to contact Thomas Dowling on the OhioLINK staff if you are interested in using it.

Thank you, OhioLINK!

Posted by efc at 4:10 PM

Free Culture Freed

Lawrence Lessig's new book, Free Culture, was published with a Creative Commons license which allows for derivative works. The result has been an amazing flurry of derivatives, including an audio version launched by AKMA. Does this demonstrate in any way a relationship between freeing content and creativity? Are we well served by dozens of versions of Lessig's work? Are we diminishing his own incentive to create?

Posted by efc at 1:06 PM

040401 (Thursday)

LOCKSS Wiki

It seems that wikis are getting more mainstream every day. We've been participating in a Humanities project with the LOCKSS folks at Stanford. I just got a message from Vicky Reich letting us know that LOCKSS has set up a wiki for the project. She says:

Wikis are widely used to facilitate collaboration via the Web -"wiki-wiki" is Hawaiian for "quick". The biggest Wiki in use is the Wikipedia, http://en.wikipedia.org/wiki/Main_Page

The software is fairly easy to use and facilitates documentation and collaboration. We find them fun and extremely useful. We will be adding access control so only humanity project team members will have read and write privileges.

That last point reiterates a way in which the wiki way will be compromised as wikis do hit the mainstream: wikis are founded on open editorial access, but many mainstream wikis (including our staff web server) will insist on some access control.

Posted by efc at 1:38 PM

The Wiki Way

I've been working on redesigning our staff web for the past couple weeks and am very pleased with the results of using a collaborative tool called a wiki. We are using PmWiki, one of many wiki engines out there. So far PmWiki has been a great way to enforce a common template on a site, yet keep the site very open to editorial input from a broad range of players who don't have much HTML experience. Now the real acid test... Will the staff like the new site and help us maintain it? After all, one of the main problems with the old site has been how stale it gets. Will the more open editing framework of a collaborative wiki site really result in fresher content that the staff feels more ownership for? We'll see...

Posted by efc at 11:49 AM

Visualize Discussions

An entry at blog.org points to an interesting paper (PDF) on visualizing the state of a discussion online. I think it is helpful to see folks pushing past text as a way to represent the vigor and other attributes of online interaction. I wonder how we can use visualization techniques to reintroduce serendipitous browsing through library collections in an age when our libraries are often decentralized (no "main library") and full of electronic resources (nothing "on the shelf")?

Posted by efc at 11:22 AM

Alive!

It looks like the University Libraries blog service is now up and live. I didn't really follow through with any blogging in the past couple weeks, so I'd better get on the ball. I do want to congratulate the team in our Digital Libraries Development Lab for getting this service off the ground. We will be introducing the service to two University Senate committees in the next two weeks and I hope that we can help them grasp what an exciting opportunity for scholarly communication this represents. Kudos to Shane and Company for nurturing this concept!

Posted by efc at 11:17 AM