Brewster Kahle gave the talk at the closing plenary of the CNI Spring Task Force meeting. Brewster just keeps on doing; he never seems to be daunted by the scope of large tasks. The amazing thing is that it works! He set out to capture the web, and the Internet Archive (IA) does that better than any other entity. He called on us to "put the best we have to offer within the reach of our children." Within reach, to Brewster (and to our children), means "on the web." He then walked us through a back-of-the-napkin calculation of what it would take, concluding that the goal is within reach of us today and within our budgets to boot. Are we ready to answer the call?
Books. The Library of Congress = 20M volumes = 26TB = $60,000 disk space. At 2 hours/book (without destroying the books) this is doable. Output back to book form costs $1/book. This print-on-demand solution is being demonstrated today by the BookMobile the Internet Archive has put on the streets not just of the USA, but also India, Egypt, and most recently rural Uganda.
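Brewster's napkin math for books is easy to re-run. Here is a minimal Python sketch that uses only the figures quoted above; the decimal-terabyte convention and the 2,000 working hours per scanning station per year are my own assumptions, not numbers from the talk.

```python
# Figures as reported from the talk.
VOLUMES = 20_000_000          # Library of Congress volumes
TOTAL_TB = 26                 # storage for all volumes
DISK_COST_USD = 60_000        # disk space for 26 TB
SCAN_HOURS_PER_BOOK = 2       # non-destructive scanning
PRINT_COST_PER_BOOK_USD = 1   # print-on-demand output

# Assumptions (mine, not Brewster's).
BYTES_PER_TB = 10**12             # decimal terabytes
STATION_HOURS_PER_YEAR = 2_000    # one scanning station, one working year


def napkin_books():
    """Derive per-book and total figures from the quoted numbers."""
    total_scan_hours = VOLUMES * SCAN_HOURS_PER_BOOK
    return {
        "mb_per_book": TOTAL_TB * BYTES_PER_TB / VOLUMES / 10**6,
        "usd_per_tb": DISK_COST_USD / TOTAL_TB,
        "disk_usd_per_book": DISK_COST_USD / VOLUMES,
        "total_scan_hours": total_scan_hours,
        "station_years": total_scan_hours / STATION_HOURS_PER_YEAR,
        "print_all_usd": VOLUMES * PRINT_COST_PER_BOOK_USD,
    }


if __name__ == "__main__":
    for name, value in napkin_books().items():
        print(f"{name}: {value:,.3f}")
```

The interesting number is the scanning labor, not the disk: storage works out to a fraction of a cent per book, while the scanning hours run into the tens of millions.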
Audio. 2M "saleable objects" of audio exist, but much of it is behind IP regs that make it hard to deal with. The IA approached the "taper" community: fans of performance-oriented rock bands that followed the Grateful Dead's lead in allowing fans to tape their music and exchange it for non-commercial use. "How would you like infinite bandwidth and infinite storage for free?" the IA asked the tapers. Guess what? They love the idea. 500 rock bands have given the IA permission to archive this material and share it for free. The tapers have already produced 10-20TB of concerts available on the IA.
Moving Images. Don't just consider the 100,000-200,000 mainstream films (half of them from India). Consider the 2M films created in the 20th century that document daily life. Some of these may be in your very own basement. One hour of film costs about $100 to convert. One hour of video costs only $15. The IA is also now capturing 20 channels of video from around the world 24/7 for about $500,000. It is estimated there may be about 400 channels around the world.
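The moving-image numbers lend themselves to the same kind of napkin. This is a rough Python sketch built from the per-hour and per-channel figures above; scaling the channel capture cost linearly from 20 channels to 400 is my assumption, and the function names are made up for illustration.

```python
# Figures as reported from the talk.
FILM_USD_PER_HOUR = 100       # converting one hour of film
VIDEO_USD_PER_HOUR = 15       # converting one hour of video
CAPTURED_CHANNELS = 20        # channels the IA captures today
CAPTURE_COST_USD = 500_000    # quoted cost for those 20 channels


def conversion_cost(film_hours, video_hours):
    """Cost in USD to digitize a collection of film and video."""
    return film_hours * FILM_USD_PER_HOUR + video_hours * VIDEO_USD_PER_HOUR


def channel_capture_cost(channels):
    """Capture cost in USD, ASSUMING cost scales linearly per channel."""
    return CAPTURE_COST_USD / CAPTURED_CHANNELS * channels


if __name__ == "__main__":
    # A hypothetical basement collection: 10 hours of film, 40 of video.
    print(conversion_cost(10, 40))
    # All of the roughly 400 channels estimated to exist worldwide.
    print(channel_capture_cost(400))
```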
Software. The IA has received a DMCA exception to circumvent copy protection for the purpose of ripping some of the 50,000 software packages that exist to date. They are only allowed to rip titles from no-longer-supported operating systems.
Web. The IA now captures 20TB/month of web content. The WayBackMachine holds over 30B (yes, billion) pages from 50M sites on 15M hosts. Anna Patterson's search engine based on this corpus searches 4 times the number of sites covered by Google.
The Internet Archive does all this on a budget of about $4M or $5M each year. I don't know about you, but this leaves me breathless.
In order to preserve this growing corpus (libraries, Brewster notes, traditionally burn eventually) the IA seeks out partners around the world who can host copies of the data. The more different they are from the US the better. Right now a copy is held at the new library in Alexandria and negotiations are under way with a Northern European country. Brewster estimates that the resources needed to maintain a mirror of IA are a PB of disk (that's petabyte), a GB of bandwidth, and $100M to set up an appropriate endowment for continued operation.
But if the "Universal Access to All Human Knowledge" goal articulated by Raj Reddy of the Million Book Project is too vast, and even the "All Published Knowledge Available to the Kid in Uganda" is a bit far out, how about something easy, asks Brewster. What if we just tried to attack what we already have every right to collect? Let's go for "Public Access to the Public Domain."
In the USA the public domain is pre-1923 publications. In fact, Brewster points out, with the aid of Mike Klezman's (?) recently completed electronic version of the copyright registry, it is now easy to find out which material from 1923-1964 did not have its copyright renewed and is now also in the public domain. Let's go get this material! His proposal: give the IA a book and $10 and the IA will return to you the book unharmed plus a digital copy. Will we accept the offer? Oh, and by the way, the IA is also happy to accept video and $15/hour for the conversion of that to digital format. Oh, and did I mention that the IA will also host the digital documents on their servers "forever"?
I think we should take Brewster up on this offer. How much material do we have in the University of Minnesota collections which we could part with for a bit to let the IA digitize and store it? We should seriously consider a project to pump this material and the limited dollars required to the IA as fast as we can. This is a crazy idea at a crazy price point; let's try to sink Brewster under our enthusiastic response! The great thing is, we probably won't; he has not sunk yet.
P.S. Brewster also tossed off an idea about how to archive blogs in response to a question. His thought was that we should be able to subscribe to blog RSS feeds and simply archive everything we see announced via that mechanism. I wonder if we could auto-harvest RSS from UThink.
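To make Brewster's RSS idea concrete, here is a minimal Python sketch of what such a harvester might look like, using only the standard library. It assumes plain RSS 2.0 feeds; the function names are mine, and I have deliberately left the feed URL as a parameter rather than guess at a UThink address.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_items(feed_xml):
    """Return (guid, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid", default="").strip() or link
        if link:
            items.append((guid, link))
    return items


def fetch_url(url):
    """Download the raw bytes of a post (or of the feed itself)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def harvest(feed_xml, seen, fetch=fetch_url):
    """Fetch every announced post whose guid is not yet in `seen`.

    `seen` is a set that the caller keeps between runs; it is updated
    in place. Returns a dict mapping guid to the fetched content.
    """
    archived = {}
    for guid, link in parse_items(feed_xml):
        if guid in seen:
            continue
        archived[guid] = fetch(link)
        seen.add(guid)
    return archived
```

Run on a schedule, something like `harvest(fetch_url(feed_url), seen)` would pick up only the posts announced since the last pass. A real archive would also need to persist `seen` and the fetched pages, and to cope with Atom feeds, which this sketch ignores.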
I am concerned that the work of the Joint Committee of the Higher Education and Entertainment Communities may do more harm than good by legitimizing some role for higher ed in killing off P2P file sharing. I don't think we have a role. I think this is a fight between the RIAA and MPAA and American society, and we will just get trampled in the middle. Still, a session updating us on the P2P issue at CNI was interesting. It is clear that EDUCAUSE is finding little workable technology to help satisfy industry demands (tools like Audible Magic and ICARUS are throwing out the legitimate baby with the illegal bathwater). Brewster Kahle was in the audience and asked us to please remember that the Internet Archive depends on P2P for distribution of its legitimate content. If we need an example of real life content dependent on P2P distribution, he welcomes us to point his way.
I am a pretty visual person and appreciate a well laid out graphical representation of an issue. I find one of the masters of our field to be Herbert Van de Sompel. I didn't attend his session today on Federations of Institutional Repositories, but I see the handouts in the CNI packet and am struck again by what lean, direct, and illuminating illustrations he comes up with. I don't know whether he makes this stuff up himself or employs some graphic talent on the back end, but his touch has been so consistent over the years in many contexts that I suspect the former. I hear many people laud the interface of SFX, few of whom realize just how much it is the vision of Herbert, who showed "rough" versions of SFX many years before it became a commercial product with virtually the same interface it still enjoys. If you want to see what I consider PowerPoint well-used, take a look at a presentation by Herbert some day.
By the way, his work on new roles for MPEG-21 & OAI & OpenURL in federating repositories is quite interesting, thinking way outside the box. Take a look at the D-Lib article he and a few colleagues wrote for a taste.
Ralph Quarles (IU) and I found each other at the reception. Ralph has offered to help us evaluate our computer support and seek an appropriate model for future support. He noted that he is ready for some ongoing contact with his colleagues at other CIC institutions. The Library IT Directors have that kind of forum in the CIC, but staff at his level, those actually running technology support operations, really don't have many opportunities to reach out to each other. I wonder if we should plan a day or two of professional "shoot the breeze" time at Minnesota for all the folks in these positions? We could do it as part of our investigative effort. This could both help this cohort build connections to one another and serve as a font of wisdom and warning for our own planning effort.
We ask our technology staff to do the seemingly impossible. Our staff is not nearly large enough to manage the kind of deployment we've got around the Libraries. How can only six staff manage 600 machines? On the other hand, could it be a failure of imagination? When I arrived at the U the first significant decision I made was to kill our attempt to use Sun Ray "appliance" computers to replace public workstations in the Libraries. We had good reasons for that decision, but the fundamental problem was staring us in the face then and remains at the core of our troubles in ITS: we cannot support a deployment of 600 Windows workstations with so few staff. Why can't we change the rules? At MIT I watched an organization deploy and maintain thousands of workstations with fewer staff than we have available to us.
I believe we need to think outside the box. The answer may not be Sun Ray, but we must recognize our situation (a budget even more limited than it was in 2001) and devise creative solutions to meet our needs within those bounds. I am certain this means compromises, but not necessarily the ones that run our staff ragged without the reward of a computing infrastructure they can take pride in, tell the world about, and share with our community.
My frustration with our current situation expresses itself as a frustration with ugly machinery, and I do believe that computers should be in the process of fading out of sight, but that's a red herring. My real frustration is that I've allowed our expectations to be diminished by accepting the limits we've imposed on ourselves. I wonder if we shouldn't get the CIC equivalents of Directors of ITS together to share their frustrations and triumphs. We could certainly use some inspiration and, who knows, we might even be able to do something inspiring ourselves!
Ed Ayers gave a wonderfully funny and somewhat touching plenary address about the tensions between "Academic Culture and Computer Culture". His own work includes the well regarded Valley of the Shadow. He described the "communal autonomy" of the university as a defining characteristic. The heart of our institutions is the "mysterious exchange between student and teacher," that "intimate bubble" in which learning happens. He described this activity as a flame, both intense and vulnerable, and our universities as "massive structures to protect those flames." His suggestion was that we build lighter, smaller things that "simplify the vastness," things like instant local class nets that don't rely on the broader campus network. His mantra was "scale down." As he spoke of the intimate bubble of interaction between student and teacher, I began to wonder whether technology is not beginning to pierce that bubble with tools that allow for action in the world from the classroom and feedback from the world into the classroom.
I was really struck by Ed's flickering flame. I get awfully frustrated by the scale and scope of the University of Minnesota. I lament that its mission seems to be: "Everything to everyone!" The bureaucracy can seem endless, the commitment to excellence often lacking, the message muddled, on and on. But I stay here, and this flickering flame reminded me why. This academic enterprise runs rather counter to most of American culture; it is rather precious. In a culture where profit and individual heroism are prized, we operate an enterprise which spends every penny we are given in the name of creating moments of intimate, hidden victory. People discover who they are on our campus, they encounter mentors, they open their minds to one another. Sure, there is a lot of bureaucracy, not to mention a whole lot of drinking, backstabbing, and just getting by... but what if those things are the cover needed to protect the flickering flame? If our culture actually realized how radical an enterprise this was, would we be allowed to get away with it? We all break the wind so that the flame of learning has a chance to move from one candle to another; maybe not every time we fire up the PowerPoint slides for another class, but maybe enough. Maybe this behemoth of an institution is what it takes to make this opportunity available to more of our neighbors.
Then again, that's pretty dreamy stuff. Quite the rationalization. I still want to stir up our University enough that we don't settle for less than excellence. The Libraries is where this starts for me. And more specifically our IT division and the work we do to put appropriate technology into the hands of our staff, students, and faculty.
Yikes! It's a good thing I'm not doing this blogging stuff more often!
I like my brother Christopher's vision of leadership. He calls us to dare to be brave. Easier said than done, though he has been pretty good at doing it lately. Luckily, Christopher is working on a book that may add a few more hints for us mere mortals!
Lawrence Lessig's new book, Free Culture, was published with a Creative Commons license which allows for derivative works. The result has been an amazing flurry of derivatives, including an audio version launched by AKMA. Does this demonstrate in any way a relationship between freeing content and creativity? Are we well served by dozens of versions of Lessig's work? Are we diminishing his own incentive to create?