OK, one more note about how Talk re-imagines what an online conversation can be. I am stunned at what Tim and his team have accomplished in less than a year, and the kind of creativity behind the new Talk feature is a great example of the dynamic ideas inspired by really thinking about how to serve a community.
Mary has been working on an Open Source Religious Education site idea. I don't know that conversations had been part of the idea, but if we were to implement them, I think the Talk model could be very exciting.
I feel that we too often believe we have to specify a service, understand all the functional requirements, survey the community, and get it right the first time. This leads to the "if we build it, will they come?" trap. The "it" becomes really big, and the "coming" becomes really important. What if we built just enough to get them to start coming? If we fail and "they" don't show up, we try something else. If "they" come, we wait for them to demand services, to tell us what should come next, to help us understand the functional requirements. We build for the community that grows. It's more of an "if they come, then we build it" model, or as a colleague put it today, a "dream of fields."
I have no idea whether LT really evolved this way; I'd love to know. But it sure feels like it has. I think we need to learn to evolve library systems in similarly iterative ways. I fear we will miss the boat otherwise.
Do you have an idea for an application you wish someone would write? Here's a new idea: MyDreamApp is a contest in which the winner will have their idea turned into a real application. Kind of American Idol-ish, this could be fun from the idea-generation perspective or from the voting-'em-out perspective. I wonder what will come of it.
Here's a high-profile company offering a service that clearly violates the DMCA, and about time! Ars Technica reports that Circuit City offers to duplicate DVDs for various purposes, from backup to reformatting data for other devices. To make this work they have to rip the DVDs, circumventing the encryption present on the discs. How long will this last?
Apple released the MacBook this week. It is the little sib to the MacBook Pro introduced a few months back. Alex and I stopped in at the Rosedale Apple store and took a look. What a nice machine! It is amazing to me that all Mac laptops now sport built-in iSight cameras and Front Row remote controls. So many cool things to do with these. The MacBook is not only on sale (and my sister has already bought one!), it has also already been disassembled and reviewed.
Wow, this could work. I know that when I'm walking I like to try to reach friends or family with my cellphone. But sometimes I just hold off, thinking, "it is dinnertime there" or "I don't really have anything to say." What if, instead, I were to drop in on an ongoing family discussion, hear the updates and advice from siblings and parents, leave my own 2 cents' worth, then check out. I think I might do this, even daily! The RadioActive project at the MIT Media Lab is setting out to create a tool like this. What a great idea!
Well, it seems to be happening. We are (finally) seeing a divorce between the Integrated Library System (ILS) and the Online Public Access Catalog (OPAC) we share with our patrons. The introduction of the new catalog at North Carolina State University powered by Endeca put the rest of us to shame. Our own vendor, Ex Libris, has been planning a product called Primo since last year. Finally, a library vendor for whom I have a great deal of respect, TLC, has decided to broker both Endeca (they did this two years ago!) and a product called AquaBrowser.
While I've been talking about the end of the ILS as we know it, others have been acting! It is impressive to see how far they've come, and time for us to do our part to move this market along.
You might be interested in tracking a new service from Google. Try a search for "population of minnesota" on Google today and you will see something new at the top of the results: an answer! Instead of pointing off to a web site, Google is putting factual answers at the top of the results for some searches. Give a few others a try ("who is jimmy carter"). More about this can be found at...
I don't think they sleep over there!
I've been struck today by how long lasting programming technology can be and by how quickly it all changes.
Gary Fouty, a librarian in our science library, surprised me today when he revealed his talent for writing code. The beautiful thing was that he writes in Pascal, a language I left behind long ago, but one which, he reminded me, still serves awfully well and has a number of strengths. Gary has written a program to take search results in MARC form from our Aleph system and transform them into HTML for pasting into a blog, from which he delivers RSS feeds. Very graceful work. The results can be seen in the new books blog he manages.
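Gary's Pascal source isn't shown here, but the transformation he describes, from catalog records to HTML ready for pasting into a blog, might look something like this sketch. (It's in Python for brevity; the record layout and the use of tags 245/100 for title/author are my assumptions about a real MARC export, not Gary's actual code.)

```python
# Hypothetical sketch: turn exported catalog records into HTML for a blog post.
# A real MARC export would be parsed from ISO 2709 or MARCXML; here each
# record is just a dict keyed by MARC field tag (245 = title, 100 = author).

def record_to_html(record: dict) -> str:
    """Render one record as an HTML list item."""
    title = record.get("245", "Untitled")
    author = record.get("100", "Unknown author")
    return f"<li><em>{title}</em> by {author}</li>"

def records_to_html(records: list) -> str:
    """Wrap all records in an unordered list for pasting into a blog."""
    items = "\n".join(record_to_html(r) for r in records)
    return f"<ul>\n{items}\n</ul>"

new_books = [
    {"245": "Free Pascal in Practice", "100": "A. Author"},
    {"245": "Cataloging Systems", "100": "B. Author"},
]
html = records_to_html(new_books)
print(html)
```

Once the HTML lands in a blog post, the blog platform's own RSS feed does the rest of the delivery work, which is the elegant part of Gary's pipeline.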
The great thing for me is that following Gary's code led me back to Pascal, and a nice Pascal compiler for the Mac. Free Pascal (FPC) is a nifty Turbo Pascal-compatible compiler for dozens of platforms including Windows, Linux, and Mac. There is even a detailed XCode Integration Kit that helps you use Apple's new coding tools with Pascal. Remember the good ol' days of Inside Macintosh and its Pascal interface to the Mac toolbox?
Meanwhile the future is rushing at us full speed. A great article at Adaptive Path describes what they term Ajax (more commonly called "remote scripting"), the arrangement of tools and techniques that makes some of the coolest interfaces on the web tick (see Google Maps and Mail and a nifty map of Switzerland for examples). This model is turning the hurry-up-and-wait paradigm of the web on its head. As the author concludes, "the challenges are for the designers of these applications: to forget what we think we know about the limitations of the Web, and begin to imagine a wider, richer range of possibilities." Another nice article on this technique can be found at Apple (it even credits Microsoft!). It looks to be a very interesting year.
One of my recurring arguments with auditors and some security staff revolves around how to secure passwords. They often push for a variety of measures, many of which I think are counterproductive and actually decrease any protection a password might offer. One of the worst offenses is the requirement to force a password change on users on some regular schedule. Last year I enjoyed a minor victory here at the U when I was able to convince the auditor, the head of network security, and the CIO that we didn't have to require 180-day auto-expiring passwords on machines with private data.
In documenting that case I pointed to a few articles including this PDF and a few ACM articles not available on the open web. Today I learned of a different article devaluing the password, that of a Microsoft security staff member arguing for long pass phrases instead: why you shouldn't be using passwords. I found this article on Slashdot which also included interesting comments and a link to an earlier story on the site.
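The case for pass phrases ultimately comes down to entropy arithmetic. Here's my own back-of-the-envelope sketch (the character-set and vocabulary sizes are illustrative assumptions, not figures from the articles):

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy for a string of uniformly random choices."""
    return length * math.log2(alphabet_size)

# An 8-character password drawn from ~95 printable ASCII characters:
password_bits = entropy_bits(95, 8)

# A 5-word pass phrase drawn from a 10,000-word vocabulary:
passphrase_bits = entropy_bits(10_000, 5)

print(f"password:    {password_bits:.1f} bits")   # about 52.6 bits
print(f"pass phrase: {passphrase_bits:.1f} bits") # about 66.4 bits
```

The pass phrase wins on raw entropy while being far easier to remember, which is exactly why forced rotation of short passwords (driving users toward predictable patterns) can be counterproductive.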
It can be less than trivial to find your way into the Google print environment, since you need to know the ID numbers of actual books.
I just found this blog post from a month ago that provides links to several titles so that you can explore a bit. Just in case that site goes away, here is one out of copyright work and one still in copyright example.
Think Secret is reporting that the wonderfully named Delicious Monster Software is working up an app called Delicious Library. As a librarian with a Mac, this is a piece of software I've always wanted! Make sure to page through the pictures at the Think Secret site. Even if this app is vapor, the pictures present dozens of interface innovations that would make our academic library system catalogs a whole lot more inviting. Why can't we do this with our catalogs? And check out the iSight barcode idea! Note that an iSight camera costs only $150, less than most dedicated barcode wands, and it can do much more than just read barcodes. I hope this product is real so I can buy an iSight to see whether this technology actually works.
No, this is not a message about Diebold! I'm meeting with some staff next week who have an interest in automating the library elections we hold periodically. Since these are pretty friendly contests, I don't think we need quite the audit trail of a government election, but we do need to maintain anonymity and make participation easy. If you have any suggestions of systems that might facilitate our internal governance elections, let me know. If you want to suggest issues that I should keep in mind when approaching this topic, let me know those as well.
One idea I've had is to verify identity with X.500 and keep voting records in a back-end database, so that voters could change their minds by recalling their own ballot until the election is closed. The problem is that I would not want the database to easily identify the voter or the record of a single voter over multiple elections. But what about this... Have the software hash the userid of the voter and the title of the given election into a value by which you key their vote in the database. If they return, the same hash should be generated, resulting in the same vote being modified. But if a sysadmin looks at the database, they just see a bunch of hashes, unique to that election, with no simple way to attribute a particular vote to a particular staff member. Sure, anyone with sufficient tech know-how and time could crack this system without much trouble, but the motivation for doing so would be very slight, so this is probably not a great threat. Thoughts?
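A minimal sketch of that hashing idea (the hash choice and key format are my assumptions, and, as noted above, this is deliberately not a hardened design):

```python
import hashlib

def ballot_key(userid: str, election: str) -> str:
    """Derive a stable but non-obvious database key from voter + election."""
    return hashlib.sha256(f"{election}:{userid}".encode()).hexdigest()

# The "back-end database": key -> vote
ballots = {}

def cast_vote(userid: str, election: str, vote: str) -> None:
    # Re-voting overwrites the earlier ballot, since the key is stable.
    ballots[ballot_key(userid, election)] = vote

cast_vote("jdoe", "Library Senate 2006", "Candidate A")
cast_vote("jdoe", "Library Senate 2006", "Candidate B")  # changed their mind
print(len(ballots))  # still just one ballot for this voter
```

Because the election title is mixed into the hash, the same voter gets a different key in every election, so a sysadmin can't correlate one person's votes across elections. A determined attacker could still brute-force the small space of known userids, which matches the post's own caveat; a per-election secret salt would raise that bar.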
Dan Debertin has introduced me to Ruby, a programming-language cross between Perl and Smalltalk. It has been a fun discovery, and it resonated last night with an essay by Paul Graham about hackers and their tools. The portions of this essay that discuss what hackers want, why they code, and how to care for them if you want them to join your cause are interesting. As some of our programmers have said, doors matter! Paul has a great Ruby quote on his site: "Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people." (attributed to Matz at LL2)
I participated in an advisory board meeting of the Documenting Internet2 project this week. As we considered appraisal strategies for collecting electronic documents and records from I2, I wondered whether appraisal would shift into a retrospective task when dealing with the electronic record of organizations. Will it be easier to collect "everything" (or whatever can be easily acquired, anyway) and then become selective later by mining that trove for the important bits? Estimates at this meeting suggested that at least 95% of "everything" is not valuable to researchers, and appraisal has been the traditional tool to ferret out the golden 5% (or even 1% in many cases). In the electronic realm, though, could it be a wiser use of human capital to collect the 100% and then mine out the 5% as needed?
One dash of cold water on this approach has been the dearth of data mining tools. However, the rise of litigation support software may be one place to hunt for useful models. The U is also home to a strong data mining research group in the DTC. Perhaps we could work with them to develop research tools for future archives?
Finally, we are beginning to see this approach emerge on the personal computer desktop. Last week Steve Jobs announced that the next generation of Mac OS X (10.4, or Tiger) will incorporate a technology Apple calls Spotlight. Spotlight will be a very fast search engine for the Mac OS. I wonder if, as search gets fast and easy enough, it will replace organization? We all know how difficult it is to create a good filing system and stick to it, even on a computer. As search improves, will we just give up on organization and instead rely on searching to pull together the documents we need as we need them?
A couple emails hit my inbox today about HP and Dell "going green". While I think that's overstating matters a bit, a Mercury News story does describe two new programs: "HP said it will accept old electronics equipment, from PCs to TVs, that are dropped off at Office Depot outlets across the country from July 18 to Sept. 6, free of charge. ... And Dell went one further: It will pick up old computers and their accessories at the homes of customers. The catch: You have to buy a new Dell." Carnegie-Mellon has also had a nice Green Design page about environmentally friendly computer design and recycling.
Today and tomorrow I'm meeting with other CIC Library IT Directors in Chicago. This afternoon the question of productivity use of PDAs and laptops led to a riff on the personal use of public university equipment. Some states (such as Ohio) have very strict ethics laws prohibiting the personal use of state equipment, and all our public universities face some degree of state restriction on such use. I find this amazingly shortsighted. The smaller devices get, the more they become, at heart, personal devices. For the state to go to the expense of buying this equipment, and then tell you that you may not integrate your life using it, is completely missing the point of the technology. Devices like PDAs and laptops, if they improve productivity at all, do so by allowing users to build their lives around the devices. An employee who can readily know what personal appointments are on their calendar or can quickly respond to their daughter's email query about an evening ride to softball will be a more productive employee as surely as they will be a more balanced person. I understand the ethics concerns behind these restrictions, but can't we find some way to encourage personal use of equipment as long as it does not interfere with the work purpose of the equipment?
In a promising new twist, mod_oai will bring the Open Archives Initiative protocol to Apache. What sort of services will this make possible?
I've been a fan of Simson Garfinkel since my days in the NeXT Users Group at MIT. Simson was an active NeXT user and has since gone on to become a tech writer with a clear point of view, defending privacy as he engages the future. Today Simson wrote about Google and Akamai as competitors, or at least fellow travelers on the path to distributed terascale computing. What should libraries be buying from Akamai and learning from Google?
Brewster Kahle gave the talk at the closing plenary of the CNI Spring Task Force meeting. Brewster just keeps on doing; he never seems daunted by the scope of large tasks. The amazing thing is that it works! He set out to capture the web, and the Internet Archive (IA) does that better than any other entity. He called on us to "put the best we have to offer within the reach of our children." Within reach, to Brewster (and to our children), means "on the web." He then walked us through a back-of-the-napkin calculation of what it would take, concluding that the goal is within reach of us today and within our budgets to boot. Are we ready to answer the call?
Books. The Library of Congress = 20M volumes = 26TB = $60,000 disk space. At 2 hours/book (without destroying the books) this is doable. Output back to book form costs $1/book. This print-on-demand solution is being demonstrated today by the BookMobile the Internet Archive has put on the streets not just of the USA, but also India, Egypt, and most recently rural Uganda.
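Brewster's figures hold up to a quick sanity check (the per-gigabyte disk price below is inferred from his totals, not a number he quoted):

```python
# Back-of-the-napkin check of the Library of Congress digitization figures.
volumes = 20_000_000        # 20M volumes
total_bytes = 26e12         # 26 TB
disk_cost_dollars = 60_000  # quoted cost for that much disk

per_book_mb = total_bytes / volumes / 1e6
per_gb_cost = disk_cost_dollars / (total_bytes / 1e9)

print(f"{per_book_mb:.1f} MB per digitized book")  # ~1.3 MB/book
print(f"${per_gb_cost:.2f} per GB of disk")        # ~$2.31/GB
```

About 1.3 MB per book implies compressed page images or text rather than high-resolution scans, which fits the print-on-demand use the BookMobile puts them to.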
Audio. 2M "saleable objects" of audio exist, but much of it sits behind IP regimes that make it hard to deal with. The IA approached the "taper" community: fans who record the shows of performance-oriented rock bands that, following the Grateful Dead's lead, allow their music to be taped and exchanged for non-commercial use. "How would you like infinite bandwidth and infinite storage for free?" the IA asked the tapers. Guess what? They love the idea. 500 rock bands have given the IA permission to archive this material and share it for free. The tapers have already contributed 10-20TB of concerts, available on the IA.
Moving Images. Don't just consider the 100,000-200,000 mainstream films (half of them from India). Consider the 2M films created in the 20th century that document daily life. Some of these may be in your very own basement. One hour of film costs about $100 to convert; one hour of video costs only $15. The IA is also now capturing 20 channels of video from around the world 24/7 for about $500,000. It is estimated there may be about 400 channels around the world.
Software. The IA has received a DMCA exception to circumvent copy protection for the purpose of ripping some of the 50,000 software packages that exist to date. They are only allowed to rip titles from no-longer-supported operating systems.
Web. The IA now captures 20TB/month of web content. The Wayback Machine holds over 30B (yes, billion) pages from 50M sites on 15M hosts. Anna Patterson's search engine based on this corpus searches 4 times the number of sites covered by Google.
The Internet Archive does all this on a budget of about $4M or $5M each year. I don't know about you, but this leaves me breathless.
In order to preserve this growing corpus (libraries, Brewster notes, traditionally burn eventually) the IA seeks out partners around the world who can host copies of the data; the more different they are from the US the better. Right now a copy is held at the new library in Alexandria, and negotiations are under way with a northern European country. Brewster estimates that the resources needed to maintain a mirror of the IA are a PB (that's petabyte) of disk, a gigabit of bandwidth, and $100M to set up an appropriate endowment for continued operation.
But if the "Universal Access to All Human Knowledge" goal articulated by Raj Reddy of the Million Book Project is too vast, and even "All Published Knowledge Available to the Kid in Uganda" is a bit far out, how about something easy, asks Brewster. What if we just tried to attack what we already have every right to collect? Let's go for "Public Access to the Public Domain."
In the USA the public domain is pre-1923 publications. In fact, Brewster points out, with the aid of Mike Klezman's (?) recently completed electronic version of the copyright registry, it is now easy to find out which materials from 1923-1964 did not have their copyrights renewed and so are now also in the public domain. Let's go get this material! His proposal: give the IA a book and $10, and the IA will return to you the book unharmed plus a digital copy. Will we accept the offer? Oh, and by the way, the IA is also happy to accept video and $15/hour for the conversion of that to digital format. Oh, and did I mention that the IA will also host the digital documents on their servers "forever"?
I think we should take Brewster up on this offer. How much material do we have in the University of Minnesota collections that we could part with for a bit to let the IA digitize and store it? We should seriously consider a project to pump this material and the limited dollars required to the IA as fast as we can. This is a crazy idea at a crazy price point; let's try to sink Brewster under our enthusiastic response! The great thing is, we probably won't: he has not sunk yet.
P.S. Brewster also tossed off an idea about how to archive blogs in response to a question. His thought was that we should be able to subscribe to blog RSS feeds and simply archive everything we see announced via that mechanism. I wonder if we could auto-harvest RSS from UThink.
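A harvester along those lines could be quite small. Here's a sketch using only the Python standard library (the feed contents below are invented for illustration; a real harvester would fetch each blog's feed URL on a schedule and then fetch and store every announced link):

```python
import xml.etree.ElementTree as ET

# A fragment standing in for a fetched RSS 2.0 feed. In practice you would
# pull each UThink blog's feed with urllib and run this periodically.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Sample UThink Blog</title>
  <item><title>First post</title><link>http://example.edu/1</link></item>
  <item><title>Second post</title><link>http://example.edu/2</link></item>
</channel></rss>"""

def items_to_archive(rss_text: str) -> list:
    """Return (title, link) pairs for everything announced by the feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in items_to_archive(sample_rss):
    print(title, "->", link)  # each link would then be fetched and stored
```

The appeal of Brewster's idea is that the feed does the appraisal for you: whatever the author announced is, by definition, what they considered worth publishing.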
A group funded by the Mellon Foundation is trying to define the bounds of interaction between course management systems (CMS) and repositories. Their report should be available on the DLF web site by the end of May. In today's presentation to CNI they made three fundamental points: (1) users will be getting to repository content through a broad set of "course management" tools that extend well beyond CMS into PowerPoint, Weblogs, Citation Managers and the like; (2) repositories need to attend to a Checklist of requirements and desirables in order to interoperate with this layer of tools; and (3) the process used to build course content can be expressed as "Gather-Create-Share".
This "Gather-Create-Share" seems like a weak echo of Apple's "Rip, Mix, and Burn" campaign a few years ago. It is also the process that Lessig warns us is under threat given the intellectual property regime our country is putting into force. The session really didn't touch on the impediments that copyright puts in the way of the "Gather" step, but I was told that IP issues will be part of the Checklist when the group reports out to the DLF.
Ralph Quarles (IU) and I found each other at the reception. Ralph has offered to help us evaluate our computer support and seek an appropriate model for future support. He noted that he is ready for some ongoing contact with his colleagues at other CIC institutions. The Library IT Directors have that kind of forum in the CIC, but staff at his level, those actually running technology support operations, really don't have many opportunities to reach out to each other. I wonder if we should plan a day or two of professional "shoot the breeze" time at Minnesota for all the folks in these positions? We could do it as part of our investigative effort. This could both help this cohort build connections to one another and serve as a font of wisdom and warning for our own planning effort.
We ask our technology staff to do the seemingly impossible. Our staff is not nearly large enough to manage the kind of deployment we've got around the Libraries. How can only six staff manage 600 machines? On the other hand, could it be a failure of imagination? When I arrived at the U, the first significant decision I made was to kill our attempt to use Sun Ray "appliance" computers to replace public workstations in the Libraries. We had good reasons for that decision, but the fundamental problem was staring us in the face then and remains at the core of our troubles in ITS: we cannot support a deployment of 600 Windows workstations with so few staff. Why can't we change the rules? At MIT I watched an organization deploy and maintain thousands of workstations with fewer staff than we have available to us.
I believe we need to think outside the box; it may not be Sun Ray, but we must recognize our situation (a budget even more limited than it was in 2001) and devise creative solutions to meet our needs within those bounds. I am certain this means compromises, but not necessarily the ones that run our staff ragged without the reward of a computing infrastructure they can take pride in, tell the world about, and share with our community.
My frustration with our current situation expresses itself as a frustration with ugly machinery, and I do believe that computers should be in the process of fading out of sight, but that's a red herring. My real frustration is that I've allowed our expectations to be diminished by accepting the limits we've imposed on ourselves. I wonder if we shouldn't get the CIC equivalents of Directors of ITS together to share their frustrations and triumphs. We could certainly use some inspiration and, who knows, we might even be able to do something inspiring ourselves!
Reagan Moore from the San Diego Supercomputer Center discussed their SRB development. A very dense presentation left me with the basic impression that I need to understand this approach to data storage being developed as part of the NSF grid infrastructure. SRB "provides a uniform interface for connecting to heterogeneous data resources over a network and accessing replicated data sets." An alternative to LOCKSS? I did hear from one colleague who has already been reviewing SRB that it is a very complex bit of software to install and maintain.
Jerome McDonough, the Digital Library Development Team Leader at NYU, gave a very informative presentation on video standards and preservation. The bottom line was that though 4:4:4 video (truly uncompressed; an Apple codec and Motion JPEG 2000 can provide this) would be the right thing to capture and archive, unfortunately it is much too expensive to store. At NYU they've stepped back to the compromise of capturing 4:2:2 video instead. He noted that anything other than 4:4:4 capture and storage is, in fact, allowing for lossy compression. This is fine until migration has to happen, at which point artifacts will creep in due to the recompression of images. Note that NYU uses Tripwire to create and check up on MD5 checksums (can I use that on Thomas?) and UC Berkeley's OceanStore project is taking a stab at very large, high-performing distributed storage solutions.
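The checksum piece of that preservation workflow is simple enough to sketch. (Tripwire's own database format and scheduling are not shown; this is just the fixity idea, with invented sample data.)

```python
import hashlib

def md5_of(data: bytes) -> str:
    """MD5 digest of a stored object, as a hex string."""
    return hashlib.md5(data).hexdigest()

# Record a checksum at ingest time...
master = b"frame data from a captured 4:2:2 video file"
recorded_checksum = md5_of(master)

# ...then verify it on a schedule to detect silent corruption.
def fixity_ok(data: bytes, expected: str) -> bool:
    return md5_of(data) == expected

print(fixity_ok(master, recorded_checksum))        # unchanged file passes
print(fixity_ok(master + b"!", recorded_checksum)) # any bit-level change fails
```

This catches storage rot, but note it cannot catch the recompression artifacts Jerome described: those are introduced deliberately at migration time, after which a new checksum is recorded for the new file.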
Steve Cawley and I attended a valuable "executive roundtable" bringing CIOs and University Librarians together to discuss "identity." As we discussed the challenges and successes of authentication and authorization in today's academic environment, I began to wonder if we are not focusing a bit too close to home. I note that while libraries felt very secure in their database and searching expertise, tools like Google snuck up outside our borders and transformed user expectations of searching and research, so that now we are strangers in our own territory. What will the identity landscape look like five and ten years from now? Will users have an expectation that they can carry an identity into our organization that was credentialed beyond our borders and control?
Some hint of that future may have appeared in the form of a discussion about e-portfolios and their impact on our data models. A move toward portfolios is a move toward users asserting control of their own data (a user as holding the copy of record of their transcript, for example). This turns on its head our current data model where institutions bear the responsibility for holding and managing the continuity of that sort of data. I wonder whether the solution to the buy-in problem for institutional repositories, for example, might be an individual repository model sewn together by metadata harvesting like OAI? Who will be the "portfolio banks" of the future who (for a small fee) manage the physical systems on which your e-portfolio resides and ensure that the policies and permissions you specify for your information are actually carried out when sharing your portfolio?
Some additional notes: Who assigns identities? Who decides their scope? Some discussion of OKI's concept of "authN" (authentication) and "authZ" (authorization). The role of UIN (numbers) vs. NetID (typically names). Distinguishing between the deed of authentication and the trails and logs kept about that deed and subsequent actions (librarians are loath to keep any trail, but doing the deed might be fine). See SPEC Kits 277 and 278 from the ARL and "The Mirage of Continuity" by Brian Hawkins. Credit Brad with the notion "sustainable economies in tension with the frontiers of innovation" (all you really have to do to make technology sustainable is stop changing) and Beth with "the economics of compromise" (the notion that organizations are much more willing to work with you after they have experienced a compromise and its costs than before). If setting up a portfolio banking business, what would be your "free as in beer" service lure and what would you charge for? Would password management be part of the package?
This Slashdot story about a Dan Gillmor article was the second story I've read today about Linux on the desktop appearing more viable. Earlier I'd seen a Chad Dickerson column in InfoWorld. Both were about the ease with which they had installed and used the Xandros Linux distribution. Come to think of it, a few days back I'd read this story about ZeroInstall on top of a Linux desktop/filer named ROX. ROX seems to have application bundles done right and used that to create an outstandingly simple method of installing software. All of these stories are impressing me with how functional Linux is becoming on the desktop. It kind of makes me want to install Linux on an old laptop some time and see what it is like. I love my Mac OS X machine, but I also like the idea of strong software on our existing Intel hardware base in the Libraries. I wonder how far we are from a viable choice to move our public and staff machines to Linux?
Cool! We were pleased to launch our own open source project a few months back (see LibData), so I'm always happy to see new open source library projects on the block. Today I got word about OLinks from OhioLINK.
Tom Sanville wrote:
Some of you already know that OhioLINK has created its own URL resolver for journals and other materials. OLinks is an open-source OpenURL resolver intended for use by library consortia, individual libraries, and other organizations with a need to manage citation linking using the OpenURL standard. Introductory information can be found at [this site]. As noted on the site, you are advised to contact Thomas Dowling on the OhioLINK staff if you are interested in using it.
Thank you, OhioLINK!