I've become a fan of YouTube, even following the whole LG15 controversy. Yeah, I know. Still, every once in a while I run across something worth smiling about. Here's one from Weird Al that you should all watch: a commentary on copyright. Very sweet.
OK, one more note about LT, which re-imagines what an online conversation can be. I am stunned at what Tim and his team have accomplished in less than one year, and the kind of creativity behind the new Talk feature is a great example of the dynamic ideas inspired by really thinking about how to serve a community.
Mary has been working on an Open Source Religious Education site idea. I don't know that conversations had been part of the idea, but if we were to implement them, I think the Talk model could be very exciting.
I feel that we too often think we have to specify a service, understand all the functional requirements, survey the community, and get it right the first time. This leads to the "if we build it, will they come?" problem. The "it" becomes really big, and the "coming" becomes really important. What if we built just enough to get them to start coming? If we fail, "they" don't show up, and we try something else. If "they" come, we wait for them to demand services, to tell us what should come next, to help us understand the functional requirements. We build for the community that grows. More of an "if they come, then we build it" model, or as a colleague put it today, a "dream of fields."
I have no idea if LT really evolved this way, I'd love to know. But it sure feels like it has. I think we need to learn to evolve library systems in similarly iterative ways. I fear we will miss the boat otherwise.
Dharma and Beth have published an article about Documenting Internet2 in RLG DigiNews. I spent quite a bit of effort on this project last year and found web crawling for content much more reasonable an approach than I'd expected. This year we are giving Archive-It (from the Internet Archive and RLG) a go for similar crawling. The article is an effective summary of the project.
I just saw a really neat little demo of U Rochester's Libraries Staff Web. It turns out they've implemented their whole staff web as a Xerox Docushare site. This enables not only sharing completed documents, but also sharing the editorial and creation side of documents (something you can't really see without logging in). I was particularly struck by the image sharing this system made possible.
Wired ran an article this month about the vulnerabilities of RFID tags. Ed Vielmetti picked up on this in his blog and adds a few other useful resources. RFID has seemed quite cool for a while, though a bit intimidating on the big brother front. These concerns, though, raise a real question about how reliable RFID may be. The Wired article seems a bit alarmist to me, given that it focuses its libraries comments on an institution that decided not to implement RFID anyway. Are institutions that do move ahead doing so without "locking" tags and addressing these issues?
Well, it seems to be happening. We are (finally) seeing a divorce between the Integrated Library System (ILS) and the Online Public Access Catalog (OPAC) we share with our patrons. The introduction of the new catalog at North Carolina State University powered by Endeca put the rest of us to shame. Our own vendor, Ex Libris, has been planning a product called Primo since last year. Finally, a library vendor for whom I have a great deal of respect, TLC, has decided to broker both Endeca (they did this two years ago!) and a product called AquaBrowser.
While I've been talking about the end of the ILS as we know it, others have been acting! It is impressive to see how far they've come, and time for us to do our part to move this market along.
Downhill Battle has taken down its bittorrent links to Eyes on the Prize at the request of lawyers for Blackside (the producer). No surprise there. Still, that leaves the question of fair use open.
I've been thinking about the Eyes on the Prize distribution some more. I'd called it "stealing" and "clearly illegal" in my prior post and comments (since edited). That was inconsiderate. Let's consider the case more carefully. The claim made by Downhill Battle is that copying Eyes on the Prize for the purpose of showing it at screenings on 2/8 is fair use. Fair use must be evaluated by four factors, so let's look at each of them with regard to this case. Remember, I am not a lawyer. I am not even an expert in copyright. I'm just doing this exercise to help with my own thinking. Your mileage may vary!
In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
In this case the copies are being made for showing during Black History Month and to illuminate the tensions between copyright and the transmission of culture. As long as these copies are used only for such non-profit educational purposes, I think there is likely to be a reasonable case for fair use on this factor.
Note that the use is not "transformative". While the screenings at which the documentary is presented may create a critical context that changes its role (a conversation about copyright in addition to the lessons of civil rights), this new context does not seem to me to really transform the work. As a result, I would not anticipate a slam-dunk case for fair use on the first factor.
(2) the nature of the copyrighted work;
Eyes on the Prize is a television miniseries documentary. The courts seem to treat fact-based material more generously w/r/t fair use than fictional material. This is clearly factual material. On the other hand, I think visual material, like TV or film, tends to get more protection than some printed works. This may be a wash, or it may lean very slightly toward fair use.
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
Well, we are being asked to copy the whole show. In fact, each episode of the show is probably to be considered a complete work. I think this factor clearly tilts against fair use.
(4) the effect of the use upon the potential market for or value of the copyrighted work.
Now this gets interesting. Since the producers and PBS are no longer selling copies of the series, is there a market at all? They claim to be working on re-securing the rights they need to distribute the work, and if they succeed there should be a decent market for the DVD or other distribution. Even so, does the mp4 distribution really take away from that market potential? I think a case could be made that this distribution and the publicity and screenings surrounding it will increase the market for this series, should it ever be distributed officially again. I know I am now interested in buying a copy, when I'd forgotten about the series before all this. In my mind this factor leans toward fair use.
Hm. Factors (1) and (4) tilt toward fair use, factor (3) tilts against fair use, factor (2) may be a wash, but slightly toward fair use in my estimation. That adds up, in my view, to fair use! Downhill Battle has a point.
Now, this is not a legal ruling in any sense, and you have to do your own analysis of the factors before making your own decision. And document your decision in case you are ever called to defend it in court.
An interesting day of copyright today. Kenneth Crews is with us in Minnesota and gave a great workshop for our staff today (faculty get a taste tomorrow). And when I got home I found Mary excited about a project at Downhill Battle to encourage people to copy Eyes on the Prize. This classic documentary about the civil rights movement of the 1960s is not in legal distribution because the rights granted for the clips used have expired and new rights have not been cleared yet by the production company. Civil disobedience over copyright issues. Interesting times.
So it may be a bit much to ask the Libraries to get on the criminal side of a copyright issue, but what if Libraries around the country (and ours in particular) took part in the Downhill Battle 2/8 Black History Month event to host public showings of episodes of the documentary? Of course, we would not show the illegally downloaded versions from the net, but the legal copies from our collections. The discussion fostered, though, could still be about the difficulty of preserving critical pieces of culture in an era of tough copyright enforcement.
The RLG has released new guidelines for their Cultural Materials Initiative. Note the extremely slim descriptive requirements: creator, type, title, date, id, and pointer. It appears that a record would be acceptable with just a unique id, a work type, and a pointer to a surrogate if the creator, title, and date were not known. There are other elements to the "value-added" and "bonus" segments of the "core fields," but the admission seems to be that thorough metadata is very hard to get.
Think Secret is reporting that the wonderfully named Delicious Monster Software is working up an app called Delicious Library. As a librarian with a Mac, this is a piece of software I've always wanted! Make sure to page through the pictures at the Think Secret site. Even if this app is vapor, the pictures present dozens of interface innovations that would make our academic library system catalogs a whole lot more inviting. Why can't we do this with our catalogs? And check out the iSight barcode idea! Note that an iSight camera costs only $150, less than most dedicated barcode wands, and it can do much more than just read barcodes. I hope this product is real so I can buy an iSight to see whether this technology actually works.
Slashdot is carrying a nice little interview with Jimmy Wales, the creator of the Wikipedia. If you've not yet discovered this wonderful example of "open content" you should take a peek. The Wikipedia's philosophy of openness, not only of access but also of authorship, is an inspiring extension of what libraries have been about. What if we didn't just share the reference collection, but helped co-author it? Amazingly, the Wikipedia organizers consider $50,000 of support a big deal. Libraries regularly attract bigger grants from the feds and organizations like Mellon with a whole lot less to show for it. I wonder what the Wikipedia could do with a $250,000 Mellon grant?
No, this is not a message about Diebold! I'm meeting with some staff next week who have an interest in automating the library elections we hold periodically. Since these are pretty friendly contests, I don't think we need quite the audit trail of a government election, but we do need to maintain anonymity and make participation easy. If you have any suggestions of systems that might facilitate our internal governance elections, let me know. If you want to suggest issues that I should keep in mind when approaching this topic, let me know those as well.
One idea I've had is to verify identity with X.500 and keep voting records in a back-end database so that voters could change their minds by recalling their own ballot until the election is closed. The problem is that I would not want the database to easily identify the voter or the record of a single voter over multiple elections. But what about this... Have the software hash the userid of the voter and the title of the given election into a value by which you key their vote in the database. If they return, the same hash should be generated resulting in the same vote being modified. But if a sysadmin looks at the database, they just see a bunch of hashes, unique to that election, with no simple way to attribute a particular vote to a particular staff member. Sure, anyone with sufficient tech knowhow and time could crack this system without much trouble, but the motivation for doing so would be very slight, so this is probably not a great threat. Thoughts?
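To make the idea concrete, here's a minimal sketch in Python. Everything here is illustrative (the names, the dict standing in for the back-end database); a real system would also mix a secret salt into the hash, since our userid space is small enough to attack by brute force.

```python
import hashlib

# Stand-in for the back-end database: hash key -> ballot.
ballots = {}

def ballot_key(userid: str, election: str) -> str:
    """Derive an opaque key from the voter's id and the election title.

    The same voter in the same election always gets the same key, so a
    returning voter overwrites (recalls) their own ballot. A sysadmin
    browsing the table sees only hashes, unique to this election.
    """
    return hashlib.sha256(f"{election}:{userid}".encode()).hexdigest()

def cast_vote(userid: str, election: str, choice: str) -> None:
    ballots[ballot_key(userid, election)] = choice

# A voter changes her mind: the second cast replaces the first.
cast_vote("alice", "2004-board", "candidate-a")
cast_vote("alice", "2004-board", "candidate-b")
```

Because the election title is folded into the hash, the same voter gets a different key in every election, so the database can't trivially link one person's votes across elections either.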
A colleague passed me this message which echoes my sense that some protective clauses we fight to include in our contracts are nearly worthless in the real world.
About six months ago, [our] University Libraries was faced with a decision about continuing our access to what was formerly called Elsevier's Academic Freedom Collection. We had subscribed to the Academic Ideal e-journal collection package since 1998 through our consortium. Elsevier wasn't willing to work through consortium arrangements and wasn't willing to provide the group of journals as a package any longer. [Our] Libraries' acquisitions budget had been cut and we were facing yet another reduction. We could not afford to subscribe individually to all the previously owned journals via ScienceDirect. We renewed 34 titles in print. We also could not afford to pay the annual access fee to maintain ScienceDirect linking to the backfiles. We chose the option of receiving the Ideal/Freedom journal backfile data that we had purchased through our several years of subscriptions. Elsevier sent us 8 DLT tapes about two months after our request for the data. After another frustrating two months of locating the outdated tape drives needed to open and access the tapes, we have discovered that there is duplicated data and that the data does not appear in any kind of rational order.
It will be interesting to see how they fare. I'm afraid that the data in vendor systems will get more and more complex and that a "dump" of this data will be less and less useful to anyone outside that vendor's shop. To make these clauses meaningful will require that we develop some well known formats and then demand the offloaded data meet these specs. Of course, none of our contracts currently contain such requirements and in any case no such standards currently exist.
I have a similar concern with regard to source code escrow agreements. How much good does it do us to have source code without the suite of compilers, libraries, and tools that it takes to build a given application? Even if we could build it, would we have the skills to do so with confidence? In most cases, wouldn't we migrate to an alternate vendor's product before taking on maintenance of a defunct vendor's product?
I participated in an advisory board meeting of the Documenting Internet2 project this week. As we considered appraisal strategies for collecting electronic documents and records from I2, I wondered whether appraisal would shift into a retrospective task when dealing with the electronic record of organizations. Will it be easier to collect "everything" (or whatever can be easily acquired, anyway) and then become selective later by mining that trove for the important bits? Estimates at this meeting suggested that at least 95% of "everything" is not valuable to researchers, and appraisal has been the traditional tool to ferret out the golden 5% (or even 1% in many cases). In the electronic realm, though, could it be a wiser use of human capital to collect the 100% and then mine out the 5% as needed?
One dash of cold water on this approach has been the dearth of data mining tools. However, the rise of litigation support software may be one place to hunt for useful models. The U is also home to a strong data mining research group in the DTC. Perhaps we could work with them to develop research tools for future archives?
Finally, we are beginning to see this approach emerge on the personal computer desktop. Last week Steve Jobs announced that the next generation of Mac OS X (10.4 or Tiger) will incorporate a technology Apple calls Spotlight. Spotlight will be a very fast search engine for the Mac OS. I wonder if, as search gets fast and easy enough, it will replace organization. We all know how difficult it is to create a good filing system and stick to it, even on a computer. As search improves, will we just give up on organization and instead rely on searching to pull together the documents we need as we need them?
Michael Lesk has made the US copyright renewal registry (1923-1963) searchable. This is a big help in identifying book titles which may be out of copyright even though they were published between 1923 and 1963.
Boy, is that ever a dull term! We think a lot in libraries about how we can put more information at the fingertips of our users with just a single search. MetaSearch attempts to knit together our patchwork quilt of vendors and databases into one unified set of results for users. Fat chance! All we seem to be able to do is slow down search results and present a hodgepodge of unlikely-bedfellow results. Still, metasearch is a worthy goal and we keep trying. I think Amazon is demonstrating an interesting alternative model with its a9.com service. There you will find websearch results from Google in one column and a set of results from Amazon in a second column. This is in some ways similar to the multiple layers of results found at Teoma. Can we apply this to library systems? Could we show Google results side by side with results from our local resources? If we don't, will Google eat our lunch anyway, especially now that they are negotiating with commercial vendors to bring more of the "dark web" to light?
I had a nice day today at our local ARLD (Academic and Research Libraries Division of the Minnesota Library Association) Day conference at the Arboretum in Chanhassen. Most interesting to me was a presentation on the "Googlization of Library Values" by librarians from St. Cloud State, St. Catherine's, and Carleton College. I was expecting the usual library lament about how we have to resist the dominion of Google, which is teaching our patrons that search is simple and everything is on the net. Surprise! Every one of the presenters spent their time sharing positive lessons we need to learn from Google. Robin Ewing asked us to learn from three core values demonstrated by Google: vision, usability, and whimsy. We all know Google thinks big and keeps stuff simple, but I had not really valued their sense of whimsy. Upon some reflection I find it true: Google takes the time to make things a little fun, from their name and logo to things like allowing their interface to be translated into Klingon and Elmer Fudd. Can we make library services anything other than deadly dull?
John clued me in to SRW today. I thought I had not heard of it, but have just found that it is the name for Z39.50 Next Generation, something I had heard of. Do you think Z39.50's reputation is so poor this group had to choose a new name?
There has been a controversy swirling around the net about some pictures taken by a (now fired) Kuwait airport worker of the coffins of US soldiers being placed aboard transport home. We sometimes imagine that the net makes information free (though in this case it had something to do with the Freedom of Information Act), but I wonder how many people will be able to reach the http://www.thememoryhole.org site which was distributing similar pictures. It seems the US Government has (maybe) shut it down. A chilling sign of the times, and a reminder of just how fragile even the net can be? Or not? The NYT is running the story, with photos.
A nice mainstream article on the many uses of Creative Commons licenses is available at Business2.0. If you don't know what the Creative Commons is, you should. This article may make for a gentle intro alongside real-life stories of how it is making a difference. [Source: OAN]
In a promising new twist, mod_oai will bring the Open Archives Initiative protocol to Apache. What sort of services will this make possible?
Brewster Kahle gave the talk at the closing plenary of the CNI Spring Task Force meeting. Brewster just keeps on doing; he never seems to be daunted by the scope of large tasks. The amazing thing is that it works! He set out to capture the web, and the Internet Archive (IA) does that better than any other entity. He called on us to "put the best we have to offer within the reach of our children." Within reach, to Brewster (and to our children), means "on the web." He then walked us through a back-of-the-napkin calculation of what it would take, concluding that the goal is within reach of us today and within our budgets to boot. Are we ready to answer the call?
Books. The Library of Congress = 20M volumes = 26TB = $60,000 disk space. At 2 hours/book (without destroying the books) this is doable. Output back to book form costs $1/book. This print-on-demand solution is being demonstrated today by the BookMobile the Internet Archive has put on the streets not just of the USA, but also India, Egypt, and most recently rural Uganda.
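A quick sanity check of those napkin numbers (using decimal units, as talks like this usually do): they work out to roughly 1.3 MB of scanned text per book and about $2.30 per gigabyte of disk, which was plausible pricing at the time.

```python
# Back-of-the-napkin check of Brewster's book figures.
volumes = 20_000_000        # Library of Congress
total_tb = 26               # total storage claimed
disk_cost = 60_000          # dollars for that storage

mb_per_book = total_tb * 1_000_000 / volumes      # MB of data per volume
dollars_per_gb = disk_cost / (total_tb * 1_000)   # disk price per GB
```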
Audio. 2M "saleable objects" of audio exist, but much of it behind IP regs that make it hard to deal with. The IA approached the "taper" community of people who have taken advantage of performance oriented rock bands who followed the Grateful Dead's lead into allowing fans to tape their music and exchange it for non-commercial use. "How would you like infinite bandwidth and infinite storage for free?" the IA asked the tapers. Guess what? They love the idea. 500 rock bands have given the IA permission to archive this material and share it for free. The tapers have already produced 10-20TB of concerts available on the IA.
Moving Images. Don't just consider the 100-200,000 mainstream films (half of them from India). Consider the 2M films created in the 20th century that document daily life. Some of these may be in your very own basement. One hour of film costs about $100 to convert. One hour of video costs only $15. The IA is also now capturing 20 channels of video from around the world 24/7 for about $500,000. It is estimated there may be about 400 channels around the world.
Software. The IA has received a DMCA exception to circumvent copy protection for the purpose of ripping some of the 50,000 software packages that exist to date. They are only allowed to rip titles from no-longer-supported operating systems.
Web. The IA now captures 20TB/month of web content. The WayBackMachine holds over 30B (yes, billion) pages from 50M sites on 15M hosts. Anna Patterson's search engine based on this corpus searches 4 times the number of sites covered by Google.
The Internet Archive does all this on a budget of about $4M or $5M each year. I don't know about you, but this leaves me breathless.
In order to preserve this growing corpus (libraries, Brewster notes, traditionally burn eventually) the IA seeks out partners around the world who can host copies of the data. The more different they are from the US the better. Right now a copy is held at the new library in Alexandria, and negotiations are under way with a northern European country. Brewster estimates that the resources needed to maintain a mirror of the IA are a PB (that's a petabyte) of disk, a GB of bandwidth, and $100M to set up an appropriate endowment for continued operation.
But if the "Universal Access to All Human Knowledge" goal articulated by Raj Reddy of the Million Book Project is too vast, and even the "All Published Knowledge Available to the Kid in Uganda" is a bit far out, how about something easy, asks Brewster. What if we just tried to attack what we already have every right to collect? Let's go for "Public Access to the Public Domain."
In the USA the public domain is pre-1923 publications. In fact, Brewster points out, with the aid of Michael Lesk's recently completed electronic version of the copyright registry, it is now easy to find out which materials from 1923-1964 did not have their copyrights renewed and are now also in the public domain. Let's go get this material! His proposal: give the IA a book and $10 and the IA will return to you the book unharmed plus a digital copy. Will we accept the offer? Oh, and by the way, the IA is also happy to accept video and $15/hour for the conversion of that to digital format. Oh, and did I mention that the IA will also host the digital documents on their servers "forever"?
I think we should take Brewster up on this offer. How much material do we have in the University of Minnesota collections which we could part with for a bit to let the IA digitize and store it? We should seriously consider a project to pump this material and the limited dollars required to the IA as fast as we can. This is a crazy idea at a crazy price point; let's try to sink Brewster under our enthusiastic response! The great thing is, we probably won't: he has not sunk yet.
P.S. Brewster also tossed off an idea about how to archive blogs in response to a question. His thought was that we should be able to subscribe to blog RSS feeds and simply archive everything we see announced via that mechanism. I wonder if we could auto-harvest RSS from UThink.
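The harvesting step Brewster describes could be as simple as pulling the links out of each blog's RSS feed and fetching whatever they announce. A minimal sketch using only the Python standard library (the sample feed below is made up for illustration):

```python
import xml.etree.ElementTree as ET

def harvest_links(feed_xml: str) -> list:
    """Return the <link> of every <item> in an RSS 2.0 feed.

    An archiver would poll each subscribed feed, then fetch and store
    each announced page.
    """
    root = ET.fromstring(feed_xml)
    return [item.findtext("link") for item in root.iter("item")]

# Hypothetical feed for a UThink blog.
sample = """<rss version="2.0"><channel>
  <title>A UThink blog</title>
  <item><title>Post one</title><link>http://example.org/1</link></item>
  <item><title>Post two</title><link>http://example.org/2</link></item>
</channel></rss>"""

links = harvest_links(sample)
```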
A group funded by the Mellon Foundation is trying to define the bounds of interaction between course management systems (CMS) and repositories. Their report should be available on the DLF web site by the end of May. In today's presentation to CNI they made three fundamental points: (1) users will be getting to repository content through a broad set of "course management" tools that extend well beyond CMS into PowerPoint, Weblogs, Citation Managers and the like; (2) repositories need to attend to a Checklist of requirements and desirables in order to interoperate with this layer of tools; and (3) the process used to build course content can be expressed as "Gather-Create-Share".
This "Gather-Create-Share" seems like a weak echo of Apple's "Rip, Mix, and Burn" campaign a few years ago. It is also the process that Lessig warns us is under threat given the intellectual property regime our country is putting into force. The session really didn't touch on the impediments that copyright puts in the way of the "Gather" step, but I was told that IP issues will be part of the Checklist when the group reports out to the DLF.
Random thought... Could we cut off much of the unwanted workstation traffic by limiting Public Browser in a new way? What would happen if Public Browser refused to allow more than 100 characters in any text field of any form? Would that be enough to kill its use for email, but still allow research use?
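As a thought experiment, the filtering rule might look something like this (the function name and the 100-character cap are just the hypothetical policy from above, applied to a URL-encoded form body):

```python
from urllib.parse import parse_qs

MAX_FIELD = 100  # hypothetical cap on any single text field

def allow_submission(form_body: str) -> bool:
    """Reject a form post if any field exceeds MAX_FIELD characters."""
    fields = parse_qs(form_body)
    return all(len(v) <= MAX_FIELD
               for values in fields.values() for v in values)

# Short research queries pass; a long email body would not.
allow_submission("q=copyright+renewal+registry")
```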
Ralph Quarles (IU) and I found each other at the reception. Ralph has offered to help us evaluate our computer support and seek an appropriate model for future support. He noted that he is ready for some ongoing contact with his colleagues at other CIC institutions. The Library IT Directors have that kind of forum in the CIC, but staff at his level, those actually running technology support operations, really don't have many opportunities to reach out to each other. I wonder if we should plan a day or two of professional "shoot the breeze" time at Minnesota for all the folks in these positions? We could do it as part of our investigative effort. This could both help this cohort build connections to one another and serve as a font of wisdom and warning for our own planning effort.
We ask our technology staff to do the seemingly impossible. Our staff is not nearly large enough to manage the kind of deployment we've got around the Libraries. How can only six staff manage 600 machines? On the other hand, could it be a failure of imagination? When I arrived at the U the first significant decision I made was to kill our attempt to use Sun Ray "appliance" computers to replace public workstations in the Libraries. We had good reasons for that decision, but the fundamental problem was staring us in the face then and remains at the core of our troubles in ITS: we cannot support a deployment of 600 Windows workstations with so few staff. Why can't we change the rules? At MIT I watched an organization deploy and maintain thousands of workstations with fewer staff than we have available to us.
I believe we need to think outside the box. It may not be Sun Ray, but we must recognize our situation (a budget even more limited than it was in 2001) and devise creative solutions to meet our needs within those bounds. I am certain this means compromises, but not necessarily the ones that run our staff ragged without the reward of a computing infrastructure they can take pride in, tell the world about, and share with our community.
My frustration with our current situation expresses itself as a frustration with ugly machinery, and I do believe that computers should be in the process of fading out of sight, but that's a red herring. My real frustration is that I've allowed our expectations to be diminished by accepting the limits we've imposed on ourselves. I wonder if we shouldn't get the CIC equivalents of Directors of ITS together to share their frustrations and triumphs. We could certainly use some inspiration and, who knows, we might even be able to do something inspiring ourselves!
Nature hosted an interesting debate on open access a few years back. Now it is at it again with a new web focus on access to the literature. This discussion got going last month, and even the University of Minnesota's own Andrew Odlyzko has chimed in with a piece. The content of this discussion should be available even to non-subscribers. Give it a try!
Cool! We were pleased to launch our own open source project a few months back (see LibData), so I'm always happy to see new open source library projects on the block. Today I got word about OLinks from OhioLINK.
Tom Sanville wrote:
Some of you already know that OhioLINK has created its own URL resolver for journals and other materials. OLinks is an open-source OpenURL resolver intended for use by library consortia, individual libraries, and other organizations with a need to manage citation linking using the OpenURL standard. Introductory information can be found at [this site]. As noted on the site, you are advised to contact Thomas Dowling on the OhioLINK staff if you are interested in using it.
Thank you, OhioLINK!
Lawrence Lessig's new book, Free Culture, was published with a Creative Commons license which allows for derivative works. The result has been an amazing flurry of derivatives, including an audio version launched by AKMA. Does this demonstrate in any way a relationship between freeing content and creativity? Are we well served by dozens of versions of Lessig's work? Are we diminishing his own incentive to create?