December 11, 2009

Indiana Fedora Work

Projects

IN Harmony: Sheet Music from Indiana
EVIADA

Fedora Content Models

Available content models. Standard disseminators apply to most objects.

Type-specific content models:

* Audio Content Model
* Collection Content Model
* Finding Aid Content Model
* Generic File Content Model
* Image Content Model
* Journal Content Model
* Multi-copy Document Content Model
* Oral History Content Model
* Paged Document Content Model
* Text Content Model
* Video Content Model

Collection Content Model (xml files)

Hydra

RepoMMan, REMAP and Hydra at Hull

Towards a Repository-enabled Scholar's Workbench
D-Lib Magazine
May/June 2009
Volume 15 Number 5/6
Richard Green, Chris Awre

Technology in more detail

hydra_diagram_lynn.jpg

Hydra sets

Hydra content models and disseminators <-- the only hit in Google for '"Hydra sets" fedora'
Hydra sets

There are two basic models for managing "Sets", our preferred name over "collections" or "folders".

* Explicit set relationships, in which the set object contains an explicit listing of its set members
* Implicit set relationships, in which the set object has no explicit listing but rather contains some rule(s) for identifying its set members

In all cases there must be a single object that represents the set itself in the repository, an object that defines and describes the set (in the abstract and/or for specific UI use) and provides a reference point (a PID) for creating object associations to the set. The various models described below concern the manner in which member objects are identified and managed.

There are many relationships that could be used to define a set (explicit or implicit) in RDF. Hydra will always use 'hasMember' or its converse 'isMemberOf' as appropriate (cf. 'hasPart' and 'isPartOf' for aggregate objects); this does not preclude users working with other relationships. Hydra will reserve 'isMemberOfCollection' for use in the specific case of OAI-PMH harvesting sets.

Explicit sets

* Parent object may designate members via "hasMember {childPID}" triples in RELS-EXT, or
* Parent object may designate members via a METS structmap or similar mechanism

Explicit sets represent a useful approach when there is a one-time determination of a closed set.

Implicit sets

The set object for an implicit set has no itemised 'knowledge' of its set members but contains the information needed to retrieve them. This may take the form of a query against the repository Resource Index (where the members each contain an 'isMemberOf' assertion in RELS-EXT) or a more general query or search across the repository (find all photographs where the subject is Barack Obama), or some other rules-based selection. An extra datastream will be required in the set object to contain the query or rules necessary to retrieve the set membership information.
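To make the difference between the two set models concrete, here is a minimal in-memory sketch in Java. It is not Hydra or Fedora code: `RepoObject`, `explicitMembers`, `implicitMembers`, `assertsMembershipOf` and the `demo:` PIDs are all hypothetical names invented for illustration, and a plain list stands in for the repository and its Resource Index. Only the relationship names 'hasMember' and 'isMemberOf' come from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class HydraSetsSketch {

    /** Hypothetical stand-in for a repository object: a PID plus RELS-EXT-style relationships. */
    static final class RepoObject {
        final String pid;
        private final Map<String, Set<String>> relsExt = new LinkedHashMap<>();

        RepoObject(String pid) {
            this.pid = pid;
        }

        /** Records one "predicate -> target PID" assertion and returns this object for chaining. */
        RepoObject relate(String predicate, String targetPid) {
            relsExt.computeIfAbsent(predicate, k -> new LinkedHashSet<>()).add(targetPid);
            return this;
        }

        /** Returns the target PIDs asserted for a predicate, or an empty set if there are none. */
        Set<String> targets(String predicate) {
            return relsExt.getOrDefault(predicate, Collections.emptySet());
        }
    }

    /** Explicit set: the set object itself lists its members via 'hasMember'. */
    static List<String> explicitMembers(RepoObject setObject) {
        return new ArrayList<>(setObject.targets("hasMember"));
    }

    /** Implicit set: the set object only carries a rule; members are found by applying it. */
    static List<String> implicitMembers(Collection<RepoObject> repository, Predicate<RepoObject> rule) {
        return repository.stream().filter(rule).map(o -> o.pid).collect(Collectors.toList());
    }

    /** One possible rule: the member asserts 'isMemberOf {setPid}'. */
    static Predicate<RepoObject> assertsMembershipOf(String setPid) {
        return o -> o.targets("isMemberOf").contains(setPid);
    }

    public static void main(String[] args) {
        RepoObject explicitSet = new RepoObject("demo:set1")
                .relate("hasMember", "demo:a")
                .relate("hasMember", "demo:b");

        List<RepoObject> repository = Arrays.asList(
                new RepoObject("demo:a").relate("isMemberOf", "demo:set2"),
                new RepoObject("demo:b"),
                new RepoObject("demo:c").relate("isMemberOf", "demo:set2"));

        System.out.println(explicitMembers(explicitSet));
        System.out.println(implicitMembers(repository, assertsMembershipOf("demo:set2")));
    }
}
```

In the explicit case the membership travels with the set object; in the implicit case nothing on the set object changes when a new member appears, which is why the rule (here a `Predicate`) would need to be stored somewhere such as the extra datastream mentioned above.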

DSpace metadata and Google Scholar

From an email Julie sent

I came across this info on some DSpace list, and thought it might help our cause with getting GS to better index AgEcon and the UDC: "Providing metatags used by Google Scholar for enhanced indexing". Maybe it's old news to those in the know. I was rummaging around looking for info about DSpace and Zotero.

I checked the link, and there is a desire for DSpace to provide this out of the box.

Links on the subject

Nature's Metadata for Web Pages
Below is the standard for putting metadata in HTML.
Expressing Dublin Core metadata using HTML/XHTML meta and link elements

Difficult to improve standing

From Publish or Perish Frequently Asked Questions

Other results issues

How do I improve the accuracy with which Google Scholar lists my papers?

In general, this is rather difficult, because a lot depends on the accuracy with which your papers are referenced by others. However, if you have separate web pages for each of your papers, then Google Scholar advises that you can add several meta tags to your pages to help Google's crawler to list your paper. In particular, they recommend using the following tags (replace the content="..." bits with your own information):

<meta name="citation_journal_title" content="Journal Name">
<meta name="citation_authors" content="Last Name1, First Name1; Last Name2, First Name2">
<meta name="citation_title" content="Article Title">
<meta name="citation_date" content="01/01/2007">
<meta name="citation_volume" content="10">
<meta name="citation_issue" content="1">
<meta name="citation_firstpage" content="1">
<meta name="citation_lastpage" content="15">
<meta name="citation_doi" content="10.1074/jbc.M309524200">
<meta name="citation_pdf_url" content="http://www.publishername.org/10/1/1.pdf">
<meta name="citation_abstract_html_url" content="http://www.publishername.org/cgi/content/abstract/10/1/1">
<meta name="citation_fulltext_html_url" content="http://www.publishername.org/cgi/content/full/10/1/1">
<meta name="dc.Contributor" content="Last Name1, First Name1">
<meta name="dc.Contributor" content="Last Name2, First Name2">
<meta name="dc.Title" content="Article Title">
<meta name="dc.Date" content="01/01/2007">
<meta name="citation_publisher" content="Publisher Name">
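If a repository were to emit these tags itself, the core of it could be as small as the following Java sketch. This is an illustration only, not DSpace code: the class `CitationMetaTags` and its methods `escapeAttr` and `render` are hypothetical, and the field names and sample values are simply the ones from the FAQ excerpt above. The one thing worth getting right is escaping, since titles often contain quotes or ampersands.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CitationMetaTags {

    /** Escapes the characters that would break a double-quoted HTML attribute value. */
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    /** Renders one <meta> element per entry, in the map's iteration order. */
    static String render(Map<String, String> fields) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            html.append("<meta name=\"").append(escapeAttr(field.getKey()))
                .append("\" content=\"").append(escapeAttr(field.getValue()))
                .append("\">\n");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the tags in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("citation_title", "Article Title");
        fields.put("citation_authors", "Last Name1, First Name1; Last Name2, First Name2");
        fields.put("citation_date", "01/01/2007");
        fields.put("citation_pdf_url", "http://www.publishername.org/10/1/1.pdf");
        System.out.print(render(fields));
    }
}
```

A map holds one value per name, so repeated tags such as the two dc.Contributor entries would need a list of name/value pairs instead.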

December 10, 2009

Postcard Complex object from UW

PostcardObjectRelations.png
See: University of Wisconsin Digital Collections Center - Sample Postcard Object in FedoraCommons.

ESciDoc TOC (Table Of Contents)

I. ESciDoc Content Models

The primary type (or the category) of the content resources depicted with the CModel. Allowed values are:
* Item
* Container
* TOC (Table Of Contents)
escidoc_conceptual_model.jpg
Diagram from: eSciDoc(4).pdf
A TOC is optional and not shown above.

II. TOC Description

In eSciDoc, hierarchical structures are built by means of container resources. A container resource refers to its members, which are again containers or items. The set of references is represented as a structural map (struct-map) inside the representation of a container resource. Additionally, a container may contain a table of contents (TOC), which contains an ordered selection of members.

III. Example of TOC

Some attributes in the TOC xml.

div element attributes:
* ORDER: The physical page number of the scan. The physical order must begin with number "1".
* ORDERLABEL: The logical page number of the scan
* ID: The identification number of this scan (id of the item)
* TYPE: The type of this structural element (see List of structural element types)
* LABEL: The element's title
* VISIBLE: Indicates whether this div (and its sub-elements) should be displayed when displaying this TOC

ptr element attributes:
* ID: The identification of this pointer
* USE: The type of the file described with this locator
MIN = thumbnail size
DEFAULT = Web size
MAX = Full size
ITEM = item which contains these files
* xlink:href: The locator for this file
* LOCTYPE: The locator type
* MIMETYPE: The scan's MIME type

IV. Used by VIRR

Welcome to the "Virtueller Raum Reichsrecht" Collection of the Max Planck Institute for European History of Law. The solution will provide a published digital collection and a cooperative working environment for various artefacts of the legislation in the period of the German Holy Empire. The compilation will be indexed, structured via METS, transcribed and linked to further relevant scientific literature.
The Max Planck wiki says it has more than 20,000 scans.

V. eSciDoc and CMAs

A) eSciDoc has the concept of content models; see ESciDoc Logical Data Model
Content models define in general:
* the type and structure of the content resources (item, container, members)
* a set of services that may be associated with the content resources


This seems similar to Fedora's CMA (Content Model Architecture).
B) Plans to bring CMAs in
From Roadmap Infrastructure: Status: March 16, 2009
Content Model Content Model Handler
propose XML-representation
Specification needed. May be based on the new Fedora CMA (content model architecture)

December 8, 2009

Config file change to make DSpace properly handle Unicode filenames

On strip1 the AgEcon instance was not properly downloading files that had non-ASCII file names; that is, it was not handling Unicode characters correctly. This was corrected by fixing a config file.

File to edit on strip1:
tu nano tomcat/conf/server.xml

Old bad line:
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8009" UIEncoding="UTF-8" tomcatAuthentication="false" enableLookups="false" redirectPort="8443" protocol="AJP/1.3" />

New fixed line:
<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8009" URIEncoding="UTF-8" tomcatAuthentication="false" enableLookups="false" redirectPort="8443" protocol="AJP/1.3" />
i.e. change UIEncoding to URIEncoding
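Why the attribute name matters can be shown outside Tomcat. The sketch below is an illustration under assumptions, not a reproduction of Tomcat's internals: it uses the standard `java.net.URLDecoder` to decode the percent-encoded file name from the download link recorded later in these notes, once as UTF-8 and once as ISO-8859-1 (the default URI encoding in older Tomcat versions, which is presumably what applied while the attribute was misspelled). The class name `UriDecodingDemo` and the helper `decode` are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriDecodingDemo {

    /** Percent-decodes a URL path segment using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // File name as it appears in the generated download link.
        String encoded = "12_Felf%c3%b6ldi_Apstract.pdf";
        // File name as stored for the bitstream (\u00f6 is the letter o with diaeresis).
        String stored = "12_Felf\u00f6ldi_Apstract.pdf";

        // Decoding %c3%b6 as UTF-8 yields the single character \u00f6, so the names match.
        System.out.println(decode(encoded, "UTF-8").equals(stored));      // prints "true"

        // Decoding the same two bytes as ISO-8859-1 yields two characters, so the names differ.
        System.out.println(decode(encoded, "ISO-8859-1").equals(stored)); // prints "false"
    }
}
```

A mismatch of this kind would make a string comparison between the requested file name and the stored bitstream name fail, which is consistent with the download problem described above.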

Things learned along the way:
1) Location of constant to encode strings as UTF-8 in DSPACE
./src/org/dspace/core/Constants.java:209: public static final String DEFAULT_ENCODING = "UTF-8";
2) Servlet that does downloads of PDFs (line 165 of ./etc/dspace-web.xml):
<servlet>
  <servlet-name>bitstream</servlet-name>
  <servlet-class>org.dspace.app.webui.servlet.BitstreamServlet</servlet-class>
</servlet>
3) Code from ./src/org/dspace/app/webui/servlet/BitstreamServlet.java that handles the download:

    protected void doDSGet(Context context, HttpServletRequest request,
            HttpServletResponse response) throws ServletException, IOException,
            SQLException, AuthorizeException
    {
        Item item = null;
        Bitstream bitstream = null;
        System.out.println("In dspace proper");

        // Get the ID from the URL
        String idString = request.getPathInfo();
        String handle = "";
        String sequenceText = "";
        String filename = null;
        int sequenceID;
        System.out.println("1 idString " + idString );

        // Parse 'identifier' and 'sequence' (bitstream seq. number) out
        // of remaining URL path, which is typically of the format:
        // {identifier}/{sequence}/{bitstream-name}
        // But since the bitstream name MAY have any number of "/"s in
        // it, we scan from the start to pick out the sequence:
        String [] pathArray = HandleManager.splitIdentifier(idString);
        handle = pathArray[0];
        System.out.println("1.5 handle " + handle );
        String extraInfo = pathArray[1];
        System.out.println("2 extraInfo " + extraInfo );

        if(extraInfo != null)
        {
            // Remove leading slash if any:
            if(extraInfo.startsWith("/"))
            {
                extraInfo = extraInfo.substring(1);
            }

            // The sequence is before the first slash, everything
            // else is part of the bitstream-name.
            int slashIndex = extraInfo.indexOf('/');
            if(slashIndex != -1)
            {
                sequenceText = extraInfo.substring(0,slashIndex);
                filename = extraInfo.substring(slashIndex+1);
            }
        }
        System.out.println("3 sequenceText " + sequenceText );

        try
        {
            sequenceID = Integer.parseInt(sequenceText);
            System.out.println("4 sequenceID " + sequenceID );
        }
        catch (NumberFormatException nfe)
        {
            sequenceID = -1;
        }

        // Now try and retrieve the item
        DSpaceObject dso = HandleManager.resolveToObject(context, handle);

        // Make sure we have valid item and sequence number
        if (dso != null && dso.getType() == Constants.ITEM && sequenceID >= 0)
        {
            item = (Item) dso;

            if (item.isWithdrawn())
            {
                log.info(LogManager.getHeader(context, "view_bitstream",
                        "handle=" + handle + ",withdrawn=true"));
                JSPManager.showJSP(request, response, "/tombstone.jsp");
                return;
            }

            boolean found = false;
            Bundle[] bundles = item.getBundles();

            for (int i = 0; (i < bundles.length) && !found; i++)
            {
                Bitstream[] bitstreams = bundles[i].getBitstreams();

                for (int k = 0; (k < bitstreams.length) && !found; k++)
                {
                    if (sequenceID == bitstreams[k].getSequenceID())
                    {
                        bitstream = bitstreams[k];
                        found = true;
                    }
                }
            }
        }

        if (bitstream == null || filename == null
                || !filename.equals(bitstream.getName()))
        {
            // No bitstream found or filename was wrong -- ID invalid
            log.info(LogManager.getHeader(context, "invalid_id", "path=" + idString));
            JSPManager.showInvalidIDError(request, response, idString,
                    Constants.BITSTREAM);
            return;
        }

        // log.fatal(LogManager.getHeader(context, "view_bitstream",
        //         "bitstream_id=" + bitstream.getID()));

        // Modification date
        // TODO: Currently the date of the item, since we don't have dates
        // for files
        response.setDateHeader("Last-Modified", item.getLastModified().getTime());

        // Check for if-modified-since header
        long modSince = request.getDateHeader("If-Modified-Since");

        if (modSince != -1 && item.getLastModified().getTime() < modSince)
        {
            // Item has not been modified since requested date,
            // hence bitstream has not; return 304
            response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
            return;
        }

        // Pipe the bits
        InputStream is = bitstream.retrieve();

        // Set the response MIME type
        response.setContentType(bitstream.getFormat().getMIMEType());

        // Response length
        response.setHeader("Content-Length", String.valueOf(bitstream.getSize()));

        Utils.bufferedCopy(is, response.getOutputStream());
        is.close();
        response.getOutputStream().flush();
    }

4) HTML generated for the download link before the fix:
<tr>
  <td headers="t1" class="standard">12_Felföldi_Apstract.pdf</td>
  <td headers="t2" class="standard"></td>
  <td headers="t3" class="standard">77Kb</td>
  <td headers="t4" class="standard">PDF</td>
  <td class="standard" align="center"><a target="_blank" href="/bitstream/55410/3/12_Felf%c3%b6ldi_Apstract.pdf">View/Open</a></td>
</tr>
Note: 12_Felföldi_Apstract.pdf != 12_Felf%c3%b6ldi_Apstract.pdf

5) The class that generates the above HTML is the JSP tag handler ./src/org/dspace/app/webui/jsptag/ItemTag.java
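The path handling in item 3 is easier to reason about in isolation. The following Java sketch restates just the "{sequence}/{bitstream-name}" step from the excerpt as a standalone method. The names `BitstreamPathSketch`, `ParsedPath` and `parse` are hypothetical; the handle split done by `HandleManager.splitIdentifier` is left out, and the input is assumed to be what the excerpt calls `extraInfo`.

```java
public class BitstreamPathSketch {

    /** Hypothetical holder for the two values the servlet extracts. */
    static final class ParsedPath {
        final int sequenceId;   // -1 when no valid sequence number was found
        final String filename;  // null when no file name was found

        ParsedPath(int sequenceId, String filename) {
            this.sequenceId = sequenceId;
            this.filename = filename;
        }
    }

    /** Splits "{sequence}/{bitstream-name}" at the FIRST slash, as the servlet excerpt does. */
    static ParsedPath parse(String extraInfo) {
        String sequenceText = "";
        String filename = null;

        if (extraInfo != null) {
            // Remove a leading slash, if any.
            if (extraInfo.startsWith("/")) {
                extraInfo = extraInfo.substring(1);
            }
            // Everything before the first slash is the sequence number;
            // everything after it belongs to the file name, slashes included.
            int slashIndex = extraInfo.indexOf('/');
            if (slashIndex != -1) {
                sequenceText = extraInfo.substring(0, slashIndex);
                filename = extraInfo.substring(slashIndex + 1);
            }
        }

        int sequenceId;
        try {
            sequenceId = Integer.parseInt(sequenceText);
        } catch (NumberFormatException nfe) {
            sequenceId = -1; // same fallback as the excerpt
        }
        return new ParsedPath(sequenceId, filename);
    }

    public static void main(String[] args) {
        ParsedPath p = parse("/3/12_Felf\u00f6ldi_Apstract.pdf");
        System.out.println(p.sequenceId + " " + p.filename);
    }
}
```

The file name extracted here is later compared with `bitstream.getName()`, so it has to arrive already correctly decoded; the parsing step itself does no decoding.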