November 3, 2009

DC Metadata fields used by UMN DSPACE instances:

dc.title
dc.title.alternative
dc.contributor.author
dc.contributor.editor
dc.subject
dc.subject.other
dc.date.issued
dc.identifier.citation
dc.relation.ispartofseries
dc.description.abstract
dc.description
dc.identifier.govdoc
dc.identifier.uri
dc.identifier.isbn
dc.identifier.issn
dc.identifier.ismn
dc.identifier
dc.relation
dc.format.extent
dc.language
dc.extent

October 28, 2009

Last step in getting perl drivers for Postgres

While trying to install new DBD::Pg drivers on Mac OS X Snow Leopard, I got the error below when running test code:

dyld: lazy symbol binding failed: Symbol not found: _PQconnectdb
  Referenced from: /opt/local/lib/perl5/site_perl/5.10.1/darwin-2level/auto/DBD/Pg/Pg.bundle
  Expected in: dynamic lookup

I found that the Makefile.PL that cpan made for DBD::Pg pointed at the wrong library directories. I made the changes below to Makefile.PL (located in ~/.cpan/build/XML-DOM-1.44-tM9lK0):

#$POSTGRES_INCLUDE = $ENV{POSTGRES_INCLUDE} || $pg->inc_dir || "$ENV{POSTGRES_HOME}/include";
$POSTGRES_INCLUDE = '/opt/local/include/postgresql83';
#$POSTGRES_LIB = $ENV{POSTGRES_LIB} || $pg->lib_dir || "$ENV{POSTGRES_HOME}/lib";
$POSTGRES_LIB = '/opt/local/lib/postgresql83';

Then
sudo perl Makefile.PL
sudo make test
sudo make install

And the error was gone.

Also, I had to switch the shebang line of the script from /usr/bin/perl to /opt/local/bin/perl:
#!/opt/local/bin/perl # /usr/bin/perl

October 21, 2009

utf-8 utf-16 and UNIX

1) Convert utf-16 to utf-8.
iconv -f utf-16 -t utf-8 bell-map-IMAGESspreadsheet.xml > tt
2) Have less read utf-8
export LESSCHARSET='utf-8'
cat just works.
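
For machines without iconv, the same conversion can be sketched in Python. This is only a minimal sketch: the function name utf16_to_utf8 is made up for illustration, and it assumes the input file starts with a byte-order mark so the "utf-16" codec can pick the endianness.

```python
def utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (same idea as iconv -f utf-16 -t utf-8)."""
    # The "utf-16" codec reads the byte-order mark to decide the endianness.
    # newline="" keeps the line endings exactly as they are in the source file.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Calling utf16_to_utf8("bell-map-IMAGESspreadsheet.xml", "tt") would mirror the iconv line in step 1.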

October 6, 2009

Some useful drupal sql

Find uid given a user name (in this case naa)

select uid from users where name='naa';

Get the number of uploads by user 'naa'

select count(upload.fid) from upload, node where node.nid=upload.nid and node.uid=(select uid from users where name='naa');

Get the number of files owned by user 'naa'

select count(files.fid) from files where files.uid=(select uid from users where name='naa');

Find path to uploaded files

select files.filepath from files, upload where upload.fid=files.fid and upload.nid=68652;

title element wrong for media ingest

The IMAGES xml files used to ingest data into the media repository contain a flaw.

Bad version (current):
<title main="Duplex House" variant="Residence project" variant="Exterior perspective"/>

Good:
<title type="main" > Duplex House </title>
<title type="variant" > Residence project </title>
<title type="variant" > Exterior perspective </title>

Affected files:
bln-dcugranting2007.xml
botanical-dcugranting2007.xml
cbi-dcugranting2007.xml
ellis-dcugranting2007.xml
mno-dcugranting2007.xml
mss-alexanderBros.xml
mss-dcugranting2007.xml

Some more files (all the rest):

001-bell-historicalmaps
005-ymca-wwiPhotos
006-map-19thCent
008-eas-ming
011-mss-purcellMasonite
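
The title fix described above could be scripted roughly as follows. This is a sketch, not the script that was actually used: fix_titles is a hypothetical name, and it works with regular expressions because the bad form repeats the variant attribute, which an XML parser would reject. It assumes attribute values never contain a double quote or ">", and it writes the text without the padding spaces shown in the "Good" example.

```python
import re

# Matches a self-closing <title .../> element and captures its attribute text.
TITLE_RE = re.compile(r"<title\s+([^>]*?)\s*/>")
# Matches one name="value" pair inside the captured attribute text.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def fix_titles(xml_text):
    """Turn <title main=".." variant=".."/> into one <title type=".."> element per attribute."""
    def expand(match):
        parts = []
        for name, value in ATTR_RE.findall(match.group(1)):
            # Values are copied as-is; they are already escaped for XML.
            parts.append('<title type="%s">%s</title>' % (name, value))
        return "\n".join(parts)

    return TITLE_RE.sub(expand, xml_text)
```

Elements already in the good form have no "/>" directly after their attributes, so they are left alone.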

September 17, 2009

David Naughton gave me some fast perl code for hashes

#!/exlibris/sfx_ver/sfx_version_3/app/perl-5.8.6/bin/perl
use strict;
use warnings;

# Takes export file from SFX and generates host definitions to append to
# the ezproxy.cfg file.

# Open SFX export file for manipulation
my $sfx_url_file = shift @ARGV;
open (my $fh_sfx, '<', $sfx_url_file) or die "File Open Failed: $!";

# get hostnames into a hash
my %hostnames;
while (<$fh_sfx>) {
    my $line = $_;
    next if $line =~ /^#/;
    my ($sfx_target, $sfx_url) = split /\t/, $line;
    if ($sfx_url =~ m/\:\/\/(.*?)\//) {
        # Hash keys are always unique, so if a key for $1
        # already exists, this line will clobber its value:
        $hostnames{$1} = undef;

        # If you want to keep track of how many times each
        # hostname appears, you can use this magic:
        # $hostnames{$1}++;

        # More verbose version of the code above:
        # if (!(exists $hostnames{$1})) {
        #     $hostnames{$1} = 0;
        # }
        # $hostnames{$1} = $hostnames{$1} + 1;
    }
}

# print each unique hash key, with some added text
for my $hostname (keys %hostnames) {
    print "HJ $hostname\n";
}

# close file handle
close $fh_sfx or die "File Close Failed: $!";

September 15, 2009

dspace batch ingest

Format of files to ingest

Here is a breakdown of the files:

batch_files - top level directory (name does not matter)
   |
   Ingest1 - This is a prototype for the directories that you will create. Give these directories any name you want.
      |
      contents                    - contains the asset name. The fields are separated by a single tab. The file must be called "contents".
      dublin_core.xml             - contains the DC metadata. This file must be called "dublin_core.xml".
      UDCsubmissionguidelines.pdf - This is the asset. You may use whatever name you want. However, the name of the asset must appear in the "contents" file.
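
The layout above can be generated with a small script. The sketch below is only an illustration: build_item_dir and its arguments are hypothetical names, and the dublin_core.xml body assumes the usual DSpace form of one dcvalue element (with element and qualifier attributes) per metadata value. Check it against the sample tarball before relying on it.

```python
import os
import shutil
from xml.sax.saxutils import escape, quoteattr


def build_item_dir(batch_root, item_name, asset_path, metadata):
    """Create batch_root/item_name holding contents, dublin_core.xml and the asset.

    metadata is a list of (element, qualifier, value) tuples,
    e.g. ("title", "none", "My title") for dc.title.
    """
    item_dir = os.path.join(batch_root, item_name)
    os.makedirs(item_dir)

    # Copy the asset in under its own file name.
    asset_name = os.path.basename(asset_path)
    shutil.copy(asset_path, os.path.join(item_dir, asset_name))

    # "contents" lists the asset file name, one asset per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as f:
        f.write(asset_name + "\n")

    # Assumed dublin_core.xml layout: one <dcvalue> per metadata value.
    lines = ["<dublin_core>"]
    for element, qualifier, value in metadata:
        lines.append("  <dcvalue element=%s qualifier=%s>%s</dcvalue>"
                     % (quoteattr(element), quoteattr(qualifier), escape(value)))
    lines.append("</dublin_core>")
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return item_dir
```

One call per item, for example build_item_dir("batch_files", "Ingest1", "UDCsubmissionguidelines.pdf", [("title", "none", "Submission guidelines")]), gives a directory in the form described above.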

tarball that gives working sample of the directory structure.

command

/dspace/dspace-ir/bin$ ./dsrun org.dspace.app.itemimport.ItemImport -a -c CollectionHandle -e Eperson -s /PATH_TO_BATCH_FILES/batch_files -m /home//PATH_TO_BATCH_FILESs/Ingest1/mapfile.txt

Resources

Dorothea Salo's EXCELLENT blog
ingest-export.ppt ARD Prasad
ScalabilityIssues - DSpace Wikis

September 14, 2009

putting captchas into dspace using jcaptcha

ingest-export.ppt

code needed for a captcha

The file form.jsp had to be modified so that if the captcha was not set or was not correct, the email form was not produced.
If the email form was not produced, a JSP was called that produced the captcha: captcha_main.jsp.
captcha_main.jsp called the following Java classes, in order, to make the captcha image:
ImageCaptchaServlet.java
CaptchaServiceSingleton.java
MyImageCaptchaEngine.java
Also the file dspace-web.xml had to be modified.

X11 not on the box where tomcat runs

If X11 is not on the box with tomcat, you will get an error like "port 6000 not available". To fix this put:

-Djava.awt.headless=true

into the catalina.sh file.

Helpful links

How to use jsp-forward tag
How do I perform browser redirection from a JSP pages
Breaking a Visual CAPTCHA

August 31, 2009

Error in DSPACE search ... handles from index not having items

Symptom

Hi, I was checking abstracts for Volume 41 No. 2 of Journal of Agricultural and Applied Economics. When I searched on "Race, Gender, School . . ." I got an error message saying that the website had experienced an internal error and I tried it again w/ the same results. Since the same message requested letting you know of the problem I am responding. Sincerely,

Error Message

Trying the search above generated the error message:

An internal server error occurred on http://ageconsearch.umn.edu:
Date: 8/28/09 10:59 AM
Session ID: 6AAEA1B2D8AADD0C7F1BE28731AE9083
-- URL Was: http://ageconsearch.umn.edu/simple-search?sort=date&query=%28%28keyword%3Arace%29%29&from_advanced=true&query2=&field1=keyword&conjunction2=AND&query1=race+&field2=keyword&query3=&conjunction1=AND&field3=ANY&SortDirection=descending
-- Method: GET
-- Parameters were:
-- field3: "ANY"
-- field2: "keyword"
-- field1: "keyword"
-- sort: "date"
-- query3: ""
-- query2: ""
-- SortDirection: "descending"
-- query1: "race "
-- query: "((keyword:race))"
-- from_advanced: "true"
-- conjunction2: "AND"
-- conjunction1: "AND"

Exception:
java.sql.SQLException: Query "((keyword:race))" returned unresolvable handle: 53087
    at org.dspace.app.webui.servlet.SimpleSearchServlet.doDSGet(SimpleSearchServlet.java:271)
    at org.dspace.app.webui.servlet.DSpaceServlet.processRequest(DSpaceServlet.java:151)
    at org.dspace.app.webui.servlet.DSpaceServlet.doGet(DSpaceServlet.java:99)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
    at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
    at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
    at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)

Outline of Solution

1) A handle was found by search that was not in the database (probably a deleted file and IndexAll has not run yet).
2) I found the segment of code that collects into a list all the items found by search.
3) When any word that was on this deleted record was input, the search routine would throw an error and halt.
4) I rewrote the code to ignore handles that are not in the database.
5) Ran a few basic tests on Odin (our private DSpace setup).
6) Deployed it to strip1 (production site).

Modified Code

// FROM SimpleSearchServlet.java
/*
resultsItems = new Item[numItems];
for (int i = 0; i < numItems; i++) {
    String myhandle = (String) itemHandles.get(i);
    Object o = HandleManager.resolveToObject(context, myhandle);
    resultsItems[i] = (Item) o;
    if (resultsItems[i] == null) {
        throw new SQLException("Query \"" + query + "\" returned unresolvable handle: " + myhandle);
    }
}
*/

// The code below will only add handles to the list if a well defined
// item is associated with it.
Item[] resultsItems_temp = new Item[numItems];
int item_count = 0;
for (int i = 0; i < numItems; i++) {
    String myhandle = (String) itemHandles.get(i);
    Object o = HandleManager.resolveToObject(context, myhandle);
    if (o != null) {
        resultsItems_temp[item_count] = (Item) o;
        item_count++;
    }
}
resultsItems = new Item[item_count];
for (int i = 0; i < item_count; i++) {
    resultsItems[i] = resultsItems_temp[i];
}
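
The idea behind the change, stripped of the DSpace classes, is just "keep what resolves, skip what does not". A minimal Python sketch of that pattern follows; resolve_existing and the resolve callback are hypothetical stand-ins for the servlet loop and HandleManager.resolveToObject, not DSpace API.

```python
def resolve_existing(handles, resolve):
    """Return the objects for handles that resolve, skipping the ones that do not."""
    items = []
    for handle in handles:
        obj = resolve(handle)  # expected to return None for a stale handle
        if obj is not None:
            items.append(obj)
    return items


if __name__ == "__main__":
    # Toy lookup table standing in for the database; the entries are made up.
    database = {"1234/1": "item one", "1234/3": "item three"}
    print(resolve_existing(["1234/1", "1234/2", "1234/3"], database.get))
```

A growable list removes the need for the temporary array and the second copy loop used in the Java version.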

August 10, 2009

Ames collection ingest ... multiple metadata per file

The problem

While ingesting the metadata for the Ames collection I found a problem. Below is a list of metadata sets that map to the same image.
identifier local Image
ama00711 ama00711.jpg
ama00712 ama00711.jpg
amp00259 amp00259.jpg
ap00259 amp00259.jpg
amp00435 amp00435.jpg
amp00436 amp00435.jpg
amp00448 amp00449.jpg
amp00449 amp00449.jpg
amp00513 amp00531.jpg
amp00531 amp00531.jpg
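
A list like the one above can be produced by grouping identifiers by image name. The sketch below is one possible way to do it; shared_images is a made-up name and the rows are assumed to be (identifier, image) pairs already pulled out of the metadata.

```python
from collections import defaultdict


def shared_images(rows):
    """Map each image used by more than one identifier to the identifiers that use it."""
    by_image = defaultdict(list)
    for identifier, image in rows:
        by_image[image].append(identifier)
    # Keep only the images that more than one metadata set points at.
    return {image: ids for image, ids in by_image.items() if len(ids) > 1}


if __name__ == "__main__":
    rows = [
        ("ama00711", "ama00711.jpg"),
        ("ama00712", "ama00711.jpg"),
        ("amp00259", "amp00259.jpg"),
        ("ap00259", "amp00259.jpg"),
    ]
    for image, ids in sorted(shared_images(rows).items()):
        print(image, ids)
```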

Jason's solution

identifier local Image
ama00711 ama00711.jpg
ama00712 ama00711.jpg - error in the data. Should point to ama00712.jpg (I've fixed it in IMAGES)
amp00259 amp00259.jpg
ap00259 amp00259.jpg - delete this version
amp00435 amp00435.jpg
amp00436 amp00435.jpg - error in the data. Should point to amp00436.jpg (I've fixed it in IMAGES)
amp00448 amp00449.jpg - error in the data. Should point to amp00448.jpg (I've fixed it in IMAGES)
amp00449 amp00449.jpg
amp00513 amp00531.jpg - error in the data. Should point to amp00513.jpg (I've fixed it in IMAGES)
amp00531 amp00531.jpg

What to do from here

Bill will need to pull the Ames collection from IMAGES. I will need to re-ingest it and delete ap00259.

August 7, 2009

Problem with a dspace user and solution

Problem


A problem that happened once before. Although Erin George has successfully uploaded files to some of the collections in the University Archives community, she is now in a cycle where she logs in, goes to a collection, and whatever she does gets cycled back to the log in screen with her x500 there but no password. She enters her password, and goes through the same thing again and again. This happened to one of our students.

Solution

This command fixed it:
update eperson set netid = 'georg038' where eperson_id = '190';

July 27, 2009

svn notes ... media_repo Externals Switch

Our svn site

https://chaucer.lib.umn.edu/svn/dldl/

Externals

Chad is using svn externals to have several projects at the DLDL link to the same set of files within the svn tree. From the command line, use these commands to edit or view the externals:
svn propedit svn:externals .
svn propget svn:externals FileName

Switch

The svn switch command points a working copy at a different URL in the repository, which lets a user update to a new version of a package.
Example from John Barneson:
First make drupal go offline.
https://dldl00.lib.umn.edu/admin/settings/site-maintenance
svn switch https://chaucer.lib.umn.edu/svn/dldl/drupal/releases/acquia-drupal-1.2.13/ .

Useful URLs

Jeremy Knope dot com OS X svn client

June 26, 2009

changing feedback email in dspace

The code for creating an email for feedback in dspace only allows 1 recipient:

Email email = ConfigurationManager.getEmail("feedback");
email.addRecipient(ConfigurationManager.getProperty("feedback.recipient"));

While the email for admin allows a comma separated list of recipients:

String AdminEmail = ConfigurationManager.getProperty("admin.emails");
String EmailsAddresses[] = AdminEmail.split(",");
for (int i = 0; i < EmailsAddresses.length; i++) {
    email.addRecipient(EmailsAddresses[i]);
}

I took parts of the admin code and applied it to the feedback part of dspace. Now feedback email supports a comma separated list of email recipients.
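
For reference, the splitting step on its own looks like this in Python. This is a sketch rather than the DSpace code: parse_recipients is a hypothetical helper, and unlike the Java loop above it also trims spaces and drops empty entries, which is an assumption about what a hand-edited config value may contain.

```python
def parse_recipients(raw):
    """Split a comma separated property value into a list of addresses."""
    recipients = []
    for part in raw.split(","):
        address = part.strip()  # assumption: tolerate spaces after commas
        if address:             # skip empty entries such as a trailing comma
            recipients.append(address)
    return recipients


if __name__ == "__main__":
    # Placeholder addresses, not real recipients.
    print(parse_recipients("first@example.edu, second@example.edu,"))
```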

June 12, 2009

SQL to get number of new items in DSPACE after a certain date

select count(date_accessioned) from ItemsByDateAccessioned where date_accessioned > '2008-07-01'::DATE ;

June 2, 2009

Looking for non unicode characters in AgEcon metadata

Problem and general solution

Some non unicode characters have gotten into the dspace metadata. We need to find them. I will print out the metadata fields to a file of the form below.
<doc>
text from metadata pull
</doc>
Then I will run the file through xmllint.
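
A rough way to locate the bad bytes directly, as a complement to xmllint, is to try decoding the dumped text as UTF-8 and record where decoding fails. The sketch below assumes the dump is meant to be UTF-8; invalid_utf8_offsets is a made-up helper name.

```python
def invalid_utf8_offsets(data):
    """Return the byte offsets in data (a bytes object) where UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice that was decoded.
            offsets.append(pos + err.start)
            pos += err.end  # resume just after the bad byte sequence
    return offsets


if __name__ == "__main__":
    # A Latin-1 encoded e-acute (0xE9) is not valid UTF-8 on its own.
    print(invalid_utf8_offsets(b"caf\xe9 abstract"))
```

Each reported offset can then be mapped back to the item_id whose metadata was written at that position in the file.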

sql needed

The line below will get all the valid item_ids.
SELECT item.item_id from item, handle where handle.resource_id=item.item_id;

The line below will pull a metadata field for a given item id.
select text_value from metadatavalue where metadata_field_id=43 AND item_id=36450;
For this query, the Series/Report will be obtained for an item with item_id=36450.

Metadata fields to check

metadata_field_id name
3 author
15 date issued
25 uri
27 abstract
40 Institution/Association
43 Series/Report
57 Keyword
63 JEL Codes
64 Title
67 email
This list came from AgEconMetadata.htm