
May 26, 2010

Using an axis client to call fedora modifyDatastreamByValue method

Below is code that modifies a Fedora datastream using an Axis client.

At first the call failed with an error message like:
SAXException: Found character data inside an array element while deserializing
After some looking around I found that the following line had to be added before the call is invoked:
call.setOperationStyle(org.apache.axis.constants.Style.WRAPPED);
(see http://www.opensubscriber.com/message/axis-user@ws.apache.org/1855611.html, and thanks Anne).

public void insertSequenceData(String NewXMl, String PID, String LogMessage) throws Exception {
  byte[] normalarr = NewXMl.getBytes("UTF-8");
  String[] altIds = new String[1];
  altIds[0] = "";
  // Use an axis client to call the Fedora webserver
  Service service = new Service();
  Call call = (Call) service.createCall();
  call.setOperationName(new QName(APIM_NS, "modifyDatastreamByValue") );
  call.setTargetEndpointAddress( new URL(URL_API_M));
  call.setUsername(fedoraUser);
  call.setPassword(fedoraPswd);
  // if the WRAPPED style is not used you get the evil error:
  // SAXException: Found character data inside an array element while deserializing
  // see 
  // http://www.opensubscriber.com/message/axis-user@ws.apache.org/1855611.html
  // for the solution.
  call.setOperationStyle(org.apache.axis.constants.Style.WRAPPED);
  Object[] obj_arr = new Object[] {
    PID, // The PID of the object.
    "STRUCT",        // The datastream ID.
    altIds, // Alternate identifiers for the datastream, if any.
    "METS StructMap for this object", //  The label for the datastream.
    "text/xml", //  The mime type.
    "",  //  Optional format URI of the datastream.
    normalarr,  //  The content of the datastream.
    null, //  The algorithm used to compute the checksum. One of "DEFAULT", "DISABLED", "MD5", "SHA-1", "SHA-256", "SHA-384", "SHA-512".
    null,  //  The value of the checksum represented as a hexadecimal string.
    LogMessage, //  A log message.
    false // Force the update even if it would break a data contract.
    };
  call.invoke( obj_arr); 
}

May 14, 2010

location of firefox's executable on mac OS X

/Applications/Firefox.app/Contents/MacOS/firefox-bin

find all cron files

for user in $(cut -f1 -d: /etc/passwd); do crontab -u $user -l; done
Run it as root.

May 5, 2010

AXIS2 client for fedora repository

Below is the source code for an AXIS2 client that talks to the Fedora repository. I based it on a very helpful code fragment that I found, but I have lost the link. So thank you, friend. I really appreciate the help. Note that the imports come from the org.apache.axis packages, which belong to Axis 1.x rather than Axis2.
 
import org.apache.axis.client.Service;
import org.apache.axis.client.Call;
import javax.xml.namespace.QName;
import java.net.*;

public class Axis2ClientToFedora {

  public static void main(String[] argv) {
    try {
      Service service = new Service();
      Call call = (Call) service.createCall();
      call.setOperationName(new QName("http://www.fedora.info/definitions/1/0/api/", "purgeObject"));
      call.setTargetEndpointAddress(new URL("http://chaucer.lib.umn.edu:8080/fedora/services/management"));
      call.setUsername("FedoraUserName");
      call.setPassword("FedoraPassword");
      Object[] obj_arr = new Object[] {
        "basic:data",        // The PID of the object to purge.
        "purge basic:data",  // A log message.
        false                // Do not force the purge.
      };
      call.invoke(obj_arr);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

rels-ext for collections

Below is an example of the rdf that I plan to use for collection objects in fedora:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/collection:data">
    <hasModel xmlns="info:fedora/fedora-system:def/model#" rdf:resource="info:fedora/basic:content_model"></hasModel>
    <hasModel xmlns="info:fedora/fedora-system:def/model#" rdf:resource="info:fedora/collection:content_model"></hasModel>
    <hasMember xmlns="info:fedora/fedora-system:def/relations-external#">basic:data</hasMember>
  </rdf:Description>
</rdf:RDF>

Through its two content models this object gets all the basic methods and all the collection methods, and it has a child called basic:data.
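
If many collection objects need this RDF, it could be generated rather than typed by hand. The following is only a rough sketch that mirrors the shape of the example above. The class name CollectionRelsExt and its build method are hypothetical, and it assumes the PIDs contain no characters that need XML escaping.

```java
import java.util.Arrays;
import java.util.List;

final class CollectionRelsExt {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String MODEL_NS = "info:fedora/fedora-system:def/model#";
    static final String RELS_NS = "info:fedora/fedora-system:def/relations-external#";

    /**
     * Builds RELS-EXT RDF for one collection object.
     * Assumption: PIDs are plain strings with no XML special characters.
     */
    static String build(String collectionPid, List<String> contentModelPids, List<String> memberPids) {
        StringBuilder xml = new StringBuilder();
        xml.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS).append("\">\n");
        xml.append("  <rdf:Description rdf:about=\"info:fedora/").append(collectionPid).append("\">\n");
        // One hasModel element per content model, as in the hand-written example.
        for (String model : contentModelPids) {
            xml.append("    <hasModel xmlns=\"").append(MODEL_NS)
               .append("\" rdf:resource=\"info:fedora/").append(model).append("\"></hasModel>\n");
        }
        // One hasMember element per child, with the PID as element text.
        for (String member : memberPids) {
            xml.append("    <hasMember xmlns=\"").append(RELS_NS).append("\">")
               .append(member).append("</hasMember>\n");
        }
        xml.append("  </rdf:Description>\n</rdf:RDF>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("collection:data",
                Arrays.asList("basic:content_model", "collection:content_model"),
                Arrays.asList("basic:data")));
    }
}
```

Run with the arguments shown in main, this should reproduce the collection:data example, one element per line.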

SQL: extracting the purl and title from dspace given a handle

dspace_sr=> select text_value from metadatavalue, handle where metadata_field_id=25 AND item_id=handle.resource_id and handle=49928;

text_value
---------------------------
http://purl.umn.edu/49928
dspace_sr=> select text_value from metadatavalue, handle where metadata_field_id=64 AND item_id=handle.resource_id and handle=49928;

text_value
-------------------------------------------------------------------------------------------
Food-safety Standards and Farmers Health: Evidence from Kenyan’s Export Vegetable Growers

May 4, 2010

DSPACE performance problem traced to file of indices missing

Summary sent to users

For the last 25 to 30 hours there has been major trouble on the AgEcon side. It turns out that the file containing all the data from the index-all run was missing. The lack of this file produced a large number of bizarre and severe problems. I have run index-all on the AgEcon side and the system now seems to be OK. I can only guess that the last index-all run failed and the file was not created. Several parts of the system rely on this file, and they failed. I am curious: how much trouble did you see on the UDC side?

Symptoms

Spikes to over 100% cpu usage on both the strip1 (tomcat) and strip3 (postgres) boxes
search fails
browse fails
epeople could not be created
input form fails after being partially filled out
bouncing tomcat and postgres produced only minutes of proper behavior

Some technical details

The file:
/dspace/assetstore/dspace-sr/search/segments
was missing. One of the error messages pointed to this problem. This file is the output of the indexer.
I ran the command to reindex the metadata:

dsrun org.dspace.search.DSIndexer -c &

As soon as the command above started, the users could enter data, upload files, and do searches. It has been about six hours since the metadata was indexed and all seems well.
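
Since a missing segments file was enough to cause all of this, a periodic check for it seems worthwhile. Below is a minimal sketch of such a check, assuming the index directory named above. The class name SearchIndexCheck is made up, and this is not part of DSpace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SearchIndexCheck {

    /** True if the index directory contains a regular file named "segments". */
    static boolean hasSegmentsFile(Path indexDir) {
        return Files.isRegularFile(indexDir.resolve("segments"));
    }

    public static void main(String[] args) {
        // Default path taken from this post; pass another directory as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "/dspace/assetstore/dspace-sr/search");
        if (hasSegmentsFile(indexDir)) {
            System.out.println("OK: segments file present in " + indexDir);
        } else {
            System.err.println("MISSING: no segments file in " + indexDir);
            System.exit(1);
        }
    }
}
```

The non-zero exit status makes it easy to call from cron or a monitoring script and raise an alert when the file disappears.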

eperson table

Initially I thought the problem might be in the eperson table. I no longer believe that this is the case. There were 2003 epeople, and I found three that were clearly flawed.

Here is what we want an eperson to look like:

               Table "public.eperson"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 eperson_id          | integer                     | not null
 email               | character varying(64)       |
 password            | character varying(64)       |
 firstname           | character varying(64)       |
 lastname            | character varying(64)       |
 can_log_in          | boolean                     |
 require_certificate | boolean                     |
 self_registered     | boolean                     |
 last_active         | timestamp without time zone |
 sub_frequency       | integer                     |
 phone               | character varying(32)       |
 netid               | character varying(64)       |
Indexes:
    "eperson_pkey" primary key, btree (eperson_id)
    "eperson_email_key" unique, btree (email)
    "eperson_email_idx" btree (email)
    "eperson_netid_idx" btree (netid)

The three flawed epeople were missing passwords and other critical fields. They were all deleted:
426 | newuser426 | | | | | | | | |
93 | newuser93 | | | | | | | | |
486 | aaea@umn.edu | | Registration | aaea09 | t | f | | | |

cpu performance plots

The problem happened on May 3 and continued into May 4. Plots: strip1-cpu.tiff, strip3-cpu.tiff. Raw cpu data.

Some commands found along the way

get postgres processes

The PostgreSQL equivalent of MySQL's 'SHOW PROCESSLIST' is:
SELECT * FROM PG_STAT_ACTIVITY;

form feed is an illegal character for dspace upload

The error

While trying to ingest one of the files, DSPACE generated the following error message:

Authors: Vado, Ligia; Goodwin, Barry K. Abstract: Favorable weather and the adoption of Genetically Modi ed (GM) corn hybrids are often argued to be factors that explain recent corn yield increases and risk reduction in the U.S. Corn Belt. The focus of this analysis is to determine whether favorable weather is the main factor explaining increased and more stable yields or if biotechnology adoption is the more relevant driving force. The hypothesis that recent biotechnology advances have increased yields and reduced risks by making corn more resistant to pests, pesticides, and/or drought is tested. Fixed e ects models of yields and crop insurance losses as functions of weather variables and genetically modi ed corn adoption rates are estimated taking into account the non-linear agronomic response of crop yields to weather. Preliminary results show that genetically modi ed corn adoption rates, especially insect- resistant corn adoption, have had a signi cant and positive e ect on average corn yields in the U.S. Corn Belt over the last years. Furthermore, genetically modi ed corn adoption has not only increased corn's tolerance to extreme heat but has also improved corn's tolerance to excessive and insucient rainfall." is not legal for a JDOM character content: 0xc is not a legal XML character.
at org.jdom.Text.setText(Text.java:188)
at org.jdom.Text.<init>(Text.java:99)
at org.jdom.Element.addContent(Element.java:799)
at com.sun.syndication.io.impl.RSS090Generator.generateSimpleElement(RSS090Generator.java:221)
at com.sun.syndication.io.impl.RSS091UserlandGenerator.populateItem(RSS091UserlandGenerator.java:175)
at com.sun.syndication.io.impl.RSS092Generator.populateItem(RSS092Generator.java:85)
at com.sun.syndication.io.impl.RSS093Generator.populateItem(RSS093Generator.java:44)
at com.sun.syndication.io.impl.RSS094Generator.populateItem(RSS094Generator.java:43)
at com.sun.syndication.io.impl.RSS20Generator.populateItem(RSS20Generator.java:67)
at com.sun.syndication.io.impl.RSS090Generator.addItem(RSS090Generator.java:202)
at com.sun.syndication.io.impl.RSS090Generator.addItems(RSS090Generator.java:195)
at com.sun.syndication.io.impl.RSS091UserlandGenerator.addChannel(RSS091UserlandGenerator.java:81)
at com.sun.syndication.io.impl.RSS091UserlandGenerator.populateFeed(RSS091UserlandGenerator.java:72)
at com.sun.syndication.io.impl.RSS090Generator.generate(RSS090Generator.java:56)
at com.sun.syndication.io.WireFeedOutput.outputJDom(WireFeedOutput.java:193)
at com.sun.syndication.io.WireFeedOutput.output(WireFeedOutput.java:133)
at org.dspace.app.webui.servlet.FeedServlet.doDSGet(FeedServlet.java:255)
at org.dspace.app.webui.servlet.DSpaceServlet.processRequest(DSpaceServlet.java:151)
at org.dspace.app.webui.servlet.DSpaceServlet.doGet(DSpaceServlet.java:99)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:199)
at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:282)
at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:767)
at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:697)
at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:889)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)
{ERROR} [/].[feed] Servlet.service() for servlet feed threw exception

Critical line in error message

is not legal for a JDOM character content: 0xc is not a legal XML character.

Meaning of error

0xc is the hex value of the form-feed character (U+000C). XML 1.0 does not allow it anywhere in a document, including inside a CDATA section.

Implications

This will have to be filtered out of input to Fedora.

Java code to fix it

string = string.replace("\f", "");
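
Form feed is not the only character that XML 1.0 rejects; most of the other control characters below 0x20 fail the same way. As a broader sketch (my assumption, not the filter that was actually deployed), the class below drops every code point outside the XML 1.0 Char range. The names XmlCharFilter, isLegalXmlChar and stripIllegalXmlChars are made up for illustration.

```java
final class XmlCharFilter {

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point that XML 1.0 forbids removed. */
    static String stripIllegalXmlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(XmlCharFilter::isLegalXmlChar)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // A form feed sits in the middle of the word, as in the abstract that failed.
        String dirty = "Genetically Modi\fed corn";
        System.out.println(stripIllegalXmlChars(dirty));
    }
}
```

Tab, line feed and carriage return are kept because XML allows them. Removing a character outright can glue two word fragments together, so replacing it with a space may be the better choice for abstracts like the one above.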