
July 28, 2008

Sending DSpace email to a Gmail account

email situation

DSpace sends me a large number of email messages, and my real email gets lost in a forest of them. So I need to find every place where silvi003@umn.edu is used and replace it with an account that I set up for this purpose: dspacedump@gmail.com
So I changed:

mail.admin = silvi003@umn.edu
alert.recipient = silvi003@umn.edu

to

mail.admin = dspacedump@gmail.com
alert.recipient = dspacedump@gmail.com

in
./config/dspace.cfg
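
To catch every key that still points at the old address, the same swap can be scripted rather than done by hand. This is only a sketch: the class and method names are made up for illustration, it assumes plain "key = value" lines, and it works on lines already read into memory rather than on the real file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DspaceCfgAddressSwap {

    // Returns a copy of the config lines in which every non-comment
    // "key = value" line whose value is exactly oldAddr now holds newAddr.
    public static List<String> swapAddress(List<String> lines, String oldAddr, String newAddr) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            boolean isComment = line.trim().startsWith("#");
            int eq = line.indexOf('=');
            if (!isComment && eq >= 0 && line.substring(eq + 1).trim().equals(oldAddr)) {
                out.add(line.substring(0, eq + 1) + " " + newAddr);
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cfg = Arrays.asList(
            "mail.admin = silvi003@umn.edu",
            "alert.recipient = silvi003@umn.edu",
            "some.other.key = unchanged");   // hypothetical key, for illustration only
        for (String line : swapAddress(cfg, "silvi003@umn.edu", "dspacedump@gmail.com")) {
            System.out.println(line);
        }
    }
}
```

Commented-out lines are left alone on purpose, so old settings kept for reference are not rewritten.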

July 17, 2008

media filter UDC and cron job

Cron job

My predecessor wrote a cron job to index the contents of the PDFs in UDC. It is:
#
# Filter media
#
1 0 * * * /dspace/dspace-ir/bin/filter-media.sh > /dspace/dspace-ir/log/filter-media.log 2>&1
This process was found to take up to eight hours to run, which affects users.
The cron entry will need to be edited and replaced.

Error record associated with the cron job

Creating search index:
Applying Media Filters
2008-02-11 08:07:19,271 INFO  org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
Exception in thread "main" java.lang.IllegalArgumentException: Cannot resolve 4938 to a DSpace object
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:192)
Applying Media Filters
2008-02-11 08:07:19,839 INFO  org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
2008-02-11 08:07:20,160 INFO  org.dspace.content.MetadataField @ Loading MetadataField elements into cache.
2008-02-11 08:07:20,199 INFO  org.dspace.content.MetadataSchema @ Loading schema cache for fast finds
SKIPPED: bitstream 16263 because 'LIFE_SCIENCEs_PREDESIGN_REPORT041504_.pdf.txt' already exists
SKIPPED: bitstream 16261 because 'equine_predesign_may04.pdf.txt' already exists
SKIPPED: bitstream 16259 because 'EducationalFacilitiesPredesignStudyFinal.pdf.txt' already exists
ERROR filtering, skipping bitstream #16251 java.lang.ArrayIndexOutOfBoundsException: 4
java.lang.ArrayIndexOutOfBoundsException: 4
        at org.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:294)
        at org.fontbox.cmap.CMapParser.parse(CMapParser.java:103)
        at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
        at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
        at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
        at org.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:80)
        at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
        at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
        at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
        at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
        at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
        at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
        at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:110)
        at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
        at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
        at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
        at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCollection(MediaFilterManager.java:260)
        at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:202)
ERROR filtering, skipping bitstream #16250 java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
SKIPPED: bitstream 16249 because 'Volume_II-Appendix2.pdf.txt' already exists
ERROR filtering, skipping bitstream #16248 java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
SKIPPED: bitstream 15486 because 'AHC_FacilitiesMasterPlan.pdf.txt' already exists
SKIPPED: bitstream 13492 because 'Vet_med_facilities_development_plan_FINAL.pdf.txt' already exists
SKIPPED: bitstream 13490 because 'SPH_CONSOLIDATION.pdf.txt' already exists
SKIPPED: bitstream 13488 because 'AHC_strategic_facility_plan_1998.pdf.txt' already exists
SKIPPED: bitstream 13486 because 'AHC_Precinct_Plan_Report_Final_May_2006.pdf.txt' already exists
SKIPPED: bitstream 13484 because 'AHC_Mpls_District_Plan_2000.pdf.txt' already exists
Creating search index:
Creating browse index
Indexing all Items in DSpace....2008-02-11 08:17:24,358 INFO  org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
2008-02-11 08:17:25,315 INFO  org.dspace.content.MetadataField @ Loading MetadataField elements into cache.
2008-02-11 08:17:25,357 INFO  org.dspace.content.MetadataSchema @ Loading schema cache for fast finds
 ... Done
Creating search index
2008-02-11 08:19:57,683 INFO  org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
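
A log like this is easier to act on once the failed bitstreams are pulled out of it. The following is a rough sketch, not part of DSpace: the class and method names are invented, and the two patterns are taken only from the "SKIPPED:" and "ERROR filtering" lines in the record above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterMediaLogSummary {

    // Line shapes copied from the filter-media output shown above.
    private static final Pattern SKIPPED =
        Pattern.compile("^SKIPPED: bitstream (\\d+) because");
    private static final Pattern FAILED =
        Pattern.compile("^ERROR filtering, skipping bitstream #(\\d+)");

    // Collects the bitstream id from every line that matches the pattern.
    private static List<String> ids(List<String> lines, Pattern pattern) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static List<String> skippedIds(List<String> lines) {
        return ids(lines, SKIPPED);
    }

    public static List<String> failedIds(List<String> lines) {
        return ids(lines, FAILED);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "SKIPPED: bitstream 16261 because 'equine_predesign_may04.pdf.txt' already exists",
            "ERROR filtering, skipping bitstream #16251 java.lang.ArrayIndexOutOfBoundsException: 4",
            "ERROR filtering, skipping bitstream #16250 java.lang.ArrayIndexOutOfBoundsException");
        System.out.println("skipped: " + skippedIds(lines));
        System.out.println("failed:  " + failedIds(lines));
    }
}
```

The failed ids (16251, 16250 and 16248 in the record above) are the PDFs that the text extractor could not handle, so they are the ones to inspect by hand.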

Some basic information on DSpace

Possible ant tasks

ant compile
ant build_wars
ant update
ant install_code
ant init_configs
ant setup_database
ant clean_database
ant load_registries
ant fresh_install
ant clean
ant public_api
ant javadoc

Some important boxes

UDC sandbox
http://odin.lib.umn.edu:9040/dspace-ir/

We set up a virtual machine for strip one and two.
strip1vm.oit.umn.edu 160.94.138.139
strip3vm.oit.umn.edu 160.94.138.140

Miscellaneous

Version that we use: DSpace Version 1.4.1, 8-December-2006
Approximate size of the asset store: 73G
Server version: Apache/2.0.52
On strip1, the library-specific Apache configs are kept in /opt/httpd/conf.d

July 9, 2008

Matt Zumwalt Talk


Matt's Google page for map

Matt's Blog

XACML




1) Try out fedora throw away repository

2) User is a throw away class (for an example)

3) Use LDAP to connect


XACML

Subjects

Targets

Can put policy in the policy folder

Can put policy in special stream for digital object

MURADORA

Drama Australia turned off XACML muradora ... GUI dropped

PEP (policy enforcement point) - filters Shibboleth or LDAP

hard to install

PDP servlet filters (keeps policies in DBXML) web services

email them to see if they are still supporting PEP and PDP


Need to have the information in LDAP accessible directory


Fedora out of the box does not support privileges for collections:
Fedora XACML Vocabulary

gSearch

gSearch's Solr will automatically rebuild the indices if the system burns down.
If the FOXML schema is changed, gSearch will keep up.

RISEARCH

How much do you want to use RELS-EXT? If you don't want to have multiple steps, you can just index it under Solr.
Don't need to search to find pages in a book.

Book maintains ordering of the pages. Not stored in a page.
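
One way to picture this point, as a sketch only: the book object holds the ordered list of page identifiers and the pages carry no position of their own. The class, method names and PIDs below are all made up for illustration and are not from the talk or from Fedora.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Book {

    // Page order lives here, in the book, not in the individual pages.
    private final List<String> pagePids = new ArrayList<>();

    public void appendPage(String pagePid) {
        pagePids.add(pagePid);
    }

    // Looking up the n-th page is a list access, not a search.
    public String pageAt(int index) {
        return pagePids.get(index);
    }

    public List<String> pagesInOrder() {
        return Collections.unmodifiableList(pagePids);
    }

    public static void main(String[] args) {
        Book book = new Book();
        book.appendPage("demo:page-a");   // hypothetical PIDs
        book.appendPage("demo:page-b");
        System.out.println(book.pagesInOrder());
    }
}
```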

Really don't need RISEARCH


CMA

Everything called a page has JPEG 2000 and TIFF. Want to generate a thumbnail (just an example).

July 7, 2008

ingestFormat and Fedora 3.0

Version of ingestFormat that fails in Fedora 3.0

The old version of the ingest code sets ingestFormat to "foxml1.0".
This fails in Fedora 3.0, giving the error:

fedora.server.errors.ObjectValidityException: Unsupported format: foxml1.0


Code example that generates the error:
 
import java.io.File;
import java.io.FileInputStream;

// Ingests every FOXML file found in foxmlSrc.
// FedoraSOAPClient and FOXMLFilter are defined elsewhere and are not shown here.
public class FedoraIngest {

    private static final String protocol = "http";
    private static final String host = "localhost";
    private static final int port = 8080;
    private static final String usr = "fedoraAdmin";
    private static final String pwd = "pass";

    private static final String collection = "swhp";
//    private static final String foxmlSrc = "/Users/bill/projects/fedora/"
//                                         + collection + "/";

    private static final String foxmlSrc = "/Users/silvi003/Desktop/bill"
                                         + collection + "/";

    public static void main(String[] argv) throws Exception {

        // list() returns null when foxmlSrc is not a readable directory
        String[] dir = new File(foxmlSrc).list(new FOXMLFilter());
        if (dir == null) {
            System.out.println("cannot read directory: " + foxmlSrc);
            return;
        }
        FedoraSOAPClient caller = new FedoraSOAPClient(protocol, host, port, usr, pwd);
        for (int i = 0; i < dir.length; i++) {
            String pid = dir[i];
            System.out.println("FedoraIngest " + pid);
            FileInputStream fis = null;
            try {
                fis = new FileInputStream(new File(foxmlSrc + pid));
                // "foxml1.0" is the format string that Fedora 3.0 rejects
                String fedoraPid = caller.ingest(fis, "foxml1.0", "ingest of " + pid);
                System.out.println("new fedora object: " + fedoraPid);
            }
            catch (Exception excp) {
                System.out.println("ingest error: " + excp.getMessage());
                excp.printStackTrace();
            }
            finally {
                // close the file whether or not the ingest succeeded
                if (fis != null) {
                    fis.close();
                }
            }
        }
    }
}






Version of ingestFormat that works in Fedora 3.0

The ingestFormat value of "info:fedora/fedora-system:FOXML-1.1" works in Fedora 3.0.

I found this value in the config file:
$FEDORA_SRC_HOME/src/properties/server/fedora/server/resources/Server.properties
Code example that works:
 
import java.io.File;
import java.io.FileInputStream;

// Ingests a single FOXML file.
// FedoraSOAPClient is defined elsewhere and is not shown here.
public class FedoraIngestOneFile {

    private static final String protocol = "http";
    private static final String host = "localhost";
    private static final int port = 8080;
    private static final String usr = "fedoraAdmin";
    private static final String pwd = "pass";

    public static void main(String[] argv) throws Exception {

        FedoraSOAPClient caller = new FedoraSOAPClient(protocol, host, port, usr, pwd);
        String fileName = "/Users/silvi003/Desktop/umndob_msp01688";
        FileInputStream fis = null;
        try {
            fis = new FileInputStream(new File(fileName));
            // the format URI that Fedora 3.0 accepts
            String fedoraPid = caller.ingest(fis, "info:fedora/fedora-system:FOXML-1.1", "ingest of " + fileName);
            System.out.println("new fedora object: " + fedoraPid);
        }
        catch (Exception excp) {
            System.out.println("ingest error: " + excp.getMessage());
            excp.printStackTrace();
        }
        finally {
            // close the file whether or not the ingest succeeded
            if (fis != null) {
                fis.close();
            }
        }
    }
}
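
If the same ingest code has to run against both old and new servers, the format string could be chosen from the server version. This is a sketch under assumptions: the class and method are hypothetical, and the only facts behind it are that the old code used "foxml1.0" and that Fedora 3.0 accepts "info:fedora/fedora-system:FOXML-1.1". I have not checked other releases.

```java
public class IngestFormat {

    public static final String FOXML_1_0 = "foxml1.0";
    public static final String FOXML_1_1 = "info:fedora/fedora-system:FOXML-1.1";

    // Picks the ingestFormat argument from a version string such as "3.0".
    // Assumption: every release from 3.0 on takes the URI form and every
    // earlier release takes the short form. Only 3.0 was actually tested.
    public static String forVersion(String fedoraVersion) {
        int major = Integer.parseInt(fedoraVersion.split("\\.")[0]);
        return major >= 3 ? FOXML_1_1 : FOXML_1_0;
    }

    public static void main(String[] args) {
        System.out.println(forVersion("3.0"));
    }
}
```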

July 1, 2008

Upload to SOAP

MTOM: a way of sending binary data in SOAP

SOAP Message Transmission Optimization Mechanism (MTOM)
XOP (XML-binary Optimization Packaging)

RFC 2045 section 6.8 describes the Base64 Content-Transfer-Encoding
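
The reason to care about MTOM is the size cost of putting binary inline as Base64: every 3 raw bytes become 4 characters. A small sketch using the JDK's java.util.Base64 shows the growth (the class and method names are mine, and the payload is just zero bytes):

```java
import java.util.Base64;

public class Base64Overhead {

    // Length of the Base64 text produced for rawBytes bytes of input
    // (basic encoder, no line breaks).
    public static int encodedLength(int rawBytes) {
        return Base64.getEncoder().encode(new byte[rawBytes]).length;
    }

    public static void main(String[] args) {
        int raw = 3000;
        int encoded = encodedLength(raw);
        // 3000 raw bytes -> 4000 Base64 characters, about one third larger
        System.out.println(raw + " bytes -> " + encoded + " Base64 characters");
    }
}
```

With MTOM/XOP the binary part travels outside the XML as a raw MIME part, so this growth is avoided.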

Understanding MTOM

Advantages of MTOM
Introduction to MTOM: A Hands-on Approach (more advantages to MTOM)
Sending Files in Chunks with MTOM Web Services and .NET 2.0

Possible PHP library for MTOM

I have a client and SOAP server in Axis2 that send MTOM back and forth. This could serve as the web service. PHP needs to talk to the Axis2 web service.
A possible choice is WSO2 Web Services Framework/PHP (WSO2 WSF/PHP).

Proven Interoperability

WSO2 WSF/PHP features proven interoperability with Microsoft .NET, WSO2 WSAS (Apache Axis2/Java based Web services application server) and other J2EE implementations. The basic SOAP level interoperability as well as WS-* specification implementations have been tested and proven to interoperate.

Attachments with Web Services and Clients
You can send and receive attachments with SOAP messages both in optimized as well as non optimized formats with MTOM support. Attachments with MTOM/XOP
-- Problem: this seems to require that you know the MIME type
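
On the Java side, one possible way around not knowing the MIME type is to guess it from the file name and fall back to a generic binary type. A sketch, assuming the guess is good enough for the attachment; the class and method names are made up, and URLConnection.guessContentTypeFromName is the JDK's own lookup:

```java
import java.net.URLConnection;

public class MimeTypeGuess {

    // Guesses a MIME type from the file extension. When the JDK has no
    // mapping for the name, falls back to the generic binary type.
    public static String contentTypeFor(String fileName) {
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        return guessed != null ? guessed : "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("report.pdf"));
        System.out.println(contentTypeFor("mystery.zzunknown"));   // hypothetical unknown extension
    }
}
```

This only looks at the name, not the bytes, so a mislabeled file will get the wrong type.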

Downloading a Binary File from a Web Service using Axis2 and SOAP with Attachments

This example uses SwA (SOAP with Attachments).