
October 30, 2007

Adding a new field to the dspace database

In the file ./config/dspace.cfg one finds:

search.index.1 = author:dc.contributor.*

search.index.2 = author:dc.creator.*

search.index.3 = title:dc.title.*

search.index.4 = keyword:dc.subject.*

search.index.5 = abstract:dc.description.abstract

search.index.6 = author:dc.description.statementofresponsibility

search.index.7 = series:dc.relation.ispartofseries

search.index.8 = abstract:dc.description.tableofcontents

search.index.9 = mime:dc.format.mimetype

search.index.10 = sponsor:dc.description.sponsorship

search.index.11 = identifier:dc.identifier.*

search.index.12 = language:dc.language.iso

search.index.13 = date:dc.date.issued

I added the line that is bolded.
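Each entry above has the form search.index.N = indexname:schema.element.qualifier, and several entries can feed the same index name (author, for example, is fed by three fields). As a rough sketch of how such entries could be read and grouped, here is some plain Java. The class SearchIndexConfig is hypothetical and is not part of DSpace; it also assumes the entries are numbered from 1 without gaps and stops at the first missing number.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a DSpace class.
class SearchIndexConfig {

    // Returns index name -> metadata field patterns,
    // e.g. "author" -> [dc.contributor.*, dc.creator.*].
    static Map<String, List<String>> parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        Map<String, List<String>> byIndex = new TreeMap<>();
        // Assumption: entries are numbered 1, 2, 3, ... and the first gap ends the list.
        for (int i = 1; props.getProperty("search.index." + i) != null; i++) {
            String value = props.getProperty("search.index." + i).trim();
            int colon = value.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException(
                    "Expected indexname:field in search.index." + i);
            }
            String indexName = value.substring(0, colon).trim();
            String fieldPattern = value.substring(colon + 1).trim();
            byIndex.computeIfAbsent(indexName, k -> new ArrayList<>()).add(fieldPattern);
        }
        return byIndex;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
            "search.index.1 = author:dc.contributor.*",
            "search.index.2 = author:dc.creator.*",
            "search.index.3 = date:dc.date.issued");
        // Prints each index name with the fields that feed it.
        parse(cfg).forEach((name, fields) -> System.out.println(name + " <- " + fields));
    }
}
```

Under that assumption, a numbering gap would silently hide every later entry, so a new line should take the next free number.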

After this line is added you must run:

ant init_configs -- update the config system

ant install_code -- compile the indexer code

And then run the script below to reindex lucene:


October 22, 2007

Report to John and Brad about dspace progress

John & Brad,
Below is what I have done over the last few days with dspace.
Jeff

Attempt to use lucene to sort fields:
1) Examined work by Rooma who attempted to solve the problem.
2) She tried to use the lucene engine to sort the fields -> I tested lucene sort.
3) lucene will not sort tokenized fields.
4) Requests have been sent to lucene and dspace to create sortable tokenized fields. There seems to be some internal debate as to whether this is wise/possible.
5) Used the lucene 2.2 jar to dump all attributes of the fields stored in our lucene DB (we are using the 2.0 jar, which does not have this feature, and I will return to the original jar).
6) The "isTokenized" attribute has the value "true" for all the fields except the field named "handle".
7) In its current state, none of the fields of interest are sortable by lucene.

Unique problem of date field:
1) The "date" field is not stored in lucene.
2) Likely generated in the jsp for the 10 records that are displayed.
3) derived from direct call to sql db?

My plans:
1) I talked to Bill and he says there is a way to index a field twice, as both tokenized and non-tokenized. I will explore this idea to make our fields sortable.
2) Brad and I have discussed the "date problem". We could go directly to sql or fix lucene.

1) The lucene 2.2 jar allows me to peer into the lucene DB and display all the properties of the stored fields.
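To make the "index a field twice" idea concrete, here is a plain-Java illustration that uses no lucene or dspace classes (DualFieldSketch and IndexedTitle are made-up names). A tokenized copy of a value yields several terms per record, so there is no single value to order by. An untokenized copy yields exactly one term per record, and that can serve as a sort key. The sketch matches on the first copy and orders by the second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; none of these types are lucene or dspace classes.
class DualFieldSketch {

    // One record holding two copies of the same title.
    static final class IndexedTitle {
        final String original;
        final List<String> tokens; // "tokenized" copy: many terms, used for matching
        final String sortKey;      // "untokenized" copy: one term, used for ordering

        IndexedTitle(String title) {
            String normalized = title.toLowerCase(Locale.ROOT).trim();
            this.original = title;
            this.tokens = Arrays.asList(normalized.split("\\s+"));
            this.sortKey = normalized;
        }
    }

    // Match on the tokenized copy, then order the matches by the untokenized copy.
    static List<String> searchSorted(List<String> titles, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<IndexedTitle> hits = new ArrayList<>();
        for (String title : titles) {
            IndexedTitle doc = new IndexedTitle(title);
            if (doc.tokens.contains(needle)) {
                hits.add(doc);
            }
        }
        hits.sort(Comparator.comparing((IndexedTitle d) -> d.sortKey));

        List<String> ordered = new ArrayList<>();
        for (IndexedTitle d : hits) {
            ordered.add(d.original);
        }
        return ordered;
    }
}
```

In lucene terms this would presumably mean adding each value under two field names, one tokenized for searching and one untokenized for sorting, at the cost of a larger index.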

Aliases for servlets

From the ./etc/dspace-web.xml file:

<servlet>
  <servlet-name>subject-search</servlet-name>
  <servlet-class>org.dspace.app.webui.servlet.ControlledVocabularySearchServlet</servlet-class>
</servlet>
<servlet>
  <servlet-name>simple-search</servlet-name>
  <servlet-class>org.dspace.app.webui.servlet.SimpleSearchServlet</servlet-class>
</servlet>

Attributes of the fields in the Lucene database (in dspace) + sortable problem

I used the code below:

JavaCodeToDumpLuceneAttributes.html

and found out that the fields in the lucene DB had the following attributes:

dspaceFields.html

Fields cannot be tokenized if they are to be sortable, so none of the fields other than the handle is sortable.

October 17, 2007

Get logger running for dspace

1) stop tomcat

2) fix log level in dspace config file: dspace.cfg

config.template.log4j.properties = ${dspace.dir}/config/log4j.properties

config.template.log4j-handle-plugin.properties = ${dspace.dir}/config/log4j-handle-plugin.properties

config.template.oaicat.properties = ${dspace.dir}/config/oaicat.properties

3) run the init_configs ant task

4) make new war files

5) tomcat config file:

$CATALINA_HOME/conf/logging.properties contains

java.util.logging.ConsoleHandler.level = FINE

java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

6) Start tomcat
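The two logging.properties lines in step 5 are ordinary java.util.logging settings. The following sketch sets up the equivalent in code, to show what they control. The class name ConsoleLoggingSketch is made up for this example, and it says nothing about how tomcat itself reads the file. One point worth noting: the handler level alone is not enough, because the logger must also be set to let FINE messages through.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Illustration only: programmatic equivalent of the two properties in step 5.
class ConsoleLoggingSketch {

    static Logger fineConsoleLogger(String name) {
        Logger logger = Logger.getLogger(name);
        // Do not also pass records to the root logger's handlers (avoids duplicate lines).
        logger.setUseParentHandlers(false);

        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);                // ConsoleHandler.level = FINE
        handler.setFormatter(new SimpleFormatter()); // ConsoleHandler.formatter = SimpleFormatter
        logger.addHandler(handler);

        // The logger filters before the handler does, so it must admit FINE as well.
        logger.setLevel(Level.FINE);
        return logger;
    }

    public static void main(String[] args) {
        Logger log = fineConsoleLogger("sketch.demo");
        log.fine("visible: both logger and handler accept FINE");
        log.finer("not visible: FINER is below the configured level");
    }
}
```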

October 15, 2007

Classes in dspace that touch LUCENE

All classes that use Lucene are in the package org.dspace.search:

./src/org/dspace/search/DSAnalyzer.java
./src/org/dspace/search/DSIndexer.java
./src/org/dspace/search/DSTokenizer.java
./src/org/dspace/search/DSQuery.java


DSIndexer is used by several classes
jgrep -l DSIndexer

DSQuery is used by several classes
jgrep -l DSQuery

Other classes found in org.dspace.search

October 12, 2007

Data files for dspace

Location on odin of the svn source files

Getting the data files from odin and loading them into the database

scp -r silvi003@odin.lib.umn.edu:/mnt/agecon_export/dc_mixed_nodata .

Loading the files used the command:
./dsrun edu.umn.dspace.administer.BatchImporter -R -a -e silvi003@umn.edu -s /Users/silvi003/dc_mixed_data/dc_mixed_nodata

I tried changing the code:

In the DSIndexer class, setting
wipe_existing = true;
(it is usually false) allowed the program to run much longer.

Now it dies with:

Exception in thread "main" java.sql.SQLException: bad_dublin_core SchemaID=1, contributor author_contact
at org.dspace.content.Item.update(Item.java:1468)
at org.dspace.content.InstallItem.installItem(InstallItem.java:146)
at edu.umn.dspace.administer.BatchImporter.addItem(BatchImporter.java:670)
at edu.umn.dspace.administer.BatchImporter.addItems(BatchImporter.java:557)
at edu.umn.dspace.administer.BatchImporter.createCommunityStructure(BatchImporter.java:430)
at edu.umn.dspace.administer.BatchImporter.createCommunityStructure(BatchImporter.java:500)
at edu.umn.dspace.administer.BatchImporter.main(BatchImporter.java:267)


2007-10-09 10:12:51,407 WARN org.dspace.content.Item @ silvi003@umn.edu::bad_dc:Bad DC field. SchemaID=1, element: "contributor" qualifier: "author_contact" value: "Paterson, Anna (anna@areu.org.af)"

Brad was able to edit the files and get some of the data to load. The word "urban" produced a useful search.
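The warning above names the element "contributor" with the qualifier "author_contact", which suggests the import failed because that element/qualifier pair is not a registered metadata field. A pre-import check along the following lines could flag such fields before the importer runs. This is only a sketch: DublinCoreFieldCheck is a hypothetical helper, and the set of registered fields is passed in as plain strings rather than read from the dspace registry.

```java
import java.util.Set;

// Hypothetical pre-import check; not part of dspace or of BatchImporter.
class DublinCoreFieldCheck {

    // Builds the lookup key: "element" for unqualified fields, "element.qualifier" otherwise.
    static String key(String element, String qualifier) {
        return qualifier == null ? element : element + "." + qualifier;
    }

    // True if the element/qualifier pair appears in the supplied set of registered fields.
    static boolean isRegistered(Set<String> registered, String element, String qualifier) {
        return registered.contains(key(element, qualifier));
    }

    public static void main(String[] args) {
        // Example registry contents, assumed for illustration only.
        Set<String> registered = Set.of("contributor.author", "title", "date.issued");
        System.out.println(isRegistered(registered, "contributor", "author"));
        System.out.println(isRegistered(registered, "contributor", "author_contact"));
    }
}
```

A field flagged this way can be fixed either by editing the data files, as Brad did, or by registering the missing field.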

October 1, 2007

Things needed to set up dspace

- download eclipse
- eclipse svn plugin subclipse
- Tomcat
- postgres jar (JDBC driver)
- configure the dspace.cfg file
- also see dspace.org

This is what you want to check out:
Local svn repository
To browse repository:

Dspace home:

There is a documentation link on the left side that has installation instructions.

From Brad Teale