| metadata_field_id | name |
| 3 | author |
| 15 | date issued |
| 25 | uri |
| 27 | abstract |
| 40 | Institution/Association |
| 43 | Series/Report |
| 57 | Keyword |
| 63 | JEL Codes |
| 64 | Title |
| 67 |
| handle_id | handle | resource_type_id | resource_id |
| 5 | 2204 | 2 | 2 |
| item_id | submitter_id | in_archive | withdrawn | last_modified | owning_collection |
| 2 | 1 | t | f | 2007-12-13 16:53:15.767-06 | 1 |
| items_by_title_id | item_id | title | sort_title |
| 3727 | 2 | xponentially growing solutions for inverse problems in PDE | xponentially growing solutions for inverse problems in pde |
| resource_type | resource_type_id |
| BITSTREAM | 0 |
| BUNDLE | 1 |
| ITEM | 2 |
| COLLECTION | 3 |
| COMMUNITY | 4 |
| SITE | 5 |
| GROUP | 6 |
| EPERSON | 7 |
| Log Name | Number in Log | Found Apache Match | Apache needs SQL |
| view_bitstream | 10772 | Y | Y |
| view_item | 4462 | Y | N |
| view_collection | 2084 | Y | Y |
| view_community | 582 | Y | Y |
Basically, DSPACE does not properly close its connections to the SQL server, and when the connection pool is exhausted it generates error messages. This may be more of a problem now because OAI is available (climbing down the tree will hit the DB a lot), because there is another SQL-injection attack, or because UDC may simply be more popular. I did not explore the probable increased load.
A more detailed explanation of the error is given below, with a possible fix. To set up the fix given on the web I need some privileges on strip3. I have asked CCO for them.
Jeff
1) Problem Indicated in the Logs
Starting at
2008-09-17 08:23:08,769
and ending at
2008-09-17 08:53:29,729 (When Bill restarted DSPACE).
There were 330 error messages of the type:
2008-09-17 08:53:29,729 WARN org.dspace.app.webui.servlet.DSpaceServlet @
anonymous:no_context:database_error:org.apache.commons.dbcp.SQLNestedException:
Cannot get a connection, pool exhausted
An error message of this sort would generate an error screen.
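The count and time window quoted above can be re-derived from the log with a short script. The following is a minimal sketch in Python, assuming the log uses the timestamp layout shown in the quoted message; the sample entries are built from that message for illustration and are not a copy of the real log.

```python
import re
from datetime import datetime

# Timestamp layout of the quoted log lines, e.g. "2008-09-17 08:53:29,729".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")
MARKER = "pool exhausted"


def summarize_pool_errors(lines):
    """Return (count, first_seen, last_seen) for entries mentioning MARKER.

    A log entry may wrap over several lines, so the most recent timestamp
    seen is attributed to any following line that contains the marker.
    """
    count = 0
    first = last = None
    current = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if MARKER in line and current is not None:
            count += 1
            first = first or current
            last = current
    return count, first, last


# Illustrative entries only (assumption): two wrapped warnings of the quoted type.
sample_log = [
    "2008-09-17 08:23:08,769 WARN org.dspace.app.webui.servlet.DSpaceServlet @",
    "anonymous:no_context:database_error:org.apache.commons.dbcp.SQLNestedException:",
    "Cannot get a connection, pool exhausted",
    "2008-09-17 08:53:29,729 WARN org.dspace.app.webui.servlet.DSpaceServlet @",
    "anonymous:no_context:database_error:org.apache.commons.dbcp.SQLNestedException:",
    "Cannot get a connection, pool exhausted",
]
pool_count, pool_first, pool_last = summarize_pool_errors(sample_log)
print(pool_count, pool_first, pool_last)
```

Run against the real dspace.log, the same function should report the number of these warnings and the first and last time they were seen.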
2) Fix from University of Michigan
In dspace-tech the University of Michigan team addresses this problem by closing
processes that have the phrase 'idle in transaction' when displayed by ps.
see
http://www.mail-archive.com/dspace-tech@lists.sourceforge.net/msg01057.html
3) Confirmation that our problem matches University of Michigan's
I have checked strip3 (where the postgres database lives) and
between 08:54 (When Bill restarted DSPACE) and 11:22,
there have been 45 processes created that have the form:
postgres 15047 0.0 0.0 86304 4964 ? S 10:58 0:00 postgres: dspace_ir dspace_ir 134.84.135.19 idle
So we are building up these "idle" processes on the DB side, and it is likely that the system will crash again unless we put in the Michigan fix.
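To watch the build-up without killing anything, the ps output can be filtered for postgres backends in a given state. This is only a sketch in Python, assuming `ps aux` style output with the command in the eleventh column; the second sample line (PID 15102, "idle in transaction") is hypothetical and is there only to show the difference between the two states.

```python
def idle_postgres_pids(ps_lines, state="idle in transaction"):
    """Return PIDs of postgres backends whose process title ends with `state`."""
    pids = []
    for line in ps_lines:
        # `ps aux` prints 11 columns; the last one is the full command.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10].rstrip()
        if command.startswith("postgres:") and command.endswith(state):
            pids.append(int(fields[1]))
    return pids


ps_sample = [
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND",
    # Line as observed on strip3:
    "postgres 15047 0.0 0.0 86304 4964 ? S 10:58 0:00 "
    "postgres: dspace_ir dspace_ir 134.84.135.19 idle",
    # Hypothetical line, for illustration only:
    "postgres 15102 0.0 0.0 86304 4964 ? S 11:02 0:00 "
    "postgres: dspace_ir dspace_ir 134.84.135.19 idle in transaction",
]
print(idle_postgres_pids(ps_sample, state="idle"))
print(idle_postgres_pids(ps_sample))
```

The function only reports PIDs. Whether to terminate them, and how, is left to the Michigan script referenced above.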
#
# Filter media
#
1 0 * * * /dspace/dspace-ir/bin/filter-media.sh > /dspace/dspace-ir/log/filter-media.log 2>&1
It was noted that this process was taking up to eight hours to run and impacting the users.
Creating search index:
Applying Media Filters
2008-02-11 08:07:19,271
INFO org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
Exception in thread "main" java.lang.IllegalArgumentException: Cannot resolve 4938 to a DSpace object
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:192)
Applying Media Filters
2008-02-11 08:07:19,839 INFO org.dspace.core.ConfigurationManager
@ DSpace logging installed using log4j.properties
2008-02-11 08:07:20,160 INFO org.dspace.content.MetadataField
@ Loading MetadataField elements into cache.
2008-02-11 08:07:20,199 INFO org.dspace.content.MetadataSchema @ Loading schema cache for fast finds
SKIPPED: bitstream 16263 because 'LIFE_SCIENCEs_PREDESIGN_REPORT041504_.pdf.txt' already exists
SKIPPED: bitstream 16261 because 'equine_predesign_may04.pdf.txt' already exists
SKIPPED: bitstream 16259 because 'EducationalFacilitiesPredesignStudyFinal.pdf.txt' already exists
ERROR filtering, skipping bitstream #16251 java.lang.ArrayIndexOutOfBoundsException: 4
java.lang.ArrayIndexOutOfBoundsException: 4
at org.fontbox.cmap.CMapParser.parseNextToken(CMapParser.java:294)
at org.fontbox.cmap.CMapParser.parse(CMapParser.java:103)
at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:535)
at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
at org.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:80)
at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:110)
at org.dspace.app.mediafilter.MediaFilter.processBitstream(MediaFilter.java:155)
at org.dspace.app.mediafilter.MediaFilterManager.filterBitstream(MediaFilterManager.java:327)
at org.dspace.app.mediafilter.MediaFilterManager.filterItem(MediaFilterManager.java:296)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersItem(MediaFilterManager.java:266)
at org.dspace.app.mediafilter.MediaFilterManager.applyFiltersCollection(MediaFilterManager.java:260)
at org.dspace.app.mediafilter.MediaFilterManager.main(MediaFilterManager.java:202)
ERROR filtering, skipping bitstream #16250 java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
SKIPPED: bitstream 16249 because 'Volume_II-Appendix2.pdf.txt' already exists
ERROR filtering, skipping bitstream #16248 java.lang.ArrayIndexOutOfBoundsException
java.lang.ArrayIndexOutOfBoundsException
SKIPPED: bitstream 15486 because 'AHC_FacilitiesMasterPlan.pdf.txt' already exists
SKIPPED: bitstream 13492 because 'Vet_med_facilities_development_plan_FINAL.pdf.txt' already exists
SKIPPED: bitstream 13490 because 'SPH_CONSOLIDATION.pdf.txt' already exists
SKIPPED: bitstream 13488 because 'AHC_strategic_facility_plan_1998.pdf.txt' already exists
SKIPPED: bitstream 13486 because 'AHC_Precinct_Plan_Report_Final_May_2006.pdf.txt' already exists
SKIPPED: bitstream 13484 because 'AHC_Mpls_District_Plan_2000.pdf.txt' already exists
Creating search index:
Creating browse index
Indexing all Items in DSpace....2008-02-11 08:17:24,358
INFO org.dspace.core.ConfigurationManager @ DSpace logging installed using log4j.properties
2008-02-11 08:17:25,315
INFO org.dspace.content.MetadataField @ Loading MetadataField elements into cache.
2008-02-11 08:17:25,357
INFO org.dspace.content.MetadataSchema @ Loading schema cache for fast finds
... Done
Creating search index
2008-02-11 08:19:57,683 INFO org.dspace.core.ConfigurationManager @
DSpace logging installed using log4j.properties
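A log like the one above is easier to review as a summary of which bitstreams were skipped and which failed. The sketch below is in Python and assumes the two line formats seen in this log ("SKIPPED: bitstream ..." and "ERROR filtering, skipping bitstream #..."); the sample lines are copied from the log.

```python
import re

SKIPPED = re.compile(r"^SKIPPED: bitstream (\d+) because '(.+)' already exists")
FAILED = re.compile(r"^ERROR filtering, skipping bitstream #(\d+) (\S+)")


def summarize_filter_log(lines):
    """Return (skipped_ids, failed) where failed maps bitstream id -> exception name."""
    skipped, failed = [], {}
    for line in lines:
        match = SKIPPED.match(line)
        if match:
            skipped.append(int(match.group(1)))
            continue
        match = FAILED.match(line)
        if match:
            # Drop the trailing ":" in "ArrayIndexOutOfBoundsException: 4".
            failed[int(match.group(1))] = match.group(2).rstrip(":")
    return skipped, failed


filter_sample = [
    "SKIPPED: bitstream 16263 because 'LIFE_SCIENCEs_PREDESIGN_REPORT041504_.pdf.txt' already exists",
    "ERROR filtering, skipping bitstream #16251 java.lang.ArrayIndexOutOfBoundsException: 4",
    "java.lang.ArrayIndexOutOfBoundsException: 4",
    "ERROR filtering, skipping bitstream #16250 java.lang.ArrayIndexOutOfBoundsException",
    "SKIPPED: bitstream 16249 because 'Volume_II-Appendix2.pdf.txt' already exists",
]
skipped_ids, failed_ids = summarize_filter_log(filter_sample)
print(skipped_ids, failed_ids)
```

The failed ids are the bitstreams to look at first, since each one is a PDF the text extractor could not process.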
Hi Jeff,
The Journal listing is created in the Community.java class under org/dspace/content. It is using the following SQL statement:
    SELECT DISTINCT(community.community_id), name, short_description, introductory_text,
           logo_bitstream_id, copyright_text, side_bar_text
    FROM community, community2item
    WHERE community2item.item_id IN
          (SELECT item_id FROM metadatavalue
           WHERE metadata_field_id =
                 (SELECT metadata_field_id FROM metadatafieldregistry
                  WHERE element='type' AND qualifier IS NULL)
             AND text_value IN ('Journal Article', 'Submitted Journal Article'))
      AND community.community_id=community2item.community_id
    ORDER BY (name) ASC;
Basically, it is looking in the metadatafieldregistry table for element=type and an empty qualifier with a text_value of either: Journal Article or Submitted Journal Article. These were the terms defined during the initial requirements gathering of these pages. If something else is defined as a Journal type it does require a code change. It would be nice to move these text_value values into the configuration so changes don't require code modifications.
Let me know if you have additional questions.
Brad
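One way to read Brad's suggestion about moving the text_value values into the configuration is sketched below in Python. The property name `journal.type.values` is hypothetical (it does not exist in dspace.cfg), and the real change would be made in Community.java; the sketch only shows the idea of building the IN list from a configured, comma-separated value and binding the values as parameters.

```python
DEFAULT_TYPES = "Journal Article, Submitted Journal Article"


def journal_type_values(config, key="journal.type.values"):
    """Read the comma-separated list of journal types (hypothetical config key)."""
    raw = config.get(key, DEFAULT_TYPES)
    return [value.strip() for value in raw.split(",") if value.strip()]


def build_journal_query(values):
    """Return (sql, params) with one '?' placeholder per configured type."""
    if not values:
        raise ValueError("at least one journal type is required")
    placeholders = ", ".join(["?"] * len(values))
    sql = (
        "SELECT DISTINCT(community.community_id), name, short_description, "
        "introductory_text, logo_bitstream_id, copyright_text, side_bar_text "
        "FROM community, community2item "
        "WHERE community2item.item_id IN "
        "(SELECT item_id FROM metadatavalue WHERE metadata_field_id="
        "(SELECT metadata_field_id FROM metadatafieldregistry "
        "WHERE element='type' AND qualifier IS NULL) "
        "AND text_value IN (" + placeholders + ")) "
        "AND community.community_id=community2item.community_id "
        "ORDER BY (name) ASC"
    )
    return sql, list(values)


journal_sql, journal_params = build_journal_query(journal_type_values({}))
print(journal_sql)
print(journal_params)
```

With this shape, adding a new journal type would mean editing one configuration line rather than the SQL string in the code.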
Steps:
1) Copy edit-metadata.jsp to local -> confirm that new jsp is "live"
2) Confirm edu.umn.dspace.submit.step.DescribeStep is alive
3) Make "email_name" type from "name" type ... do not modify any code in "email_name" yet
4) Drop DCPersonName in "email_name" replace with string
5) Add three text fields to "email_name" in edit-metadata.jsp
6) Fix string in "email_name" code in edu.umn.dspace.submit.step.DescribeStep to handle 3rd field
Louise Letnes and Julia Kelly wanted these messages deleted from the top of the upload page:
Netscape users please note: By default, the window brought up by clicking "Browse..." will only display files of type HTML. If the file you are uploading isn't an HTML file, you will need to select the option to display files of other types. are available.
Please also note that the DSpace system is able to preserve the content of certain types of files better than other types. and levels of support for each are available.
The files below were changed so that ag econ results can be sorted by clicking the headers of the tables.
SR/trunk/config/dspace.cfg
SR/trunk/jsp/local/search/results.jsp
SR/trunk/src/org/dspace/app/webui/servlet/SimpleSearchServlet.java
SR/trunk/src/org/dspace/search/DSIndexer.java
SR/trunk/src/org/dspace/search/DSQuery.java
SR/trunk/src/org/dspace/search/QueryArgs.java
SR/trunk/src/edu/umn/dspace/app/webui/jsptag/ItemListTag.java
This is revision 57 in the SVN repository.
In the file ./config/dspace.cfg one finds:
search.index.1 = author:dc.contributor.*
search.index.2 = author:dc.creator.*
search.index.3 = title:dc.title.*
search.index.4 = keyword:dc.subject.*
search.index.5 = abstract:dc.description.abstract
search.index.6 = author:dc.description.statementofresponsibility
search.index.7 = series:dc.relation.ispartofseries
search.index.8 = abstract:dc.description.tableofcontents
search.index.9 = mime:dc.format.mimetype
search.index.10 = sponsor:dc.description.sponsorship
search.index.11 = identifier:dc.identifier.*
search.index.12 = language:dc.language.iso
search.index.13 = date:dc.date.issued
I added the line that is bolded.
After this line is added you must run:
ant init_configs -- update the config system
ant install_code -- compile the indexer code
And then run the script below to reindex lucene:
/usr/local/dspace-sr-dev/bin/index-all
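Each search.index line above has the form `search.index.N = index_name:metadata_field`, so several metadata fields can feed the same index (for example, three fields feed "author"). The Python sketch below groups the entries by index name to make that mapping visible. It is a reading aid under that assumption about the format, not part of DSpace.

```python
def parse_search_indexes(cfg_lines):
    """Group search.index.N entries by index name -> [(N, metadata_field), ...]."""
    indexes = {}
    for line in cfg_lines:
        line = line.strip()
        if not line.startswith("search.index."):
            continue
        key, _, value = line.partition("=")
        number = int(key.strip().rsplit(".", 1)[1])
        index_name, _, field = value.strip().partition(":")
        indexes.setdefault(index_name, []).append((number, field))
    return indexes


cfg_sample = [
    "search.index.1 = author:dc.contributor.*",
    "search.index.2 = author:dc.creator.*",
    "search.index.6 = author:dc.description.statementofresponsibility",
    "search.index.13 = date:dc.date.issued",
    "# unrelated line",
]
search_indexes = parse_search_indexes(cfg_sample)
for name, fields in search_indexes.items():
    print(name, fields)
```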
John & Brad,
Below is what I have done over the last few days with dspace.
Jeff
Attempt to use lucene to sort fields:
1) Examined work by Rooma who attempted to solve the problem.
2) She tried to use the lucene engine to sort the fields -> I tested lucene sort.
3) lucene will not sort tokenized fields.
4) Requests have been sent to lucene and dspace to create sortable tokenized fields. There seems to be some internal debate as to whether this is wise/possible.
5) Used the lucene 2.2 jar to dump all attributes of fields stored in our lucene DB (we are using the 2.0 jar, which does not have this feature, and I will return to the original jar).
6) The "isTokenized" attribute has the value "true" for all the fields except the field named "handle".
7) In its current state, none of the fields of interest are sortable by lucene.
Unique problem of date field:
1) The "date" field is not stored in lucene.
2) Likely generated in the jsp for the 10 records that are displayed.
3) derived from direct call to sql db?
My plans:
1) I talked to Bill and he says there is a way to index a field twice, as both tokenized and non-tokenized. I will explore this idea to make our fields sortable.
2) Brad and I have discussed the "date problem". We could go directly to sql or fix lucene.
Gains:
1) The lucene 2.2 jar allows me to peer into the lucene DB and display all the properties of the stored fields.
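The idea of indexing a field twice can be illustrated without lucene. The sketch below is a toy model in plain Python, not lucene code: each document keeps a tokenized copy of the title for matching and an untokenized copy (here called `title_sort`, a made-up name) for ordering. The titles are invented for the example.

```python
import re


def index_document(doc_id, title):
    """Store the title twice: tokenized for search, whole for sorting."""
    return {
        "id": doc_id,
        "title": re.findall(r"\w+", title.lower()),  # tokenized: matches single words
        "title_sort": title.strip().lower(),         # untokenized: one sortable key
    }


def search_sorted(index, term):
    """Match on the tokenized field, order the hits by the untokenized field."""
    hits = [doc for doc in index if term.lower() in doc["title"]]
    return [doc["id"] for doc in sorted(hits, key=lambda doc: doc["title_sort"])]


# Invented titles, for illustration only.
toy_index = [
    index_document(1, "Urban water policy"),
    index_document(2, "Agricultural trade and urban growth"),
    index_document(3, "Rural credit markets"),
]
print(search_sorted(toy_index, "urban"))
```

Sorting on the token list would compare word by word in whatever order the analyzer produced, which is why a separate whole-value copy is needed for a stable sort key.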
1) stop tomcat
2) fix log level in dspace config file: dspace.cfg
config.template.log4j.properties = ${dspace.dir}/config/log4j.properties
config.template.log4j-handle-plugin.properties = ${dspace.dir}/config/log4j-handle-plugin.properties
config.template.oaicat.properties = ${dspace.dir}/config/oaicat.properties
3) run the init_configs ant task
4) make new war files
5) tomcat config file:
$CATALINA_HOME/conf/logging.properties contains
java.util.logging.ConsoleHandler.level = FINE
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
6) Start tomcat
All Classes that contain Lucene are in the
package org.dspace.search
Input:
./src/org/dspace/search/DSAnalyzer.java -> ./src/org/dspace/search/DSIndexer.java
Query:
./src/org/dspace/search/DSTokenizer.java -> ./src/org/dspace/search/DSQuery.java
--------------------------------------------------------------------------------------------
DSIndexer is used by several classes
jgrep -l DSIndexer
./src/org/dspace/app/mediafilter/MediaFilterManager.java
./src/org/dspace/app/webui/servlet/admin/EditItemServlet.java
./src/org/dspace/content/Collection.java
./src/org/dspace/content/Community.java
./src/org/dspace/content/InstallItem.java
./src/org/dspace/content/Item.java
./src/org/dspace/search/DSIndexer.java
./src/org/dspace/search/DSQuery.java
DSQuery is used by several classes
jgrep -l DSQuery
./src/org/dspace/app/webui/servlet/ControlledVocabularySearchServlet.java
./src/org/dspace/app/webui/servlet/SimpleSearchServlet.java
./src/org/dspace/search/DSQuery.java
Other classes found in org.dspace.search
./src/org/dspace/search/Harvest.java
./src/org/dspace/search/HarvestedItemInfo.java
./src/org/dspace/search/QueryArgs.java
./src/org/dspace/search/QueryResults.java
Loading the files used the command:
./dsrun edu.umn.dspace.administer.BatchImporter -R -a -e silvi003@umn.edu -s /Users/silvi003/dc_mixed_data/dc_mixed_nodata
I tried to change the code:
In the DSIndexer class, setting
wipe_existing = true;
(it is usually false) allowed the program to run much longer.
Now it dies with:
Exception in thread "main" java.sql.SQLException: bad_dublin_core SchemaID=1, contributor author_contact
at org.dspace.content.Item.update(Item.java:1468)
at org.dspace.content.InstallItem.installItem(InstallItem.java:146)
at edu.umn.dspace.administer.BatchImporter.addItem(BatchImporter.java:670)
at edu.umn.dspace.administer.BatchImporter.addItems(BatchImporter.java:557)
at edu.umn.dspace.administer.BatchImporter.createCommunityStructure(BatchImporter.java:430)
at edu.umn.dspace.administer.BatchImporter.createCommunityStructure(BatchImporter.java:500)
at edu.umn.dspace.administer.BatchImporter.main(BatchImporter.java:267)
log:
2007-10-09 10:12:51,407 WARN org.dspace.content.Item @ silvi003@umn.edu::bad_dc:Bad DC field.
SchemaID=1, element: "contributor" qualifier: "author_contact" value: "Paterson,
Anna (anna@areu.org.af)"
Brad was able to edit the files and get some of the data to load. The word "urban" produced a useful search.
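The bad_dc failure above is raised when an item carries an element/qualifier pair that is not registered, here "contributor" with qualifier "author_contact". A pre-flight check over the import data can surface such fields before the importer runs. The Python sketch below assumes a small hand-written registry; the registered pairs listed are a hypothetical subset, not a dump of the real metadatafieldregistry table.

```python
# Hypothetical subset of registered (element, qualifier) pairs.
REGISTERED_FIELDS = {
    ("contributor", None),
    ("contributor", "author"),
    ("title", None),
    ("date", "issued"),
}


def unregistered_fields(records, registry=REGISTERED_FIELDS):
    """Return the distinct (element, qualifier) pairs not found in the registry."""
    missing = []
    for element, qualifier, _value in records:
        pair = (element, qualifier)
        if pair not in registry and pair not in missing:
            missing.append(pair)
    return missing


import_records = [
    ("title", None, "Example title"),  # invented value
    ("contributor", "author_contact", "Paterson, Anna (anna@areu.org.af)"),
]
bad_fields = unregistered_fields(import_records)
print(bad_fields)
```

Any pair reported here would need to be either added to the registry or renamed in the source files before loading.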
- download Eclipse
- Eclipse SVN plugin (Subclipse)
- Tomcat
- PostgreSQL JDBC driver jar
- config dspace.cfg files
- also see dspace.org