DSpace collection handles with problems

Summary

In UDC there are DSpace handles that are marked as collections (resource_type_id = 3) but have no matching row in the collection table.

Finding the problem in the media filter log

I was looking at the filter media log, dspace-ir_filter-media.log, and found many errors of the form:

Exception in thread "main" java.lang.IllegalArgumentException: Cannot resolve 4394 to a DSpace object

This means that when the handle was passed to the static method HandleManager.resolveToObject, the result was null.

A list (handle set 1) of these handles was obtained using the UNIX command line below:

grep 'Cannot.resolve' dspace-ir_filter-media.log | perl -pe 's/^.*Cannot resolve (\d+).*$/$1/' | sort -un
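The same extraction can be sketched in Python, which may be easier to adapt than the one-liner. This is only a minimal sketch: the function name and the sample lines are made up for illustration, and the only thing taken from the log is the "Cannot resolve N" wording of the error shown above.

```python
import re

# Matches the handle number in the error message quoted above.
_PATTERN = re.compile(r"Cannot resolve (\d+)")


def extract_failed_handles(lines):
    """Return the sorted, de-duplicated handle numbers found in log lines."""
    handles = set()
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            handles.add(int(match.group(1)))
    return sorted(handles)


if __name__ == "__main__":
    # Hypothetical sample lines; any iterable of lines, such as an
    # open log file, can be passed instead.
    sample = [
        'Exception in thread "main" java.lang.IllegalArgumentException: '
        "Cannot resolve 4394 to a DSpace object",
        "some unrelated log line",
        'Exception in thread "main" java.lang.IllegalArgumentException: '
        "Cannot resolve 4394 to a DSpace object",
    ]
    print(extract_failed_handles(sample))
```

Passing an open file object for dspace-ir_filter-media.log instead of the sample list should produce handle set 1.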

Looking up the failing handles in Postgres

Looking up one of the handles on the list in the handle table gives its resource_id, but no row in the collection table has that value as its collection_id. I think these are collections that were deleted: they were removed from the collection table but not from the handle table.
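The manual lookup can also be done for all handles at once with an anti-join (LEFT JOIN ... IS NULL). The sketch below uses an in-memory SQLite database as a stand-in for Postgres so that it runs anywhere. The two tables contain only the columns mentioned in this post, and every row except handle 4394 is invented for the demonstration. The real DSpace schema has more columns, so treat the table definitions as assumptions.

```python
import sqlite3

# Collection handles (resource_type_id = 3) with no row in the collection table.
ORPHAN_QUERY = """
SELECT h.handle
FROM handle AS h
LEFT JOIN collection AS c ON c.collection_id = h.resource_id
WHERE h.resource_type_id = 3
  AND c.collection_id IS NULL
ORDER BY CAST(h.handle AS INTEGER)
"""


def build_demo_db():
    """Create a tiny stand-in database (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collection (collection_id INTEGER PRIMARY KEY);
        CREATE TABLE handle (
            handle TEXT,
            resource_type_id INTEGER,
            resource_id INTEGER
        );
        """
    )
    conn.executemany("INSERT INTO collection VALUES (?)", [(10,), (11,)])
    conn.executemany(
        "INSERT INTO handle VALUES (?, ?, ?)",
        [
            ("100", 3, 10),   # collection handle with a collection row
            ("101", 3, 11),   # collection handle with a collection row
            ("4394", 3, 99),  # collection handle without a collection row
            ("200", 2, 99),   # another resource type, ignored by the query
        ],
    )
    return conn


def orphaned_collection_handles(conn):
    """Return handles typed as collections that have no collection row."""
    return [row[0] for row in conn.execute(ORPHAN_QUERY)]


if __name__ == "__main__":
    print(orphaned_collection_handles(build_demo_db()))
```

The query text itself should also be valid in Postgres, but it has only been reasoned about here against the simplified tables above.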

SQL to get collection handles

Old SQL command: pulls collection handles both with and without a row in the collection table

The handles that were input to filter-media came from the SQL command below:

SELECT handle FROM handle WHERE resource_type_id=3;

The above command will grab both good collection handles and handles that have no entry in the collection table.
The handles returned by this command (good and bad collection handles combined) are handle set 2.

New SQL command: pulls only handles that have a row in the collection table

The command below will only pull handles that have valid collection_ids (i.e. exist in the collection table).

SELECT handle FROM collection, handle WHERE collection_id=resource_id AND resource_type_id=3 ORDER BY handle::text::integer;

The handles returned by the improved command (only handles that exist in the collection table) are handle set 3.

Quick sanity check

Handle set 1 maps to null collections (no row in the collection table).
Handle set 2 maps to all collections in the handle table, both null and non-null.
Handle set 3 maps to non-null collections.

So we would expect:
1) No overlap between handle set 1 and handle set 3.
2) The combined contents of handle set 1 and handle set 3 to equal the contents of handle set 2.
Both 1 and 2 hold.
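The two expectations are plain set operations, so they can be checked mechanically. The sketch below is one possible way to do it. The function name and the sample handle values are hypothetical, and the real sets would come from the grep output and the two SQL commands.

```python
def check_handle_sets(set1, set2, set3):
    """Check the two expectations for the three handle sets.

    set1: handles that failed in filter-media
    set2: all handles with resource_type_id = 3
    set3: handles that have a row in the collection table
    """
    s1, s2, s3 = set(set1), set(set2), set(set3)
    return {
        # Expectation 1: sets 1 and 3 share no handles.
        "no_overlap": s1.isdisjoint(s3),
        # Expectation 2: sets 1 and 3 together make up set 2.
        "union_matches": (s1 | s3) == s2,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    failed = [4394]
    all_collection_handles = [100, 101, 4394]
    valid = [100, 101]
    print(check_handle_sets(failed, all_collection_handles, valid))
```

If either value comes back False, the handles responsible can be listed with the same set operators, for example s2 - (s1 | s3) for handles that appear only in set 2.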
