Items in a collection can be extracted from DSpace as plain PDF and flat XML files. The resulting directories can be batch ingested back into DSpace or into another repository.
Finding a collection's ID
The file below shows how to find a collection's ID in DSpace.
getCollectionID
Brad Teal's filter_media.sh script
The filter-media.sh script finds the handles of all the collections.
Execute the command to extract the data
[silvi003 /dspace/dspace-ir/bin]$ ./dsrun org.dspace.app.itemexport.ItemExport -t COLLECTION -i 29 -d /dspace/assetstore/udc_export/ima/ -n 0
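In the command above, -t sets the export type (here COLLECTION), -i gives the collection's ID, -d names the destination directory, and -n sets the number given to the first exported item directory. The sketch below only assembles and prints that command line for a given collection; the function name build_export_cmd and its argument order are hypothetical, chosen for illustration, and nothing is run against DSpace.

```shell
#!/bin/sh
# Hypothetical helper: assemble the ItemExport command line from its parts.
# It only prints the command; it does not execute anything against DSpace.
build_export_cmd() {
    collection_id=$1   # value for -i (collection ID, e.g. 29)
    dest_dir=$2        # value for -d (destination directory)
    start_seq=${3:-0}  # value for -n (number of the first item directory)
    printf './dsrun org.dspace.app.itemexport.ItemExport -t COLLECTION -i %s -d %s -n %s\n' \
        "$collection_id" "$dest_dir" "$start_seq"
}

# Example: reproduce the command shown on this page.
build_export_cmd 29 /dspace/assetstore/udc_export/ima/ 0
```

Printing the command first makes it easy to check the ID and destination before running it from the DSpace bin directory.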
Resulting Directory Structure
Resulting directories from ItemExport command.
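A quick way to sanity-check the export before re-ingesting it is to walk the destination directory. The sketch below assumes the DSpace Simple Archive Format layout, that is, one numbered directory per item, each holding a dublin_core.xml file and a contents file alongside the exported files. The function name check_export is hypothetical.

```shell
#!/bin/sh
# Sketch: report each exported item directory and flag any that lack the
# files assumed by the Simple Archive Format (dublin_core.xml and contents).
check_export() {
    export_dir=$1
    for item in "$export_dir"/*/; do
        # Skip the unexpanded pattern when the directory has no subdirectories.
        [ -d "$item" ] || continue
        name=$(basename "$item")
        if [ -f "$item/dublin_core.xml" ] && [ -f "$item/contents" ]; then
            echo "$name ok"
        else
            echo "$name incomplete"
        fi
    done
}
```

For the export on this page it would be called as check_export /dspace/assetstore/udc_export/ima, printing one line per item directory.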