
Switch in UDC Media filter.

I changed the Media Filter launch so that it no longer runs under the Unix nice command. Because nice lowers a process's scheduling priority, dropping it should speed up the run.
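A minimal sketch of the difference, assuming a hypothetical helper named run_maybe_nice (it is not part of DSpace or of the scripts below). Called with "yes", it runs the command under nice, which with no -n option adds 10 to the niceness and so lowers CPU priority. Called with anything else, it runs the command at the caller's normal priority.

```sh
#!/bin/sh
# Hypothetical helper: run a command with or without nice.
# Usage: run_maybe_nice yes|no command [args...]
run_maybe_nice() {
    use_nice=$1
    shift
    if [ "$use_nice" = "yes" ]; then
        # nice without -n adds 10 to the niceness (lower priority)
        nice "$@"
    else
        # run at the caller's normal priority
        "$@"
    fi
}
```

For example, run_maybe_nice no /dspace/dspace-ir/bin/filter-media.sh corresponds to the change described above. Running nice with no command prints the current niceness, so run_maybe_nice yes nice is a quick way to see the priority a niced job would get.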

Crontab

@reboot /sbin/service httpd start
@reboot sudo -u tomcat /dspace/bin/start_tomcat.sh
# day of week (0 - 6) (Sunday=0)
10 1 * * 6 /dspace/dspace-ir/bin/media_launch.sh
30 22 * * 1 /dspace/dspace-sr/bin/index-all-cron
30 22 * * 2 /dspace/dspace-ir/bin/index-all-cron
30 22 * * 3 /dspace/dspace-sr/bin/index-all-cron
30 22 * * 4 /dspace/dspace-ir/bin/index-all-cron
30 22 * * 5 /dspace/dspace-sr/bin/index-all-cron
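In this crontab the media filter runs on Saturday at 01:10, and the weeknight index-all-cron jobs at 22:30 alternate between the two instances: dspace-sr on Monday, Wednesday, and Friday, and dspace-ir on Tuesday and Thursday. Below is a sketch of a hypothetical helper that keeps that rotation in one place so a single cron entry could drive it. The script name rotate-index.sh and the /dspace/bin location are assumptions, not part of the original setup.

```sh
#!/bin/sh
# Hypothetical helper (not from the original setup): map a cron
# day-of-week number (0-6, Sunday=0) to the DSpace instance whose
# index-all-cron the crontab runs at 22:30 that night.
index_instance_for_day() {
    case "$1" in
        1|3|5) echo "dspace-sr" ;;   # Monday, Wednesday, Friday
        2|4)   echo "dspace-ir" ;;   # Tuesday, Thursday
        *)     echo "" ;;            # weekend: no index-all-cron run
    esac
}

# Only act when called as "rotate-index.sh --run" (e.g. from cron);
# otherwise the file just defines the function above.
if [ "${1:-}" = "--run" ]; then
    instance=$(index_instance_for_day "$(date +%w)")
    if [ -n "$instance" ]; then
        exec "/dspace/$instance/bin/index-all-cron"
    fi
fi
```

With this, the five weeknight lines could collapse to 30 22 * * 1-5 /dspace/bin/rotate-index.sh --run. The explicit entries are easier to read at a glance. The helper trades that for a single place to change the rotation.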

media_launch.sh

tstamp=`date "+%Y%m%d_%H:%M"`
echo $tstamp
nice /dspace/dspace-ir/bin/filter-media.sh > /dspace/dspace-ir/log/filter-media.sh_$tstamp.log 2>&1
cd /dspace/dspace-ir/bin/
/dspace/dspace-ir/bin/index_check_and_email.sh
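media_launch.sh sends each run's output to a log whose name carries a timestamp from date "+%Y%m%d_%H:%M". Here is a minimal sketch of that pattern as a reusable function. It assumes a hypothetical run_logged helper that also appends the command's exit status to the log, which the original script does not do.

```sh
#!/bin/sh
# Hypothetical helper: run a command with stdout and stderr captured
# in a timestamped log, append its exit status, and print the log path.
# Usage: run_logged LOGDIR command [args...]
run_logged() {
    logdir=$1
    shift
    name=$(basename "$1")
    tstamp=$(date "+%Y%m%d_%H:%M")   # same format as media_launch.sh
    log="$logdir/${name}_$tstamp.log"
    status=0
    "$@" > "$log" 2>&1 || status=$?
    echo "exit status: $status" >> "$log"
    echo "$log"                      # tell the caller where the log went
    return $status
}
```

For example, run_logged /dspace/dspace-ir/log /dspace/dspace-ir/bin/filter-media.sh would write to a file named like filter-media.sh_<timestamp>.log in that directory, matching the original naming.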

filter-media.sh

Note that the "-n" passed to filter-media means the search index is not rebuilt after each collection is OCRed; the script runs index-all once at the end instead. The earlier runs that used "nice" also used "-n".
#!/bin/sh
# This script grabs the handles of each collection
# in a DSpace DB instance, then loops through the
# handles and runs the full-text indexer against each
# collection.
# This is done to work around out-of-memory errors and
# PDFs that are too large for full-text indexing, so that
# when filter-media (a Java app) fails on one collection,
# full-text indexing still continues on the other collections.

# Set up the environment
JAVA_HOME=/opt/jdk1.5.0_10
PATH=$JAVA_HOME/bin:/opt/ant/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
export PATH JAVA_HOME

dbname="dspace_ir"
username="read_only"
hostname="strip3.oit.umn.edu"

# Determine if we have the Postgres client installed
which psql > /dev/null
if [ $? -ne 0 ]
then
    echo
    echo "psql not found in your PATH, please add to your PATH and re-run script"
    echo
    exit 1
fi

print_usage()
{
    echo 1>&2 "Usage: $0 [-d dbname] [-n hostname] [-u username]"
    exit 1;
}

# "n:" must be in the option string for the -n (hostname) case to be reachable
while getopts d:hn:u: o
do
    case "$o" in
    d) dbname="$OPTARG";;
    h) print_usage;;
    n) hostname="$OPTARG";;
    u) username="$OPTARG";;
    [?]) print_usage;;
    esac
done

# resource_type_id 3 is a collection in DSpace
echo_cmd="echo SELECT handle FROM handle WHERE resource_type_id=3;"
psql_cmd="psql -t -U $username -h $hostname $dbname"
BINDIR=`dirname $0`

# Filter each collection without reindexing (-n), one at a time
for handle in `$echo_cmd | $psql_cmd`
do
    $BINDIR/filter-media -n -i $handle
done

# Rebuild the search index once, after all collections are done
$BINDIR/index-all
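The header comment says the point of looping per collection is that a failure of filter-media on one collection no longer stops processing of the others. Here is a minimal sketch of that loop that also records which handles failed. It assumes a hypothetical filter_each function; the original loop does not track failures.

```sh
#!/bin/sh
# Hypothetical helper: run "CMD -n -i HANDLE" for each handle, keep
# going when one fails, and print the space-separated failed handles.
# Usage: filter_each CMD handle [handle...]
filter_each() {
    cmd=$1
    shift
    failed=""
    for handle in "$@"; do
        # send the command's own output to stderr so that stdout
        # carries only the list of failed handles
        if ! "$cmd" -n -i "$handle" >&2; then
            failed="$failed $handle"
        fi
    done
    echo "${failed# }"   # strip the leading space; empty if all succeeded
}
```

Inside the original script this could stand in for the for loop as handles=$($echo_cmd | $psql_cmd) followed by failed=$(filter_each "$BINDIR/filter-media" $handles), with $handles left unquoted so it splits into one argument per handle. The existing $BINDIR/index-all would still run afterwards, and $failed could be logged or mailed.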
