
DSpace performance problem traced to a missing index file

Summary sent to users

For the last 25 to 30 hours there have been major troubles on the AgEcon side. It turns out that the file containing all the data from the index all run was missing. The lack of this file produced a large number of bizarre and severe problems. I have run index all on the AgEcon side and the system now seems to be OK. I can only guess that the last index all run failed and the file was not created. Several parts of the system rely on this file and failed without it. I am curious: how much trouble did you see on the UDC side?

Symptoms

Spikes to over 100% CPU usage on both the strip1 (tomcat) and strip3 (postgres) boxes
search fails
browse fails
epeople could not be created
input form fails after being partially filled out
bouncing tomcat and postgres produced only minutes of proper behavior

Some technical details

The file:
/dspace/assetstore/dspace-sr/search/segments
was missing. One of the error messages pointed to this problem. This file is the output of the indexer.
I ran the command to reindex the metadata:

dsrun org.dspace.search.DSIndexer -c &

As soon as the command above started, users could again upload files and do searches. It has been about six hours since the metadata was reindexed and all seems well.
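
Since the missing file was only spotted through an error message, a small check could catch the same condition earlier. The following is a minimal sketch, not something that was part of this fix: the function name index_segments_present is made up for illustration, the default directory is the one named above, and accepting names of the form segments_N is an assumption about other index formats.

```shell
#!/bin/sh
# Sketch: succeed if the search index directory still has a segments file.
# index_segments_present is a hypothetical helper name; the default path is
# the directory named in this post.
index_segments_present() {
    dir="${1:-/dspace/assetstore/dspace-sr/search}"
    # Accept a file named "segments" (as in this post) or, as an assumption
    # about other index formats, any file named "segments_N".
    for f in "$dir"/segments "$dir"/segments_*; do
        [ -f "$f" ] && return 0
    done
    return 1
}
```

It could be run from cron, for example as: index_segments_present || echo "search index file missing, reindex needed"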

eperson table

Initially I thought the problem might be in the eperson table, but I no longer believe that is the case. There were 2003 epeople, and I found three that were clearly flawed.

Here is what we want an eperson to look like:

                     Table "public.eperson"
       Column        |            Type             | Modifiers
---------------------+-----------------------------+-----------
 eperson_id          | integer                     | not null
 email               | character varying(64)       |
 password            | character varying(64)       |
 firstname           | character varying(64)       |
 lastname            | character varying(64)       |
 can_log_in          | boolean                     |
 require_certificate | boolean                     |
 self_registered     | boolean                     |
 last_active         | timestamp without time zone |
 sub_frequency       | integer                     |
 phone               | character varying(32)       |
 netid               | character varying(64)       |
Indexes:
    "eperson_pkey" primary key, btree (eperson_id)
    "eperson_email_key" unique, btree (email)
    "eperson_email_idx" btree (email)
    "eperson_netid_idx" btree (netid)

The three flawed epeople were missing passwords and other critical fields. All three were deleted:
426 | newuser426 | | | | | | | | |
93 | newuser93 | | | | | | | | |
486 | aaea@umn.edu | | Registration | aaea09 | t | f | | | |
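
The three rows above were found by hand. As a rough sketch of how the same check might be automated, the function below reads eperson rows in the same pipe-separated layout on standard input and prints the eperson_id of each row whose password column is blank. The function name list_epeople_without_password is hypothetical, and the sketch assumes the password is the third field, as in the table definition shown earlier.

```shell
#!/bin/sh
# Sketch: print the eperson_id of every row whose password field is blank.
# Input is pipe-separated rows on stdin, in the column order of the eperson
# table (eperson_id | email | password | ...). The function name is made up.
list_epeople_without_password() {
    awk -F'|' '
        NF >= 3 {
            id = $1; pw = $3
            gsub(/[ \t]/, "", id)   # trim padding around the id
            gsub(/[ \t]/, "", pw)   # trim padding around the password
            # Only data rows have a numeric id; headers and rules are skipped.
            if (id ~ /^[0-9]+$/ && pw == "") print id
        }'
}
```

It only lists candidates; whether a row should be deleted still needs a human look.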

cpu performance plots

The problem happened on May 3 and continued into May 4.
Attachments: strip1-cpu.tiff, strip3-cpu.tiff, raw CPU data

Some commands found along the way

get postgres processes

PostgreSQL equivalent of MySQL's 'SHOW PROCESSLIST':
SELECT * FROM PG_STAT_ACTIVITY;
