Issue
Starting at 9:35 p.m. on Friday, May 20, 2011, email services were unavailable to some University email users because of problems with the SAN storage frame located in one of OIT's main data centers. The problems with the storage frame caused three email servers in that location to cease functioning. This issue affected approximately 28,000 of the University's 146,000 email accounts. Affected users may have experienced long delays or no response when attempting to access their email, and they were unable to send or receive email.

Email services were fully restored to most customers the next morning, Saturday, May 21, 2011, when two of the three email servers were brought back online. Email services could not be restored on the third server, and approximately 7,000 users still could not send, receive, or view email on Monday, May 23, 2011, prompting the initiation of the disaster recovery plan. Email engineers set up new accounts for affected users, enabling them to send and receive email while their historical email was being restored. This setup was completed on Tuesday, May 24, for faculty and staff accounts, and on Wednesday, May 25, for student accounts. Historical email was restored from backups to customer accounts on Sunday, May 29, 2011, at which point customers were able to access it again.

Cause
On the evening of Friday, May 20, a vendor was troubleshooting an ongoing issue with a failed drive on the SAN storage frame. The issue had been caused by an unexpected power outage a few weeks earlier. During a scan, the drive went "Not Ready" (i.e., failed), which caused the three email servers to crash.

Resolution
The problem with the storage frame was resolved when the vendor replaced a bad physical disk on Saturday morning. This allowed two of the three email servers to be put back into service. The third email server's file system was damaged beyond repair. Services on that server could only be restored through a full file system recovery to a spare server, which, because of the high volume of accounts on that server, took over 50 hours to complete. During the recovery process, temporary new email accounts for the affected customers were set up on other email servers, and the queued mail that had accumulated was delivered. After the full file system recovery was completed, the affected accounts were all moved to the spare server, and historical email was restored under a hierarchy named "mail/restored_email", allowing users or technical support staff to merge the restored folders back into their original locations.

Next Steps
It is extremely expensive to support email at the scale that is necessary for the University of Minnesota, and the legacy email infrastructure no longer meets the University's business needs. The University is therefore in the process of migrating its email service to Google, which offers a more reliable platform than our current one and is designed to provide a higher level of service delivery. Our goal is to migrate all users away from the aging, expensive email system that we have and then decommission it, without investing more money in it.

If your account is eligible for migration to Google, please migrate as soon as possible. All student (undergraduate and graduate), faculty, and staff accounts (except accounts affiliated with the Academic Health Center, or AHC) are eligible for University of Minnesota Google accounts. Alumni and retirees who were eligible before they became alumni or retirees are also eligible. Learn more about Google Apps for the University of Minnesota.