Office of Information Technology


July 10, 2008

SyncML client for iPhone will not work with UMCal Calendars

Synthesis, the company that makes the SyncML client for UMCal, has announced that, due to limitations in the Apple software development kit (SDK), the SyncML client they are producing will synchronize contacts only. Calendar synchronization will not be available at this time. For more information, see www.synthesis.ch/iphone.php?lang=e&lay=desk. iPhone users can still access their UMCal account through their Web browser at http://umcal.umn.edu.

July 02, 2008

Postmortem: Power Outage Affecting Twin Cities Campus Systems

At approximately 10:45 a.m. on Tuesday, July 1, the Xcel Energy power transformer feeding the West Bank Office Building (WBOB) exploded and subsequently caught fire. A number of campus-wide systems and services that are housed in that building were unavailable until approximately 10 p.m. OIT proactively shut down many of these systems in order to achieve a stable power situation.

Once a consistent power supply was established, OIT staff worked throughout the day and evening to bring the systems back online.

By roughly 10 p.m., all critical OIT enterprise servers and applications were back online and verified as functioning properly.

Technical Details

The Xcel Energy power transformer immediately adjacent to and feeding the West Bank Office Building (WBOB) exploded and subsequently caught fire. Due to this, power being supplied to WBOB and the OIT Data Center (OIT DC) within WBOB was severed.

The backup power generator supplying the DC caught the power drop and functioned as required. From the moment Xcel power was lost, full generator power was supplied at all times. The power event with the transformer and the automatic switch to generator power caused the DC air chillers to shut down automatically. This is normal operating procedure. However, once on generator power, the chillers attempted to restart as expected, causing the load requested of the generator to exceed its available capacity. Because of this, the Uninterruptible Power Supply (UPS) units within the DC were unable to achieve a steady-state power situation. The UPS units then drained their available battery capacity and went offline. These issues resulted in an unstable power situation for the data center and repeated losses of power to the DC itself.

These events caused many of the enterprise servers to crash, making the applications they host unavailable to customers. The new Enterprise Financials System was also eventually affected by this power event, even though the majority of its infrastructure resides in the OIT 90 Church Street data center.

In order to achieve a consistent power situation, OIT requested that all non-critical servers be shut down, allowing the chillers to come back online and put the DC into a stable power situation for the remainder of the day.

By approximately 6 p.m. Xcel Energy had a new transformer in place and functioning. At that time WBOB and the DC were switched off generator and back to line power. A notification went out to OIT and all tenants with the "All Clear" to bring up any servers.

WebMail Pro, 3.0 decommissioning reminder

The WebMail Pro and WebMail 3.0 decommission schedule has been extended to August 4, 2008. These systems were initially set to be decommissioned July 1.

GopherMail will continue as a Web-based e-mail application for central e-mail accounts and is available at www.mail.umn.edu. University e-mail accounts will remain the same and e-mail addresses will not change. Any e-mail located in WebMail Pro/3.0 is also available in GopherMail.

WebMail Pro for non-central systems mail used by departmental e-mail servers also will be decommissioned.

Send questions or comments to webmail@umn.edu.

July 01, 2008

Postmortem: Desktop Clients Unable to Send E-mail (6/30/2008)

On Monday, June 30, e-mail users were temporarily unable to send messages with desktop e-mail clients (such as Mozilla Thunderbird and Microsoft Outlook), although they were still able to receive messages and could still send messages through GopherMail. This problem occurred because OIT outgoing e-mail authentication failed. OIT was not aware of the issue until trouble tickets were opened by customer service representatives on Monday morning.

OIT has taken steps to prevent this problem from happening in the future, including the addition of another monitor that will automatically check the operational status of the desktop-initiated outgoing e-mail service. The new monitor will check the system every few minutes to verify that messages are being sent and received as expected.
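As a rough illustration of what such a check could look like, the sketch below attempts an authenticated SMTP login and reports success or failure. It is not OIT's actual monitor: the function name, the test account, and the host `smtp.example.edu` are assumptions made for the example, and only the login step is exercised (no message is sent).

```python
import smtplib
from typing import Callable


def check_outgoing_auth(connect: Callable[[], smtplib.SMTP],
                        user: str, password: str) -> bool:
    """Return True if an authenticated SMTP login succeeds, False otherwise.

    `connect` is a zero-argument callable that opens the SMTP connection,
    which keeps the check independent of any particular host (hypothetical
    design choice for this sketch).
    """
    try:
        # smtplib.SMTP is a context manager; leaving the block sends QUIT.
        with connect() as server:
            server.starttls()
            server.login(user, password)
        return True
    except (smtplib.SMTPException, OSError):
        # Covers rejected credentials as well as connection failures.
        return False


# Example wiring (hypothetical host and monitoring account):
# ok = check_outgoing_auth(
#     lambda: smtplib.SMTP("smtp.example.edu", 587, timeout=10),
#     "monitor-account", "secret")
```

A scheduler such as cron could run a check like this every few minutes and page the on-call person after consecutive failures.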

This issue has been resolved on the server side; however, if you are still experiencing trouble sending e-mail, contact 1-HELP at (612) 301-4357 to get help troubleshooting the issue with your client.

Technical Details
The X.500 directory stores several variations of each user's password. Each variation is generated by a different hashing algorithm and is intended for use by an application that requires that specific hash.

Authenticated SMTP is the exclusive user of one of these variations.
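To make the idea concrete, here is a minimal sketch of one password being stored as several hash variations, one per consuming application. The attribute names and the choice of algorithms are assumptions for illustration only; the actual algorithms and formats used in the directory are not described here, and plain unsalted digests like these would not be appropriate for real password storage.

```python
import hashlib
from typing import Dict


def password_variants(password: str) -> Dict[str, str]:
    """Derive one hash per consuming application (hypothetical layout)."""
    data = password.encode("utf-8")
    return {
        # Attribute names and algorithms below are invented for the example.
        "legacyAppHash": hashlib.sha1(data).hexdigest(),
        "webAppHash": hashlib.sha512(data).hexdigest(),
        "smtpAuthHash": hashlib.sha256(data).hexdigest(),
    }


variants = password_variants("example-password")
for attribute, digest in variants.items():
    print(attribute, len(digest))
```

Because each application reads only its own attribute, a change to how one variation is generated should not affect the others unless the generation code is shared.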

Each Sunday the three copies of the directory are reloaded, one at a time, to ensure consistency. The reload on Sunday 6/29 ended at 22:52. The logs between 23:00 and 00:00 did not indicate any problems.

At 6:37 our on-call person was paged regarding problems authenticating via SMTP. He was able to duplicate the problem and responded to operations at 6:44.

Investigation revealed that the SMTP authentications had begun to fail intermittently sometime after midnight, with the failures increasing in frequency after 01:00.

A visual inspection of the directory-resident SMTP hash revealed that the format was inconsistent with expectations. Further investigation indicated that the format change had been introduced during the Sunday reload, but that system caching had masked the problem until after midnight.

The format change was eventually traced to a change made on Saturday to the hashing algorithm that is required for LDAP v3 authentication. LDAP v3 authentication is not a current production service but will be in the near future. This change had an unintended side effect on the SMTP hash, which shared an undocumented dependency.

Once the problem had been identified, re-population of the SMTP hash was begun. The initial correction was targeted to users that had authenticated via SMTP within the last 10 days, on the presumption that these users would be more likely to have noticed the problem. The changes for this population (21,193) were generated by 9:05. At that time, the application of these changes was estimated to take approximately one hour.
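The selection of that first population can be sketched as a simple filter on each user's last SMTP authentication time. The data structure and function name here are hypothetical; only the 10-day window comes from the description above.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def recently_authenticated(last_auth: Dict[str, datetime],
                           now: datetime,
                           days: int = 10) -> List[str]:
    """Return users whose last SMTP authentication falls within `days`."""
    cutoff = now - timedelta(days=days)
    return sorted(user for user, seen in last_auth.items() if seen >= cutoff)


# Invented sample data for illustration.
now = datetime(2008, 6, 30, 9, 0)
last_auth = {
    "alice": datetime(2008, 6, 29, 18, 30),  # yesterday: included
    "bob": datetime(2008, 6, 1, 8, 0),       # about a month ago: excluded
    "carol": datetime(2008, 6, 21, 12, 0),   # nine days ago: included
}
print(recently_authenticated(last_auth, now))
```

Users outside the window are not fixed by a pass like this, which is why a later full reload was still needed.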

Problem reports continued after the changes had been applied (10:05). Many of the reports now indicated intermittent failure: sometimes working, sometimes not, or working on one client but not on another. Initially, the continued problems were thought to be the result of system caching, and efforts were made to systematically clear all caches. These efforts proved unproductive. Eventually, it was determined that the changes had not been correctly applied to all copies of the directory.
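Intermittent failures of this kind are what one would expect when requests are spread across copies that disagree. A comparison along the following lines could surface the mismatch directly. This is only a sketch: each copy is modeled as a plain dictionary of user to SMTP hash, whereas a real check would read the values from each directory server.

```python
from typing import Dict, List


def inconsistent_users(replicas: List[Dict[str, str]]) -> List[str]:
    """Return users whose stored value differs (or is missing) on some copy."""
    users = set().union(*replicas)  # every user seen on any copy
    mismatched = []
    for user in sorted(users):
        values = {replica.get(user) for replica in replicas}
        if len(values) > 1:
            mismatched.append(user)
    return mismatched


# Invented sample data: the third copy still holds an old value for "bob"
# and is missing "carol" entirely.
copy_1 = {"alice": "new-a", "bob": "new-b", "carol": "new-c"}
copy_2 = {"alice": "new-a", "bob": "new-b", "carol": "new-c"}
copy_3 = {"alice": "new-a", "bob": "old-b"}
print(inconsistent_users([copy_1, copy_2, copy_3]))
```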

At 13:15, when the changes were re-applied to all copies of the directory, problem reports were dramatically reduced. At this point, exceptions could still have occurred for people that had e-mail clients that cached the failed attempt, and for people that had not authenticated within 10 days.

These remaining cases were handled individually and were fully resolved by a full reload of the directory, which ended around 17:15. We have had no indication that badly formatted SMTP hashes have been encountered since then.

Power Outage Affecting Twin Cities Campus Systems

At approximately 10:45 a.m. on Tuesday, July 1, the Xcel Energy power transformer feeding the West Bank Office Building (WBOB) exploded and subsequently caught fire. A number of campus-wide systems and services that are housed in that building were unavailable until approximately 10 p.m. OIT proactively shut down many of these systems in order to achieve a stable power situation. Some of the affected systems include the following:
-- E-mail (for some users)
-- PeopleSoft Campus Solutions
-- UM Reports
-- WebVista
-- UMContent
-- NetFiles
-- Enterprise Financial System

Once a consistent power supply was established, OIT staff worked throughout the day and evening to bring the systems back online. Check the System Status page at: http://systemstatus.umn.edu for up-to-date information about OIT systems, or call 1-HELP at (612) 301-4357.