On Monday, June 30, e-mail users were temporarily unable to send messages with desktop e-mail clients (such as Mozilla Thunderbird and Microsoft Outlook), although they were still able to receive messages and could still send messages through GopherMail. This problem occurred because OIT outgoing e-mail authentication failed. OIT was not aware of the issue until trouble tickets were opened by customer service representatives on Monday morning.
OIT has taken steps to prevent this problem from recurring, including adding a monitor that automatically checks the operational status of the desktop-initiated outgoing e-mail service. The new monitor will check the system every few minutes to verify that messages are being sent and received as expected.
This issue has been resolved on the server side. If you are still having trouble sending e-mail, contact 1-HELP at (612) 301-4357 for help troubleshooting your client.
Technical Details
The X.500 directory stores several variations of each user's password. Each variation is generated by a different hashing algorithm and is intended for an application that requires that specific hash.
Authenticated SMTP is the exclusive user of one of these variations.
Each Sunday the three copies of the directory are reloaded, one at a time, to ensure consistency. The reload on Sunday 6/29 ended at 22:52. The logs between 23:00 and 00:00 did not indicate any problems.
At 6:37 our on-call person was paged regarding problems authenticating via SMTP.
He was able to duplicate the problem and responded to operations at 6:44.
Investigation revealed that the SMTP authentications had begun to fail intermittently sometime after midnight, with the failures increasing in frequency after 01:00.
A visual inspection of the directory-resident SMTP hash revealed that the format was inconsistent with expectations. Further investigation indicated that the format change had been introduced during the Sunday reload, but that system caching had masked the problem until after midnight.
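The visual inspection described above could be automated as a format check run after each reload. The following is a sketch only: the expected pattern (a `{SHA256-HEX}` tag followed by 64 hex digits) is a hypothetical stand-in, since the real SMTP hash format is not given in this report.

```python
import re

# Hypothetical expected layout: a scheme tag followed by 64 lowercase hex digits.
EXPECTED_SMTP_HASH = re.compile(r"\{SHA256-HEX\}[0-9a-f]{64}")


def malformed_entries(entries: dict[str, str]) -> list[str]:
    """Return the user IDs whose stored SMTP hash does not match the expected format."""
    return [
        uid
        for uid, value in entries.items()
        if not EXPECTED_SMTP_HASH.fullmatch(value)
    ]
```

Running a check like this against a sample of entries immediately after a reload would flag a format change without waiting for caches to expire.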
The format change was eventually traced to a change made on Saturday to the hashing algorithm required for LDAP v3 authentication. LDAP v3 authentication is not currently a production service but will be in the near future. The change had an unintended side effect on the SMTP hash, which shared an undocumented dependency with that algorithm.
Once the problem had been identified, re-population of the SMTP hash was begun. The initial correction targeted users who had authenticated via SMTP within the last 10 days, on the presumption that these users would be more likely to have noticed the problem. The changes for this population (21,193 users) were generated by 9:05. At that time, applying these changes was estimated to take approximately one hour.
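The selection of the first correction batch can be sketched as a simple cutoff filter. The input shape (a mapping from user ID to last successful SMTP authentication time) and the function name are assumptions for illustration, not a description of the tooling actually used.

```python
from datetime import datetime, timedelta


def recently_authenticated(
    last_auth: dict[str, datetime],
    now: datetime,
    window_days: int = 10,
) -> list[str]:
    """Return user IDs whose last SMTP authentication falls within the window."""
    cutoff = now - timedelta(days=window_days)
    return sorted(uid for uid, seen in last_auth.items() if seen >= cutoff)
```

Users outside the window are not corrected by this pass and need a later, full repair.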
Problem reports continued after the changes had been applied (10:05). Many of the reports now indicated intermittent failure: sometimes working, sometimes not, or working on one client but not on another. Initially, the continued problems were thought to be the result of system caching, and efforts were made to systematically clear all caches. These efforts proved unproductive. Eventually, it was determined that the changes had not been correctly applied to all copies of the directory.
After the changes were re-applied to all copies of the directory at 13:15, problem reports dropped dramatically. At this point, exceptions could still have occurred for people whose e-mail clients had cached the failed attempt, and for people who had not authenticated within the last 10 days.
These remaining cases were addressed one by one and were completely resolved by a full reload, which ended around 17:15. We have not had any indication that badly formatted SMTP hashes have been encountered since then.
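The new monitor mentioned in the summary could take the form of a periodic authenticated-SMTP probe. The sketch below uses Python's standard `smtplib` and checks authentication only, not end-to-end delivery. The host, port, STARTTLS usage, and test account are assumptions, and the `connect` parameter is a hypothetical hook that lets the probe be exercised without a live server.

```python
import smtplib


def check_smtp_auth(host, port, user, password, connect=smtplib.SMTP, timeout=30):
    """Attempt an authenticated SMTP login and return (ok, detail)."""
    try:
        server = connect(host, port, timeout=timeout)
        try:
            server.starttls()             # assumes the service offers STARTTLS
            server.login(user, password)  # raises SMTPAuthenticationError on rejection
        finally:
            server.quit()
    except smtplib.SMTPAuthenticationError as exc:
        return False, f"authentication rejected: {exc.smtp_code}"
    except (smtplib.SMTPException, OSError) as exc:
        return False, f"probe failed: {exc}"
    return True, "ok"
```

A scheduler would call this every few minutes with a dedicated test account and page the on-call person when the result is not `ok`.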