July 2009 Archives

OIT Virtual Hosting Strategy and Roadmap

OIT has managed a virtual server environment using VMWare since 1999. We originally set it up to support our "crash and burn" Windows servers, but we quickly realized the value that virtual servers added to our environment. Virtualization allows us to consolidate workload from several servers, making better use of physical hardware. By 2005, OIT had upgraded to VMWare's ESX platform as the new virtual environment. In 2008, OIT refreshed the underlying hardware infrastructure that runs our virtual environment.

After 10 years of running VMWare, we need to step back and take a second look at how we manage virtual servers. Mike Coonrod, Mike Langus, and Patton Fast have developed a roadmap for our virtual hosting environment, including a strategy to provide broader hosting options. This week, Patton and I reviewed this OIT Virtual Hosting Strategy and Roadmap document with Doug.

It doesn't end there, though. Our next step is to develop a customer service and implementation plan to move the project forward. But I want to take this opportunity to thank Mike, Mike, and Patton (and anyone else on the Windows team who contributed to the strategy). It's taken many months of hard work to get to this point. Excellent work!

System monitoring strategy

Erik and Darby have been putting together a review of our current monitoring system (Nagios) and creating a strategy for our future monitoring and notification vision. This week, Darby and Erik presented a draft of their strategy document to the Production Services team. The doc needs a few edits yet, so a final version may be available for distribution in another week or so.

This is excellent work, and I thank Erik and Darby for their efforts on this project. In the spirit of Simplify, Standardize, Automate, it's important for us to evaluate and re-examine our tool use. Are we using the right tools? How can we improve what we do? This system monitoring strategy should give us a boost in improving system reliability.

Kudos to Erik

In the spirit of Simplify, Standardize, Automate, Erik has updated the weekly Change Report. Those of you who attended this week's Change Review Meeting saw a much-improved report.

The old report was very functional, and presented a quick summary of the upcoming changes. However, it was formatted in line printer format (plain text, typewriter font), so unfortunately it could not fit much of the change description.

The new report, thanks to Erik, now uses an easier-to-read font, with bold text to highlight headings and italic text to set apart the change description. Because the report uses a smaller Helvetica font, we can show more (all?) of the change description. This means the Change Review Meeting can make better-informed decisions about pending changes.

If you'd like to see a copy of the new report, I'm sure Erik would be glad to show you!

Kudos to Al

In all seriousness, I want to extend kudos to Al Pierce. After the 35W bridge construction was complete, Al insisted we have a guard rail installed on our side of 35W, to protect the chiller and building transformer from stray cars. The assumption was that winter drivers might spin out on the ice, crash through the chain link fence, and severely damage the chiller.

This weekend, that guard rail paid for itself.

wbob near miss8.jpg
(thanks to Matt Kauffmann for the photo)

If you track the tire marks, you'll note that the car would have run directly into our data center chiller. The car stopped at the guard rail, but had enough momentum to bend the railing a little. I'm convinced the guard rail helped us avoid an outage this weekend.

Thanks, Al!

All-OIA staff meeting

A reminder about the All-OIA staff meeting: please note that the date has changed. We will meet on Wednesday, August 5, 2:00-3:30 (instead of July 29) in the cafeteria.

During the meeting, we will discuss:

  1. The OIT external review

  2. The October 24th generator cut-over, and the DR test planned during this time

  3. Configuration management

  4. The importance of metrics

We will have time at the end for Q&A. Thanks, and see you then!

Today's fire alarm

I'm not sure if this is the official word, but I spoke with one of the firefighter responders after the fire alarm this afternoon. She said this was a false alarm (instead of, say, someone burning their bag of microwave popcorn).

I'd like to remind everyone that of the last 5 fire alarms (including today's), none has been a planned fire drill. Except for today's, each alarm was triggered by something. You never know if the alarm is due to burning toast or because the building really is on fire. Whenever you hear the fire alarm, please stop what you are doing and quickly leave the building.

The official rallying point for OIA staff is in the Red Cross parking lot. While it may be tempting to leave for lunch (especially for noon-hour fire alarms), please do not just get in your car and go. Stop by the rallying point to check in. Having a central location makes it easier for us to communicate information in the event of a real fire or disaster, without having to run around and find everyone.

Fortunately, today was not a real emergency, and I thank everyone for responding immediately to the fire alarm.

Change to OIT Active Directory Home

A reminder that your "Home" (H: drive) share will change this week, on Friday, July 17. Here's the announcement from last week's OIT weekly email:

On July 17th the Active Directory team and FAST will be coordinating the move of OIT home folders to a new file share. This is necessary because the servers on which the shares currently reside are being decommissioned. Most Windows users see these shares as their H drive. For most Windows users, the move will be mostly transparent.

The home directories will be migrated beginning at 6 p.m. on Friday, July 17, and will be available at the new location beginning at 10 p.m. that same night.

Issues for Windows users to be aware of:

  1. If you are syncing folders such as your My Documents folder, changes may need to be made to continue the sync. Please send OIT FAST an e-mail so that we can coordinate the sync changes for your computer.

  2. If you regularly use the Previous Versions function for files in your home directory, there will be few previous versions available after the move. Backups will still be available for 45 days through the regular backup process. Let FAST know if you need something restored.

Mac users will experience the home directory move more directly. If you currently use the "connect to server" utility on your Mac to mount your home directory, you will need to use the new location following the move. The old location is:

  • //internetid.ad.umn.edu/internetid$

The new location will be:

  • //OIT-HOME.ad.umn.edu/oit-home$/internetid
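If you keep the mount location in a script or a saved note, the new path is just the new share followed by your Internet ID. Here is a minimal Python sketch of that pattern. The function name and the `jdoe` ID are made up for illustration, and the code only builds the string shown in the announcement; it does not connect to anything.

```python
# Share root copied from the announcement above.
NEW_SHARE_ROOT = "//OIT-HOME.ad.umn.edu/oit-home$"


def new_home_share(internet_id: str) -> str:
    """Return the post-move home share path for one Internet ID.

    Hypothetical helper for illustration only; it formats a string
    and does not check that the share exists.
    """
    internet_id = internet_id.strip()
    if not internet_id:
        raise ValueError("internet_id must not be empty")
    return f"{NEW_SHARE_ROOT}/{internet_id}"


if __name__ == "__main__":
    # "jdoe" is a placeholder Internet ID, not a real account.
    print(new_home_share("jdoe"))
```

Substitute your own Internet ID for the placeholder when you enter the location.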

If you need assistance connecting to the new share after July 17th, please let OIT FAST know so they can promptly assist you.

If you haven't used your home folder on your Mac previously, directions for connecting to it are available at the OIT FAST web site, and will be updated shortly to reflect the new location. If you've never used the home folder on a Mac or a Windows computer, it may not yet exist. Please log into an OIT Windows computer once for it to be created, or contact OIT FAST so we can manually create it.

Making the case

Five years ago, OIT purchased our first SAN to provide a central storage location for servers within the University. Without previous experience with SAN technology, OIT grew its SAN offering "organically". This is a double-edged sword: while we followed customer demand and met the needs of the campus departments, we did so without a strategic direction. In February 2009, the OIT Senior Management Team issued a charge for a strategic plan for future storage and backup service delivery.

Jac Campbell, Jim Hall, Patton Fast, Pete Bartz, and Mark Hove presented that storage strategy this week to OIT senior management. This was the culmination of a large effort by many people, including campus IT directors and other OIT staff. The strategy document identifies a roadmap for future investment in storage and backup, to better serve the university community. The strategy also includes a set of recommendations that OIT will work to implement over the next 5 years.

If you'd like to see the report, ask me or Jac for a copy. We'll try to get it added to the OIT Wiki.

The storage strategy document made a compelling argument for OIT's need to continue investing in our storage and backup infrastructure. Whether you are working on one of the current charge projects or simply want to sell an idea, I recommend you use this storage strategy as a model when making your case.

A strategy document should be able to stand on its own, without relying on you (in person) to frame the information in the document. For best "sticking" power, frame your narrative as a kind of "story". Specifically, a good strategy document follows this general flow:

  1. Describe how we got here (the "history")
  2. Talk about where we are (the "as-is")
  3. Define any issues we may have (the "gaps")
  4. Present your idea for what to do about it (the "vision" or "fit-gap")
  5. Estimate the costs to get there

Additionally, to have greater impact:

  • For that professional look, use a cover page with the document's title and authors.
  • Add an executive summary, so it's easy for a reader to pick up your document and get a general idea of what the document is about.
  • Your strategy should not be too long, or you will lose your audience. Resist the temptation to add material just to "pad out" the document.
  • If you must include extra material, consider adding it to the end in an Appendix. For very long documents, you may wish to move the Appendices into a separate "Supplemental Materials" document.
  • Don't forget to use a standardized template as part of "one OIT":

Download DOC
Download DOT

The Purpose of IT and Who is your Customer

Tim Gagner pointed me to this blog post: The Purpose of IT and Who is your Customer. It's a short, interesting read. In the article, the author addresses the questions "Who are we and why are we here?" and "Who is your customer?"

It's worth a read, but I can sum up his post with two bullet points:

  • IT exists to support the customer
  • In a technical job, your customer may very well be another technical professional

This is definitely how we operate in OIT. We play a support role to the rest of the university. Due to the support nature of what we do, it's true that most of our customers are other IT staff at the university, often just within OIT. Ultimately, we are here to serve students, faculty and staff. But we do that by serving our direct customers - other IT professionals. Who are your direct customers?

New temporary generator

By now, everyone should be aware of the new temporary generator that has been installed on the front green space. Mark Powell was kind enough to share this photo of the new generator being lifted into place, over the trees, onto the lawn:

0629091259.jpg

This temporary generator will be with us until late October or early November, and will provide backup power to the data center and parts of the WBOB building in the event of a total power outage.

A new, permanent generator will go online on October 24. You should see installation activities on the 35W side of the building in the weeks before that.

Expect our conversion to the new generator to be an all-day event. During the cut-over we will need to power down everything in the WBOB data center.

During the hours when the WBOB data center will be off-line, OIA plans to execute a Disaster Recovery test. Most of you will be involved at some level with this DR test, so please plan your time accordingly. We'll bring up a key system at the alternate Church Street facility, verify the data, then bring down the DR system. The production system will come back up normally when WBOB power is restored.

End of an era

A year ago today, July 1, 2008, the new PeopleSoft Enterprise Financials System ("EFS") went live. As of that moment, the old CUFS system running on the Mainframe went into "legacy" mode, running reports. All new financial data went into EFS.

Today, July 1, 2009, the Mainframe has officially been decommissioned. This was the last step in a year-long system retirement plan. It's the end of an era.
tapes.jpg
mainframe.jpg
I'd like to take a moment to recognize those of you who assisted with the Mainframe, either during its production run, or in the recent shutdown. The University has relied on the Mainframe as our primary enterprise computing system for more than 30 years. The Mainframe is a solid platform, designed to handle very high volume input and output (I/O) and to emphasize throughput computing. The Mainframe was the first to offer virtual machines, each with its own protected address space. Our IBM zSeries Mainframe ran up to 4 "logical partitions" (LPARs), although I believe by the time EFS went live we only had 3 LPARs.

Yet, times change. In the last 10-15 years, the IT industry has increasingly replaced Mainframe platforms with "open systems" (UNIX, Windows, ...) and personal computer networks. But we shouldn't forget the roots of this industry workhorse.

Current projects

I want to recognize some of the improvement projects across OIT, and within OIA. Everyone has put so much time into these, and we are already seeing benefits from these efforts. The following charge projects are currently underway:

Monitoring strategy
Define a strategy and approach to support automated monitoring of all our systems. Our current system uses Nagios.

Job scheduling
Complete an Autosys job scheduling strategy and approach document that describes our future state and defines the short-term, tactical changes needed to support current scheduled jobs.

Statistics strategy
Define our statistics and data-gathering services, and better define our partnerships with our customers and units. Our current data-gathering system is Cacti, but we believe we can do much more using SAS.

Database hosting
Develop a roadmap for our database hosting services, and better define our partnerships with our customers and units. This project specifically addresses Oracle, but will serve as a model for future database hosting efforts.

Change control
Develop an agreed-upon method for scheduling changes and applying them transparently, so that they have minimal impact on University business; ensure that an agreed-upon method and process is in place for change control.

Storage architecture
Provide a clear roadmap for our storage-related services and better define our partnerships with our customers and units such as MSI. Review our in-place technical infrastructure and define our long-term strategy with respect to our storage and backup architecture and storage-related services. This project is essentially complete, pending the wrap-up meeting on July 10.

VMWare strategy
Create a roadmap to support the growing number of virtual servers, by building on our current VMWare platform. This project also sets a goal for OIT to convert more systems (UNIX and Windows) to VMWare. This is almost complete, and the final draft of the report should be ready next week.

Service Center
Investigate and document the current state of our Service Center implementation. Additionally, we are requesting an application roadmap that sets direction for our continued support of the Service Center applications and all services that use the tool. The Service Center cross-OIT group will review the final draft of the report at next week's meeting, in preparation for the project wrap-up meeting on July 9.