May 10, 2008

"Live from Minnesota", Drupal Search Sprint 2008

Just winding down now from a long day of thinking about and working on Drupal search, I thought I'd spend five minutes jotting down a few impressions.

The People

Robert Douglass, Blake Lucchesi, Djun Kim, David Lesieur, Earnest Berry, Doug Green, and Yours Truly

I'm almost starting to take for granted the fact that Drupal will be sending the University of Minnesota Libraries a handful of authentically brilliant and unusually good-natured developers to visit, this being our second high-profile Drupal project. But yeah, we're "Live from Minnesota" as Doug Green points out (I like the meme, no doubt).

The Work

Like a lot of efforts underway in Drupal, search is moving towards a more extensible model by way of a core API. This shift has me excited for a number of reasons, but mostly due to my interest in the Drupal/Solr combo within the context of academic research environments. By decoupling the search and indexing processes from a specific database implementation, we open the door to a whole new ecosystem of search utilities.
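To make the decoupling idea concrete, here is a minimal sketch in Python (not Drupal's PHP, and not the actual search API being written at the sprint). The names SearchBackend, InMemoryBackend and run_search are hypothetical; the point is only that calling code talks to an interface, so a database-backed index or a Solr-backed one could be swapped in behind it.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical contract that any pluggable search backend would satisfy."""

    @abstractmethod
    def index(self, doc_id: int, text: str) -> None:
        """Add one document to the index."""

    @abstractmethod
    def search(self, keyword: str) -> list[int]:
        """Return the ids of documents containing the keyword."""


class InMemoryBackend(SearchBackend):
    """Toy backend: an inverted index held in a dict (stand-in for SQL or Solr)."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = {}

    def index(self, doc_id: int, text: str) -> None:
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, keyword: str) -> list[int]:
        return sorted(self._postings.get(keyword.lower(), set()))


def run_search(backend: SearchBackend, keyword: str) -> list[int]:
    """Calling code depends only on the interface, never on the storage."""
    return backend.search(keyword)
```

Swapping in a different backend would mean writing another subclass; run_search and anything built on it would stay the same.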

If all goes well with the new search API, I believe it won't be long before we start seeing c|net style faceted drill-down e-commerce sites in Drupal. We already have a well laid-out Apache Solr Module, but a search API will provide more granular control and better integration with core search, and it will ease the long-term burden of maintenance on Robert by moving portions of his code into core.
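For anyone who hasn't met faceted drill-down, the idea can be sketched in a few lines of Python. This is only an illustration over an in-memory list; the catalog data and the function names facet_counts and drill_down are made up, and a real site would ask Solr to compute the counts.

```python
from collections import Counter

# Made-up catalog entries, purely for illustration.
CATALOG = [
    {"title": "Laptop A", "brand": "Acme", "type": "laptop"},
    {"title": "Laptop B", "brand": "Globex", "type": "laptop"},
    {"title": "Camera C", "brand": "Acme", "type": "camera"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def drill_down(docs, field, value):
    """Narrow the result set to documents whose field equals the value."""
    return [doc for doc in docs if doc.get(field) == value]
```

A shopper would first see facet_counts(CATALOG, "type"), pick "laptop", and then see the brand counts recomputed over drill_down(CATALOG, "type", "laptop").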

Personally, this is a chance for me to really think deeply about how to best harness the wonderful work of Doug Green and others within the context of core search features, especially as regards Solr integration. On an even more practical level, this is my own little unit testing bootcamp as I help to write tests for the work being done here this weekend.

Off to bed and on to day two.

Drupal, Solr, GeoNetwork and GeoServer at DLF

Presented at the Digital Library Federation Spring Forum 2008

March 04, 2008

Drupalcon 2008: Dev, Stage, Build!

This morning I attended a session titled "Best practices in development environments, staging, build management, and production environments." In terms of practical importance, this may have been the most valuable session for me. Here are a few highlights of the panel discussion (by topic).

Development, Staging and Production Environments - Source Code Control

Most large-scale shops use quite an array of systems utilities to manage (a) their Drupal/PHP code, (b) databases and (c) uploaded/user-contributed files. All shops represented on the panel made use of Subversion for source control.

One panel member (Neil Giarratana of Lucidus) actually lives down the street from a lead engineer at Subversion and has had dinner with him a couple of times to chat about potential "best practices" for a Drupal shop. The result of these conversations was a pretty straightforward Subversion folder setup: httpdocs, docs (project documentation) and dbdumps (MySQL db dumps) - all three of these folders being replicated on a per-project basis. What is interesting about this practice (to me) is that they avoid using the conventional trunk, branches, tags setup. The Subversion developer made the case that these conventions are used primarily when you need to collaborate with a large community of people and that they don't make a heck of a lot of sense for internal projects. Neil's gang push out all updates via checkout to their prod, staging and dev environments.
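The per-project layout is simple enough to script. Below is a minimal Python sketch that creates the three folders for a new project before it gets imported into Subversion. The function name create_project_skeleton and the idea of scripting this at all are my own assumptions, not something the panel described.

```python
from pathlib import Path

# The three per-project folders mentioned on the panel.
PROJECT_DIRS = ("httpdocs", "docs", "dbdumps")


def create_project_skeleton(root, project):
    """Create <root>/<project>/{httpdocs,docs,dbdumps} and return the project path."""
    project_dir = Path(root) / project
    for name in PROJECT_DIRS:
        # exist_ok makes the call safe to repeat on an existing project.
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir
```

Calling create_project_skeleton("/tmp/work", "clientsite") would leave a directory ready to be imported into a repository; the paths here are placeholders.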

Another Drupal shop, Advomatic (a very, very competent bunch), checks out Drupal core from CVS and then checks that into Subversion. They then have separate "sites" folders for each project. In Drupal, virtually 100% of unique coding happens in the sites folder, so this makes a lot of sense to me.

Automated Deployments

The OSU Open Source Lab (where Drupal.org is hosted) has adopted the Cfengine configuration engine to automate its deployment process. Cfengine basically provides systems administrators with a declarative language that allows them to set constraints on pretty much any kind of server configuration. These can be pushed out to multiple machines and can also do things like allow for single-command deployment of complex server environments. It sounds really great, but apparently Cfengine is a major pain to get up and running and has a fairly steep learning curve. Narayan Newton (OSUOSL) said "We would not be able to administer our systems with our staffing numbers without Cfengine."

Code vs Content Migrations

Content Migration: basically, it's ugly (for now). There have been various efforts to clean this process up in Drupal, but it's still not quite there yet. The key is to back up and have a very clearly articulated rollback policy/practice. If Dries has his way, Drupal 7 will be RDF compliant and this issue will more or less go away. Not only will you be able to easily migrate content, external content will (theoretically) be easy to feature in real-time in any Drupal installation.

Code Migration: most shops use rsync and/or straight-up tarballs to deploy code from staging to production. Some also deploy directly from svn checkouts.
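As a rough illustration of the tarball approach, here is a small Python sketch that packs a directory (say, a project's sites folder) into a gzipped tarball and lists what went in, so the release can be checked before it is copied to production. The function names and paths are hypothetical; none of the shops described their actual scripts.

```python
import tarfile
from pathlib import Path


def build_release_tarball(source_dir, tarball_path):
    """Pack source_dir into a gzipped tarball, keeping its top-level folder name."""
    source = Path(source_dir)
    with tarfile.open(str(tarball_path), "w:gz") as archive:
        archive.add(str(source), arcname=source.name)
    return Path(tarball_path)


def list_release_files(tarball_path):
    """Return the sorted names of the regular files inside the tarball."""
    with tarfile.open(str(tarball_path), "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Listing the contents before shipping is a cheap sanity check that the release contains what you think it does.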

QA Best Practices

Neil Giarratana started with an interesting point: automated load testing is of limited interest to his shop. They are pretty familiar with Drupal performance issues, for the most part. Lucidus has a full-time QA person who does a great deal of manual functional testing (clicking through the UI), and that is their chief method of finding issues. Neil said the most important thing that they have learned is to have clear and standardized communication processes set up for their customers: indicate (a) when upgrades will happen, (b) when they are actually happening, (c) your fallback plan (in detail), (d) when you are done with the upgrade...for example. He mentioned a Cisco paper on this topic (no reference yet...I need to call the Science and Engineering Library and have them help me dig it up!).

Other shops do take advantage of automated tools such as Watir + FireWatir (Ruby-based) and Firewater for functional testing. The good news, IMHO, is that unit tests AND functional tests will become standard practice in Drupal core; Dries proposed this change in his keynote yesterday. This announcement brought cheers from some core developers, particularly as Dries indicated that unit testing would allow for a much more compressed code freeze period, which would leave more time for active development, new features, etc.

Eliminating (uh, Reducing) Unscheduled Down-Time

One of the more interesting comments concerned *testing* backups. Advomatic apparently had, at one point, made backups of a site that had an exceedingly large sessions table (implementation issue). It was in the millions of records and caused the recovery process to go quite slowly when they did need to recover from a system failure. If they had tested their backups occasionally, they would have caught this problem and corrected it ahead of time.
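A backup test along these lines could be as simple as restoring the dump into a scratch database and counting rows per table. The Python sketch below uses an in-memory SQLite database purely as a stand-in for MySQL; the function name oversized_tables and the row threshold are my own assumptions, and a real MySQL dump would need to be restored into a scratch MySQL instance instead.

```python
import sqlite3


def oversized_tables(dump_sql, max_rows):
    """Restore a SQL dump into a scratch database and report tables over max_rows."""
    scratch = sqlite3.connect(":memory:")
    try:
        # A broken dump raises here, which is itself a useful test result.
        scratch.executescript(dump_sql)
        names = [row[0] for row in scratch.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        flagged = {}
        for name in names:
            count = scratch.execute(
                f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            if count > max_rows:
                flagged[name] = count
        return flagged
    finally:
        scratch.close()
```

Run on a schedule against the latest dump, a check like this would flag a bloated sessions table long before anyone needed to recover from it.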

At least a couple of the shops on the panel make use of systems monitoring applications like Nagios and Cacti. These sorts of tools can provide 'forensic evidence' for systems failures, for example.
