

Office of Planning & Analysis


Tracking File Downloads w/Google Analytics


The November 19th, 2010 Web Standards Meeting featured a presentation on using analytics, specifically Google Analytics, to improve your site's user experience and understand how it is used. The presentation was given by Nuria Sheehan (CEHD) and Liz Turchin (CCE). Nuria specifically mentioned capturing click events to track PDF file downloads. This is a segment of usage data we had been missing, since I have not used Apache log analysis, so I decided to investigate further...

Some Google searching turned up a snippet of JavaScript that, in effect, records a click on a link as a pageview, via an onClick attribute. The snippet follows:

  onClick="javascript: pageTracker._trackPageview('/downloads/map');"

There is an obvious limitation to this solution: you have to manually add an onClick attribute to every link you want to track. More details on this approach are available in Google's Help documentation.

I wanted a solution that would automatically register and track these clicks, without my having to maintain every link to these PDFs, Excel documents, etc. Luckily for me, I use the Prototype JavaScript framework, which provides many shortcuts that make something like this quite easy.

The end product needed to search the DOM for links to PDF, Excel, and Word documents and register a click event listener on each one, which would then call Google's _trackPageview function, shown above. The code follows:

This code should go in an application-wide JavaScript file, or in the page's HTML head, after the Prototype file is included.

// Finds all of the PDF/XLS/DOC document links on a page and enables
// tracking for Google Analytics. Known issue: will fail to match
// PDF links that use a hash to jump to a page (e.g. file.pdf#page=35).
// Also not compatible with IE6, because it relies on a CSS3 selector.
// More info on the page tracking API:
// http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=55529
function trackFileClicks(){
  // Modify file_extensions to include any additional types you
  // would like to track (e.g. '.xlsx')
  var file_extensions = ['.pdf', '.xls', '.doc'];
  file_extensions.each(function(file_extension){
    // CSS 3 selector, a[href$='.pdf'], $= implies file ends with '.pdf'
    var css_selector = "a[href$='" + file_extension + "']";
    // select all of the links w/the current file extension
    $$(css_selector).each(function(link){
      // Add a click listener to the link, which will send an
      // asynchronous request to Google
      link.observe('click', function(e){
        var pageTracker = _gat._getTracker('UA-XXXXXX'); //Change to your ID
        pageTracker._trackPageview(cleanHref(link.href, file_extension));
      });
    });
  });
}

// Takes the full URL and reduces it to the path and file name, minus the extension.
function cleanHref(href, extension){
  // Found RE on StackOverflow:
  // http://stackoverflow.com/questions/27745/getting-parts-of-a-url-regex
  var uri_regex = /^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(\?([^#]*))?(#(.*))?$/;
  var clean_href = href.toString();
  // The regex returns an array with capture groups at these indexes:
  var SCHEMA = 2, DOMAIN = 3, PORT = 5, PATH = 6,
      FILE = 8, QUERYSTRING = 9, HASH = 12;
  var match_container = clean_href.match(uri_regex);

  return match_container[PATH] + match_container[FILE].replace(extension, '');
}
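In environments that provide the WHATWG URL class (modern browsers and Node.js, though not the browsers of 2010), the same path can be extracted without a hand-written regex. The sketch below is a hypothetical alternative, not the code running on our site; the name cleanHrefWithUrl is made up for illustration. Like cleanHref, it keeps the path and drops the extension. It also ignores any query string or #page= hash, and only strips the extension when it is the actual suffix of the path.

```javascript
// Hypothetical alternative to cleanHref(), assuming the WHATWG URL class
// is available (modern browsers, Node.js). Not part of the original code.
function cleanHrefWithUrl(href, extension) {
  // .pathname excludes the query string and the hash, so
  // "file.pdf#page=35" yields ".../file.pdf". Note that pathname is
  // percent-encoded (a space in a file name appears as %20).
  var path = new URL(href).pathname;
  // Strip the extension only when it really ends the path
  // (compared case-insensitively, so ".PDF" also matches ".pdf").
  var suffix = path.slice(-extension.length).toLowerCase();
  if (suffix === extension.toLowerCase()) {
    path = path.slice(0, -extension.length);
  }
  return path;
}

console.log(cleanHrefWithUrl('http://example.edu/reports/fall2010.pdf#page=35', '.pdf'));
// prints "/reports/fall2010"
```

Inside the click listener, it would be called the same way as cleanHref: cleanHrefWithUrl(link.href, file_extension).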

This code must go in the head of your page, wrapped in script tags, after Prototype is included:

document.observe("dom:loaded", function() {
  trackFileClicks();
});

If you are using another library, such as jQuery, it should be quite easy to port this code over. Note that because this code relies on a CSS3 attribute selector, a browser that does not support it, e.g. IE6, will fail. The code could be extended to cover such browsers, but at some cost in speed, since you would have to check every link manually with a regular expression.
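The regex-based fallback mentioned above might look roughly like the following hypothetical, library-free sketch (it is not the code running on our site). It walks every link returned by getElementsByTagName('a'), which old browsers such as IE6 support, and tests each href with a regular expression after dropping any ?query or #hash, which also sidesteps the file.pdf#page=35 issue. It then registers a click handler with addEventListener, or attachEvent on old IE. The function names and the track callback, which stands in for the _trackPageview call, are assumptions for illustration.

```javascript
// Hypothetical library-free fallback: check every link with a regex
// instead of relying on a CSS3 attribute selector.
// Extensions are expected in lowercase, e.g. ['.pdf', '.xls', '.doc'].
function fileExtensionOf(href, extensions) {
  // Drop any query string or hash so "file.pdf#page=35" still matches.
  var bare = String(href).split(/[?#]/)[0].toLowerCase();
  for (var i = 0; i < extensions.length; i++) {
    if (bare.slice(-extensions[i].length) === extensions[i]) {
      return extensions[i];
    }
  }
  return null;
}

function trackFileClicksFallback(root, extensions, track) {
  var links = root.getElementsByTagName('a');
  for (var i = 0; i < links.length; i++) {
    // The wrapper function gives each handler its own link variable.
    (function (link) {
      var ext = fileExtensionOf(link.href, extensions);
      if (!ext) return; // not a tracked file type
      var handler = function () { track(link.href, ext); };
      if (link.addEventListener) {
        link.addEventListener('click', handler, false);
      } else if (link.attachEvent) { // IE8 and older
        link.attachEvent('onclick', handler);
      }
    })(links[i]);
  }
}
```

In a page, this could be called as trackFileClicksFallback(document, ['.pdf', '.xls', '.doc'], fn), where fn gets the tracker with _gat._getTracker and calls _trackPageview(cleanHref(href, ext)). The cost is the one described above: every link on the page is tested, rather than only the ones a selector engine returns.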

UPDATE: The following line must be included:

  var pageTracker = _gat._getTracker('UA-XXXXXX'); //Change to your ID

The initial version of this post was missing it, so clicks were not tracked if the code was used as originally published.
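A variant of this fix creates the tracker only once and guards against ga.js not having loaded, for example when a script blocker removes it. The helper below is a hypothetical sketch; makeDownloadTracker is an illustrative name, not part of Google's API. It takes the _gat object as a parameter so it can be tested outside a browser, and it relies only on the _getTracker and _trackPageview calls already used above.

```javascript
// Hypothetical helper: returns a function that creates the ga.js tracker
// on first use and does nothing (returns false) if ga.js is missing.
function makeDownloadTracker(gat, accountId) {
  var tracker = null; // created lazily, then reused for every click
  return function (path) {
    if (!gat || typeof gat._getTracker !== 'function') {
      return false; // ga.js not loaded; skip tracking instead of throwing
    }
    if (!tracker) {
      tracker = gat._getTracker(accountId); // e.g. 'UA-XXXXXX'
    }
    tracker._trackPageview(path);
    return true;
  };
}
```

In a page, this would be set up once with var trackDownload = makeDownloadTracker(window._gat, 'UA-XXXXXX'); and the click listener would then call trackDownload(cleanHref(link.href, file_extension)).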

This has been implemented on our site and is tracking downloads successfully! The fix above was required to make it work.
