
Instructions to allow harvesting of log files by URCHIN

OIT referenced URCHIN page
General plan: A log file called daily.log will be produced each day in a location where it can be harvested over HTTP by the OIT URCHIN service.
Steps:
I) Add lines to the Media Archive httpd.conf and ssl.conf files that will create log files in the correct format. For these lines to work, a UNIX utility, cronolog, may have to be installed.
II) Write a cron job that creates the daily.log file and transfers that file to a directory in the MediaStats domain.
III) Add an .htaccess file to the directory mentioned in step II. This .htaccess file will allow the OIT Urchin appliance access to the daily.log file.
I)
1. Lines to add to Media Archive’s httpd.conf file:
########### Lines for Media Archive httpd.conf file ####################
CustomLog "|/usr/sbin/cronolog /etc/httpd/logs/www/%Y/%m/%d/access.log" "%h %l %u %t \"%r\" %>s %b"
ErrorLog "|/usr/sbin/cronolog /etc/httpd/logs/www/%Y/%m/%d/errors.log"
########### End of lines for Media Archive httpd.conf file ####################
2. Lines to add to Media Archive’s ssl.conf file:
########### Lines for Media Archive ssl.conf file ####################
CustomLog "|/usr/sbin/cronolog /etc/httpd/logs/www/%Y/%m/%d/ssl_access.log" "%h %l %u %t \"%r\" %>s %b"
ErrorLog "|/usr/sbin/cronolog /etc/httpd/logs/www/%Y/%m/%d/ssl_errors.log"
########### End of lines for Media Archive ssl.conf file ####################
Comments:
1) If /usr/sbin/cronolog does not exist on your box, then it must be installed; see cronolog.org. Most likely this will involve someone at OIT.
2) The directory “/etc/httpd/logs/” is the root for the log files. If you change this, it will affect the cron job in the next section.
3) It is not required that the error logs be created, since these logs will not be transferred to URCHIN. They are created only for comparison with the *access* files.
4) URCHIN wants Common Log Format. For more info on this, see: Common Log Format and CustomLog.
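
Before pointing URCHIN at the new logs, it may be worth confirming that the lines cronolog writes really have the Common Log Format shape requested by the format string above. The following is a minimal sketch, not part of the original setup: the function name is_clf_line and the sample line are made up for illustration, and the pattern only checks the overall shape of a line (host, ident, user, [timestamp], "request", status, bytes), not the validity of each field.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original setup).
# Returns 0 if the line has the overall shape produced by
# "%h %l %u %t \"%r\" %>s %b", and non-zero otherwise.
is_clf_line() {
    printf '%s\n' "$1" |
        grep -Eq '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-)$'
}

# Made-up sample line, for illustration only.
sample='192.0.2.10 - - [10/Oct/2007:13:55:36 -0500] "GET /index.html HTTP/1.0" 200 2326'

if is_clf_line "$sample"; then
    echo "sample line looks like Common Log Format"
else
    echo "sample line is NOT Common Log Format"
fi
```

To try it on a real file, something like `head -1 access.log | while IFS= read -r line; do is_clf_line "$line" && echo ok; done` could be used. A line in Apache's "combined" format (with referer and user agent appended) would be rejected by this check, which is the intended behavior here.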


II) Write a cron job to create the daily.log file and move it to a location (in the MediaStats domain that you created) where OIT can access it.
########### Cron to create daily.log  ################
#!/bin/bash
# Path to the directory that URCHIN will harvest the files from.
# (No spaces inside the quotes, or every path built from it breaks.)
URL_PATH="/var/www/html/chaucer/urchin_daily"
# Location where /usr/sbin/cronolog will create the daily log files
LOG_PATH="/etc/httpd/logs/www/"
# Yesterday's date in the same %Y/%m/%d layout cronolog uses (GNU date syntax)
YESTERDAY=$(date --date="yesterday" +"%Y/%m/%d")

echo "Removing existing logs from $URL_PATH"
[ -f "$URL_PATH/daily.log" ] && rm -f "$URL_PATH"/daily*.log*

if [ -d "$LOG_PATH/$YESTERDAY" ]
then
	echo "Looking for access logs in $LOG_PATH/$YESTERDAY/"
	# Matches both access.log and ssl_access.log
	for file in "$LOG_PATH/$YESTERDAY"/*access.log*
	do
		# -s is true only for files that exist and are not empty
		if [ -s "$file" ]
		then
			echo "cat $file to the Urchin log"
			cat "$file" >> "$URL_PATH/daily.log"
		fi
	done
fi
# Change perms so URCHIN app can get daily.log
[ -f "$URL_PATH/daily.log" ] && chmod 664 "$URL_PATH/daily.log"

########### end of Cron to create daily.log  ################
Comments:

1) If you made no changes to the log path in step I, you will not have to change LOG_PATH. If you did change the path in step I, you must change LOG_PATH to match.
2) URL_PATH must be changed. It points to a directory that can be reached by the MediaStats Domain that you created. It is possible that this is the only variable that you will need to change.
3) This cron job should be run once a day, sometime in the early morning.
4) The user that runs the cron must have read privileges in LOG_PATH and write in URL_PATH.
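
Comments 3 and 4 can be checked before the job is scheduled. The following is a rough sketch under stated assumptions: the function name check_paths, the script location /usr/local/bin/make_daily_log.sh, the output file, and the 3:15 a.m. run time are all hypothetical and should be replaced with whatever fits your box. Run the check as the same user that will own the cron job, since it only tests the permissions of the current user.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the original script):
# confirms the current user can read the log directory and write to
# the directory URCHIN harvests from.
check_paths() {
    log_path=$1
    url_path=$2
    # Reading a directory's contents needs both read and search (x) permission.
    if [ ! -r "$log_path" ] || [ ! -x "$log_path" ]; then
        echo "cannot read $log_path" >&2
        return 1
    fi
    if [ ! -w "$url_path" ]; then
        echo "cannot write to $url_path" >&2
        return 1
    fi
    echo "permissions look OK"
}

# Example call, using the paths from the script above:
#   check_paths /etc/httpd/logs/www /var/www/html/chaucer/urchin_daily
#
# Example crontab entry (assumed script location and run time), added
# with "crontab -e" as the user that passed the check:
#   15 3 * * * /usr/local/bin/make_daily_log.sh >> /tmp/make_daily_log.out 2>&1
```

Any time after midnight works, because the script always collects the directory for the previous day.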
III) Put an .htaccess file in the directory defined by URL_PATH, so the OIT URCHIN app can harvest the daily.log file.

########### .htaccess file in the directory URL_PATH ################
RewriteEngine On
RewriteCond %{SERVER_PORT} ^80$
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R,L]


order deny,allow
deny from all
allow from 128.101.65.14
allow from 128.101.29.84
allow from urchin.umn.edu
allow from mousetrap.software.umn.edu 

######### end of  .htaccess file in the directory URL_PATH ###############
Comments:
1) This will allow only OIT in.
2) I will need to know the URL that maps to the directory URL_PATH.
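
Once the .htaccess file is in place, the rules can be spot-checked by requesting daily.log from different machines and looking at the HTTP status that comes back. The sketch below is only an illustration: the URL is a placeholder (the real one depends on how your MediaStats domain maps to URL_PATH), the function name explain_status is made up, and the curl call is left commented out because it needs network access.

```shell
#!/bin/bash
# Placeholder URL (an assumption): replace with the URL that maps to URL_PATH.
URL="https://mediastats.example.edu/urchin_daily/daily.log"

# Hypothetical helper: translate an HTTP status code into what it
# suggests about the .htaccess rules above.
explain_status() {
    case "$1" in
        200) echo "allowed: daily.log is readable from this host" ;;
        301|302) echo "redirected: the request came in on port 80 and was sent to https" ;;
        403) echo "denied: this host is not in the allow list" ;;
        404) echo "not found: daily.log is missing or the URL does not map to URL_PATH" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Uncomment to run the real check (requires curl and network access):
# explain_status "$(curl -s -o /dev/null -w '%{http_code}' "$URL")"
```

From a machine that is not on the allow list, a 403 is the expected result; a 200 from such a machine would suggest the .htaccess file is not being honored (for example, because AllowOverride is disabled for that directory).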
