
Translating apache log format to dspace

Introduction

There are two types of log entries that must be handled: views and downloads.

Downloads

Comparison of apache and dspace logs

From apache logs:
69.109.228.170 - - [21/Sep/2008:23:59:27 -0500] "GET /bitstream/31045/1/26020387.pdf HTTP/1.1" 200 1225483 "http://scholar.google.com/scholar?hl=en&lr=&q=accept+Genetically+modified+organism&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"

When the same request is sent to agecon by entering the URL below into a browser:

http://ageconsearch.umn.edu/bitstream/31045/1/26020387.pdf

Catalina records the following log entry:

2008-09-30 16:12:08,153 INFO org.dspace.app.webui.servlet.BitstreamServlet @ anonymous:session_id=F82A25EDFCF0C73AE8C19291D3C3985A:ip_addr=128.101.29.84:view_bitstream:bitstream_id=3976
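Before any conversion, each Apache line has to be split into fields. Below is a minimal Python sketch. It assumes the standard Apache "combined" log format and the prefix-less URL shapes seen in these logs: /bitstream/<handle>/<seq>/<file> for downloads and /handle/<handle> for views. The names parse_apache and classify are hypothetical.

```python
import re

# Apache "combined" log format:
#   host ident authuser [time] "request line" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# URL shapes as they appear in the agecon logs (assumption: no handle prefix).
DOWNLOAD = re.compile(r'^/bitstream/(?P<handle>\d+)/(?P<seq>\d+)/(?P<name>[^?]+)')
VIEW = re.compile(r'^/handle/(?P<handle>\d+)/?$')


def parse_apache(line):
    """Return the combined-log fields as a dict, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def classify(path):
    """Return ('download', parts), ('view', parts) or (None, {}) for a request path."""
    m = DOWNLOAD.match(path)
    if m:
        return "download", m.groupdict()
    m = VIEW.match(path)
    if m:
        return "view", m.groupdict()
    return None, {}


# The download entry from the Apache log above.
sample = (
    '69.109.228.170 - - [21/Sep/2008:23:59:27 -0500] '
    '"GET /bitstream/31045/1/26020387.pdf HTTP/1.1" 200 1225483 '
    '"http://scholar.google.com/scholar?hl=en&lr=&q=accept+Genetically+modified+organism&btnG=Search" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"'
)
fields = parse_apache(sample)
kind, parts = classify(fields["path"])
print(fields["ip"], fields["time"], kind, parts)
```

Lines that do not match the pattern, such as a malformed request line, come back as None. A converter can log and skip those instead of guessing.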


Required conversions

Notice that in the two log entries above, apache records a handle of 31045, while the dspace log gives a bitstream_id of 3976. To convert from apache to dspace, we must map the handle to the bitstream_id. The SQL query below does this:

select bundle2bitstream.bitstream_id
from item2bundle, handle, bundle2bitstream
where handle.handle = '31045'
  and handle.resource_id = item2bundle.item_id
  and bundle2bitstream.bundle_id = item2bundle.bundle_id;

 bitstream_id
--------------
         3976

Note that here the bundle_id and the bitstream_id are the same.
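You can try the join without a DSpace database by loading toy copies of the three tables into SQLite. This is only a sketch. The tables keep just the columns the query uses. Item 500, bundle 4000 and bitstream 4001 are invented. The extra resource_type_id = 2 condition restricts the lookup to item handles and is an addition, not part of the query above.

```python
import sqlite3

# Toy versions of the DSpace tables used by the query. Only handle 31045 and
# bitstream 3976 come from the logs; item 500 is a made-up id.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE handle (handle TEXT, resource_type_id INTEGER, resource_id INTEGER);
    CREATE TABLE item2bundle (item_id INTEGER, bundle_id INTEGER);
    CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);
    INSERT INTO handle VALUES ('31045', 2, 500);
    INSERT INTO item2bundle VALUES (500, 3976);
    INSERT INTO bundle2bitstream VALUES (3976, 3976);
""")

QUERY = """
    SELECT bundle2bitstream.bitstream_id
    FROM item2bundle, handle, bundle2bitstream
    WHERE handle.handle = ?
      AND handle.resource_type_id = 2   -- added: only item handles
      AND handle.resource_id = item2bundle.item_id
      AND bundle2bitstream.bundle_id = item2bundle.bundle_id
    ORDER BY bundle2bitstream.bitstream_id
"""


def bitstream_ids(conn, handle):
    """All bitstream ids reachable from an item handle through its bundles."""
    return [row[0] for row in conn.execute(QUERY, (handle,))]


print(bitstream_ids(db, "31045"))  # [3976]

# A second (made-up) bundle on the same item adds a second row.
db.execute("INSERT INTO item2bundle VALUES (500, 4000)")
db.execute("INSERT INTO bundle2bitstream VALUES (4000, 4001)")
print(bitstream_ids(db, "31045"))  # [3976, 4001]
```

The second result shows a limitation of the bare join. An item with several bundles or bitstreams returns several ids. Choosing among them would presumably require the sequence number in the Apache URL (the 1 in /bitstream/31045/1/26020387.pdf), but how that number maps onto the DSpace tables has not been checked here.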

Views

Comparison of apache and dspace logs

For views we have an apache log of the form:

203.20.101.203 - - [21/Sep/2008:23:32:17 -0500] "GET /handle/22682 HTTP/1.1" 503 410 "http://scholar.google.com.au/scholar?q=%22some+implications+of+the+growth+of+the+mineral+sector%22&hl=en&um=1&ie=UTF-8&oi=scholart" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; InfoPath.1)"

When the request below is entered into a browser:

http://ageconsearch.umn.edu/handle/22682

Catalina records the following log entry:

2008-09-30 16:26:30,456 INFO org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:session_id=F82A25EDFCF0C73AE8C19291D3C3985A:ip_addr=128.101.29.84:view_item:handle=22682
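Once the Apache fields are known, writing an entry in the DSpace layout comes down to string formatting. The sketch below rests on three assumptions. First, the DSpace log uses the same local time zone as Apache, so the -0500 offset is simply dropped. Second, milliseconds can be padded with ,000 because Apache does not record them. Third, a placeholder session id is acceptable because Apache logs carry none. The helpers dspace_timestamp and dspace_line are hypothetical.

```python
from datetime import datetime


def dspace_timestamp(apache_time):
    """Convert '21/Sep/2008:23:32:17 -0500' to '2008-09-21 23:32:17,000'.

    Keeps the local wall-clock time and drops the UTC offset (assumes DSpace
    logs in the same time zone as Apache). Pads milliseconds with ',000'
    because the Apache combined format does not record them.
    """
    t = datetime.strptime(apache_time, "%d/%b/%Y:%H:%M:%S %z")
    return t.strftime("%Y-%m-%d %H:%M:%S") + ",000"


def dspace_line(apache_time, ip, servlet, action, params, session_id="unknown"):
    """Render one entry in the layout of the DSpace log lines shown above.

    session_id defaults to a placeholder because Apache logs have no sessions.
    """
    return (
        f"{dspace_timestamp(apache_time)} INFO "
        f"org.dspace.app.webui.servlet.{servlet} @ anonymous:"
        f"session_id={session_id}:ip_addr={ip}:{action}:{params}"
    )


# The view request from the Apache log above.
print(dspace_line("21/Sep/2008:23:32:17 -0500", "203.20.101.203",
                  "DSpaceServlet", "view_item", "handle=22682"))
```

The Apache view entry above actually returned status 503. A converter would probably want to skip unsuccessful responses rather than count them as views.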

Required conversions

To generate the dspace log format, we need to determine what was being viewed. In the case above it was an item (view_item).
From the handle we can find the resource_type_id, which tells us which term, such as view_item, to generate.
select * from handle where handle = '22682';

 handle_id | handle | resource_type_id | resource_id
-----------+--------+------------------+-------------
     13089 | 22682  |                2 |       12346
The table below gives the conversion between the resource_type_id and the actual type.

resource_type   resource_type_id
-------------   ----------------
BITSTREAM       0
BUNDLE          1
ITEM            2
COLLECTION      3
COMMUNITY       4
SITE            5
GROUP           6
EPERSON         7
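With the table above, the action name can be derived from the resource_type_id. This small sketch assumes DSpace names these actions view_<type> in lower case. Only view_item appears in the logs above, so view_collection and view_community are assumptions. The helper view_action is hypothetical.

```python
# resource_type_id -> resource type, copied from the table above.
RESOURCE_TYPES = {
    0: "BITSTREAM", 1: "BUNDLE", 2: "ITEM", 3: "COLLECTION",
    4: "COMMUNITY", 5: "SITE", 6: "GROUP", 7: "EPERSON",
}


def view_action(resource_type_id, handle):
    """Build the 'view_<type>:handle=<handle>' tail of a DSpace view entry.

    Assumes the lower-case view_<type> naming seen for view_item also holds
    for the other types; raises KeyError for ids not in the table.
    """
    name = RESOURCE_TYPES[resource_type_id].lower()
    return f"view_{name}:handle={handle}"


# resource_type_id 2 is what the handle query returned for 22682.
print(view_action(2, "22682"))  # view_item:handle=22682
```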
