Translating Apache log format to DSpace
Introduction
There are two types of log entries that must be handled: views and downloads.
Downloads
Comparison of Apache and DSpace logs
From the Apache logs:
69.109.228.170 - - [21/Sep/2008:23:59:27 -0500] "GET /bitstream/31045/1/26020387.pdf HTTP/1.1" 200 1225483 "http://scholar.google.com/scholar?hl=en&lr=&q=accept+Genetically+modified+organism&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"
When the path from this request is sent to agecon by entering the URL below into a browser:
http://ageconsearch.umn.edu/bitstream/31045/1/26020387.pdf
Catalina records the following log entry:
2008-09-30 16:12:08,153 INFO org.dspace.app.webui.servlet.BitstreamServlet @ anonymous:session_id=F82A25EDFCF0C73AE8C19291D3C3985A:ip_addr=128.101.29.84:view_bitstream:bitstream_id=3976
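As a starting point for the conversion, the Apache entry has to be split into its fields. The sketch below is one way to do that in Python, assuming the logs use the Apache "combined" format seen in the sample above. The names `APACHE_COMBINED`, `parse_apache_line`, and `bitstream_handle` are illustrative and not part of Apache or DSpace.

```python
import re
from datetime import datetime

# Sample download entry copied from the Apache log above.
SAMPLE = (
    '69.109.228.170 - - [21/Sep/2008:23:59:27 -0500] '
    '"GET /bitstream/31045/1/26020387.pdf HTTP/1.1" 200 1225483 '
    '"http://scholar.google.com/scholar?hl=en&lr=&q=accept+Genetically+modified+organism&btnG=Search" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-TW; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1"'
)

# Assumed layout: Apache "combined" log format.
APACHE_COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_apache_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = APACHE_COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Month abbreviations such as "Sep" are parsed using the current locale.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields

def bitstream_handle(path):
    """Extract the handle from a /bitstream/<handle>/... path, else None."""
    match = re.match(r"/bitstream/(\d+)/", path)
    return match.group(1) if match else None

fields = parse_apache_line(SAMPLE)
print(fields["ip"], fields["path"], bitstream_handle(fields["path"]))
```

The handle extracted here is what has to be mapped to a bitstream_id before a DSpace-style entry can be written.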
Required conversions
Notice that in the two log entries above, Apache records a handle of 31045, while the DSpace log gives a bitstream_id of 3976. To convert from Apache to DSpace, we must map the handle to the bitstream_id. The SQL command below does this:
select bundle2bitstream.bitstream_id from item2bundle, handle, bundle2bitstream where (handle.handle=31045 and handle.resource_id = item2bundle.item_id and bundle2bitstream.bundle_id=item2bundle.bundle_id);
bitstream_id
bundle_id
-----------
3976
Note the bundle_id and the bitstream_id are the same.
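The handle-to-bitstream lookup can be tried out without a live database. The sketch below runs the same join against an in-memory SQLite database that stands in for the real DSpace database. Only the three tables and the columns named in the query are created, the item id 100 is made up for illustration, and storing the handle as text is an assumption. The function name `bitstream_ids_for_handle` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the DSpace database (assumption: only the
# columns used by the query are modelled here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE handle (handle TEXT, resource_id INTEGER);
    CREATE TABLE item2bundle (item_id INTEGER, bundle_id INTEGER);
    CREATE TABLE bundle2bitstream (bundle_id INTEGER, bitstream_id INTEGER);
    -- item id 100 is a made-up value; 31045 and 3976 come from the logs
    INSERT INTO handle VALUES ('31045', 100);
    INSERT INTO item2bundle VALUES (100, 3976);
    INSERT INTO bundle2bitstream VALUES (3976, 3976);
""")

def bitstream_ids_for_handle(conn, handle):
    """Return every bitstream_id reachable from the given handle."""
    rows = conn.execute(
        """
        SELECT bundle2bitstream.bitstream_id
        FROM item2bundle, handle, bundle2bitstream
        WHERE handle.handle = ?
          AND handle.resource_id = item2bundle.item_id
          AND bundle2bitstream.bundle_id = item2bundle.bundle_id
        """,
        (handle,),
    ).fetchall()
    return [row[0] for row in rows]

print(bitstream_ids_for_handle(conn, "31045"))
```

The function returns a list because the join may yield more than one row if an item has several bundles or bitstreams. How to pick among them is not covered by the query above.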
Views
Comparison of Apache and DSpace logs
For views we have an Apache log entry of the form:
203.20.101.203 - - [21/Sep/2008:23:32:17 -0500] "GET /handle/22682 HTTP/1.1" 503 410 "http://scholar.google.com.au/scholar?q=%22some+implications+of+the+growth+of+the+mineral+sector%22&hl=en&um=1&ie=UTF-8&oi=scholart" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; InfoPath.1)"
When the request below is put into a browser:
http://ageconsearch.umn.edu/handle/22682
we get the following Catalina log entry:
2008-09-30 16:26:30,456 INFO org.dspace.app.webui.servlet.DSpaceServlet @ anonymous:session_id=F82A25EDFCF0C73AE8C19291D3C3985A:ip_addr=128.101.29.84:view_item:handle=22682
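Putting the two formats side by side suggests how a DSpace-style view entry could be assembled from Apache fields. The sketch below is a rough illustration, not a definitive converter. It assumes the Apache local time can be reused as-is, that milliseconds (which Apache does not record) can be written as 000, and that a placeholder session id is acceptable because Apache does not log one. The function names and the all-zero session id are hypothetical.

```python
from datetime import datetime

def dspace_timestamp(apache_time):
    """Convert '21/Sep/2008:23:32:17 -0500' to '2008-09-21 23:32:17,000'."""
    when = datetime.strptime(apache_time, "%d/%b/%Y:%H:%M:%S %z")
    # Assumption: keep the local clock time; milliseconds are unknown.
    return when.strftime("%Y-%m-%d %H:%M:%S") + ",000"

def dspace_view_item_line(ip, apache_time, handle, session_id="0" * 32):
    """Build a view_item entry shaped like the Catalina sample above."""
    return (
        f"{dspace_timestamp(apache_time)} INFO "
        "org.dspace.app.webui.servlet.DSpaceServlet @ "
        f"anonymous:session_id={session_id}:ip_addr={ip}"
        f":view_item:handle={handle}"
    )

print(dspace_view_item_line("203.20.101.203", "21/Sep/2008:23:32:17 -0500", "22682"))
```

Note that the sample Apache entry has status 503, so a converter may want to filter on the status code before emitting a line. That choice is left open here.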
Required conversions
To generate the DSpace log format, we need to determine what was being viewed. In the case above it was a view_item. From the handle we can find the resource_type_id, which allows us to generate terms like view_item.
The table below provides a conversion between the resource_type_id and the actual type.
| resource_type | resource_type_id |
| BITSTREAM | 0 |
| BUNDLE | 1 |
| ITEM | 2 |
| COLLECTION | 3 |
| COMMUNITY | 4 |
| SITE | 5 |
| GROUP | 6 |
| EPERSON | 7 |
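The table can be carried into code as a simple lookup. The sketch below assumes the action term is formed as "view_" plus the lowercase type name, a pattern extrapolated from the view_item and view_bitstream terms in the log entries above. Whether every type in the table has a corresponding view action in the DSpace logs is not confirmed here. The names `RESOURCE_TYPES` and `view_action` are illustrative.

```python
# resource_type_id -> resource_type, copied from the table above.
RESOURCE_TYPES = {
    0: "BITSTREAM",
    1: "BUNDLE",
    2: "ITEM",
    3: "COLLECTION",
    4: "COMMUNITY",
    5: "SITE",
    6: "GROUP",
    7: "EPERSON",
}

def view_action(resource_type_id):
    """Return a log term such as 'view_item' for a resource_type_id.

    Assumption: terms follow the 'view_<type>' pattern seen in the samples.
    Raises KeyError for ids that are not in the table.
    """
    return "view_" + RESOURCE_TYPES[resource_type_id].lower()

print(view_action(2))
print(view_action(0))
```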