[Archivesspace_Users_Group] how to find if a certain page was accessed in the PUI

Custer, Mark mark.custer at yale.edu
Wed Mar 25 10:26:28 EDT 2020


The server logs will give you one perspective. A page-tagging approach like Google Analytics gives you another, along with all of the other stuff that you can configure (e.g. the ability to add tags with Google Tag Manager). I only bring it up since it sounds like you also have Google Analytics installed on your site.

When looking at site traffic of an archival discovery system, I would want to report on the overall use of an entire finding aid, and with the ASpace PUI that would include aggregating all of the visits to the "resource", "archival_object", and "top_container" URLs for the same finding aid.  e.g.


All three of those URLs represent different views of http://test.archivesspace.org/repositories/2/resources/34, but those three URLs above don't tell you that by themselves, which makes setting things up especially important.  That's one place where Google Tag Manager could come in handy, I think, since you could add custom dimensions for collection ID, finding aid author, etc.  I say "I think" since I haven't used Google Analytics as a site manager in a long time, but I would love to have that type of data.
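If you only have server logs, a rough sketch of the same roll-up could look like the function below. It assumes a lookup file that you would have to build yourself (it is hypothetical, not something ASpace ships), with one tab-separated pair per line: a PUI path, then the resource path it belongs to. It also assumes the request line is the first double-quoted string on each log line, and that each archival object or top container maps to a single resource, which will not always be true for shared containers.

```shell
# Sum PUI hits per finding aid.
# $1    = lookup file (hypothetical, built by you): <PUI path><TAB><resource path>
# stdin = access log whose first double-quoted string is the request line
hits_per_finding_aid() {
    awk -F'\t' '
        NR == FNR { parent[$1] = $2; next }   # first file: load the lookup table
        {
            split($0, q, "\"")                # q[2] = request line
            split(q[2], req, " ")             # req[2] = requested path
            sub(/\?.*/, "", req[2])           # ignore any query string
            if (req[2] in parent) hits[parent[req[2]]]++
        }
        END { for (r in hits) print hits[r] "\t" r }
    ' "$1" - | sort -k1,1nr -k2,2
}
```

Calling it as `hits_per_finding_aid lookup.tsv < access.log` (both file names are placeholders) prints one line per resource, busiest first, with the hit count and the resource path separated by a tab.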

Blake, regarding Matomo, I've often thought that it might be really beneficial if an organization provided "Analytics as a Service" alongside site hosting (e.g. using the Matomo On-Premise option with ASpace).


From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Blake Carver <blake.carver at lyrasis.org>
Sent: Wednesday, March 25, 2020 9:10 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was accessed in the PUI

You (or your friendly neighborhood sysadmin) can change up the Apache logs to add/remove quite a few things.  There's all sorts of good stuff you can get in there:
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Steele, Henry <Henry.Steele at tufts.edu>
Sent: Wednesday, March 25, 2020 9:06 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was accessed in the PUI

Thank you, Blake.  This is all really helpful.  I will see if I can use some of these strategies to look through our logs.

It may not be there.  Our PUI log level is “fatal” and I’m not sure the Apache logs (/var/log/httpd) contain this kind of information.  But it’s certainly good to know how to look.

Henry Steele

Systems Librarian

Tufts University Library Technology Services


From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> On Behalf Of Blake Carver
Sent: Wednesday, March 25, 2020 9:01 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] how to find if a certain page was accessed in the PUI

I know quite a few people use Google Analytics, which is not something I find useful at all, but it's used quite often. Check out Matomo for an open source analytics product. There are many others. I know Matomo gives you the ability to customize things, and I bet it could be quite useful, though I've not touched it in many years.

I think your best bet is to get to know your Apache logs. You should be able to get something useful out of there, but you'll need to learn what you're logging there, and maybe change it up.  Read up on Apache's "LogFormat"; it's pretty flexible and you can customize it on your server. You can also customize where log files end up for each domain name, so that might help as well.  If you're running the PUI and STAFF sides on different URLs, or prefixes, that will help set them apart for logging. This is all one of those "It Depends" kinds of things.  Using grep/awk/sed etc. will let you pull out different things from the logs. Try tailing the log as you look at different things on the site and see how those get logged, then work up some simple greps to pull out just what you need every day.  This is a simple one I use to see the busiest sites on a server:

cat /var/log/apache2/other_vhosts_access.log.1 | awk '{print $1}' | sort | uniq -c | sort -nr | head -20

(If you're looking at that and thinking "You don't need cat in there, dummy" I know I know, old habits die hard)
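For the original question (was a certain page accessed?), something along these lines might be a starting point. It is only a sketch: it assumes the request line is the first double-quoted string on each log line and that the status code follows it, which holds for the common Apache and nginx formats but depends on how your logs are configured. The function name is made up for illustration.

```shell
# Count successful GET requests for one exact path in an access log read from stdin.
# Assumes: first double-quoted string = request line, status code right after it.
count_page_hits() {
    # $1 = exact request path to look for
    awk -F'"' -v page="$1" '
        {
            split($2, req, " ")       # req[1]=method, req[2]=path, req[3]=protocol
            split($3, rest, " ")      # rest[1]=status code, rest[2]=bytes sent
            sub(/\?.*/, "", req[2])   # ignore any query string
            if (req[1] == "GET" && req[2] == page && rest[1] == "200") hits++
        }
        END { print hits + 0 }        # print 0 rather than an empty line
    '
}
```

Usage would be something like `count_page_hits /repositories/2/resources/34 < /var/log/apache2/other_vhosts_access.log.1`, where the path is whatever PUI page you care about.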

You could do the same kind of grep work on the archivesspace.out log file and get something out of it. You might need to experiment with loglevel on that to see what you can get. DEBUG is probably way too much.

Here's some real nginx logs... these are based on real logs with some details changed to protect the innocent.

Here's one you might see quite often. If someone is logged into the staff side you'll see this POST to check their session:

example.edu - [25/Mar/2020:12:32:22 +0000] "POST /update_monitor/poll HTTP/1.1" 200 4751 "https://example.edu/resources/134/edit" "lock_version=12&uri=%2Frepositories%2F5%2Fresources%2F134" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:71.0) Gecko/20100101 Firefox/71.0" "-"

Here's another one, someone looking at a resource on the staff side:

example.edu - [25/Mar/2020:12:31:03 +0000] "GET /resources/2774?inline=true&undefined_id=%2Frepositories%2F3%2Fresources%2F2774 HTTP/1.1" 200 9839 "https://example.edu/resources/2774" "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36" "-"

And here's a bot crawling the public side:

example.edu - [25/Mar/2020:12:29:58 +0000] "GET /repositories/2/archival_objects/97930 HTTP/1.1" 200 21473 "-" "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help at moz.com)" "-"

Depending on how you configure your Apache/nginx/whatever logs, those log lines will look different and you can log a bunch of different things.
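Since bots like the one above can swamp real usage, a crude filter along these lines might help before counting anything. It is a heuristic sketch, not a reliable bot detector: the keyword list is my own guess, and it matches anywhere on the line rather than only in the user agent, so a URL containing one of the words would be dropped too.

```shell
# Drop log lines that mention a common crawler keyword (case-insensitive).
# Crude heuristic: the keyword list is an assumption, and the match is against
# the whole line, not just the user-agent field.
drop_bots() {
    awk 'tolower($0) !~ /bot|crawl|spider|slurp/'
}
```

It reads a log on stdin and writes the surviving lines to stdout, so it can sit in front of the busiest-sites pipeline: `drop_bots < /var/log/apache2/other_vhosts_access.log.1 | awk '{print $1}' | sort | uniq -c | sort -nr | head -20`.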

On the ArchivesSpace side (archivesspace/logs/archivesspace.out) the logs can look different depending on your log level. Here's one set to debug showing the indexer doing some work:

INFO: [collection1] webapp= path=/update params={} {add=[/repositories/2/archival_objects/33921#pui, /repositories/2/archival_objects/33922#pui, /repositories/2/archival_objects/33923#pui, /repositories/2/archival_objects/33924#pui, /repositories/2/archival_objects/33925#pui, /repositories/2/archival_objects/33926#pui, /repositories/2/archival_objects/33927#pui, /repositories/2/archival_objects/33928#pui, /repositories/2/archival_objects/33929#pui, /repositories/2/archival_objects/33930#pui, ... (25 adds)]} 0 6

Here's one line from me viewing a resource on the staff side. As you can see, it'll be a bit more challenging to get useful stuff out of this log, but it's in there:

[2020-03-25T08:45:29-04:00] INFO: [collection1] webapp= path=/select params={facet.field=assessment_record_types&facet.field=assessment_surveyors&facet.field=assessment_review_required&facet.field=assessment_reviewers&facet.field=assessment_completed&facet.field=assessment_inactive&facet.field=assessment_survey_year&facet.field=assessment_sensitive_material&csv.escape=\&start=0&q.op=AND&fq=repository:"/repositories/3"+OR+repository:global&fq=types:("assessment")&fq=(-types:("pui_only")+AND+(assessment_record_uris:("\/repositories\/3\/resources\/406")))&fq=-exclude_by_default:true&sort=&rows=30&bq=primary_type:resource^100&q=*:*&facet.limit=20&defType=edismax&qf=four_part_id^3+title^2+finding_aid_filing_title^2+fullrecord&pf=four_part_id^4&csv.header=true&csv.encapsulator="&facet.mincount=0&wt=json&facet=true} hits=0 status=0 QTime=61
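To get a feel for how much of archivesspace.out is indexer traffic versus searches, a tally by Solr handler path might be a useful first pass. This is a sketch based only on the two sample lines above (path=/update for the indexer, path=/select for a query); it does not tell you whether a query came from the staff side or the PUI, and the function name is made up.

```shell
# Tally Solr log lines by handler path, e.g. path=/update versus path=/select.
# Reads archivesspace.out (or any excerpt of it) on stdin; lines with no
# "path=/..." token are ignored.
solr_paths() {
    awk '
        match($0, /path=\/[a-z]+/) { n[substr($0, RSTART, RLENGTH)]++ }
        END { for (p in n) print n[p], p }
    ' | sort -k1,1nr -k2,2
}
```

Running `solr_paths < archivesspace/logs/archivesspace.out` prints one line per handler path with its count, most frequent first.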


From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Steele, Henry <Henry.Steele at tufts.edu>
Sent: Wednesday, March 25, 2020 7:54 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] how to find if a certain page was accessed in the PUI

Good morning,

We recently made our PUI open to the public and we are trying to find out about usage, particularly of a certain page within our repository.  I’m trying to figure out if there’s any way to see this in the logs.

I’ve looked in the application log archivesspace.out, but I’m not sure what I’m seeing here.  I see records being accessed, with a response of 200, but I don’t know if this is the staff interface, the PUI, or if it’s some indexing activity.  Is there a way in the application log to see if a certain page has been accessed in the PUI?  We have our log level set to “fatal” for the PUI, and the “pui_log” is default.  I know this should mean the log only reports on problematic events, but since I see a lot of activity in the log, I’m wondering if this setting doesn’t actually have any effect.

Alternatively, does anyone know if there might be other server logs that would be of use?   I’m looking in the Apache logs at /var/log/httpd but I’m not sure which of these logs would contain such information, if any.

Any information you have would be of great help.  Thanks.

If this isn’t

Henry Steele

Systems Librarian

Tufts University Library Technology Services

