[Archivesspace_Users_Group] OAI harvesting issue

Kevin W. Schlottmann kws2126 at columbia.edu
Wed Apr 22 15:06:39 EDT 2020


Dear AS List,

We rely on the OAI feed to pipe updated records to various places on a
nightly basis.  We recently came across some odd behavior, and we are
hoping list members might have some suggestions.

We have a few resource records that have been recently updated, show the
correct updated time in the staff GUI, and have the correct updated time
when downloaded directly using the OAI GetRecord verb[1].

However, in our bulk OAI download of all records, using pyoaiharvester[2],
the datestamp on each of these records is somehow stuck on an earlier date.

Even stranger, if we add the 'from' parameter to [2] manually with the
correct date value, we *get* the records, with the correct datestamp.
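The comparison described above can be checked outside the harvester by reading the header datestamps straight out of the two XML responses. What follows is only a minimal sketch, not our production code: the function names are hypothetical, and it assumes the bulk file and the GetRecord response both contain standard OAI-PMH header elements (identifier and datestamp). Namespaces are ignored on purpose, on the assumption that a harvester's output file may not preserve the OAI-PMH default namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def header_datestamps(xml_text):
    """Map each record identifier to the datestamp found in its header."""
    stamps = {}
    for elem in ET.fromstring(xml_text).iter():
        if local_name(elem.tag) != "header":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in elem}
        if "identifier" in fields:
            stamps[fields["identifier"]] = fields.get("datestamp")
    return stamps


def datestamp_mismatches(bulk_xml, single_xml):
    """Return {identifier: (bulk datestamp, single-record datestamp)}
    for every record in single_xml whose datestamp differs in bulk_xml.
    A record missing from the bulk file shows up with None."""
    bulk = header_datestamps(bulk_xml)
    single = header_datestamps(single_xml)
    return {
        ident: (bulk.get(ident), stamp)
        for ident, stamp in single.items()
        if bulk.get(ident) != stamp
    }
```

Passing the contents of the nightly bulk file as bulk_xml and a saved GetRecord response as single_xml would list exactly the records whose datestamps disagree.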

We are digging into this with help from Lyrasis, but we don't have an
answer yet.  My guess is an issue with the harvester, but it's not
immediately obvious what it would be.  Other avenues we're looking at are
issues with the resumption token, or with the indexer (the latter often
being the cause of AS issues, anecdotally). Questions for the list:

1) Is there anything known in the OAI implementation that might cause this
stale datestamp behavior?

2) Since this may be an issue with the harvester, does anyone have a
preferred OAI harvester that handles marcxml?
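On the resumption-token avenue mentioned above, one way to rule the harvester in or out is to walk the ListRecords pages by hand and inspect each page as it arrives. This is a rough sketch under stated assumptions, not a replacement harvester: list_record_pages and fetch_url are hypothetical names, base_url stands in for the full OAI endpoint URL, and the only protocol behavior relied on is the OAI-PMH rule that a follow-up request carries just the verb and the resumptionToken.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_url(url):
    """Default fetcher: return the raw response body for a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_record_pages(base_url, metadata_prefix, set_spec=None,
                      from_date=None, fetch=fetch_url):
    """Yield (url, parsed_root, resumption_token) for each ListRecords page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        token_elem = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        token = ""
        if token_elem is not None:
            token = (token_elem.text or "").strip()
        yield url, root, token
        if not token:
            break  # a missing or empty token marks the last page
        # resumptionToken is an exclusive argument: send it with the verb only
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Logging the URL and token for each page, with and without from_date, should show whether the stale datestamps appear on a particular page of the token sequence or only in the harvester's output file.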

Best,

Kevin

[1] GetRecord request; getting it as a single record has the right
datestamp:
https://{oaiendpoint}?verb=GetRecord&identifier=oai:columbia//repositories/2/resources/6381&metadataPrefix=oai_marc

[2] Using the pyoaiharvester library (https://github.com/vphill/pyoaiharvester):
python /.../as_reports/pyoaiharvester/pyoaiharvest.py -l {oaiendpoint} -m oai_marc -s collection -o /.../archivesspace/oai/20200419.asRaw.xml

-- 
Kevin Schlottmann
Head of Archives Processing
Rare Book & Manuscript Library
Butler Library, Room 801
Columbia University
535 W. 114th St., New York, NY  10027
(212) 854-8483