[Archivesspace_Users_Group] OAI Harvesting issues with 3.2.0

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Wed Mar 30 05:32:55 EDT 2022


If you post the error messages in your log files from around the time 
when you get an "Internal Server Error" it would help diagnose the 
problem. But here are some observations that might be relevant.

Exporting EAD, whether from the staff interface or via the OAI-PMH 
service, uses both MySQL and Solr. MySQL supplies the resource and the 
IDs of its archival objects. Solr supplies the archival objects 
themselves, although the exporter checks whether the version in Solr 
matches the one in MySQL, and fetches from the database if not. So your 
problems could be with either, or both. Also, if Solr and MySQL are out 
of sync on your 3.2 system, but in sync on the 2.8.1 one, that could 
explain some of the difference in response time. You could try a soft 
re-index and see if that has any effect:

https://archivesspace.github.io/tech-docs/administration/indexes.html
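
To make the Solr-versus-MySQL check above concrete, here is a minimal 
Ruby sketch. It is not the ArchivesSpace exporter code: plain hashes 
stand in for Solr and MySQL, and the names fetch_records, db_versions, 
solr_docs and db_loader are all made up for illustration. It only shows 
why an out-of-sync index means more trips to the database.

```ruby
# Hypothetical model: use the indexed copy only when its version matches
# the version recorded in the database, otherwise load from the database.
def fetch_records(ids, db_versions, solr_docs, db_loader)
  ids.map do |id|
    doc = solr_docs[id]
    if doc && doc[:version] == db_versions[id]
      { id: id, source: :solr, record: doc[:record] }
    else
      { id: id, source: :database, record: db_loader.call(id) }
    end
  end
end

# Record 1 is in sync, record 2 is stale in the index, record 3 is missing.
db_versions = { 1 => 3, 2 => 5, 3 => 1 }
solr_docs   = { 1 => { version: 3, record: "indexed 1" },
                2 => { version: 4, record: "stale 2" } }
db_loader   = ->(id) { "from database #{id}" }

p fetch_records([1, 2, 3], db_versions, solr_docs, db_loader).map { |r| r[:source] }
# => [:solr, :database, :database]
```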

Wherever it gets the records from, they're retrieved in batches of 20. 
Each batch is then converted from JSON to EAD. That conversion is the 
CPU-intensive part, and it is single-threaded, so it typically takes up 
most of the overall runtime. But if there's something about your 
infrastructure which makes retrieval slow, you could reduce total 
waiting time by increasing the batch size, which is possible by putting 
the following in backend/plugin_init.rb in a local plugin and restarting 
ArchivesSpace:

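# Reopen the exporter's module to raise the batch size from 20 to 50.
# Ruby may warn that the constant was already initialized.
module ASpaceExport
  module LazyChildEnumerations
    PREFETCH_SIZE = 50
  end
end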

Your mileage may vary. Bigger batches will increase memory usage. And it 
might make a big difference for some collections, but none at all for 
others, because each request only ever covers siblings of one parent. 
Therefore a collection with a deeply-nested structure can require 
hundreds more batches than a shallow one with the same total number of 
archival objects, regardless of how high you set the prefetch size.
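
The effect of nesting on the number of batches can be sketched with a 
toy model. This assumes one request per group of up to PREFETCH_SIZE 
siblings and nothing else; the nested-array tree and the batch_count 
function are hypothetical, not part of ArchivesSpace.

```ruby
# Toy model: a node is represented by the array of its children.
# Each set of siblings is fetched in ceil(siblings / prefetch_size) batches.
def batch_count(children, prefetch_size)
  return 0 if children.empty?

  own = children.size.fdiv(prefetch_size).ceil
  own + children.sum { |child| batch_count(child, prefetch_size) }
end

# 100 archival objects directly under the resource.
shallow = Array.new(100) { [] }

# 100 archival objects in a single chain, each the only child of the last.
deep = []
100.times { deep = [deep] }

[20, 50].each do |size|
  puts "prefetch #{size}: shallow=#{batch_count(shallow, size)} deep=#{batch_count(deep, size)}"
end
# prints:
# prefetch 20: shallow=5 deep=100
# prefetch 50: shallow=2 deep=100
```

In this model the shallow collection goes from 5 batches to 2 when the 
prefetch size is raised, while the deep one stays at 100 either way.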

The OAI-PMH service has an additional issue that it cannot stream its 
output. See here:

https://archivesspace.atlassian.net/browse/ANW-1270

Andrew.


On 29/03/2022 17:37, Andy Boze wrote:
> Just to elaborate a bit on what Tom wrote, we are harvesting EAD 
> records. I've done a bit of comparison, making OAI requests for the 
> same records on 2.8.1 and 3.2. A record on 2.8.1 that took about 10 
> seconds took about 3 minutes on 3.2. A record that took about 3 
> minutes to respond on 2.8.1 timed out on 3.2 after 20 minutes with an 
> "Internal Server Error" message.
>
> Andy
>
> On 3/29/2022 11:57 AM, Tom Hanstra wrote:
>> We have set up a test server running ArchivesSpace 3.2.0. As 
>> required, that means a separate Solr instance which I've installed on 
>> the same server.
>>
>> Most things have gone OK, but we are seeing some timeout issues with 
>> OAI harvesting tests. The harvest will address a few of the records 
>> but regularly receives "Internal Server Error" messages. What seems 
>> to be happening is that we are hitting certain records which time 
>> out. We've tried skipping over such records to see if it was just a 
>> bad record, but that will simply cause a failure a bit further down 
>> the line. Our time out is set for 20 minutes, which should be plenty 
>> of time. So these timeouts don't make much sense.
>>
>> These records are harvesting without similar issues on our 2.8.1 
>> instance, so I would not expect this to be a record issue directly. 
>> Could it be something about how we have set up Solr? I see no errors 
>> in any of our ArchivesSpace or Solr logs, so I'm not sure how to 
>> debug this. Any suggestions?
>>
>> Thanks,
>> Tom
>>
>> -- 
>> *Tom Hanstra*
>> /Sr. Systems Administrator/
>> hanstra at nd.edu
>