[Archivesspace_Users_Group] Spreadsheet imports locking up database
Brian Hoffman
brian.hoffman at lyrasis.org
Wed Jun 9 16:41:36 EDT 2021
Hi Peter, James, and Nancy,
I posted a test release (for testing only) that may help with the issue Peter identified. If anyone wants to test and report back that would be great.
https://github.com/quoideneuf/archivesspace/releases/tag/issue-2275
Brian
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of James Bullen <james at hudmol.com>
Date: Sunday, June 6, 2021 at 5:26 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Spreadsheet imports locking up database
Hi Peter,
I’m not familiar with the spreadsheet importer, but it sounds like you’ve done the analysis to be in a position to fix it. :)
Here’s a commit for the “run tonight” feature implemented in a plugin:
https://gitlab.gaiaresources.com.au/qsa/as_runcorn/-/commit/8a2bd97
It was refined in a few subsequent commits (4369dd0, 800ac1c, 5f1207e).
Cheers,
James
On Jun 6, 2021, at 6:02 PM, Peter Heiner <ph448 at cam.ac.uk<mailto:ph448 at cam.ac.uk>> wrote:
James Bullen wrote on 2021-06-06 11:35:03:
Bulk update operations usually run in database transactions. This is great because it guarantees that the database will be in a consistent state when the job completes - if it succeeds then all of the updates are applied, if it fails then none of the updates are applied.
The price for this guarantee is that any affected records are locked until the job completes.
Thanks for your response. I generally agree with your points but have mine to make.
One is that the consistency unit here need not be the entire job, it is sufficient to isolate a single line, and I would normally expect the database to cope fine with such a workload.
Secondly, if I read my database output correctly, in this case the problem is not a wholly isolated bulk update but unnecessary table scans occurring for each line that make the database hold a lock for several seconds instead of milliseconds. I've yet to understand the purpose of the subnote_metadata table and why the delete is taking place, but we did notice that the query tried IDs that were not in the database.
In any case, it should be possible to unroll the subquery or provide optimiser hints for the bottleneck to go away.
We recently implemented the ability to submit import jobs to be run after hours for a client to work around this.
We were thinking of a similar workaround, do you have anything you could share with us?
Thanks,
p
_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org<mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/attachments/20210609/e9c97974/attachment.html>
More information about the Archivesspace_Users_Group
mailing list