[Archivesspace_Users_Group] External Solr - Memory Allocation?
Andrew Morrison
andrew.morrison at bodleian.ox.ac.uk
Mon Jan 30 04:46:06 EST 2023
Truncating the "deleted_records" table will prevent the OAI-PMH service
from being able to send out deletion notifications. Maybe it is worth
trying on a testing system, but probably not a good idea on a production
system.
Also note that this...
> Deleted 186992 documents
... is not logging the deletion of deleted records. It is logging the
deletion of Solr documents for unpublished records. But few, if any,
exist to be deleted. ArchivesSpace sent 186992 IDs to Solr to delete,
just in case any of them were unpublished immediately before this index
run. Solr returned a 200 OK response, even if none were found, so
ArchivesSpace reports them as all deleted.
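To illustrate the point above, here is a minimal sketch. It is not ArchivesSpace's actual code: `ToyIndex`, `delete_by_ids` and `log_deleted` are hypothetical names, the IDs are made up, and the in-memory set only stands in for Solr. The delete call reports success whether or not any ID matched, so a log line built from the size of the request says nothing about how many documents were really removed.

```python
class ToyIndex:
    """In-memory stand-in for a Solr core (illustrative only)."""

    def __init__(self, doc_ids):
        self.docs = set(doc_ids)

    def delete_by_ids(self, ids):
        """Drop any IDs that happen to exist; always answer 200, like an OK response."""
        self.docs.difference_update(ids)
        return 200


def log_deleted(ids_sent, status):
    """Build the log line from the number of IDs sent, not the number removed."""
    if status == 200:
        return f"Deleted {len(ids_sent)} documents"
    return "Delete failed"


# One published record in the index; none of the IDs sent below match it.
index = ToyIndex(["published-record-1"])
ids_sent = [f"unpublished-candidate-{n}" for n in range(186992)]

before = len(index.docs)
status = index.delete_by_ids(ids_sent)
removed = before - len(index.docs)

message = log_deleted(ids_sent, status)
print(message)  # Deleted 186992 documents
print(removed)  # 0
```

The only way to know how many documents actually went away would be to compare document counts before and after, which the log line does not do.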
The other logged deletions, before that, are likewise not "real" 99.9%
of the time. These batches...
> Deleted 100 documents
...are when ArchivesSpace tells Solr to delete the tree nodes for all
archival objects without children, just in case any of them had children
before this index run. Only the PUIIndexer does this, which is part of
why it is slower (it is also allocated fewer threads in the default config).
Again, 99.99% of the time there's nothing for Solr to delete, but it has
to search its indexes for them anyway.
And these...
> Deleted 25 documents
...are the deletion of URIs in the deleted_records table. But most of
those were deleted long ago by previous index runs. Again, it is just in
case any new ones were recently deleted (and even those were probably
deleted by the RealtimeIndexer.)
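As a rough sketch of what batched deletes like these look like on the wire: Solr's JSON update syntax accepts a list of IDs under a "delete" key, so a list of URIs can be cut into fixed-size batches and sent one request at a time. Whether ArchivesSpace builds its requests in exactly this shape is an assumption on my part, and `delete_payloads` and the URIs are hypothetical.

```python
import json


def delete_payloads(ids, batch_size):
    """Yield one JSON delete body per batch of IDs."""
    for start in range(0, len(ids), batch_size):
        yield json.dumps({"delete": ids[start:start + batch_size]})


# 60 made-up URIs, sent in batches of 25 like the log lines above.
uris = [f"/repositories/2/archival_objects/{n}" for n in range(60)]
payloads = list(delete_payloads(uris, 25))
sizes = [len(json.loads(body)["delete"]) for body in payloads]
print(sizes)  # [25, 25, 10]
```

Each body would produce one "Deleted N documents" line, with N being the batch size rather than the number of documents Solr found.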
This "belt and braces" approach prevents a few stray records remaining
in the PUI when they've been deleted or unpublished, but it seems to be
the cause of the longest wait times for commits when re-indexing large
repositories. Maybe something has changed in newer versions of Solr to
make this process slower, possibly specifically for deletions?
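To see where the waits fall, one option is to measure the silences of a single indexer thread in the ArchivesSpace log. The sketch below is only that, a sketch: `long_gaps` is a hypothetical helper, and the sample lines are the ones quoted further down in this thread, unwrapped onto single lines.

```python
import re
from datetime import datetime

# Matches the timestamp in lines like "I, [2023-01-26T09:18:40.357101 #78764] INFO ..."
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}) #\d+\]")


def long_gaps(lines, thread, threshold_seconds=60):
    """Return (seconds, line) for each line of `thread` that follows a long silence."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.search(line)
        if not match or thread not in line:
            continue  # skip other threads and unparseable lines
        stamp = datetime.fromisoformat(match.group(1))
        if previous is not None:
            waited = (stamp - previous).total_seconds()
            if waited >= threshold_seconds:
                gaps.append((round(waited), line))
        previous = stamp
    return gaps


log = [
    "I, [2023-01-26T09:18:40.357101 #78764] INFO -- : Thread-3384: Deleted 100 documents",
    "E, [2023-01-26T09:58:40.400971 #78764] ERROR -- : Thread-3384: SolrIndexerError when deleting records",
    "I, [2023-01-26T09:58:40.410522 #78764] INFO -- : Thread-3384: Deleted 100 documents",
    "I, [2023-01-26T09:59:11.734200 #78764] INFO -- : Thread-3384: Deleted 9 documents",
    "E, [2023-01-26T10:39:11.746166 #78764] ERROR -- : Thread-3384: SolrIndexerError when committing",
    "I, [2023-01-26T11:06:35.678926 #78764] INFO -- : Thread-3384: Deleted 186992 documents",
]

gaps = long_gaps(log, "Thread-3384")
seconds = [waited for waited, _ in gaps]
print(seconds)  # [2400, 2400, 1644]
```

On those lines the two 2400-second gaps are the configured timeout expiring, and the last gap is the roughly 27 minutes between the commit timeout and the "Deleted 186992 documents" line.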
Andrew.
On 27/01/2023 14:01, Blake Carver wrote:
> > I'm running default config values for the AS log levels so they are
> > all set to 'debug'.
>
> I'm seeing "INFO" and not "DEBUG" there.
>
> > Deleted 186992 documents
>
> How much is in the deleted_records table? Try truncating that.
> ArchivesSpace is going and deleting anything in that table.
>
> > So I'm falling back to this just being super slow for some reason.
>
> Could be some complex records, could be there's way too much in the
> deleted table.
>
> > but I'm not sure why the PUI indexer would become so much slower (21
> > hours)
>
> Yep, sounds about right. The PUI is slow.
>
> > We do have some collections that are quite large (10s of thousands
> > of AOs), so maybe that's part of the issue.
>
> No doubt that's slowing it down too.
>
>
> ------------------------------------------------------------------------
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
> Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
> *Sent:* Thursday, January 26, 2023 6:02 PM
> *To:* Archivesspace Users Group
> <archivesspace_users_group at lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] External Solr - Memory
> Allocation?
> Thanks, Blake!
>
> I'm running default config values for the AS log levels so they are
> all set to 'debug'. I took a closer look, and the timeout message
> happens exactly after the timeout amount I set (as you'd expect).
> Interestingly, Solr is in the middle of deleting documents when it
> goes silent
>
> I, [2023-01-26T09:18:40.357101 #78764] INFO -- : Thread-3384: Deleted
> 100 documents: #<Net::HTTPOK:0x72b3d9e>
>
> .... 40 minutes pass with all the other AS log chatter ...
>
> E, [2023-01-26T09:58:40.400971 #78764] ERROR -- : Thread-3384:
> SolrIndexerError when deleting records: Timeout error with POST {....}
> I, [2023-01-26T09:58:40.410522 #78764] INFO -- : Thread-3384: Deleted
> 100 documents: #<Net::HTTPOK:0x4ab44e31>
>
> This continuing delete phase goes on for a bit until it stops logging
> batch deletes.
>
> I, [2023-01-26T09:59:11.734200 #78764] INFO -- : Thread-3384: Deleted
> 9 documents: #<Net::HTTPOK:0x1be6c3e9>
>
> .... 40 minutes pass with all the other AS log chatter ... And then
> the commit error pops up
>
> E, [2023-01-26T10:39:11.746166 #78764] ERROR -- : Thread-3384:
> SolrIndexerError when committing:
> Timeout error with POST {"commit":{"softCommit":false}}.
>
> Then after some more time
>
> I, [2023-01-26T11:06:35.678926 #78764] INFO -- : Thread-3384: Deleted
> 186992 documents: #<Net::HTTPOK:0x7e298af9>
>
> .... This all seems to indicate to me that the commit phase is taking
> an inordinate amount of time (almost 2 hours - maybe that's what I
> need to set the timeout to?). After that, the indexer starts the 2nd repo
>
> I, [2023-01-26T11:06:35.765797 #78764] INFO -- : Thread-3384: PUI
> Indexer [2023-01-26 11:06:35 -0500] Indexed 2 additional PUI records
> in repository Sherman
>
> .... The indexer waits for a looong time with no timeout and no
> messaging - even though this is a tiny repo - and then starts the 3rd repo
>
> I, [2023-01-26T11:31:32.795602 #78764] INFO -- : Thread-3384: PUI
> Indexer [2023-01-26 11:31:32 -0500] Indexed 188 additional PUI records
> in repository Rauner-XO
>
> And then the indexer starts the 4th repo soon after and seems to go on
> to complete normally
>
> I, [2023-01-26T11:31:33.369369 #78764] INFO -- : Thread-3384: PUI
> Indexer [2023-01-26 11:31:33 -0500] ~~~ Indexed 25 of 74785
> archival_object records in repository thedartmouth
>
> The Solr logs indicate that Solr is working this entire time doing
> adds and deletes. For example in one of the quiet phases:
>
> 2023-01-26 10:23:35.928 INFO (qtp2101153819-523) [ x:archivesspace]
> o.a.s.u.p.LogUpdateProcessorFactory [archivesspace] webapp=/solr
> path=/update params={}{add=...
> 2023-01-26 10:23:38.195 INFO (qtp2101153819-468) [ x:archivesspace]
> o.a.s.u.p.LogUpdateProcessorFactory [archivesspace] webapp=/solr
> path=/update params={}{deleteByQuery=...
>
> So I'm falling back to this just being *super* slow for some reason.
> I do have some custom indexer addons, but I'm not sure why the PUI
> indexer would become *so* much slower (21 hours) when the Staff
> indexer completes in a normal amount of time (a little under 6 hours).
> For previous versions this hasn't been quite that different (6hrs vs
> about 13hrs). We do have some collections that are quite large (10s of
> thousands of AOs), so maybe that's part of the issue.
>
> I haven't checked to see if the PUI indexer is gathering that much
> more data (and traversing the tree more times - maybe?) than it was in
> 3.1.1, but that's on my 'to check' list.
>
> Joshua
> ------------------------------------------------------------------------
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
> Blake Carver <blake.carver at lyrasis.org>
> *Sent:* Thursday, January 26, 2023 4:12 PM
> *To:* Archivesspace Users Group
> <archivesspace_users_group at lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] External Solr - Memory
> Allocation?
> That's... interesting.
>
> That RAM allocation seems fine. That Solr timeout is way higher than I
> would think is needed.
>
> Maybe set the loglevel to debug and see if it spits out something more
> useful? Maybe you'll be able to see what it's up to during that
> looooong time. I like your theory on that.
>
>
> ------------------------------------------------------------------------
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
> Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
> *Sent:* Thursday, January 26, 2023 3:38 PM
> *To:* Archivesspace Users Group
> <archivesspace_users_group at lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] External Solr - Memory
> Allocation?
> Following up on this. And looking for some advice!
>
> Even with the Solr timeout set to 40 minutes, I'm seeing some random
> Solr Timeout errors, though these do *NOT* cause the indexer to
> restart. In the latest test run I see one Solr Timeout for delete and
> one for commit - both following the PUI indexer run for AOs for the
> first and largest repo (~630k AOs).
>
> The indexer throws the delete timeout error, waits for a loooong time
> with seemingly no activity, throws the commit timeout error, waits
> again, and then picks back up as if nothing had gone wrong and
> continues with the initial index run. All of the index data looks
> correct (ie correct number of objects in both the staff and PUI).
>
> My theory is that the Solr update phase really is taking a super
> loooong time, but that the data has all been sent to Solr so the
> timeouts are really just ArchivesSpace waiting for Solr in between
> indexing one object type and the next and no index data is lost.
>
> There are no corresponding log entries in the Solr logs that I can find.
>
> I'm running solr 8.11.6 with 4GB and AS 3.3.1 with 4GB. Both bare
> metal on my laptop, so no container issues that might be at play. Solr
> memory use peaks at around 3.5GB.
>
> I've kept the stock thread and records per thread settings and just
> upped the timeout (to 2400). I guess the next step is to set the
> timeout even higher - maybe an hour (3600)? I don't see a reason to
> run a lower thread or record count, but can certainly try that as
> well, though I'm not looking forward to the time it will take (the
> current run takes 21 hours as it is - up from about 15 for 3.1.1)
>
> Any advice appreciated!
>
> Thanks!
> Joshua
>
> ------------------------------------------------------------------------
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
> Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
> *Sent:* Tuesday, January 24, 2023 6:56 AM
> *To:* Archivesspace Users Group
> <archivesspace_users_group at lyralists.lyrasis.org>
> *Subject:* [Archivesspace_Users_Group] External Solr - Memory Allocation?
> Hey all
>
> We're about to jump to v3.3.1 and I'm wondering if anyone has any
> suggestions for memory allocation for Solr?
>
> Currently we're running 6GB for the entire suite in v3.1.1 and are
> looking to keep the same overall memory footprint. Wondering if
> something like a 75/25 split (ie 4GB for AS and 2GB for Solr) would be
> a reasonable allocation? Or are people finding that Solr is more
> demanding?
>
> Thanks!
> Joshua
>
> ___________________
> Joshua Shaw (he, him)
> Library Web & Application Developer
> Digital Library Technologies Group
> Dartmouth College
> 603.646.0405
>
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group