[Archivesspace_Users_Group] Implementing a standalone Solr instance

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Mon Dec 6 05:04:16 EST 2021


Your issues with the indexer may be, in part, because Cambridge have 
over 30 repositories. The indexer runs twice per repository (once for 
the staff interface, once for the PUI) in each indexing_frequency_seconds 
period set in config.rb. The Bodleian has a similar number of records, but 
only one repository, so indexing is less of an issue.
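
As a rough illustration of why the repository count matters, the 
arithmetic can be sketched in Python. This is only a back-of-envelope 
model of the behaviour described above (one pass per repository for each 
of the staff and PUI indexers); the function name is hypothetical and 
nothing here is taken from the ArchivesSpace code.

```python
def passes_per_cycle(repositories: int, indexers: int = 2) -> int:
    """Indexer passes needed to visit every repository once.

    Assumes one pass per repository for each indexer (staff and PUI).
    This is a simplification for illustration, not ArchivesSpace code.
    """
    if repositories < 0 or indexers < 0:
        raise ValueError("counts must be non-negative")
    return repositories * indexers
```

On this model a site with 30 repositories needs 60 passes per cycle, 
against 2 for a site with a single repository.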

Switching to external Solr here had an imperceptible effect on search 
times (and none above a certain number of simultaneous requests, as other 
factors become the more significant bottleneck) and marginally increased 
the time for a full re-index.

Probably the best thing for the OP to do is to load-test both options 
and see what works for them, with their particular infrastructure, data, 
and configurations.
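
For what it's worth, a minimal sketch of such a load test in Python 
might look like the following. It times a batch of concurrent GET 
requests against a single URL and reports the median and slowest 
response. The names load_test and fetch_url are made up for this 
example, and any URL you pass in (for instance a Solr select query such 
as http://localhost:8983/solr/archivesspace/select?q=*:*) is an 
assumption about your own deployment, not something I can vouch for.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fetch_url(url: str) -> None:
    """Issue one GET request and read the whole response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def load_test(url: str, requests: int = 50, workers: int = 5,
              fetch: Callable[[str], None] = fetch_url) -> Dict[str, float]:
    """Time `requests` calls to `url` using `workers` concurrent threads."""
    if requests < 1 or workers < 1:
        raise ValueError("requests and workers must be at least 1")

    def timed(_: int) -> float:
        # Wall-clock duration of a single request, in seconds.
        start = time.perf_counter()
        fetch(url)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        timings = sorted(pool.map(timed, range(requests)))

    return {
        "count": len(timings),
        "median_s": statistics.median(timings),
        "max_s": timings[-1],
    }
```

Running load_test with the same URL and a few different values of 
workers against each setup, ideally with queries typical of your users 
rather than a match-all query, should give figures you can compare.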

Andrew.


On 05/12/2021 06:58, Peter Heiner wrote:
> Hi Kyle,
>
> We at the Cambridge University Library have been running an external Solr
> instance since the early days of our deployment.
> We use Puppet for configuration management of our local VM stack and the
> module we use is https://forge.puppetlabs.com/modules/landcareresearch/solr.
>
> We currently store over 13K resources comprising over 900K archival objects
> across 30 repositories; the number of attached notes is well above 2.2M.
> It's important to note that our initial archival object count was over 700K;
> I believe we were fairly large-scale users even initially.
>
> Our production Solr currently has just under 3 million documents in the index.
> Availability and performance of Solr is, in our experience, the single most
> common cause of ArchivesSpace downtime, ahead of Java memory leakage, planned
> maintenance due to upgrades, and database or network failures.
>
> For our scale and use, especially while we were in the migration phase,
> occasionally bringing in hundreds of thousands of archival objects, we have
> found that indexing threads could also interfere with interactive use of the
> site. After some unsuccessful attempts at fixing this by altering the
> indexer thread count, we split the indexer off to another VM. Our Ops team
> have recently identified and fixed issues with our VM infrastructure, so we're
> waiting for more monitoring data to confirm whether we still need this, but I
> imagine having Solr, AS, and the indexer on the same VM might hurt
> performance.
>
> Hope that helps,
> p
>
> Kyle Breneman wrote on 2021-12-03 16:50:29:
>> Here at the University of Baltimore, we are working to stand up our own external Solr installation in preparation for ArchivesSpace's move away from bundled Solr.  Campus IT is asking me whether or not we should install Solr on the same server we're using to host ArchivesSpace.  I see that ArchivesSpace officially has no opinion <https://archivesspace.org/archives/7137> on this matter (see under "Will you have strict requirements for how to deploy Solr?").
>>
>> Does anyone on this list have a recommendation about whether an external instance of Solr should be installed on the same server as ArchivesSpace, or on its own separate server?  I don't see that it makes much difference, but I am also not experienced in managing servers, or in administering Solr.
>>
>> Kyle Breneman
>> Integrated Digital Services Librarian
>> The University of Baltimore
>> kbreneman at ubalt.edu
>> I believe in freedom of thought and
>> freedom of speech. Do you?
>>
>> _______________________________________________
>> Archivesspace_Users_Group mailing list
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group

