[Archivesspace_Users_Group] Solr indexing performance

Mark Cyzyk mcyzyk at jhu.edu
Tue Jul 12 09:23:39 EDT 2022


FYI, after some experimentation, here is what finally worked:

## By setting the next two options, you can control how many CPU cores are used,
## and the amount of memory that will be consumed by the indexing process (more
## cores and/or more records per thread means more memory used).
AppConfig[:indexer_records_per_thread] = 100
AppConfig[:indexer_thread_count] = 2
AppConfig[:indexer_solr_timeout_seconds] = 999999
#
## PUI Indexer Settings
AppConfig[:pui_indexer_enabled] = true
AppConfig[:pui_indexing_frequency_seconds] = 30
AppConfig[:pui_indexer_records_per_thread] = 100
AppConfig[:pui_indexer_thread_count] = 2
#

This worked on my local VM with 4 GB RAM and on our Development server with 8 GB RAM.

RAM was crucial here.  On my 4 GB VM, I tried 200, 150, and 125 records per thread, and each time the machine started to swap, then ran out of swap, then crashed.  At 100 records per thread it was still swapping, but it finished the job in decent time (a couple of hours).
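
For anyone comparing settings, here is a rough back-of-the-envelope sketch in Ruby of how many records could be held in memory at once for each value I tried. The helper `records_in_flight` is hypothetical (it is not part of ArchivesSpace), and it assumes each thread holds one full batch and that the staff and PUI indexers run at the same time with identical settings. It says nothing about bytes per record, which will vary with your data.

```ruby
# Hypothetical helper, not an ArchivesSpace API.
# Assumption: each indexer thread holds one full batch of records in memory.
def records_in_flight(records_per_thread, thread_count)
  records_per_thread * thread_count
end

THREAD_COUNT = 2 # matches :indexer_thread_count and :pui_indexer_thread_count above

# The records-per-thread values tried on the 4 GB VM.
[200, 150, 125, 100].each do |per_thread|
  staff = records_in_flight(per_thread, THREAD_COUNT)
  pui   = records_in_flight(per_thread, THREAD_COUNT)
  # Assumption: staff and PUI indexers are both active at once.
  puts "#{per_thread} records/thread: up to #{staff + pui} records in flight"
end
```

The point is only that the two settings multiply, so halving records per thread roughly halves the working set the indexer needs.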

Just so there is a record of this!

Mark

--

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Mark Cyzyk, M.A., M.L.S.
Library Applications Group
The Sheridan Libraries
The Johns Hopkins University
mcyzyk at jhu.edu

Verba volant, scripta manent.

________________________________
From: Mark Cyzyk
Sent: Monday, July 4, 2022 11:56 AM
To: archivesspace_users_group at lyralists.lyrasis.org <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Solr indexing performance

Dear ASpace User Group,

I've got ASpace 3.2.0 running against external Solr in our Development VM, but Solr is taking a loooong time building the initial index.

Like, it's been running now for a full week!

I have tweaked the config.rb settings and restarted the ASpace service, but nothing seems to speed it up.

VM:
2 cpus
8 GB RAM
Looking at used resources, it seems like a lot is still free.

My settings in config.rb:

## By setting the next two options, you can control how many CPU cores are used,
## and the amount of memory that will be consumed by the indexing process (more
## cores and/or more records per thread means more memory used).
AppConfig[:indexer_records_per_thread] = 250              # I bumped this up from 25
AppConfig[:indexer_thread_count] = 2
AppConfig[:indexer_solr_timeout_seconds] = 999999
#
## PUI Indexer Settings
AppConfig[:pui_indexer_enabled] = true
AppConfig[:pui_indexing_frequency_seconds] = 15         # I decreased this, down from 30
AppConfig[:pui_indexer_records_per_thread] = 250        # I bumped this up from 25
AppConfig[:pui_indexer_thread_count] = 2

Does anyone know how to speed up Solr indexing?  I can't seem to find the bottleneck here.

Advice appreciated,

Mark

