[Archivesspace_Users_Group] Indexing and search issues

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Thu Mar 11 07:31:06 EST 2021


You can allocate more memory to ArchivesSpace by setting the
ASPACE_JAVA_XMX variable in the environment it runs under. Setting that
to "-Xmx4g" should be sufficient.

Those FATAL lines in the log snippet are caused by a bot probing for
known vulnerabilities in common web platforms and applications, hoping
to find a web site running an out-of-date copy (of Drupal in this case)
which it can exploit. The probe has nothing to do with ArchivesSpace,
which has no PHP code. ArchivesSpace is merely logging that it doesn't
know what to do with that request.

How did you know your one successful re-indexing completed? There are 
two indexers, Staff and PUI, with the latter usually taking much longer 
to finish. So if the PUI indexer fails after the staff indexer finishes, 
you will see more records in the staff interface than the public 
interface, even if they're all set to be public. Also, both indexers log
messages that could be interpreted as meaning they've finished, but they
then run additional indexing to build trees, to enable navigation within
collections to work. And finally they instruct Solr to commit changes,
which can be slow depending on the performance of your storage. You 
could try doubling AppConfig[:indexer_solr_timeout_seconds] to allow 
more time for each operation.
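
As a rough sketch of that change in config.rb: this assumes you are
currently on the shipped default, which I believe is 300 seconds, so
check config-defaults.rb for your version before copying the number.

```ruby
# config/config.rb
# Assumption: the default for this setting is 300 seconds, so doubling gives 600.
# Restart ArchivesSpace afterwards so the indexers pick up the new value.
AppConfig[:indexer_solr_timeout_seconds] = 600
```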

Or it could've re-indexed one repository but failed on the next. And it 
is possible for entire repositories to be set as non-public, which could 
be another explanation for fewer records.

Are you running an external Solr? If so, is the AppConfig[:solr_url] in 
config.rb pointing to the correct server?
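
One way to check is to ask that Solr directly how many documents it
holds. The script below is only a sketch: it assumes the value of
AppConfig[:solr_url] is the base URL that "/select" hangs off, and the
helper names are made up for illustration. Pass the URL from your
config.rb as the first argument.

```ruby
require "net/http"
require "json"
require "uri"

# Build a Solr select URL that returns only the match count (rows=0).
# solr_url is assumed to be the same value as AppConfig[:solr_url].
def solr_count_uri(solr_url, query = "*:*")
  params = URI.encode_www_form(q: query, rows: 0, wt: "json")
  URI("#{solr_url.chomp('/')}/select?#{params}")
end

# Extract numFound from a Solr JSON response body.
def num_found(body)
  JSON.parse(body).fetch("response").fetch("numFound")
end

# Only contact Solr when a URL is given on the command line.
if ARGV.first
  uri = solr_count_uri(ARGV.first)
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    abort "Solr answered HTTP #{response.code} at #{uri}"
  end
  puts "#{num_found(response.body)} documents in the index at #{ARGV.first}"
end
```

If the count is zero, or the request fails, the indexers are probably
writing to a different Solr from the one you expect.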

There are many possible reasons for search slowness, including not 
enough memory. Are there any differences in the speed of doing the same 
search in the staff and public interfaces? Or between two ways of
getting the same results in the PUI? For example, does the link in the
header to list all collections (/repositories/resources) return results
faster than searching everything then filtering to just collections
(/search?q[]=*&op[]=&field[]=keyword&filter_fields[]=primary_type&filter_values[]=resource)?
There's a fix coming in 3.0.0 for the latter.
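
To compare those two PUI requests, something like the following would
do. It is a sketch rather than a proper benchmark: the helper names are
illustrative, the base URL is whatever your PUI answers on (passed as
the first argument), and the query string is percent-encoded, which
should be equivalent to the bracketed form above.

```ruby
require "net/http"
require "uri"

# Paths and parameters taken from the two PUI requests described above.
LIST_PATH = "/repositories/resources"
SEARCH_PARAMS = [
  ["q[]", "*"], ["op[]", ""], ["field[]", "keyword"],
  ["filter_fields[]", "primary_type"], ["filter_values[]", "resource"]
].freeze

# Build both URLs against a given PUI base URL.
def pui_urls(base_url)
  base = base_url.chomp("/")
  {
    "list collections" => URI("#{base}#{LIST_PATH}"),
    "search + filter" => URI("#{base}/search?#{URI.encode_www_form(SEARCH_PARAMS)}")
  }
end

# Fetch a URL once and return [HTTP status code, elapsed seconds].
def timed_get(uri)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  response = Net::HTTP.get_response(uri)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [response.code, elapsed]
end

# Only make requests when a base URL is given on the command line.
if ARGV.first
  pui_urls(ARGV.first).each do |label, uri|
    code, seconds = timed_get(uri)
    puts format("%-18s HTTP %s  %.2fs", label, code, seconds)
  end
end
```

Running it against ArchivesSpace directly and then against the nginx
address would also show whether the proxy adds anything.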

Andrew.


On 11/03/2021 02:07, Tom Hanstra wrote:
> I'm very new to ArchivesSpace and so my issues may be early 
> configuration problems. But I'm hoping some out there can assist. We 
> are moving from hosted to local, so I have a large database full of 
> data that I'm working with.
>
> Indexing
> Right now, I'm running into two primary problems:
>
> - Twice now, I've hit issues where the indexing fails due to the Java 
> heap space being exhausted. Do others run into this? What do others 
> use for Java settings?
> - I've broken out my PUI indexing log into a separate log and see 
> FATAL errors in the log:
> ------
> I, [2021-03-10T15:32:03.747156 #2919]  INFO -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] Started GET "/system_api.php" for 206.189.134.38 at 2021-03-10 15:32:03 -0500
> F, [2021-03-10T15:32:03.881297 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]
> F, [2021-03-10T15:32:03.881658 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] ActionController::RoutingError (No route matches [GET] "/system_api.php"):
> F, [2021-03-10T15:32:03.881866 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]
> F, [2021-03-10T15:32:03.882085 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/debug_exceptions.rb:65:in `call'
> [1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
> ------
> Is this something to be concerned about? Why is it showing up in the 
> PUI log?
>
> Search issues
> - Supposedly, I did get one round of indexing completed without a heap 
> error. But the resulting searches yielded numbers which were incorrect 
> compared to our hosted version. This is why I've been trying 
> reindexing. Is it usual to have indexing *look* like it is complete 
> but really be incomplete?
> - When I do a search, the response is really slow. I've got nginx set 
> up as a proxy in front of ArchivesSpace and it is showing that the 
> slowness is in ArchivesSpace itself somewhere. I don't see anything in 
> the logs to show what is taking so long. Where should I be checking 
> for issues?
>
> Thanks,
> Tom
>
> -- 
> *Tom Hanstra*
> /Sr. Systems Administrator/
> hanstra at nd.edu