[Archivesspace_Users_Group] Best Way to Reindex with PUI Live?

Joshua D. Shaw Joshua.D.Shaw at dartmouth.edu
Tue Apr 28 10:47:46 EDT 2020


Thanks, Andrew! So far we haven't had any issues with indexer commit timeouts, but I'll keep that in mind.

There's an extra wrinkle that I forgot to mention. It's a 2.5.0 -> 2.7.1 upgrade, and the data the indexer gathers has changed (both from core and from some local indexer changes), so I think I'm going to follow something similar to a suggestion Mark Custer made.

Basically, I'm thinking of spinning up a clone of the 2.5.0 app and db, letting that index, and pointing our PUI at the clone. Then I'd upgrade the app and run the db and indexer updates on the production version, which is now hidden from the public, and point the PUI back to the original production system once the indexer has finished. Finally, I'd remove the clone.

Something like that seems the most foolproof in concept. And we'll be doing it during our intersession to minimize any risk.

Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Andrew Morrison <andrew.morrison at bodleian.ox.ac.uk>
Sent: Tuesday, April 28, 2020 9:20 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Best Way to Reindex with PUI Live?


Not knowing which version you are using, I cannot be absolutely sure, but for versions released in recent years, deleting the indexer_state and indexer_pui_state subfolders inside the data directory will not cause downtime or missing records for PUI users (or for staff).


If you are re-indexing because you've made changes to config.rb it will require an application restart to put the change into effect. Delete those state folders immediately after running the restart command, and the indexer will begin refreshing records in batches once it is back up and running. If the changes you've made affect how certain records are indexed (e.g. inherited_fields for archival_objects) then there will be some inconsistency until every record has been overwritten in Solr's memory by the ArchivesSpace indexer. But it is unlikely any end user will notice.
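The deletion step described above could be scripted along these lines. This is only a minimal sketch: the two subfolder names come from this message, but the helper name `clear_indexer_state` is hypothetical and the location of the data directory is an assumption that differs between installs. It leaves the Solr index itself in place.

```ruby
require "fileutils"

# Subfolder names as given in the message; everything else here is illustrative.
STATE_DIRS = %w[indexer_state indexer_pui_state].freeze

# Hypothetical helper: remove the indexer state folders under data_dir and
# return the paths that were actually deleted. Folders that do not exist
# are skipped, and the Solr index directory is not touched.
def clear_indexer_state(data_dir)
  STATE_DIRS.map { |name| File.join(data_dir, name) }
            .select { |path| File.directory?(path) }
            .each { |path| FileUtils.rm_rf(path) }
end
```

It would be called right after issuing the restart, for example `clear_indexer_state("/path/to/archivesspace/data")`, with the path replaced by wherever your install keeps its data directory.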


If you do decide to block user access during the re-index, you should note it is possible for the indexer to go into a loop when doing a full re-index, and never finish. But only if you've got lots of complex records in a single repository. That is because the last step in re-indexing each repository is to send an instruction to Solr to commit all changes in memory to disk. Depending on the speed of whatever storage layer your system uses that can take longer than 5 minutes, in which case the indexer will start again from scratch. We've set AppConfig[:indexer_solr_timeout_seconds] to 1800 to give it half an hour, to avoid this.
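For reference, the timeout change mentioned above would look roughly like this in config.rb. The setting name and the value of 1800 are taken from this message; the file path in the comment is an assumption about a typical install, and the change needs an application restart to take effect.

```ruby
# config/config.rb (path assumed; adjust for your install)
# Give the Solr commit at the end of each repository up to 30 minutes
# (1800 seconds) instead of 5 minutes, so the indexer does not start
# again from scratch on slow storage.
AppConfig[:indexer_solr_timeout_seconds] = 1800
```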


Andrew.



On 27/04/2020 21:00, Joshua D. Shaw wrote:
Hey Blake-

I usually empty the indexer state directories and the data/solr_index/index directory when I do a fresh index run, but this is the first time I've had to do a re-index while the PUI is live. Staff I can give a heads up and they typically don't work weekends anyway. But students & faculty are a different ballgame!

Do you inform users of the PUI that it's down? Or do your stats indicate that the use on weekends is low enough not to warrant that step? I'm loath to completely take down an online resource - especially now when Dartmouth is in the middle of its spring quarter.

I guess I'll try a couple of different approaches on our dev site and see which turns out to be best. If none of those work, postponing the update till early June is probably the best option for us (when classes and finals end).

Thanks!
Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Blake Carver <blake.carver at lyrasis.org>
Sent: Monday, April 27, 2020 2:47 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Best Way to Reindex with PUI Live?

Theoretically another way to do it is to update system_mtime on everything as well.

https://gist.github.com/Blake-/538c8d7cc7ade39efc372a3e3e190873
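As a rough illustration of the system_mtime idea (not a copy of the linked gist), the sketch below builds one UPDATE statement per table. The table list is a hypothetical, partial selection and the helper name `touch_statements` is made up for this example; check your own schema for which tables carry a system_mtime column, and note that `NOW()` assumes a MySQL-style database.

```ruby
# Hypothetical, partial list of tables assumed to have a system_mtime column.
# Verify against your own ArchivesSpace schema before using anything like this.
TABLES = %w[resource archival_object accession digital_object].freeze

# Build one UPDATE per table that sets system_mtime to the current time,
# so that every row looks modified the next time the indexer checks.
def touch_statements(tables = TABLES)
  tables.map { |table| "UPDATE #{table} SET system_mtime = NOW();" }
end

# Print the statements for review; nothing is executed against a database here.
puts touch_statements
```

Printing the statements rather than running them keeps the sketch safe to try, and the output can be reviewed before being fed to a database client on a dev copy.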

Someplace in the official Solr docs they say the best way to do it is to wipe everything. I've found it best to empty /data/.

We'll usually do the full reindexes on a Friday night; most sites will have finished up by Monday.
________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
Sent: Monday, April 27, 2020 1:01 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Best Way to Reindex with PUI Live?

Hi all-

Just wondering what people have been doing when they need to do a total reindex and they have a live PUI? Our reindex takes about 4-6 hours typically and I'm looking to avoid 4-6 hours of PUI downtime if at all possible.

I'm planning to just wipe the indexer_state files and leave the index itself in place while the re-index occurs, but I'm wondering if there are better/alternate methods? Theoretically the PUI should still be functional while the reindex takes place if only the indexer_state files are wiped.

Thanks!
Joshua

___________________
Joshua Shaw (he, him)
Technology Coordinator
Rauner Special Collections Library & Digital Library Technologies Group
Dartmouth College
603.646.0405



_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
