[Archivesspace_Users_Group] Method to Pause Indexer during Job Run?

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Fri Feb 21 11:37:23 EST 2020


Thank you for the information and thoughts.

Andrew.


On Thu, 2020-02-20 at 22:17 +0000, Joshua D. Shaw wrote:
To close this out, I decided to go with a migration because of the time involved to update. A migration takes about 10-15 minutes, but the job looked like it was gonna take about 10-20 *hours* to complete.

I also got an error about the position constraint failing in the db during a job run, which may be down to a locking issue? I didn't chase it down enough because I was testing the migration and realized the time benefit.

On the ANW-902 issue, if there were a way to get the state of the current indexer run (not just whether the indexer is on or off), then I think it would be doable to:

1) wait until the current indexer run completes
2) send a pause indexer update
3) run the job
4) send a resume indexer update

It's that first step that needs some research/thought.
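The four steps above could be sketched roughly as follows. This is only an illustration, not anything that exists in ArchivesSpace: with_indexer_paused and its pause:, resume: and wait_for_idle: arguments are hypothetical names, and step 1 is left as a do-nothing placeholder because, as noted, there is no known way to ask the indexer whether a run is in progress.

```ruby
# Hypothetical wrapper around a long-running job.
# pause/resume are callables that would send the PUT to the indexer;
# wait_for_idle is a placeholder for step 1, which has no known endpoint.
def with_indexer_paused(pause:, resume:, wait_for_idle: -> {})
  wait_for_idle.call # 1) wait until the current indexer run completes (placeholder)
  pause.call         # 2) send a pause indexer update
  yield              # 3) run the job
ensure
  # 4) send a resume indexer update. This runs even if the job (or an
  #    earlier step) raises, so the indexer is not left paused by accident.
  resume.call
end

# Usage with stand-in callables that only print what they would do.
with_indexer_paused(
  pause:  -> { puts 'pause indexer' },
  resume: -> { puts 'resume indexer' }
) do
  puts 'run the job'
end
```

Since a pause expires after its duration (900 seconds by default, per the endpoint code quoted further down the thread), a job that runs for hours would presumably need either a long explicit duration or a periodic re-pause inside the block.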

Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
Sent: Wednesday, February 19, 2020 9:56 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Method to Pause Indexer during Job Run?

One additional little bit of info that may complicate things. It seems like the pause will not shut down an indexer run midway. It will only take effect after the current run is complete. At least that seems to be the case. For the import jobs, that may make things a bit dicey if a large index run is in progress when the import job kicks off, since you could still get that sync happening. I guess you could wait for the indexer to complete (not sure how to get that status) and then kick off the import job? Or, perhaps the import jobs should add a final step that iterates through all of the created objects and sets the mtime to the import job completion time?

Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
Sent: Wednesday, February 19, 2020 9:40 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Method to Pause Indexer during Job Run?

I think I've got it going by adding a little loop that checks the indexer state via the response body of the indexer (i.e., does it contain the string "paused") and then sends a PUT with the duration parameter set to either pause (you can specify an exact duration or just let it default to 900 seconds) or resume (duration = 0). Things to note:

1) For some reason, the indexer actually listens at AppConfig[:indexer_url]/aspace-indexer/ not just AppConfig[:indexer_url]
2) There's no nice ASHTTP wrapper for put, so you have to construct the Net::HTTP for the put yourself
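A rough sketch of what that might look like with plain Net::HTTP, based only on the description above. The helper names (indexer_uri, indexer_paused?, build_pause_request, indexer_status, set_indexer_pause) are made up for illustration, and the duration is sent as form data on the assumption that the indexer's Sinatra app picks it up from params either way. Inside a plugin or job the base URL would come from AppConfig[:indexer_url].

```ruby
require 'net/http'
require 'uri'

# Per note 1, the indexer answers under /aspace-indexer/, not the bare URL.
def indexer_uri(indexer_url)
  URI("#{indexer_url.chomp('/')}/aspace-indexer/")
end

# The GET response contains the word "paused" while the indexer is paused.
def indexer_paused?(status_body)
  status_body.include?('paused')
end

# Per note 2, build the PUT by hand. duration is in seconds; 0 resumes.
def build_pause_request(uri, duration)
  request = Net::HTTP::Put.new(uri)
  request.set_form_data('duration' => duration.to_s)
  request
end

# Fetch the current status string from the indexer.
def indexer_status(indexer_url)
  Net::HTTP.get(indexer_uri(indexer_url))
end

# Send the pause (or resume) request and return the response body.
def set_indexer_pause(indexer_url, duration)
  uri = indexer_uri(indexer_url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_pause_request(uri, duration)).body
  end
end
```

Something like set_indexer_pause(AppConfig[:indexer_url], 3600) would then pause for an hour, and set_indexer_pause(AppConfig[:indexer_url], 0) would resume, with indexer_paused?(indexer_status(AppConfig[:indexer_url])) as the check in between.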

I don't see why similar logic couldn't be incorporated into any import job so that the import has a chance to finish up before the indexer runs again, preventing the sync issues in ANW-902.

I've got about 280k objects to check and update, so I'll see if I run into any indexer issues once the job is completed. The only thing I've seen that may be related to that is a snapshot failure when doing a large index run (full or otherwise), but I don't think I've ever seen it completely fail due to a commit timeout. That almost sounds more like disk access or network (if you're running a separate Solr instance).

Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Andrew Morrison <andrew.morrison at bodleian.ox.ac.uk>
Sent: Wednesday, February 19, 2020 8:32 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Method to Pause Indexer during Job Run?

I'd be interested in hearing if you get this to work, because it could be useful in fixing this issue:

https://archivesspace.atlassian.net/browse/ANW-902

Also, if you're making a truly mammoth update, which will be followed by a re-index of nearly everything, you might want to consider increasing the AppConfig[:indexer_solr_timeout_seconds] config setting. It may be our infrastructure, but I've found that Solr's commit phase can take so long that ArchivesSpace times out before it finishes, causing it to start the whole re-index again from scratch. We've set it to 1800 to avoid this, but YMMV.
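For reference, a setting like this would normally go in config/config.rb. The value below is just the one mentioned above, not a general recommendation:

```ruby
# config/config.rb
# Give Solr longer to finish a commit before the indexer gives up.
# 1800 is the value reported in this thread; tune it for your own setup.
AppConfig[:indexer_solr_timeout_seconds] = 1800
```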

Andrew.

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
Sent: 19 February 2020 13:05
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Method to Pause Indexer during Job Run?

Thanks, James. I glanced at that, but somehow didn't realize those were endpoints I could hit. I'll give it a go!

Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of James Bullen <james at hudmol.com>
Sent: Tuesday, February 18, 2020 7:16 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Method to Pause Indexer during Job Run?


Hi Joshua,

I haven’t used it, but I see these endpoints in indexer/app/main.rb

  get "/" do
    if IndexerCommon.paused?
      "Indexers paused until #{IndexerCommon.class_variable_get(:@@paused_until)}"
    else
      "Running every #{AppConfig[:solr_indexing_frequency_seconds].to_i} seconds. "
    end
  end

  # this pauses the indexer so that bulk update and migrations can happen
  # without bogging down the server
  put "/" do
    duration = params[:duration].nil? ? 900 : params[:duration].to_i
    IndexerCommon.pause duration
    "#{IndexerCommon.class_variable_get(:@@paused_until)}"
  end


Seems to do what you want.


Cheers,
James


On Feb 19, 2020, at 6:29 AM, Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu> wrote:

Hey all-

I'm writing a job that may take a *long* time (hours) to complete and that will be updating a *lot* of AO records. I'm wondering if there's a way to pause the Indexer during a job so that I can let the Indexer do its thing *after* the job completes. I know I can toggle the AppConfig value for the indexer and do a stop/start for the app, but ideally I'd like to do the pause/resume of the Indexer while the job runs.

I could also set this up as a migration, but the updates include a bunch of tables (I'm adding an instance to AOs which meet certain criteria) and I'd prefer to use the API to do things to be safe.

Any thoughts on pausing the Indexer during a job, or do I bite the bullet and do this as a migration?

Thanks!
Joshua

___________________
Joshua Shaw (he, him)
Technology Coordinator
Rauner Special Collections Library & Digital Library Technologies Group
Dartmouth College
603.646.0405
_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


