[Archivesspace_Users_Group] strategies for piecemeal re-index

Mark Cooper mark.cooper at lyrasis.org
Wed Jul 22 21:01:25 EDT 2015


Hi Maureen,

We haven't *had* to do it in production so far, but we have tested deleting files in indexer_state to trigger reindexing for just the targeted record type. It worked for our test cases, well enough to feel it could work if there were an urgent need for it prior to a full rebuild or backup restore.
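
As a rough illustration of the indexer_state approach described above, here is a minimal Python sketch that finds (and optionally deletes) the state files for one record type. It rests on assumptions not stated in this thread: that state files are named "<repository id>_<record type>.dat", and that the indexer is stopped while files are removed. The function names are hypothetical, and deletion is off by default so the matches can be reviewed first.

```python
import re
from pathlib import Path


def find_state_files(state_dir, record_type, repo_id=None):
    """Return state files for one record type, sorted by name.

    Assumes files are named "<repo id>_<record type>.dat"; check your own
    indexer_state directory before relying on that pattern.
    """
    repo = r"\d+" if repo_id is None else re.escape(str(repo_id))
    name_re = re.compile(rf"{repo}_{re.escape(record_type)}\.dat")
    return sorted(
        p for p in Path(state_dir).iterdir()
        if p.is_file() and name_re.fullmatch(p.name)
    )


def reset_record_type(state_dir, record_type, repo_id=None, dry_run=True):
    """List the matching state files and delete them unless dry_run is set."""
    targets = find_state_files(state_dir, record_type, repo_id)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling reset_record_type("/path/to/data/indexer_state", "accession") only reports what would be removed; passing dry_run=False performs the deletion. The path shown is a placeholder.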

You could restore from a Solr backup taken as far back as necessary, if you have one (such as those produced by ArchivesSpace's backup script, which includes the indexer state), and let the indexer bring it up to date. That could save you from having to do a full rebuild. If you don't have such backups, or aren't sure when the index was last known to be good, I think Brian's suggested approach may be the best one if you've got capacity for it.

I can't think of an obvious way to make Solr replication work to your advantage in the way I think Claire is describing it. ArchivesSpace can only point at a single Solr instance (which can be external), so you can't keep pointing at the broken index on one server while rebuilding on another. It's not like other applications I've used where the indexing happens (or can be triggered / configured) independently of the main app.

The last thing I can think of (and probably not too helpful for your immediate need) is to try to speed up indexing aggressively, perhaps by disabling everything apart from the backend, indexer and Solr overnight and tuning the configuration settings to maximize indexing speed. Threads and records per thread are the most likely candidates for experimentation, but it's hard to predict how much benefit this will offer and it is likely to be very "spec" dependent, so your mileage may vary.

Mark

Mark Cooper
Technical Lead, Hosting and Support
LYRASIS
email: mark.cooper at lyrasis.org
skype: mark_c_cooper

________________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Brian Hoffman <brianjhoffman at gmail.com>
Sent: Wednesday, July 22, 2015 1:41 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] strategies for piecemeal re-index

Hi Maureen,

Perhaps you could try using a second instance of ArchivesSpace to build the new index, then copying that index over your production ArchivesSpace index. It’s not elegant but I can’t think of anything better.
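
The copy-over step suggested above could look something like the following Python sketch. It assumes the index lives in a single directory on the same filesystem as its backup, that production ArchivesSpace (and Solr) is stopped during the swap, and that the matching indexer state would likely need to be copied the same way. The function name and all paths are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def swap_index(new_index: Path, live_index: Path) -> Optional[Path]:
    """Replace live_index with a copy of new_index.

    The previous live index is renamed to a timestamped backup rather than
    deleted; its path is returned (None if there was no live index).
    """
    if not new_index.is_dir():
        raise FileNotFoundError(f"no rebuilt index at {new_index}")

    # Copy next to the live directory first so the final step is a rename.
    staging = live_index.with_name(live_index.name + ".incoming")
    if staging.exists():
        shutil.rmtree(staging)
    shutil.copytree(new_index, staging)

    backup = None
    if live_index.exists():
        backup = live_index.with_name(
            live_index.name + time.strftime(".bak-%Y%m%d%H%M%S")
        )
        live_index.rename(backup)

    staging.rename(live_index)
    return backup
```

Keeping the old index as a backup means the swap can be reversed by renaming the directories back if the rebuilt index turns out to be incomplete.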

Brian


On Jul 22, 2015, at 4:26 PM, Callahan, Maureen <maureen.callahan at yale.edu> wrote:

> Claire, this sounds like a very sensible solution. Thank you! I’m eager to hear any thoughts that Chris or Brian may have. I (or someone else from Yale) may also be in touch directly to learn more about your process.
>
> Many thanks,
> Maureen
>
>
>> On Jul 22, 2015, at 4:10 PM, KNOWLES Claire <Claire.Knowles at ed.ac.uk> wrote:
>>
>> Hi Maureen,
>>
>> With another service we run we replicate the SOLR. If we want to do a full
>> reindex we then point the webapp to the replicated SOLR and turn off
>> replication. I'm not sure if this solution will work with ArchivesSpace,
>> I'm sure Chris can advise.
>>
>> Claire
>>
>> --
>> Claire Knowles
>> Library Digital Development Manager
>> Library and University Collections, Information Services
>> University of Edinburgh
>> Tel: 0131 6503023
>> Email: claire.knowles at ed.ac.uk
>>
>>
>>
>>
>>
>> On 22/07/2015 15:37,
>> "archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of
>> Callahan, Maureen"
>> <archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of
>> maureen.callahan at yale.edu> wrote:
>>
>>> Hey everyone,
>>>
>>> At some point, our index got totally jacked. Unfortunately, our database
>>> is way too big to be able to do a full re-index overnight and we're
>>> reluctant to leave it going over the weekend in case something goes wrong.
>>>
>>> Also, our Aeon requesting service relies on a webservice built on top of
>>> ArchivesSpace, so we can't have the index unavailable for too long, even
>>> if it's not during normal working hours.
>>>
>>> Has anyone played with re-indexing a bit at a time by deleting records
>>> from indexer_state? Is this a reliable way to fix index problems? Does
>>> anyone have thoughts on other strategies?
>>>
>>> Thanks,
>>> Maureen
>>> _______________________________________________
>>> Archivesspace_Users_Group mailing list
>>> Archivesspace_Users_Group at lyralists.lyrasis.org
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lyralists.lyrasis.org_mailman_listinfo_archivesspace-5Fusers-5Fgroup&d=AwIFAw&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=JgH2YCQ8D3P9-Lm_x4bv3d2CZBYlbx6hxnLFHtfovi8&m=tn9RUtSjPMsjMstlHFo9h7W0l1cfREohyVoURWtmSxM&s=SEcF2mqDEkndNEGCVVYmk--uNFqLRX112AYMrnZD2L0&e=
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>


