<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 11, 2021 at 1:25 PM Andrew Morrison <<a href="mailto:andrew.morrison@bodleian.ox.ac.uk">andrew.morrison@bodleian.ox.ac.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<i><i>> I also notice that indexing overall slows down as it gets
farther into our records.</i></i>
<p>I haven't observed a slow-down in indexing. At least not
noticeably enough to cause me to want to measure it. As I
mentioned, there are some latter stages of the indexing process
(building trees, committing changes) that can run for a long time
without logging anything. But for the main indexing of archival
objects, the last 1000 doesn't seem to take longer than the first
1000.<br></p></div></blockquote><div><i>I really only have one set of data to work from, but I've been tracking times for PUI indexing especially. During the first hour, it completed 193300 records. I'm assuming it was able to do that because it was covering ground it had already passed before the Java heap error stopped the previous attempt. But later in the processing the rate fell: in hour 3 it indexed 41500 records, in hour 12 it was down to 22250 records, and now in the last hour (hour 23) it has completed only 950 records. That is why I was asking about the slowdown.</i></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
</p>
<p><i>> Is an external Solr a good idea for a site like ours?</i>
</p>
<p>Hard to say. There was a discussion on the performance benefits
of running external Solr on here last month:</p>
<p><a href="http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2021-February/thread.html#8168" target="_blank">http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2021-February/thread.html#8168</a></p>
<p>And documentation here:</p>
<p><a href="https://archivesspace.github.io/tech-docs/provisioning/solr.html" target="_blank">https://archivesspace.github.io/tech-docs/provisioning/solr.html</a><br>
</p>
Whether it affects indexing speed for large numbers of records I
cannot say. We made the decision to use it before all our data was
migrated into ArchivesSpace.<br>
<p>
</p>
<p></p></div></blockquote><div><i>Looks as if separating Solr out is a good idea, based upon feedback. I'll work on that (assuming indexing finally finishes at some point) </i></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>Andrew.</p>
<p><br>
</p>
<div>On 11/03/2021 15:32, Tom Hanstra wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">Thanks, Andrew. Some responses intertwined below,
italicized:</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Mar 11, 2021 at 7:31
AM Andrew Morrison <<a href="mailto:andrew.morrison@bodleian.ox.ac.uk" target="_blank">andrew.morrison@bodleian.ox.ac.uk</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>You can allocate more memory to ArchivesSpace by
setting the ASPACE_JAVA_XMX environment variable it runs
under. Setting that to "-Xmx4g" should be sufficient.</p>
</div>
</blockquote>
<div><i>I did bump that and the ASPACE_JAVA_XSS up a bit for
this round, which looks like it will finally complete.
Just a few more PUI records need to be added. </i></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Those FATAL lines in the log snippet are caused by a
bot probing for known vulnerabilities in common web
platforms and applications, hoping to find a web site
running an out-of-date copy (of Drupal in this case)
which it can exploit. It has nothing to do with
ArchivesSpace, which has no PHP code. It is merely
logging that it doesn't know what to do with that
request.<br>
</p>
</div>
</blockquote>
<div><i>Thanks. I was hoping this was just extraneous. </i></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>How did you know your one successful re-indexing
completed? There are two indexers, Staff and PUI, with
the latter usually taking much longer to finish. So if
the PUI indexer fails after the staff indexer finishes,
you will see more records in the staff interface than
the public interface, even if they're all set to be
public. Also both indexers log messages that could be
interpreted as meaning they've finished, but they then
run additional indexing to build trees, to enable
navigation within collections to work. And finally they
instruct Solr to commit changes, which can be slow
depending on the performance of your storage. You could
try doubling AppConfig[:indexer_solr_timeout_seconds] to
allow more time for each operation.</p>
</div>
</blockquote>
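<div>A minimal sketch of the timeout change suggested above, as it might appear in config/config.rb. The value 300 is an assumption about the shipped default, not something stated in this thread, so check the defaults for your version before doubling. The first line is only a stand-in so the snippet runs outside ArchivesSpace.</div>

```ruby
# Stand-in so this sketch runs outside ArchivesSpace. Inside config/config.rb
# the real AppConfig already exists, so this line does nothing there.
AppConfig = {} unless defined?(AppConfig)

# Assumption: the shipped default is 300 seconds. Verify against your version.
assumed_default_seconds = 300

# Double the Solr timeout so slow commits have more time to finish.
AppConfig[:indexer_solr_timeout_seconds] = assumed_default_seconds * 2
```

<div>ArchivesSpace needs a restart to pick up a change in config.rb.</div>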
<div><i>At least one set of logs, when I earlier gave the
server more resources, showed that the indexing had
completed. But, because the second repository was showing
nothing, I decided to index again. <br>
<br>
This time around, we do have the second repository found
so that, too, indicates that things have gone better. I
guess I have to wait for things to complete but there are
still some questions outstanding. For instance, one search
I did for "football" (something dear to the Notre Dame
experience) within the repository which is supposed to be
pretty much indexed, showed over 32K results on our hosted
site but only 17K locally. That seems wildly off with only
a few PUI records to be completed (log shows 736500 of
763368). Could the incomplete index really be that far
off?</i></div>
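<div>To put the numbers in the paragraph above side by side, here is a rough arithmetic sketch. It uses only the figures quoted there. The search counts are the rounded "32K" and "17K" values, so the second percentage is approximate.</div>

```ruby
# Figures quoted in the message above.
indexed_so_far = 736_500   # PUI records the log reports as indexed
total_records  = 763_368   # total PUI records the log reports
hosted_hits    = 32_000    # approximate "football" hits on the hosted site
local_hits     = 17_000    # approximate "football" hits on the local site

# Share of PUI records indexed so far, and share of expected hits found.
index_fraction = indexed_so_far.to_f / total_records
hits_fraction  = local_hits.to_f / hosted_hits

puts format("Index progress:  %.1f%%", index_fraction * 100)
puts format("Search coverage: %.1f%%", hits_fraction * 100)
puts format("Records left:    %d", total_records - indexed_so_far)
```

<div>With roughly 96% of the records indexed but only about half of the expected hits returned, the unfinished records alone probably do not explain the gap, unless the remaining ones happen to be unusually rich in matches.</div>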
<div><i><br>
</i></div>
<div><i>I also notice that indexing overall slows down as it
gets farther into our records. Is that probably because
there is just more to be done with the records that might
not have gotten done in earlier attempts while the first
records buzz by rapidly because of earlier indexing
attempts? Or could it be that resources are taken up
early in the processing and no longer available for
processing the later records? Is resource tuning just a
trial/error prospect? I don't see a lot of information in
the documentation.</i><br>
<div>
<p>Or it could've re-indexed one repository but failed on
the next. And it is possible for entire repositories to
be set as non-public, which could be another explanation
for fewer records.</p>
<p>Are you running an external Solr? If so, is the
AppConfig[:solr_url] in config.rb pointing to the
correct server?<br>
</p>
</div>
</div>
<div><i>I'm running a local Solr as part of the application.
Is an external Solr a good idea for a site like ours? I
will also do some tweaking with the Solr settings to see
if that might help...after I get through at least one
complete index. </i></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>There are many possible reasons for search slowness,
including not enough memory. Are there any differences
in the speed of doing the same search in the staff and
public interfaces? Or between two ways of getting the
same results in the PUI? For example, does the link in
the header to list all collections
(/repositories/resources) return results faster than
searching everything then filtering to just collections
(/search?q[]=*&op[]=&field[]=keyword&filter_fields[]=primary_type&filter_values[]=resource)?
There's a fix coming in 3.0.0 for the latter.</p>
</div>
</blockquote>
<div><i>I had not tried comparing staff to public. I will do
that (though I first have to get some access to the staff
side!). And I'll really not try to do much comparison
until we get indexing complete, in case the indexing
itself is slowing things down. </i></div>
<div><i><br>
</i></div>
<div><i>More questions to come, I'm sure. But thanks for your
input and ideas of places to look further. Much
appreciated.</i></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Andrew.</p>
<p><br>
</p>
<div>On 11/03/2021 02:07, Tom Hanstra wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I'm very new to ArchivesSpace and so my
issues may be early configuration problems. But I'm
hoping some out there can assist. We are moving from
hosted to local, so I have a large database full of
data that I'm working with.
<div><br>
</div>
<div>Indexing<br>
<div>Right now, I'm running into two primary
problems:</div>
<div><br>
</div>
<div>- Twice now, I've hit issues where the indexing
fails due to the Java heap space being exhausted.
Do others run into this? What do others use for
Java settings?</div>
<div>- I've broken out my PUI indexing log into a
separate log and see FATAL errors in the log:<br>
------</div>
<div>I, [2021-03-10T15:32:03.747156 #2919] INFO -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] Started GET "/system_api.php" for 206.189.134.38 at 2021-03-10 15:32:03 -0500<br>
F, [2021-03-10T15:32:03.881297 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]<br>
F, [2021-03-10T15:32:03.881658 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] ActionController::RoutingError (No route matches [GET] "/system_api.php"):<br>
F, [2021-03-10T15:32:03.881866 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]<br>
F, [2021-03-10T15:32:03.882085 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/debug_exceptions.rb:65:in `call'<br>
[1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'</div>
<div>------</div>
<div>Is this something to be concerned about? Why is
it showing up in the PUI log?</div>
<div><br>
</div>
<div>Search issues</div>
<div>- Supposedly, I did get one round of indexing
completed without a heap error. But the resulting
searches yielded numbers which were incorrect
compared to our hosted version. This is why I've
been trying reindexing. Is it usual to have
indexing *look* like it is complete but really be
incomplete?</div>
<div>- When I do a search, the response is really
slow. I've got nginx set up as a proxy in front of
ArchivesSpace and it is showing that the slowness
is in ArchivesSpace itself somewhere. I don't see
anything in the logs to show what is taking so
long. Where should I be checking for issues?</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Tom</div>
<div>
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr">
<div><b style="font-family:arial,helvetica,sans-serif;font-size:12.7273px;color:rgb(136,136,136)">Tom
Hanstra</b><br>
</div>
<div style="color:rgb(136,136,136);font-size:12.8px">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div style="font-size:12.7273px">
<div>
<div><i style="font-size:12.7273px;font-family:arial,helvetica,sans-serif">Sr.
Systems Administrator</i></div>
<div><a href="mailto:hanstra@nd.edu" style="color:rgb(17,85,204);font-size:12.7273px;font-family:arial,helvetica,sans-serif" target="_blank">hanstra@nd.edu</a><br>
</div>
</div>
<div><span style="font-family:arial,helvetica,sans-serif"><br>
</span></div>
</div>
<div style="font-size:12.7273px"><img src="https://docs.google.com/uc?export=download&id=1GFX1KaaMTtQ2Kg2u8bMXt1YwBp96bvf0&revid=0B7APN9POn6xAQ244WWFYMFU3aVJwZ0lxbmVHK3FxNXlCd0RRPQ"><br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
Archivesspace_Users_Group mailing list
<a href="mailto:Archivesspace_Users_Group@lyralists.lyrasis.org" target="_blank">Archivesspace_Users_Group@lyralists.lyrasis.org</a>
<a href="http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group" target="_blank">http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group</a>
</pre>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
</div>
<br>
<fieldset></fieldset>
</blockquote>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div><b style="font-family:arial,helvetica,sans-serif;font-size:12.7273px;color:rgb(136,136,136)">Tom Hanstra</b><br></div><div style="color:rgb(136,136,136);font-size:12.8px"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div style="font-size:12.7273px"><div><div><i style="font-size:12.7273px;font-family:arial,helvetica,sans-serif">Sr. Systems Administrator</i></div><div><a href="mailto:hanstra@nd.edu" style="color:rgb(17,85,204);font-size:12.7273px;font-family:arial,helvetica,sans-serif" target="_blank">hanstra@nd.edu</a><br></div></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div></div><div style="font-size:12.7273px"><img src="https://docs.google.com/uc?export=download&id=1GFX1KaaMTtQ2Kg2u8bMXt1YwBp96bvf0&revid=0B7APN9POn6xAQ244WWFYMFU3aVJwZ0lxbmVHK3FxNXlCd0RRPQ"><br></div></div></div></div></div></div></div></div></div></div></div></div>