<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 11, 2021 at 1:25 PM Andrew Morrison &lt;<a href="mailto:andrew.morrison@bodleian.ox.ac.uk">andrew.morrison@bodleian.ox.ac.uk</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  
  <div>
    <i>&gt; I also notice that indexing overall slows down as it gets
        farther into our records.</i>
    <p>I haven't observed a slow-down in indexing. At least not
      noticeably enough to cause me to want to measure it. As I
      mentioned, there are some later stages of the indexing process
      (building trees, committing changes) that can run for a long time
      without logging anything. But for the main indexing of archival
      objects, the last 1000 doesn't seem to take longer than the first
      1000.<br></p></div></blockquote><div><i>I really only have one set of data to work from, but I've been tracking times for PUI indexing in particular. During the first hour, it completed 193300 records. I assume it managed that because it was covering ground it had already passed before the Java heap error stopped the previous attempt. But later in the processing the rate dropped: in hour 3 it indexed 41500 records, in hour 12 it was down to 22250 records, and in the last hour (hour 23) it has completed only 950 records. That is why I was asking about the slowdown.</i></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>
    </p>
    <p><i>&gt; Is an external Solr a good idea for a site like ours?</i>
    </p>
    <p>Hard to say. There was a discussion on the performance benefits
      of running external Solr on here last month:</p>
    <p><a href="http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2021-February/thread.html#8168" target="_blank">http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2021-February/thread.html#8168</a></p>
    <p>And documentation here:</p>
    <p><a href="https://archivesspace.github.io/tech-docs/provisioning/solr.html" target="_blank">https://archivesspace.github.io/tech-docs/provisioning/solr.html</a><br>
    </p>
    Whether it affects indexing speed for large numbers of records I
    cannot say. We made the decision to use it before all our data was
    migrated into ArchivesSpace.<br>
    <p>
    </p>
    <p></p></div></blockquote><div><i>It looks as if separating Solr out is a good idea, based on the feedback. I'll work on that (assuming indexing finally finishes at some point).</i></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>Andrew.</p>
    <p><br>
    </p>
    <div>On 11/03/2021 15:32, Tom Hanstra wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">Thanks, Andrew. Some responses intertwined below,
          italicized:</div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Thu, Mar 11, 2021 at 7:31
            AM Andrew Morrison &lt;<a href="mailto:andrew.morrison@bodleian.ox.ac.uk" target="_blank">andrew.morrison@bodleian.ox.ac.uk</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>You can allocate more memory to ArchivesSpace by
                setting the ASPACE_JAVA_XMX environment variable it runs
                under. Setting that to "-Xmx4g" should be sufficient.</p>
            </div>
          </blockquote>
          <div><i>I did bump that and the ASPACE_JAVA_XSS up a bit for
              this round, which looks like it will finally complete.
              Just a few more PUI records need to be added. </i></div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Those FATAL lines in the log snippet are caused by a
                bot probing for known vulnerabilities in common web
                platforms and applications, hoping to find a web site
                running an out-of-date copy (of Drupal in this case)
                which it can exploit. It has nothing to do with
                ArchivesSpace, which has no PHP code. It is merely
                logging that it doesn't know what to do with that
                request.<br>
              </p>
            </div>
          </blockquote>
          <div><i>Thanks. I was hoping this was just extraneous. </i></div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>How did you know your one successful re-indexing
                completed? There are two indexers, Staff and PUI, with
                the latter usually taking much longer to finish. So if
                the PUI indexer fails after the staff indexer finishes,
                you will see more records in the staff interface than
                the public interface, even if they're all set to be
                public. Also both indexers log messages that could be
                interpreted as meaning they've finished, but they then
                run additional indexing to build trees, to enable
                navigation within collections to work. And finally they
                instruct Solr to commit changes, which can be slow
                depending on the performance of your storage. You could
                try doubling AppConfig[:indexer_solr_timeout_seconds] to
                allow more time for each operation.</p>
            </div>
          </blockquote>
          <div><i>At least one set of logs, when I earlier gave the
              server more resources, showed that the indexing had
              completed. But, because the second repository was showing
              nothing, I decided to index again. <br>
              <br>
              This time around, we do have the second repository found
              so that, too, indicates that things have gone better. I
              guess I have to wait for things to complete but there are
              still some questions outstanding. For instance, one search
              I did for "football" (something dear to the Notre Dame
              experience) within the repository which is supposed to be
              pretty much indexed, showed over 32K results on our hosted
              site but only 17K locally. That seems wildly off with only
              a few PUI records to be completed (log shows 736500 of
              763368). Could the incomplete index really be that far
              off?</i></div>
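          <div>To put the two gaps side by side, here is a minimal sketch of the arithmetic using only the figures quoted above. The variable names are illustrative, the hit counts are the rounded "32K" and "17K" from the message, and nothing here queries ArchivesSpace or Solr.</div>
<pre>
```python
# Figures quoted in the message above; nothing here queries ArchivesSpace.
indexed = 736500        # PUI records the log reports as indexed so far
total = 763368          # total PUI records the log reports
local_hits = 17000      # approximate "football" results on the local copy
hosted_hits = 32000     # approximate "football" results on the hosted site


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


index_progress = percent(indexed, total)
search_coverage = percent(local_hits, hosted_hits)
remaining = total - indexed

print(f"index progress:  {index_progress}%")
print(f"records left:    {remaining}")
print(f"search coverage: {search_coverage}%")
```
</pre>
          <div>If index progress is far higher than search coverage, the records still to be indexed are unlikely to explain the whole difference on their own, which is the concern raised in the question above.</div>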
          <div><i><br>
            </i></div>
          <div><i>I also notice that indexing overall slows down as it
              gets farther into our records. Is that probably because
              there is just more to be done with the records that might
              not have gotten done in earlier attempts while the first
              records buzz by rapidly because of earlier indexing
              attempts?  Or could it be that resources are taken up
              early in the processing and no longer available for
              processing the later records? Is resource tuning just a
              trial/error prospect?  I don't see a lot of information in
              the documentation.</i><br>
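            <div>The hourly PUI counts reported in the most recent message of this thread (193300 in hour 1, 41500 in hour 3, 22250 in hour 12, 950 in hour 23) make the slowdown easy to quantify. The sketch below is only illustrative arithmetic on those four numbers. Using hour 3 as the baseline is an assumption, made because hour 1 was probably re-covering records indexed in an earlier attempt.</div>
<pre>
```python
# Hourly PUI indexing counts quoted in this thread (hour -> records indexed).
records_per_hour = {1: 193300, 3: 41500, 12: 22250, 23: 950}


def per_minute(records_in_hour):
    """Average number of records indexed per minute during that hour."""
    return records_in_hour / 60.0


def relative_to(baseline, value):
    """Express value as a percentage of baseline, rounded to one decimal."""
    return round(100.0 * value / baseline, 1)


# Assumption: hour 1 re-covered already-indexed records, so hour 3 is the baseline.
baseline = records_per_hour[3]
for hour in sorted(records_per_hour):
    count = records_per_hour[hour]
    print(f"hour {hour:2d}: {per_minute(count):8.1f} records/min, "
          f"{relative_to(baseline, count):6.1f}% of hour 3")
```
</pre>
            <div>Logging the same figure every hour during a full reindex would show whether the rate falls steadily or drops at a particular point, for example when the indexer moves on to a different record type.</div>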
            <div>
              <p>Or it could've re-indexed one repository but failed on
                the next. And it is possible for entire repositories to
                be set as non-public, which could be another explanation
                for fewer records.</p>
              <p>Are you running an external Solr? If so, is the
                AppConfig[:solr_url] in config.rb pointing to the
                correct server?<br>
              </p>
            </div>
          </div>
          <div><i>I'm running a local Solr as part of the application.
              Is an external Solr a good idea for a site like ours? I
              will also do some tweaking with the Solr settings to see
              if that might help...after I get through at least one
              complete index. </i></div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>There are many possible reasons for search slowness,
                including not enough memory. Are there any differences
                in the speed of doing the same search in the staff and
                public interfaces? Or between two ways of getting the
                same results in the PUI? For example, does the link in
                the header to list all collections
                (/repositories/resources) return results faster than
                searching everything then filtering to just collections
(/search?q[]=*&op[]=&field[]=keyword&filter_fields[]=primary_type&filter_values[]=resource)?
                There's a fix coming in 3.0.0 for the latter.</p>
            </div>
          </blockquote>
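          <div>One way to run the comparison suggested above is to time both PUI requests from a script. This is a rough sketch, not a benchmark: the base URL and the function names are assumptions for illustration, and the two paths are the ones quoted in the message. The search parameters are percent-encoded by urlencode, which is equivalent to the bracketed form shown above.</div>
<pre>
```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL for a local PUI; replace with your own host and port.
PUI_BASE = "http://localhost:8081"


def listing_url(base):
    """URL of the header link that lists all collections."""
    return base + "/repositories/resources"


def filtered_search_url(base):
    """URL that searches everything, then filters to resources only."""
    params = [
        ("q[]", "*"),
        ("op[]", ""),
        ("field[]", "keyword"),
        ("filter_fields[]", "primary_type"),
        ("filter_values[]", "resource"),
    ]
    return base + "/search?" + urlencode(params)


def time_request(url, fetch=urlopen):
    """Return the seconds taken to fetch url and read the whole response."""
    start = time.perf_counter()
    with fetch(url) as response:
        response.read()
    return time.perf_counter() - start


# Print the two URLs to compare; call time_request(url) on each to time them.
for url in (listing_url(PUI_BASE), filtered_search_url(PUI_BASE)):
    print(url)
```
</pre>
          <div>Calling time_request on each URL several times and comparing the typical values gives a fairer picture than a single request, since the first hit after a restart may be slower than later ones.</div>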
          <div><i>I had not tried comparing staff to public. I will do
              that (though I first have to get some access to the staff
              side!). And I won't really try to do much comparison
              until we get indexing complete, in case the indexing
              itself is slowing things down. </i></div>
          <div><i><br>
            </i></div>
          <div><i>More questions to come, I'm sure. But thanks for your
              input and ideas of places to look further. Much
              appreciated.</i></div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Andrew.</p>
              <p><br>
              </p>
              <div>On 11/03/2021 02:07, Tom Hanstra wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">I'm very new to ArchivesSpace and so my
                  issues may be early configuration problems. But I'm
                  hoping some out there can assist. We are moving from
                  hosted to local, so I have a large database full of
                  data that I'm working with.
                  <div><br>
                  </div>
                  <div>Indexing<br>
                    <div>Right now, I'm running into two primary
                      problems:</div>
                    <div><br>
                    </div>
                    <div>- Twice now, I've hit issues where the indexing
                      fails due to the Java heap space being exhausted.
                      Do others run into this? What do others use for
                      Java settings?</div>
                    <div>- I've broken out my PUI indexing log into a
                      separate log and see FATAL errors in the log:<br>
                      ------</div>
                    <div>I, [2021-03-10T15:32:03.747156 #2919]  INFO -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] Started GET "/system_api.php" for 206.189.134.38 at 2021-03-10 15:32:03 -0500<br>
                      F, [2021-03-10T15:32:03.881297 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]<br>
                      F, [2021-03-10T15:32:03.881658 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] ActionController::RoutingError (No route matches [GET] "/system_api.php"):<br>
                      F, [2021-03-10T15:32:03.881866 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5]<br>
                      F, [2021-03-10T15:32:03.882085 #2919] FATAL -- : [1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/debug_exceptions.rb:65:in `call'<br>
                      [1b34df32-d3b7-49c3-b205-01a59daf03e5] actionpack (5.2.4.4) lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'</div>
                    <div>------</div>
                    <div>Is this something to be concerned about? Why is
                      it showing up in the PUI log?</div>
                    <div><br>
                    </div>
                    <div>Search issues</div>
                    <div>- Supposedly, I did get one round of indexing
                      completed without a heap error. But the resulting
                      searches yielded numbers which were incorrect
                      compared to our hosted version. This is why I've
                      been trying reindexing. Is it usual to have
                      indexing *look* like it is complete but really be
                      incomplete?</div>
                    <div>- When I do a search, the response is really
                      slow. I've got nginx set up as a proxy in front of
                      ArchivesSpace and it is showing that the slowness
                      is in ArchivesSpace itself somewhere. I don't see
                      anything in the logs to show what is taking so
                      long. Where should I be checking for issues?</div>
                    <div><br>
                    </div>
                    <div>Thanks,</div>
                    <div>Tom</div>
                    <div>
                      <div><br>
                      </div>
                      -- <br>
                      <div dir="ltr">
                        <div dir="ltr">
                          <div>
                            <div dir="ltr">
                              <div dir="ltr">
                                <div><b style="font-family:arial,helvetica,sans-serif;font-size:12.7273px;color:rgb(136,136,136)">Tom
                                    Hanstra</b><br>
                                </div>
                                <div style="color:rgb(136,136,136);font-size:12.8px">
                                  <div dir="ltr">
                                    <div dir="ltr">
                                      <div dir="ltr">
                                        <div dir="ltr">
                                          <div style="font-size:12.7273px">
                                            <div>
                                              <div><i style="font-size:12.7273px;font-family:arial,helvetica,sans-serif">Sr.
                                                  Systems Administrator</i></div>
                                              <div><a href="mailto:hanstra@nd.edu" style="color:rgb(17,85,204);font-size:12.7273px;font-family:arial,helvetica,sans-serif" target="_blank">hanstra@nd.edu</a><br>
                                              </div>
                                            </div>
                                            <div><span style="font-family:arial,helvetica,sans-serif"><br>
                                              </span></div>
                                          </div>
                                          <div style="font-size:12.7273px"><img src="https://docs.google.com/uc?export=download&id=1GFX1KaaMTtQ2Kg2u8bMXt1YwBp96bvf0&revid=0B7APN9POn6xAQ244WWFYMFU3aVJwZ0lxbmVHK3FxNXlCd0RRPQ"><br>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
                <br>
                <fieldset></fieldset>
                <pre>_______________________________________________
Archivesspace_Users_Group mailing list
<a href="mailto:Archivesspace_Users_Group@lyralists.lyrasis.org" target="_blank">Archivesspace_Users_Group@lyralists.lyrasis.org</a>
<a href="http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group" target="_blank">http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group</a>
</pre>
              </blockquote>
            </div>
          </blockquote>
        </div>
        <br clear="all">
      </div>
      <br>
    </blockquote>
  </div>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div><b style="font-family:arial,helvetica,sans-serif;font-size:12.7273px;color:rgb(136,136,136)">Tom Hanstra</b><br></div><div style="color:rgb(136,136,136);font-size:12.8px"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div style="font-size:12.7273px"><div><div><i style="font-size:12.7273px;font-family:arial,helvetica,sans-serif">Sr. Systems Administrator</i></div><div><a href="mailto:hanstra@nd.edu" style="color:rgb(17,85,204);font-size:12.7273px;font-family:arial,helvetica,sans-serif" target="_blank">hanstra@nd.edu</a><br></div></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div></div><div style="font-size:12.7273px"><img src="https://docs.google.com/uc?export=download&id=1GFX1KaaMTtQ2Kg2u8bMXt1YwBp96bvf0&revid=0B7APN9POn6xAQ244WWFYMFU3aVJwZ0lxbmVHK3FxNXlCd0RRPQ"><br></div></div></div></div></div></div></div></div></div></div></div></div>