<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>You can see what the default Solr config in ArchivesSpace does
      with these queries in this screenshot of Solr's analysis tool on a
      development system:</p>
    <p><a moz-do-not-send="true" href="https://user-images.githubusercontent.com/33721187/79843129-b9549680-83b1-11ea-8d3a-670f4e84a6de.png">https://user-images.githubusercontent.com/33721187/79843129-b9549680-83b1-11ea-8d3a-670f4e84a6de.png</a></p>
    <p>On the left is how it indexes "Governors", and on the right is
      how it analyzes a query for "Governor's". The first step, marked
      "ST" in light grey, is the Standard Tokenizer. As you can see, it
      does nothing in this case and passes both terms unchanged to the
      next step ("SF", the stop word filter, which also does nothing).<br>
    </p>
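    <p>To make the mismatch concrete, here is a toy Python approximation
      of what the screenshot shows. It is not Solr code. It assumes the
      Standard Tokenizer keeps an apostrophe that sits between letters
      inside the token and drops a trailing one (it follows the Unicode
      word-break rules), and it assumes the field also lowercases terms.
      The function names are made up for illustration.</p>
<pre>
```python
import re

# Toy approximation of how the Standard Tokenizer treats apostrophes
# (it follows the Unicode word-break rules): an apostrophe between word
# characters stays inside the token; a leading or trailing one is dropped.
# Handles both the straight (') and the curly (’) apostrophe.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*")


def standard_like_tokens(text):
    # Lowercasing is an ASSUMPTION about the field type, not part of the tokenizer.
    return [token.lower() for token in WORD.findall(text)]


def matches(query, document):
    # A document matches only if every query term appears verbatim among its terms.
    doc_terms = set(standard_like_tokens(document))
    return all(term in doc_terms for term in standard_like_tokens(query))


for query in ["Governor’s", "Governors’", "Governors"]:
    print(query, standard_like_tokens(query), matches(query, "Governors"))
```
</pre>
    <p>In this sketch the query term "governor’s" is not identical to
      the indexed term "governors", so nothing matches, while a trailing
      apostrophe (as in "Governors’") is simply dropped and that
      spelling still finds "Governors".</p>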
    <p>Changing to a different tokenizer could change how apostrophes
      are handled. Adding a stemmer might do the same, and would also
      ensure that singular and plural forms of most words return the
      same results. But these sorts of customizations are
      language-specific: what works for English probably wouldn't work
      for finding materials in Spanish, French or German, and might even
      have negative effects. This is one of the advantages of an <a moz-do-not-send="true" href="https://archivesspace.github.io/archivesspace/user/running-archivesspace-with-external-solr/">external
        Solr server</a> setup: you can tailor it to your collections and
      your users. It also means you can run a more up-to-date version of
      Solr, with <a moz-do-not-send="true" href="https://lucene.apache.org/solr/guide/7_7/filter-descriptions.html">more
        and better options</a> (we use the Word Delimiter Graph Filter
      and KStem).<br>
    </p>
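    <p>As a rough illustration of why a word-delimiter filter plus a
      stemmer helps, the sketch below is a toy Python analyzer (not Solr
      configuration) with two extra steps: it strips a trailing
      possessive 's, which the Word Delimiter Graph Filter's
      stemEnglishPossessive option does something similar to, and then
      applies a crude plural-stripping rule. That rule is only a
      hypothetical stand-in for KStem, which is dictionary-based and far
      more careful; all function names here are made up.</p>
<pre>
```python
import re

# Toy tokenizer: an apostrophe between word characters stays in the token.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*")
# A trailing possessive 's with either a straight or a curly apostrophe.
POSSESSIVE = re.compile(r"['\u2019]s$")


def light_stem(token):
    # Crude plural stripping: a HYPOTHETICAL stand-in for KStem, not its algorithm.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    terms = []
    for token in WORD.findall(text.lower()):
        # Roughly what stemEnglishPossessive does: drop a trailing 's.
        token = POSSESSIVE.sub("", token)
        # Split on any apostrophes that remain, as a word-delimiter filter would.
        for part in re.split(r"['\u2019]", token):
            if part:
                terms.append(light_stem(part))
    return terms


def matches(query, document):
    # A document matches only if every analyzed query term is among its terms.
    doc_terms = set(analyze(document))
    return all(term in doc_terms for term in analyze(query))


for query in ["Governor’s", "Governors’", "Governors"]:
    print(query, analyze(query), matches(query, "Governors"))
```
</pre>
    <p>With these two steps all three spellings reduce to the single
      term "governor", so any of them finds the others. The sketch treats
      straight (') and curly (’) apostrophes the same; whether a
      particular Solr filter does is worth confirming in the analysis
      tool before relying on it.</p>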
    <p>Andrew.<br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 20/04/2020 23:44, Trevor Thornton
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:CA+SdyS2RepCPx0ocMk_Bds0JjGFEwVf5QWHHnDpVGacOptVqpQ@mail.gmail.com">
      
      <div dir="ltr">From what I can tell, the Solr <a href="https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer" moz-do-not-send="true">Standard Tokenizer</a> (which I think
        is the one used for most text fields) doesn't exclude the
        apostrophe or use it as a delimiter to split the word (as it
        does with other punctuation marks), so a query for "Governor’s"
        won't match "Governors" and vice versa. I don't know of a
        convenient workaround (without modifying the Solr schema).</div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, Apr 20, 2020 at 4:38
          PM Hoffner, Bailey E. <<a href="mailto:baileys@ou.edu" moz-do-not-send="true">baileys@ou.edu</a>> wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px
          0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
          <div lang="EN-US">
            <div class="gmail-m_510320714496366909WordSection1">
              <p class="MsoNormal">Hello All,</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">One of our catalogers noticed an
                issue with search functionality and normalization (see
                below). Has anyone dealt with this issue before, or does
                anyone know of a workaround?</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">Thanks!</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">-Bailey</p>
              <p class="MsoNormal"> </p>
              <div>
                <p class="MsoNormal"><span style="font-family:"Calibri
                    Light",sans-serif">Bailey Hoffner, MLIS</span></p>
                <p class="MsoNormal"><span style="font-family:"Calibri
                    Light",sans-serif">Metadata and Collections
                    Management Archivist</span></p>
                <p class="MsoNormal"><span style="font-family:"Calibri
                    Light",sans-serif">University of Oklahoma
                    Libraries</span></p>
                <p class="MsoNormal"><span style="font-family:"Calibri
                    Light",sans-serif">405-325-1566</span></p>
              </div>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal"> </p>
              <div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt
                solid rgb(181,196,223);padding:3pt 0in 0in">
                <p class="MsoNormal"><b><span style="font-size:12pt;color:black">From: </span></b><span style="font-size:12pt;color:black">"Steele, Thomas
                    D." <<a href="mailto:Thomas.D.Steele-1@ou.edu" target="_blank" moz-do-not-send="true">Thomas.D.Steele-1@ou.edu</a>><br>
                    <b>Date: </b>Monday, April 20, 2020 at 3:26 PM<br>
                    <b>To: </b>"Hoffner, Bailey E." <<a href="mailto:baileys@ou.edu" target="_blank" moz-do-not-send="true">baileys@ou.edu</a>><br>
                    <b>Subject: </b>normalization in ArchivesSpace</span></p>
              </div>
              <div>
                <p class="MsoNormal"> </p>
              </div>
              <p class="MsoNormal">Searching for a term such as
                “Governors’” yields no hits if you spell it as
                “Governor’s”. Both terms should normalize to
                “Governors”, but it’s possible the latter is normalizing
                to “Governor s”.</p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal">Tom Steele</p>
              <p class="MsoNormal">Science and Technology Cataloger</p>
              <p class="MsoNormal">University of Oklahoma Libraries</p>
              <p class="MsoNormal">Norman, OK   73019</p>
              <p class="MsoNormal">(405) 325-4082</p>
              <p class="MsoNormal"><a href="mailto:Thomas.D.Steele-1@ou.edu" target="_blank" moz-do-not-send="true"><span style="color:blue">Thomas.D.Steele-1@ou.edu</span></a></p>
              <p class="MsoNormal"> </p>
              <p class="MsoNormal"><i><span style="font-size:8pt">"Books
                    constitute capital. A library book lasts as long as
                    a house, for hundreds of years. It is not, then, an
                    article of mere consumption but fairly of capital,
                    and often in the case of professional men, setting
                    out in life, it is their only capital</span></i>.<i><span style="font-size:8pt">" -- Thomas Jefferson</span></i></p>
              <p class="MsoNormal"> </p>
            </div>
          </div>
          _______________________________________________<br>
          Archivesspace_Users_Group mailing list<br>
          <a href="mailto:Archivesspace_Users_Group@lyralists.lyrasis.org" target="_blank" moz-do-not-send="true">Archivesspace_Users_Group@lyralists.lyrasis.org</a><br>
          <a href="http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group" rel="noreferrer" target="_blank" moz-do-not-send="true">http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group</a><br>
        </blockquote>
      </div>
      <br clear="all">
      <div><br>
      </div>
      -- <br>
      <div dir="ltr" class="gmail_signature">
        <div dir="ltr">
          <div>
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr"><font style="background-color:rgb(255,255,255)" size="2" color="#666666">Trevor Thornton</font>
                  <div><font style="background-color:rgb(255,255,255)" size="2" color="#666666">Applications Developer,
                      Digital Library Initiatives</font></div>
                  <div><font style="background-color:rgb(255,255,255)" size="2" color="#666666">North Carolina State
                      University Libraries</font></div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
  </body>
</html>