<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri",sans-serif;}
span.EmailStyle20
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
font-weight:normal;
font-style:normal;
text-decoration:none none;}
span.EmailStyle22
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle23
{mso-style-type:personal;
font-family:"Calibri",sans-serif;}
span.EmailStyle24
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle25
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle26
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle27
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:916670602;
mso-list-type:hybrid;
mso-list-template-ids:-258592906 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">Hi Mark and Aditi,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I think the problem is in 4 and 5. After I sent you the email this morning, I tried to enter some sample data after step 3, and the KStemFilterFactory seems to recognize either collection or collections (plural).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Aditi, I can’t reproduce the 231 records found that you reported. I am getting 2,888 in development and production ASpace.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">-Khuong<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><a name="_MailEndCompose"><span style="color:#1F497D"><o:p> </o:p></span></a></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Khuong Vu <br>
<b>Sent:</b> Monday, October 15, 2018 9:37 AM<br>
<b>To:</b> Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org><br>
<b>Subject:</b> RE: [Archivesspace_Users_Group] Auto stemming search results on AS 2.5<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Hi Mark, <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I am trying to help Aditi with auto stemming search, and following your suggestion. I would like to ask for additional help.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">After incorporating a non-aggressive stemmer, namely KStemFilterFactory, I am now getting 0 found for “resource”, and 2,888 for “resources”. Before incorporating the stemmer, I was getting 0 and 231 respectively.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">I was doing the following steps in trying to incorporate KStemFilterFactory, and I am wondering if I have missed anything:<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Download source code for ASpace 2.5<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Add KStemFilterFactory in solr/schema.xml as another analyzer (as shown in your link)<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Use the build script provided in the source code to:<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Build a deployment package<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">b.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Deploy the deployment package<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">c.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Launch the application successfully<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Run the application against MySQL (instead of the test database)<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">4.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Use production data<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Dump mysql data from production and load it into the development<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">b.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Copy archivesspace/data from production into development<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">5.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Rebuild solr index<o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">a.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><a href="http://localhost:8090/#/~cores/collection1/update?stream.body=<delete><query>*:*</query></delete">http://localhost:8090/#/~cores/collection1/update?stream.body=<delete><query>*:*</query></delete</a><span style="color:#1F497D">><o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">b.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><a href="http://localhost:8090/#/~cores/collection1/update?stream.body=<commit/">http://localhost:8090/#/~cores/collection1/update?stream.body=<commit/</a><span style="color:#1F497D">><o:p></o:p></span></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo2">
<![if !supportLists]><span style="color:#1F497D"><span style="mso-list:Ignore">c.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="color:#1F497D">Restart the application and test the search functionalities<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Thanks for your help.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">-Khuong<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> <a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">
archivesspace_users_group-bounces@lyralists.lyrasis.org</a> [<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org</a>]
<b>On Behalf Of </b>Custer, Mark<br>
<b>Sent:</b> Monday, October 08, 2018 12:08 PM<br>
<b>To:</b> Archivesspace Users Group <<a href="mailto:archivesspace_users_group@lyralists.lyrasis.org">archivesspace_users_group@lyralists.lyrasis.org</a>><br>
<b>Subject:</b> Re: [Archivesspace_Users_Group] Auto stemming search results on AS 2.5<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi, Aditi.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">So, I don’t know how best to approach this, but I can say how we’ve approached this at Yale for the time being. We’ve updated our solr/schema.xml file to use the Krovetz stemmer, which is a non-aggressive stemmer that works well for the
English language since it’s primarily dictionary based. Here’s an example of how we changed that:
<a href="https://github.com/fordmadox/archivesspace/blob/2.4.1.yale.hm.mm/solr/schema.xml#L338-L357">
https://github.com/fordmadox/archivesspace/blob/2.4.1.yale.hm.mm/solr/schema.xml#L338-L357</a> (the stemmer is used both for the index, at line 347, and at query time, at line 355).
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Please note that this tactic affects both the staff and public interface. Well, I should say that it primarily affects the staff interface. One wrinkle that we’ve discovered is that in the typeahead feature in the staff interface, the
query is handled a bit differently there and it’s not stemmed (I’ve been meaning to look into seeing what would need to change here but I still haven’t done that yet). So, if you did a typeahead for something like “scrapbooks” in our staff interface when
trying to link to a heading, when you type in that last “s” then that doesn’t work. The problem is that the typeahead query is not being stemmed, whereas it’s doing a search on an index that has been stemmed. Our workaround is just to type “scrapbook” to
find and link to “scrapbooks”. Not ideal, but it beats expecting that users or staff would search for “invoice” OR “invoices” to get a set of results that they’d actually expect.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m definitely a proponent that ArchivesSpace (or any discovery service) should include some sort of stemming out of the box, but right now that’s not the case.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Anyhow, I’m curious to hear what sort of approach you take, so please us know.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Mark<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> <a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">
archivesspace_users_group-bounces@lyralists.lyrasis.org</a> [<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org</a>]
<b>On Behalf Of </b>Aditi Worcester<br>
<b>Sent:</b> Monday, 08 October, 2018 12:11 PM<br>
<b>To:</b> Archivesspace Users Group <<a href="mailto:archivesspace_users_group@lyralists.lyrasis.org">archivesspace_users_group@lyralists.lyrasis.org</a>><br>
<b>Subject:</b> [Archivesspace_Users_Group] Auto stemming search results on AS 2.5<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hello,<o:p></o:p></p>
<p class="MsoNormal">We’re trying to auto stem our search results on AS 2.5 and don’t quite know how!<o:p></o:p></p>
<p class="MsoNormal">For instance, a search for “invoice” on the PUI returns “no results”. A search for “invoices” returns 231 results.
<o:p></o:p></p>
<p class="MsoNormal">Any suggestions on how best to approach this? <o:p></o:p></p>
<p class="MsoNormal">Thank you in advance for your time and help.<o:p></o:p></p>
<p class="MsoNormal">Aditi<o:p></o:p></p>
<p class="MsoNormal">-----------<o:p></o:p></p>
<p class="MsoNormal">Aditi Worcester<o:p></o:p></p>
<p class="MsoNormal">Processing Archivist, Kellogg Library<o:p></o:p></p>
<p class="MsoNormal">California State University San Marcos<o:p></o:p></p>
<p class="MsoNormal">760-750-8359 | <a href="mailto:aworcester@csusm.edu">aworcester@csusm.edu</a>
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>