[Archivesspace_Users_Group] Auto stemming search results on AS 2.5

Khuong Vu kvu at csusm.edu
Mon Oct 15 12:36:32 EDT 2018


Hi Mark,

I am trying to help Aditi with auto-stemming search by following your suggestion, and I would like to ask for additional help.

After incorporating a non-aggressive stemmer, namely KStemFilterFactory, I now get 0 results for “resource” and 2,888 for “resources”.  Before incorporating the stemmer, I got 0 and 231, respectively.

I took the following steps to incorporate KStemFilterFactory, and I am wondering whether I missed anything:

1. Download source code for ASpace 2.5
   a. Add KStemFilterFactory in solr/schema.xml as another analyzer (as shown in your link)

2. Use the build script provided in the source code to:
   a. Build a deployment package
   b. Deploy the deployment package
   c. Launch the application successfully

3. Run the application against MySQL (instead of the test database)

4. Use production data
   a. Dump mysql data from production and load it into the development
   b. Copy archivesspace/data from production into development

5. Rebuild solr index
   a. http://localhost:8090/#/~cores/collection1/update?stream.body=<delete><query>*:*</query></delete>
   b. http://localhost:8090/#/~cores/collection1/update?stream.body=<commit/>
   c. Restart the application and test the search functionalities
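
One thing that may be worth double-checking in step 5: everything after the `#` in a URL is a fragment, which a browser does not send to the server, so the addresses in 5a and 5b may be handled by the Solr admin page rather than by an update handler. The sketch below sends the same two commands as POST requests instead. The update URL in it is an assumption (this thread does not give one) and would need to be replaced with the update handler of your own core; the function names are hypothetical too.

```python
from urllib.request import Request, urlopen

# Assumption: placeholder address, not taken from this thread.
# Point it at the update handler of your own Solr core.
SOLR_UPDATE_URL = "http://localhost:8090/collection1/update"


def build_update_request(update_url: str, xml_body: str) -> Request:
    """Build a POST request that carries one XML update command."""
    return Request(
        update_url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def clear_index(update_url: str = SOLR_UPDATE_URL) -> None:
    """Delete every document, then commit (the same commands as 5a and 5b)."""
    for body in ("<delete><query>*:*</query></delete>", "<commit/>"):
        with urlopen(build_update_request(update_url, body)) as response:
            print(body, "->", response.status)


# Usage (performs real HTTP requests, so run it only against a development copy):
# clear_index()
```

Sending the commands in the request body also avoids having to URL-encode the angle brackets.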

Thanks for your help.
-Khuong



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Monday, October 08, 2018 12:08 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Auto stemming search results on AS 2.5

Hi, Aditi.

So, I don’t know how best to approach this, but I can say how we’ve approached this at Yale for the time being.  We’ve updated our solr/schema.xml file to use the Krovetz stemmer, which is a non-aggressive stemmer that works well for the English language since it’s primarily dictionary based.  Here’s an example of how we changed that:  https://github.com/fordmadox/archivesspace/blob/2.4.1.yale.hm.mm/solr/schema.xml#L338-L357 (the stemmer is used both for the index, at line 347, and at query time, at line 355).

Please note that this tactic affects both the staff and public interface.  Well, I should say that it primarily affects the staff interface.  One wrinkle that we’ve discovered is that in the typeahead feature in the staff interface, the query is handled a bit differently there and it’s not stemmed (I’ve been meaning to look into seeing what would need to change here but I still haven’t done that yet).  So, if you did a typeahead for something like “scrapbooks” in our staff interface when trying to link to a heading, when you type in that last “s” then that doesn’t work.  The problem is that the typeahead query is not being stemmed, whereas it’s doing a search on an index that has been stemmed.  Our workaround is just to type “scrapbook” to find and link to “scrapbooks”.  Not ideal, but it beats expecting that users or staff would search for “invoice” OR “invoices” to get a set of results that they’d actually expect.
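
The mismatch described above (an unstemmed query run against a stemmed index) can be illustrated with a small toy model. This is only a sketch: `toy_stem` is a deliberately crude stand-in that strips a trailing "s", not the Krovetz stemmer, and the exact-term lookup is a simplification of what the typeahead really does.

```python
def toy_stem(token: str) -> str:
    """Crude stand-in for a real stemmer: lowercase, then strip one trailing 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def build_index(headings):
    """Map each stemmed token to the set of headings that contain it."""
    index = {}
    for heading in headings:
        for token in heading.split():
            index.setdefault(toy_stem(token), set()).add(heading)
    return index


def search(index, query, stem_query):
    """Look up one term, optionally stemming it the same way the index was."""
    term = toy_stem(query) if stem_query else query.lower()
    return sorted(index.get(term, set()))


index = build_index(["Scrapbooks", "Invoices and receipts"])

print(search(index, "scrapbooks", stem_query=False))  # [] (unstemmed query misses)
print(search(index, "scrapbook", stem_query=False))   # ['Scrapbooks'] (the workaround)
print(search(index, "scrapbooks", stem_query=True))   # ['Scrapbooks'] (query stemmed too)
```

The index only ever contains "scrapbook", so a query that keeps its final "s" has nothing to match unless it goes through the same stemming step.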

I definitely think that ArchivesSpace (or any discovery service) should include some sort of stemming out of the box, but right now that’s not the case.

Anyhow, I’m curious to hear what sort of approach you take, so please let us know.

Mark



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Aditi Worcester
Sent: Monday, 08 October, 2018 12:11 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Auto stemming search results on AS 2.5

Hello,
We’re trying to auto stem our search results on AS 2.5 and don’t quite know how!
For instance, a search for “invoice” on the PUI returns “no results”. A search for “invoices” returns 231 results.
Any suggestions on how best to approach this?
Thank you in advance for your time and help.
Aditi
-----------
Aditi Worcester
Processing Archivist, Kellogg Library
California State University San Marcos
760-750-8359 | aworcester at csusm.edu

