[Archivesspace_Users_Group] FW: normalization in ArchivesSpace

Andrew Morrison andrew.morrison at bodleian.ox.ac.uk
Tue Apr 21 05:03:04 EDT 2020


You can see what the default Solr config in ArchivesSpace does with 
these queries in this screenshot of Solr's analysis tool on a 
development system:

https://user-images.githubusercontent.com/33721187/79843129-b9549680-83b1-11ea-8d3a-670f4e84a6de.png

On the left is how it indexes Governors and on the right is how it 
handles a query for Governor's. The first step, marked "ST" in light 
grey, is the Standard Tokenizer. As you can see, it does nothing in this 
case and passes both unchanged to the next step ("SF", the stop word 
filter), which also does nothing.
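To reproduce the mismatch without a Solr instance, here is a rough Python approximation. The Standard Tokenizer follows the Unicode word-boundary rules (UAX #29), under which an apostrophe between two letters stays inside the word, while a trailing one is dropped. The regular expression below is a toy stand-in for that single rule, not Lucene's implementation. The lowercasing step is an assumption about the rest of the chain, and `toy_tokens` and `matches` are hypothetical helper names.

```python
import re

# Toy approximation (NOT Lucene's StandardTokenizer): keep runs of letters/digits,
# allowing a straight (') or curly (\u2019) apostrophe only when it sits between
# two letters/digits, roughly as the UAX #29 word-boundary rules do.
TOKEN_RE = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")


def toy_tokens(text: str) -> list[str]:
    """Split text into tokens and lowercase them (lowercasing is an assumption)."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]


def matches(query: str, indexed: str) -> bool:
    """True if any query token is identical to any indexed token."""
    return bool(set(toy_tokens(query)) & set(toy_tokens(indexed)))


if __name__ == "__main__":
    for text in ["Governors", "Governor\u2019s", "Governors\u2019"]:
        print(repr(text), "->", toy_tokens(text))
    print(matches("Governor\u2019s", "Governors"))
```

In this sketch "governor’s" and "governors" are different tokens, so an exact comparison fails in both directions, while the trailing apostrophe in "Governors’" simply falls away.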

Changing to a different tokenizer could change how apostrophes are 
handled. Adding a stemmer might do the same, and would also ensure that 
the same results are returned for the singular and plural forms of most 
words. But these sorts of customizations are language-specific: what 
works for English probably wouldn't work for finding materials in 
Spanish, French or German, and might even have negative effects. This is 
one of the advantages of setting up an external Solr server 
<https://archivesspace.github.io/archivesspace/user/running-archivesspace-with-external-solr/>: 
you can tailor it to your collections and your users. It also means you 
can run a more up-to-date version of Solr, with more and better options 
<https://lucene.apache.org/solr/guide/7_7/filter-descriptions.html> (we 
use the Word Delimiter Graph Filter and KStem).
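As an illustration of this kind of customization, the sketch below uses Python's standard `xml.etree.ElementTree` to generate a `<fieldType>` definition that could be pasted into an external Solr core's schema.xml. The two filters are the ones mentioned above. Everything else is an assumption for this example, not a copy of anyone's production schema: the field type name, the Whitespace Tokenizer, the lowercase filter, and the index-only Flatten Graph Filter (which the Solr reference guide calls for after Word Delimiter Graph Filter at index time). The filter's possessive handling (`stemEnglishPossessive`) may only recognise a straight apostrophe, so curly ones (’) might need mapping first. The analysis screen is the place to check.

```python
import xml.etree.ElementTree as ET

# Hypothetical field type name, not part of the ArchivesSpace schema.
FIELD_TYPE_NAME = "text_en_wdgf_kstem"


def add_filter(analyzer: ET.Element, factory: str, **attrs: str) -> None:
    """Append a <filter class="..."> element with optional extra attributes."""
    ET.SubElement(analyzer, "filter", {"class": factory, **attrs})


def build_field_type(name: str = FIELD_TYPE_NAME) -> ET.Element:
    """Build a Solr <fieldType> with separate index and query analyzers."""
    field_type = ET.Element(
        "fieldType",
        {"name": name, "class": "solr.TextField", "positionIncrementGap": "100"},
    )
    for analyzer_type in ("index", "query"):
        analyzer = ET.SubElement(field_type, "analyzer", {"type": analyzer_type})
        # Whitespace tokenizer (an assumption) leaves punctuation for the next filter.
        ET.SubElement(analyzer, "tokenizer", {"class": "solr.WhitespaceTokenizerFactory"})
        # Splits on punctuation; stemEnglishPossessive="1" drops a trailing 's.
        add_filter(analyzer, "solr.WordDelimiterGraphFilterFactory",
                   stemEnglishPossessive="1")
        if analyzer_type == "index":
            # Index-time token graphs must be flattened before they are written.
            add_filter(analyzer, "solr.FlattenGraphFilterFactory")
        add_filter(analyzer, "solr.LowerCaseFilterFactory")
        # KStem: a light English stemmer that, among other things, folds plurals.
        add_filter(analyzer, "solr.KStemFilterFactory")
    return field_type


if __name__ == "__main__":
    element = build_field_type()
    ET.indent(element)  # pretty-print (Python 3.9+)
    print(ET.tostring(element, encoding="unicode"))
```

Assigning the new type to ArchivesSpace's fields and reindexing afterwards are separate steps not shown here.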

Andrew.


On 20/04/2020 23:44, Trevor Thornton wrote:
> From what I can tell, the Solr Standard Tokenizer 
> <https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-StandardTokenizer> 
> (which I think is the one used for most text fields) doesn't exclude 
> the apostrophe or use it as a delimiter to split the word (as it does 
> with other punctuation marks), so a query for "Governor’s" won't match 
> "Governors" and vice versa. I don't know of a convenient workaround 
> (without modifying the Solr schema).
>
> On Mon, Apr 20, 2020 at 4:38 PM Hoffner, Bailey E. <baileys at ou.edu 
> <mailto:baileys at ou.edu>> wrote:
>
>     Hello All,
>
>     One of our catalogers noticed an issue with search functionality
>     and normalization (see below). Has anyone dealt with this issue
>     before, or know of a workaround?
>
>     Thanks!
>
>     -Bailey
>
>     Bailey Hoffner, MLIS
>
>     Metadata and Collections Management Archivist
>
>     University of Oklahoma Libraries
>
>     405-325-1566
>
>     From: "Steele, Thomas D." <Thomas.D.Steele-1 at ou.edu
>     <mailto:Thomas.D.Steele-1 at ou.edu>>
>     Date: Monday, April 20, 2020 at 3:26 PM
>     To: "Hoffner, Bailey E." <baileys at ou.edu <mailto:baileys at ou.edu>>
>     Subject: normalization in ArchivesSpace
>
>     Searching for a term such as “Governors’” yields no hits if you
>     spell it as “Governor’s”. Both terms should normalize to
>     “Governors”, but it’s possible the latter is normalizing to
>     “Governor s”.
>
>     Tom Steele
>
>     Science and Technology Cataloger
>
>     University of Oklahoma Libraries
>
>     Norman, OK   73019
>
>     (405) 325-4082
>
>     Thomas.D.Steele-1 at ou.edu <mailto:Thomas.D.Steele-1 at ou.edu>
>
>     "Books constitute capital. A library book lasts as long as a
>     house, for hundreds of years. It is not, then, an article of mere
>     consumption but fairly of capital, and often in the case of
>     professional men, setting out in life, it is their only
>     capital." -- Thomas Jefferson
>
>     _______________________________________________
>     Archivesspace_Users_Group mailing list
>     Archivesspace_Users_Group at lyralists.lyrasis.org
>     <mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
>     http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
>
>
> -- 
> Trevor Thornton
> Applications Developer, Digital Library Initiatives
> North Carolina State University Libraries
>