<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle22
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hi Joshua,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks for posting about this. Just to clarify, are you referring to search behavior in the staff interface or the public interface (or both)?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If it’s the public interface, you’ve hit the nail on the head – the full record field is a big issue there. We’ve been working on specific improvements to search behavior that we believe will address situations like this. We’d love to have
you test what we’ve been doing and can point you to it, if you’d like.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">(If it’s the staff interface, full record exhibits the same behavior, but it seems to be an issue for fewer people.)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We’d also love your help with this indexer work, if you’d like to talk with Laney and some others who have been looking into this. It sounds like your perspective and investigations could be really helpful to everyone, especially while we’re all
knee-deep in it!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Christine<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D">Christine Di Bella<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D">ArchivesSpace Program Manager<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D"><a href="mailto:christine.dibella@lyrasis.org"><span style="color:#0563C1">christine.dibella@lyrasis.org</span></a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D">800.999.8558 x2905<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D">678-235-2905<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:'Arial',sans-serif;color:#1F497D">cdibella13 (Skype)</span><span style="color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> archivesspace_users_group-bounces@lyralists.lyrasis.org &lt;archivesspace_users_group-bounces@lyralists.lyrasis.org&gt;
<b>On Behalf Of </b>Joshua D. Shaw<br>
<b>Sent:</b> Wednesday, June 27, 2018 10:31 AM<br>
<b>To:</b> Archivesspace Users Group &lt;archivesspace_users_group@lyralists.lyrasis.org&gt;<br>
<b>Subject:</b> Re: [Archivesspace_Users_Group] Indexing repository details in all records skews results set<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div id="divtagdefaultwrapper">
<p><span style="font-size:12.0pt;color:black">I did a little more digging and, to answer my own question, the "fullrecord" field holds everything (well, almost) in the Solr doc. I think the step that builds this field, specifically the "extract_string_values"
method in IndexerCommon, is a bit greedy and should probably skip the repository in addition to the update times, etc. I'm testing that locally.<o:p></o:p></span></p>
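<p><span style="font-size:12.0pt;color:black">To make the idea concrete, here is a rough sketch in Python (the real indexer is Ruby, and this is not the actual "extract_string_values" code). It walks a record, collects every string value into one searchable blob, and skips a set of keys along the way. The names SKIP_KEYS and collect_strings, the choice of keys, and the sample record are all made up for illustration.<o:p></o:p></span></p>
<pre>
```python
# Hypothetical keys to leave out of the combined search text.
SKIP_KEYS = {"repository", "created_by", "last_modified_by",
             "create_time", "system_mtime", "user_mtime"}


def collect_strings(value, skip_keys=SKIP_KEYS):
    """Recursively gather string values, ignoring any dict key in skip_keys."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        found = []
        for key, child in value.items():
            if key not in skip_keys:
                found.extend(collect_strings(child, skip_keys))
        return found
    if isinstance(value, list):
        found = []
        for child in value:
            found.extend(collect_strings(child, skip_keys))
        return found
    return []  # numbers, booleans and None add nothing to the text


# Made-up record shaped loosely like an indexed document.
record = {
    "title": "Mario Puzo papers",
    "lock_version": 38,
    "repository": {"ref": "/repositories/2",
                   "_resolved": {"name": "Rauner Special Collections Library"}},
    "collection": [{"identifier": "MS-1371b"}],
}

# Greedy version: nothing skipped, so repository details leak into the text.
greedy = " ".join(collect_strings(record, skip_keys=set()))
# Trimmed version: repository and bookkeeping keys are left out.
trimmed = " ".join(collect_strings(record))
```
</pre>
<p><span style="font-size:12.0pt;color:black">In this toy version, the greedy text contains the repository name while the trimmed text does not, so a search on the repository name would no longer match every record through the combined field.<o:p></o:p></span></p>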
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">My own issue was also complicated by some custom indexer stuff I'm doing that was initially adding the resource as a fully resolved attribute to the AO docs (I'm doing it differently now), which doubled the fullrecord
issue and added its own headaches for search relevancy.<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Joshua<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<div>
<div class="MsoNormal" align="center" style="text-align:center"><span style="font-size:12.0pt;color:black">
<hr size="3" width="98%" align="center">
</span></div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black">
<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">archivesspace_users_group-bounces@lyralists.lyrasis.org</a> &lt;<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org">archivesspace_users_group-bounces@lyralists.lyrasis.org</a>&gt;
on behalf of Joshua D. Shaw &lt;<a href="mailto:Joshua.D.Shaw@dartmouth.edu">Joshua.D.Shaw@dartmouth.edu</a>&gt;<br>
<b>Sent:</b> Tuesday, June 26, 2018 5:25 PM<br>
<b>To:</b> Archivesspace Users Group<br>
<b>Subject:</b> [Archivesspace_Users_Group] Indexing repository details in all records skews results set</span><span style="font-size:12.0pt;color:black">
<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"> <o:p></o:p></span></p>
</div>
</div>
<div>
<div id="x_divtagdefaultwrapper">
<p><span style="font-size:12.0pt;color:black">Hi All-<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">I think this has been the behavior of AS from the beginning, but during some recent testing, I finally realized that AS is indexing the repository details with every record in the repository. Since part of our address
is "6065 Webster Hall" and we have a *lot* of Daniel Webster related material (he's a Dartmouth alum), searching for "webster" is a bad thing, since every record in the repo is returned. In a vanilla install, you can see the repository details in the json package
in the results (result['json']), so that sort of made sense....<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">I've done some cooking of the indexer to remove the resolved repository details (result['json']['repository']['_resolved']) and fiddle with some other things, but even though the json representation of the search results
contains no instance of the search string, I *still* get results based on the repository details.<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Example:<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Repository Name is "rauner" and the long name is "Rauner Special Collections Library"<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Search: "rauner"<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Example results in json for a top container and an archival object below. Note that these *do not* contain the string "rauner"<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">I must be missing something in how the indexer is actually storing and searching data. I'd love to know if someone has a method to remove the repository details (and anything else global) from the results to prevent
this sort of thing and to cut down on erroneous results.<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Thanks!<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black">Joshua<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black">TC:<o:p></o:p></span></p>
<pre><span style="color:black">{<o:p></o:p></span></pre>
<pre><span style="color:black"> "id": "/repositories/2/top_containers/53",<o:p></o:p></span></pre>
<pre><span style="color:black"> "uri": "/repositories/2/top_containers/53",<o:p></o:p></span></pre>
<pre><span style="color:black"> "title": "MS-1371b, Box 53",<o:p></o:p></span></pre>
<pre><span style="color:black"> "primary_type": "top_container",<o:p></o:p></span></pre>
<pre><span style="color:black"> "types": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "top_container"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "json": "{\"lock_version\":38,\"indicator\":\"53\",\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"type\":\"box\",\"jsonmodel_type\":\"top_container\",\"active_restrictions\":[],\"container_locations\":[],\"series\":[],\"collection\":[{\"ref\":\"/repositories/2/resources/1\",\"identifier\":\"MS-1371b\",\"display_string\":\"Mario Puzo papers\"}],\"uri\":\"/repositories/2/top_containers/53\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"restricted\":false,\"is_linked_to_published_record\":false,\"display_string\":\"Box 53\",\"long_display_string\":\"MS-1371b, Box 53\"}",<o:p></o:p></span></pre>
<pre><span style="color:black"> "suppressed": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "publish": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "system_generated": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "repository": "/repositories/2",<o:p></o:p></span></pre>
<pre><span style="color:black"> "type_enum_s": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "box"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "created_by": "admin",<o:p></o:p></span></pre>
<pre><span style="color:black"> "last_modified_by": "admin",<o:p></o:p></span></pre>
<pre><span style="color:black"> "user_mtime": "2018-06-26T20:28:33Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "system_mtime": "2018-06-26T21:11:11Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "create_time": "2018-06-26T20:28:33Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "display_string": "Box 53",<o:p></o:p></span></pre>
<pre><span style="color:black"> "collection_uri_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "/repositories/2/resources/1"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "collection_display_string_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "Mario Puzo papers"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "collection_identifier_stored_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS-1371b"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "collection_identifier_u_stext": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS-1371b",<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS 1371b",<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS1371b",<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS- 1371 b"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "exported_u_sbool": [<o:p></o:p></span></pre>
<pre><span style="color:black"> false<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "empty_u_sbool": [<o:p></o:p></span></pre>
<pre><span style="color:black"> false<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "indicator_u_stext": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "53"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "jsonmodel_type": "top_container"<o:p></o:p></span></pre>
<pre><span style="color:black"> }<o:p></o:p></span></pre>
<p><span style="font-size:12.0pt;color:black">AO:<o:p></o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<pre><span style="color:black">{<o:p></o:p></span></pre>
<pre><span style="color:black"> "id": "/repositories/2/archival_objects/3",<o:p></o:p></span></pre>
<pre><span style="color:black"> "uri": "/repositories/2/archival_objects/3",<o:p></o:p></span></pre>
<pre><span style="color:black"> "title": "&lt;emph render=\"italic\"&gt;The Fortunate Pilgrim&lt;/emph&gt;",<o:p></o:p></span></pre>
<pre><span style="color:black"> "primary_type": "archival_object",<o:p></o:p></span></pre>
<pre><span style="color:black"> "types": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "archival_object"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "json": "{\"lock_version\":0,\"position\":2,\"publish\":true,\"ref_id\":\"a97bf46cbc2cd85e9789c76098a3ee1b\",\"title\":\"&lt;emph render=\\\"italic\\\"&gt;The Fortunate Pilgrim&lt;/emph&gt;\",\"display_string\":\"&lt;emph render=\\\"italic\\\"&gt;The Fortunate Pilgrim&lt;/emph&gt;\",\"restrictions_apply\":false,\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"suppressed\":false,\"level\":\"series\",\"jsonmodel_type\":\"archival_object\",\"external_ids\":[],\"subjects\":[],\"linked_events\":[],\"extents\":[],\"dates\":[],\"external_documents\":[],\"rights_statements\":[],\"linked_agents\":[],\"onbase_documents\":[],\"ancestors\":[{\"ref\":\"/repositories/2/resources/1\",\"level\":\"collection\"}],\"instances\":[],\"notes\":[],\"uri\":\"/repositories/2/archival_objects/3\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"resource\":{\"ref\":\"/repositories/2/resources/1\"},\"has_unpublished_ancestor\":false,\"resource_identifier_u_sstr\":\"MS-1371b\",\"resource_type_u_sstr\":null,\"resource_title\":\"Mario Puzo papers\"}",<o:p></o:p></span></pre>
<pre><span style="color:black"> "suppressed": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "publish": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "system_generated": false,<o:p></o:p></span></pre>
<pre><span style="color:black"> "repository": "/repositories/2",<o:p></o:p></span></pre>
<pre><span style="color:black"> "level_enum_s": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "series",<o:p></o:p></span></pre>
<pre><span style="color:black"> "collection"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "resource": "/repositories/2/resources/1",<o:p></o:p></span></pre>
<pre><span style="color:black"> "ref_id": "a97bf46cbc2cd85e9789c76098a3ee1b",<o:p></o:p></span></pre>
<pre><span style="color:black"> "created_by": "admin",<o:p></o:p></span></pre>
<pre><span style="color:black"> "last_modified_by": "admin",<o:p></o:p></span></pre>
<pre><span style="color:black"> "user_mtime": "2018-06-26T20:28:33Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "system_mtime": "2018-06-26T21:11:11Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "create_time": "2018-06-26T20:28:33Z",<o:p></o:p></span></pre>
<pre><span style="color:black"> "notes": "",<o:p></o:p></span></pre>
<pre><span style="color:black"> "level": "series",<o:p></o:p></span></pre>
<pre><span style="color:black"> "ancestors": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "/repositories/2/resources/1"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "total_restrictions_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "false"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "resource_identifier_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS-1371b"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "resource_title_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "Mario Puzo papers"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "resource_identifier_w_title_u_sstr": [<o:p></o:p></span></pre>
<pre><span style="color:black"> "MS-1371b: Mario Puzo papers"<o:p></o:p></span></pre>
<pre><span style="color:black"> ],<o:p></o:p></span></pre>
<pre><span style="color:black"> "jsonmodel_type": "archival_object"<o:p></o:p></span></pre>
<pre><span style="color:black"> }<o:p></o:p></span></pre>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
<p><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
</div>
</div>
</div>
</div>
</body>
</html>