<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Hi Christine-</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">I found it in both the staff and public interfaces, though the public side does seem to be more problematic, since it seems to be even greedier about how much it's adding to the index fields. I think we got "lucky" because of the whole "webster" thing.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">I'd be happy to chat with Laney and anyone else about what I've been poking at! </p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Best,</p>
<p style="margin-top:0;margin-bottom:0">Joshua</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">PS. For the record - and in case anyone else is doing something similar - one of the issues I ran into initially had to do with adding data from ancestors. I was initially adding the resource as a fully resolved attribute
 to an AO using the "add_attribute_to_resolve" method and then fiddling with that data to get what I needed for faceting and display. But I had forgotten about the fullrecord field and all of the resource data was being pushed into the AO fullrecord as well.
 I've switched to fetching each resource, extracting the data from that, and pushing just what I need into the AO record. It adds a bunch of overhead to the indexer process, so I'd be happy to hear if anyone has a better idea!<br>
</p>
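<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">A minimal sketch of the "fetch the resource, copy only what's needed" approach from the PS, in plain Ruby. Everything here is illustrative: add_resource_fields, fake_fetch, and the cache hash are hypothetical names, not ArchivesSpace APIs, and the per-run cache is only one assumed way to trim the fetch overhead. The two field names are taken from the AO example further down this thread.</p>

```ruby
# Hypothetical helper, not part of ArchivesSpace: copy only the chosen
# resource fields into an archival object (AO) index document.
# `fetch_resource` is any callable that returns a resource Hash for a URI;
# `cache` memoizes results so each resource is fetched at most once per run.
def add_resource_fields(ao_doc, resource_uri, fetch_resource, cache)
  resource = cache[resource_uri] ||= fetch_resource.call(resource_uri)
  ao_doc['resource_title_u_sstr'] = [resource['title']]
  ao_doc['resource_identifier_u_sstr'] = [resource['identifier']]
  ao_doc
end

# Stand-in for a real backend request (assumption for this example).
fetch_count = 0
fake_fetch = lambda do |uri|
  fetch_count += 1
  { 'uri' => uri,
    'title' => 'Mario Puzo papers',
    'identifier' => 'MS-1371b',
    'notes' => ['long text that should stay out of the AO doc'] }
end

cache = {}
ao_uris = %w[/repositories/2/archival_objects/3 /repositories/2/archival_objects/4]
docs = ao_uris.map do |uri|
  add_resource_fields({ 'id' => uri }, '/repositories/2/resources/1', fake_fetch, cache)
end

puts docs.first['resource_title_u_sstr'].inspect
puts fetch_count
```

<p style="margin-top:0;margin-bottom:0">With a cache like this, each resource is fetched at most once per run no matter how many AOs point at it, and the bulky resource data never reaches the AO document or its fullrecord field.</p>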
<br>
<br>
<div style="color: rgb(0, 0, 0);">
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> archivesspace_users_group-bounces@lyralists.lyrasis.org &lt;archivesspace_users_group-bounces@lyralists.lyrasis.org&gt; on behalf of Christine Di Bella &lt;christine.dibella@lyrasis.org&gt;<br>
<b>Sent:</b> Wednesday, June 27, 2018 10:36 AM<br>
<b>To:</b> Archivesspace Users Group<br>
<b>Subject:</b> Re: [Archivesspace_Users_Group] Indexing repository details in all records skews results set</font>
<div> </div>
</div>
<div link="#0563C1" vlink="#954F72" lang="EN-US">
<div class="x_WordSection1">
<p class="x_MsoNormal">Hi Joshua,</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Thanks for posting about this. Just to clarify, are you referring to search behavior in the staff interface or the public interface (or both)?</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">If it’s the public interface, you’ve hit the nail on the head – the full record field is a big issue there. We’ve been working on specific improvements to search behavior that we believe will address situations like this. We’d love to
 have you test what we’ve been doing and can point you to it, if you’d like.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">(If it’s the staff interface, full record exhibits the same behavior, but it seems to be an issue for fewer people.)</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">We’d also love your help with this indexer work if you’d like to talk with Laney and some others who have been looking into this. Sounds like your perspective and investigations could be really helpful to everyone, especially while we’re all knee-deep in it!</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Christine</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D">Christine Di Bella</span></p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D">ArchivesSpace Program Manager</span></p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D"><a href="mailto:christine.dibella@lyrasis.org" id="LPlnk324448" class="OWAAutoLink" previewremoved="true"><span style="color:#0563C1">christine.dibella@lyrasis.org</span></a></span></p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D">800.999.8558 x2905</span></p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D">678-235-2905</span></p>
<p class="x_MsoNormal"><span style="font-family:'Arial',sans-serif; color:#1F497D">cdibella13 (Skype)</span><span style="color:#1F497D"></span></p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"> </p>
<div>
<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal"><b>From:</b> archivesspace_users_group-bounces@lyralists.lyrasis.org &lt;archivesspace_users_group-bounces@lyralists.lyrasis.org&gt;
<b>On Behalf Of </b>Joshua D. Shaw<br>
<b>Sent:</b> Wednesday, June 27, 2018 10:31 AM<br>
<b>To:</b> Archivesspace Users Group &lt;archivesspace_users_group@lyralists.lyrasis.org&gt;<br>
<b>Subject:</b> Re: [Archivesspace_Users_Group] Indexing repository details in all records skews results set</p>
</div>
</div>
<p class="x_MsoNormal"> </p>
<div id="x_divtagdefaultwrapper">
<p><span style="font-size:12.0pt; color:black">I did a little more digging and, to answer my own question, the "fullrecord" field holds (almost) everything in the Solr doc. I think the step that builds this field, specifically the "extract_string_values" method in IndexerCommon, is a bit greedy and should probably skip the repository in addition to the update times, etc. I'm testing that locally.</span></p>
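<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">To illustrate the idea, here is a standalone Ruby sketch (not the actual IndexerCommon code): walk the record, collect the string values, and skip anything stored under an excluded key. The method name collect_strings and the skip list are assumptions made for this example.</span></p>

```ruby
# Hypothetical stand-in for a "full text" builder; not the real
# extract_string_values. Walks nested Hashes/Arrays breadth-first and
# collects String values, ignoring any subtree under a skipped key.
def collect_strings(doc, skipped = %w[repository created_by last_modified_by
                                      create_time system_mtime user_mtime json])
  queue = [doc]
  strings = []
  until queue.empty?
    node = queue.shift
    case node
    when Hash
      node.each do |key, value|
        next if skipped.include?(key.to_s) # drop the whole subtree
        queue.push(value)
      end
    when Array
      queue.concat(node)
    when String
      strings.push(node)
    end
  end
  strings
end

# Sample record loosely modeled on the top container example in this thread.
record = {
  'title' => 'MS-1371b, Box 53',
  'created_by' => 'admin',
  'repository' => {
    'ref' => '/repositories/2',
    '_resolved' => { 'name' => 'Rauner Special Collections Library' }
  },
  'collection' => [{ 'display_string' => 'Mario Puzo papers' }]
}

fulltext = collect_strings(record).join(' ')
puts fulltext
```

<p><span style="font-size:12.0pt; color:black">Because the whole "repository" subtree is skipped, a search for "rauner" would no longer match this record through its full-text field, while the title and collection strings are still collected.</span></p>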
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">My own issue was also complicated by some custom indexer work that was initially adding the resource as a fully resolved attribute to the AO docs (I'm doing it differently now), which doubled the fullrecord issue and added its own headaches for search relevancy.</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">Joshua</span></p>
<p class="x_MsoNormal" style="margin-bottom:12.0pt"><span style="font-size:12.0pt; color:black"> </span></p>
<div>
<div class="x_MsoNormal" style="text-align:center" align="center"><span style="font-size:12.0pt; color:black">
<hr width="98%" size="3" align="center">
</span></div>
<div id="x_divRplyFwdMsg">
<p class="x_MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black">
<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org" id="LPlnk18206" class="OWAAutoLink" previewremoved="true">
archivesspace_users_group-bounces@lyralists.lyrasis.org</a> &lt;<a href="mailto:archivesspace_users_group-bounces@lyralists.lyrasis.org" id="LPlnk308594" class="OWAAutoLink" previewremoved="true">archivesspace_users_group-bounces@lyralists.lyrasis.org</a>&gt; on
 behalf of Joshua D. Shaw &lt;<a href="mailto:Joshua.D.Shaw@dartmouth.edu" id="LPlnk754311" class="OWAAutoLink" previewremoved="true">Joshua.D.Shaw@dartmouth.edu</a>&gt;<br>
<b>Sent:</b> Tuesday, June 26, 2018 5:25 PM<br>
<b>To:</b> Archivesspace Users Group<br>
<b>Subject:</b> [Archivesspace_Users_Group] Indexing repository details in all records skews results set</span><span style="font-size:12.0pt; color:black">
</span></p>
<div>
<p class="x_MsoNormal"><span style="font-size:12.0pt; color:black"> </span></p>
</div>
</div>
<div>
<div id="x_x_divtagdefaultwrapper">
<p><span style="font-size:12.0pt; color:black">Hi All-</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">I think this has been the behavior of AS from the beginning, but during some recent testing, I finally realized that AS is indexing the repository details with every record in the repository. Since part of our address is "6065 Webster Hall" and we have a *lot* of Daniel Webster related material (he's a Dartmouth alum), searching for "webster" is a problem because every record in the repo comes back as a hit. In a vanilla install, you can see the repository details in the json package in the results (result['json']), so that sort of made sense....</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">I've done some cooking of the indexer to remove the resolved repository details (result['json']['repository']['_resolved']) and fiddle with some other things, but even though the json representation of the search results contains no instance of the search string, I *still* get results based on the repository details.</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">Example:</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">Repository Name is "rauner" and the long name is "Rauner Special Collections Library"</span></p>
<p><span style="font-size:12.0pt; color:black">Search: "rauner"</span></p>
<p><span style="font-size:12.0pt; color:black">Example results in json for a top container and an archival object below. Note that these *do not* contain the string "rauner"</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">I must be missing something in how the indexer is actually storing and searching data. I'd love to know if someone has a method to remove the repository details (and anything else global) from the results to prevent
 this sort of thing and to cut down on erroneous results.</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">Thanks!</span></p>
<p><span style="font-size:12.0pt; color:black">Joshua</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black">TC:</span></p>
<pre><span style="color:black">{</span></pre>
<pre><span style="color:black">        "id": "/repositories/2/top_containers/53",</span></pre>
<pre><span style="color:black">        "uri": "/repositories/2/top_containers/53",</span></pre>
<pre><span style="color:black">        "title": "MS-1371b, Box 53",</span></pre>
<pre><span style="color:black">        "primary_type": "top_container",</span></pre>
<pre><span style="color:black">        "types": [</span></pre>
<pre><span style="color:black">          "top_container"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "json": "{\"lock_version\":38,\"indicator\":\"53\",\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"type\":\"box\",\"jsonmodel_type\":\"top_container\",\"active_restrictions\":[],\"container_locations\":[],\"series\":[],\"collection\":[{\"ref\":\"/repositories/2/resources/1\",\"identifier\":\"MS-1371b\",\"display_string\":\"Mario Puzo papers\"}],\"uri\":\"/repositories/2/top_containers/53\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"restricted\":false,\"is_linked_to_published_record\":false,\"display_string\":\"Box 53\",\"long_display_string\":\"MS-1371b, Box 53\"}",</span></pre>
<pre><span style="color:black">        "suppressed": false,</span></pre>
<pre><span style="color:black">        "publish": false,</span></pre>
<pre><span style="color:black">        "system_generated": false,</span></pre>
<pre><span style="color:black">        "repository": "/repositories/2",</span></pre>
<pre><span style="color:black">        "type_enum_s": [</span></pre>
<pre><span style="color:black">          "box"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "created_by": "admin",</span></pre>
<pre><span style="color:black">        "last_modified_by": "admin",</span></pre>
<pre><span style="color:black">        "user_mtime": "2018-06-26T20:28:33Z",</span></pre>
<pre><span style="color:black">        "system_mtime": "2018-06-26T21:11:11Z",</span></pre>
<pre><span style="color:black">        "create_time": "2018-06-26T20:28:33Z",</span></pre>
<pre><span style="color:black">        "display_string": "Box 53",</span></pre>
<pre><span style="color:black">        "collection_uri_u_sstr": [</span></pre>
<pre><span style="color:black">          "/repositories/2/resources/1"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "collection_display_string_u_sstr": [</span></pre>
<pre><span style="color:black">          "Mario Puzo papers"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "collection_identifier_stored_u_sstr": [</span></pre>
<pre><span style="color:black">          "MS-1371b"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "collection_identifier_u_stext": [</span></pre>
<pre><span style="color:black">          "MS-1371b",</span></pre>
<pre><span style="color:black">          "MS 1371b",</span></pre>
<pre><span style="color:black">          "MS1371b",</span></pre>
<pre><span style="color:black">          "MS- 1371 b"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "exported_u_sbool": [</span></pre>
<pre><span style="color:black">          false</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "empty_u_sbool": [</span></pre>
<pre><span style="color:black">          false</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "indicator_u_stext": [</span></pre>
<pre><span style="color:black">          "53"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "jsonmodel_type": "top_container"</span></pre>
<pre><span style="color:black">      }</span></pre>
<p><span style="font-size:12.0pt; color:black">AO:</span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<pre><span style="color:black">{</span></pre>
<pre><span style="color:black">        "id": "/repositories/2/archival_objects/3",</span></pre>
<pre><span style="color:black">        "uri": "/repositories/2/archival_objects/3",</span></pre>
<pre><span style="color:black">        "title": "&lt;emph render=\"italic\"&gt;The Fortunate Pilgrim&lt;/emph&gt;",</span></pre>
<pre><span style="color:black">        "primary_type": "archival_object",</span></pre>
<pre><span style="color:black">        "types": [</span></pre>
<pre><span style="color:black">          "archival_object"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "json": "{\"lock_version\":0,\"position\":2,\"publish\":true,\"ref_id\":\"a97bf46cbc2cd85e9789c76098a3ee1b\",\"title\":\"&lt;emph render=\\\"italic\\\"&gt;The Fortunate Pilgrim&lt;/emph&gt;\",\"display_string\":\"&lt;emph render=\\\"italic\\\"&gt;The Fortunate Pilgrim&lt;/emph&gt;\",\"restrictions_apply\":false,\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"suppressed\":false,\"level\":\"series\",\"jsonmodel_type\":\"archival_object\",\"external_ids\":[],\"subjects\":[],\"linked_events\":[],\"extents\":[],\"dates\":[],\"external_documents\":[],\"rights_statements\":[],\"linked_agents\":[],\"onbase_documents\":[],\"ancestors\":[{\"ref\":\"/repositories/2/resources/1\",\"level\":\"collection\"}],\"instances\":[],\"notes\":[],\"uri\":\"/repositories/2/archival_objects/3\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"resource\":{\"ref\":\"/repositories/2/resources/1\"},\"has_unpublished_ancestor\":false,\"resource_identifier_u_sstr\":\"MS-1371b\",\"resource_type_u_sstr\":null,\"resource_title\":\"Mario Puzo papers\"}",</span></pre>
<pre><span style="color:black">        "suppressed": false,</span></pre>
<pre><span style="color:black">        "publish": false,</span></pre>
<pre><span style="color:black">        "system_generated": false,</span></pre>
<pre><span style="color:black">        "repository": "/repositories/2",</span></pre>
<pre><span style="color:black">        "level_enum_s": [</span></pre>
<pre><span style="color:black">          "series",</span></pre>
<pre><span style="color:black">          "collection"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "resource": "/repositories/2/resources/1",</span></pre>
<pre><span style="color:black">        "ref_id": "a97bf46cbc2cd85e9789c76098a3ee1b",</span></pre>
<pre><span style="color:black">        "created_by": "admin",</span></pre>
<pre><span style="color:black">        "last_modified_by": "admin",</span></pre>
<pre><span style="color:black">        "user_mtime": "2018-06-26T20:28:33Z",</span></pre>
<pre><span style="color:black">        "system_mtime": "2018-06-26T21:11:11Z",</span></pre>
<pre><span style="color:black">        "create_time": "2018-06-26T20:28:33Z",</span></pre>
<pre><span style="color:black">        "notes": "",</span></pre>
<pre><span style="color:black">        "level": "series",</span></pre>
<pre><span style="color:black">        "ancestors": [</span></pre>
<pre><span style="color:black">          "/repositories/2/resources/1"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "total_restrictions_u_sstr": [</span></pre>
<pre><span style="color:black">          "false"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "resource_identifier_u_sstr": [</span></pre>
<pre><span style="color:black">          "MS-1371b"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "resource_title_u_sstr": [</span></pre>
<pre><span style="color:black">          "Mario Puzo papers"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "resource_identifier_w_title_u_sstr": [</span></pre>
<pre><span style="color:black">          "MS-1371b: Mario Puzo papers"</span></pre>
<pre><span style="color:black">        ],</span></pre>
<pre><span style="color:black">        "jsonmodel_type": "archival_object"</span></pre>
<pre><span style="color:black">      }</span></pre>
<p><span style="font-size:12.0pt; color:black"> </span></p>
<p><span style="font-size:12.0pt; color:black"> </span></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>