[Archivesspace_Users_Group] Indexing repository details in all records skews results set

Christine Di Bella christine.dibella at lyrasis.org
Wed Jun 27 10:36:33 EDT 2018


Hi Joshua,

Thanks for posting about this. Just to clarify, are you referring to search behavior in the staff interface or the public interface (or both)?

If it's the public interface, you've hit the nail on the head - the full record field is a big issue there. We've been working on specific improvements to search behavior that we believe will address situations like this. We'd love to have you test what we've been doing and can point you to it, if you'd like.

(If it's the staff interface, full record exhibits the same behavior, but it seems to be an issue for fewer people.)

We'd also love your help with this indexer work if you'd like to talk with Laney and some others who have been looking into this. It sounds like your perspective and investigations could be really helpful to everyone, especially while we're all knee-deep in it!

Christine

Christine Di Bella
ArchivesSpace Program Manager
christine.dibella at lyrasis.org
800.999.8558 x2905
678-235-2905
cdibella13 (Skype)


From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> On Behalf Of Joshua D. Shaw
Sent: Wednesday, June 27, 2018 10:31 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Indexing repository details in all records skews results set


I did a little more digging and, to answer my own question, the "fullrecord" field holds (almost) everything in the Solr doc. I think the code that builds this field, specifically the "extract_string_values" method in IndexerCommon, is a bit greedy and should probably skip the repository in addition to the update times, etc. I'm testing that locally.
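To illustrate the idea, here is a minimal sketch of a recursive string collector that skips selected keys. It is written in Python purely for illustration and is not the ArchivesSpace code: the function name `collect_strings`, the `SKIPPED_KEYS` list, and the sample record are all assumptions.

```python
from typing import Any, Iterable, List

# Hypothetical skip list: audit/timestamp fields plus the repository.
SKIPPED_KEYS = {
    "created_by", "last_modified_by",
    "create_time", "system_mtime", "user_mtime",
    "repository",
}


def collect_strings(node: Any, skipped: Iterable[str] = SKIPPED_KEYS) -> List[str]:
    """Gather every string value in a nested record, ignoring skipped keys."""
    skipped = set(skipped)
    values: List[str] = []
    if isinstance(node, dict):
        for key, child in node.items():
            if key in skipped:
                continue  # do not descend into skipped subtrees at all
            values.extend(collect_strings(child, skipped))
    elif isinstance(node, list):
        for child in node:
            values.extend(collect_strings(child, skipped))
    elif isinstance(node, str):
        values.append(node)
    return values


# Made-up record, loosely shaped like the examples in this thread.
record = {
    "title": "Mario Puzo papers",
    "system_mtime": "2018-06-26T21:11:11Z",
    "repository": {
        "ref": "/repositories/2",
        "_resolved": {"name": "Rauner Special Collections Library"},
    },
    "notes": [{"content": ["Drafts and correspondence"]}],
}

# Stand-in for the catch-all text field built from the record.
fullrecord = " ".join(collect_strings(record))
```

With "repository" in the skip list, the repository name never reaches the catch-all text, so a search on it would no longer match every record in the repo.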



My own issue was also complicated by some custom indexer work that initially added the resource as a fully resolved attribute to the AO docs (I'm doing it differently now), which doubled the fullrecord issue and added its own headaches for search relevancy.



Joshua

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of Joshua D. Shaw <Joshua.D.Shaw at dartmouth.edu>
Sent: Tuesday, June 26, 2018 5:25 PM
To: Archivesspace Users Group
Subject: [Archivesspace_Users_Group] Indexing repository details in all records skews results set


Hi All-



I think this has been the behavior of AS from the beginning, but during some recent testing, I finally realized that AS is indexing the repository details with every record in the repository. Since part of our address is "6065 Webster Hall" and we have a *lot* of Daniel Webster related material (he's a Dartmouth alum), searching for "webster" is a bad thing since every record in the repo is listed. In a vanilla install, you can see the repository details in the json package in the results (result['json']), so that sort of made sense....



I've done some cooking of the indexer to remove the resolved repository details (result['json']['repository']['_resolved']) and fiddle with some other things, but even though the json representation of the search results contains no instance of the search string, I *still* get results based on the repository details.
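For anyone following along, the kind of change described here can be sketched roughly as follows. This is an illustrative Python sketch rather than the actual indexer customization; `strip_resolved_repository` and the sample record are hypothetical.

```python
import copy
from typing import Any, Dict


def strip_resolved_repository(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the resolved repository details."""
    cleaned = copy.deepcopy(record)  # leave the caller's record untouched
    repository = cleaned.get("repository")
    if isinstance(repository, dict):
        # Keep the "ref" pointer, drop the embedded repository record.
        repository.pop("_resolved", None)
    return cleaned


# Made-up record for illustration only.
record = {
    "title": "MS-1371b, Box 53",
    "repository": {
        "ref": "/repositories/2",
        "_resolved": {"name": "Rauner Special Collections Library"},
    },
}

cleaned = strip_resolved_repository(record)
```

Note that this only changes the JSON stored with the document; any other field that was built from the full record before the stripping step would still carry the repository text.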



Example:



Repository Name is "rauner" and the long name is "Rauner Special Collections Library"

Search: "rauner"

Example results in json for a top container and an archival object below. Note that these *do not* contain the string "rauner"
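A quick way to double-check a claim like this is to scan every returned field of a hit for the term. The sketch below is illustrative Python under stated assumptions: `fields_containing` is a hypothetical helper, and the sample document is a trimmed-down stand-in for a search hit, not real output.

```python
import json
from typing import Any, Dict, List


def fields_containing(doc: Dict[str, Any], term: str) -> List[str]:
    """Return the top-level fields whose values mention term (case-insensitive)."""
    needle = term.lower()

    def mentions(value: Any) -> bool:
        if isinstance(value, str):
            return needle in value.lower()
        if isinstance(value, dict):
            return any(mentions(v) for v in value.values())
        if isinstance(value, list):
            return any(mentions(v) for v in value)
        return False  # numbers, booleans, None

    hits: List[str] = []
    for name, value in doc.items():
        # The "json" field is a serialized record; parse it so only its
        # values (not its key names) are searched.
        if name == "json" and isinstance(value, str):
            value = json.loads(value)
        if mentions(value):
            hits.append(name)
    return hits


# Trimmed-down, made-up search hit.
doc = {
    "id": "/repositories/2/top_containers/53",
    "title": "MS-1371b, Box 53",
    "json": json.dumps({
        "indicator": "53",
        "repository": {"ref": "/repositories/2", "_resolved": ""},
    }),
    "display_string": "Box 53",
}

rauner_hits = fields_containing(doc, "rauner")
box_hits = fields_containing(doc, "box")
```

If the list comes back empty for a record that still matches the query, the match is presumably coming from an indexed field that is not returned with the document.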



I must be missing something in how the indexer is actually storing and searching data. I'd love to know if someone has a method to remove the repository details (and anything else global) from the results to prevent this sort of thing and to cut down on erroneous results.



Thanks!

Joshua





TC:

{
        "id": "/repositories/2/top_containers/53",
        "uri": "/repositories/2/top_containers/53",
        "title": "MS-1371b, Box 53",
        "primary_type": "top_container",
        "types": [
          "top_container"
        ],
        "json": "{\"lock_version\":38,\"indicator\":\"53\",\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"type\":\"box\",\"jsonmodel_type\":\"top_container\",\"active_restrictions\":[],\"container_locations\":[],\"series\":[],\"collection\":[{\"ref\":\"/repositories/2/resources/1\",\"identifier\":\"MS-1371b\",\"display_string\":\"Mario Puzo papers\"}],\"uri\":\"/repositories/2/top_containers/53\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"restricted\":false,\"is_linked_to_published_record\":false,\"display_string\":\"Box 53\",\"long_display_string\":\"MS-1371b, Box 53\"}",
        "suppressed": false,
        "publish": false,
        "system_generated": false,
        "repository": "/repositories/2",
        "type_enum_s": [
          "box"
        ],
        "created_by": "admin",
        "last_modified_by": "admin",
        "user_mtime": "2018-06-26T20:28:33Z",
        "system_mtime": "2018-06-26T21:11:11Z",
        "create_time": "2018-06-26T20:28:33Z",
        "display_string": "Box 53",
        "collection_uri_u_sstr": [
          "/repositories/2/resources/1"
        ],
        "collection_display_string_u_sstr": [
          "Mario Puzo papers"
        ],
        "collection_identifier_stored_u_sstr": [
          "MS-1371b"
        ],
        "collection_identifier_u_stext": [
          "MS-1371b",
          "MS 1371b",
          "MS1371b",
          "MS- 1371 b"
        ],
        "exported_u_sbool": [
          false
        ],
        "empty_u_sbool": [
          false
        ],
        "indicator_u_stext": [
          "53"
        ],
        "jsonmodel_type": "top_container"
      }

AO:



{
        "id": "/repositories/2/archival_objects/3",
        "uri": "/repositories/2/archival_objects/3",
        "title": "<emph render=\"italic\">The Fortunate Pilgrim</emph>",
        "primary_type": "archival_object",
        "types": [
          "archival_object"
        ],
        "json": "{\"lock_version\":0,\"position\":2,\"publish\":true,\"ref_id\":\"a97bf46cbc2cd85e9789c76098a3ee1b\",\"title\":\"<emph render=\\\"italic\\\">The Fortunate Pilgrim</emph>\",\"display_string\":\"<emph render=\\\"italic\\\">The Fortunate Pilgrim</emph>\",\"restrictions_apply\":false,\"created_by\":\"admin\",\"last_modified_by\":\"admin\",\"create_time\":\"2018-06-26T20:28:33Z\",\"system_mtime\":\"2018-06-26T21:11:11Z\",\"user_mtime\":\"2018-06-26T20:28:33Z\",\"suppressed\":false,\"level\":\"series\",\"jsonmodel_type\":\"archival_object\",\"external_ids\":[],\"subjects\":[],\"linked_events\":[],\"extents\":[],\"dates\":[],\"external_documents\":[],\"rights_statements\":[],\"linked_agents\":[],\"onbase_documents\":[],\"ancestors\":[{\"ref\":\"/repositories/2/resources/1\",\"level\":\"collection\"}],\"instances\":[],\"notes\":[],\"uri\":\"/repositories/2/archival_objects/3\",\"repository\":{\"ref\":\"/repositories/2\",\"_resolved\":\"\"},\"resource\":{\"ref\":\"/repositories/2/resources/1\"},\"has_unpublished_ancestor\":false,\"resource_identifier_u_sstr\":\"MS-1371b\",\"resource_type_u_sstr\":null,\"resource_title\":\"Mario Puzo papers\"}",
        "suppressed": false,
        "publish": false,
        "system_generated": false,
        "repository": "/repositories/2",
        "level_enum_s": [
          "series",
          "collection"
        ],
        "resource": "/repositories/2/resources/1",
        "ref_id": "a97bf46cbc2cd85e9789c76098a3ee1b",
        "created_by": "admin",
        "last_modified_by": "admin",
        "user_mtime": "2018-06-26T20:28:33Z",
        "system_mtime": "2018-06-26T21:11:11Z",
        "create_time": "2018-06-26T20:28:33Z",
        "notes": "",
        "level": "series",
        "ancestors": [
          "/repositories/2/resources/1"
        ],
        "total_restrictions_u_sstr": [
          "false"
        ],
        "resource_identifier_u_sstr": [
          "MS-1371b"
        ],
        "resource_title_u_sstr": [
          "Mario Puzo papers"
        ],
        "resource_identifier_w_title_u_sstr": [
          "MS-1371b: Mario Puzo papers"
        ],
        "jsonmodel_type": "archival_object"
      }



