[Archivesspace_Users_Group] PUI search results

Custer, Mark mark.custer at yale.edu
Mon Aug 7 13:12:07 EDT 2017


Ah, thanks; I didn't notice that detail earlier this morning.  It must be that the collection name is part of the top container record, then, and that's indexed as part of the component record (and that would be the reason why component 6 shows up, since it has both collection names attached to its top container record).  It does seem strange that those components would be included in the results only because they're linked to a top container record.  I'll update the ticket in JIRA.
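
To make the suspected mechanism concrete, here is a minimal sketch in Python. It is a toy model, not the ArchivesSpace indexer: the class names, the `collection_titles` field, the "Another Collection" title, and the substring matching are all assumptions made for illustration. It only shows why a component would match "rockenbach AND papers" if the collection names attached to its top container were folded into its searchable text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopContainer:
    # Hypothetical shape: a container that knows which collections use it.
    label: str
    collection_titles: List[str] = field(default_factory=list)


@dataclass
class Component:
    title: str
    top_container: Optional[TopContainer] = None


def fulltext(component: Component, include_container_collections: bool) -> str:
    """Build the text a keyword search would run against (toy version)."""
    parts = [component.title]
    container = component.top_container
    if container is not None:
        parts.append(container.label)
        if include_container_collections:
            parts.extend(container.collection_titles)
    return " ".join(parts).lower()


def matches(component: Component, terms: List[str],
            include_container_collections: bool = True) -> bool:
    """Crude AND search: every term must appear somewhere in the text."""
    text = fulltext(component, include_container_collections)
    return all(term.lower() in text for term in terms)


# Box 1 is shared by two unrelated collections, as in the sandbox example.
box1 = TopContainer("Box 1", ["Samuel D. Rockenbach Papers", "Another Collection"])
components = [
    Component("Component One", box1),
    Component("Component Four", None),   # no top container
    Component("Component Six", box1),    # belongs to the other collection
]
terms = ["rockenbach", "papers"]

hits_with = [c.title for c in components if matches(c, terms, True)]
hits_without = [c.title for c in components if matches(c, terms, False)]
```

Under these assumptions, `hits_with` contains Component One and Component Six but not Component Four, which mirrors what was reported; `hits_without` is empty because none of the component titles contain the search terms.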


From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Maderik, Rachel A <maderikra at vmi.edu>
Sent: Monday, August 7, 2017 12:32:27 PM
To: 'Archivesspace Users Group'
Subject: Re: [Archivesspace_Users_Group] PUI search results

Thanks for the information, Mark! With regard to the sandbox example, the Samuel D. Rockenbach Papers actually has four components (http://public.archivesspace.org/repositories/2/resources/1319), but “Component Four” is not being retrieved with that search (only Components 1-3 are showing up). Component Four is the only one *without* a top container, so we assumed that was the reason that components 1-3 are showing up (they do have top containers). But yes, we would appreciate the ability to exclude all these children unless they actually contain the user’s search terms (basically we’d like it to work the way it used to).

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Monday, August 07, 2017 11:44 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] PUI search results


From your example that includes 5 search results, I can say that the first 4 results with the out-of-the-box PUI are intentional, since those first 4 results occur within the Rockenbach papers.  That last search result for component 6, though, is not intentional!  I’m afraid that we didn’t have any instances of a shared container in our test corpus, since this behavior hadn’t been reported until now (and thanks so much for providing a clear example!!!).  I’ve just logged this issue in JIRA: https://archivesspace.atlassian.net/browse/AR-1863

My hope was to make the Solr index as lightweight as possible for the PUI, but oftentimes it’s difficult to figure out what should be excluded.  In your case, it sounds like you’d also prefer that the collection name be excluded for search results 2 – 4.  I’m not sure what the Solr records for those components look like right now, so others with more technical expertise will have to weigh in on how to change that.


P.S. Another thing that we wanted to get into this version of the PUI, but had to postpone to a future version, is the ability to show your keywords in context.  Once that feature is in development, I think that the focus on the information retrieval aspect of the new PUI will really help iron out exactly what the Solr index should look like, since that will enable a lot more eyes to review the search results and provide feedback (e.g. if the highlighted keyword doesn’t make sense at all, as in this case, it will be easy to show where it’s coming from, and ask for that not to be included in the PUI index).  It’s my hope that this is the next feature implemented in the PUI for this very reason.
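
As a rough illustration of why keywords in context would help here, the sketch below reports which field of an indexed record each search term came from. The record layout and field names are hypothetical, not the actual PUI Solr schema, and a real implementation would more likely rely on Solr's own highlighting than on plain Python string matching.

```python
from typing import Dict, List


def explain_match(record: Dict[str, str], terms: List[str]) -> Dict[str, List[str]]:
    """For each term, list the (hypothetical) fields whose text contains it."""
    report = {}
    for term in terms:
        needle = term.lower()
        report[term] = [
            name for name, value in record.items() if needle in value.lower()
        ]
    return report


# Assumed shape of the indexed record for the unexpected result.
record = {
    "title": "Component Six",
    "top_container_label": "Box 1",
    "top_container_collections": "Samuel D. Rockenbach Papers",
}

report = explain_match(record, ["rockenbach", "papers"])
for term, fields in report.items():
    print(f"{term}: found in {', '.join(fields) or 'no field'}")
```

With a report like this in front of a reviewer, it becomes straightforward to point at the field that caused an irrelevant hit and ask for it to be left out of the PUI index.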

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Maderik, Rachel A
Sent: Monday, 07 August, 2017 10:15 AM
To: 'archivesspace_users_group at lyralists.lyrasis.org' <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] PUI search results

We’re running v2.1 in our sandbox, and have major concerns with the PUI indexer bringing back lots of irrelevant results on our public site. It seems that it’s trying to bring back “related” records that don’t actually contain the user’s search terms. For example, this search done in 2.0 brings back 2 results (http://archivesspace.vmi.edu/search?utf8=%E2%9C%93&q=rockenbach+AND+papers), while the exact same search in 2.1 retrieves 139 results (searchresults1.jpg), most of which are components that don’t actually contain those search terms. In addition, the relevant resource record is on the second-to-last page of the results (searchresults2.jpg).

I created a live example in the public sandbox with the search “Rockenbach AND papers”, which retrieves a relevant collection plus 4 components, none of which contains those search terms: http://public.archivesspace.org/search?utf8=%E2%9C%93&op%5B%5D=&q%5B%5D=rockenbach+AND+papers&limit=&field%5B%5D=&from_year%5B%5D=&to_year%5B%5D=&commit=Search. One of the results, “Component Six”, is not even part of the “Samuel D. Rockenbach Papers” collection. We’re guessing it’s in the results because it shares a top container with the other results (Box 1).

Our question: is this intentional behavior? If so, is there a way to turn it off? (I.e. revert to the old indexing rules, where the search retrieves results based on keywords only and nothing else). We have many items that share a physical container but are unrelated to each other, so this is unhelpful and confusing for our users.

Rachel Maderik
Systems and Technology Librarian
501D Preston Library
Virginia Military Institute
Lexington, VA 24450
