[Archivesspace_Users_Group] PUI search functionality documentation

Christine Di Bella christine.dibella at lyrasis.org
Mon Oct 1 09:45:36 EDT 2018


Dear Johanna,

Thanks for your question about this, and apologies for the delay in responding. I've been traveling for work and attending conferences for the last two weeks and am just now catching up with listserv posts.

Improving the public interface search was something that we investigated a great deal over the spring and early summer based on feedback from a number of institutions using the PUI. Unfortunately, we determined that making the changes required will necessitate a substantial change to the indexing for the application. We're working to identify and obtain resources in order to do so while maintaining forward progress in other areas of the application.

How the search on the public side currently works is documented only in technical terms. I've distilled what we know for this purpose, but the explanation is still rather technical. If there are additional questions on the specifics, I'm happy to try to answer them, but this is definitely an area where I lean on Laney and others on the developer side for a better understanding. (And any mistakes in interpretation in what's below are mine.) Here is some information about how the PUI search currently indexes and weights information in order to display results:


  *   ArchivesSpace has multiple indexers (one each essentially for staff side information, public side information, and a real-time indexer that updates the index as changes are made) but all three put their information into one shared index. There is a field called fullrecord which takes nearly all the fields in ArchivesSpace and makes them a single field for the purposes of keyword search. PUI indexes fullrecord plus more for the collection organization display. The code that creates the staff interface records is the same as what is used by the PUI indexer with some additions for the separate PUI records.

Because there is currently only one index, there is only one fullrecord field, rather than one for staff and one for public as you might expect. The fact that everything pulls from a single index, which includes a field covering almost everything in ArchivesSpace, is one of the reasons why information that is not displayed in the public interface affects public interface results.
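
As a rough illustration of why this happens, here is a toy sketch in Python. It is not ArchivesSpace code: the record structure and the staff_only_note field are hypothetical, and the real indexers are considerably more involved. It only shows how a single concatenated field lets a keyword match on content that a public view would never display.

```python
# Toy illustration only. "fullrecord" is the field named above; every other
# field name here is hypothetical and chosen just for the example.

def build_fullrecord(record):
    """Concatenate every string value in a record into one searchable blob."""
    return " ".join(v for v in record.values() if isinstance(v, str))


def keyword_match(term, doc):
    """Case-insensitive substring match against the combined field."""
    return term.lower() in doc["fullrecord"].lower()


record = {
    "title": "Smith family papers",
    "four_part_id": "MC 123",
    # Hypothetical field that a public view would not display.
    "staff_only_note": "Donor requested appraisal in 2015",
}

# One shared document, one shared fullrecord, regardless of who is searching.
doc = dict(record, fullrecord=build_fullrecord(record))

# The record matches a word that only occurs in the non-displayed field.
print(keyword_match("appraisal", doc))
```

A searcher on the public side would see this record in their results for "appraisal" without being able to see why it matched.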



  *   Anything that appears in the fields included in fullrecord is included in the index and available to the public and staff sides, though what displays is determined by other settings in the views. (This is why unpublished records rightly don't appear in the PUI though they can affect search results.) On the public side, the most heavily weighted fields are identifier, title, and finding aid title, and results whose record type is resource or accession are boosted highest, followed by agents and subjects.

For more specifics, the values after the ^ show the magnitude of the weighting.
Currently, these are hard-coded in the solrconfig.xml file and the solr model in the backend:
From solrconfig.xml:

  1.  pf = "four_part_id^50" (pf is for Phrase Fields which boosts the score of documents in cases where all of the terms in the q parameter appear in close proximity)
  2.  qf = "title^25 four_part_id^50 fullrecord" (qf is for Query Fields which specifies the fields in the index on which to perform the query)
  3.  bq = "primary_type:resource^100 primary_type:accession^100 primary_type:subject^50 primary_type:agent_person^50 primary_type:agent_corporate_entity^30 primary_type:agent_family^30" (bq is for Boost Query which specifies a factor by which a term or phrase should be "boosted" in importance when considering a match)
Passed into the solr query from the solr model in the backend:

  1.  pf = "four_part_id^4"
  2.  qf = "four_part_id^3 title^2 finding_aid_filing_title^2 fullrecord"
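
To make the field^weight notation concrete, here is a minimal Python sketch that parses the strings quoted above into field/weight pairs and URL-encodes the backend values as request parameters. The helper names (parse_boosts, build_query_string) are made up for this example, and the sketch makes no claim about how ArchivesSpace assembles its request or which of the two sets of values takes precedence.

```python
from urllib.parse import urlencode

# Values copied from the lists above.
SOLRCONFIG_VALUES = {
    "pf": "four_part_id^50",
    "qf": "title^25 four_part_id^50 fullrecord",
    "bq": (
        "primary_type:resource^100 primary_type:accession^100 "
        "primary_type:subject^50 primary_type:agent_person^50 "
        "primary_type:agent_corporate_entity^30 primary_type:agent_family^30"
    ),
}
BACKEND_VALUES = {
    "pf": "four_part_id^4",
    "qf": "four_part_id^3 title^2 finding_aid_filing_title^2 fullrecord",
}


def parse_boosts(spec):
    """Split 'field^weight ...' into {field: weight}.

    An entry with no ^weight (such as fullrecord) gets Solr's default boost of 1.
    """
    boosts = {}
    for token in spec.split():
        field, _, weight = token.partition("^")
        boosts[field] = float(weight) if weight else 1.0
    return boosts


def build_query_string(user_query):
    """URL-encode a user query together with the backend pf/qf values."""
    return urlencode({"q": user_query, **BACKEND_VALUES})


for name, spec in BACKEND_VALUES.items():
    print(name, parse_boosts(spec))
print(build_query_string("smith papers"))
```

Read this way, the backend qf says a match in four_part_id counts three times as much as a match that only occurs somewhere in fullrecord, and a match in title or finding_aid_filing_title counts twice as much.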



  *   Some v2.3.x and v2.4.x releases of ArchivesSpace made certain parameters configurable, such as whether the default operator is OR or AND, but those settings only work on the staff side because of how the PUI works. Changing the operator does not work on the public side because the code for the public side overwrites some areas when the final solr query gets built, before it is sent to solr for retrieval. Also, some of the subqueries created in the PUI search have AND and OR hardcoded, so the final query contains a combination of ORs and ANDs. That is not configurable at all.

Yale (and possibly Harvard as well, though Johanna would have a better sense of this) has done some work to modify search for its own purposes, but I believe their changes have been scaled back significantly as they saw what we saw in investigating this: as currently set up, making a change in one area negatively impacts search in another area, including the staff interface.
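
The operator behavior can be pictured with a deliberately simplified Python sketch. None of this is the actual ArchivesSpace code: the function names and the query shapes are hypothetical and exist only to show the general pattern of a configured setting being honored on one path and ignored on another where the operators are hard-coded.

```python
# Hypothetical illustration; not taken from the ArchivesSpace code base.

def staff_query(terms, default_operator):
    """Staff-style path: the configured operator joins the terms."""
    return f" {default_operator} ".join(terms)


def pui_query(terms, default_operator):
    """PUI-style path: default_operator is accepted but never used.

    Each per-term subquery has a hard-coded OR, and the subqueries are
    joined with a hard-coded AND, so the result mixes both operators.
    """
    subqueries = [f"(title:{t} OR fullrecord:{t})" for t in terms]
    return " AND ".join(subqueries)


terms = ["smith", "papers"]
for op in ("AND", "OR"):
    print(op, "| staff:", staff_query(terms, op), "| pui:", pui_query(terms, op))
```

In this toy version, switching the setting changes the staff query but leaves the PUI query identical, which mirrors the behavior described above.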

We believe the only possibility for making substantial, lasting change to the PUI search is to refactor how search happens. This is a major undertaking, and it's very important to us that doing so not negatively impact how people use the PUI or the staff interface now or stop all progress on development in general for a significant period of time. Taking the time to identify ways to do this, determining the best path forward, and finding resources to pursue it is the reason we have not progressed with PUI search the way we were hoping earlier in the year.

We are incredibly fortunate that ArchivesSpace has such an active and engaged user community and that the application has become so fundamental to people's work. We take very seriously the degree to which making significant changes to it would impact people's work and want to pursue any such development in as thoughtful and responsible a way as we can. As plans progress we will involve the community in the discussions as they relate to PUI search specifically.

I hope knowing more about how the search currently works helps, and please do reach out if you would like to discuss more before we reach that point.

Christine

Christine Di Bella
ArchivesSpace Program Manager
christine.dibella at lyrasis.org
800.999.8558 x2905
678-235-2905
cdibella13 (Skype)


From: Carll, Johanna <jcarll at radcliffe.harvard.edu>
Sent: Monday, September 17, 2018 9:33 AM
To: Christine Di Bella <christine.dibella at lyrasis.org>; Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: PUI search functionality documentation

Dear Christine,

Now that we have had a few months of experience with the ArchivesSpace PUI here at Harvard, we are reviewing user feedback to help us prioritize post-launch development needs. One area of concern is the PUI search functionality, as we've received multiple reports of unsatisfactory and unexpected search results.

Can you direct us to, or share, documentation on the PUI search functionality, including relevance ranking, weighting, and indexed fields? This will help us evaluate what may be done locally to improve results, as well as participate in the discussion and planning for changes to the core code that would improve search results.

Thanks
Johanna

Johanna Carll
Archivist and Metadata Specialist
Schlesinger Library
Radcliffe Institute for Advanced Study Harvard University
10 Garden Street
Cambridge, MA 02138
617-495-8524
jcarll at radcliffe.harvard.edu
