[Archivesspace_Users_Group] Help using AS Restful API to query

Kevin W. Schlottmann kws2126 at columbia.edu
Fri Nov 8 12:12:13 EST 2019


Hi Steve,

I'm a little late to this thread, but I wanted to note that depending on
your exact use case, the built-in OAI-PMH endpoint might be useful.  You
can specify the exact time range of modified times desired; choose MARC,
EAD, or DC (all in XML though); set whether the entire record is downloaded
or just the header; and choose whether to include deleted records.  This is
what we use to publish collection-level records to our catalog overnight,
so I'd be happy to talk more about it if helpful.

https://github.com/archivesspace/tech-docs/blob/master/architecture/oai-pmh.md
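A minimal Python sketch of the date-ranged harvest Kevin describes. It uses the standard OAI-PMH arguments `verb`, `metadataPrefix`, `from` and `until`. The base URL `http://localhost:8082`, the prefix `oai_ead`, and the helper name `oai_list_url` are assumptions for illustration; check your instance's OAI configuration for the real address and supported formats.

```python
from urllib.parse import urlencode


def oai_list_url(base_url, metadata_prefix, from_date, until_date=None,
                 headers_only=False):
    """Build an OAI-PMH harvesting URL for records changed in a date range.

    ListIdentifiers returns only record headers; ListRecords returns the
    full metadata in the requested format.
    """
    verb = "ListIdentifiers" if headers_only else "ListRecords"
    params = {"verb": verb, "metadataPrefix": metadata_prefix, "from": from_date}
    if until_date is not None:
        params["until"] = until_date
    return base_url.rstrip("/") + "/?" + urlencode(params)


# Assumed local OAI address and metadata prefix; adjust for your instance.
print(oai_list_url("http://localhost:8082", "oai_ead", "2019-11-07"))
```

Under OAI-PMH 2.0, large result sets are paged with a `resumptionToken`. Follow-up requests send only `verb` and `resumptionToken`.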

Kevin



On Fri, Nov 8, 2019 at 12:01 PM Steve Mattison <smattiso at nd.edu> wrote:

> This is very helpful.
> Thank you so much for passing this along.
>
>
> On Fri, Nov 8, 2019 at 11:53 AM Custer, Mark <mark.custer at yale.edu> wrote:
>
>> And Steve, now that I’m looking at your question, I **think** that
>> everything already mentioned should point you in the right direction for
>> doing that search with the API.
>>
>>
>>
>> It sounds like your use case might call for very granular updates, but in
>> case it’s helpful, here’s the approach that Hudson Molonglo (thanks James,
>> et al.!) built for us to figure out which finding aids to export: a plugin
>> that adds a new endpoint named “/resource-update-feed”. The plugin’s backend
>> code is at
>> https://github.com/hudmol/archivesspace_export_service/tree/master/backend
>> In this case, we wanted to know not only when a resource or archival object
>> record in a finding aid had been edited, but also when any record linked to
>> it had been (e.g. the archival object wasn’t edited, but an associated
>> digital object was).  We use that endpoint as part of a larger service, but
>> I’ve also found it useful in other contexts.
>>
>>
>>
>> It was also important to find out which records had been unpublished or
>> suppressed since a particular time, which is why I like this approach: you
>> send a request to the new endpoint, and it returns a list of IDs that need
>> to be added (either because they’re brand new or because they’ve been
>> edited), as well as a list of IDs that should be removed.  Again, this use
>> case covered only Resource records, but I imagine that a similar approach
>> would be useful for other record types.
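The add/remove pattern Mark describes can be sketched in a few lines of Python. The plugin's actual response format isn't shown in this thread, so the `"adds"` and `"removes"` keys and the `apply_update_feed` helper are hypothetical. Removals are applied last, so a record that was edited and then unpublished or suppressed does not survive.

```python
def apply_update_feed(published_ids, feed):
    """Apply an add/remove feed to a local set of published resource IDs.

    `feed` is assumed (hypothetically) to be a dict with "adds" and
    "removes" lists; check the plugin's real response before relying on it.
    """
    updated = set(published_ids)
    # New or edited records are (re)added first...
    updated.update(feed.get("adds", []))
    # ...then unpublished/suppressed records are dropped, so removal wins.
    updated.difference_update(feed.get("removes", []))
    return updated


print(sorted(apply_update_feed({"1", "2", "3"}, {"adds": ["4"], "removes": ["3"]})))
```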
>>
>>
>>
>> Mark
>>
>>
>> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
>> archivesspace_users_group-bounces at lyralists.lyrasis.org] *On Behalf Of *Seth
>> Shaw
>> *Sent:* Friday, 08 November, 2019 11:26 AM
>> *To:* Archivesspace Users Group <
>> archivesspace_users_group at lyralists.lyrasis.org>
>> *Subject:* Re: [Archivesspace_Users_Group] Help using AS Restful API to
>> query
>>
>>
>>
>> Thanks, James and Mark. I appreciate the additional pointers. (And sorry
>> for hijacking your thread, Steve!)
>>
>>
>>
>> On Fri, Nov 8, 2019 at 8:20 AM Custer, Mark <mark.custer at yale.edu> wrote:
>>
>> Seth,
>>
>>
>>
>> Here’s another example:
>>
>>
>>
>> search?type[]=archival_object&page=1&aq=
>>
>> {
>>     "query": {
>>         "op": "AND",
>>         "subqueries": [
>>             {
>>                 "field": "keyword",
>>                 "value": "39002102378974",
>>                 "jsonmodel_type": "field_query",
>>                 "negated": false,
>>                 "literal": false
>>             },
>>             {
>>                 "field": "types",
>>                 "value": "pui",
>>                 "jsonmodel_type": "field_query",
>>                 "negated": true
>>             }
>>         ],
>>         "jsonmodel_type": "boolean_query"
>>     },
>>     "jsonmodel_type": "advanced_query"
>> }
>>
>>
>>
>> In that case, for example, we might have one archival object to which that
>> barcode has been applied.  The Solr index will contain two documents for it
>> if, and only if, that record has been published.  The second subquery (the
>> negated "types": "pui" clause) excludes the PUI document from the result
>> set.
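When this query is sent over HTTP, the JSON has to be serialised and URL-encoded as the `aq` parameter. Here is a minimal Python sketch that rebuilds Mark's query with the standard `json` and `urllib.parse` modules. The helper names `field_query` and `barcode_search_params` are hypothetical.

```python
import json
from urllib.parse import urlencode


def field_query(field, value, negated=False, literal=False):
    """One field_query clause in the shape used in the example above."""
    return {"field": field, "value": value, "jsonmodel_type": "field_query",
            "negated": negated, "literal": literal}


def barcode_search_params(barcode, page=1):
    """Keyword search for archival objects, excluding PUI copies of records."""
    aq = {
        "query": {
            "op": "AND",
            "subqueries": [
                field_query("keyword", barcode),
                # Published records get a second Solr document typed "pui";
                # negating this clause keeps only the staff-side document.
                field_query("types", "pui", negated=True),
            ],
            "jsonmodel_type": "boolean_query",
        },
        "jsonmodel_type": "advanced_query",
    }
    return [("type[]", "archival_object"), ("page", str(page)),
            ("aq", json.dumps(aq))]


print("search?" + urlencode(barcode_search_params("39002102378974")))
```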
>>
>>
>>
>> Mark
>>
>>
>>
>>
>>
>> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
>> archivesspace_users_group-bounces at lyralists.lyrasis.org] *On Behalf Of *Seth
>> Shaw
>> *Sent:* Friday, 08 November, 2019 11:13 AM
>> *To:* Archivesspace Users Group <
>> archivesspace_users_group at lyralists.lyrasis.org>
>> *Subject:* Re: [Archivesspace_Users_Group] Help using AS Restful API to
>> query
>>
>>
>>
>> That might be it. We don't use the PUI for patron access, but we still
>> have it turned on because staff occasionally look at it. Certainly
>> something to investigate.
>>
>>
>>
>> Regardless, either the API should use *one* index consistently, have a
>> documented filter (I don't see one there) and/or give some other obvious
>> indication as to where a result came from.
>>
>>
>>
>> On Fri, Nov 8, 2019 at 8:02 AM James Bullen <james at hudmol.com> wrote:
>>
>>
>>
>> Hi Seth,
>>
>>
>>
>> I’m not seeing that. Could it be that you’re also seeing PUI documents? The
>> instance I’m testing on has the PUI turned off.
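If PUI documents turn out to be the cause, they can also be filtered out on the client. This sketch assumes each search result is a dict carrying the Solr `types` list and the record `uri`; both are assumptions to verify against your own responses. The helper names are hypothetical.

```python
from collections import Counter


def drop_pui_docs(results):
    """Keep only staff-side documents (assumes a "types" list per result)."""
    return [doc for doc in results if "pui" not in doc.get("types", [])]


def duplicate_uris(results):
    """Map each record URI that appears more than once to its count."""
    counts = Counter(doc.get("uri") for doc in results)
    return {uri: n for uri, n in counts.items() if n > 1}


docs = [
    {"uri": "/repositories/2/archival_objects/1", "types": ["archival_object"]},
    {"uri": "/repositories/2/archival_objects/1", "types": ["archival_object", "pui"]},
]
print(duplicate_uris(docs), duplicate_uris(drop_pui_docs(docs)))
```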
>>
>> Cheers,
>>
>> James
>>
>>
>>
>>
>>
>> On Nov 8, 2019, at 10:51 AM, Seth Shaw <seth.shaw at unlv.edu> wrote:
>>
>>
>>
>> James, I was hoping that using the filter as you described would fix the
>> duplicate-results issue I was having with the advanced query compound
>> search, but I'm seeing the same thing as before.
>>
>>
>>
>> Running the search via the
>> API: 'archivestest:8089/search?type[]=archival_object&page=1&filter={"query":{"comparator":"greater_than","field":"system_mtime","value":"2019-10-02","jsonmodel_type":"date_field_query"}}'
>> returns, in part,
>> `{"page_size":10,"first_page":1,"last_page":874,"this_page":1,"offset_first":1,"offset_last":10,"total_hits":8732,`
>> ...
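The response reports `this_page` and `last_page`, so a client has to request each page in turn to collect every hit. A minimal sketch, assuming `fetch_page` is a caller-supplied (hypothetical) function that performs the request for one page and returns the decoded JSON, with hits under a `results` key.

```python
def fetch_all_results(fetch_page):
    """Collect the hits from every page of a paged search response."""
    results = []
    page = 1
    while True:
        data = fetch_page(page)
        results.extend(data.get("results", []))
        # Stop once we've reached the last page the server reports
        # (also covers an empty result set where last_page may be 0).
        if page >= data.get("last_page", page):
            break
        page += 1
    return results
```

For example, `fetch_page` could wrap a `requests.get(...)` call that sends the session token in the `X-ArchivesSpace-Session` header and returns `.json()`.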
>>
>>
>>
>> Whereas the SQL query `SELECT count(*) FROM archival_object WHERE
>> system_mtime > '2019-10-02';` returns "4369" (roughly half of the REST
>> query's total_hits).
>>
>>
>>
>> Have you run into this issue before?
>>
>> On Fri, Nov 8, 2019 at 7:34 AM James Bullen <james at hudmol.com> wrote:
>>
>>
>>
>> Something like this works for me:
>>
>>
>>
>> /search?type[]=resource&type[]=archival_object&page=1&filter={"query":{"comparator":"greater_than","field":"system_mtime","value":"2019-10-02","jsonmodel_type":"date_field_query"}}
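Written out as a real request, each record type is a separate `type[]` parameter and the filter JSON is URL-encoded. Here is a sketch in Python. `mtime_search_url` is a hypothetical helper, and `http://localhost:8089` assumes the default backend port.

```python
import json
from urllib.parse import urlencode


def mtime_search_url(base_url, record_types, since, page=1):
    """Build a /search URL for records whose system_mtime is after `since`."""
    # One type[] parameter per record type, as in the example above.
    params = [("type[]", record_type) for record_type in record_types]
    params.append(("page", str(page)))
    date_query = {"comparator": "greater_than", "field": "system_mtime",
                  "value": since, "jsonmodel_type": "date_field_query"}
    params.append(("filter", json.dumps({"query": date_query})))
    return base_url.rstrip("/") + "/search?" + urlencode(params)


print(mtime_search_url("http://localhost:8089",
                       ["resource", "archival_object"], "2019-10-02"))
```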
>>
>> On Nov 8, 2019, at 10:24 AM, Seth Shaw <seth.shaw at unlv.edu> wrote:
>>
>>
>>
>> To do this you need to use the advanced query parameter, which,
>> unfortunately, is not well documented. There are a few email threads that
>> describe using the advanced search, though:
>>
>>
>>
>>
>> http://lyralists.lyrasis.org/mailman/htdig/archivesspace_users_group/2015-June/001734.html
>>
>>
>>
>> I've formulated queries like the one you describe before (I'll have to dig
>> through my notes to see if I can find it again), but the result set
>> consistently gave me duplicate results for some unknown reason, so I
>> stopped using it.
>>
>>
>>
>> Ideally, each entity's endpoint would accept a 'modified_since' parameter
>> that returns only the most recently modified set. The capability exists in
>> the code, but hasn't been exposed through the REST endpoint. I've submitted
>> a ticket that will hopefully lead to this being resolved:
>> https://archivesspace.atlassian.net/browse/ANW-962
>>
>> On Fri, Nov 8, 2019 at 7:07 AM Steve Mattison <smattiso at nd.edu> wrote:
>>
>> Community,
>>
>>
>>
>> I'm new to using ArchivesSpace, and new to using the AS API.  I need to
>> use the API to search within a given repository to find records that have
>> been modified after a particular time (e.g. modified within the last 48
>> hours).  (We then want to export metadata related to those resources or
>> archival_objects for a project we're working on.)
>>
>>
>>
>> I have found the documentation for search-this-repository
>> <http://archivesspace.github.io/archivesspace/api/?shell#search-this-repository>,
>> but don't know how to formulate a query to find all resources and/or
>> archival_objects where the system_mtime is after a particular time.  I
>> would be fine with performing two searches, one for resources and one for
>> archival_objects, if that is required.
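For a rolling window such as "the last 48 hours", the cutoff can be computed instead of typed in. This sketch is hypothetical. The thread only shows date-only values like "2019-10-02", so whether `date_field_query` accepts a full timestamp is an assumption, and `date_only=True` falls back to the date format used in the thread. The repository-scoped path follows the search-this-repository endpoint. The repository ID is a placeholder.

```python
from datetime import datetime, timedelta, timezone


def cutoff_value(hours, now=None, date_only=False):
    """Return the moment `hours` before `now` (UTC) as a filter value."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    # Date-only matches the examples in this thread; the full timestamp
    # form is an untested assumption about date_field_query.
    return cutoff.strftime("%Y-%m-%d" if date_only else "%Y-%m-%dT%H:%M:%SZ")


def repository_search_path(repo_id):
    """Path of the repository-scoped search endpoint."""
    return f"/repositories/{repo_id}/search"


print(repository_search_path(2), cutoff_value(48))
```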
>>
>>
>>
>> Any help with the syntax for the query would be much appreciated.
>>
>>
>>
>> Thanks for your help.
>>
>>
>>
>> --
>>
>> *Steve Mattison*
>>
>> *Lead Software Engineer, Digital Library Technologies*
>>
>> *Hesburgh Libraries*
>>
>>
>>
>> *University of Notre Dame*
>>
>> 271 Hesburgh Library
>>
>> Notre Dame, IN 46556-5629
>>
>> *o:* 574-631-8559
>>
>> *e: *steve.mattison at nd.edu
>>
>>
>>
>>
>>
>> _______________________________________________
>> Archivesspace_Users_Group mailing list
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>>
>>
>
>
> --
> *Steve Mattison*
> *Lead Software Engineer, Digital Library Technologies*
> *Hesburgh Libraries*
>
> *University of Notre Dame*
> 271 Hesburgh Library
> Notre Dame, IN 46556-5629
> *o:* 574-631-8559
> *e: *steve.mattison at nd.edu
>
>


-- 
Kevin Schlottmann
Head of Archives Processing
Rare Book & Manuscript Library
Butler Library, Room 801
Columbia University
535 W. 114th St., New York, NY  10027
(212) 854-8483

