[Archivesspace_Users_Group] Retrieving tree info via API (what are "waypoints"?)

Tue Jul 23 12:51:26 EDT 2019

I believe that for the next level of archival_objects you have to GET /repositories/$REPO/archival_objects/$ID/children, but check the API docs.

Note that there is also a GET /repositories/$REPO/resources/$ID/ordered_records method that gives you the whole hierarchy, but only minimal info about each record: { ref:, display_string:, depth:, level: }

I don’t think I knew about that one the first time I was wrestling with this sort of task.
If you’re working against the backend API and aren’t worried about real-time display updates, it might make more sense to walk the ordered_records output and fetch each ref if you want more complete info on the resource’s children.
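
A minimal sketch of walking that output, assuming the records come back as a list of dicts with the ref and depth keys shown above, in document order, with depth going up by one per level (check the API docs for the real response shape). The function name and the sample data are made up for illustration.

```python
def parents_from_depth(records):
    """Map each ref to the ref of its parent (None for the top record).

    Assumes 'records' is in document order and that 'depth' never
    skips a level; both are assumptions, not verified API behaviour.
    """
    stack = []    # stack[d] holds the most recent ref seen at depth d
    parents = {}
    for rec in records:
        del stack[rec["depth"]:]   # forget anything at this depth or deeper
        parents[rec["ref"]] = stack[-1] if stack else None
        stack.append(rec["ref"])
    return parents


# Made-up sample data, not real API output.
sample = [
    {"ref": "/repositories/2/resources/1", "depth": 0},
    {"ref": "/repositories/2/archival_objects/10", "depth": 1},
    {"ref": "/repositories/2/archival_objects/11", "depth": 2},
    {"ref": "/repositories/2/archival_objects/12", "depth": 1},
]
parents = parents_from_depth(sample)
for ref, parent in parents.items():
    print(ref, "<-", parent)
```

With a parent for every ref, you can fetch the full record behind each ref and attach it to the right place in your own tree.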

— Steve. 

> On Jul 23, 2019, at 12:11 PM, Trevor Thornton <trthorn2 at ncsu.edu> wrote:
> 
> Just found that file in the repo before I saw your message and I think I understand now - thanks!
> 
> So, if you're looking at a node below the root (an ArchivalObject) that has >200 children, you would hit the ".../tree/waypoint" endpoint however many times and include "parent_node" in the GET params with the ArchivalObject URI, right?
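
If that is how it works (this goes only by the parameter names mentioned in this thread, offset and parent_node, not by verified docs), building the request path might look something like the sketch below. waypoint_path is a hypothetical helper and the IDs are made up.

```python
from urllib.parse import urlencode


def waypoint_path(repo_id, resource_id, offset, parent_node=None):
    """Build the path and query string for one waypoint request.

    parent_node is the URI of the ArchivalObject whose children you
    want; leaving it out is assumed to mean "children of the root".
    """
    params = {"offset": offset}
    if parent_node is not None:
        params["parent_node"] = parent_node
    base = f"/repositories/{repo_id}/resources/{resource_id}/tree/waypoint"
    return f"{base}?{urlencode(params)}"


print(waypoint_path(2, 1, 1))
print(waypoint_path(2, 1, 1, "/repositories/2/archival_objects/123"))
```

urlencode percent-encodes the slashes in the URI, so it can travel safely as a query parameter.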
> 
> On Tue, Jul 23, 2019 at 11:57 AM Majewski, Steven Dennis (sdm7g) <sdm7g at virginia.edu> wrote:
> 
>> So the next question is how do you make the subsequent calls to retrieve the next 200, etc.?
> 
> 
> 
> You call /repositories/$repo/resources/$id/tree/waypoint?offset=$N 23 times.
> (You already got the first batch in .precomputed_waypoints in the call to .../tree/root)
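
As a sketch of that loop: the helper below assumes offset is a zero-based waypoint number, so with the first batch already in hand you ask for offsets 1 through waypoints - 1. fetch_waypoint stands in for whatever HTTP call you use; here it is a fake that hands out integers instead of records, sized like the 4,780-child example in this thread.

```python
def collect_children(waypoint_count, first_batch, fetch_waypoint):
    """Gather all children of one node, waypoint by waypoint.

    first_batch is the waypoint that arrived with the node itself;
    fetch_waypoint(offset) is a caller-supplied (hypothetical) function
    returning the list of nodes in one waypoint.
    """
    children = list(first_batch)
    # Assumption: offsets are zero-based waypoint numbers, and 0 is in hand.
    for offset in range(1, waypoint_count):
        children.extend(fetch_waypoint(offset))
    return children


# Stand-in for the real HTTP call: 4780 fake children in pages of 200.
CHILD_COUNT, WAYPOINT_SIZE = 4780, 200
calls = []


def fake_fetch(offset):
    calls.append(offset)
    start = offset * WAYPOINT_SIZE
    return list(range(start, min(start + WAYPOINT_SIZE, CHILD_COUNT)))


everything = collect_children(24, list(range(WAYPOINT_SIZE)), fake_fetch)
print(len(everything), "children from", len(calls), "extra calls")
```

Passing the fetch function in keeps the loop independent of any particular HTTP client or authentication setup.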
> 
> 
> I found the documentation note in the source I was looking for: 
> https://github.com/archivesspace/archivesspace/blob/master/backend/app/model/large_tree.rb
> 
> 
> # What's the big idea?
> #
> # ArchivesSpace has some big trees in it, and sometimes they look a lot like big
> # sticks.  Back in the dark ages, we used JSTree for our trees, which in general
> # is perfectly cromulent.  We recognized the risk of having some very large
> # collections, so dutifully configured JSTree to lazily load subtrees as the
> # user expanded them (avoiding having to load the full tree into memory right
> # away).
> #
> # However, time makes fools of us all.  The JSTree approach works fine if your
> # tree is fairly well balanced, but that's not what things look like in the real
> # world.  Some trees have a single root node and tens of thousands of records
> # directly underneath it.  Lazy loading at the subtree level doesn't save you
> # here: as soon as you expand that (single) node, you're toast.
> #
> # This "large tree" business is a way around all of this.  It's effectively a
> # hybrid of trees and pagination, except we call the pages "waypoints" for
> # reasons known only to me.  So here's the big idea:
> #
> #  * You want to show a tree.  You ask the API to give you the root node.
> #
> #  * The root node tells you whether or not it has children, how many children,
> #    and how many waypoints that works out to.
> #
> #  * Each waypoint is a fixed-size page of nodes.  If the waypoint size is set
> #    to 200, a node with 1,000 children would have 5 waypoints underneath it.
> #
> #  * So, to display the records underneath the root node, you fetch the root
> #    node, then fetch the first waypoint to get the first N nodes.  If you need
> #    to show more nodes (i.e. if the user has scrolled down), you fetch the
> #    second waypoint, and so on.
> #
> #  * The records underneath the root might have their own children, and they'll
> #    have their own waypoints that you can fetch in the same way.  It's nodes,
> #    waypoints and turtles the whole way down.
> #
> # All of this interacts with the largetree.js code in the staff and public
> # interfaces.  You open a resource record, and largetree.js fetches the root
> # node and inserts placeholders for each waypoint underneath it.  As the user
> # scrolls towards a placeholder, the code starts building tracks ahead of the
> # train, fetching that waypoint and rendering the records it contains.  When a
> # user expands a node to view its children, that process repeats again (the node
> # is fetched, waypoint placeholders inserted, etc.).
> #
> # The public interface runs the same code as the staff interface, but with a
> # small twist: it fetches its nodes and waypoints from Solr, rather than from
> # the live API.  We hit the API endpoints at indexing time and store them as
> # Solr documents, effectively precomputing all of the bits of data we need when
> # displaying trees.
> 
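
The arithmetic in that comment is just a ceiling division. A tiny sketch (the function name is made up, and 200 is simply the waypoint size quoted in this thread):

```python
def waypoint_count(child_count, waypoint_size=200):
    """Number of fixed-size pages needed to hold child_count nodes."""
    # Ceiling division without floats: -(-a // b)
    return -(-child_count // waypoint_size)


print(waypoint_count(1000))  # 5, the example from the source comment
print(waypoint_count(4780))  # 24, the resource discussed in this thread
print(waypoint_count(0))     # 0
```

The last waypoint is usually short: 4,780 children fill 23 waypoints of 200 and leave 180 for the 24th.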
> 
> 
> 
> 
>> On Jul 23, 2019, at 11:08 AM, Trevor Thornton <trthorn2 at ncsu.edu> wrote:
>> 
>> Thanks, Steve. That makes sense, and I tested with a resource with >1000 top level children and I see that only 200 of them are included, which corresponds to the value for "waypoint_size" in the response:
>> 
>> {  
>>    "child_count":4780,
>>    "waypoints":24,
>>    "waypoint_size":200
>> ...
>> 
>> So the next question is how do you make the subsequent calls to retrieve the next 200, etc.?
>> 
>> On Tue, Jul 23, 2019 at 10:52 AM Majewski, Steven Dennis (sdm7g) <sdm7g at virginia.edu> wrote:
>> I believe the rationale for waypoints was that resource children (archival objects) were initially expected to fall into a more balanced tree structure, but it turned out that there were many flat hierarchies with hundreds of top-level children, and getting all of the children at once was not working very efficiently. So with the waypoint calls, you may only be getting some of the children, but the display can start populating the tree while making additional calls for the rest.
>> 
>> I may have some Postman examples and internal notes around somewhere: I’ll see what I can dig out.
>> 
>> — Steve. 
>> 
>> 
>>> On Jul 23, 2019, at 9:05 AM, Trevor Thornton <trthorn2 at ncsu.edu> wrote:
>>> 
>>> Hi everybody-
>>> 
>>> I'm building a service using these API endpoints (or I think I am):
>>> [:GET] /repositories/:repo_id/resources/:id/tree/root <http://archivesspace.github.io/archivesspace/api/#fetch-tree-information-for-the-top-level-resource-record>
>>> [:GET] /repositories/:repo_id/resources/:id/tree/node <http://archivesspace.github.io/archivesspace/api/#fetch-tree-information-for-an-archival-object-record-within-a-tree>
>>> 
>>> These incorporate the concept of "waypoints", which I admit that I'm not familiar with in this context, and it isn't explained very well in the documentation. This is what I have to work with (these are elements included in the API response):
>>>  * child_count – the number of immediate children
>>>  * waypoints – the number of “waypoints” those children are grouped into
>>>  * waypoint_size – the number of children in each waypoint
>>>  * precomputed_waypoints – a collection of arrays (keyed on child URI) in the same format as returned by the ’/waypoint’ endpoint. Since a fetch for a given node is almost always followed by a fetch of the first waypoint, using the information in this structure can save a backend call.
>>> 
>>> Can anyone explain what exactly waypoints are and how they are different from children? In the examples I've seen, the "precomputed_waypoints" element in the response looks like a convoluted way (an array value of the lone element in an object, which is itself the value of the lone element in another object) to provide the children nodes of the given node (or root). What's the difference?
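
For what it's worth, unwrapping that nesting is only a couple of loops. The sketch below assumes the shape described above (an object whose values are objects whose values are arrays of child nodes) and does not rely on what the keys are called. The sample response is made up, including the key names inside precomputed_waypoints and the uri field.

```python
def first_waypoint_nodes(node):
    """Flatten precomputed_waypoints into a plain list of child nodes."""
    nodes = []
    for by_offset in node.get("precomputed_waypoints", {}).values():
        for batch in by_offset.values():
            nodes.extend(batch)
    return nodes


# Made-up response; the real key names and fields may differ.
sample_root = {
    "child_count": 2,
    "waypoints": 1,
    "waypoint_size": 200,
    "precomputed_waypoints": {
        "": {
            "0": [
                {"uri": "/repositories/2/archival_objects/10"},
                {"uri": "/repositories/2/archival_objects/11"},
            ]
        }
    },
}
print([n["uri"] for n in first_waypoint_nodes(sample_root)])
```

A node with no children (or no precomputed_waypoints key at all) simply yields an empty list.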
>>> 
>>> Thanks,
>>> Trevor
>>> 
>>> -- 
>>> Trevor Thornton
>>> Applications Developer, Digital Library Initiatives
>>> North Carolina State University Libraries
>>> _______________________________________________
>>> Archivesspace_Users_Group mailing list
>>> Archivesspace_Users_Group at lyralists.lyrasis.org <mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
>>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group <http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group>
>> 
