[Archivesspace_Users_Group] PUI question: external indexing of container list tree link text

Majewski, Steven Dennis (sdm7g) sdm7g at virginia.edu
Fri Jan 4 11:03:37 EST 2019


I would suggest crawling the OAI endpoint for indexing, but linking to the PUI record.
The oai_ead metadata is just EAD in an OAI wrapper, and it has the complete resource tree.
The problem with that is that not everyone may have configured OAI or made it public. 
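
A rough sketch of that approach in Python, to make the idea concrete. It pages through OAI-PMH ListRecords responses with metadataPrefix=oai_ead, flattens the EAD into plain text for the indexer, and derives the link to show users from the record identifier. The base URLs, the function names, and the assumption that identifiers look like oai:archivesspace//repositories/2/resources/1 (with the PUI serving the same /repositories/... path) are illustrative guesses, not something confirmed for any particular install, so check them against your own instance.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(oai_base, resumption_token=None):
    """Build a ListRecords URL; follow-up pages carry only the token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_ead"}
    return oai_base + "?" + urlencode(params)


def parse_page(xml_text, pui_base):
    """Return (records, next_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI_NS + "record"):
        header = rec.find(OAI_NS + "header")
        metadata = rec.find(OAI_NS + "metadata")
        # Deleted records have a header but no metadata to index.
        if header is None or metadata is None:
            continue
        identifier = header.findtext(OAI_NS + "identifier", default="")
        # Assumption: the identifier embeds the record URI, and the PUI
        # serves the same path (e.g. /repositories/2/resources/1).
        start = identifier.find("/repositories/")
        if start == -1:
            continue
        # Flatten the whole EAD tree to whitespace-normalized text.
        text = " ".join(" ".join(metadata.itertext()).split())
        records.append({
            "link": pui_base.rstrip("/") + identifier[start:],
            "text": text,
        })
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    token = token.strip() if token else ""
    return records, (token or None)


def harvest(oai_base, pui_base):
    """Yield every record, following resumption tokens until exhausted."""
    token = None
    while True:
        with urlopen(build_request_url(oai_base, token)) as resp:
            records, token = parse_page(resp.read(), pui_base)
        yield from records
        if token is None:
            break
```

Something like harvest("https://aspace.example.org/oai", "https://aspace.example.org") would then give the crawler full container-list text to index, while each result still points users at the PUI page rather than at raw XML. Both hostnames are placeholders.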

But yes: that’s a problem with progressive web apps: not all of the data you want indexed is in the page.
I wonder if there is a way through the Google webmaster console or sitemaps to configure this sort of action, i.e.,
"use this other URL to index this resource."

— Steve Majewski



> On Jan 4, 2019, at 9:59 AM, Rees, John (NIH/NLM) [E] <reesj at mail.nlm.nih.gov> wrote:
> 
> Hi all,
>  
> I administer a finding aids aggregation service that in part scrapes HTML source code as a data input, and I am looking for some advice and hoping to start a conversation.
>  
> Several of our contributing repositories with this data type moved to ArchivesSpace in 2018, and we are not able to crawl ASpace’s collection_organization#tree source, which seems to be the only organized view of container list data. As many of you probably know, these are coded as URIs in the HTML source and are only rendered as visible text at runtime via JavaScript and CSS in the browser.
>  
> Our crawler cannot translate these HTML-source URIs into text that it can index. The only workaround we’ve been able to find is indexing the PDF view, but not everyone implements this feature. Additionally, our crawler sometimes times out on large PDFs as it can take ASpace a while to generate them at runtime.
>  
> I’m also wondering if PUI implementers have noticed any issues with other search engines having difficulty indexing their PUI content at a full-document level?
>  
> I searched the Jira backlog and PUI Enhancements wikispace and did not find anything specifically addressing this use case.
>  
> Thanks,
> John
>  
>  
> John P. Rees
> Archivist and Digital Resources Manager
> History of Medicine Division
> National Library of Medicine
> 301-827-4510
>  
>  