[Archivesspace_Users_Group] PUI question: external indexing of container list tree link text
Rees, John (NIH/NLM) [E]
reesj at mail.nlm.nih.gov
Fri Jan 4 09:59:23 EST 2019
I administer a finding aids aggregation service that, in part, scrapes HTML source code as a data input, and I am looking for some advice or, at least, to start a conversation.
Our crawler cannot translate these HTML-source URIs into text that it can index. The only workaround we've been able to find is indexing the PDF view, but not everyone implements this feature. Additionally, our crawler sometimes times out on large PDFs as it can take ASpace a while to generate them at runtime.
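To illustrate the timeout side of the problem, here is a minimal Python sketch of how a crawler might tolerate slow PDF generation by retrying with a progressively longer timeout. The name `fetch_with_retry`, its parameters, and the `fetch` callable are hypothetical and for illustration only; they are not part of ArchivesSpace or of any particular crawler. The sketch assumes the supplied `fetch(url, timeout)` raises `TimeoutError` when the server does not respond in time.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[str, float], bytes],
    url: str,
    timeout: float = 60.0,
    attempts: int = 3,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch(url, timeout), retrying on TimeoutError.

    After each timeout the function waits, then multiplies the
    timeout by `backoff` before trying again. The last TimeoutError
    is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url, timeout)
        except TimeoutError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Pause before retrying: backoff**0, backoff**1, ... seconds.
                sleep(backoff ** attempt)
                # Give the server longer to finish generating the PDF.
                timeout *= backoff
    raise last_error
```

The `fetch` argument is left to the caller so that any HTTP library can be plugged in, provided its timeout exception is translated to `TimeoutError`. This only works around slow generation; it does not address sites that have not enabled the PDF view.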
I'm also wondering whether PUI implementers have noticed other search engines having difficulty indexing their PUI content at a full-document level.
I searched the Jira backlog and PUI Enhancements wikispace and did not find anything specifically addressing this use case.
John P. Rees
Archivist and Digital Resources Manager
History of Medicine Division
National Library of Medicine