[Archivesspace_Users_Group] ampersand issue with PDF button in 2.1.2 public interface

Custer, Mark mark.custer at yale.edu
Fri Sep 22 10:11:22 EDT 2017


That would be fantastic!!!!!


For your PDF error, I think that might be caused by a slightly different issue.  The new PUI “Print to PDF” process converts the ArchivesSpace JSON record to HTML and then converts that HTML into a PDF file.  So, it doesn’t use the same JSON --> EAD --> PDF process as the staff interface.  I’m assuming that a small tweak to this file https://github.com/archivesspace/archivesspace/blob/master/public/app/lib/xml_cleaner.rb might allow it to still create the PDF successfully (assuming that ArchivesSpace would want the application to handle both “b&w” and “b&amp;w”, which might not be the case).
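A rough sketch of the kind of tweak described above, written in Python for illustration rather than in the Ruby of xml_cleaner.rb: escape any ampersand that does not already begin an entity reference, so that both a bare ampersand and an already-escaped one come out well-formed. The function name and the regular expression are assumptions made for this example, not code from ArchivesSpace.

```python
import re

# Hypothetical pattern: an ampersand that is NOT the start of a named,
# decimal, or hexadecimal entity reference terminated by ';'.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape lone ampersands; leave existing entity references untouched."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(escape_bare_ampersands("b&w"))      # bare ampersand gets escaped
print(escape_bare_ampersands("b&amp;w"))  # already escaped, left alone
```

One caveat with this approach: text such as "Q&A;" looks like an entity reference to the pattern and would be left alone, so it is a heuristic rather than a guarantee.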

We should log this issue in JIRA at some point, regardless, just so that it’s captured there.  I don’t have time to do that right now, but I did update one of the files in the sandbox to illustrate the problem.  Here it is: http://public.archivesspace.org/repositories/2/resources/1008/

  *   Before I added the lone collection-level note to this record, the PDF printed fine.
  *   Once I added a note of “b&w”, it failed.
  *   When I change the note to “b&amp;w”, the PDF file works.
  *   It also still displays fine in the PUI, which might mean that the problem that I noted in my previous message only occurs when the note is in one of those “see more” / “see less” sections.


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Trevor Thornton
Sent: Friday, 22 September, 2017 9:45 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] ampersand issue with PDF button in 2.1.2 public interface

The logic for converting ampersands in the EAD exporter is to convert them only if they are immediately followed by a space; otherwise they are assumed to be the start of an entity. This is part of the process of sanitizing mixed content, which is actually applied to most fields. However, the ampersand conversion is included in the routine that handles line breaks (converting 2 line breaks into paragraphs as appropriate), and this is only applied to fields for which the corresponding EAD tag allows <p> as a child, which excludes unittitle, abstract, etc.
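To make that rule concrete, here is a small Python sketch of the behavior as described (the exporter itself is Ruby, and this is not its actual code; the function name is made up for the example). Only an ampersand directly followed by a space is escaped, so a string like "b&w" passes through unchanged.

```python
import re


def escape_ampersand_before_space(text: str) -> str:
    """Escape an ampersand only when the next character is a space."""
    return re.sub(r"&(?= )", "&amp;", text)


print(escape_ampersand_before_space("black & white"))  # black &amp; white
print(escape_ampersand_before_space("b&w"))            # b&w (unchanged)
```

The second call shows why "b&w" and "AT&T" still reach the XML stage unescaped under this rule.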

There's no good reason that I can think of why the ampersand conversion should be restricted in this way, so it can probably be moved to apply more broadly. Unfortunately, since the new EAD3 exporter is based on the existing EAD exporter, this problem persists in the EAD3 exporter, because I didn't really notice it until now. I'll try to fix it in both places and do a pull request.

On Fri, Sep 22, 2017 at 8:48 AM, Mayo, Dave <dave_mayo at harvard.edu> wrote:
Hi Benn,

This is a recurring issue I hit over both Harvard and Smith’s collections – it’s a consequence of ASpace not really having a distinction between mixed content and plaintext content.

Unfortunately, there isn’t really a good solution.  The best solution as far as I’ve been able to figure is to use the HTML/XML entity for ampersand (&amp;) wherever it appears in a context that’s treated by the interface/etc as markup; title fields _definitely_ fall under that category.  There’s unfortunately no reliable guide to what fields are “mixed content” and what fields are “plaintext content” because, well, the underlying system doesn’t track that distinction – it’s up to how the fields are eventually displayed/used to build exports/etc.

As to _how_ to fix it – well, it depends somewhat on whether you can be ABSOLUTELY SURE you don’t have any HTML/XML entities in your title fields.  If you are ABSOLUTELY SURE of this, you should be able to make the change via API or on the SQL level, but if you DO have entities, it gets a lot harder, to the point where manual review is probably appropriate.
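As a hedged illustration of that triage, the Python sketch below splits a list of titles into those that contain only bare ampersands (candidates for an automated fix) and those that already contain something that looks like an entity (candidates for manual review). How the titles are fetched and written back, whether via the API or SQL, is deliberately left out; the function name and sample titles are invented for the example.

```python
import re

# Anything shaped like a named or numeric entity reference, e.g. &amp; or &#38;
ENTITY_LIKE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);")


def triage_titles(titles):
    """Return (auto_fixable, needs_review) for titles containing '&'."""
    auto_fixable, needs_review = [], []
    for title in titles:
        if "&" not in title:
            continue  # nothing to fix
        if ENTITY_LIKE.search(title):
            needs_review.append(title)  # mixed or already-escaped content
        else:
            auto_fixable.append(title)  # only bare ampersands
    return auto_fixable, needs_review


sample = ["AT&T records", "R&amp;D files & notes", "Plain title"]
print(triage_titles(sample))
```

Running a check like this first gives a rough sense of whether the "ABSOLUTELY SURE" condition holds before touching any data.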
- Dave Mayo
ASpace Core Committer’s Group Member
From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Benn Joseph <benn.joseph at northwestern.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Thursday, September 21, 2017 at 4:21 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] ampersand issue with PDF button in 2.1.2 public interface

Hi all,

We've encountered an issue with the v2.1.2 Print-to-PDF button in the public interface--apparently for any resource record with an ampersand that is followed immediately by another character that is not a space (e.g. "b&w" or "AT&T"), the ampersand is misinterpreted and causes the Print-to-PDF button to fail with an error. For me, that error is just "something went wrong", but the log shows this (when it gets tripped up on "b&w"):

RuntimeError (Failed to clean XML: The reference to entity "w" must end with the ';' delimiter.):

So we're guessing ArchivesSpace is thinking "&w" should be "&w;", and so forth for any other string of text with an ampersand. I checked this by going into a record that wouldn’t print and changing the lone suspect ampersand (“AT&T” to “AT and T”) and the PDF generated just fine.
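The same class of failure can be reproduced outside ArchivesSpace with any strict XML parser. The sketch below uses Python's standard library purely as an illustration (the PUI uses its own Ruby/Java tooling, so the exact error text will differ): a bare "&w" is rejected, while the escaped form and the "AT and T" workaround both parse.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed inside a wrapper element."""
    try:
        ET.fromstring(f"<note>{fragment}</note>")
        return True
    except ET.ParseError:
        return False


for text in ("b&w", "b&amp;w", "AT and T"):
    print(repr(text), parses_as_xml(text))
```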

This doesn't impact being able to just view resource records in the public interface, it's just the PDF function that isn't working. It's a problem, though, because we want to be able to use that PDF functionality but we also have a lot of ampersands in our resource records! Has anyone else experienced this issue or possibly come up with a fix?



Benn Joseph
Head of Archival Processing
Northwestern University Libraries
Northwestern University
benn.joseph at northwestern.edu


Trevor Thornton
Applications Developer, Digital Library Initiatives
North Carolina State University Libraries