[Archivesspace_Users_Group] ampersand issue with PDF button in 2.1.2 public interface

Trevor Thornton trthorn2 at ncsu.edu
Fri Sep 22 09:44:42 EDT 2017


The logic for converting ampersands in the EAD exporter is to convert them
only if they are immediately followed by a space; otherwise they are
assumed to begin an entity. This is part of the process of sanitizing mixed
content, which is applied to most fields. However, the ampersand
conversion is included in the routine that handles line breaks (converting
2 line breaks into paragraphs as appropriate), and that routine is only applied
to fields whose corresponding EAD tag allows <p> as a child, which
excludes unittitle, abstract, etc.
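To make the heuristic concrete, here is a minimal Python sketch of the rule as described: escape an ampersand only when a space follows it. The function names are hypothetical and this is not the exporter's actual code. The well-formedness check uses Python's standard `xml.etree.ElementTree` parser, and the `<unittitle>` wrapper is only there to turn each string into a complete XML fragment.

```python
import re
import xml.etree.ElementTree as ET


def convert_spaced_ampersands(text: str) -> str:
    # Hypothetical re-creation of the heuristic described above:
    # only an '&' immediately followed by a space is escaped; any other
    # '&' is assumed to already begin an entity such as '&amp;'.
    return re.sub(r"&(?= )", "&amp;", text)


def parses_as_xml(fragment: str) -> bool:
    # Wrap the field value in an element and check that it is well-formed.
    try:
        ET.fromstring(f"<unittitle>{fragment}</unittitle>")
        return True
    except ET.ParseError:
        return False


for title in ["Smith & Jones papers", "b&w photographs", "AT&T records"]:
    converted = convert_spaced_ampersands(title)
    print(f"{converted!r}: well-formed={parses_as_xml(converted)}")
```

Only the spaced ampersand is escaped; "b&w" and "AT&T" pass through unchanged and the parser rejects them, which is the same kind of failure reported further down the thread.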

There's no good reason that I can think of why the ampersand conversion
should be restricted in this way, so it can probably be moved to apply more
broadly. Unfortunately, because the new EAD3 exporter is based on the
existing EAD exporter and I didn't notice this until now, the problem
persists in the EAD3 exporter as well. I'll try to fix it in both places and
submit a pull request.
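One way to apply the conversion more broadly, sketched under assumptions: the function names are hypothetical, and the regular expression is a swapped-in approach (escape any ampersand that does not already start a complete named, decimal, or hexadecimal reference) rather than the exporter's current space-based rule. Ampersand handling runs on every field; only the paragraph wrapping depends on whether the EAD tag allows <p>.

```python
import re

# Matches an '&' that does NOT begin a syntactically complete reference:
# a named entity (&amp;), a decimal one (&#38;) or a hex one (&#x26;).
# The name pattern is a simplified approximation of XML's Name rule.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape every '&' that is not already the start of a reference."""
    return BARE_AMPERSAND.sub("&amp;", text)


def sanitize_field(text: str, allows_paragraphs: bool) -> str:
    # Hypothetical ordering: ampersands are handled for every field,
    # and only the paragraph wrapping depends on whether <p> is allowed.
    text = escape_bare_ampersands(text)
    if allows_paragraphs:
        chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
        text = "".join(f"<p>{chunk}</p>" for chunk in chunks)
    return text
```

For example, `sanitize_field("b&w", False)` returns `"b&amp;w"`, while an existing `&amp;` is left alone. The check is purely syntactic: a name such as `&copy;` is not escaped, even though a plain XML parser will still reject it unless that entity is declared.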

On Fri, Sep 22, 2017 at 8:48 AM, Mayo, Dave <dave_mayo at harvard.edu> wrote:

> Hi Benn,
>
> This is a recurring issue I hit over both Harvard and Smith’s collections
> – it’s a consequence of ASpace not really having a distinction between
> mixed content and plaintext content.
>
> Unfortunately, there isn’t really a good solution. The best solution as
> far as I’ve been able to figure is to use the HTML/XML entity for ampersand
> (&amp;) wherever it appears in a context that’s treated by the
> interface/etc. as markup; title fields *definitely* fall under that
> category.  There’s unfortunately no reliable guide to what fields are
> “mixed content” and what fields are “plaintext content” because, well, the
> underlying system doesn’t track that distinction – it’s up to how the
> fields are eventually displayed/used to build exports/etc.
>
>
> As to *how* to fix it – well, it depends somewhat on whether you can be
> ABSOLUTELY SURE you don’t have any HTML/XML entities in your title fields.
> If you are ABSOLUTELY SURE of this, you should be able to make the change
> via API or on the SQL level, but if you DO have entities, it gets a lot
> harder, to the point where manual review is probably appropriate.
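A minimal Python sketch of that triage, assuming the titles have already been pulled out (by API or SQL; that step is not shown) into a mapping of record id to title. `plan_title_fixes` and the record ids are hypothetical. Titles with no entity-like text get a blanket replacement; anything that already contains something like `&amp;` goes to a review list, because a blanket replacement would turn it into `&amp;amp;`.

```python
import re

# Anything that already looks like an entity or character reference.
ENTITY_LIKE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def plan_title_fixes(titles: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split a record-id -> title mapping into automatic fixes and review items."""
    auto_fixes: dict[str, str] = {}
    needs_review: list[str] = []
    for record_id, title in titles.items():
        if "&" not in title:
            continue  # no ampersand, nothing to change
        if ENTITY_LIKE.search(title):
            # Already contains entity-like text: a blanket replace would
            # double-escape it, so leave it for a person to check.
            needs_review.append(record_id)
        else:
            auto_fixes[record_id] = title.replace("&", "&amp;")
    return auto_fixes, needs_review
```

With `{"r1": "AT&T records", "r2": "b&w photos &amp; negatives", "r3": "Letters"}`, only `r1` is fixed automatically (to `"AT&amp;T records"`) and `r2` is flagged for review.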
>
> - Dave Mayo
> ASpace Core Committer’s Group Member
>
> *From: *<archivesspace_users_group-bounces at lyralists.lyrasis.org> on
> behalf of Benn Joseph <benn.joseph at northwestern.edu>
> *Reply-To: *Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Date: *Thursday, September 21, 2017 at 4:21 PM
> *To: *Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Subject: *[Archivesspace_Users_Group] ampersand issue with PDF button in
> 2.1.2 public interface
>
>
>
> Hi all,
>
> We've encountered an issue with the v2.1.2 Print-to-PDF button in the
> public interface--apparently for any resource record with an ampersand that
> is followed immediately by another character that is not a space (e.g.
> "b&w" or "AT&T"), the ampersand is misinterpreted and causes the
> Print-to-PDF button to fail with an error. For me, that error is just
> "something went wrong", but the log shows this (when it gets tripped up on
> "b&w"):
>
>
>
> RuntimeError (Failed to clean XML: The reference to entity "w" must end
> with the ';' delimiter.):
>
>
>
> So we're guessing ArchivesSpace is thinking "&w" should be "&w;", and so
> forth for any other string of text with an ampersand. I checked this by
> going into a record that wouldn’t print and changing the lone suspect
> ampersand (“AT&T” to “AT and T”) and the PDF generated just fine.
>
>
>
> This doesn't impact being able to just view resource records in the public
> interface, it's just the PDF function that isn't working. It's a problem,
> though, because we want to be able to use that PDF functionality but we
> also have a lot of ampersands in our resource records! Has anyone else
> experienced this issue or possibly come up with a fix?
>
>
>
> Thanks,
>
> --Benn
>
>
>
> *Benn Joseph*
>
> Head of Archival Processing
>
> Northwestern University Libraries
>
> Northwestern University
>
> www.library.northwestern.edu
>
> benn.joseph at northwestern.edu
>
> 847.467.6581
>
>
>
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
>


-- 
Trevor Thornton
Applications Developer, Digital Library Initiatives
North Carolina State University Libraries