[Archivesspace_Users_Group] ampersand issue

Brian Hoffman brianjhoffman at gmail.com
Wed Dec 9 11:04:24 EST 2015


Hi Julia,

I don’t think this has anything to do with your XML files or your actual data, but rather with the Java library ASpace uses to try to turn field content that contains mixed XML into displayable HTML (JSoup).

The problem most likely has to do with the Java platform on the server you are running ASpace on, and/or an older version of JSoup in your classpath. So it might require a little help from your system administrator to debug. The good news is that this appears to be just a display issue, and your data looks right (i.e., the values in the form field).

I posted a little code that might help diagnose the issue.

https://gist.github.com/quoideneuf/6d8ca958d423a4020e93
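
Separately from the gist (this is not its content), here is a minimal Python sketch of the kind of mismatch I suspect: a UTF-8 "ø" is two bytes, and if something decodes those bytes with a single-byte charset you get two characters in the display. Latin-1 as the wrong charset is only an assumption for illustration; your server's default may differ.

```python
# Sketch: how one UTF-8 character becomes two when its bytes are decoded
# with a single-byte charset. Latin-1 is an assumed example of the wrong charset.
text = "Brøderbund"
utf8_bytes = text.encode("utf-8")        # ø is stored as the two bytes C3 B8

wrong = utf8_bytes.decode("latin-1")     # each byte is read as its own character
print(wrong)                             # the ø now shows up as two characters

# If no bytes were lost along the way, undoing the wrong step restores the text.
repaired = wrong.encode("latin-1").decode("utf-8")
print(repaired == text)
```

If the stored value round-trips like this, the data itself is fine and only the decoding step at display time is off.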

Brian





> On Dec 9, 2015, at 9:06 AM, Novakovic, Julia <jNovakovic at museumofplay.org> wrote:
> 
> Hi Brian,
>  
> The quotation marks worked fine, but ø still reads as  ø ! Screenshot attached. 
>  
> [Our Director of IT will confirm for me if our XML files are encoded as UTF-8.]
>  
> Thanks!
>  
> --Julia
>  
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Brian Hoffman
> Sent: Tuesday, December 08, 2015 5:15 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] ampersand issue
>  
> Hi Chris,
>  
> You are right - I tested two records with these titles:
>  
>  "AmpTest & <emph>a</emph>"
>  "AmpTest &amp; <emph>a</emph>"
>  
> they *both* export identically as:
>  
> <unittitle>AmpTest &amp; <emph>a</emph></unittitle>
>  
> I also tried a similar experiment with the &lt; entity and got different results:
>  
> "AmpTest < A"
> exports as
> <unittitle>AmpTest < A</unittitle>
> which is invalid XML. 
>  
> "AmpTest &lt; A"
> exports as
> <unittitle>AmpTest &amp;lt; A </unittitle>
> which is probably going to seem wrong to any user who would try to do it this way.
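
The two outcomes of the less-than experiment can be reproduced outside ASpace. The sketch below uses Python's standard library, not the ASpace exporter, and the two title strings are hypothetical inputs: a raw less-than sign written out as-is is not well-formed, while escaping a title that was already escaped produces a double-escaped entity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

raw_title = "AmpTest < A"             # user typed a literal less-than sign
escaped_title = "AmpTest &lt; A"      # user typed the entity by hand

# Writing the raw title out unchanged leaves a bare "<" in the element.
as_is = f"<unittitle>{raw_title}</unittitle>"

# Escaping on export fixes the raw title but double-escapes the other one.
escaped_once = f"<unittitle>{escape(raw_title)}</unittitle>"
escaped_twice = f"<unittitle>{escape(escaped_title)}</unittitle>"

print(is_well_formed(as_is))
print(escaped_once)
print(escaped_twice)
```

Neither policy works for both inputs, which is why the exporter needs to know whether the field holds XML or plain text.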
>  
> I’m wondering whether most users intend to key in actual XML mixed content or just text with inline markup that corresponds to EAD. 
>  
> Julia, what happens if you cut and paste this text “øøøøøø” into a new resource record title and save it?
>  
> Brian
>  
>  
>  
>  
>  
>  
>  
>  
> On Dec 8, 2015, at 4:14 PM, Chris Fitzpatrick <Chris.Fitzpatrick at lyrasis.org> wrote:
>  
> 
> Hi Brian,
>  
> Hm, trying to think what the issues are with having "AmpTest & <emph>a</emph>" stored in the DB? 
> The exporter converts the & into &amp;, but you're thinking this would be an import problem?
>  
> The other option, I think, would be to add things to the MixedContent parser, which turns all the wonderful EAD "mixed content" into actual HTML. 
>  
> Julia: 
>  
> Do you know if your XML files are saved as UTF-8? I wonder if you have an encoding issue that might be causing this. 
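
One rough way to answer that question without opening an editor is sketched below in Python. The function name `check_utf8` and the sample documents are made up for illustration; it only checks that the bytes decode as UTF-8 and reports what the XML declaration claims, and it assumes the declaration is at the very start of the data (no byte order mark).

```python
import re

def check_utf8(data: bytes):
    """Return (decodes_as_utf8, declared_encoding_or_None) for raw XML bytes."""
    try:
        data.decode("utf-8")
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    # Look for encoding="..." inside the XML declaration, if there is one.
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    declared = match.group(1).decode("ascii") if match else None
    return decodes, declared

doc = '<?xml version="1.0" encoding="UTF-8"?><unittitle>Brøderbund</unittitle>'
saved_as_utf8 = doc.encode("utf-8")
saved_as_latin1 = doc.encode("latin-1")   # declaration says UTF-8, bytes are not

print(check_utf8(saved_as_utf8))
print(check_utf8(saved_as_latin1))
```

For a real file, read it in binary mode and pass the bytes in. A file that declares UTF-8 but fails to decode was most likely saved in another encoding.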
>  
> best, Chris. 
>  
> Chris Fitzpatrick | Developer, ArchivesSpace
> Skype: chrisfitzpat  | Phone: 918.236.6048
> http://archivesspace.org/
>  
> 
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of Galligan, Patrick <PGalligan at rockarch.org>
> Sent: Tuesday, December 8, 2015 10:07 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] ampersand issue
>  
> I also didn’t have any issues with quotation marks. Maybe they were smart quotes or something?
>  
> Patrick Galligan
> Rockefeller Archive Center
> Assistant Digital Archivist
> 914-366-6386
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Brian Hoffman
> Sent: Tuesday, December 08, 2015 4:04 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] ampersand issue
>  
> Julia,
>  
> I’m not having any trouble saving a title with ‘ø’ (see screenshot). Are you on a Windows machine? 
>  
>  
> Brian
>  
> <image001.png>
>  
> On Dec 8, 2015, at 3:50 PM, Novakovic, Julia <jNovakovic at museumofplay.org> wrote:
>  
> Similarly, we have found that special characters like ø or quotation marks in the title field do not appear correctly, while they appear fine in other notes fields throughout imported collections. I have had to go through manually and change the characters to something that closely resembles the actual characters we want. [For example, Brøderbund to Broderbund … Gerald A. (“Jerry”) Lawson papers to Gerald A. (‘Jerry’) Lawson papers.] I would also appreciate clarification like Brian has outlined below. 
>  
> Thanks!
> --Julia
>  
>  
> Julia Novakovic
> Archivist
> Associate Editor, American Journal of Play
> The Strong
> One Manhattan Square
> Rochester, NY 14607 U.S.A.
> Tel 585-410-6307
> Fax 585-423-1886
> jnovakovic at museumofplay.org
> www.museumofplay.org
>  
> The Strong is home to:
> International Center for the History of Electronic Games | National Toy Hall of Fame | World Video Game Hall of Fame
> Brian Sutton-Smith Library and Archives of Play | Woodbury School | American Journal of Play
>  
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Brian Hoffman
> Sent: Tuesday, December 08, 2015 3:23 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] ampersand issue
>  
> Currently unittitle gets imported as-is to the title field of resources and components. So when you import the example you get two records whose title field is "AmpTest &amp; A”
>  
> We could change this so that you get "AmpTest & A” instead, but what happens when a user imports a unittitle like: "AmpTest &amp; <emph>A</emph>”? We can’t unescape the &amp; entity without leaving the XML zone, so what happens to the <emph> tag and content?
>  
> To me this is part of a larger, still unresolved question of what exactly the data type of an ASpace text field is supposed to be. Is it XML? If so, I think we should be assuming that archivists are managing XML data here, and it should be ‘&amp;’. If it isn’t, and yet we still want to support inline styling, we need to come up with a set of rules for what kind of inline pseudo-markup is allowed and for how it maps to EAD on export.
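
The two possible readings of a title field can be illustrated with a small Python sketch. It uses the standard library's ElementTree, not the ASpace importer, and the sample unittitle is an assumed example: parsed as XML, the ampersand entity is unescaped in the text nodes while the emph element survives as structure; flattened to plain text, the markup is gone.

```python
import xml.etree.ElementTree as ET

# Assumed sample: a unittitle with an escaped ampersand and inline markup.
source = "<unittitle>AmpTest &amp; <emph>A</emph></unittitle>"
elem = ET.fromstring(source)

# Text nodes come back unescaped; the inline element is kept as a child node.
print(repr(elem.text))
print(elem[0].tag, elem[0].text)

# Plain-text view: entity unescaped, but the emph markup is dropped.
plain = "".join(elem.itertext())
print(plain)

# XML view: serializing escapes the ampersand again and keeps the markup.
xml_view = ET.tostring(elem, encoding="unicode")
print(xml_view)
```

A field can hold either view consistently, but mixing them (unescaped ampersand plus literal tags in one string) is what makes the export ambiguous.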
>  
> Brian
>  
>  
>  
>  
> On Dec 8, 2015, at 1:50 PM, Chris Fitzpatrick <Chris.Fitzpatrick at lyrasis.org> wrote:
>  
> 
> Hi, 
> I think I understand.
> So, the title in the imported XML is 
> <unittitle>AmpTest &amp; A</unittitle>
> but you want the EAD converter to switch this to be "AmpTest & A"?
> If that's the case, it seems like a pretty easy thing to add to the converter...
>  
> b,chris
>  
>  
> Chris Fitzpatrick | Developer, ArchivesSpace
> Skype: chrisfitzpat  | Phone: 918.236.6048
> http://archivesspace.org/
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of Galligan, Patrick <PGalligan at rockarch.org>
> Sent: Tuesday, December 8, 2015 6:58 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] ampersand issue
>  
> Christine,
>  
> I’d like to circle back to this issue.
>  
> I was doing some testing with the merged pull request, and while it no longer just deletes the escaped character, it now adds “&amp;” to the display of the title.
>  
> I’ve also noticed that while it corrects the unittitle at the highest level, it doesn’t seem to work at the series levels below it.
>  
> Attached is a screenshot and the EAD that I imported into AS. Has anyone else run into this issue? Has anyone found a viable solution so far?
>  
> Patrick Galligan
> Rockefeller Archive Center
> Assistant Digital Archivist
> 914-366-6386
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Christine Di Bella
> Sent: Thursday, December 03, 2015 10:10 AM
> To: Archivesspace Users Group
> Subject: [Archivesspace_Users_Group] FW: ampersand issue
>  
> Forwarded for Matt Francis.
>  
> (I believe there has been some work on this issue very recently by the University of Michigan folks. See this thread on GitHub - https://github.com/archivesspace/archivesspace/issues/332 - and the associated merged pull request. – Christine)
>  
> From: MATTHEW R FRANCIS [mailto:mrf22 at psu.edu] 
> Sent: Wednesday, December 2, 2015 3:20 PM
> To: archivesspace users group-bounces <archivesspace_users_group-bounces at lyralists.lyrasis.org>
> Subject: ampersand issue
>  
> All,
>  
> We are currently on ASpace v1.4.1 and recently observed an issue when we try to import EAD files that contain ampersands in the XML. We are curious whether others have experienced this and/or whether anyone knows a fix.
>  
> Currently, when we import files with an ampersand coded as "&amp;" as seen in:
>  
> <image001.png>
>  
>  
> The "&" does not appear to be rendered in ASpace in any form, as seen in:
>  
>  
> <image002.png>
>  
> In looking through JIRA, it does not appear that this specific issue/behavior has been reported, but before reporting it as a bug we were hoping to determine whether this is a universal issue or perhaps just a local one.
>  
> Thanks for the help and feedback.
>  
> -Matt
>  
> Matt Francis
> Archivist for Collection Management
> Special Collections Library
> Penn State University
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>  
> <oslash_120915.PNG>