[Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export

McPhee, Laurel lmcphee at ucsd.edu
Wed Jul 27 10:06:56 EDT 2016


Thanks, Mark and Chris!

Yes, removing all the ns2 references and swapping in xlink is on our checklist of tasks for moving into production (the data in our ASpace test instance is pretty much as-is from a recent AT test migration), which was supposed to happen this month. I like your idea of stripping/adding the necessary snippet on export though, because as you say, in the back of my mind I've been thinking about EAD3, and inwardly groaning at the thought of updating all those notes AGAIN whenever that becomes a reality.

Many thanks, see you in Atlanta!
Laurel

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Wednesday, July 27, 2016 6:43 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export

All,

Great news!  Chris Fitzpatrick submitted a patch for the paragraph wrapping issue at the end of last week.  Mark Cooper has since tested and merged the fix into the core code, so you can test it out starting today at http://test.archivesspace.org

I imagine the fix should come out in the next release, but if you need it immediately you can see the changes in the exporters/serializers/ead.rb file here (as well as the new tests added): https://github.com/archivesspace/archivesspace/commit/f14f2e67f72281bffa4422c2fb5a355630573211

Mark

p.s.
Laurel, I think you're still going to need to change those namespace prefixes after the export, if you need valid EAD.  I know that when we migrated from the AT to ArchivesSpace at Yale, for instance, we changed all of the "ns2:" pieces of text in the AT's database to "xlink:", since that's the prefix that ASpace uses.  Ideally, though, I don't know if these prefixes should be stored in the database, but perhaps instead they could be stripped and/or added when needed during the import and export processes (especially since EAD3 has removed the XLink namespace).  It's not a great model to follow (we're not SQL experts :)), but this is how we handled that before the migration: https://github.com/YaleArchivesSpace/migrationSQL/blob/master/AllDatabasesPreMigration.sql#L1-L25  (also, I'm pretty certain that we did *not* have any xlink attributes in our dates, but since it's possible to have linking elements in your dates for some reason in EAD, we even had update statements for that remote possibility!)
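
As an illustration of the "strip and/or add on export" idea above, here is a minimal sketch in Python. It is not ArchivesSpace or AT code: the function names swap_prefix and strip_prefix are hypothetical, and the sketch assumes that prefixes only need changing where they appear on attribute names (for example ns2:href), not in running text.

```python
import re

# Hypothetical helpers for illustration only; not part of ArchivesSpace.
# A prefix is rewritten only when it sits directly in front of an attribute
# name, i.e. it follows whitespace and is followed by name="...".
def _attr_prefix(prefixes):
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return re.compile(r'(?<=\s)(?:%s):(?=[A-Za-z_][\w.-]*\s*=)' % alternatives)

def swap_prefix(note_text, old="ns2", new="xlink"):
    """Rewrite old:attr="..." as new:attr="..." (e.g. for EAD 2002 export)."""
    return _attr_prefix([old]).sub(new + ":", note_text)

def strip_prefix(note_text, prefixes=("ns2", "xlink")):
    """Drop the prefix entirely (e.g. for an export without the XLink namespace)."""
    return _attr_prefix(prefixes).sub("", note_text)
```

Because the pattern requires an attribute name and an equals sign after the prefix, ordinary prose that happens to mention "ns2:" is left alone. A real implementation would probably want to work on parsed XML rather than on text.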



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Monday, 25 July, 2016 12:45 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export

Yeah, that seems to be the gist of it:  if the ampersands look like "&amp;" in the web forms, rather than "&", they should continue to export okay in the note fields in the newest release; if they look like "&" in a unit title, then those should still export fine; but if they look like "&" in a notes field, that will cause problems in the export.

The bug has to do with how the exporter decides whether or not it should wrap the note in a paragraph element, probably to guard against instances when the "p" elements are stored as text in the database (i.e. "<p>first paragraph</p>", instead of "first paragraph".)
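
To make the wrapping decision concrete, here is a minimal sketch in Python. It is an assumption about one reasonable way to decide, not the logic in ead.rb: the function name needs_p_wrapper is hypothetical, and the sketch parses the note as an XML fragment (with the xlink and ns2 prefixes declared) instead of matching on text.

```python
import re
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# An ampersand that does not start an entity or character reference.
BARE_AMP = re.compile(r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)')

def needs_p_wrapper(note_text):
    """Return True when the note has no top-level <p> element of its own.

    Hypothetical helper for illustration; not the ArchivesSpace exporter.
    """
    # Escape bare ampersands first so they do not break parsing.
    escaped = BARE_AMP.sub("&amp;", note_text)
    # Declare both prefixes so prefixed attributes parse as a fragment.
    wrapped = f'<note xmlns:xlink="{XLINK}" xmlns:ns2="{XLINK}">{escaped}</note>'
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError:
        # Not well-formed mixed content: treat it as plain text to be wrapped.
        return True
    return not any(child.tag == "p" for child in root)
```

With this approach, an ampersand or a prefixed attribute in the note does not change the answer; only the presence of a top-level "p" element does.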

I'm just now testing the problem with the attributes, and it seems that the paragraph elements will not export correctly in version 1.5 if a namespace prefix (like ns2, used by the AT upon export, or xlink, used by ASpace) is present anywhere in the note.  To be valid EAD 2002, though, one of your extref examples would need to be exported by the system like this:

         <p>Selected materials from the collection have been digitized and can be viewed by
            clicking the link below. <extref
              xlink:href="http://library.ucsd.edu/dc/collection/bb5496475c"
              xlink:title="Items available online">Items available online</extref>
          </p>

But you can't get that exported right now, since if you have "xlink:" (or "ns2:") on those attributes, then you won't get a paragraph wrapper element (and I think that behavior is new to version 1.5, but I'm not sure).  I'm pretty sure it was required to have the "xlink:" prefix in previous versions of ASpace, though.

*Both of these issues could probably be fixed with a minor patch to the exporter code in the short term, but I'd still prefer a holistic look into how XML is handled upon import and export of the system in the long term.*

It's probably a bad idea, for just one example, to include those namespace prefixes as text in the database (unless they were stored as XML, so that they could be manipulated as XML rather than just text), since when ASpace gets an option to export EAD3, those prefixes can't be included in the exports, and so, additional logic would have to be added to strip that information.

In any event, I'll add a few notes to the JIRA ticket that you created once I've looked into this a little more.





From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of McPhee, Laurel
Sent: Monday, 25 July, 2016 12:11 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export

Hi Mark,

Thanks for looking into it. So, if I may paraphrase you, if an organization upgrades to 1.5, and they have 500 resource records that they migrated a year ago from AT (into, say, 1.4.2), all of their records with ampersands that previously exported just fine will now have trouble exporting as valid EAD. But if they create a NEW resource record in 1.5 from scratch, or import (not migrate) a new EAD in, loaded with ampersands, it will export correctly with <p> tags. Is this correct?

On Friday, I created a bug report for this issue: https://archivesspace.atlassian.net/browse/AS-98

The language might need to be refined to address the full scope/details of the problem. Thanks for your input!

Laurel

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Monday, July 25, 2016 8:57 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export


Laurel, all:



Following up on this issue, I tried to replicate it in the newest version of ArchivesSpace, but I haven't had any luck (well, I uncovered a few other minor bugs, but nothing as problematic as the ones you reported).  Anyhow, here's the file that I uploaded for testing, which is in the GoneRepo repository: http://test.archivesspace.org/resources/151



So, it appears that the problem is being caused by how previous importers (or migrators) handled the ampersands versus what the exporter expects now.  For example, this is text from one of our scope and content notes that was imported from AT, I think:  "individual cards measuring 300 by 360mm & captioned." (and that ampersand is not encoded as "&amp;", which is how it is imported in version 1.5, and possibly earlier).  That ampersand no longer exports okay in version 1.5, but it exports fine in version 1.4.2.  Ideally, the importers and exporters would have one way to handle XML data, but it seems like there have been different ways to do that in the past.  I've no clue how best to handle that with the ASpace architecture, other than to say that it should be consistent in all aspects of the application.
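
One way to get that consistency is to normalise ampersands at a single point, so that migrated text with a bare "&" and newly imported text with an entity reference end up in the same form. The Python below is a minimal sketch of that idea only; escape_bare_ampersands is a hypothetical name, not an ArchivesSpace function.

```python
import re

# Matches "&" unless it already begins an entity or character reference
# such as &amp; or &#38; or &#x26;.
BARE_AMP = re.compile(r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)')

def escape_bare_ampersands(text):
    """Escape only the ampersands that are not already part of a reference.

    Hypothetical helper for illustration. Applying it twice gives the same
    result as applying it once, so it is safe on mixed old and new data.
    """
    return BARE_AMP.sub("&amp;", text)
```

One limitation of the sketch: anything shaped like "&name;" is treated as an existing entity, whether or not that entity is actually declared.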



Mark





From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Thursday, 21 July, 2016 5:12 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export



Hi, Laurel:



What timing!  I remember Brad mentioning this issue before, but I had completely forgotten about it until I saw your message, at just about the same time that I was testing a new automatic EAD export service for ArchivesSpace.  Right before I read your message, I was wondering why so many of our files failed to produce valid EAD files.  The main problem, it turns out, is the bug that's not putting paragraph elements around text that has ampersands.  But this only happens in the 1.5 versions.  I can confirm that in version 1.4.2 this problem doesn't exist (that's our production version), but that it does exist in 1.5 (that's our test version).  I haven't tested anything with the altformavail/extref issues, though.



I haven't looked into the exporter code too much yet, but the history of updates to the EAD exporter file should provide clues (to the developers, at least):  https://github.com/archivesspace/archivesspace/commits/master/backend/app/exporters/serializers/ead.rb



Mark







From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of McPhee, Laurel
Sent: Thursday, 21 July, 2016 3:45 PM
To: archivesspace_users_group at lyralists.lyrasis.org
Subject: [Archivesspace_Users_Group] RC candidates and v1.5.0 not creating <p> tags in EAD export



Hello,



UC San Diego discovered an issue in R2 that we informally reported to B. Westbrook in early June. The bug is this: note fields containing either an &amp; entity ref and/or an <extref> tag do not get properly wrapped in a <p> tag upon EAD export. For example, all of our <prefercite> notes contain the term "Special Collections & Archives", and because of the presence of the ampersand in the note, do not get wrapped. Similarly, if a bioghist note has an & somewhere in it, none of the paragraphs get wrapped in a <p> tag, leaving us with a giant block of text. And last, in altformavail, where we record the existence of digital versions of our collections with a link, the existence of an <extref> causes the whole note to not get wrapped in a <p> tag when we export the EAD.



This issue breaks our local scripts and also prevents validation on the Online Archive of California. At the time, I didn't create a JIRA ticket, because we were told the problem was being worked on...but now I'm going through the steps to create the ticket. Is there any feedback/observations on this in the member community before I do so? It's happening in v1.5.0, which we tested this morning. Thanks!



Laurel McPhee

Supervisory Archivist, Special Collections & Archives Program

UC San Diego Library | 858-534-5619 | lmcphee at ucsd.edu



