[Archivesspace_Users_Group] Enumerations Findings
Lisa Calahan
lcalahan at umn.edu
Wed Feb 15 16:25:30 EST 2017
I've attached the .csv example. I didn't test it in 1.5.3, but the bug
occurs in 1.5.2 (I know it did not occur in 1.5.1). I reported the bug on
January 17.
On Wed, Feb 15, 2017 at 3:04 PM, Majewski, Steven Dennis (sdm7g) <
sdm7g at eservices.virginia.edu> wrote:
> Yes, and the previous cases I’ve seen (which have since been fixed) have
> been where the document was originally parsed with the correct character
> encoding, but that encoding wasn’t being preserved in some other internal
> (XML or JSON) transform. So that might be something to look for
> if it’s still happening in a new use case.
>
>
>
> On Feb 15, 2017, at 3:54 PM, Reese, Terry P. <reese.2179 at osu.edu> wrote:
>
> I’d be interested in the same thing (a sample file). I’m familiar with
> the tools being used, and if the data is UTF8, then you shouldn’t see this
> problem unless the import is munging the data or encoding – which would be
> a much different problem.
>
> --tr
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [
> mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org>] *On Behalf Of *Majewski,
> Steven Dennis (sdm7g)
> *Sent:* Wednesday, February 15, 2017 3:50 PM
> *To:* Archivesspace Users Group <archivesspace_users_group@
> lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
> Do you have a sample import file that fails this way?
> Do you know if it still fails on the current release?
> (And is the bug reported in Jira?)
>
> — Steve.
>
>
>
> On Feb 15, 2017, at 3:25 PM, Lisa Calahan <lcalahan at umn.edu> wrote:
>
> I've also received the same UTF8 error when importing legacy accession
> records that have *valid* diacritical marks in the title and/or agent name.
>
>
> Lisa
>
> On Wed, Feb 15, 2017 at 2:17 PM, Reese, Terry P. <reese.2179 at osu.edu>
> wrote:
>
> I guess my question would be: is your legacy data UTF8? For whatever
> reason, I’ve found that historically, archives have often used other
> character sets when encoding their EAD files (though to be fair, I see this
> in MARC records as well: confusion between MARC8, ISO8859-1, and codepage
> 1252). The simple solution (and this would maintain your characters) would
> be to convert the character set to UTF8. Otherwise, even if you held on to
> these values, they wouldn’t display in any form that you could read, and
> in fact, that is what the error message is trying to tell you: read as
> UTF8, your data is going to be gibberish, regardless of whether you keep
> it or not.
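
A minimal sketch of the conversion suggested above, in Python. It assumes the legacy file is either already valid UTF-8 or was saved as Windows codepage 1252; the function name `to_utf8` and the `cp1252` default are illustrative assumptions, not part of ArchivesSpace or of any tool mentioned in this thread. MARC-8 data would need a dedicated converter and is not handled here.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` re-encoded as UTF-8.

    If the bytes already decode as UTF-8 they are returned unchanged in
    content; otherwise they are decoded with `fallback` (an assumption
    about the legacy encoding) and re-encoded.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the fallback codepage become U+FFFD
        # instead of raising, so they can be found and reviewed later.
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "Muñoz café".encode("cp1252")
    print(to_utf8(legacy).decode("utf-8"))
```

Guessing the fallback encoding wrongly still produces readable-looking but incorrect characters, so it is worth spot-checking a few converted titles before importing.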
>
> --tr
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
> archivesspace_users_group-bounces at lyralists.lyrasis.org] *On Behalf Of *Stasiulatis,
> Suzanne
> *Sent:* Wednesday, February 15, 2017 3:12 PM
>
> *To:* Archivesspace Users Group <archivesspace_users_group@
> lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
> I totally agree that we shouldn’t have special characters if at all
> possible, but a large amount of our legacy data uses them. Especially in
> titles, staff want to use those characters as they are reflected on
> original materials.
>
> Suzanne
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
> archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org>] *On Behalf Of *Reese,
> Terry P.
> *Sent:* Wednesday, February 15, 2017 2:58 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> Why would you want to retain invalid special characters? My guess is that
> one of the reasons for this error is that invalid characters would cause
> problems with indexing for search, as well as impact display and export. I
> would think you’d want to use the error as a flag to identify data that
> needs to be corrected. Or am I missing something?
>
> --tr
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
> archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org>] *On Behalf Of *Stasiulatis,
> Suzanne
> *Sent:* Wednesday, February 15, 2017 2:52 PM
> *To:* Archivesspace Users Group <archivesspace_users_group@
> lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
> This also came up for me recently. If invalid special characters are
> present in the content titles, I get this error. I’m not sure quite how to
> adjust to accept those special characters.
>
> <image002.png>
>
> *Suzanne Stasiulatis *| Archivist II
> Pennsylvania Historical and Museum Commission | Pennsylvania State
> Archives
> 350 North Street | Harrisburg, PA 17120-0090
> Phone: 717-787-5953
> http://www.phmc.pa.gov
> sustasiula at pa.gov
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:
> archivesspace_users_group-bounces at lyralists.lyrasis.org
> <archivesspace_users_group-bounces at lyralists.lyrasis.org>] *On Behalf Of *Majewski,
> Steven Dennis (sdm7g)
> *Sent:* Wednesday, February 15, 2017 2:36 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
>
> We have run into the case that some EAD attribute values are required to
> be NMTOKENs, thus no embedded spaces or other disallowed characters. We
> replaced the embedded spaces in those enumeration values with underscores.
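
A rough sketch of that check and replacement in Python. The pattern below is an ASCII-only approximation of the XML NMTOKEN rule (the real definition also admits many non-ASCII name characters), and both function names are hypothetical helpers rather than anything in ArchivesSpace.

```python
import re

# ASCII-only approximation of XML NameChar: letters, digits, '.', '_', ':', '-'.
# The XML specification allows many additional non-ASCII characters.
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:-]+")


def is_nmtoken_ascii(value: str) -> bool:
    """True if `value` consists only of ASCII characters allowed in an NMTOKEN."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None


def spaces_to_underscores(value: str) -> str:
    """Trim the value and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", value.strip())


if __name__ == "__main__":
    for candidate in ["Oversized Box", "linear_feet"]:
        fixed = spaces_to_underscores(candidate)
        print(candidate, "->", fixed, is_nmtoken_ascii(fixed))
```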
>
> This has only come to my attention in the last week or so, so I haven’t
> made a thorough investigation of which attributes or which enumerations
> this applies to — just fixed them as I’ve encountered that error.
>
> So it may be intentional that it is using the untranslated value.
> (And I wouldn’t be surprised if, for simplicity, it is over-applying
> that rule in places where it’s not actually required.)
>
>
> — Steve.
>
>
>
> On Feb 15, 2017, at 2:09 PM, Carlos Lemus <carlos.lemus at unlv.edu> wrote:
>
> Hello,
>
> At UNLV Special Collections, we've been working on cleaning up our
> enumeration values because in many cases there were duplicates caused by
> imports (e.g., value: linear_feet vs. value: Linear feet vs. value: Linear
> Feet). We wanted to stick as close as possible to ArchivesSpace standards
> and decided to make our enumeration values all lowercase, separated by an
> underscore, and then merge any records with incorrect enumerations into
> the correct value (e.g., value: linear Feet into linear_feet). We also have
> some custom enumerations, such as value: oversized_box, translation:
> Oversized Box; value: digital_file, translation: Digital File.
>
> After we had that set up correctly, we had some findings and were wondering
> if anyone has experienced the same things or has a standard we could use.
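
The clean-up described above can be sketched in Python as follows. This only computes which variant values would be merged into which canonical value; the merge itself would still be done in ArchivesSpace. The names `canonical_value` and `merge_plan` are hypothetical.

```python
import re
from collections import defaultdict


def canonical_value(value: str) -> str:
    """Lowercase the value and join its words with single underscores."""
    return re.sub(r"[\s_]+", "_", value.strip()).lower()


def merge_plan(values):
    """Map each canonical value to the non-canonical variants found for it."""
    groups = defaultdict(list)
    for value in values:
        canon = canonical_value(value)
        if value != canon:
            groups[canon].append(value)
    return dict(groups)


if __name__ == "__main__":
    existing = ["linear_feet", "Linear feet", "Linear Feet", "oversized_box"]
    for canon, variants in merge_plan(existing).items():
        print(canon, "<-", variants)
```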
>
> 1. When generating PDFs and EADs, the enumeration values that were custom
> (such as oversized_box) would come out as the machine-readable
> oversized_box instead of using our local en.yml value (located in the local
> plugin).
> This was something I found in the EAD serializer
> (https://github.com/archivesspace/archivesspace/blob/master/backend/app/exporters/serializers/ead.rb#L490)
> and I was able to create a temporary solution for generating it, but it
> required altering the enumeration instead of referencing our file. I
> thought I'd point it out because anyone creating custom enumerations, even
> with a translation in an en.yml file, would not see their change reflected
> in the EAD export. (I've attached an image reflecting this.) Has anyone
> experienced this?
>
> 2. Another example of this case was in the container "type" attribute.
> Before, something like Oversized Box would be exported to EAD as is because
> that was its value in the enumeration. After we changed the value
> correctly to oversized_box, it would export to the EAD container "type" as
> is and carry through to the PDF as well. With some XSLT manipulation I was
> able to get it to show up as oversized box (shown in attachments). I've
> looked through https://www.loc.gov/ead/tglib/elements/container.html and
> cannot find an example of an attribute value with two or more words.
>
> Should attributes be machine readable (e.g., oversized_box), human readable
> (Oversized Box), or does it even matter? Of course, exporting it as
> Oversized Box would make it easiest to present a user-friendly version to
> the user.
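
One way to picture the trade-off in that question is a label lookup that prefers a local translation and falls back to a humanized form of the machine-readable value. This is only an illustrative Python sketch: the `translations` dictionary stands in for entries in a local en.yml file, the `av_item` entry is invented for the example, and none of this is the ArchivesSpace exporter's actual behavior.

```python
def display_label(value: str, translations: dict) -> str:
    """Return the translated label for `value`, or a humanized fallback.

    `translations` is a plain dict standing in for local en.yml entries
    (an assumption for this sketch).
    """
    if value in translations:
        return translations[value]
    # Fallback: "oversized_box" -> "Oversized Box"
    return value.replace("_", " ").title()


if __name__ == "__main__":
    local = {"av_item": "Audiovisual Item"}  # hypothetical custom entry
    for value in ["oversized_box", "av_item"]:
        print(value, "->", display_label(value, local))
```

Keeping the stored value machine readable and deriving the label at export time means the attribute stays a valid token while the reader still sees a friendly form.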
>
> Excuse me for the lengthy post; I'm trying to be thorough with my
> explanation, but please let me know if you've come across something
> similar or have a definite solution.
>
> Carlos Lemus
> Application Programmer, Special Collections Technical Services
> University Libraries, University of Nevada, Las Vegas
>
> *How often have I said to you that when you have eliminated the
> impossible, whatever remains, however improbable, must be the truth? -
> Sherlock Holmes*
> <enumeration_ead.PNG><containers_enum.PNG>
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
> --
>
> Head of Archival Processing
>
> University of Minnesota Libraries
> Archives and Special Collections
> Elmer L. Andersen Library, Suite 315
> 222-21st Ave. S.
> Minneapolis MN 55455
>
> Phone: 612.626.2531
--
Head of Archival Processing
University of Minnesota Libraries
Archives and Special Collections
Elmer L. Andersen Library, Suite 315
222-21st Ave. S.
Minneapolis MN 55455
Phone: 612.626.2531
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAAAccessions_Feb.csv
Type: text/csv
Size: 63800 bytes
Desc: not available
URL: <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/attachments/20170215/7727fa37/attachment.csv>