[Archivesspace_Users_Group] Enumerations Findings

Lisa Calahan lcalahan at umn.edu
Wed Feb 15 15:25:15 EST 2017


I've also received the same UTF8 error when importing legacy accession
records that have *valid* diacritical marks in the title and/or agent name.

Lisa

On Wed, Feb 15, 2017 at 2:17 PM, Reese, Terry P. <reese.2179 at osu.edu> wrote:

> I guess my question would be: is your legacy data UTF8?  For whatever
> reason, I’ve found that historically, archives have often used other
> character sets when encoding their EAD files (though to be fair, I see this
> in MARC records as well: confusion between MARC8, ISO8859-1, and codepage
> 1252).  The simple solution (and this would maintain your characters) would
> be to convert the character set to UTF8.  Otherwise, even if you held on to
> these values, they wouldn’t display in any form that you could read, and
> in fact, that is what the error message is trying to tell you: that as a
> UTF8 value, your data is going to be gibberish, regardless of whether you
> keep it or not.
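
As a rough illustration of the conversion described above, here is a minimal Python sketch. It assumes the legacy bytes are codepage 1252 whenever they fail to decode as UTF8; that fallback is a guess for illustration only, and the function name to_utf8 is hypothetical, not part of ArchivesSpace.

```python
def to_utf8(raw: bytes, fallback_encoding: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8, trying UTF-8 first."""
    try:
        # Bytes that already decode cleanly as UTF-8 are left as they are.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is legacy codepage 1252 data.
        text = raw.decode(fallback_encoding)
    return text.encode("utf-8")


# A title with valid diacritics, stored in a legacy single-byte encoding.
legacy = "Señor Muñoz".encode("cp1252")
converted = to_utf8(legacy)
print(converted.decode("utf-8"))
```

The diacritics survive because the bytes are reinterpreted rather than dropped. Data in MARC8 would need a dedicated converter instead, since Python's standard codecs do not include it.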
>
>
>
> --tr
>
>
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> *On Behalf Of* Stasiulatis, Suzanne
> *Sent:* Wednesday, February 15, 2017 3:12 PM
>
> *To:* Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
>
> I totally agree that we shouldn’t have special characters if at all
> possible, but a large amount of our legacy data uses them. Especially in
> titles, staff want to use those characters as they are reflected on
> original materials.
>
>
>
> Suzanne
>
>
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> *On Behalf Of* Reese, Terry P.
> *Sent:* Wednesday, February 15, 2017 2:58 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
>
> Why would you want to retain invalid special characters?  My guess is that
> one of the reasons for this error is that invalid characters would cause
> problems with indexing for search, as well as impact display and export.  I
> would think you’d want to use the error as a flag to identify data that
> needs to be corrected.  Or am I missing something?
>
>
>
> --tr
>
>
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> *On Behalf Of* Stasiulatis, Suzanne
> *Sent:* Wednesday, February 15, 2017 2:52 PM
> *To:* Archivesspace Users Group <archivesspace_users_group@lyralists.lyrasis.org>
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
>
> This also came up for me recently. If invalid special characters are
> present in the content titles, I get this error. I’m not sure quite how to
> adjust to accept those special characters.
>
>
>
>
>
> *Suzanne Stasiulatis *| Archivist II
> Pennsylvania Historical and Museum Commission | Pennsylvania State
> Archives
> 350 North Street | Harrisburg, PA 17120-0090
>
> Phone: 717-787-5953
>
> http://www.phmc.pa.gov
>
> sustasiula at pa.gov
>
>
>
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org
> *On Behalf Of* Majewski, Steven Dennis (sdm7g)
> *Sent:* Wednesday, February 15, 2017 2:36 PM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Enumerations Findings
>
>
>
>
>
>
>
> We have run into the case that some EAD attribute values are required to
> be NMTOKENs, thus no embedded spaces or other disallowed characters. We
> replaced enumerations with embedded spaces with underscores.
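
The NMTOKEN constraint mentioned above can be checked before export. The following Python sketch uses a simplified, ASCII-only approximation of the XML NMTOKEN rule (the real definition also allows many non-ASCII name characters), and both function names are hypothetical helpers rather than anything in ArchivesSpace.

```python
import re

# Simplified, ASCII-only approximation of the XML NMTOKEN production:
# one or more letters, digits, '.', '-', '_' or ':' and no whitespace.
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken_ascii(value: str) -> bool:
    """True if value would pass the simplified NMTOKEN check."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None


def spaces_to_underscores(value: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", value.strip())


print(is_nmtoken_ascii("Oversized Box"))
print(spaces_to_underscores("Oversized Box"))
```

A value with an embedded space fails the check, and the underscore replacement produces a value that passes it.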
>
>
>
> This has only come to my attention in the last week or so, so I haven’t
> made a thorough investigation of which attributes or which enumerations
> this applies to — just fixed them as I’ve encountered that error.
>
>
>
> So it may be intentional that it is using the non translated value.
>
> ( And I wouldn’t be surprised, if for simplicity, it may be over applying
> that rule in places where it’s not actually required. )
>
>
>
>
>
> — Steve.
>
>
>
>
>
> On Feb 15, 2017, at 2:09 PM, Carlos Lemus <carlos.lemus at unlv.edu> wrote:
>
>
>
> Hello,
>
>
>
> At UNLV Special Collections, we've been working on cleaning up our
> enumeration values because in many cases there were duplicates caused by
> imports (e.g. value: linear_feet vs. value: Linear feet vs. Linear Feet). We
> wanted to stick as close as possible to ArchivesSpace standards and decided
> to make our enumeration values all lowercase, separated by underscores, and
> then merge any records with incorrect enumerations into the correct value
> (e.g. value: linear Feet into linear_feet). We also have some custom
> enumerations, such as value: oversized_box, translation: Oversized Box;
> value: digital_file, translation: Digital File.
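
The cleanup rule described above (lowercase, underscore-separated) can be sketched in Python to find which imported values collapse to the same canonical value. This only identifies merge candidates; the merge itself still happens in ArchivesSpace, and the function names here are hypothetical.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase the value and join its words with single underscores."""
    return re.sub(r"[\s_]+", "_", value.strip()).lower()


def group_duplicates(values):
    """Map each canonical value to the original variants that reduce to it."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return dict(groups)


groups = group_duplicates(
    ["linear_feet", "Linear feet", "Linear Feet", "oversized_box"]
)
for canonical, variants in groups.items():
    print(canonical, variants)
```

Any canonical value listed with more than one variant is a candidate for merging.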
>
>
>
> After we had that set up correctly, we had some findings and were wondering
> if anyone has experienced the same things or has a standard we could use.
>
>
>
> 1. When generating PDFs and EADs, the custom enumeration values (such as
> oversized_box) would come out as the machine-readable value
> (oversized_box) instead of using the translation in our local en.yml
> (located in the local plugin).
>
>      This was something I found in the EAD serializer
> (https://github.com/archivesspace/archivesspace/blob/master/backend/app/exporters/serializers/ead.rb#L490)
> and I was able to create a temporary solution for generating it, but it
> required altering the enumeration instead of referencing our file. I
> thought I'd point it out because anyone creating custom enumerations, even
> with a translation in an en.yml file, would not see their change reflected
> in the EAD export. (I've attached an image reflecting this.) Has anyone
> else experienced this?
>
>
>
> 2. Another example of this case was in the container "type" attribute.
> Before, something like Oversized Box would be exported to EAD as is because
> that was its value in the enumeration. After we changed the value
> to oversized_box, it would export to the EAD container "type" as
> is and carry through to the PDF as well. With some XSLT manipulation I was
> able to get it to show up as oversized box (shown in attachments). I've
> looked through https://www.loc.gov/ead/tglib/elements/container.html and
> cannot find an example of an attribute value with two or more words.
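
The display behavior discussed in items 1 and 2 can be sketched as a lookup with a fallback: use the local translation when one exists, otherwise turn underscores into spaces, as the XSLT workaround does. This Python sketch is not how the ArchivesSpace serializer works; the TRANSLATIONS dictionary is a hypothetical stand-in for entries in a local en.yml.

```python
# Hypothetical stand-in for the translation entries in a local en.yml.
TRANSLATIONS = {
    "oversized_box": "Oversized Box",
    "digital_file": "Digital File",
}


def display_label(value: str, translations=TRANSLATIONS) -> str:
    """Return a human-readable label for a machine-readable value."""
    if value in translations:
        return translations[value]
    # No translation found: fall back to replacing underscores with spaces.
    return value.replace("_", " ")


print(display_label("oversized_box"))
print(display_label("linear_feet"))
```

Under this approach the attribute keeps its machine-readable value, and the readable label is produced only at display time.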
>
>
>
> Should attributes be machine readable (i.e. oversized_box), human readable
> (Oversized Box), or does it even matter? Of course, exporting it as
> Oversized Box would make it easiest to show a user-friendly version to the
> user.
>
>
>
> Excuse me for the lengthy post, I'm trying to be thorough with my
> explanation, but please let me know if you've come across something
> similar or have a definitive solution.
>
>
> Carlos Lemus
>
> Application Programmer, Special Collections Technical Services
>
> University Libraries, University of Nevada, Las Vegas
>
>
>
> *How often have I said to you that when you have eliminated the
> impossible, whatever remains, however improbable, must be the truth? -
> Sherlock Holmes*
>
> <enumeration_ead.PNG><containers_enum.PNG>
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
>
>
>
>


-- 

Head of Archival Processing

University of Minnesota Libraries
Archives and Special Collections
Elmer L. Andersen Library, Suite 315
222-21st Ave. S.
Minneapolis MN 55455

Phone: 612.626.2531
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 81628 bytes
Desc: not available
URL: <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/attachments/20170215/39bd9cd9/attachment-0001.png>

