[Archivesspace_Users_Group] JIRA issues cleanup ? [was: Enumerations Findings]

Thu Feb 16 08:43:00 EST 2017

Hi everyone,

Just to provide a little context for the JIRA issues cleanup specifically, Laney and I are in the process of working with others on improving some processes and workflows related to JIRA. But in the way JIRA has been used for the last couple of years, both of the issues Steve points to in his message are resolved, as are any other issues with an “Accepted” status. “Accepted” in our past/present usage means that code fixing the issue has been delivered and tested, or that something else has happened that indicates this issue has reached the end of the line. (If what has happened is more complex than just that it got fixed, there is usually a comment trail on the issue that gives an indication.) We have a number of resources for understanding our current use of JIRA on the wiki at https://archivesspace.atlassian.net/wiki/display/ADC/JIRA+Resources. As the process evolves over the coming months with the new crew in place, we’ll update these, but they do reflect how things have worked up to now.

We’ve heard feedback across the board that people find our use of JIRA and its relationship to Github confusing, and that it’s difficult for people to determine if an issue has been reported and what’s going on with it once it has been. Improving this experience is very high on our priority list and we very much welcome suggestions. As Steve says, there’s been a changing of the guard and that provides an opportunity for us to reexamine various processes and policies and work with all of you to improve them.

(On the specific issue of what’s going on with being able to import diacritics/special characters, in general, known issues have been fixed over the years. Determining whether there are still issues often does require examination of the specific file being imported and analyzing whether it’s actually something about the encoding in the file or particular data that is presenting a problem. If the encoding and data look OK, then we need to dig deeper, which many on this list are really expert at doing, as we’ve seen!)

Christine

Christine Di Bella
ArchivesSpace Program Manager
christine.dibella at lyrasis.org
800.999.8558 x2905
678-235-2905
cdibella13 (Skype)

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Majewski, Steven Dennis (sdm7g)
Sent: Wednesday, February 15, 2017 5:25 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] JIRA issues cleanup ? [was: Enumerations Findings]

BTW: I know there’s been a recent changing of the guard, so cleanup of JIRA issues may have been neglected, but on this subject: I believe AR-1421 and AR-647 can both be marked as closed (and probably several others). Or is this status being tracked somewhere else?

https://archivesspace.atlassian.net/browse/AR-1421?jql=text%20~%20%22utf-8%22

https://archivesspace.atlassian.net/browse/AR-647?jql=text%20~%20%22utf-8%22

— Steve.

On Feb 15, 2017, at 4:04 PM, Majewski, Steven Dennis (sdm7g) <sdm7g at eservices.virginia.edu> wrote:

Yes, and the previous cases I’ve seen (which have since been fixed) have been where the document was originally parsed with correct character encoding, but that encoding wasn’t being preserved on some other (XML or JSON) internal transform. So that might be something to look for if it’s still happening in a new use case.

On Feb 15, 2017, at 3:54 PM, Reese, Terry P. <reese.2179 at osu.edu> wrote:

I’d be interested in the same thing (a sample file).  I’m familiar with the tools being used, and if the data is UTF8, then you shouldn’t see this problem unless the import is munging the data or encoding – which would be a much different problem.

--tr

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Majewski, Steven Dennis (sdm7g)
Sent: Wednesday, February 15, 2017 3:50 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Enumerations Findings

Do you have a sample import file that fails this way?
Do you know if it still fails on the current release?
(And has the bug been reported in JIRA?)

— Steve.

On Feb 15, 2017, at 3:25 PM, Lisa Calahan <lcalahan at umn.edu> wrote:

I've also received the same UTF8 error when importing legacy accession records that have valid diacritical marks in the title and/or agent name.

Lisa

On Wed, Feb 15, 2017 at 2:17 PM, Reese, Terry P. <reese.2179 at osu.edu> wrote:
I guess my question would be – is your legacy data UTF8?  For whatever reason, I’ve found that historically, archives have often used other character sets when encoding their EAD files (though to be fair, I see this in MARC records as well; confusion between MARC8, ISO8859-1, and codepage 1252).  The simple solution (and this would maintain your characters) would be to convert the character set to UTF8.  Otherwise, even if you held on to these values – they wouldn’t display in any form that you could read; and in fact, that is what the error message is trying to tell you: that as a UTF8 value, your data is going to be gibberish, regardless of whether you keep it or not.
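A minimal sketch of the conversion described above, in Python. The function name `to_utf8` and the choice of cp1252 as the fallback are assumptions for illustration only; the actual source encoding has to be determined for each file, and MARC8 is not covered because the Python standard library has no codec for it.

```python
def to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return `raw` re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged. Anything
    else is decoded with the fallback legacy encoding, which is an
    assumption about the source data, not something detected here.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError again if the bytes are not valid in
        # the fallback encoding either, rather than guessing silently.
        return raw.decode(fallback).encode("utf-8")


# Hypothetical legacy title saved as codepage 1252.
legacy_title = "Café résumé".encode("cp1252")
converted = to_utf8(legacy_title)
print(converted.decode("utf-8"))
```

If the file is XML and carries an encoding declaration, that declaration would also need to match the new encoding after conversion.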

--tr

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Stasiulatis, Suzanne
Sent: Wednesday, February 15, 2017 3:12 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Enumerations Findings

I totally agree that we shouldn’t have special characters if at all possible, but a large amount of our legacy data uses them. Especially in titles, staff want to use those characters as they are reflected on original materials.

Suzanne

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Reese, Terry P.
Sent: Wednesday, February 15, 2017 2:58 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Enumerations Findings

Why would you want to retain invalid special characters?  My guess is that one of the reasons for this error is that invalid characters would cause problems with indexing for search, as well as impact display and export.  I would think you’d want to use the error as a flag to identify data that needs to be corrected.  Or am I missing something?

--tr

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Stasiulatis, Suzanne
Sent: Wednesday, February 15, 2017 2:52 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Enumerations Findings

This also came up for me recently. If invalid special characters are present in the content titles, I get this error. I’m not sure quite how to adjust to accept those special characters.

<image002.png>

Suzanne Stasiulatis | Archivist II
Pennsylvania Historical and Museum Commission | Pennsylvania State Archives
350 North Street | Harrisburg, PA 17120-0090
Phone: 717-787-5953
http://www.phmc.pa.gov
sustasiula at pa.gov

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Majewski, Steven Dennis (sdm7g)
Sent: Wednesday, February 15, 2017 2:36 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Enumerations Findings

We have run into the case that some EAD attribute values are required to be NMTOKENs, thus no embedded spaces or other disallowed characters. We replaced the embedded spaces in those enumeration values with underscores.
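A rough sketch of that check and fix in Python. The pattern below is an ASCII-only approximation of the XML Nmtoken production (the full production also admits many non-ASCII name characters), and both function names are made up for illustration.

```python
import re

# ASCII-only approximation of XML Nmtoken: one or more letters, digits,
# '.', '-', '_' or ':'. This is a simplification, not the full XML rule.
_NMTOKEN_ASCII = re.compile(r"[A-Za-z0-9._:\-]+")


def is_ascii_nmtoken(value: str) -> bool:
    """True if `value` is a non-empty token with no spaces or other disallowed characters."""
    return _NMTOKEN_ASCII.fullmatch(value) is not None


def spaces_to_underscores(value: str) -> str:
    """Trim the value and replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", value.strip())


for candidate in ["oversized_box", "Oversized Box", "linear feet"]:
    if not is_ascii_nmtoken(candidate):
        print(candidate, "->", spaces_to_underscores(candidate))
```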

This has only come to my attention in the last week or so, so I haven’t made a thorough investigation of which attributes or which enumerations this applies to — just fixed them as I’ve encountered that error.

So it may be intentional that it is using the non translated value.
( And I wouldn’t be surprised, if for simplicity, it may be over applying that rule in places where it’s not actually required. )

— Steve.

On Feb 15, 2017, at 2:09 PM, Carlos Lemus <carlos.lemus at unlv.edu> wrote:

Hello,

At UNLV Special Collections, we've been working on cleaning up our enumeration values because in many cases there were duplicates caused by imports (e.g., value: linear_feet vs. value: Linear feet vs. value: Linear Feet). We wanted to stick as close as possible to ArchivesSpace standards, so we decided to make our enumeration values all lowercase, separated by underscores, and then merge any records with incorrect enumerations into the correct value (e.g., value: Linear Feet into linear_feet). We also have some custom enumerations, such as: value: oversized_box, translation: Oversized Box; value: digital_file, translation: Digital File.
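The cleanup rule described above (lowercase, underscore-separated, variants merged into one value) can be sketched in Python as follows. This only illustrates the grouping logic on plain strings; it does not touch ArchivesSpace itself, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase the value and collapse runs of spaces/underscores into one underscore."""
    return re.sub(r"[\s_]+", "_", value.strip().lower())


def group_duplicates(values):
    """Map each normalized value to the list of original variants that collapse into it."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return dict(groups)


imported = ["linear_feet", "Linear feet", "Linear Feet", "oversized_box"]
for target, variants in group_duplicates(imported).items():
    if len(variants) > 1:
        print("merge", variants, "into", target)
```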

After we had that set up correctly, we had some findings and were wondering whether anyone has experienced the same things or has a standard we could use.

1. When generating PDFs and EADs, the custom enumeration values (such as oversized_box) would come out in their machine-readable form, oversized_box, instead of using the translation from our local en.yml (located in the local plugin).
     I traced this to the EAD serializer (https://github.com/archivesspace/archivesspace/blob/master/backend/app/exporters/serializers/ead.rb#L490) and was able to create a temporary solution for generating it, but that required altering the enumeration instead of referencing our file. I thought I'd point it out because anyone creating custom enumerations, even with a translation in an en.yml file, would not see their change reflected in the EAD export. (I've attached an image showing this.) Has anyone else experienced this?

2. Another example of this was the container "type" attribute. Previously, something like Oversized Box would be exported to EAD as is, because that was its value in the enumeration. After we corrected the value to oversized_box, it was exported to the EAD container "type" as is and carried over to the PDF as well. With some XSLT manipulation I was able to get it to show up as oversized box (shown in the attachments). I've looked through https://www.loc.gov/ead/tglib/elements/container.html and cannot find an example of an attribute value made up of two or more words.

Should attributes be machine readable (e.g., oversized_box), human readable (Oversized Box), or does it even matter? Of course, exporting it as Oversized Box would make it easiest to present a user-friendly version to the user.
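One way to picture keeping the machine-readable value in the attribute while still showing a readable label is a lookup with a fallback, sketched here in Python. This is not how the ArchivesSpace serializer works; the `translations` dictionary merely stands in for the value/translation pairs listed earlier, and `label_for` is a hypothetical helper.

```python
def label_for(value: str, translations: dict) -> str:
    """Return the translated label for `value`, or a title-cased guess if none exists."""
    if value in translations:
        return translations[value]
    # Fallback: turn underscores into spaces and capitalize each word.
    return value.replace("_", " ").title()


# Stand-in for the custom value/translation pairs mentioned above.
translations = {
    "oversized_box": "Oversized Box",
    "digital_file": "Digital File",
}

print(label_for("oversized_box", translations))
print(label_for("linear_feet", translations))
```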

Excuse me for the lengthy post; I'm trying to be thorough with my explanation. Please let me know if you've come across something similar or have a definitive solution.

Carlos Lemus
Application Programmer, Special Collections Technical Services
University Libraries, University of Nevada, Las Vegas

How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? - Sherlock Holmes
<enumeration_ead.PNG><containers_enum.PNG>
_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group

--

Head of Archival Processing

University of Minnesota Libraries
Archives and Special Collections
Elmer L. Andersen Library, Suite 315
222-21st Ave. S.
Minneapolis MN 55455

Phone: 612.626.2531