[Archivesspace_Users_Group] EAD encoding errors again

Majewski, Steven Dennis (sdm7g) sdm7g at eservices.virginia.edu
Fri Feb 5 16:28:16 EST 2016

Some time ago there was a thread about some encoding errors on EAD Import that seemed hard to replicate.

I just ran into an example, and it appears that the encoding error does not depend at all on the encoding of the imported EAD file.

The job failed with this error:

Error: #<Encoding::InvalidByteSequenceError: ""\xE2"" on US-ASCII> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This was with a file encoded as UTF-8.
I converted the file to US-ASCII encoding, with every non-ASCII character written as a numeric character reference.
The job failed with the same error message.
Only after replacing all of those numeric character references with plain ASCII characters was I able to import it successfully.

I have been able to import lots of EAD files containing non-ASCII Unicode characters without any problems before.
The persistence of this error under different encodings suggests that the failure happens in some internal transformation after the file has been parsed. The fact that I haven’t seen this problem with other files suggests that it is specific to the processing of a particular element. So it’s possible that the previous, hard-to-replicate errors were also context specific.
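For anyone who wants to see how this class of error arises in plain Ruby, here is a minimal sketch. It is not ArchivesSpace code, and I am only assuming that something similar happens internally: a string whose bytes are really UTF-8 gets labeled as US-ASCII and is then transcoded. The helper names `transcode_as_ascii` and `describe_failure` are made up for illustration.

```ruby
# U+2019 (right single quotation mark) is the bytes E2 80 99 in UTF-8,
# so its first byte matches the "\xE2" in the reported error.
utf8_text = "Majewski\u2019s papers"

# Hypothetical helper: mislabel the bytes as US-ASCII, then transcode to UTF-8.
# This only simulates the kind of step that could fail after parsing.
def transcode_as_ascii(text)
  mislabeled = text.dup.force_encoding(Encoding::US_ASCII)
  mislabeled.encode(Encoding::UTF_8)
end

# Returns "ok" on success, or the exception class and message on failure.
def describe_failure(text)
  transcode_as_ascii(text)
  "ok"
rescue Encoding::InvalidByteSequenceError => e
  "#{e.class}: #{e.message}"
end

puts describe_failure("plain ASCII text")
puts describe_failure(utf8_text)
```

The pure-ASCII string passes through untouched, while the string containing U+2019 raises `Encoding::InvalidByteSequenceError`. An XML parser expands a numeric character reference to the same character, so once the file is parsed the in-memory text is identical whichever way the source file was encoded. That would be consistent with the error surviving the conversion to US-ASCII.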

Unfortunately, I’m seeing this problem on my largest EAD file, and it’s too large to upload to either sandbox.archivesspace.org or JIRA. I will try to isolate the cause of this error and produce a smaller test file.
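One possible way to narrow the search in a large file is to list the lines that contain either a raw non-ASCII character or a numeric character reference, and then cut the file down around them. The sketch below is only a starting point: `suspect_lines` and the sample markup are hypothetical, and the pattern also flags references to ASCII characters such as `&#38;`.

```ruby
# Matches decimal (&#8217;) and hexadecimal (&#x2019;) numeric character references.
NUMERIC_REF = /&#(?:\d+|x\h+);/i

# Returns [line_number, line] pairs for lines that contain either a raw
# non-ASCII character or a numeric character reference.
def suspect_lines(xml_text)
  flagged = xml_text.each_line.with_index(1).select do |line, _number|
    !line.ascii_only? || line.match?(NUMERIC_REF)
  end
  flagged.map { |line, number| [number, line.chomp] }
end

# Made-up sample; a real run would read the EAD file instead,
# e.g. File.read(path, encoding: "UTF-8").
sample = <<~EAD
  <unittitle>Plain title</unittitle>
  <unittitle>Smith&#8217;s letters</unittitle>
  <unittitle>Caf\u00E9 receipts</unittitle>
EAD

suspect_lines(sample).each { |number, line| puts "#{number}: #{line}" }
```

With the sample above, lines 2 and 3 are reported and line 1 is not. Grouping the reported lines by their enclosing element might show whether the failures cluster in one element type.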

[ I added a JIRA ticket AR-1421 for this, as well as AR-1420 for the import timeout error in my earlier messages. ]

The good news is that after replacing those non-ASCII characters, the file did successfully import in a timely manner.
This was the 14MB file that I previously reported as taking more than 24 hours to import. I didn’t time the operation, but it certainly took no more than 15-20 minutes.

— Steve Majewski
