<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<div class=""><br class="">
</div>
Some time ago there was a thread about encoding errors during EAD import that seemed hard to replicate.
<div class=""><br class="">
</div>
<div class="">I just ran into an example, and it appears that the encoding error does not depend at all on the encoding of the imported EAD file. </div>
<div class=""><br class="">
</div>
<div class="">The job failed with this error:</div>
<div class=""><br class="">
</div>
<div class=""><span style="color: rgb(51, 238, 51); font-family: monospace; font-size: 13px; white-space: pre; background-color: rgb(51, 51, 51);" class="">Error: #&lt;Encoding::InvalidByteSequenceError: "\xE2" on US-ASCII&gt; !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</span></div>
<div class=""><span style="color: rgb(51, 238, 51); font-family: monospace; font-size: 13px; white-space: pre; background-color: rgb(51, 51, 51);" class=""><br class="">
</span></div>
<div class=""><span style="color: rgb(51, 238, 51); font-family: monospace; font-size: 13px; white-space: pre; background-color: rgb(51, 51, 51);" class=""><br class="">
</span></div>
<div class="">This was with a file encoded as UTF-8.</div>
<div class="">I converted the file to US-ASCII encoding, with all non-ASCII characters written as numeric character references. </div>
<div class="">The job failed with the same error message. </div>
<div class="">Only after replacing all of the numeric character references with plain ASCII characters was I able to import it successfully. </div>
<div class=""><br class="">
</div>
<div class="">I have been able to import lots of EAD files with non-ASCII Unicode characters without any problems before. </div>
<div class="">The persistence of this error under different encodings seems to indicate that it’s failing on some internal transformation after the file has been parsed, and the fact that I haven’t seen this problem with other files would seem to indicate that
the problem is tied to the processing of a particular element. So it’s perhaps possible that the previous, hard-to-replicate errors were also context-specific. </div>
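<div class=""><br class="">
</div>
<div class="">To illustrate the kind of failure I suspect, here is a minimal Ruby sketch. It is only an assumption on my part, not something I have confirmed in the ArchivesSpace code, that correctly parsed text is later relabelled as US-ASCII and transcoded. Numeric character references are expanded by the XML parser into the same characters, which would explain why the ASCII-only version of the file fails in the same way. The byte 0xE2 is the first byte of the UTF-8 encoding of many punctuation characters; U+2019 is used here purely as an example. </div>
<pre class="">
```ruby
# Hypothetical reproduction, not taken from the ArchivesSpace code base.
# U+2019 (right single quotation mark) is the bytes E2 80 99 in UTF-8.
bytes = "\xE2\x80\x99".b

# Assumption: somewhere after parsing, the bytes get labelled as US-ASCII.
mislabelled = bytes.dup.force_encoding(Encoding::US_ASCII)

begin
  # Transcoding rejects the first byte above 0x7F.
  mislabelled.encode(Encoding::UTF_8)
  message = nil
rescue Encoding::InvalidByteSequenceError => e
  message = e.message
end

puts message
```
</pre>
<div class="">If this raises the same class of error as the import job, the encoding declared in the EAD file itself would be irrelevant, which matches what I saw. </div>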
<div class=""><br class="">
</div>
<div class="">Unfortunately, I’m seeing this problem on my largest EAD file, and it’s too large to upload to either
<a href="http://sandbox.archivesspace.org" class="">sandbox.archivesspace.org</a> or JIRA. I will try to isolate the cause of this error and produce a smaller test file. </div>
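<div class=""><br class="">
</div>
<div class="">One way to narrow this down is to list the lines that contain non-ASCII characters or numeric character references, and then cut the file down around them. The sketch below is a hypothetical helper (the name suspect_lines is mine, not part of ArchivesSpace) and assumes the file is read as UTF-8. </div>
<pre class="">
```ruby
# Hypothetical helper, not part of ArchivesSpace: list the lines of an EAD
# file that contain non-ASCII characters or numeric character references.
# \x26 is the ampersand, written as an escape so it survives HTML mail.
SUSPECT = /[^[:ascii:]]|\x26#[xX]?\h+;/

def suspect_lines(text)
  hits = []
  text.each_line.with_index(1) do |line, number|
    found = line.scan(SUSPECT).uniq
    hits.push([number, found]) unless found.empty?
  end
  hits
end

# Usage (assumed invocation): ruby suspects.rb path/to/file.xml
path = ARGV.first
if path and File.file?(path)
  suspect_lines(File.read(path, encoding: "UTF-8")).each do |number, found|
    puts "#{number}: #{found.join(' ')}"
  end
end
```
</pre>
<div class="">Each output line gives a line number followed by the suspect characters or references found on it, which should make it easier to see which elements they fall in. </div>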
<div class=""><br class="">
</div>
<div class="">[ I added a JIRA ticket AR-1421 for this, as well as AR-1420 for the import timeout error in my earlier messages. ] </div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">The good news is that after replacing those non-ASCII characters, the file did import successfully and in a timely manner. </div>
<div class="">This was the 14 MB file that I previously reported as taking more than 24 hours to import. I didn’t time the operation, but it certainly took no more than 15-20 minutes. </div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">— Steve Majewski</div>
<div class=""><br class="">
</div>
</body>
</html>