[Archivesspace_Users_Group] EAD Import Issue...
Steven Majewski
sdm7g at virginia.edu
Fri Jul 31 14:04:17 EDT 2015
When I download that original email enclosure and run xmllint on it, it doesn’t show any encoding errors.
I was also able to import it into ArchivesSpace without any errors.
I wonder if the translation to and from Base64 encoding of the email enclosure somehow
transforms the character encoding and fixes the problem?
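On the Base64 question: Base64 by itself should not be able to change the character encoding, because it is a lossless byte-to-text mapping and decoding returns exactly the bytes that were encoded. A minimal Python sketch (standard library only; the sample string is made up) that checks the round trip on a UTF-8 right single quotation mark:

```python
import base64

# Hypothetical sample: "Vincent’s" stored as UTF-8 (the ’ is bytes E2 80 99).
original = "Vincent\u2019s".encode("utf-8")

# Encode to Base64 text, as a mail client does for an attachment, then decode it.
wire_text = base64.b64encode(original)
round_trip = base64.b64decode(wire_text)

# The decoded bytes are identical to the original bytes.
print(round_trip == original)  # True
```

So if the downloaded copy behaves differently from the file that failed, comparing the two files byte for byte (or by checksum) is a more direct test than suspecting the Base64 step.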
Re: testing with Schematron:
In my experience (doing validation in Java and hitting that kind of encoding error),
encoding errors come from early in the processing pipeline, before Schematron or XSLT processing.
I think you would need to scan the file for invalid encoding before passing it to the XML parser.
(In fact, I’m not even sure whether you can express an invalid encoding in Schematron, since a
Schematron file is itself XML in a particular encoding.)
— Steve.
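The idea of scanning for bad encoding before the XML parser ever sees the file can be sketched in Python. This assumes the file is meant to be UTF-8 (the XML default when nothing is declared); `find_invalid_utf8`, `check_file`, and the script name are hypothetical, not part of ArchivesSpace or any existing tool:

```python
import sys


def find_invalid_utf8(data: bytes):
    """Return (byte_offset, line_number, bad_bytes) for the first invalid
    UTF-8 sequence in data, or None if all of data decodes as UTF-8."""
    try:
        data.decode("utf-8")  # strict mode stops at the first bad sequence
        return None
    except UnicodeDecodeError as err:
        line_number = data.count(b"\n", 0, err.start) + 1  # lines are 1-based
        return err.start, line_number, data[err.start:err.end]


def check_file(path: str) -> int:
    """Print a one-line report for path; return 0 if it is clean UTF-8, else 1."""
    with open(path, "rb") as f:  # read raw bytes so nothing is decoded implicitly
        problem = find_invalid_utf8(f.read())
    if problem is None:
        print(f"{path}: valid UTF-8")
        return 0
    offset, line_number, bad = problem
    print(f"{path}: invalid UTF-8 at byte {offset} (line {line_number}): {bad!r}")
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_utf8.py finding_aid.xml
    sys.exit(check_file(sys.argv[1]))
```

Because it works on raw bytes, a check like this can run as a pre-flight step before a batch upload, independent of Schematron.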
> On Jul 31, 2015, at 1:12 PM, Custer, Mark <mark.custer at yale.edu> wrote:
>
> Interesting. I just tried changing the encoding value, but that doesn’t work. If you do a find-and-replace in the file to replace the smart single quotes, though, the record imports fine. I’ve attached a copy of the record that I was able to import.
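The find-and-replace described above can be scripted. A minimal Python sketch, assuming the file is UTF-8; the names `SMART_QUOTES`, `straighten_quotes`, and `straighten_file` are hypothetical. It also straightens double quotes, which goes slightly beyond the single-quote fix reported here:

```python
from pathlib import Path

# Curly ("smart") quotes and the straight ASCII quotes that replace them.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also Word's apostrophe)
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})


def straighten_quotes(text: str) -> str:
    """Replace smart quotes in text with straight quotes."""
    return text.translate(SMART_QUOTES)


def straighten_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src as UTF-8 and write the cleaned text to dst.

    Caution: a straight " inside an attribute value delimited by " breaks
    well-formedness, so re-check the output (for example with xmllint).
    """
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(straighten_quotes(text), encoding="utf-8")


print(straighten_quotes("St. Vincent\u2019s steady growth"))  # St. Vincent's steady growth
```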
>
> For the record, using that type of single quote doesn’t invalidate the EAD file. It’s still perfectly valid, but I don’t know if it’s fully UTF-8 compliant.
>
> Is there any way to come up with a list of invalid characters? If so, then that could be added to a Schematron file to test and make sure those values aren’t present before attempting to do the batch upload.
>
> Mark
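Strictly speaking, characters like ’ are legal in UTF-8 XML, so there may be no fixed list of "invalid" characters to check for. A practical stand-in is to report every non-ASCII character with its position and Unicode name, so someone can review it before a batch upload. A Python sketch, assuming the file has already been decoded as UTF-8; `report_non_ascii` is a hypothetical name:

```python
import unicodedata


def report_non_ascii(text: str):
    """Yield (line, column, character, Unicode name) for each non-ASCII character.

    Lines and columns are 1-based; splitting on "\n" keeps line numbers
    in step with what a text editor shows.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


sample = "<abstract>St. Vincent\u2019s steady growth</abstract>"
for hit in report_non_ascii(sample):
    print(hit)  # (1, 22, '’', 'RIGHT SINGLE QUOTATION MARK')
```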
>
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
> Sent: Friday, July 31, 2015 12:31 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] EAD Import Issue...
>
>
> You might also try changing the encoding of the EAD file in the XML header.
> If it’s not declared, by default it’s UTF-8.
> Change the first line to:
>
> <?xml version="1.0" encoding="windows-1252"?>
>
>
> (I don’t know for a fact whether this will work for ArchivesSpace, but it works with most parsers and validators.)
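To see what the declaration change does in a typical parser, here is a Python sketch using the standard-library ElementTree parser. It is not ArchivesSpace's importer, so it says nothing about whether ArchivesSpace honours the declaration; the one-element documents are made up. In Windows-1252, byte 0x92 is the right single quotation mark:

```python
import xml.etree.ElementTree as ET

# Hypothetical document saved in Windows-1252, where byte 0x92 is ’.
cp1252_doc = b'<?xml version="1.0" encoding="windows-1252"?><abstract>Vincent\x92s</abstract>'

# The parser reads the declaration and decodes the bytes accordingly.
root = ET.fromstring(cp1252_doc)
print(root.text == "Vincent\u2019s")  # True

# Without a declaration the parser assumes UTF-8, and the lone 0x92 byte is rejected.
try:
    ET.fromstring(b"<abstract>Vincent\x92s</abstract>")
except ET.ParseError as err:
    print("ParseError:", err)
```

The declaration only helps if it matches the bytes actually in the file; it does not convert anything.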
>
>
> Alternatively, if you have ‘iconv’, you can run the file through that program to change the encoding:
>
> iconv -f WINDOWS-1252 -t UTF-8 input.xml > output.xml
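A rough Python counterpart to that iconv conversion, as a sketch rather than a drop-in tool. It assumes the input really is Windows-1252. As a heuristic guard, it leaves bytes that already decode as UTF-8 untouched, because converting UTF-8 again would either fail (0x9D, for example, is undefined in Windows-1252) or silently produce mojibake. It also rewrites the XML declaration so it no longer claims windows-1252. `to_utf8` is a hypothetical name:

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
DECLARATION_ENCODING = re.compile(r"""^(<\?xml[^>]*?encoding=)["'][^"']*["']""")


def to_utf8(data: bytes) -> bytes:
    """Convert Windows-1252 bytes to UTF-8; return already-valid UTF-8 unchanged."""
    try:
        data.decode("utf-8")
        return data  # already UTF-8: converting again would create mojibake
    except UnicodeDecodeError:
        pass
    text = data.decode("cp1252")  # raises if a byte is undefined in Windows-1252
    # Make the declaration match the new encoding (only at the start of the file).
    text = DECLARATION_ENCODING.sub(r'\1"UTF-8"', text, count=1)
    return text.encode("utf-8")
```

Usage, with hypothetical file names: `Path("output.xml").write_bytes(to_utf8(Path("input.xml").read_bytes()))` after `from pathlib import Path`.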
>
> — Steve Majewski
>
> On Jul 31, 2015, at 12:17 PM, Tomecek, Christy <christy.tomecek at yale.edu> wrote:
>
> Hello,
>
> I think the issue is that there are Word “Smart Quotes” in your text fields (not the markup itself). The EAD won’t validate if they are present.
>
> Example (the smart quote is the apostrophe in “Vincent’s”):
>
> <abstract label="Abstract">Dating from 1918 to 2000, the History and Background Information series consists of written histories, newspaper clippings, and anniversary publications documenting St. Vincent’s steady growth in the Lincoln Park neighborhood.
>
> You can turn off Smart Quotes in Word so that you don’t have to fix them line by line if you copy and paste from a Word document into ASpace.
>
> · Open Word. Go to File (or, in Windows 8/8.1, click the Windows logo button).
> · Scroll to the bottom of the sidebar (where "New," "Save," etc. are listed) and click "Options."
> · Go to "Proofing," located on the sidebar.
> · Go to "AutoCorrect Options" in the main panel.
> · Go to the "AutoFormat As You Type" tab and, under "Replace as you type," uncheck the "'Straight quotes' with 'smart quotes'" option.
>
> Best,
> Christy
>
> --
>
> Christy Tomecek
> Archives Assistant
> Yale University Library
> Manuscripts and Archives
> 203-432-7382
> christy.tomecek at yale.edu
>
>
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Rossetti, Dominic
> Sent: Friday, July 31, 2015 11:58 AM
> To: 'archivesspace_users_group at lyralists.lyrasis.org'
> Subject: [Archivesspace_Users_Group] EAD Import Issue...
>
> Hey all,
>
> When I try to import EAD, I get the following error message in the log file:
>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> IMPORT ERROR
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
>
> Error: #<Encoding::UndefinedConversionError: "\x9D" from Windows-1252 to UTF-8>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> I’ve attached a file as an example. The EAD is valid and correct; I’m not sure what is causing the issue.
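One way to get exactly this error, sketched in Python rather than Ruby and assuming the file was saved as UTF-8 but read as Windows-1252: the right double quotation mark ” is stored in UTF-8 as the three bytes E2 80 9D, and 0x9D has no character assigned in Windows-1252, so the conversion stops there. The helper `decodes_as_cp1252` is hypothetical:

```python
def decodes_as_cp1252(byte: int) -> bool:
    """True if this single byte has a character assigned in Python's cp1252 codec."""
    try:
        bytes([byte]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# U+201D RIGHT DOUBLE QUOTATION MARK, as stored in a UTF-8 file.
utf8_bytes = "\u201d".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9d'

# Reading those UTF-8 bytes as Windows-1252: 0xE2 and 0x80 map to characters
# (â and €), but 0x9D does not, so strict decoding fails at index 2.
try:
    utf8_bytes.decode("cp1252")
except UnicodeDecodeError as err:
    print(err.start, hex(utf8_bytes[err.start]), err.reason)  # 2 0x9d character maps to <undefined>

# All byte values left unassigned in Windows-1252.
undefined = [hex(b) for b in range(256) if not decodes_as_cp1252(b)]
print(undefined)  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Those five bytes are a short, concrete list to search for when this error appears.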
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
> <dpu_ead_cm0001_stvincentchurchchi-edited.xml>