[Archivesspace_Users_Group] EAD Import Issue...

Steven Majewski sdm7g at virginia.edu
Fri Jul 31 14:04:17 EDT 2015


When I download that original email enclosure and run xmllint on it, it doesn’t show any encoding errors.
I was also able to import it into ArchivesSpace without any errors.

I wonder if the translation to and from Base64 encoding of the email enclosure somehow
transforms the character encoding and fixes the problem?
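
As a quick sanity check on that guess, here is a minimal sketch in Python (the sample strings are made up for illustration). Base64 encoding and decoding by itself hands back exactly the same bytes, so if the enclosure really did change, some other step in the mail handling would have to be responsible; I have not verified which, if any.

```python
import base64

# Hypothetical sample text containing a "smart" apostrophe (U+2019).
utf8_bytes = "St. Vincent\u2019s".encode("utf-8")
# The same text as Windows-1252 bytes, where the apostrophe is the single byte 0x92.
cp1252_bytes = b"St. Vincent\x92s"


def base64_round_trip(data: bytes) -> bytes:
    """Encode bytes to Base64 and decode them straight back."""
    return base64.b64decode(base64.b64encode(data))


# Base64 works on raw bytes and knows nothing about character encodings,
# so both samples come back unchanged.
print(base64_round_trip(utf8_bytes) == utf8_bytes)
print(base64_round_trip(cp1252_bytes) == cp1252_bytes)
```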


Re: testing with Schematron: 

In my experience (doing validation in Java and hitting those kinds of encoding errors),
encoding errors come from early in the processing pipeline, before any Schematron or XSLT processing.
I think you would need to scan the file for invalid encoding before passing it to the XML parser.
(In fact, I’m not even sure you can express a test for invalid encoding in Schematron, given that
it is itself XML in a particular encoding.)
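
Something like the following could do that pre-scan. This is only a minimal sketch in Python, assuming the file is supposed to be UTF-8; the function name find_invalid_utf8 and the sample bytes are mine, not anything from ArchivesSpace or Schematron.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, bad_bytes) pairs for byte runs that are not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything from the current position onward.
            data[pos:].decode("utf-8")
            break  # The rest of the data is clean.
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # Resume scanning just past the bad bytes.
    return problems


# Hypothetical sample: a Windows-1252 "smart" apostrophe (byte 0x92)
# sitting in text that is otherwise plain ASCII.
sample = b"St. Vincent\x92s"
print(find_invalid_utf8(sample))
```

Running it over the raw bytes of a file (opened in binary mode) gives the offsets to look at; an empty list means the bytes decode cleanly as UTF-8.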


— Steve. 



> On Jul 31, 2015, at 1:12 PM, Custer, Mark <mark.custer at yale.edu> wrote:
> 
> Interesting.  I just tried to change the encoding value, but that doesn’t work.  If you do a find and replace in the file to replace the single quotes, though, the record will import fine.  I’ve attached a copy of the record that I was able to import.
>  
> For the record, using that type of single quote doesn’t invalidate the EAD file.  It’s still perfectly valid, but I don’t know if it’s fully UTF-8 compliant.
>  
> Is there any way to come up with a list of invalid characters?  If so, then that could be added to a Schematron file to test and make sure those values aren’t present before attempting to do the batch upload.
>  
> Mark
>  
>  
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
> Sent: Friday, July 31, 2015 12:31 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] EAD Import Issue...
>  
>  
> You might also try changing the encoding of the EAD file in the XML header.
> If it’s not declared, by default it’s UTF-8. 
> Change the first line to:
>  
>             <?xml version="1.0" encoding="windows-1252"?>
>  
>  
> ( I don’t know for a fact if this will work for ArchivesSpace, but it works with most parsers and validators. )
>  
>  
> Alternatively, if you have ‘iconv’ you can run a conversion thru that program to change the encoding:
>  
> iconv -f WINDOWS-1252 -t UTF-8 
>  
>  
>  
>  
> — Steve Majewski
>  
>  
>  
> On Jul 31, 2015, at 12:17 PM, Tomecek, Christy <christy.tomecek at yale.edu> wrote:
>  
> Hello,
>  
> I think the issue is that there are Word “Smart Quotes” in your text fields (not the markup itself). The EAD won’t validate if they are present.
>  
> Example (Smart quote highlighted):
>  
> <abstract label="Abstract">Dating from 1918 to 2000, the History and Background Information series consists of written histories, newspaper clippings, and anniversary publications documenting St. Vincent’s steady growth in the Lincoln Park neighborhood.
>  
> There is a way to turn off Smart Quotes in Word so this way you don’t have to go line by line fixing them if you are doing a copy-paste from a Word Document into ASpace.
>  
> · Open Word. Go to File (or if you are in Windows 8/8.1, go to the Windows logo button).
> · Scroll to the bottom of the sidebar where things like "New," "Save," etc. are and click on "Options" at the bottom.
> · Go to "Proofing," located on the sidebar.
> · Go to "AutoCorrect Options" in the main panel.
> · Go to the "AutoFormat As You Type" tab and uncheck the "'Straight quotes' with 'smart quotes'" option under "Replace As You Type."
>  
> Best,
> Christy
>  
> --
>  
> Christy Tomecek
> Archives Assistant
> Yale University Library
> Manuscripts and Archives
> 203-432-7382
> christy.tomecek at yale.edu
>  
>  
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Rossetti, Dominic
> Sent: Friday, July 31, 2015 11:58 AM
> To: 'archivesspace_users_group at lyralists.lyrasis.org'
> Subject: [Archivesspace_Users_Group] EAD Import Issue...
>  
> Hey all,
>  
> When trying to import EAD I get the following error message in the log file:
>  
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> IMPORT ERROR
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  
>  
> Error: #<Encoding::UndefinedConversionError: ""\x9D"" from Windows-1252 to UTF-8>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>  
> I’ve attached a file as an example. The EAD is valid and correct. Not sure what is causing the issue.
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>  
> <dpu_ead_cm0001_stvincentchurchchi-edited.xml>