[Archivesspace_Users_Group] EAD Import - cryptic error messages

Steven Majewski sdm7g at virginia.edu
Fri Mar 7 17:04:06 EST 2014


I can now batch import a large majority of our 4074 EAD files into ArchivesSpace. 
( I’ve only sampled and tested a portion of the other Virginia Heritage institutions’ 6000+ files. ) 

All but 21 files parse and produce json files with my batch import parser. 
Not all of those json files import with POST /repositories/$ID/batch_imports: several hundred fail,
usually with Java memory errors returned. In the one instance I’ve tried, I was able to import 
the file successfully using the frontend web import job upload form. 

The remaining error messages are:


   1  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"]}}>
   1  #<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>
   1  Invalid schema given: string
   2  #<:ValidationException: {:errors=>{"notes/7/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
  16  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"]}}>


I will continue to investigate these errors, but in the meantime, it seemed we could import enough files to go on to 
evaluating the rest of the system. 

Here is the stylesheet I’m using to fix some of these import problems. In some cases, the “fix” is just papering over a problem
to get the guide imported. For example, we’re inserting “1 arbitrary_unit”  for missing <extent> elements; or arbitrarily truncating
the <eadid> at 255 characters.  The desired plan would be to get all of the collections imported into ArchivesSpace and review
and correct them there (rather than editing EAD xml files).  But we need a good way of tagging the elements that need review
and correction.  I don’t know if the xml comments I’ve inserted will prove to be a useful solution to that. 
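
The two stopgap fixes mentioned above (a placeholder extent and a truncated eadid) can also be sketched outside XSLT. What follows is a minimal Python sketch, not the attached stylesheet: it assumes DTD-based EAD 2002 files with no XML namespace, and the function name fix_up and the "AS-FIXUP" comment marker are hypothetical names chosen for illustration. It leaves an XML comment next to the placeholder so the spot can be found for review later.

```python
import xml.etree.ElementTree as ET

MAX_EADID = 255  # limit quoted in the "ead_id" validation error


def fix_up(root):
    """Apply two stopgap fixes in place; return notes on what was changed."""
    notes = []

    # Truncate an over-long <eadid> (papering over the problem, not solving it).
    eadid = root.find("./eadheader/eadid")
    if eadid is not None and eadid.text and len(eadid.text) > MAX_EADID:
        eadid.text = eadid.text[:MAX_EADID]
        notes.append("eadid truncated")

    # Insert a placeholder extent when the top-level <did> has none.
    did = root.find("./archdesc/did")
    if did is not None and did.find(".//extent") is None:
        physdesc = did.find("physdesc")
        if physdesc is None:
            physdesc = ET.SubElement(did, "physdesc")
        # Hypothetical review marker; any searchable string would do.
        physdesc.append(ET.Comment(" AS-FIXUP: placeholder extent, needs review "))
        extent = ET.SubElement(physdesc, "extent")
        extent.text = "1 arbitrary_unit"
        notes.append("placeholder extent inserted")

    return notes
```

Calling ET.tostring(root, encoding="unicode") afterwards serializes the patched tree, review comment included. Whether such comments survive the import is a separate question.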

Here, I believe Archivist’s Toolkit’s solution was to import files even when they didn’t meet its requirements, but then you could not
publish or expand the hierarchy of the guide until you had gone through and corrected the issues, which were highlighted in the edit
form.  This would be the ideal, but I would at least like to figure out how to tag elements as needing review.  


( BTW: What is the difference between not-published & suppressed ?  ) 





Looking directly at the json files (using the patch below) definitely sped up finding the source of the problems. 
This is the main problem with using an intermediate representation like json-model: the error messages all reference
the intermediate language, and not the source language.  What is the poor EAD author to make of an error message
like this: 

#<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>

??
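
For the path-style errors (such as "instances/0/container/type_1"), the lookup in the json file can be partly automated. This is a minimal Python sketch, assuming the path in the error message is relative to a single record in the generated json; the function name resolve_path and the sample record are hypothetical. It does not help with the "record" error quoted above, which carries no path.

```python
import json


def resolve_path(record, path):
    """Walk a slash-separated error path through one decoded JSON record.

    Returns the deepest value reached and the part of the path that was
    found, so a missing property still yields its enclosing context.
    """
    node, found = record, []
    for key in path.split("/"):
        if isinstance(node, list) and key.isdigit() and int(key) < len(node):
            node = node[int(key)]
        elif isinstance(node, dict) and key in node:
            node = node[key]
        else:
            break  # property missing: stop at the nearest existing parent
        found.append(key)
    return node, "/".join(found)


# Hypothetical record, shaped only to match the error path in this example.
record = json.loads("""
{"title": "Smith family papers",
 "instances": [{"container": {"indicator_1": "12"}}]}
""")

context, found = resolve_path(record, "instances/0/container/type_1")
print(found)    # instances/0/container
print(context)  # {'indicator_1': '12'}
```

The surrounding values (here the container indicator) are usually distinctive enough to search for in the EAD source.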

If fixing the error messages proves to be too difficult a task, maybe it would be simpler to produce a schema that matches 
what AS EAD import accepts, and people can do their own pre-validation against that restricted schema. 
( Or is the goal the other way around ? to eventually get the AS importer to handle any valid EAD file ?  ) 
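
Short of a real restricted schema, a few of the rules seen so far can be checked directly against the EAD and reported in EAD terms. This is a minimal Python sketch of that kind of pre-validation, not a schema: it assumes EAD 2002 without an XML namespace, covers only empty elements, the eadid length and a missing top-level extent, and the names precheck, is_empty and ALLOWED_EMPTY are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements that are legitimately empty; an assumption, extend as needed.
ALLOWED_EMPTY = {"lb"}


def is_empty(elem):
    """True when an element has no children, no attributes and no text."""
    return (len(elem) == 0 and not elem.attrib
            and not (elem.text or "").strip())


def precheck(root):
    """Return a list of problems phrased in terms of EAD elements."""
    problems = []

    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if not isinstance(elem.tag, str) or elem.tag in ALLOWED_EMPTY:
            continue
        if is_empty(elem):
            problems.append(f"empty <{elem.tag}> element")

    eadid = root.findtext("./eadheader/eadid") or ""
    if len(eadid) > 255:
        problems.append("<eadid> is longer than 255 characters")

    did = root.find("./archdesc/did")
    if did is None or did.find(".//extent") is None:
        problems.append("no <extent> in the top-level <did>")

    return problems
```

Elements that carry only attributes are deliberately not reported as empty, since pointer-style elements are often written that way.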



— Steve Majewski / UVA Alderman Library


On Mar 3, 2014, at 5:25 PM, Chris Fitzpatrick <Chris.Fitzpatrick at lyrasis.org> wrote:

> Hi Steven,
> 
> Wow, thanks for this. I'm going over this and it really helps for the improved error messaging we are trying to set up. 
> 
> I definitely think it should be doable to strip out any empty XML tags and not have them create JSON nodes.
> 
> Also, looking at the diff you sent: it seems to cause some problems with the test suite, but I need to figure out what's going on there and why this is stripping out some of these error messages. Will update soon. 
> 
> But yes, until then a good work around would be to strip out empty EAD tags prior to import....
> 
> best,chris. 
> 
> 
> Chris Fitzpatrick | chris.fitzpatrick at lyrasis.org
> Developer, ArchivesSpace
> http://archivesspace.org/
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Steven Majewski <sdm7g at virginia.edu>
> Sent: Saturday, March 01, 2014 5:14 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] EAD Import - cryptic error messages
>  
> 
> I found a way to get info on the EAD -> JSON_schema mappings, and I’ve managed to fix those notes/0/content errors as well as several others.
> 
> For debugging purposes, I make this temporary change to jsonmodel_wrap.rb  to  ignore all validation errors,
> and I run my command line EAD import parser.  ( I haven’t tried running this code on the backend server — no idea what that might break. )
> 
> --- a/backend/app/converters/lib/jsonmodel_wrap.rb
> +++ b/backend/app/converters/lib/jsonmodel_wrap.rb
> @@ -13,10 +13,10 @@ module ASpaceImport
>          # TODO - speed things up by avoiding this another way
>          rescue JSONModel::ValidationException => e
>  
> -          e.errors.reject! {|path, mssg|
> -                            e.attribute_types &&
> -                            e.attribute_types.has_key?(path) &&
> -                            e.attribute_types[path] == 'ArchivesSpaceDynamicEnum'}
> +          e.errors.reject! {|path, mssg| true }
> +#                            e.attribute_types &&
> +#                            e.attribute_types.has_key?(path) &&
> +#                            e.attribute_types[path] == 'ArchivesSpaceDynamicEnum'}
> 
> 
> This generates json files for almost all of the EAD files. ( except for about 30, which I assume are the ones with errors other
> than #<:ValidationException…> ).  The ones that would not have normally validated correctly will still generate validation
> errors if POSTED to  /repositories/$ID/batch_imports.  However, I can pipe them thru json_pp and search for the schema 
> property in the error message.  So far, this has yielded enough context information to identify the source of the problem 
> in the EAD file. 
> 
> Most of these problems seem to trace back to empty elements in the EAD file. 
> 
> In a few cases, there is a missing required element ( unitid, for example ), but in most cases, removing the empty element
> fixes the problem.  Is this something that could be fixed in the parser ? : if the element is empty, don’t create a JSON property
> for it ?   
> ( For now, I’m adding templates for all of the glitches I’ve found to an AS fixup stylesheet run as a pre-process to AS import. ) 
> 
> 
> — Steve Majewski
> 
> 
> On Feb 28, 2014, at 8:43 AM, Brad Westbrook <brad.westbrook at lyrasis.org> wrote:
> 
>> Hi, Steve,
>>  
>> I won’t be able to address your mapping request until next week.
>>  
>> We are working on a public release now which will address the LDAP security hole reported a couple of weeks ago and include a number of enhancements made since the 1.0.4 release on Jan. 20.  We are aiming to announce the release later today, but it might not be until Monday, depending on resolution of one item.
>>  
>> Brad
>>  
>> Bradley D. Westbrook
>> Program Manager
>> brad.westbrook at lyrasis.org
>> 800.999.8558 x2910
>> 678.235.2910
>> bradley_d_westbrook (Skype) 
>>  
>> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org]On Behalf Of Steven Majewski
>> Sent: Friday, February 28, 2014 8:33 AM
>> To: Archivesspace Users Group
>> Subject: Re: [Archivesspace_Users_Group] EAD Import - cryptic error messages
>>  
>>  
>> Brad:
>>  
>>  Can  you express this requirement in terms of EAD elements  instead of  JSONModel schema types ? 
>>  It’s that mapping that is giving me trouble:  trying to turn the schema references in those error messages
>>  into elements in the imported EAD that need to be addressed. 
>>  
>>  Any ETA for that next release ? 
>> I’ve managed to fix up some of the import problems with a stylesheet: I’m up to 2749 files out of 4074 parsing successfully ( up from 0 and 300+ on my 
>> initial efforts ). That notes/0/content message is my greatest outstanding issue: 
>>  
>> 1210  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"]}}>
>>   31  Unexpected Object Type in Queue: Expected archival_object got container
>>   30  #<:ValidationException: {:errors=>{"dates"=>["one or more required (or enter a Title)"], "title"=>["must not be an empty string (or enter a Date)"]}}>
>>   11  #<:ValidationException: {:errors=>{"instances/0/container/indicator_1"=>["Property is required but was missing"]}}>
>>   11  #<:ValidationException: {:errors=>{"id_0"=>["Property is required but was missing"]}}>
>>    8  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"], "notes/0/content"=>["At least 1 item(s) is required"]}}>
>>    6  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"]}}>
>>    5  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"], "id_0"=>["Property is required but was missing"]}}>
>>    5  #<:ValidationException: {:errors=>{"ead_id"=>["Must be 255 characters or fewer"]}}>
>>    2  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"]}}>
>>    1  Invalid schema given: string
>>    1  #<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>
>>    1  #<:ValidationException: {:errors=>{"notes/7/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
>>    1  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"], "notes/8/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
>>    1  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"], "instances/0/container/indicator_1"=>["Property is required but was missing"]}}>
>>    1  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"], "ead_id"=>["Must be 255 characters or fewer"]}}>
>>  
>>  
>>  
>> — Steve M.
>>  
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: as-munger.xsl
Type: application/octet-stream
Size: 6508 bytes
Desc: not available
URL: <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/attachments/20140307/d2aa6a9f/attachment.obj>