[Archivesspace_Users_Group] EAD Import - cryptic error messages

Custer, Mark mark.custer at yale.edu
Mon Mar 10 10:55:54 EDT 2014


Just to second this:  I would love to have a schema available that defines what type of EAD files are supported by ArchivesSpace!


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
Sent: Friday, March 07, 2014 5:04 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import - cryptic error messages


I can now batch import a large majority of our 4074 EAD files into ArchivesSpace.
(I've only sampled and tested a portion of the other Virginia Heritage institutions' 6000+ files.)

All but 21 files parse and produce JSON files with my batch import parser.
Not all of those JSON files import with POST /repositories/$ID/batch_imports: several hundred fail,
usually returning Java memory errors. In the one instance I've tried, I was able to import
the file successfully using the frontend web import job upload form.
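Here is a minimal Ruby sketch of the batch import call, for anyone who wants to script it. It only builds the request and does not send it. The backend URL, repository id and session token are placeholders. The sketch also assumes the body is the JSON array of records that the converter produces, so check that against the backend API documentation before relying on it.

```ruby
require 'net/http'
require 'uri'
require 'json'

# Builds (but does not send) a POST to the batch_imports endpoint.
# backend_url, repo_id and session_token are placeholders; the body is
# assumed to be the JSON array of records produced by the EAD converter.
def build_batch_import_request(backend_url, repo_id, session_token, records)
  uri = URI.join(backend_url, "/repositories/#{repo_id}/batch_imports")
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  # Session token obtained earlier from the backend's login endpoint
  request['X-ArchivesSpace-Session'] = session_token
  request.body = JSON.generate(records)
  [uri, request]
end

# Usage (requires a running backend; token and records are yours):
# uri, request = build_batch_import_request('http://localhost:8089', 2, token, records)
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```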

The remaining error messages are:


   1  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"]}}>
   1  #<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>
   1  Invalid schema given: string
   2  #<:ValidationException: {:errors=>{"notes/7/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
  16  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"]}}>
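Counts like the ones above can be produced with a sketch along these lines. It assumes each failing file's error message has been collected as one string, for example one line per file in a log. `tally_errors` is a hypothetical helper and is not part of ArchivesSpace.

```ruby
# Tallies identical error messages (one per failing file), most frequent
# first, and formats them like the listings in this thread.
def tally_errors(messages)
  messages.map(&:strip)
          .reject(&:empty?)
          .tally
          .sort_by { |message, count| [-count, message] }
          .map { |message, count| format('%4d  %s', count, message) }
end

# Usage (hypothetical log file name):
# puts tally_errors(File.readlines('import_errors.log'))
```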

I will continue to investigate these errors, but in the meantime, we can import enough files to move on to
evaluating the rest of the system.

Here is the stylesheet I'm using to fix some of these import problems. In some cases, the "fix" just papers over a problem
to get the guide imported: for example, we insert "1 arbitrary_unit" for missing <extent> elements, and we arbitrarily truncate
the <eadid> at 255 characters. The plan is to get all of the collections imported into ArchivesSpace and then review
and correct them there (rather than editing the EAD XML files). But we need a good way of tagging the elements that need review
and correction. I don't know whether the XML comments I've inserted will prove to be a useful solution to that.
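The sketch below is a rough Ruby/REXML version of two of the fixes just described. It is not the actual stylesheet. It assumes un-namespaced EAD with the collection-level <did> at /ead/archdesc/did. The "REVIEW" comment text is an arbitrary, hypothetical marker that a later search could find.

```ruby
require 'rexml/document'

EADID_MAX = 255 # limit from the "ead_id" validation error

# Applies two stopgap fixes and marks each with a REVIEW comment.
def fixup_ead(xml)
  doc = REXML::Document.new(xml)

  # Truncate an over-long <eadid> and flag it for review.
  REXML::XPath.each(doc, '//eadid') do |eadid|
    text = eadid.text.to_s
    next if text.length <= EADID_MAX
    eadid.text = text[0, EADID_MAX]
    eadid.parent.insert_after(eadid, REXML::Comment.new(' REVIEW: eadid truncated '))
  end

  # Give a collection-level <did> with no <extent> a placeholder extent.
  did = REXML::XPath.first(doc, '/ead/archdesc/did')
  if did && REXML::XPath.first(did, './/extent').nil?
    physdesc = did.add_element('physdesc')
    physdesc.add(REXML::Comment.new(' REVIEW: placeholder extent '))
    physdesc.add_element('extent').text = '1 arbitrary_unit'
  end

  output = +''
  doc.write(output)
  output
end
```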

Here, I believe the Archivists' Toolkit's solution was to import files even when they didn't meet its requirements; you could not then
publish or expand the guide's hierarchy until you had gone through and corrected the issues, which were highlighted in the edit
form. This would be ideal, but at the least I would like to figure out how to tag elements as needing review.


(BTW: What is the difference between not-published and suppressed?)




Looking directly at the JSON files (using the patch below) definitely sped up finding the source of the problems.
This is the main problem with using an intermediate representation like JSONModel: the error messages all reference
the intermediate language, not the source language. What is the poor EAD author to make of an error message
like this:

#<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>

??
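A first step in tracing such a message back to the offending property is to extract its path and message pairs. The following is a hedged sketch. It assumes the exception is printed exactly as in the examples above, and that paths contain only word characters and slashes. It captures only the first message for each path.

```ruby
# Pulls [path, first message] pairs out of an error string such as
# #<:ValidationException: {:errors=>{"id_0"=>["Property is required but was missing"]}}>
# Escaped quotes (\") inside a message are kept as-is.
ERROR_ENTRY = /"([\w\/]+)"=>\["((?:[^"\\]|\\.)*)"/

def error_entries(error_string)
  error_string.scan(ERROR_ENTRY)
end
```

For the message above, this yields a single "record" entry, which shows that the problem lies in a record-level index item and not in a specific numbered path.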

If fixing the error messages proves too difficult, maybe it would be simpler to produce a schema that matches
what the AS EAD importer accepts, so people can do their own pre-validation against that restricted schema.
(Or is the goal the other way around: to eventually get the AS importer to handle any valid EAD file?)
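Without a formal restricted schema, a pre-validation pass could still check the specific constraints behind the errors reported in this thread. This is a hypothetical REXML sketch and does not describe what the importer actually accepts. The element-to-error correspondences it uses are assumptions: unitid for id_0, container@type for type_1, and container text for indicator_1.

```ruby
require 'rexml/document'

# Hypothetical pre-import checks, each modelled on one reported error.
def preflight_problems(xml)
  doc = REXML::Document.new(xml)
  problems = []

  eadid = REXML::XPath.first(doc, '//eadid')
  problems << 'eadid longer than 255 characters' if eadid && eadid.text.to_s.length > 255

  did = REXML::XPath.first(doc, '/ead/archdesc/did')
  if did.nil?
    problems << 'missing archdesc/did'
  else
    problems << 'no unitid in archdesc/did' unless REXML::XPath.first(did, 'unitid')
    problems << 'no extent in archdesc/did' unless REXML::XPath.first(did, './/extent')
    if REXML::XPath.first(did, 'unittitle').nil? && REXML::XPath.first(did, 'unitdate').nil?
      problems << 'neither unittitle nor unitdate in archdesc/did'
    end
  end

  # Containers need a type and an indicator (their text).
  REXML::XPath.each(doc, '//container') do |container|
    problems << 'container without @type' unless container.attributes['type']
    problems << 'empty container' if container.text.to_s.strip.empty?
  end
  problems
end
```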



- Steve Majewski / UVA Alderman Library


On Mar 3, 2014, at 5:25 PM, Chris Fitzpatrick <Chris.Fitzpatrick at lyrasis.org> wrote:


Hi Steven,

Wow, thanks for this. I am going over this, and it really helps with the improved error messaging we are trying to set up.

I definitely think it should be doable to strip out any empty XML tags and not have them create JSON nodes.

Also, looking at the diff you sent: it seems to cause some problems with the test suite, but I need to figure out what's going on there and why this is stripping out some of these error messages. Will update soon.

But yes, until then a good workaround would be to strip out empty EAD tags prior to import.
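Here is a sketch of that workaround in Ruby with REXML. It removes any element that has no child elements and no non-whitespace text. Children are processed first, so a <physdesc> that holds only an empty <extent> is removed as well. The KEEP_EMPTY list of elements that are empty by design is an assumption; adjust it locally.

```ruby
require 'rexml/document'

# Elements that are empty by design in EAD 2002 and must be kept
# (an assumption; extend as needed).
KEEP_EMPTY = %w[lb ptr extptr dao].freeze

# Recursively removes elements that end up with no child elements and no
# non-whitespace text. Attributes alone do not keep an element.
def strip_empty_elements!(element)
  element.elements.to_a.each do |child|
    strip_empty_elements!(child) # clean grandchildren first
    next if KEEP_EMPTY.include?(child.name)
    blank = child.elements.empty? &&
            child.texts.all? { |t| t.value.strip.empty? }
    element.delete_element(child) if blank
  end
  element
end
```

Attributes alone do not keep an element here. An empty <container type="box"/> is therefore dropped; whether that beats an import error is a local judgment call.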

best,chris.


Chris Fitzpatrick | chris.fitzpatrick at lyrasis.org
Developer, ArchivesSpace
http://archivesspace.org/
________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of Steven Majewski <sdm7g at virginia.edu>
Sent: Saturday, March 01, 2014 5:14 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import - cryptic error messages


I found a way to get info on the EAD -> JSONModel schema mappings, and I've managed to fix those notes/0/content errors as well as several others.

For debugging purposes, I make this temporary change to jsonmodel_wrap.rb to ignore all validation errors,
and then run my command-line EAD import parser. (I haven't tried running this code on the backend server; no idea what that might break.)

--- a/backend/app/converters/lib/jsonmodel_wrap.rb
+++ b/backend/app/converters/lib/jsonmodel_wrap.rb
@@ -13,10 +13,10 @@ module ASpaceImport
         # TODO - speed things up by avoiding this another way
         rescue JSONModel::ValidationException => e

-          e.errors.reject! {|path, mssg|
-                            e.attribute_types &&
-                            e.attribute_types.has_key?(path) &&
-                            e.attribute_types[path] == 'ArchivesSpaceDynamicEnum'}
+          e.errors.reject! {|path, mssg| true }
+#                            e.attribute_types &&
+#                            e.attribute_types.has_key?(path) &&
+#                            e.attribute_types[path] == 'ArchivesSpaceDynamicEnum'}
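To spell out what the patch changes: the original block discards only the errors on attributes whose type is ArchivesSpaceDynamicEnum. The patched block discards every error. The standalone, non-mutating Ruby sketch below shows both behaviours. The `filter_errors` name and the `level` example are hypothetical.

```ruby
# Mirrors the two versions of the reject! block: by default only errors on
# ArchivesSpaceDynamicEnum attributes are dropped; ignore_all drops everything.
def filter_errors(errors, attribute_types, ignore_all: false)
  errors.reject do |path, _messages|
    ignore_all ||
      (attribute_types &&
       attribute_types.key?(path) &&
       attribute_types[path] == 'ArchivesSpaceDynamicEnum')
  end
end
```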


This generates JSON files for almost all of the EAD files (except for about 30, which I assume are the ones with errors other
than #<:ValidationException...>). The ones that would not normally have validated will still generate validation
errors if POSTed to /repositories/$ID/batch_imports. However, I can pipe them through json_pp and search for the schema
property named in the error message. So far, this has yielded enough context to identify the source of the problem
in the EAD file.
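The json_pp search can also be done programmatically. Below is a minimal sketch. It assumes the validation path is a slash-separated list of keys into the generated JSON record, with numeric segments treated as array indexes.

```ruby
require 'json'

# Walks a parsed JSON record along a slash-separated validation path,
# treating numeric segments as array indexes. Returns nil if the path
# does not exist in the record.
def value_at_error_path(record, path)
  path.split('/').reduce(record) do |node, key|
    case node
    when Array then key.match?(/\A\d+\z/) ? node[key.to_i] : nil
    when Hash  then node[key]
    end
  end
end
```

For a notes/0/content error, looking up notes/0 returns the whole note. Its other properties may then hint at which EAD element it came from.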

Most of these problems seem to trace back to empty elements in the EAD file.

In a few cases, a required element is missing (unitid, for example), but in most cases, removing the empty element
fixes the problem. Is this something that could be fixed in the parser? That is, if the element is empty, don't create a JSON property
for it.
(For now, I'm adding templates for all of the glitches I've found to an AS fixup stylesheet that runs as a pre-process to AS import.)
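On the parser side, the request amounts to a guard like the hypothetical helper below, which sets a property only when the element's text is non-blank. It illustrates the idea and is not code from the ArchivesSpace converter.

```ruby
# Hypothetical guard: only add a property when the EAD element actually
# carried text, so empty tags never become empty JSON properties.
def assign_if_present(record, property, text)
  record[property] = text unless text.to_s.strip.empty?
  record
end
```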


- Steve Majewski


On Feb 28, 2014, at 8:43 AM, Brad Westbrook <brad.westbrook at lyrasis.org> wrote:


Hi, Steve,

I won't be able to address your mapping request until next week.

We are working on a public release now which will address the LDAP security hole reported a couple of weeks ago and include a number of enhancements made since the 1.0.4 release on Jan. 20.  We are aiming to announce the release later today, but it might not be until Monday, depending on resolution of one item.

Brad

Bradley D. Westbrook
Program Manager
brad.westbrook at lyrasis.org
800.999.8558 x2910
678.235.2910
bradley_d_westbrook (Skype)





From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
Sent: Friday, February 28, 2014 8:33 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import - cryptic error messages


Brad:

Can you express this requirement in terms of EAD elements instead of JSONModel schema types?
It's that mapping that is giving me trouble: turning the schema references in those error messages
into the elements in the imported EAD that need to be addressed.
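Until there is an official answer, a rough lookup like the hypothetical table below can turn the error paths seen in this thread into hints about EAD elements. The correspondences are educated guesses drawn from the errors and fixes discussed here. Verify them against the converter's mapping.

```ruby
# Hypothetical lookup from JSONModel error paths to the EAD elements they
# most likely came from. First matching pattern wins.
EAD_HINTS = [
  [%r{\Aead_id\z},                                '<eadid>'],
  [%r{\Aid_0\z},                                  '<unitid> in <archdesc>/<did>'],
  [%r{\Aextents\z},                               '<physdesc>/<extent>'],
  [%r{\Adates\z},                                 '<unitdate>'],
  [%r{\Atitle\z},                                 '<unittitle>'],
  [%r{\Ainstances/\d+/container/type_\d+\z},      '@type on <container>'],
  [%r{\Ainstances/\d+/container/indicator_\d+\z}, 'text of <container>'],
  [%r{\Anotes/\d+/},                              'a note element (e.g. <scopecontent>); empty or oversized']
].freeze

def ead_hint(path)
  EAD_HINTS.each { |pattern, hint| return hint if pattern.match?(path) }
  'unknown; inspect the generated JSON'
end
```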

Any ETA for that next release?
I've managed to fix up some of the import problems with a stylesheet: I'm up to 2749 files out of 4074 parsing successfully (up from 0 and 300+ in my
initial efforts). That notes/0/content message is my greatest outstanding issue:

1210  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"]}}>
  31  Unexpected Object Type in Queue: Expected archival_object got container
  30  #<:ValidationException: {:errors=>{"dates"=>["one or more required (or enter a Title)"], "title"=>["must not be an empty string (or enter a Date)"]}}>
  11  #<:ValidationException: {:errors=>{"instances/0/container/indicator_1"=>["Property is required but was missing"]}}>
  11  #<:ValidationException: {:errors=>{"id_0"=>["Property is required but was missing"]}}>
   8  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"], "notes/0/content"=>["At least 1 item(s) is required"]}}>
   6  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"]}}>
   5  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"], "id_0"=>["Property is required but was missing"]}}>
   5  #<:ValidationException: {:errors=>{"ead_id"=>["Must be 255 characters or fewer"]}}>
   2  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"]}}>
   1  Invalid schema given: string
   1  #<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"(In non correspondence -legal)\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"(In non correspondence -legal)\"}"]}}>
   1  #<:ValidationException: {:errors=>{"notes/7/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
   1  #<:ValidationException: {:errors=>{"notes/0/content"=>["At least 1 item(s) is required"], "notes/8/subnotes/0/content"=>["Must be 65000 characters or fewer"]}}>
   1  #<:ValidationException: {:errors=>{"instances/0/container/type_1"=>["Property is required but was missing"], "instances/0/container/indicator_1"=>["Property is required but was missing"]}}>
   1  #<:ValidationException: {:errors=>{"extents"=>["At least 1 item(s) is required"], "ead_id"=>["Must be 255 characters or fewer"]}}>



- Steve M.


_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group


