[Archivesspace_Users_Group] experience with EAD import problems.

Nathan Stevens ns96 at nyu.edu
Wed Mar 11 15:32:04 EDT 2015

Easiest way to import all those EAD records into ASpace is to first import
them into an empty AT instance, clean them up in the AT, then use the
migration plugin to copy them to the ASpace instance.  The migration plugin
adds some of the missing dates and extents, and gives you a nice report on
which records fail and why. This method is also vastly more efficient.

On Fri, Mar 6, 2015 at 3:05 PM, Steven Majewski <sdm7g at virginia.edu> wrote:

> Initially, we were hoping we could do a batch import of our 4000+ EAD
> finding aids into ArchivesSpace
> and use it to clean up the guides to a better level of consistency and
> standardization, and avoid staff
> having to deal with editing and validating XML. However, we ran into the
> problem that ArchivesSpace
> requirements for EAD import are stricter than the EAD schema, and the
> guides need to be cleaned up
> as XML first before importing.
> Our first attempt was to try to automate some of this with an XSL
> stylesheet that tried to coerce our EAD
> into conformance with what ArchivesSpace’s import expected.  With this
> preprocessing, we were able
> to ingest the majority of our EAD guides.
> A lot of our EAD had <physdesc> with no <extent> elements, or else extent
> did not conform to ( number,unit )
> required by ArchivesSpace, so our stylesheet either wrapped the
> physdesc/text in an extent element
> ( if it started with a digit and looked like it might be a ( number, unit
> ) ) or inserted:  '<extent>1 arbitrary_unit</extent>’
> Unfortunately, this had the side effect of exploding the controlled value
> list for extent_extent_type and making
> the drop down menu for that field unusable as there were too many values
> to display.
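For illustration, the coercion described above can be sketched in Python with
the standard library (the actual preprocessing was an XSL stylesheet; the
number/unit pattern here is an approximation, not the stylesheet's real test):

```python
import re
import xml.etree.ElementTree as ET

# Rough approximation of "( number, unit )": digits, then a unit word.
NUMBER_UNIT = re.compile(r"^\d[\d.,]*\s+\S+")

def coerce_extent(physdesc):
    """Ensure a <physdesc> has an <extent> child acceptable to ArchivesSpace.

    If the bare text looks like "number unit", wrap it in <extent>;
    otherwise insert a placeholder extent that is easy to find later.
    """
    if physdesc.find("extent") is not None:
        return physdesc  # already has an extent; leave it alone
    text = (physdesc.text or "").strip()
    extent = ET.SubElement(physdesc, "extent")
    if NUMBER_UNIT.match(text):
        extent.text = text      # move the text into the new <extent>
        physdesc.text = None
    else:
        extent.text = "1 arbitrary_unit"  # placeholder flagged for review
    return physdesc

pd = ET.fromstring("<physdesc>12 boxes</physdesc>")
print(ET.tostring(coerce_extent(pd), encoding="unicode"))
# -> <physdesc><extent>12 boxes</extent></physdesc>
```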
> We have given up on importing them all in one batch, and plan to set up
> our test server
> as a staging server. We would import and clean up EAD on the test server
> before exporting and re-importing
> on the production server.  Importing them all in one batch made it too
> difficult to clean up and merge
> extents in the controlled value list. There was still a problem with
> flagging all of the extents that needed
> manual review. The ‘1 arbitrary_unit’ was easy to find, but the others
> were more of a problem.
> ( We also had issues with the required unitdate | unittitle, empty elements,
> and other differences in input mappings
>   that we attempted to fix with our stylesheet. )
> I have since found another method. I have modified the resource schema to
> make extents and dates not required,
> and added it to plugins/local/schemas:  ASpace-plugins/schemas at master
> · uvalib-dcs/ASpace-plugins
> <https://github.com/uvalib-dcs/ASpace-plugins/tree/master/schemas>
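ArchivesSpace JSONModel schemas are Ruby hashes, and a file dropped into
plugins/local/schemas overrides the core definition. As I understand the
internals (treat the field details as an assumption; see the linked repo for
the real file), required fields carry an "ifmissing" => "error" constraint,
so relaxing the resource schema amounts to something like:

```ruby
# plugins/local/schemas/resource.rb -- sketch only; the real schema has
# many more properties. The core schema marks extents and dates with
# "ifmissing" => "error" and "minItems" => 1; dropping those constraints
# makes them optional on import.
{
  :schema => {
    "$schema" => "http://www.archivesspace.org/archivesspace.json",
    "type" => "object",
    "properties" => {
      "extents" => {"type" => "array",
                    "items" => {"type" => "JSONModel(:extent) object"}},
      "dates"   => {"type" => "array",
                    "items" => {"type" => "JSONModel(:date) object"}},
    },
  },
}
```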
> We may try a combined approach: doing some fixup with an XSL stylesheet, but
> not trying to coerce everything
> into an extent, and doing a lot more manual review. And importing in
> smaller batches to avoid massive namespace
> pollution and cleaning up as we go along.
> If we keep the tighter requirements on the production server, we will
> obviously discover missing dates and extents
> on that 2nd import, but we would prefer to be able to catch and flag these
> earlier.  Is there a way to use the looser,
> modified schema on import and require a tighter schema on publishing or
> export?
> I would also be interested to hear of others' experience with these issues
> or ideas for dealing with them.
> — Steve Majewski / UVA Alderman Library
> _______________________________________________
> Archivesspace_Users_Group mailing list
> Archivesspace_Users_Group at lyralists.lyrasis.org
> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group

Nathan Stevens
Digital Library Technology Services
New York University

ns96 at nyu.edu
