[Archivesspace_Users_Group] experience with EAD import problems.

Steven Majewski sdm7g at virginia.edu
Fri Mar 6 15:05:35 EST 2015

Initially, we hoped to batch import our 4000+ EAD finding aids into ArchivesSpace
and use it to bring the guides to a better level of consistency and standardization, so that staff
would not have to edit and validate XML. However, we ran into the problem that the ArchivesSpace
requirements for EAD import are stricter than the EAD schema, so the guides need to be cleaned up
as XML before they can be imported.

Our first attempt was to automate some of this with an XSL stylesheet that coerced our EAD
into conformance with what the ArchivesSpace importer expects.  With this preprocessing, we were able
to ingest the majority of our EAD guides.

A lot of our EAD had <physdesc> with no <extent> elements, or had extents that did not conform to the
( number, unit ) form required by ArchivesSpace. Our stylesheet either wrapped the physdesc text in an
<extent> element ( if it started with a digit and looked like it might be a ( number, unit ) pair )
or inserted:  '<extent>1 arbitrary_unit</extent>'
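
To make the rule concrete, here is a minimal sketch of it in Python rather than XSL. It is not our stylesheet. It assumes EAD without an XML namespace, treats "looks like ( number, unit )" as a leading number followed by whitespace and some text, and only handles <physdesc> elements with no <extent> child. The names coerce_physdesc, EXTENT_RE and PLACEHOLDER are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a usable extent is a leading number, whitespace, then a unit phrase.
EXTENT_RE = re.compile(r"^\d+(?:\.\d+)?\s+\S")
PLACEHOLDER = "1 arbitrary_unit"


def coerce_physdesc(root):
    """Give every <physdesc> that lacks an <extent> child one.

    Returns the number of placeholder extents inserted, so that those
    records can be counted and reviewed by hand later.
    """
    placeholders = 0
    # Copy to a list first so the tree is not modified while being walked.
    for physdesc in list(root.iter("physdesc")):
        if physdesc.find("extent") is not None:
            continue
        text = (physdesc.text or "").strip()
        extent = ET.Element("extent")
        if EXTENT_RE.match(text):
            # Wrap the existing text in the new <extent>.
            extent.text = text
            physdesc.text = None
        else:
            # Keep the original text and add an easy-to-find placeholder.
            extent.text = PLACEHOLDER
            placeholders += 1
        physdesc.insert(0, extent)
    return placeholders
```

Returning the placeholder count is a small design choice: it gives a per-file number of records that still need a real extent.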

Unfortunately, this had the side effect of exploding the controlled value list for extent_extent_type and making
the drop-down menu for that field unusable, as there were too many values to display.
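
One way to see this coming before an import is to tally the distinct unit strings across a batch. The following is only a sketch: it assumes non-namespaced EAD and assumes the importer takes everything after the leading number as the extent type, which matches what we observed but is not taken from the ArchivesSpace source. The function name extent_type_counts is made up.

```python
import re
from collections import Counter
import xml.etree.ElementTree as ET

# Assumption: the unit is whatever follows a leading number and whitespace.
LEADING_NUMBER = re.compile(r"^\d+(?:\.\d+)?\s+")


def extent_type_counts(ead_roots):
    """Count how often each would-be extent type occurs in a batch of EAD trees."""
    counts = Counter()
    for root in ead_roots:
        for extent in root.iter("extent"):
            # Collapse internal whitespace so line wraps do not create new types.
            text = " ".join((extent.text or "").split())
            match = LEADING_NUMBER.match(text)
            if match:
                counts[text[match.end():]] += 1
    return counts
```

A long tail of types that occur only once in the result is a sign that the batch will flood the controlled value list.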

We were giving up the idea of importing them all in a batch and planning on setting up our test server 
as a staging server. We would import and clean up EAD on the test server before exporting and re-importing
on the production server.  Importing them all in one batch made it too difficult to clean up and merge 
extents in the controlled value list. There was still a problem with flagging all of the extents that needed
manual review. The ‘1 arbitrary_unit’ was easy to find, but the others were more of a problem. 

( We also had issues with the required unitdate | unittitle, empty elements, and other differences in input mappings
  that we attempted to fix with our stylesheet. )
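
A pre-import report for these cases could look like the sketch below. It assumes non-namespaced EAD, and it reads "required unitdate | unittitle" as "every <did> needs at least one non-empty unittitle or unitdate", which is our interpretation rather than a documented rule. The helper names has_content and preimport_report are made up.

```python
import xml.etree.ElementTree as ET


def has_content(elem):
    """True if the element or any descendant contains non-whitespace text."""
    return bool("".join(elem.itertext()).strip())


def preimport_report(root):
    """Return (number of <did> lacking a title or date, tags of empty elements)."""
    dids_missing = [
        did for did in root.iter("did")
        if not any(
            has_content(e)
            for tag in ("unittitle", "unitdate")
            for e in did.iter(tag)
        )
    ]
    # Elements that carry attributes (e.g. pointers) are not counted as empty.
    empty_tags = [
        e.tag for e in root.iter()
        if len(e) == 0 and not e.attrib and not has_content(e)
    ]
    return len(dids_missing), empty_tags
```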

I have since found another method. I have modified the resource schema to make extents and dates not required,
and added it to plugins/local/schemas. The modified schema is in the schemas directory ( master branch ) of the
uvalib-dcs/ASpace-plugins repository.

We may try a combined approach: doing some fixup with an XSL stylesheet, but not trying to coerce everything
into an extent; doing a lot more manual review; and importing in smaller batches, so that we avoid massive
pollution of the controlled value lists and can clean up as we go along.

If we keep the tighter requirements on the production server, we will obviously discover missing dates and extents
on that second import, but we would prefer to be able to catch and flag these earlier.  Is there a way to use the looser,
modified schema on import and require a tighter schema on publishing or export?

I would also be interested to hear of others' experiences with these issues, or ideas for dealing with them.

— Steve Majewski / UVA Alderman Library
