[Archivesspace_Users_Group] EAD File Importing and Verification

Noah Huffman noah.huffman at duke.edu
Fri Jan 8 10:41:08 EST 2016


Chris,

The migration sub-team has definitely been kicking around the schematron-validation idea for a while and we're currently doing some work to identify the additional constraints that ArchivesSpace imposes beyond EAD2002 schema validation.

I think Mark Custer has already identified the bulk of these constraints, or at least the most common ones, in this schematron file: https://github.com/fordmadox/schematrons

I like the idea of a separate ASpace job that runs the schematron against a batch of EADs and outputs results in a report.  This way, folks could identify validation issues and clean them up prior to submitting a batch import job.  Currently, batch EAD import jobs stop after the first error, which makes any large batch import a painfully iterative process.
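
To illustrate the "report everything, then fix" idea, here is a minimal Python sketch. It is not how ArchivesSpace jobs work and it does not run schematron; it uses a plain well-formedness check from the standard library as a stand-in, and the names check_batch and well_formed are hypothetical. The point is only that the loop keeps going after a failure and collects every problem into one report.

```python
import sys
import xml.etree.ElementTree as ET


def well_formed(path):
    """Stand-in check: raises ET.ParseError if the file is not well-formed XML."""
    ET.parse(path)


def check_batch(paths, check):
    """Run `check` on every file and collect all failures instead of stopping at the first."""
    report = {}
    for path in paths:
        try:
            check(path)
        except Exception as exc:  # record the problem and keep going
            report[str(path)] = str(exc)
    return report


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_batch.py *.xml
    for name, problem in check_batch(sys.argv[1:], well_formed).items():
        print(f"{name}: {problem}")
```

A real version would swap well_formed for a function that runs the schematron rules and raises (or returns messages) for each failed assertion.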

Dave Mayo, who developed the Harvard schematronium gem, has offered to help with this project.  Should we submit a feature request to create a separate ASpace job like you describe?

-Noah

================
Noah Huffman
Archivist for Metadata, Systems, and Digital Records
David M. Rubenstein Rare Book & Manuscript Library
Duke University | 919-660-5982
http://library.duke.edu/rubenstein/ 




-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Chris Fitzpatrick
Sent: Thursday, January 07, 2016 4:55 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] EAD File Importing and Verification


The schematron idea has been kicking around for a while. Maybe an ASpace job that runs the schematron and outputs its results in a report?

Looks like this could help => https://github.com/harvard-library/schematronium ? 
( Hey, this gem author's name looks familiar... ) 

I have to say it's probably been almost a decade since I've looked at schematron... I'm guessing the svrl:failed-assert nodes are the ones we want to report on?
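
For what it's worth, pulling the svrl:failed-assert nodes out of an SVRL report could look roughly like the Python sketch below. It assumes the report uses the standard SVRL namespace (http://purl.oclc.org/dsdl/svrl) and that each failed assertion carries a location attribute and an svrl:text child; the function name failed_asserts is made up for illustration. Rules written with <report> rather than <assert> show up as svrl:successful-report instead, so a real job might want those too.

```python
import xml.etree.ElementTree as ET

# Standard SVRL namespace; assumed to be what the schematron processor emits.
SVRL = "http://purl.oclc.org/dsdl/svrl"
SVRL_NS = {"svrl": SVRL}


def failed_asserts(svrl_xml):
    """Return a list of (location, message) pairs, one per svrl:failed-assert."""
    root = ET.fromstring(svrl_xml)
    results = []
    for node in root.iter(f"{{{SVRL}}}failed-assert"):
        text = node.find("svrl:text", SVRL_NS)
        # The message may contain inline markup, so gather all nested text.
        message = "".join(text.itertext()).strip() if text is not None else ""
        results.append((node.get("location", ""), message))
    return results
```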




Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048 http://archivesspace.org/

________________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Flanagan, Patrick <PJFlanagan at ship.edu>
Sent: Thursday, January 7, 2016 10:43 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD File Importing and Verification

I think they're poorly conforming, as a number of tags were missing at one point -- such as <extent></extent>. They may have been generated by Archon? It's something of a mess.

I have xmllint, but I hadn't found the EAD schema; thank you! I'll try both that and setting up schematron and see what it comes up with. This is exactly what I needed!

~Patrick
________________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [archivesspace_users_group-bounces at lyralists.lyrasis.org] on behalf of Majewski, Steven Dennis (sdm7g) [sdm7g at eservices.virginia.edu]
Sent: Thursday, January 07, 2016 4:30 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD File Importing and Verification

Are they namespaced schema conforming EAD or are they based on the DTD ?
I don't think I've ever seen a completely empty import log - that makes me think it isn't recognizing it as EAD.
( And not being schema conforming is my first guess at a reason. )
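
A quick way to answer the first question is to look at the namespace on the root element. The Python sketch below assumes the EAD 2002 schema namespace is urn:isbn:1-931666-22-9 and that a DTD-based file has no namespace on <ead>; the function names are hypothetical. It only reports what the file declares; it does not validate anything.

```python
import xml.etree.ElementTree as ET

# Namespace used by the EAD 2002 XML Schema (assumed here; DTD-based EAD has none).
EAD2002_NS = "urn:isbn:1-931666-22-9"


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has no namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def looks_like_schema_ead(xml_text):
    """True if the root element is in the EAD 2002 schema namespace."""
    return root_namespace(xml_text) == EAD2002_NS
```

If this returns False for a file that otherwise looks like EAD, it is probably DTD-based and would need converting to the schema version before the other checks are meaningful.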


Otherwise:


1. Check that the XML is well formed. I use xmllint or Oxygen Editor. ( or google for online validators )
2. Validate against the EAD schema. Again, I use xmllint or Oxygen.
   get a copy from: http://www.loc.gov/ead/eadschema.html
3. Try validating against the schematron rules at: https://github.com/fordmadox/schematrons
   This may be a bit more difficult to manage. We had some discussion at the NYU workshop about
   setting this up as a supported service, so people don't have to deal with figuring out how to run
   Schematron, but I haven't had a chance to look at this. But if you get this far, ask and we'll
   figure out how to help.
4. You can also run the EADConverter from an IRB console and output the JSON model.
   But if there's nothing in the log file, I doubt you're getting that far in the import.
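
Steps 1 and 2 can be scripted. The sketch below wraps xmllint from Python: --noout suppresses the echoed document and --schema validates against an XSD. It assumes xmllint is on your PATH and that you have saved the schema locally; the file names in the usage note (ead.xsd, finding_aid.xml) and the function names are placeholders, not anything ArchivesSpace provides.

```python
import subprocess


def xmllint_command(xml_path, schema_path=None):
    """Build the xmllint command line: well-formedness only, or schema validation."""
    cmd = ["xmllint", "--noout"]
    if schema_path is not None:
        cmd += ["--schema", schema_path]
    cmd.append(xml_path)
    return cmd


def run_xmllint(xml_path, schema_path=None):
    """Run xmllint and return (ok, messages). xmllint writes its errors to stderr."""
    proc = subprocess.run(
        xmllint_command(xml_path, schema_path),
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr
```

For example, run_xmllint("finding_aid.xml") covers step 1 and run_xmllint("finding_aid.xml", "ead.xsd") covers step 2; in both cases the second return value holds whatever xmllint complained about.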


- Steve Majewski




On Jan 7, 2016, at 4:09 PM, Flanagan, Patrick <PJFlanagan at ship.edu<mailto:PJFlanagan at ship.edu>> wrote:

Good afternoon,

I've been tasked with figuring out why a simple EAD file fails to be imported into ArchivesSpace. I suspect it's an error with the XML file's formatting, such as a missing tag, but I don't know enough about the file type to verify it by eye. I thought I'd ask if there's any tool archivists use to verify that their EAD files are correct. When imported into ArchivesSpace, the job fails and there is an empty error log; I don't have anything else to go on, unfortunately.

Thank you very much for your time,

~Patrick Flanagan
KLN Applications Administrator
Keystone Library Network Hub
_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org<mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group