[Archivesspace_Users_Group] EAD File Importing and Verification

Mayo, Dave dave_mayo at harvard.edu
Fri Jan 8 10:54:16 EST 2016


I'm currently working on a pre-processor (and reporting tool) that we're
planning on using to clean up our EADs - I'd expect that some of that code
might be useful (once it exists).

Additionally, the Schematron we're using for our ArchivesSpace checker web
service is available here: it's intended to produce results that are
machine-actionable, so it's a little less clearly commented, but I think
it covers some additional cases.
https://github.com/harvard-library/archivesspace-checker/blob/master/schematron/archivesspace_checker_sch.xml

I'd also be very happy to help anyone who wants to set up a local version
of our little EAD checker service - checking things on demand doesn't
solve everything, but it doesn't require waiting for upstream changes,
either.  

- Dave Mayo 

On 1/8/16, 10:41 AM, "Noah Huffman" <noah.huffman at duke.edu> wrote:

>Chris,
>
>The migration sub-team has definitely been kicking around the
>schematron-validation idea for a while and we're currently doing some
>work to identify the additional constraints that ArchivesSpace imposes
>beyond EAD2002 schema validation.
>
>I think Mark Custer has already identified the bulk of these constraints,
>or at least the most common ones, in this schematron file:
>https://github.com/fordmadox/schematrons
>
>I like the idea of a separate ASpace job that runs the schematron against
>a batch of EADs and outputs results in a report.  This way, folks could
>identify validation issues and clean them up prior to submitting a batch
>import job.  Currently, batch EAD import jobs stop after the first error,
>which makes any large batch import a painfully iterative process.
>
>Dave Mayo, who developed the Harvard schematronium gem, has offered to
>help with this project.  Should we submit a feature request to create a
>separate ASpace job like you describe?
>
>-Noah
>
>================
>Noah Huffman
>Archivist for Metadata, Systems, and Digital Records
>David M. Rubenstein Rare Book & Manuscript Library
>Duke University | 919-660-5982
>http://library.duke.edu/rubenstein/
>
>
>
>
>-----Original Message-----
>From: archivesspace_users_group-bounces at lyralists.lyrasis.org
>[mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On
>Behalf Of Chris Fitzpatrick
>Sent: Thursday, January 07, 2016 4:55 PM
>To: Archivesspace Users Group
><archivesspace_users_group at lyralists.lyrasis.org>
>Subject: Re: [Archivesspace_Users_Group] EAD File Importing and
>Verification
>
>
>The schematron idea has been kicking around for a while. Maybe an ASpace
>job that runs the schematron and outputs its results in a report?
>
>Looks like this could help =>
>https://github.com/harvard-library/schematronium ?
>( Hey, this gem author's name looks familiar... )
>
>I have to say it's probably been almost a decade since I've looked at
>schematron....I'm guessing the svrl:failed-assert nodes are the ones we
>want to report on?
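
For reference, a minimal sketch of pulling those nodes out of an SVRL report
with Python's standard library. It assumes the standard SVRL namespace and
also collects svrl:successful-report nodes, since a schematron can flag
problems with <report> as well as <assert>. The function name and the shape
of the returned dictionaries are made up for illustration; this is not how
schematronium or the checker service does it.

```python
import xml.etree.ElementTree as ET

# Namespace used by Schematron Validation Report Language (SVRL) output.
SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
# Elements that carry a message for the user (hypothetical selection).
REPORTABLE = {f"{{{SVRL_NS}}}failed-assert", f"{{{SVRL_NS}}}successful-report"}


def reportable_nodes(svrl_text):
    """Return one dict per failed-assert / successful-report, in document order."""
    root = ET.fromstring(svrl_text)
    results = []
    for node in root.iter():
        if node.tag not in REPORTABLE:
            continue
        text = node.find(f"{{{SVRL_NS}}}text")
        # Collapse whitespace so multi-line messages fit on one report line.
        message = " ".join("".join(text.itertext()).split()) if text is not None else ""
        results.append({
            "kind": node.tag.split("}", 1)[1],
            "location": node.get("location", ""),
            "test": node.get("test", ""),
            "message": message,
        })
    return results
```

Each entry keeps the XPath location and the failing test, which should be
enough to build a per-file report.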
>
>
>
>
>Chris Fitzpatrick | Developer, ArchivesSpace
>Skype: chrisfitzpat  | Phone: 918.236.6048 http://archivesspace.org/
>
>________________________________________
>From: archivesspace_users_group-bounces at lyralists.lyrasis.org
><archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
>Flanagan, Patrick <PJFlanagan at ship.edu>
>Sent: Thursday, January 7, 2016 10:43 PM
>To: Archivesspace Users Group
>Subject: Re: [Archivesspace_Users_Group] EAD File Importing and
>Verification
>
>I think they're poorly conforming, as a number of tags were missing at
>one point -- such as <extent></extent>. They may have been generated by
>Archon? It's something of a mess.
>
>I have xmllint, but I hadn't found the EAD schema; thank you! I'll try
>both that and setting up schematron and see what it comes up with. This
>is exactly what I needed!
>
>~Patrick
>________________________________________
>From: archivesspace_users_group-bounces at lyralists.lyrasis.org
>[archivesspace_users_group-bounces at lyralists.lyrasis.org] on behalf of
>Majewski, Steven Dennis (sdm7g) [sdm7g at eservices.virginia.edu]
>Sent: Thursday, January 07, 2016 4:30 PM
>To: Archivesspace Users Group
>Subject: Re: [Archivesspace_Users_Group] EAD File Importing and
>Verification
>
>Are they namespaced schema conforming EAD or are they based on the DTD ?
>I don't think I've ever seen a completely empty import log - that makes
>me think it isn't recognizing it as EAD.
>( And not being schema conforming is my first guess at a reason. )
>
>
>Otherwise:
>
>
>1. Check that the XML is well formed. I use xmllint or Oxygen Editor (or
>   google for online validators).
>2. Validate against the EAD schema. Again, I use xmllint or Oxygen.
>   get a copy from: http://www.loc.gov/ead/eadschema.html
>3. Try validating against the schematron rules at:
>https://github.com/fordmadox/schematrons
>   This may be a bit more difficult to manage. We had some discussion at
>   the NYU workshop about setting this up as a supported service, so
>   people don't have to deal with figuring out how to run Schematron, but
>   I haven't had a chance to look at this. But if you get this far, ask
>   and we'll figure out how to help.
>4. You can also run the EADConverter from an IRB console and output the
>   JSON model. But if there's nothing in the log file, I doubt you're
>   getting that far in the import.
>
>
>- Steve Majewski
>
>
>
>
>On Jan 7, 2016, at 4:09 PM, Flanagan, Patrick <PJFlanagan at ship.edu>
>wrote:
>
>Good afternoon,
>
>I've been tasked with figuring out why a simple EAD file fails to be
>imported into ArchivesSpace. I suspect it's an error with the XML file's
>formatting, such as a missing tag, but I don't know enough about the file
>type to verify it by eye. I thought I'd ask if there's any tool
>archivists use to verify that their EAD files are correct. When imported
>into ArchivesSpace, the job fails and there is an empty error log; I
>don't have anything else to go on, unfortunately.
>
>Thank you very much for your time,
>
>~Patrick Flanagan
>KLN Applications Administrator
>Keystone Library Network Hub
>_______________________________________________
>Archivesspace_Users_Group mailing list
>Archivesspace_Users_Group at lyralists.lyrasis.org
>http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>



