[Archivesspace_Users_Group] EAD import extents mapping

Rees, John (NIH/NLM) [E] reesj at mail.nlm.nih.gov
Fri Jun 22 15:08:52 EDT 2018

Thanks Mark. Luckily we only use one <physdesc> statement and the syntax is pretty uniform across our corpus.

Today I discovered a 2nd <physdesc> sans <extent> will also import nicely into a Physical Description Note. This note exports as a vanilla <physdesc> but your approach produces more specific and parsable EAD for down the road.

And yes, I've been tracking the silent label drops esp. since our external public access application often depends on them.


From: Custer, Mark [mailto:mark.custer at yale.edu]
Sent: Friday, June 22, 2018 8:59 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] EAD import extents mapping


And in case it helps, here's the admittedly hacky way that I've handled this in the past, only focusing on splitting out those linear footage statements since that's what we had to deal with:

    <!-- hacky way to deal with splitting up the single extent statements due to Yale's BPG
        e.g. breaking up:  1.04 linear feet (3 boxes)
            into two separate extent statements to work with the ASpace EAD importer.
        Needs an XSLT 2.0 processor for lower-case(); the ead prefix is assumed to be
        bound to urn:isbn:1-931666-22-9 elsewhere in the stylesheet. -->
    <xsl:template match="ead:physdesc/ead:extent">
        <xsl:choose>
            <xsl:when test="contains(lower-case(.), 'linear feet')">
                <xsl:variable name="first-extent" select="concat(normalize-space(substring-before(lower-case(.), 'feet')), '_feet')"/>
                <xsl:variable name="second-extent" select="normalize-space(substring-after(lower-case(.), 'feet'))"/>
                <xsl:element name="extent" namespace="urn:isbn:1-931666-22-9">
                    <xsl:value-of select="$first-extent"/>
                </xsl:element>
                <xsl:element name="extent" namespace="urn:isbn:1-931666-22-9">
                    <xsl:value-of select="$second-extent"/>
                </xsl:element>
            </xsl:when>
            <!-- any other extent is copied through unchanged -->
            <xsl:otherwise>
                <xsl:copy>
                    <xsl:apply-templates select="node() | @*"/>
                </xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- standard identity template -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

I should note, though, that our incoming EAD files would only have a single extent statement to begin with, so I wouldn't use this without inspecting another EAD corpus before running it there.  But for the example that you originally provided, it should do the trick.  Just be careful if those statements aren't uniform in your corpus, since the above template really only expects physdesc elements with a single extent child.
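One rough way to do that inspection before running the stylesheet is sketched below in Python. It only counts <extent> children per <physdesc> and flags files where any <physdesc> has more than one. The function names and the sample record are made up for illustration; the namespace is the EAD 2002 schema namespace used in the stylesheet above.

```python
import xml.etree.ElementTree as ET

# EAD 2002 schema namespace, as used in the stylesheet above.
EAD_NS = "urn:isbn:1-931666-22-9"


def physdesc_extent_counts(ead_xml: str) -> list[int]:
    """Return the number of <extent> children of each <physdesc>, in document order."""
    root = ET.fromstring(ead_xml)
    return [
        len(physdesc.findall(f"{{{EAD_NS}}}extent"))
        for physdesc in root.iter(f"{{{EAD_NS}}}physdesc")
    ]


def needs_review(ead_xml: str) -> bool:
    """True if any <physdesc> already holds more than one <extent> child."""
    return any(count > 1 for count in physdesc_extent_counts(ead_xml))


# Hypothetical sample: one single-extent physdesc, one physdesc with no extent.
sample = """
<ead xmlns="urn:isbn:1-931666-22-9">
  <physdesc><extent>1.04 linear feet (3 boxes)</extent></physdesc>
  <physdesc>Some photographs are fragile.</physdesc>
</ead>
"""

print(physdesc_extent_counts(sample))
print(needs_review(sample))
```

A file that returns True from needs_review is one where the template's single-extent assumption does not hold, so it is worth a look by hand first.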


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Friday, 22 June, 2018 8:50 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] EAD import extents mapping


What I think that you'd want to do in this case is to modify the EAD so that it looks like the following:

<physdesc label="Extent:" encodinganalog="300">
<extent>15.0 linear_feet</extent>
<extent>(36 boxes + oversize folder)</extent>
</physdesc>

Two important points about this:

  1.  The ASpace EAD importer uses the database values for controlled value fields, which is why I've changed linear feet to "linear_feet"  ("linear_feet" is one of the database values available in ASpace by default, but if that's not the value that matches your YML translation, you might want to use another one).  The ASpace EAD exporter, on the other hand, will use the YML translation when exporting the EAD, so you'd wind up with "linear feet" in the export, or whatever else is specified in the YML file that's being employed by your application.
  2.  Whatever you put into a second extent statement will be mapped to the Container Summary field in ArchivesSpace.
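For anyone preparing files outside of XSLT, the same rewrite can be sketched in Python. This is only an illustration under stated assumptions: split_extent and its pattern are hypothetical names, and turning spaces in the unit into underscores is assumed to yield the database value (true for "linear_feet", but check your own controlled value list for anything else).

```python
import re

# A number, the unit text, then an optional trailing parenthetical.
EXTENT_PATTERN = re.compile(
    r"^\s*(?P<number>\d+(?:\.\d+)?)\s+(?P<unit>[^()]+?)\s*(?P<summary>\(.*\))?\s*$"
)


def split_extent(text: str) -> list[str]:
    """Split one extent string into [number + unit, container summary]."""
    match = EXTENT_PATTERN.match(text)
    if match is None:
        # Leave anything that does not fit the expected shape untouched.
        return [text]
    # Assumption: the database value is the lower-cased unit with underscores.
    unit = "_".join(match.group("unit").lower().split())
    parts = [f"{match.group('number')} {unit}"]
    if match.group("summary"):
        parts.append(match.group("summary"))
    return parts


print(split_extent("15.0 linear feet (36 boxes + oversize folder)"))
```

Each returned string would become its own <extent> element, so the second one lands in Container Summary as described in point 2.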

This is essentially following the AT model for importing EAD physdesc elements.  I wish, instead, that ASpace would only add extent values based on the availability of the extent/@unit attribute in EAD 2002.   That would make things a lot less dicey than trying to  parse the text field of an extent element during import time, but it would require more EAD manipulation for a lot of folks before importing that data since the @unit attribute isn't heavily used.
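To make that wish concrete, here is a small Python sketch of what @unit-driven parsing could look like. It is not how the ASpace importer behaves; extent_from_unit_attribute is a hypothetical helper, and the sketch assumes the number is the element's text and the type is taken verbatim from @unit.

```python
import xml.etree.ElementTree as ET


def extent_from_unit_attribute(extent_xml: str):
    """Return (number, unit) only when the extent carries a unit attribute."""
    extent = ET.fromstring(extent_xml)
    unit = extent.get("unit")
    if unit is None:
        # No @unit: nothing is guessed from the text content.
        return None
    return ((extent.text or "").strip(), unit)


print(extent_from_unit_attribute('<extent unit="linear_feet">15.0</extent>'))
print(extent_from_unit_attribute("<extent>15.0 linear feet</extent>"))
```

The point of the design is that nothing is inferred from free text, so an extent without @unit simply produces no extent value.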


p.s.   the physdesc/@label attribute, however, does NOT get mapped at all by ASpace.  Instead, it's dropped silently during the import process.  That used to be the case, at least.  I haven't checked in a while to see if that behavior has been changed, but in general label attributes and head elements will usually map to the ASpace label field.

p.p.s. I've never tested to see what happens if you have a third sibling extent statement (to see if those get dropped or appended to container summary).  But I have tested using dimensions and/or physfacet within the same grouping, and those are imported as expected.

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Rees, John (NIH/NLM) [E]
Sent: Thursday, 21 June, 2018 3:43 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] EAD import extents mapping

Using the background job importer, I'm trying to import extent data from EAD 2002 schema XML, specifically the other extent data we record in parentheses, like <physdesc label="Extent:" encodinganalog="300"> <extent>15.0 linear feet (36 boxes + oversize folder)</extent> </physdesc>

I'd like these parenthetical statements to map to the Container Summary field. According to the EAD import mapper http://archivesspace.org/wp-content/uploads/2016/05/EAD-Import-Export-Mapping-20171030.xlsx anything after a number and space that can't be parsed should import into Container Summary.

Is there a syntax for making this string unparsable and forcing this behavior? Currently "linear feet (36 boxes + oversize folder)" from the above imports to Extent @Type as a new unique value and I don't want all that data pollution.


John P. Rees
Archivist and Digital Resources Manager
History of Medicine Division
National Library of Medicine
