[Archivesspace_Users_Group] EAD XML Finding-aids - results of testing

Chris Fitzpatrick Chris.Fitzpatrick at lyrasis.org
Mon Jul 14 10:51:12 EDT 2014

Hi Emma,

Yes, a lot of these have been reported, especially the unhelpfulness of the export errors and the mixed content conversions (<emph> elements need to be converted to actual HTML), which will be fixed in the upcoming release.

Would it be possible for you to send me some of the EADs that have these issues? Once I can see your encoding, I can tweak the converter to import them correctly.


Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Emma Jolley <EJOLLEY at nla.gov.au>
Sent: Monday, July 14, 2014 12:59 AM
To: Archivesspace Users Group (archivesspace_users_group at lyralists.lyrasis.org)
Subject: [Archivesspace_Users_Group] EAD XML Finding-aids - results of testing

Dear All

As part of our testing of ArchivesSpace (we’re using v1.0.9), we have tried importing some of our legacy EAD XML finding aids. Some of the issues we have encountered are outlined below. I’m not sure if they’ve been previously noted (apologies if so!). We’d appreciate any advice on how to resolve them, and on whether they are known bugs or are scheduled for resolution.

Many thanks


Emma Jolley| Curator of Digital Archives, Pictures and Manuscripts Branch|National Library of Australia Canberra ACT 2600
e: emma.jolley at nla.gov.au<mailto:emma.jolley at nla.gov.au>|t: 02 6262 1456| www.nla.gov.au/ms<http://www.nla.gov.au/ms>


Import of EAD XML into ArchivesSpace

-        <index> element prevents import:

o   Finding aids with an <index> element failed to import, with the following error message:

Error: #<:ValidationException: {:errors=>{"record"=>["Can't unambiguously match {:reference_text=>\"1-2, 6*, 7*, 10-12\"} against schema types: [\"JSONModel(:note_index_item) object\"]. Resolve this by adding a 'jsonmodel_type' property to {:reference_text=>\"1-2, 6*, 7*, 10-12\"}"]}}>
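Until the converter handles <index>, one possible stopgap is to strip the index from a copy of the file so that the rest of the finding aid can be imported. Below is a minimal Python sketch using only the standard library. It assumes the file is well-formed XML, with or without the EAD 2002 namespace (urn:isbn:1-931666-22-9). strip_index is a hypothetical helper: the index content is simply discarded, and ElementTree drops any DOCTYPE declaration and comments when it re-serialises the file.

```python
import xml.etree.ElementTree as ET

# Keep the usual prefixes when re-serialising namespaced EAD 2002 files.
ET.register_namespace("", "urn:isbn:1-931666-22-9")
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def strip_index(xml_text):
    """Remove every <index> element; return (new_xml, number_removed)."""
    root = ET.fromstring(xml_text)
    # Collect (parent, child) pairs first so the tree isn't mutated mid-iteration.
    targets = [(parent, child) for parent in root.iter()
               for child in parent if local(child.tag) == "index"]
    for parent, child in targets:
        parent.remove(child)
    return ET.tostring(root, encoding="unicode"), len(targets)
```

Keep the original file. The stripped copy only serves to get the rest of the description into ArchivesSpace while the <index> problem is investigated.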

-        Unhelpful date/title error:

o   A number of our finding aids failed to import for the following reason:

Error: #<:ValidationException: {:errors=>{"dates"=>["one or more required (or enter a Title)"], "title"=>["must not be an empty string (or enter a Date)"]}}>

o   For some of the (usually very large) finding aids in this category we were unable to determine what was causing the error. If the error message could give a line reference or other pointer, that would be helpful!
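Until the error message says where the problem is, the offending components can be located before import. The error reads as "a record needs either a non-empty title or at least one date". The Python sketch below flags <archdesc> and component elements whose <did> has neither a non-empty <unittitle> nor a non-empty <unitdate>. It is only an approximation of the importer's rule. find_untitled_undated is a hypothetical helper. Because the standard library's ElementTree does not record source line numbers, the sketch reports an XPath-like position instead.

```python
import xml.etree.ElementTree as ET

# <archdesc> plus numbered and unnumbered components.
COMPONENTS = {"archdesc", "c"} | {"c%02d" % i for i in range(1, 13)}


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def has_text(el):
    return "".join(el.itertext()).strip() != ""


def find_untitled_undated(xml_text):
    """Return an XPath-like path for each component whose <did> has neither
    a non-empty <unittitle> nor a non-empty <unitdate>."""
    problems = []

    def walk(el, path):
        if local(el.tag) in COMPONENTS:
            did = next((ch for ch in el if local(ch.tag) == "did"), None)
            parts = list(did) if did is not None else []
            if not any(local(p.tag) in ("unittitle", "unitdate") and has_text(p)
                       for p in parts):
                problems.append(path)
        seen = {}
        for ch in el:
            name = local(ch.tag)
            seen[name] = seen.get(name, 0) + 1  # 1-based position among same-named siblings
            walk(ch, "%s/%s[%d]" % (path, name, seen[name]))

    root = ET.fromstring(xml_text)
    walk(root, local(root.tag))
    return problems
```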

Review of imported EAD XML data in ArchivesSpace

-        <bioghist> element oddities:

o   Some finding aids have <bioghist> elements that comprise a <chronlist> of chronological dates and events, e.g.:

§  <bioghist><head>Biographical Note</head><chronlist><listhead><head01>Date</head01><head02>Event</head02></listhead><chronitem><date>1870</date><event>Born 31 May in West Melbourne, the son of Jessie and Harry Ehret Grover</event></chronitem> <chronitem><date>1896</date><event>Joined the <title render="italic">Argus</title></event></chronitem> …etc.

o   We expected this data to import into the ‘Biographical / Historical’ note field as a ‘Chronology’.  It does, but it is also duplicated, appearing a second time as the ‘Biographical / Historical’ note ‘Sub Note’.

o   As well, in the ‘Sub Note’, the ‘Formatted’ view does not recognise mixed content or display any formatting: italicised titles aren’t displayed in italics and there aren’t any line breaks. For example, the text above appears as:

§  DateEvent 1870 Born 31 May in West Melbourne, the son of Jessie and Harry Ehret Grover 1896 Joined the Argus …etc.

o   And in the ‘Chronology’, the ‘Raw’ view does not recognise sub-element tags, e.g. the <event> text displays as:

§  Joined the <title render="italic">Argus</title>
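Each <chronitem> maps naturally to one chronology item, and the <listhead> labels ("Date", "Event") are column headings rather than data. A short Python sketch of that mapping follows, using only the standard library. chronlist_items is a hypothetical helper that returns plain text, so inline markup such as the italic title is flattened rather than preserved. It also accepts an <eventgrp> holding several events.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def text_of(el):
    # Join all descendant text and collapse runs of whitespace.
    return " ".join("".join(el.itertext()).split())


def chronlist_items(xml_text):
    """Return one (date, [events]) tuple per <chronitem>, ignoring <listhead>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter():
        if local(item.tag) != "chronitem":
            continue
        date = next((text_of(d) for d in item if local(d.tag) == "date"), "")
        # <event> may sit directly in <chronitem> or inside an <eventgrp>.
        events = [text_of(e) for e in item.iter() if local(e.tag) == "event"]
        items.append((date, events))
    return items
```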

-        Display of encoding in the <bibliography> element:

o   <bibliography> elements usually contain <bibref> elements, within which titles of works are encoded as <title> elements, e.g.:

§  <bibref>Bloggs, Jocasta.  'Bees and beekeeping', <title render="italic">Journal of Melittology</title>, 1938. </bibref>

o   The ‘Raw’ view of this data does not recognise sub-element tags, e.g. the <bibref> text displays as:

§  Bloggs, Jocasta. 'Bees and beekeeping', <title render="italic">Journal of Melittology</title>, 1938.
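The conversion mentioned in the reply (mixed content such as <title render="italic"> or <emph> rendered as actual HTML) can be sketched as follows. This is a minimal illustration, not the ArchivesSpace converter. The RENDER_TAGS mapping from @render values to HTML tags is an assumption, and mixed_content_to_html is a hypothetical helper. Any other element keeps its text but loses its tag.

```python
import html
import xml.etree.ElementTree as ET

# Assumed mapping from EAD @render values to HTML tags (outermost first).
RENDER_TAGS = {"italic": ["i"], "bold": ["b"], "underline": ["u"],
               "bolditalic": ["b", "i"], "boldunderline": ["b", "u"]}


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def content_to_html(el):
    """Render the mixed content of `el` (not `el` itself) as HTML."""
    out = [html.escape(el.text or "", quote=False)]
    for child in el:
        inner = content_to_html(child)
        if local(child.tag) in ("title", "emph"):
            # Wrap the innermost tag first so "bolditalic" becomes <b><i>...</i></b>.
            for tag in reversed(RENDER_TAGS.get(child.get("render", ""), [])):
                inner = "<%s>%s</%s>" % (tag, inner, tag)
        out.append(inner)
        out.append(html.escape(child.tail or "", quote=False))
    return "".join(out)


def mixed_content_to_html(fragment):
    return content_to_html(ET.fromstring(fragment))
```

Applied to the <bibref> above, this yields the citation text with "Journal of Melittology" wrapped in <i>…</i>.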

-        <unittitle> element display:

o   <unittitle> elements may contain sub-elements, e.g. <unittitle><title render="italic">The Triple Crown: the Paradox of the Papacy</title></unittitle>. The <unittitle> element imports into the ‘Title’ field, but unlike other fields there isn’t an option to toggle between the raw and formatted views, i.e. there is no way to view the ‘Title’ text rendered in italics as The Triple Crown: the Paradox of the Papacy.

-        <extref> element display:

o   Text that is encoded in an <extref> element doesn’t translate to a clickable link in the ‘Formatted’ view, e.g. for the following ‘Related material’ note, the ‘Raw’ view is:

§  <p>Papers of Grover's grandson, Michael Cannon, are held in the Manuscript Collection at <extref href="http://www.nla.gov.au/nla.ms-ms6205">MS 6205</extref>.</p>

o   But the ‘Formatted’ view appears as follows, without hyperlinking of the text ‘MS 6205’:

§  Papers of Grover's grandson, Michael Cannon, are held in the Manuscript Collection at MS 6205.
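What the ‘Formatted’ view would need to do here is turn <extref> into an HTML anchor. Below is a minimal Python sketch of that step. It assumes the link target is in @href, as in DTD-based EAD 2002, or in @xlink:href, as typically used with the EAD 2002 schema. note_to_html is a hypothetical helper, and every other element keeps only its text.

```python
import html
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def content_to_html(el):
    """Render the mixed content of `el` as HTML, turning <extref> into links."""
    out = [html.escape(el.text or "", quote=False)]
    for child in el:
        inner = content_to_html(child)
        href = child.get("href") or child.get(XLINK_HREF)
        if local(child.tag) == "extref" and href:
            inner = '<a href="%s">%s</a>' % (html.escape(href), inner)
        out.append(inner)
        out.append(html.escape(child.tail or "", quote=False))
    return "".join(out)


def note_to_html(fragment):
    return content_to_html(ET.fromstring(fragment))
```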

-        <arrangement> element duplication:

o   Most of our EAD XML finding aids have the <arrangement> note encoded as a child of the <scopecontent> note rather than a sibling.  In such cases the <arrangement> note is duplicated upon import – the text appears both as part of the ‘Scope note’ note field and again in the ‘Arrangement’ note field.  We would expect it to either import into the ‘Scope note’ note field as encoded, or to be removed from the ‘Scope note’ and put in the ‘Arrangement’ note.
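Until the importer chooses one location, the second of the two expected behaviours can be applied to a copy of the file before import: move each nested <arrangement> out of <scopecontent> so that it becomes the next sibling. A standard-library Python sketch follows. hoist_arrangement is a hypothetical helper. Whether ArchivesSpace then imports the text only once is an assumption worth testing on a sample file.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def hoist_arrangement(xml_text):
    """Move each <arrangement> nested in <scopecontent> to sit immediately
    after that <scopecontent>; return (new_xml, number_moved)."""
    root = ET.fromstring(xml_text)
    moved = 0
    for parent in list(root.iter()):  # snapshot, since the tree is edited as we go
        for sc in list(parent):
            if local(sc.tag) != "scopecontent":
                continue
            pos = list(parent).index(sc) + 1
            for arr in [ch for ch in sc if local(ch.tag) == "arrangement"]:
                sc.remove(arr)
                parent.insert(pos, arr)  # keeps document order after <scopecontent>
                pos += 1
                moved += 1
    return ET.tostring(root, encoding="unicode"), moved
```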

-        <unitdate> element duplication:

o   As with the <arrangement> note, if a <unitdate> is encoded within the <unittitle> it is duplicated upon import, both in the ‘Title’ field and in the ‘Dates’ fields, rather than either one or the other.
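A similar pre-import workaround can move a <unitdate> out of its <unittitle> so that it sits immediately after the title within <did>. The date then appears only in ‘Dates’. This is a sketch under stated assumptions: hoist_unitdate is a hypothetical helper, and trimming a trailing comma or semicolon from the remaining title is a guess at typical punctuation, which should be checked against real files.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def _keep_tail(parent, child):
    """Removing an element also removes its tail text, so hand the tail
    to the previous sibling (or the parent's leading text) first."""
    if not child.tail:
        return
    kids = list(parent)
    i = kids.index(child)
    if i == 0:
        parent.text = (parent.text or "") + child.tail
    else:
        kids[i - 1].tail = (kids[i - 1].tail or "") + child.tail
    child.tail = None


def hoist_unitdate(xml_text):
    """Move <unitdate> out of <unittitle> to follow it within <did>;
    return (new_xml, number_moved)."""
    root = ET.fromstring(xml_text)
    moved = 0
    for did in [e for e in root.iter() if local(e.tag) == "did"]:
        for title in [t for t in did if local(t.tag) == "unittitle"]:
            dates = [d for d in title if local(d.tag) == "unitdate"]
            if not dates:
                continue
            pos = list(did).index(title) + 1
            for date in dates:
                _keep_tail(title, date)
                title.remove(date)
                did.insert(pos, date)
                pos += 1
                moved += 1
            # Drop separator punctuation left dangling at the end of the title.
            kids = list(title)
            if kids:
                kids[-1].tail = (kids[-1].tail or "").rstrip(" ,;") or None
            else:
                title.text = (title.text or "").rstrip(" ,;") or None
    return ET.tostring(root, encoding="unicode"), moved
```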

EAD XML export from ArchivesSpace

-        <accessrestrict> element missing:

o   The <accessrestrict> element isn’t included in the export.

-        Incomplete export:

o   Sometimes the exported EAD XML file is incomplete: one file contained only the collection-level data and part of Series 1, omitting the remainder of Series 1 and the other 26 series in the collection. We couldn’t work out what went wrong.
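Both of these problems show up as elements that are present in the source file but missing from the export. One quick way to spot them is to compare per-element counts between the two files. Below is a standard-library Python sketch. missing_in_export is a hypothetical helper. Numbered (<c01>…<c12>) and unnumbered (<c>) components are counted together, because it is not certain which form an export uses. Since an export may legitimately restructure some elements, treat the result as a list of leads rather than proof of loss.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def norm(tag):
    """Strip any namespace and count <c01>..<c12> together with <c>, so a
    switch between numbered and unnumbered components isn't reported as loss."""
    name = tag.rsplit("}", 1)[-1]
    if len(name) == 3 and name[0] == "c" and name[1:].isdigit():
        return "c"
    return name


def tag_counts(xml_text):
    root = ET.fromstring(xml_text)
    return Counter(norm(el.tag) for el in root.iter() if isinstance(el.tag, str))


def missing_in_export(source_xml, exported_xml):
    """Return {tag: (count_in_source, count_in_export)} for every tag the
    export contains fewer of than the source."""
    src, exp = tag_counts(source_xml), tag_counts(exported_xml)
    return {t: (n, exp[t]) for t, n in src.items() if exp[t] < n}
```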

-        Sub-elements not encoded as elements:

o   Some sub-elements aren’t recognised as elements, e.g. a <unittitle> containing <emph> text exported as:

§  <unittitle><emph render="underline">Across the Creek, an Australian visits N.Z.</emph> </unittitle>
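"Aren’t recognised as elements" suggests that the <emph> tags were written out as escaped text (&lt;emph&gt;) rather than as markup. On that assumption, affected records can be found by scanning the text of the exported file for anything that looks like a tag. Below is a rough Python sketch. escaped_markup is a hypothetical helper, and its regular expression is a heuristic that may also flag legitimate text containing angle brackets.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic: something shaped like <name ...>, </name> or <name/> inside text.
LITERAL_TAG = re.compile(r"</?[A-Za-z][\w:.-]*(\s[^<>]*)?/?>")


def local(tag):
    """Return a tag name without its {namespace} prefix."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def escaped_markup(xml_text):
    """List (element name, literal tag) pairs where markup appears as text.
    Tail text is reported against the element it follows."""
    hits = []
    for el in ET.fromstring(xml_text).iter():
        for chunk in (el.text, el.tail):
            for m in LITERAL_TAG.finditer(chunk or ""):
                hits.append((local(el.tag), m.group(0)))
    return hits
```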

-        Paragraphs not encoded as separate paragraphs:

o   Some notes in ArchivesSpace comprise multiple paragraphs of text and are displayed as such on screen, even though the paragraphs aren’t encoded as <p> elements. Upon export these paragraphs aren’t recognised as separate paragraphs; instead they are merged together, e.g. the following two paragraphs are treated as a single paragraph:

§  <p>Personal papers and photographs and biographical newspaper cuttings, manuscripts, typed and drafts, of several works, Sunbeam letters, business letters and magazines, most of which contain articles and stories by author.

School note book 1887, printed diagram of Battle of Waterloo, map of England with countries marked, with quotation of John Drinkwater, photographs with typed letter descriptions of world war I (mention of George Braund M.L.A. and Captain Bauge), photographs of Lady M. E. Jersey 1905, George Meredeth, a legal group, Ethel Turner April 1926, W.P. Turner, newspaper cuttings, including some biographical, several newspapers, two note books with poetry, newspaper cuttings and written copies, and miscellaneous personal articles.</p>
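The export would need to wrap each paragraph in its own <p>. Below is a minimal Python sketch of that step. It assumes that paragraphs in the stored note text are separated by line breaks. paragraphs_to_ead is a hypothetical helper, and the wrapping note element name is passed in as a parameter. ElementTree takes care of escaping characters such as "&".

```python
import re
import xml.etree.ElementTree as ET


def paragraphs_to_ead(note_text, note_tag):
    """Wrap each line-break-separated paragraph of `note_text` in its own <p>
    inside a <note_tag> element; return the serialised XML."""
    note = ET.Element(note_tag)
    for para in re.split(r"\s*\n\s*", note_text.strip()):
        if para:
            ET.SubElement(note, "p").text = para
    return ET.tostring(note, encoding="unicode")
```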

