[Archivesspace_Users_Group] Import Data Errors

Mayo, Dave dave_mayo at harvard.edu
Tue Jan 29 14:56:46 EST 2019


What Kate said, and also: for the issues we identified during our migration, you can check individual EAD files, with somewhat better error reporting, by running them through https://eadchecker.lib.harvard.edu/

--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of "Bowers, Kate A." <kate_bowers at harvard.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Tuesday, January 29, 2019 at 2:21 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Import Data Errors

EAD has fewer constraints than ArchivesSpace. Those error messages are not terribly helpful. At the very end of this article (before the end notes) is a list of the constraints in AS that will throw an error if your EAD does not comply:
https://journal.code4lib.org/articles/12239

I think the messages are coming from what we list as number 26: “Absence of both unittitle and unitdate at a subordinate level causes import to fail.” For example, we had <did>s that contained only <head>, only <note>, or only <unitid>. These were all valid EAD but not acceptable to ArchivesSpace. Our solution was to use a script to grab data from the parent <did>; you may have few enough of these to fix by hand.
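A script of that kind might look roughly like the following Python sketch. It is not the script Harvard used: it assumes EAD 2002 parsed with the standard library's xml.etree.ElementTree, and the function names are hypothetical. For each component whose <did> has neither <unittitle> nor <unitdate>, it copies the <unittitle> of the nearest enclosing component (or of <archdesc>).

```python
import re
import xml.etree.ElementTree as ET

# EAD 2002 components may be unnumbered <c> or numbered <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def strip_namespaces(root):
    """Drop '{namespace}' prefixes so elements can be matched by local name."""
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    return root


def fill_missing_titles(root):
    """Give every component <did> lacking both <unittitle> and <unitdate>
    a copy of the nearest ancestor's <unittitle>. Returns the number fixed."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    components = [el for el in root.iter()
                  if isinstance(el.tag, str) and COMPONENT.match(el.tag)]
    fixed = 0
    for comp in components:  # document order, so parents are fixed first
        did = comp.find("did")
        if did is None:
            did = ET.Element("did")
            # In a component, <did> follows an optional <head>.
            has_head = len(comp) > 0 and comp[0].tag == "head"
            comp.insert(1 if has_head else 0, did)
        if did.find("unittitle") is not None or did.find("unitdate") is not None:
            continue
        # Walk up past <dsc> to the enclosing component or <archdesc>.
        anc = parent_of.get(comp)
        while anc is not None and anc.tag != "archdesc" and not COMPONENT.match(anc.tag):
            anc = parent_of.get(anc)
        source = anc.find("did/unittitle") if anc is not None else None
        title = "".join(source.itertext()).strip() if source is not None else ""
        if title:
            ET.SubElement(did, "unittitle").text = title
            fixed += 1
    return fixed
```

In practice you would parse a file with ET.parse, pass the root through strip_namespaces, run fill_missing_titles, and review the copied titles before writing the file out again. ElementTree does not preserve a DOCTYPE declaration, and the stripped output carries no EAD namespace, so compare the result with a known-good file before importing.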

Below is a somewhat more digestible list (still Harvard-centric, and definitely a working document) of these issues and the practice needed going forward to ensure AS will ingest EAD. Your error messages are coming from the third <c> entry.

Each entry gives the element category, then the practice going forward, the reason, and the reason category.

Schema
  Practice going forward: Use the canonical EAD schema, not the Harvard modified schema
  Reason: AS expects data that conforms to the canonical schema.
  Reason category: Schema

<frontmatter>
  Practice going forward: Do not use <frontmatter>
  Reason: This will be added by EAD export from AS
  Reason category: Former practice no longer needed

<language>
  Practice going forward: Always include
  Reason: Required for saving in AS. EAD without it will ingest, but users who later edit the resource will have to add it before they can save.

<descgrp>
  Practice going forward: Do not use <descgrp>
  Reason: This will not load to AS
  Reason category: Practice not supported by AS EAD ingest

<arrangement>
  Practice going forward: Do not embed <arrangement> within <scopecontent>
  Reason: Roundtripped EAD from AS will be invalid
  Reason category: Practice not supported by AS EAD export

<c>
  Practice going forward: Always include an @level attribute value on all <c>s. If using @level="otherlevel", always include an @otherlevel value.
  Reason: A <c> without @level will ingest as "otherlevel" but will lack an @otherlevel value
  Reason category: New attribute data required

<c>
  Practice going forward: All components should have a <unittitle>; where archivists formerly might have used only a <unitdate>, the parent <c>'s <unittitle> is often a good choice
  Reason: The component-centered display in ArchivesSpace makes any component lacking the context provided by <unittitle> text vague and cryptic, hampering recognition and interpretation of the component.
  Reason category: New content strongly recommended

<c>
  Practice going forward: All components must have either a <unitdate> or a <unittitle>
  Reason: EAD lacking this data will not load to AS
  Reason category: New content required

<chronlist>
  Practice going forward: <chronlist> should stand alone, not be embedded within <bioghist>
  Reason: Will load twice into AS
  Reason category: Practice not supported by AS EAD ingest

<container>
  Practice going forward: <container> must include @type and @label attributes, cannot describe multiple containers in one element, and should not include the container type as part of its content.
  Reason: AS accommodates container numbers and types, but not note-like container information. In addition, AS creates "top container" records, which are linked records, based on EAD ingest. Placing a range of boxes in a single <container> element, for example, creates incorrect data about containers.
  Reason category: Data model in AS is different from EAD

<controlaccess>
  Practice going forward: Do not encode <title> in <controlaccess>
  Reason: Finding aid will load, but data will be lost. Use <subject> or another appropriate element
  Reason category: Practice not supported by AS EAD ingest

<corpname>
  Practice going forward: @role ???
  Reason: Disappears on ingest

<creation>
  Practice going forward: <creation> statement should include ingest information
  Reason: Include ingest to AS in your creation statement, e.g. "Created in Oxygen on 2016-11-18; ingested to AS on 2016-12-12"
  Reason category: New content required

<dao>
  Practice going forward: One Digital Object per Archival Object
  Reason: Automated linking to objects in the DRS is based on the ref_id of the Archival Object, which is used as an owner-supplied name in DRS.
  Reason category: New limit

<dao>
  Practice going forward: Supply a title for digital archival objects; use the <unittitle> of the parent <c>
  Reason: The xlink:title attribute is required by AS ingest
  Reason category: New attribute data required

<dao>
  Practice going forward: <daodesc>???
  Reason: Disappears on ingest?

<dao>
  Practice going forward: To achieve thumbnails, <daogrp> must be coded thus: ????
  Reason: ?

<extent>
  Practice going forward: Collection-level <physdesc><extent> is required
  Reason: EAD lacking this data will not load to AS
  Reason category: New content required

<extent>
  Practice going forward: Do not encode mixed content within <physdesc>
  Reason: Finding aid will ingest, but content will be lost. Specifically, if a <physdesc> has some child elements, any text that is not inside a child element will be left behind during ingest. An entirely plain-text <physdesc> is OK.
  Reason category: New limit

<extent>
  Practice going forward: <extent> must begin with a number
  Reason: EAD with a non-numerical extent will not load to AS
  Reason category: New limit

<extref>
  Practice going forward: Use <extref> more sparingly; consider using it only if the @href values point to Harvard-managed resources
  Reason: Link rot. This has nothing to do with AS itself, except that the migration made the links noticeable, and the rot was there.
  Reason category: New recommendation

<index>
  Practice going forward: Do not encode nested indexes
  Reason: Import may succeed, but data will be lacking from AS
  Reason category: Practice not supported by AS EAD ingest

<index>
  Practice going forward: Instead of creating <index>es, add <controlaccess> terms to components
  Reason: This allows search and retrieval of all components across the whole corpus rather than within one finding aid.
  Reason category: Better data model for discovery and retrieval

<indexentry>
  Practice going forward: Do not encode nested <indexentry>s
  Reason: Import may succeed, but data will be lacking from AS
  Reason category: Practice not supported by AS EAD ingest

<list>
  Practice going forward: Do not encode nested lists
  Reason: Import may succeed, but data will be lacking from AS
  Reason category: Practice not supported by AS EAD ingest

<list>
  Practice going forward: Do not use <defitem> or <list type="deflist">
  Reason: Import may succeed, but data will be lacking from AS
  Reason category: Practice not supported by AS EAD ingest

<name>
  Practice going forward: Avoid <name>
  Reason: Import may succeed, but data will be lacking from AS; use a more specific <persname>, <corpname>, or <geogname>
  Reason category: Practice not supported by AS EAD ingest

<namegrp>
  Practice going forward: Do not encode <namegrp>s
  Reason: Import may succeed, but data will be lacking from AS
  Reason category: Practice not supported by AS EAD ingest

<note>
  Practice going forward: Do not use <note> anywhere; where legal, use <odd>, preferably with a <head>
  Reason: Import may succeed, but <note> will be lacking from AS; a <head> in <odd> provides a better label than "generic note"
  Reason category: New limit

<origination>
  Practice going forward: Do not encode <origination> as mixed content; all data must be within child elements
  Reason: Import may succeed, but data will be lacking from AS. Any content not within the child elements <corpname>, <famname>, or <persname> will not go into AS. It will not stop ingest, but the data will be lost. Attribute values will also be lost.
  Reason category: Practice not supported by AS EAD ingest; new constraint

<persname>
  Practice going forward: @role ???
  Reason: Disappears on ingest?

<processinfo>
  Practice going forward: Finding aid must have a <processinfo> with <head>Aleph ID</head> and content containing the Aleph record number for the collection
  Reason: Indexing of finding aids in Primo, and connecting them with bibliographic records, depends on this exact specification being carried out successfully.
  Reason category: New content required

<ref>
  Practice going forward: <ref>????
  Reason: Internal refs lost on ingest

<table>
  Practice going forward: Do not use <table>
  Reason: Longstanding practice to be continued
  Reason category: Existing limit

<unitdate>
  Practice going forward: Always supply a value for the @normal attribute in <unitdate>
  Reason: This had formerly been accomplished through the OASIS loader
  Reason category: New attribute data required

<unitdate>
  Practice going forward: Supply certainty="approximate" for dates that are approximate
  Reason category: New attribute data required

<unitdate>
  Practice going forward: Do not use @startYear or @endYear
  Reason: These were Harvard-specific attributes and will be lost in AS ingest
  Reason category: New limit

<unitdate>
  Practice going forward: Do not nest <unitdate> within <unittitle>
  Reason: These are un-nested during AS ingest. Starting with nested <unitdate>s in EAD will give archivists an unrealistic idea of what the description will convey when un-nested.
  Reason category: New limit

<unitdate>
  Practice going forward: Use separate <unitdate> elements for bulk and inclusive dates, and indicate the difference by setting the @type attribute accordingly
  Reason: AS cannot ingest two dates from one <unitdate> tag.
  Reason category: New limit

<unitdate>
  Practice going forward: Indicate approximation in <unitdate>s by setting the attribute @certainty="approximate"
  Reason: "Circa" or "approximate" as part of the date expression is not machine-actionable
  Reason category: New recommendation

<unitdate>
  Practice going forward: If there are no dates, do not use <unitdate> at all
  Reason: Older practice often produced <unitdate>undated</unitdate>, which AS cannot ingest. Consider whether "undated" belongs as part of the title.
  Reason category: New limit

<unitid>
  Practice going forward: Collection-level <unitid> is required
  Reason: EAD lacking this data will not load to AS
  Reason category: New content required

<unitid>
  Practice going forward: Use only one <unitid>. If more than one <unitid> is needed, either place them in separate <c> elements or concatenate all into a single <unitid>
  Reason: AS will ingest the finding aid, but content will be lost: all but one of the <unitid>s will be dropped.
  Reason category: New limit

<unittitle>
  Practice going forward: Collection-level <unittitle> is required
  Reason: EAD lacking this data will not load to AS
  Reason category: New content required

<unittitle>
  Practice going forward: Use only one <unittitle>. If more than one <unittitle> is needed, either place them in separate <c> elements or concatenate all into a single <unittitle>
  Reason: AS will ingest the finding aid, but content will be lost: all but one of the <unittitle>s will be dropped.
  Reason category: New limit

<extent>
  Practice going forward: All <extent> measurement types must come from the same list used in AS; if non-canonical measurements are needed, consider a separate <physdesc>
  Reason: Non-matches have two results: calculations based on measurements will be inaccurate, and the AS extent drop-down will become cluttered
  Reason category: New limit

<bibliography>
  Practice going forward: Avoid <bibliography>?

<ptrgrp>
  Practice going forward: Avoid <ptrgrp>???
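The @level rule for <c> in the list above is easy to check before import. A minimal Python sketch, assuming un-namespaced EAD 2002 parsed with the standard library's xml.etree.ElementTree (remove the urn:isbn:1-931666-22-9 namespace first if your files declare it); level_problems is a hypothetical name, not part of any ArchivesSpace tool:

```python
import re
import xml.etree.ElementTree as ET

# EAD 2002 components may be unnumbered <c> or numbered <c01> through <c12>.
COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])?$")


def level_problems(root):
    """List (tag, id, problem) for components with no @level, or with
    level="otherlevel" but no @otherlevel value."""
    problems = []
    for el in root.iter():
        if not (isinstance(el.tag, str) and COMPONENT.match(el.tag)):
            continue
        level = el.get("level")
        if not level:
            problems.append((el.tag, el.get("id"), "missing @level"))
        elif level == "otherlevel" and not el.get("otherlevel"):
            problems.append((el.tag, el.get("id"), 'level="otherlevel" without @otherlevel'))
    return problems
```

Reporting each component's @id makes the offending elements easy to find in an XML editor; components without an @id come back with None.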

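The <container> rules in the list above can be approximated with pattern matching, although deciding what counts as "more than one container" or "type in the content" is a judgment call. The regular expressions below are hypothetical heuristics, not what ArchivesSpace itself checks; the sketch assumes un-namespaced EAD 2002 parsed with xml.etree.ElementTree, and container_problems is a hypothetical name:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heuristics, to be tuned locally: digits joined by a dash, "to",
# a comma, "&" or "and" suggest more than one container in one element...
MULTIPLE = re.compile(r"\d\s*(?:-|–|to|,|&|and)\s*\d", re.IGNORECASE)
# ...and a container-type word in the text duplicates what @type should carry.
TYPE_WORD = re.compile(r"\b(?:box(?:es)?|folders?|reels?|volumes?|items?)\b",
                       re.IGNORECASE)


def container_problems(root):
    """Return (text, problem) pairs for <container> elements likely to
    produce misleading top-container data in AS."""
    problems = []
    for c in root.iter("container"):
        text = "".join(c.itertext()).strip()
        if not c.get("type"):
            problems.append((text, "missing @type"))
        if not c.get("label"):
            problems.append((text, "missing @label"))
        if MULTIPLE.search(text):
            problems.append((text, "more than one container in one element"))
        if TYPE_WORD.search(text):
            problems.append((text, "container type in content"))
    return problems
```

Treat each hit as a prompt for review; local numbering schemes that use dashes or commas within a single container number will need the patterns adjusted.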

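The mixed-content entries for <physdesc> and <origination> describe text that sits directly inside an element alongside child elements, which AS leaves behind. A minimal sketch that reports such loose text, assuming un-namespaced EAD 2002 parsed with xml.etree.ElementTree; loose_text and mixed_content are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def loose_text(el):
    """Non-whitespace text directly inside el but outside its child elements."""
    parts = [el.text or ""] + [child.tail or "" for child in el]
    return " ".join(p.strip() for p in parts if p.strip())


def mixed_content(root, tag):
    """For each <tag> that has child elements, report any loose text that
    AS ingest would leave behind. Plain-text-only elements are not flagged."""
    found = []
    for el in root.iter(tag):
        text = loose_text(el) if len(el) else ""
        if text:
            found.append(text)
    return found
```

Because the list says all <origination> data must be within child elements, a plain-text <origination> with no children would also lose its content; the len(el) condition skips that case, so it needs a separate check.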

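The three <extent> rules in the list above (a collection-level extent is required, every extent must begin with a number, and measurement types must come from the AS list) can be checked together. In this sketch ALLOWED_TYPES is a placeholder: the real values are the extent types in your ArchivesSpace instance's controlled value lists and differ between installations. It assumes un-namespaced EAD 2002 parsed with xml.etree.ElementTree; extent_problems is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: replace with the extent types configured in your
# ArchivesSpace instance.
ALLOWED_TYPES = {"linear feet", "cubic feet", "items", "volumes", "boxes"}

# A leading number, whitespace, then the measurement type.
EXTENT = re.compile(r"^\s*(\d[\d.,]*)\s+(.+?)\s*$")


def extent_problems(ead_root, allowed=ALLOWED_TYPES):
    """Return (text, problem) pairs for the three extent rules."""
    problems = []
    if ead_root.find("archdesc/did/physdesc/extent") is None:
        problems.append(("", "no collection-level <physdesc><extent>"))
    for ext in ead_root.iter("extent"):
        text = "".join(ext.itertext()).strip()
        if not text[:1].isdigit():
            problems.append((text, "does not begin with a number"))
            continue
        m = EXTENT.match(text)
        if not m:
            problems.append((text, "no measurement type after the number"))
        elif m.group(2).lower() not in allowed:
            problems.append((text, "type not in AS list"))
    return problems
```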

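Several entries in the list above (nested <index>, <indexentry>, and <list>; <defitem> and deflists; <name>; <namegrp>) describe structures that may import but lose data. A sketch that counts them, assuming un-namespaced EAD 2002 parsed with xml.etree.ElementTree; unsupported_structures is a hypothetical name:

```python
import xml.etree.ElementTree as ET

NESTABLE = ("index", "indexentry", "list")


def unsupported_structures(root):
    """Count constructs flagged as losing data on AS ingest.
    Only non-zero counts are returned."""
    counts = {}
    for tag in NESTABLE:
        # Every element that contains a same-tag descendant is counted once.
        counts[f"nested <{tag}>"] = sum(
            1 for el in root.iter(tag) if el.find(f".//{tag}") is not None
        )
    counts["<defitem>"] = len(root.findall(".//defitem"))
    counts['<list type="deflist">'] = len(root.findall(".//list[@type='deflist']"))
    counts["<name>"] = len(root.findall(".//name"))
    counts["<namegrp>"] = len(root.findall(".//namegrp"))
    return {key: n for key, n in counts.items() if n}
```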

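Supplying @normal by hand for every <unitdate> is tedious, and simple expressions can be normalized automatically. The sketch below handles only three hypothetical patterns (a single year, a year range, and a decade such as "1920s") and returns everything else for manual review. It writes ranges in the ISO 8601 start/end form used by EAD's @normal, and assumes un-namespaced EAD 2002 parsed with xml.etree.ElementTree; normal_for and add_normals are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns covering only simple expressions; anything else is
# returned for manual review rather than guessed at.
PATTERNS = [
    (re.compile(r"^(\d{4})$"), lambda m: m.group(1)),
    (re.compile(r"^(\d{4})\s*[-–]\s*(\d{4})$"), lambda m: f"{m.group(1)}/{m.group(2)}"),
    (re.compile(r"^(\d{3})0s$"), lambda m: f"{m.group(1)}0/{m.group(1)}9"),
]


def normal_for(expression):
    """Return an ISO 8601 value for @normal, or None if no pattern fits."""
    text = expression.strip()
    for pattern, build in PATTERNS:
        m = pattern.match(text)
        if m:
            return build(m)
    return None


def add_normals(root):
    """Set @normal where it is missing and derivable; return the expressions
    that still need a value supplied by hand."""
    unresolved = []
    for ud in root.iter("unitdate"):
        if ud.get("normal"):
            continue
        text = "".join(ud.itertext()).strip()
        normal = normal_for(text)
        if normal:
            ud.set("normal", normal)
        else:
            unresolved.append(text)
    return unresolved
```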

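The @certainty entry in the list above can be applied mechanically to dates whose expressions start with a circa marker. This sketch moves the marker out of the text and sets certainty="approximate"; the marker list is a hypothetical starting point, and whether to also keep the word in the displayed expression is a local decision. It assumes un-namespaced EAD 2002 parsed with xml.etree.ElementTree; mark_approximate is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of approximation markers; the lookahead requires a digit
# next, so words that merely start with "ca" (e.g. "Cambridge") are left alone.
CIRCA = re.compile(r"^\s*(?:circa|ca\.?|c\.|approximately|approx\.?)\s*(?=\d)",
                   re.IGNORECASE)


def mark_approximate(root):
    """Move a leading circa marker out of each plain-text <unitdate> into
    certainty="approximate". Returns the number of dates changed."""
    changed = 0
    for ud in root.iter("unitdate"):
        if len(ud):  # child markup: leave for manual review
            continue
        text = ud.text or ""
        m = CIRCA.match(text)
        if m:
            ud.text = text[m.end():]
            ud.set("certainty", "approximate")
            changed += 1
    return changed
```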

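Placeholder dates such as <unitdate>undated</unitdate> can be found and removed with a short script. The placeholder set below is hypothetical, and the function returns what it removed so that someone can decide whether "undated" belongs in the title instead. Removing a component's only date can leave it with neither <unittitle> nor <unitdate>, which AS will refuse, so that check is worth rerunning afterward. The sketch assumes un-namespaced EAD 2002 parsed with xml.etree.ElementTree; drop_placeholder_dates is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder expressions to treat as "no date".
PLACEHOLDERS = {"undated", "n.d.", "no date", "nd"}


def drop_placeholder_dates(root):
    """Remove <unitdate>s whose entire text is a placeholder, keeping any
    text that followed them. Returns the removed texts for review."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = []
    for ud in list(root.iter("unitdate")):
        text = "".join(ud.itertext()).strip()
        if text.lower() not in PLACEHOLDERS:
            continue
        parent = parent_of[ud]
        if ud.tail:  # reattach trailing text to the previous sibling or parent
            idx = list(parent).index(ud)
            if idx:
                prev = parent[idx - 1]
                prev.tail = (prev.tail or "") + ud.tail
            else:
                parent.text = (parent.text or "") + ud.tail
        parent.remove(ud)
        removed.append(text)
    return removed
```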

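For the one-<unitid> and one-<unittitle> entries in the list above, one of the two options is to concatenate the values into a single element. A sketch of that option, assuming un-namespaced EAD 2002 parsed with xml.etree.ElementTree; the "; " separator and the function names are hypothetical choices:

```python
import xml.etree.ElementTree as ET


def merge_repeated(did, tag, sep="; "):
    """Concatenate repeated <tag> children of one <did> into the first.
    Only text survives: attributes of the extra elements and any inline
    markup are dropped, so review the result."""
    items = did.findall(tag)
    if len(items) < 2:
        return False
    texts = ["".join(el.itertext()).strip() for el in items]
    first = items[0]
    for el in items[1:]:
        did.remove(el)
    for child in list(first):
        first.remove(child)
    first.text = sep.join(t for t in texts if t)
    return True


def merge_all(root, tag):
    """Apply merge_repeated to every <did>; return how many were changed."""
    return sum(merge_repeated(did, tag) for did in list(root.iter("did")))
```

Because attributes on the extra elements (such as a <unitid> @type) are discarded, the list's other option, placing the values in separate <c> elements, may be preferable when those attributes matter.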
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> On Behalf Of Ryan Flahive
Sent: Tuesday, January 29, 2019 12:17 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] Import Data Errors

Morning Folks,

I’m new to this group, as my ArchivesSpace server was just set up a couple of days ago.

Due to circumstances too lengthy to describe here, I am manually building this database from EAD files exported from my Archivists’ Toolkit system. My first few imports were successful, but then a few failed with these errors:

Date: one or more required (or enter a Title)
Title: must not be an empty string (or enter a Date)

These records have titles and dates. Can anyone shed some light on how I resolve this issue? Feel free to email me with suggestions!

Thanks!

Ryan S. Flahive
Archivist
INSTITUTE OF AMERICAN INDIAN ARTS
83 Avan Nu Po Road, Santa Fe, NM 87508
P. 505-424-2392
E. rflahive at iaia.edu
www.iaia.edu

IAIA's Mission: To empower creativity and leadership in Native arts and cultures through higher education, lifelong learning, and outreach.
