[Archivesspace_Users_Group] EAD Import trouble with audience/publish

Custer, Mark mark.custer at yale.edu
Wed Apr 6 16:58:06 EDT 2016


Great point, Robin. If the EAD finding aid has audience="external" attached to the ead or archdesc element, then the importer should respect that and publish the finding aid during the import process.  If that attribute is missing, though, or set to "internal" at the collection level, then the finding aid should import as unpublished (which is really just a workaround to deal with the fact that the PUI is so closely tied to the staff interface).

And we can ignore the eadheader section entirely, since ASpace doesn't allow you to unpublish any of those data elements -- they're either there or not.  In other words, what you can import should only match what you can create in the ASpace application.

I also forgot about the finding aid status field, so I'm glad you brought that up, Noah.  It seems like a lot of folks rely on those data elements, and it's possible that the new PUI will make use of those as a way to bucket certain finding aids (e.g. a published finding aid with a certain finding aid status could show up under an "Unprocessed materials" grouping, rather than a "Collection" grouping). I'd think that such an update should be an easy thing to add to the importer, but I haven't looked into it yet, to be honest.  Regardless, it makes sense to add that to the core code, I think, since it's a standard field in ASpace, and it's also included by the ASpace EAD exporter.

There are a host of other things to consider and possibly fix with the importers/exporters (e.g. imported table elements shouldn't be re-exported with a paragraph element surrounding them), but just making a few small changes is going to have the biggest impact, I think.  That said, the import/export process in ASpace is already really good, and in some ways much better than what the AT offered.

So, making a slight update to Noah's summary, how does this look?:

  1.  EAD imports should result in resource records marked as “unpublished” at the collection level unless the ead or archdesc element includes @audience="external"
  2.  The EAD importer should, by default, mark all components and notes as “published” unless EAD tags include @audience="internal"
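
The two rules above can be read as one small decision function. This is only a minimal sketch in Ruby, not the actual ArchivesSpace converter code: the method name publish_on_import? and the :collection / :component / :note level symbols are hypothetical, made up here for illustration.

```ruby
# Hypothetical helper illustrating the two proposed import rules.
# `level` is :collection (the ead/archdesc element), :component, or :note.
# `audience` is the value of the EAD @audience attribute, or nil if absent.
def publish_on_import?(level, audience)
  if level == :collection
    # Rule 1: the collection-level record stays unpublished unless
    # @audience="external" is stated explicitly.
    audience == 'external'
  else
    # Rule 2: components and notes are published unless
    # @audience="internal" is stated explicitly.
    audience != 'internal'
  end
end

puts publish_on_import?(:collection, nil)         # false
puts publish_on_import?(:collection, 'external')  # true
puts publish_on_import?(:component, nil)          # true
puts publish_on_import?(:note, 'internal')        # false
```

Note the asymmetry: a missing attribute means "unpublished" at the collection level but "published" everywhere below it.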

Mark


________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [archivesspace_users_group-bounces at lyralists.lyrasis.org] on behalf of Wendler, Robin King [robin_wendler at harvard.edu]
Sent: Wednesday, April 06, 2016 4:19 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

I am so glad to see this being discussed again. Quick point regarding #1 (collection-level) below:  While it makes sense that people wouldn’t always want imported resources immediately published, an institution migrating large numbers of finding aids (<6,000 in our case) won’t want to have to publish each resource manually. The last time I tried to get around the currently incorrect handling of audience/publish in the importer by supplying explicit audience="external" values, the importer failed outright. If the fix could respect an explicit audience="external" attribute at the collection level, we could pre-process the finding aids to add that.
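
The pre-processing step mentioned above could look roughly like the following. This is a sketch under assumptions, using Ruby's REXML library: the method name mark_archdesc_external is hypothetical, the sample document is invented, and it only helps once the importer respects the attribute at the collection level.

```ruby
require 'rexml/document'

# Hypothetical pre-processing helper: returns the EAD XML with
# audience="external" set on <archdesc>. An audience value that is
# already present (e.g. "internal") is left untouched.
def mark_archdesc_external(ead_xml)
  doc = REXML::Document.new(ead_xml)
  doc.root.each_element do |child|
    next unless child.name == 'archdesc'
    child.add_attribute('audience', 'external') unless child.attributes['audience']
  end
  doc.to_s
end

# Invented sample input, for illustration only.
sample = '<ead><eadheader/><archdesc level="collection"><did/></archdesc></ead>'
puts mark_archdesc_external(sample)
```

Leaving an existing value alone is a deliberate choice here, so that a finding aid already marked "internal" is not published by accident.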

Robin

Robin Wendler
Library Technology Services
Harvard University
90 Mt. Auburn St.
Cambridge, MA 02138
617-495-3724
r_wendler at harvard.edu



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Cobourn, Alston
Sent: Wednesday, April 06, 2016 3:47 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Thank you Mark and Noah for giving this so much thought.  As someone with a public interface, I agree that the suggested changes make a lot of sense.

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Noah Huffman
Sent: Wednesday, April 06, 2016 11:02 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Mark, I tend to agree with your suggested behavior now that you’ve laid out the logic so well:)

So, if I read it right, you are suggesting these changes to the current EAD importer behavior?:

1.       EAD imports should result in resource records marked as “unpublished” at the collection level

2.       The EAD importer should, by default, mark all components and notes as “published” unless EAD tags include @audience="internal"

Because we currently don’t use the public interface at Duke, I hadn’t thought through the consequences of publishing everything by default on import.

To your last point (#3), we’ve definitely experienced issues with the “publish all” button unintentionally publishing components at lower levels that had been selectively marked as ‘internal only’ for various reasons.  On this point, I think it would be nice to have some indication in the staff interface--at the collection level---that a resource contains some components or notes at lower levels that are not marked as published.  It’s impractical to click open every component to see if it’s published or not.

Also, just to add one more issue to the mix, the importer currently ignores eadheader/@findingaidstatus values.  See: https://archivesspace.atlassian.net/browse/AR-1282

We currently use a local vocabulary in the Finding Aid Status field to help manage our workflows.  For example, we use values like “published,” “completed,” “temp_record,” and “placeholder_record” to help identify resource records in various stages of completion.  These values also appear as facets in browse and search results.

-Noah


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Wednesday, April 06, 2016 10:15 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

And Noah, all:

Here’s what the EAD 2002 tag library states about the audience attribute:

AUDIENCE -- An attribute that helps control whether the information contained in the element should be available to all viewers or only to repository staff. Available for all elements except line break <lb>. The audience attribute can be set to "external" in <archdesc> to allow access to all the information about the materials being described in the finding aid, but specific elements within <archdesc> can be set to "internal" to reserve that information for repository access only. This feature is intended to assist application software in restricting access to particular information by explicitly coding data that is potentially sensitive or may otherwise have a limited audience. Special software capability may be needed, however, to prevent the export of an element marked "internal" when a whole finding aid is displayed in a networked environment. Values are:
•         external (default value)
•         internal
Because of that, I think the community has already made that decision (and that’s how the AT EAD importer behaved; anyone know what the Archon importer did?).

That said, I still think it would be wise, as I mentioned previously, if the ASpace importer would always set an imported finding aid to be unpublished due to the fact that the ASpace staff interface and public interface are so tightly coupled together.  Plus, it’s quite easy to hit “publish” after the fact, once you’re really ready for things to go live.



From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Wednesday, 06 April, 2016 9:59 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Thanks to you both.  I’ll try to dig into the importer code next week.

I’m not sure if this is entirely an “if the community agrees” type of change, though.  It seems to me that this is just following the EAD standard (i.e. respecting the audience attribute rather than ignoring it, at least almost all of the time, which I’ll explain below).

If I understand correctly, the current EAD importer ignores the audience attribute entirely.  Instead, it takes the following three actions by default, regardless of what the metadata says:


1.       Publish the finding aid

2.       Publish all components of the finding aid

3.       Unpublish all notes in the finding aid

Is that right?  (I haven’t tested this in a while, so I’m not entirely sure)

If so, I think that each of those three actions needs to be reconsidered, and here’s why:


1.       If you’re using the ASpace PUI, do you really want a newly imported EAD finding aid to be added immediately to the PUI?  My hunch is no, for the following reasons:

a.       You probably want to review the record first.

b.      You might have to update some of the metadata, depending on the state of the EAD importer (I’m primarily thinking of how those physdesc elements are interpreted upon import).

c.       You might be importing a partial finding aid, like a new series, that you plan to merge with a pre-existing finding aid.  In that case, this newly-imported file is only meant to be temporary, so having that indexed and showing up as a new finding aid in the PUI would be problematic to say the least.
So, this is the one and only place where I’d prefer the EAD importer to ignore the EAD metadata.  By default, all imported resources (whether MARC or EAD) should be set to unpublished.  Setting the entire finding aid to published by default is only going to cause problems, it seems.  Plus, once the finding aid is truly ready to be published, all that you have to do is select the publish checkbox and hit save.  Simple and safe.

2.       What if you have 150 of 2000 components in an EAD file set to internal only?  Since the current importer sets every component to be published, you’d have to unpublish those 150 components after an import, which would be extremely problematic, especially because those components will probably show up in the PUI before you have a chance to update the records.  Much simpler would be if the importer would unpublish the 150 components (the ones that have an audience attribute set to “internal”), and publish the other 1850 components (the ones that either don’t have an audience attribute, or the ones that have an audience attribute set to “external”).

3.       This is the strangest default option, I think.  Why do the components get published by default, but the notes get unpublished?  Just like with action 2, I think that the EAD importer should respect the audience attribute here in the EAD source file.  We definitely have finding aids where a few notes are marked as internal, but the bulk are set to be published.  Because of this, the “publish all” button is actually very dangerous in those cases where the entire contents of a finding aid are not intended to be public.
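
To put numbers on the scenario in point 2, here is a small worked tally in Ruby. It is only an illustration: the data is invented (the split between "external" and missing attributes is arbitrary), and the "current" line simply restates the behavior as described in this message, which has not been re-tested.

```ruby
# Hypothetical data: 2000 components, 150 of them flagged audience="internal".
# nil stands for a component with no audience attribute at all.
audiences = Array.new(150, 'internal') + Array.new(1000, 'external') + Array.new(850, nil)

# Behavior as described in this thread: every component is published.
current_published = audiences.size

# Proposed behavior: publish unless the attribute is explicitly "internal".
proposed_published = audiences.count { |a| a != 'internal' }

puts "current:  #{current_published} published, #{audiences.size - current_published} unpublished"
puts "proposed: #{proposed_published} published, #{audiences.size - proposed_published} unpublished"
# current:  2000 published, 0 unpublished
# proposed: 1850 published, 150 unpublished
```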

What do others think?

Mark


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Galligan, Patrick
Sent: Wednesday, 06 April, 2016 9:09 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Noah,

Yeah, in my experience the EAD importer modifications that we made should apply to all levels on import.  As you called out, though, you have to make some more tweaks for different types of notes; it’s pretty easy to find those in the importer file.

Patrick Galligan
Rockefeller Archive Center
Assistant Digital Archivist
914-366-6386

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Noah Huffman
Sent: Wednesday, April 06, 2016 9:06 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Mark,

I’m not aware of any pull requests for modifying the EAD converter.  I think it’d be nice to include in the core code assuming that everyone (the community) agrees that publishing everything by default is the expected behavior.

I’m pretty sure the changes I made will apply to notes at any level (component or collection), but I could be wrong and I haven’t tested extensively.  It’s also possible I missed some notes.  I just took the example code that Patrick shared and applied it to some other locations (for both single and multi-part notes).

-Noah

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Tuesday, April 05, 2016 4:38 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

This is great, Noah (and Patrick!).  Do you know if anyone created a pull request yet to put this into the core code?  If not, I’ll be happy to do that at some point next week since it would make the most sense for this type of update to live in the core code.

Also, did you (or anyone else) modify the importer so that @audience="internal" was respected at the collection and component level?  I seem to remember that the importer always publishes the components, but it’s been a while since I looked at what happens there.

Mark


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Noah Huffman
Sent: Tuesday, 05 April, 2016 4:06 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish

Hi Jane,

I noticed that Patrick G. from the Rockefeller Archives shared a solution to this same issue back on Nov. 16 (Subject line: Audience Question).

Basically, you can modify the EAD converter that Chris mentions to set all imported notes to ‘published’ by default unless EAD elements have @audience="internal".

You can set up the modified EAD converter as a plugin.

I needed to do this locally, so I went ahead and wrote a plugin.

Here is the modified EAD converter: https://github.com/noahgh221/archivesspace-duke-plugins/blob/master/plugins/duke-ead-importer/backend/converters/duke_ead_converter.rb

Here’s how to set up the plugin:

1.       Download the modified EAD converter file above (click ‘Raw’, save as, etc.)

2.       Save the file to the plugins directory in your ASpace application files: plugins/duke-ead-importer/backend/converters/duke_ead_converter.rb (you’ll need to create the duke-ead-importer/backend/converters/ directory structure)

3.       Open the config/config.rb file, add the name of the plugin to the line that begins AppConfig[:plugins], and save it.  The name must match the plugin’s directory name under plugins/.  For example: AppConfig[:plugins] = ['local', 'duke-ead-importer']

4.       Restart the archivesspace application (archivesspace/archivesspace.sh or archivesspace/archivesspace.bat on Windows)

5.       Start a new EAD import job

For more info on how plugins work, see: http://archivesspace.github.io/archivesspace/user/archivesspace-plug-ins-readme/

I ran a couple of EAD import tests using this plugin and it seems to publish all imported notes by default unless EAD tags include @audience="internal".  Still, I’d test this before importing all your EADs to production.

Hope this helps.

-Noah

================
Noah Huffman
Archivist for Metadata, Systems, and Digital Records
David M. Rubenstein Rare Book & Manuscript Library
Duke University

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Chris Fitzpatrick
Sent: Tuesday, April 05, 2016 2:45 AM
To: archivesspace_users_group at lyralists.lyrasis.org
Subject: Re: [Archivesspace_Users_Group] EAD Import trouble with audience/publish




Hi,



Yes, it looks like the EAD converter <https://github.com/archivesspace/archivesspace/blob/master/backend/app/converters/ead_converter.rb#L108> only looks at @audience for archdesc and c tags.



b,chris.


Chris Fitzpatrick | Developer, ArchivesSpace
Skype: chrisfitzpat  | Phone: 918.236.6048
http://archivesspace.org/

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org on behalf of Jane LaBarbara <jane.labarbara at mail.wvu.edu>
Sent: Monday, April 04, 2016 8:34 PM
To: archivesspace_users_group at lyralists.lyrasis.org
Subject: [Archivesspace_Users_Group] EAD Import trouble with audience/publish


Dear all,



I’m using legacy EAD XML to test ArchivesSpace’s EAD Import in preparation for a mass migration to ArchivesSpace.  I’ve noticed that I can add the attribute audience="external" to <archdesc> and all of my “notes” (e.g. <relatedmaterial>, <bioghist>, and many more), but the resulting Resource Record is only published at the highest level; the notes themselves are not published.  Once we migrate, I’m guessing this will mean that staff will have to go into each Resource Record and click the “publish” checkboxes for everything we need to show up in the public view, and we have over 4000 collections.



Has anyone else noticed that the EAD Import does not respect audience="external", or found a solution or workaround for this?



Thanks,

Jane



Jane Metters LaBarbara

Assistant Curator

West Virginia & Regional History Center <https://wvrhc.lib.wvu.edu/>

West Virginia University Libraries

Reference desk: 304-293-3536

Office phone: 304-293-0352

Email: jane.labarbara at mail.wvu.edu

