[Archivesspace_Users_Group] Mass export of EAD

Kevin Clair Kevin.Clair at du.edu
Wed Jul 29 09:31:26 EDT 2015


Our batch EAD export takes about the same amount of time (~850 resources, but some of them have tens of thousands of linked archival objects and subjects). Most of our EAD generates pretty quickly, but it slows down considerably as the number of links it has to resolve increases.  -k
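If most of the time goes to a handful of heavily linked resources, as Kevin describes, timing each request is a quick way to find them. The sketch below assumes a POSIX shell with curl and GNU coreutils. ASPACE_URL (for example the backend address http://localhost:8089 used later in this thread), REPO_ID and TOKEN are variables you set yourself. time_ead and slowest are hypothetical helper names.

```shell
# Sketch: time each EAD export to find the slowest resources.
# ASPACE_URL, REPO_ID and TOKEN are assumed to be set by the caller.

# time_ead ID: fetch one resource's EAD, discard the body and print
# "ID seconds" using curl's total transfer time.
time_ead() {
  local id="$1" secs
  secs=$(curl -s -o /dev/null -w '%{time_total}' \
    -H "X-ArchivesSpace-Session: ${TOKEN}" \
    "${ASPACE_URL}/repositories/${REPO_ID}/resource_descriptions/${id}.xml")
  printf '%s %s\n' "$id" "$secs"
}

# slowest N: read "ID seconds" lines on stdin and print the N slowest,
# sorting the second field numerically in the C locale.
slowest() {
  LC_ALL=C sort -k2,2 -g -r | head -n "$1"
}
```

For example, run while read -r id; do time_ead "$id"; done < ids.txt > timings.txt and then slowest 10 < timings.txt, where ids.txt is a hypothetical file with one resource ID per line.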
________________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [archivesspace_users_group-bounces at lyralists.lyrasis.org] on behalf of Noah Huffman [noah.huffman at duke.edu]
Sent: Wednesday, July 29, 2015 7:19 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

Hi all,

It typically takes me about 48 hours to batch export EAD for ~3400 resource records using the API. Some of these records are quite large, but have others experienced similar batch export times?

Should all of the mass export methods discussed take roughly the same amount of time or would Mark’s ead_export.sh script perform faster?  I’m just wondering if there is any way I could speed this up.
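One possible way to cut wall-clock time is to run several exports at once rather than strictly one after another. Treat this as a sketch, not a tested recommendation. The EAD is generated by the ArchivesSpace backend, so concurrent requests compete for the same server CPU and database, and whether this helps depends on how much spare capacity the server has. It assumes bash, curl and GNU xargs. ASPACE_URL, REPO_ID and TOKEN are variables you set and export. export_one and parallel_map are hypothetical names.

```shell
# Sketch: run several EAD exports concurrently.
# ASPACE_URL, REPO_ID and TOKEN must be exported so that the child
# shells started by xargs can see them.

# export_one ID: save one resource's EAD as ead_ID.xml. With -f, curl
# exits non-zero on HTTP errors (e.g. a missing ID) instead of saving
# the error response.
export_one() {
  local id="$1"
  curl -sf -H "X-ArchivesSpace-Session: ${TOKEN}" -o "ead_${id}.xml" \
    "${ASPACE_URL}/repositories/${REPO_ID}/resource_descriptions/${id}.xml" \
    || echo "export failed for resource ${id}" >&2
}

# parallel_map JOBS CMD...: read one ID per line on stdin and run
# "CMD... ID" for each, at most JOBS at a time (GNU xargs; -r skips
# running CMD when stdin is empty).
parallel_map() {
  local jobs="$1"
  shift
  xargs -r -n 1 -P "$jobs" "$@"
}
```

In bash, run export ASPACE_URL REPO_ID TOKEN; export -f export_one; parallel_map 4 bash -c 'export_one "$1"' _ < ids.txt. It seems prudent to start with a small job count (2 to 4) and watch server load.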

Thanks,
-Noah

================
Noah Huffman
Archivist for Metadata and Encoding
David M. Rubenstein Rare Book & Manuscript Library
Duke University


From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of brian
Sent: Wednesday, July 29, 2015 12:12 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

It seems to me like it might not be a great idea to change the business rules for how a resource record's mtime gets updated, but that it wouldn't be too hard to add a new field to the resource that tracks the last component update. It also seems like it wouldn't be too hard for the services under discussion to query for component mtimes as well as resource mtimes.
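Until something like that exists in ArchivesSpace itself, a client-side workaround is to look for changed components and map them back to their resources. The sketch below makes two assumptions. First, the archival_objects list endpoint accepts all_ids and modified_since the same way the resources endpoint does; Mark only reports testing the latter. Second, each archival object's JSON links to its resource with a ref of the form /repositories/N/resources/M. ASPACE_URL, REPO_ID, TOKEN, changed_resource_ids and resource_ids_from_refs are hypothetical names.

```shell
# Sketch: find resources whose components changed since a timestamp.
# ASSUMPTIONS: the archival_objects list endpoint accepts all_ids and
# modified_since like the resources endpoint, and each archival object's
# JSON contains a ref such as "/repositories/2/resources/17".
# ASPACE_URL, REPO_ID and TOKEN are assumed to be set by the caller.

# resource_ids_from_refs: read JSON text on stdin and print each distinct
# resource ID found in "/repositories/N/resources/M" refs, sorted.
resource_ids_from_refs() {
  grep -o '"/repositories/[0-9]*/resources/[0-9]*"' \
    | sed 's|.*/resources/\([0-9]*\)"|\1|' \
    | sort -un
}

# changed_resource_ids SINCE: list archival objects modified after SINCE
# (Unix seconds), fetch each one and map it back to its resource.
changed_resource_ids() {
  local since="$1" id
  curl -s -H "X-ArchivesSpace-Session: ${TOKEN}" \
    "${ASPACE_URL}/repositories/${REPO_ID}/archival_objects?all_ids=true&modified_since=${since}" \
    | tr -cs '0-9' '\n' \
    | while read -r id; do
        [ -n "$id" ] || continue
        curl -s -H "X-ArchivesSpace-Session: ${TOKEN}" \
          "${ASPACE_URL}/repositories/${REPO_ID}/archival_objects/${id}"
        echo
      done \
    | resource_ids_from_refs
}
```

This only catches edited components. You would combine its output with the resource-level modified_since list, and deleted components leave nothing to query. It also makes one request per changed component, so it is only cheap when few components have changed.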



-------- Original message --------
From: "Arnold, Hillel" <harnold at rockarch.org>
Date: 07/28/2015 4:23 PM (GMT-05:00)
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Cc:
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

Hi Mark,
Yup, you’re absolutely right. I made the (erroneous) assumption that changes to mtimes for descendant components would propagate in the resource record as well. This seems like something that would be best done in AS itself; I’m wondering if Brian or Chris have any thoughts about how this could be accomplished?

Hillel Arnold
Lead Digital Archivist
Rockefeller Archive Center

From: Mark Cooper <mark.cooper at lyrasis.org>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Tuesday, July 28, 2015 at 3:41 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD


In case you're interested, but haven't seen it, there is a doc for the export script:

https://github.com/archivesspace/archivesspace/blob/master/launcher/ead_export/REPO_EAD_EXPORT_README.md

Right now it just exports every EAD associated with a specified repo to a zip file and doesn't have date or incremental awareness. That could be added, as the resources endpoint accepts a "modified_since" parameter (as a Unix timestamp). I did a rough test:

date -d '2015-07-01 00:00:00' +'%s' # 1435734000
curl -H "X-ArchivesSpace-Session: $TOKEN" "http://localhost:8089/repositories/2/resources?all_ids=true&modified_since=1435734000"

Returns what appears to be the correct set of results. The obvious problem is that it isn't descendant aware, so only direct changes to the topmost resource record count for the "modified_since" parameter. If the API also factored in descendant mtimes for record types that have them, that would have been ideal =) Some workaround, or a solution, for that limitation is going to be required for any time-based incremental export (assuming you need any descendant / component updates to be considered as an update to the resource for what you're doing; in other words, you may not be able to rely on the resource mtime alone).
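One detail worth checking in the date example: date -d reads the date in the machine's local time zone, so the same string gives different timestamps on different servers. 1435734000 is 2015-07-01 07:00 UTC, i.e. midnight in a zone seven hours behind UTC. The small sketch below pins the conversion to UTC and builds the listing URL Mark tested. It assumes GNU date. ASPACE_URL, REPO_ID and the function names are hypothetical.

```shell
# to_epoch_utc "YYYY-MM-DD HH:MM:SS": Unix seconds for that date read as
# UTC (GNU date). Without -u, date uses the local time zone instead.
to_epoch_utc() {
  date -u -d "$1" +%s
}

# modified_resources_url SINCE: listing URL for the IDs of resources in
# REPO_ID whose own mtime is after SINCE. As Mark notes, changes to
# components alone do not count. ASPACE_URL and REPO_ID are assumed set.
modified_resources_url() {
  printf '%s/repositories/%s/resources?all_ids=true&modified_since=%s\n' \
    "$ASPACE_URL" "$REPO_ID" "$1"
}
```

For example, run curl -s -H "X-ArchivesSpace-Session: $TOKEN" "$(modified_resources_url "$(to_epoch_utc '2015-07-01 00:00:00')")". Drop -u if you really mean midnight in the server's local zone.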

Mark Cooper
Technical Lead, Hosting and Support
LYRASIS
email: mark.cooper at lyrasis.org
skype: mark_c_cooper

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Suda, Phillip J <psuda1 at tulane.edu>
Sent: Tuesday, July 28, 2015 9:13 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD


Thanks all for your suggestions/scripts/help. This is a great start.

Thank you,

Phil

Phillip Suda
Systems Librarian
Howard-Tilton Memorial Library
Tulane University
psuda1 at tulane.edu
504-865-5607

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Kevin Clair
Sent: Tuesday, July 28, 2015 10:41 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD



I have a Perl script I run from command line that runs every batch export I want or need at this point: https://github.com/duspeccoll/as_utils/blob/master/reports.pl



It grabs the JSON list of all the IDs for a given model, and then either dumps everything into a single JSON object or exports to some other format. The EAD export is lines 206-224. This is *extremely* customized for our environment, and I’ve made no effort yet to modify it for general use, but it’s an idea of how one could go about doing this.  -k
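Kevin's script is Perl, but the same pattern (fetch the all_ids list for a model, then walk it) can be sketched in shell. This version makes three assumptions. The list comes back as a flat JSON array of integers such as [1, 2, 5]. Per-record JSON is at /repositories/REPO_ID/MODEL/ID. ASPACE_URL, REPO_ID and TOKEN are set. parse_ids and dump_json are hypothetical names, not part of his script.

```shell
# parse_ids: turn a JSON array of integer IDs, e.g. [1, 2, 5], into one
# ID per line by treating every run of non-digits as a separator.
parse_ids() {
  tr -cs '0-9' '\n' | grep -v '^$'
}

# dump_json MODEL: print every MODEL record in REPO_ID (e.g. "resources")
# as a single JSON array. Records that fail to download are skipped.
# ASPACE_URL, REPO_ID and TOKEN are assumed to be set by the caller.
dump_json() {
  local model="$1"
  printf '['
  curl -s -H "X-ArchivesSpace-Session: ${TOKEN}" \
    "${ASPACE_URL}/repositories/${REPO_ID}/${model}?all_ids=true" \
    | parse_ids \
    | {
        sep=''
        while read -r id; do
          if rec=$(curl -sf -H "X-ArchivesSpace-Session: ${TOKEN}" \
              "${ASPACE_URL}/repositories/${REPO_ID}/${model}/${id}"); then
            printf '%s%s' "$sep" "$rec"
            sep=','
          fi
        done
      }
  printf ']\n'
}
```

For example, dump_json resources > resources.json would collect every resource record into one array. The inner loop runs in a pipeline subshell, so sep, id and rec do not leak into the caller.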



From: archivesspace_users_group-bounces at lyralists.lyrasis.org<mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org> [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
Sent: Tuesday, July 28, 2015 9:34 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD





There is ead_export.sh in the scripts directory.

It only exports published collections, but that can be changed in the code if needed.

That script runs locally on the AS server and it writes into the archivesspace/data/ directory, so you need write access.



Resource IDs will not necessarily be sequential after deletions and transfers, but you can get a JSON list of all of the IDs from /repositories/$REPO_ID/resources?all_ids=true and then loop over those IDs.
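Following Steve's suggestion, a loop over the real IDs avoids guessing at gaps. The sketch below assumes ASPACE_URL, REPO_ID and TOKEN are set and that IDs arrive one per line on stdin. ead_url and export_ids are hypothetical names. The resource_descriptions endpoint is the one from Alex's message quoted further down.

```shell
# ead_url ID: backend URL for one resource's EAD.
# ASPACE_URL and REPO_ID are assumed to be set by the caller.
ead_url() {
  printf '%s/repositories/%s/resource_descriptions/%s.xml\n' \
    "$ASPACE_URL" "$REPO_ID" "$1"
}

# export_ids: read resource IDs (one per line) on stdin, save each EAD as
# ead_ID.xml and report failures on stderr. Returns non-zero if any failed.
export_ids() {
  local id failed=0
  while read -r id; do
    [ -n "$id" ] || continue
    if ! curl -sf -H "X-ArchivesSpace-Session: ${TOKEN}" \
        -o "ead_${id}.xml" "$(ead_url "$id")"; then
      echo "failed: resource ${id}" >&2
      failed=$((failed + 1))
    fi
  done
  echo "${failed} export(s) failed" >&2
  [ "$failed" -eq 0 ]
}
```

For example, run curl -s -H "X-ArchivesSpace-Session: $TOKEN" "$ASPACE_URL/repositories/$REPO_ID/resources?all_ids=true" | tr -cs '0-9' '\n' | export_ids. The empty first line that tr emits for the opening bracket is skipped.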



— Steve Majewski





On Jul 28, 2015, at 11:15 AM, Alexander Duryee <alexanderduryee at nypl.org> wrote:



Phil,

As far as I'm aware, there's no bulk EAD export functionality in ASpace.  However, since ASpace's resource identifiers are sequential integers, you can loop over each resource id in a repository and make an API call for its EAD record:

for x in {first..last}; do curl -H 'X-ArchivesSpace-Session: [session token]' "https://[address]/repositories/[id]/resource_descriptions/${x}.xml" > aspace_${x}.xml; done

A loop like that should generate EAD records for each resource in your repository.
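The loop assumes you already hold a session token. The sketch below shows one way to get one from the backend's login endpoint, which accepts POST /users/USERNAME/login with a password parameter and returns JSON containing a "session" value. It also shows a variant of the loop with variable bounds. In bash, brace expansion happens before variable expansion, so {$first..$last} does not expand to a range, but seq does. curl -f makes IDs that do not exist (see Steve's note about gaps) fail instead of saving an error response. ASPACE_URL, REPO_ID, TOKEN and the function names are hypothetical, and the JSON parsing assumes the login response is on one line.

```shell
# session_from_login: read the JSON returned by POST /users/USERNAME/login
# and print its "session" value (assumes the JSON is on one line).
session_from_login() {
  sed -n 's/.*"session" *: *"\([^"]*\)".*/\1/p'
}

# aspace_login USERNAME PASSWORD: print a session token for later requests.
# ASPACE_URL (the backend, e.g. http://localhost:8089) is assumed to be set.
aspace_login() {
  curl -s -F "password=$2" "${ASPACE_URL}/users/$1/login" | session_from_login
}

# export_range FIRST LAST: Alex's loop with variable bounds. Brace expansion
# runs before variable expansion in bash, so {$first..$last} would not become
# a range; seq does. With -f, IDs that return an HTTP error are skipped.
export_range() {
  local x
  for x in $(seq "$1" "$2"); do
    curl -sf -H "X-ArchivesSpace-Session: ${TOKEN}" -o "aspace_${x}.xml" \
      "${ASPACE_URL}/repositories/${REPO_ID}/resource_descriptions/${x}.xml" \
      || echo "skipped resource ${x}" >&2
  done
}
```

For example, run TOKEN=$(aspace_login admin 'your-password'); export_range 1 3400. The username, password and range here are placeholders.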

Regards,

--Alex



On Tue, Jul 28, 2015 at 10:27 AM, Suda, Phillip J <psuda1 at tulane.edu> wrote:

Greetings all,

Is there an API or mass export feature for exporting all EAD records from a repository, etc.? I am only seeing a collection-level export feature.

Thanks,

Phil

Phillip Suda
Systems Librarian
Howard-Tilton Memorial Library
Tulane University
psuda1 at tulane.edu
504-865-5607



_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group



--
Alexander Duryee
Metadata Archivist
New York Public Library
(917)-229-9590
alexanderduryee at nypl.org
