[Archivesspace_Users_Group] Mass export of EAD

Arnold, Hillel harnold at rockarch.org
Tue Aug 4 12:11:16 EDT 2015


Hi,
Just wanted to follow up on this thread. It appears that changing an agent or subject record associated with a resource or archival_object changes the modified date for that resource or archival_object. At least, when using the “modified_since” parameter on both the resources and archival_objects endpoints of the AS API, those resources/archival_objects are returned.

Can someone who’s more familiar with the business logic for the “modified_since” parameter confirm this? I don’t see this parameter documented anywhere in the AS API documentation, so I kind of feel like I’m shooting in the dark.

Thanks!

Hillel Arnold
Lead Digital Archivist
Rockefeller Archive Center

From: Hillel Arnold <harnold at rockarch.org>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Wednesday, July 29, 2015 at 6:48 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

So I did a bit more looking into this today and apparently the archival_objects endpoint accepts the modified_since parameter as well, so, as Brian suggests, it’s really easy to query for resource and component mtimes.

I tweaked my incremental export script so it looks in both places now:
https://github.com/RockefellerArchiveCenter/scripts/blob/master/archivesspace/asExportIncremental.py
I haven’t tried, but I’m assuming you could use this same pattern to look for modified agents/subjects etc., if you wanted to.
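
To illustrate the pattern (this is not the linked script), here is a minimal Python sketch of merging the two result sets into one list of resources to re-export. It assumes each modified archival object has already been fetched as JSON and carries a parent reference of the form `/repositories/2/resources/123`. The function name and the sample records are hypothetical.

```python
def resources_to_export(modified_resource_ids, modified_components):
    """Return the sorted ids of resources needing a fresh EAD export.

    modified_resource_ids: ids returned by the resources endpoint.
    modified_components: archival object records (dicts) fetched for the
    ids returned by the archival_objects endpoint.
    """
    ids = set(modified_resource_ids)
    for component in modified_components:
        # Assumption: the parent resource is referenced as
        # {"resource": {"ref": "/repositories/2/resources/123"}}.
        ref = component.get("resource", {}).get("ref", "")
        tail = ref.rstrip("/").rsplit("/", 1)[-1]
        if tail.isdigit():
            ids.add(int(tail))
    return sorted(ids)


# Hypothetical sample data, for illustration only.
sample_components = [
    {"uri": "/repositories/2/archival_objects/10",
     "resource": {"ref": "/repositories/2/resources/7"}},
    {"uri": "/repositories/2/archival_objects/11",
     "resource": {"ref": "/repositories/2/resources/3"}},
]
print(resources_to_export([3, 5], sample_components))  # prints [3, 5, 7]
```

Using a set means a resource that was edited directly and also had a component edited is exported only once.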

Also, the other EAD export script that Patrick sent around the other day had an issue with exporting large resources, which I’ve now fixed by streaming the response:
https://github.com/RockefellerArchiveCenter/scripts/blob/master/archivesspace/asExport-ead.py
However, if you’re going to do a mass export of all the EAD from your repo it probably makes sense to just use the ead_export script that comes with AS. It’s way more robust and was written by people who actually know what they’re doing.
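
For readers unfamiliar with the idea of streaming a response, the following is a rough standard-library sketch, not the code in the linked script. It copies a readable binary stream to disk in chunks instead of loading the whole EAD file into memory. The `save_stream` helper and the fake response body are hypothetical stand-ins.

```python
import io
import os
import shutil
import tempfile


def save_stream(source, path, chunk_size=64 * 1024):
    """Copy a readable binary stream to `path` in fixed-size chunks."""
    with open(path, "wb") as target:
        # copyfileobj reads and writes chunk_size bytes at a time, so the
        # whole document never has to sit in memory at once.
        shutil.copyfileobj(source, target, chunk_size)
    return os.path.getsize(path)


# Stand-in for an HTTP response body; the object returned by
# urllib.request.urlopen() could be passed as `source` in the same way.
fake_response = io.BytesIO(b"<ead>" + b"x" * 200_000 + b"</ead>")
with tempfile.TemporaryDirectory() as tmp:
    size = save_stream(fake_response, os.path.join(tmp, "resource.xml"))
    print(size)  # number of bytes written
```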

Hillel

From: brian <brianjhoffman at gmail.com>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Wednesday, July 29, 2015 at 12:12 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

It seems to me like it might not be a great idea to change the business rules for how a resource record's mtime gets updated, but that it wouldn't be too hard to add a new field to the resource that tracks the last component update. It also seems like it wouldn't be too hard for the services under discussion to query for component mtimes as well as resource mtimes.




-------- Original message --------
From: "Arnold, Hillel" <harnold at rockarch.org>
Date: 07/28/2015 4:23 PM (GMT-05:00)
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Cc:
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

Hi Mark,
Yup, you’re absolutely right. I made the (erroneous) assumption that changes to mtimes for descendant components would propagate in the resource record as well. This seems like something that would be best done in AS itself; I’m wondering if Brian or Chris have any thoughts about how this could be accomplished?

Hillel Arnold
Lead Digital Archivist
Rockefeller Archive Center

From: Mark Cooper <mark.cooper at lyrasis.org>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Tuesday, July 28, 2015 at 3:41 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD


In case you're interested, but haven't seen it, there is a doc for the export script:

https://github.com/archivesspace/archivesspace/blob/master/launcher/ead_export/REPO_EAD_EXPORT_README.md

Right now it just exports every EAD associated with a specified repo to a zip file and doesn't have date or incremental awareness. That could be added as the resources endpoint accepts a "modified_since" parameter (as a timestamp). I just rough tested:

date -d '2015-07-01 00:00:00' +'%s' # 1435734000
curl -H "X-ArchivesSpace-Session: $TOKEN" "http://localhost:8089/repositories/2/resources?all_ids=true&modified_since=1435734000"

Returns what appears to be the correct set of results. The obvious problem is that it isn't descendent aware, so it's only direct changes to the topmost resource record that count for the "modified_since" parameter. If the api also factored in descendent mtimes for records types that have them that would have been ideal =) Some workaround, or a solution, for that limitation is going to be required for any time based incremental type export (assuming you need any descendent / component updates to be considered as an update to the resource for what you're doing -- in other words, you may not be able to just rely on the resource mtime).
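
As a rough illustration of the two commands above, the Python sketch below computes the epoch timestamp and builds the query URL. The helper names are hypothetical. Note that `date -d` without a zone uses the machine's local time zone, and 1435734000 corresponds to midnight on 2015-07-01 at UTC-7, so the offset is passed explicitly here. The request itself would still need the X-ArchivesSpace-Session header.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def epoch_seconds(year, month, day, utc_offset_hours=0):
    """Midnight on the given date, at the given UTC offset, as epoch seconds."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


def modified_since_url(base_url, repo_id, record_type, since):
    """Build an all_ids query limited to records modified since `since`."""
    query = urlencode({"all_ids": "true", "modified_since": since})
    return f"{base_url}/repositories/{repo_id}/{record_type}?{query}"


since = epoch_seconds(2015, 7, 1, utc_offset_hours=-7)
print(since)  # prints 1435734000
print(modified_since_url("http://localhost:8089", 2, "resources", since))
```

Passing "archival_objects" as the record type builds the equivalent component query, which is one way to work around the resource mtime not reflecting component edits.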

Mark Cooper
Technical Lead, Hosting and Support
LYRASIS
email: mark.cooper at lyrasis.org
skype: mark_c_cooper

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Suda, Phillip J <psuda1 at tulane.edu>
Sent: Tuesday, July 28, 2015 9:13 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD


Thanks all for your suggestions/scripts/help. This is a great start.

Thank you,

Phil

Phillip Suda
Systems Librarian
Howard-Tilton Memorial Library
Tulane University
psuda1 at tulane.edu
504-865-5607

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Kevin Clair
Sent: Tuesday, July 28, 2015 10:41 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

I have a Perl script I run from command line that runs every batch export I want or need at this point: https://github.com/duspeccoll/as_utils/blob/master/reports.pl

It grabs the JSON list of all the IDs for a given model, and then either dumps everything into a single JSON object or exports to some other format. The EAD export is lines 206-224. This is *extremely* customized for our environment, and I’ve made no effort yet to modify it for general use, but it’s an idea of how one could go about doing this.  -k

From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Steven Majewski
Sent: Tuesday, July 28, 2015 9:34 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] Mass export of EAD

There is ead_export.sh in the scripts directory.
It only exports published collections, but that can be changed in the code if needed.
That script runs locally on the AS server and it writes into the archivesspace/data/ directory, so you need write access.

Resource ids will not necessarily be sequential after deletions and transfers, but you can get a JSON list of all of the ids from /repositories/$REPO_ID/resources?all_ids=true and then loop over those ids.
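
A minimal Python sketch of that loop, under stated assumptions: the id list is taken to be a JSON array of integers, the `resource_descriptions/{id}.xml` export path is the one used in Alex's curl loop quoted further down this thread, and the host, repository number, function name, and sample ids are hypothetical. The sketch only builds the URLs; each request would still need a session token header.

```python
import json


def ead_export_urls(base_url, repo_id, ids_json):
    """Turn the JSON id list into one EAD export URL per resource."""
    resource_ids = json.loads(ids_json)
    return [
        f"{base_url}/repositories/{repo_id}/resource_descriptions/{rid}.xml"
        for rid in resource_ids
    ]


# Hypothetical response body from /repositories/2/resources?all_ids=true;
# note the gaps left by deleted or transferred resources.
ids_json = "[1, 2, 5, 9]"
for url in ead_export_urls("http://localhost:8089", 2, ids_json):
    print(url)
```

Iterating over the returned ids, rather than over a numeric range, avoids requesting ids that no longer exist.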



— Steve Majewski





On Jul 28, 2015, at 11:15 AM, Alexander Duryee <alexanderduryee at nypl.org> wrote:



Phil,

As far as I'm aware, there's no bulk EAD export functionality in ASpace.  However, since ASpace's resource identifiers are sequential integers, you can loop over each resource id in a repository and make an API call for its EAD record:

for x in {first..last}; do curl -H '[session token]' "https://[address]/repositories/[id]/resource_descriptions/${x}.xml" > aspace_${x}.xml; done

A loop like that should generate EAD records for each resource in your repository.

Regards,

--Alex



On Tue, Jul 28, 2015 at 10:27 AM, Suda, Phillip J <psuda1 at tulane.edu> wrote:

Greetings all,

Is there an API or mass export feature for exporting all EAD records from a repository, etc.? I am only seeing a collection level export feature.

Thanks,

Phil

Phillip Suda
Systems Librarian
Howard-Tilton Memorial Library
Tulane University
psuda1 at tulane.edu
504-865-5607



_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group



--
Alexander Duryee
Metadata Archivist
New York Public Library
(917)-229-9590
alexanderduryee at nypl.org
