[Archivesspace_Users_Group] Mass export of EAD

Yes, updating any associated record will update the "system_mtime" field in the database for the parent record. That's because the indexer uses this field to tell which records need to be reindexed; for example, updating an agent will require the linked resource to be reindexed.

This is done in the relationship mixin <https://github.com/archivesspace/archivesspace/blob/master/backend/app/model/mixins/relationships.rb#L702-L755>.

The modified_since param is what searches against the system_mtime field. It's added along with the other pagination parameters <https://github.com/archivesspace/archivesspace/blob/master/backend/app/lib/crud_helpers.rb#L52-L54>, which are really poorly documented.
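
As a rough illustration of how modified_since rides along with the pagination parameters, here is a small Python sketch that only builds listing URLs. It assumes the pagination modes include all_ids and page (with page_size); the helper name listing_url and the localhost address are made up for the example and are not part of ArchivesSpace.

```python
from urllib.parse import urlencode


def listing_url(base, repo_id, record_type, modified_since=None, **paging):
    """Build a listing URL for one record type in a repository.

    `paging` carries one pagination mode, e.g. all_ids=True or page=1.
    """
    # Render booleans as lowercase "true"/"false" for the query string.
    params = {k: str(v).lower() if isinstance(v, bool) else v
              for k, v in paging.items()}
    if modified_since is not None:
        params["modified_since"] = int(modified_since)  # epoch seconds
    return f"{base}/repositories/{repo_id}/{record_type}?{urlencode(params)}"


if __name__ == "__main__":
    print(listing_url("http://localhost:8089", 2, "resources",
                      modified_since=1435734000, all_ids=True))
    print(listing_url("http://localhost:8089", 2, "archival_objects",
                      modified_since=1435734000, page=1, page_size=50))
```

The same helper works for any endpoint that mixes in these parameters, which is why the record type is just a path segment here.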

Sorry, I keep meaning to go back and fix the YARD documentation to include these, since they're kind of mixed into a lot of the endpoints. I've been working on the documentation build a lot in preparation for SAA, so I'll be sure to add this to my list.


Just wanted to follow up on this thread. It appears that changing an agent or subject record associated with a resource or archival_object changes the modified date for that resource or archival_object. At least, when using the “modified_since” parameter on both the resource and archival_objects endpoints of the AS API, those resources/archival_objects are returned.

Can someone who’s more familiar with the business logic for the “modified_since” parameter confirm this? I don’t see this parameter documented anywhere in the AS API documentation, so I kind of feel like I’m shooting in the dark.


So I did a bit more looking into this today and apparently the archival_objects endpoint accepts the modified_since parameter as well, so as Brian suggests it’s really easy to query for resource and component mtimes.

I tweaked my incremental export script so it looks in both places now:
I haven’t tried, but I’m assuming you could use this same pattern to look for modified agents/subjects etc., if you wanted to.
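
A minimal Python sketch of the "look in both places" pattern follows. It is not the script referred to above. It assumes that an archival object's JSON links to its resource as {"resource": {"ref": "/repositories/2/resources/5"}}; the function names get_json and changed_resource_ids are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def get_json(base, path, token):
    """GET a backend path and decode the JSON body."""
    req = Request(base + path, headers={"X-ArchivesSpace-Session": token})
    with urlopen(req) as resp:
        return json.load(resp)


def changed_resource_ids(repo_id, since, fetch):
    """Return ids of resources changed directly or through a component.

    `fetch` is any callable that maps a backend path to decoded JSON.
    """
    prefix = f"/repositories/{repo_id}"
    query = f"?all_ids=true&modified_since={int(since)}"
    # Resources whose own record changed.
    changed = set(fetch(f"{prefix}/resources{query}"))
    # Resources that own a component that changed.
    for ao_id in fetch(f"{prefix}/archival_objects{query}"):
        component = fetch(f"{prefix}/archival_objects/{ao_id}")
        # Assumption: the component carries a ref to its resource.
        ref = component["resource"]["ref"]
        changed.add(int(ref.rsplit("/", 1)[1]))
    return sorted(changed)
```

Against a live backend this could be called as changed_resource_ids(2, 1435734000, lambda p: get_json("http://localhost:8089", p, token)). Passing the fetcher in keeps the set logic testable without a server.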

Also, the other EAD export script that Patrick sent around the other day had an issue with exporting large resources, which I’ve now fixed by streaming the response:
However, if you’re going to do a mass export of all the EAD from your repo it probably makes sense to just use the ead_export script that comes with AS. It’s way more robust and was written by people who actually know what they’re doing.
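
For the large-resource problem mentioned above, one way to avoid holding a whole EAD file in memory is to copy the HTTP response to disk in chunks. The Python sketch below is an illustration only, not the fixed script itself; it reuses the resource_descriptions path and the X-ArchivesSpace-Session header that appear elsewhere in this thread, and the function names are made up.

```python
import shutil
from urllib.request import Request, urlopen


def copy_stream(source, dest_path, chunk_size=64 * 1024):
    """Copy a readable binary stream to a file in fixed-size chunks."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)


def download_ead(base, repo_id, resource_id, token, dest_path):
    """Stream one resource's EAD to disk without buffering it whole."""
    url = (f"{base}/repositories/{repo_id}"
           f"/resource_descriptions/{resource_id}.xml")
    req = Request(url, headers={"X-ArchivesSpace-Session": token})
    with urlopen(req) as resp:
        copy_stream(resp, dest_path)
```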


It seems to me like it might not be a great idea to change the business rules for how a resource record's mtime gets updated, but that it wouldn't be too hard to add a new field to the resource that tracks the last component update. It also seems like it wouldn't be too hard for the services under discussion to query for component mtimes as well as resource mtimes.

Hi Mark,
Yup, you’re absolutely right. I made the (erroneous) assumption that changes to mtimes for descendant components would propagate in the resource record as well. This seems like something that would be best done in AS itself; I’m wondering if Brian or Chris have any thoughts about how this could be accomplished?

In case you're interested, but haven't seen it, there is a doc for the export script:


Right now it just exports every EAD associated with a specified repo to a zip file and doesn't have date or incremental awareness. That could be added as the resources endpoint accepts a "modified_since" parameter (as a timestamp). I just rough tested:

date -d '2015-07-01 00:00:00' +'%s' # 1435734000 (GNU date; the result depends on the local timezone)
curl -H "X-ArchivesSpace-Session: $TOKEN" "http://localhost:8089/repositories/2/resources?all_ids=true&modified_since=1435734000"

This returns what appears to be the correct set of results. The obvious problem is that it isn't descendant aware, so only direct changes to the topmost resource record count for the "modified_since" parameter. If the API also factored in descendant mtimes for record types that have them, that would have been ideal =) Some workaround or solution for that limitation is going to be required for any time-based incremental export (assuming you need descendant / component updates to count as an update to the resource for what you're doing; in other words, you may not be able to rely on the resource mtime alone).
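
One detail worth spelling out about the timestamp in the test above: the epoch value you get for "midnight on 2015-07-01" depends on the timezone it is computed in. The Python sketch below makes the offset explicit; the helper name epoch_seconds is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def epoch_seconds(year, month, day, utc_offset_hours=0):
    """Epoch seconds for midnight on a date at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


print(epoch_seconds(2015, 7, 1))      # 1435708800 (midnight UTC)
print(epoch_seconds(2015, 7, 1, -7))  # 1435734000 (midnight at UTC-7)
```

The second value matches the one used with curl above, so that test was presumably run on a machine set to UTC-7.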

Thanks all for your suggestions/scripts/help. This is a great start.

I have a Perl script I run from command line that runs every batch export I want or need at this point: https://github.com/duspeccoll/as_utils/blob/master/reports.pl

It grabs the JSON list of all the IDs for a given model, and then either dumps everything into a single JSON object or exports to some other format. The EAD export is lines 206-224. This is *extremely* customized for our environment, and I’ve made no effort yet to modify it for general use, but it’s an idea of how one could go about doing this.  -k

There is ead_export.sh in the scripts directory.

It only exports published collections, but that can be changed in the code if needed.

That script runs locally on the AS server and it writes into the archivesspace/data/ directory, so you need write access.

Resource ids will not necessarily be sequential after deletions and transfers, but you can get a JSON list of all of the ids from /repositories/$REPO_ID/resources?all_ids=true and then loop over those ids.
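
A short Python sketch of looping over the returned ids rather than a numeric range. It assumes the all_ids=true response body is a JSON array of integers, and it uses the resource_descriptions path mentioned elsewhere in this thread; the helper name ead_urls and the sample ids are made up.

```python
import json


def ead_urls(base, repo_id, all_ids_body):
    """Yield (resource_id, EAD URL) pairs from an all_ids=true body."""
    for resource_id in json.loads(all_ids_body):
        yield resource_id, (f"{base}/repositories/{repo_id}"
                            f"/resource_descriptions/{resource_id}.xml")


# Gaps in the id sequence are handled naturally.
for rid, url in ead_urls("http://localhost:8089", 2, "[1, 4, 9]"):
    print(rid, url)
```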

As far as I'm aware, there's no bulk EAD export functionality in ASpace.  However, since ASpace's resource identifiers are sequential integers, you can loop over each resource id in a repository and make an API call for its EAD record:

for x in {first..last}; do curl -H '[session token]' "https://[address]/repositories/[id]/resource_descriptions/${x}.xml" > aspace_${x}.xml; done

A loop like that should generate EAD records for each resource in your repository.



Greetings all,

Is there an API or mass export feature for exporting all EAD records from a repository, etc.? I am only seeing a collection level export feature.



