[Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Custer, Mark mark.custer at yale.edu
Thu Mar 19 16:29:39 EDT 2015


I completely agree.  I'd love to have that sort of functionality as part of ArchivesSpace.  As an aside, I saw a really cool presentation from some folks at Harvard Art Museums recently that showed some really great usage visualizations, including a graph that correlated all of the edits in their cataloging system over time with the online views in their Digital Library, http://www.harvardartmuseums.org/collections, of the corresponding records, which nearly made me weep with envy.

For the time being, though, I just want to share, share, share our publicly available descriptions (and really I want to make it easier to keep our files up-to-date in ArchiveGrid).  So, I started this Google Doc, which is completely open for anyone to edit: https://docs.google.com/document/d/1HOxNNRL77i331jL0jp7STz_t1rf8oyorznsi0Js60cw/edit?usp=sharing

Since I don't really know anything about Git, GitHub, etc., I'd love for folks to provide their comments, suggestions, etc.

All my best, and hopefully more soon about this endeavor,


-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Ben Goldman
Sent: Thursday, March 19, 2015 11:39 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Good point, Noah. We did have to make some mass changes post export, but it was relatively painless. The most important change we had to make was to the XML file names, which we were able to do quickly thanks to your ead_renamer python script.
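The ead_renamer script itself isn't shown in this thread. Purely as a hypothetical illustration of that kind of post-export step, renaming each exported file after its <eadid> could be sketched in bash as below. It assumes xmllint with --xpath support is installed and that each file carries a non-empty <eadid>; the function name and the naming scheme are assumptions, not a description of what ead_renamer actually does.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rename exported files (e.g. resource_<id>.xml) after
# their <eadid>. Assumes xmllint supports --xpath; naming scheme is assumed.

# usage: rename_by_eadid FILE...
rename_by_eadid() {
  local f id
  for f in "$@"; do
    # local-name() matches <eadid> whether or not the EAD namespace is declared.
    id=$(xmllint --xpath 'normalize-space(//*[local-name()="eadid"])' "$f" 2>/dev/null)
    id=${id//[^A-Za-z0-9._-]/_}   # replace characters unsafe in file names
    if [ -z "$id" ]; then
      echo "skipping $f: no eadid" >&2
      continue
    fi
    # -n: never overwrite an existing file that already has this name.
    mv -n -- "$f" "$(dirname -- "$f")/$id.xml"
  done
}
```

Files without an <eadid> are left alone and reported on stderr, so a batch can be re-run safely after fixing them.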

The GitHub idea sounds great, but in my mind it raises a semi-related issue: is version history something we could or should consider within ArchivesSpace, where the information we record about collections includes more than just formal description? 


----- Original Message -----
From: "Noah Huffman" <noah.huffman at duke.edu>
To: "Archivesspace Users Group" <archivesspace_users_group at lyralists.lyrasis.org>
Sent: Thursday, March 19, 2015 10:21:32 AM
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?


I'd be interested in this.  We've discussed using GitHub to track version history for our EAD, but haven't done any work on this front.

For folks doing batch EAD exports, I would still recommend validating your batch before publishing.  I'm still seeing a few validation issues with exported EAD, some related to problems we introduced in our AT database before migration (mostly empty note fields), but a few others that seem inherent to ASpace (namespace conflicts for linking attributes--ns2: vs. xlink:).  We've written some XSLT to post-process our batch exports before publication that seems to catch most of these validation issues.
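A minimal sketch of such a pre-publication check in bash, assuming xmllint (from libxml2) is installed. By default it only tests well-formedness; schema validation would need a local copy of the EAD schema passed through XMLLINT_OPTS, a hypothetical variable (e.g. XMLLINT_OPTS='--schema ead.xsd', path assumed). It does not reproduce Duke's XSLT fixes for empty notes or the ns2:/xlink: prefixes.

```shell
#!/usr/bin/env bash
# Sketch: flag exported EAD files that xmllint rejects before publishing.
# Assumes xmllint is on PATH. XMLLINT_OPTS is a hypothetical hook for extra
# flags, e.g. XMLLINT_OPTS='--schema ead.xsd' (local schema path assumed).

# usage: check_batch DIR ; returns non-zero if any file fails
check_batch() {
  local dir="${1:-.}" f bad=0
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue   # no .xml files: the glob stays unexpanded
    # XMLLINT_OPTS is left unquoted on purpose so it can hold several flags.
    if ! xmllint --noout ${XMLLINT_OPTS:-} "$f" 2>/dev/null; then
      echo "FAILED: $f"
      bad=$((bad + 1))
    fi
  done
  echo "$bad file(s) failed"
  [ "$bad" -eq 0 ]
}
```

Because the function's exit status reflects the result, it can gate a publishing step, e.g. check_batch ./export && rsync ... (destination left to the reader).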


-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Custer, Mark
Sent: Thursday, March 19, 2015 10:03 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Huzzah for mass exports!  Speaking of which, I've been wanting to build (or have built) an ArchivesSpace plugin that'll run on a periodic basis to export recently updated and/or published records as EAD/C files directly to a GitHub repository.  I'd like to do that primarily for four (and probably more) reasons:  

1) to more easily share records with ArchiveGrid
2) to just plain share our public finding aids with everyone
3) to finally have a good system that'll keep track of the revisions to our description over time
4) to include a data license alongside our EAD files

Would anyone else find this useful?  And more importantly, I suppose, has anyone else already done this or at least started work on such a thing?  If so, please let me know!
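No such plugin appears in this thread, but the revision-tracking part of the idea can be sketched outside ArchivesSpace: after a periodic export (for example from a cron job) writes EAD files into a local Git working copy, commit only when something changed. This assumes git is installed and the directory is already a repository; pushing to GitHub is left out, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: record each EAD export as a Git commit so that
# revisions to description can be tracked over time.
# Assumes DIR is an existing Git working copy holding the exported .xml files.

# usage: commit_exports DIR
commit_exports() {
  local dir="$1"
  # Stage new, changed, and deleted .xml files anywhere in the tree.
  git -C "$dir" add -A -- '*.xml' || return 1
  # Nothing staged means this export matches the previous one.
  if git -C "$dir" diff --cached --quiet; then
    echo "no changes to commit"
    return 0
  fi
  git -C "$dir" commit -q -m "EAD export $(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
```

Skipping empty commits keeps the history limited to real changes in description, which is what makes the log useful as a revision record.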


-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Ben Goldman
Sent: Thursday, March 19, 2015 9:42 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?


I have little new to add except to say that the directions Noah provided are spot on. As a result of that conversation back in January, we were able to mass export 1900 finding aids for republishing.

Good luck!


Ben Goldman
Digital Records Archivist
Penn State University Libraries
University Park, PA

----- Original Message -----
From: "Noah Huffman" <noah.huffman at duke.edu>
To: "Archivesspace Users Group" <archivesspace_users_group at lyralists.lyrasis.org>
Sent: Wednesday, March 18, 2015 2:00:22 PM
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?


Below are some instructions I wrote up (pasted from a txt file) for batch exporting EAD through the API using Curl in Windows Powershell.

You can use one call to export all the resources; you just have to obtain the entire list of resource IDs and include it in the call in this format: "{1,11,21,31,...}" (no spaces between IDs, since the list goes into the URL).

Hope this helps.


Steps for Batch Exporting EAD from ArchivesSpace using CURL (Windows Powershell) and REST API

1. Obtain Session Token from ASpace backend (9089) Using CURL

Command: curl -Fpassword=admin "[backend-url]/users/admin/login"

2. Copy Token from response and store as the variable $TOKEN

Command: $TOKEN = "8e5813109906328fd4ba1cf68be3435cb3b763b056f3d9ca2d992ccac9db794d"

3. Obtain a list of resource record identifiers in the appropriate ASpace repository and store as a variable $IDs

Command: $IDs= curl -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resources?all_ids=1"

4. Replace brackets in list with braces using Powershell regex find and replace and re-save as $IDs variable

Command: $IDs = $IDs -replace '^\[(.*)\]$', "{`$1}"

5. Batch Export EADs to current directory by passing list of resource IDs stored as $IDs variable.

Command: curl --output "resource_#1.xml" -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resource_descriptions/$IDs.xml?numbered_cs=true&include_daos=true&include_unpublished=true"

The --output option writes the files to the current directory; #1 is replaced by the value curl is currently using from the {...} list, so each file is named after its resource ID.
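The same five steps can also be sketched in bash for Linux or macOS. This is a minimal sketch, not tested against a particular ArchivesSpace version: it assumes curl is installed, that the login response is JSON containing "session":"<token>", and that all_ids returns a one-line array such as [1,11,21]. The function names (ids_to_glob, extract_session, aspace_login, export_all_eads) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-5 in bash. Assumptions (not from the original
# instructions): curl is on PATH, and the backend returns compact JSON.

# Step 4: turn a JSON id array such as [1,11,21] into a curl glob {1,11,21}.
# Whitespace is stripped because a space inside the URL breaks the request.
ids_to_glob() {
  local ids="${1//[[:space:]]/}"
  ids=${ids#\[}
  ids=${ids%\]}
  printf '{%s}' "$ids"
}

# Step 2: read a login response on stdin and print only the session token.
extract_session() {
  sed -n 's/.*"session"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Step 1: log in; usage: aspace_login BACKEND_URL USER PASSWORD
aspace_login() {
  # --form-string sends the password literally (no @file or <file handling).
  curl -s --form-string "password=$3" "$1/users/$2/login" | extract_session
}

# Steps 3 and 5; usage: export_all_eads BACKEND_URL REPO_ID USER PASSWORD
export_all_eads() {
  local base="$1" repo="$2" token ids
  token=$(aspace_login "$base" "$3" "$4")
  if [ -z "$token" ]; then
    echo "login failed" >&2
    return 1
  fi
  ids=$(curl -s -H "X-ArchivesSpace-Session: $token" \
    "$base/repositories/$repo/resources?all_ids=true")
  case $ids in
    '[]') echo "no resources in repository $repo" >&2; return 0 ;;
    \[*\]) ;;
    *) echo "unexpected response: $ids" >&2; return 1 ;;
  esac
  # One curl call: #1 in the output name is replaced by each id in the glob.
  curl -s -H "X-ArchivesSpace-Session: $token" --output "resource_#1.xml" \
    "$base/repositories/$repo/resource_descriptions/$(ids_to_glob "$ids").xml?numbered_cs=true&include_daos=true&include_unpublished=true"
}
```

For example, export_all_eads http://localhost:8089 2 admin admin would write one resource_<id>.xml per resource in repository 2 into the current directory (the URL, repository number, and credentials are placeholders).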

-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Mary Willoughby
Sent: Wednesday, March 18, 2015 1:46 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Thanks! I'll try out the script approach first before wading further into cURL.

On 3/18/2015 1:24 PM, Steven Majewski wrote:
> See this thread from January: [Archivesspace_Users_Group] curl help
> <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-January/001059.html>
> The API call you want is:
> GET  /repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}
> (where PARAMS may be something like
> "include_daos=true&numbered_cs=true")
> There isn't one call to export all resources: you have to first do a
> call to GET  /repositories/$REPO_ID/resources?all_ids=true
> and loop through the IDs returned with something like:
> for ID in $( curl -s -H "X-ArchivesSpace-Session: $session" \
>     "$REPO/repositories/$REPO_ID/resources?all_ids=true" | tail -1 | tr '[],' ' ' )
> do
>   curl  [ . .  . ]
> done
> If you can log in to the server directly, running the ead_export script
> may be easier.
> I have seen problems, though: if there is anything wrong with the
> exported EAD, you will get incomplete data when Nokogiri silently
> chokes on it. If you use the API calls, you will get a complete copy
> of the bad XML. (I saw this in the case I noted, where ASpace inserts
> <p> tags incorrectly and exports malformed XML.)
> - Steve Majewski
> On Mar 18, 2015, at 12:52 PM, Mary Willoughby <smirk at uga.edu> wrote:
>> Hi everyone,
>> I'm trying to bulk export EAD as xml using cURL to communicate with 
>> the backend of our ArchivesSpace instance. I've gotten through the 
>> very basic steps-- can connect, get session token, export session 
>> token, login, and get details on specific repositories etc. What I'm 
>> a little confused about is the specific syntax required to do a bulk 
>> export of all the EAD from a given repository. Does anyone know of 
>> any documentation/examples of this, or has anybody tried it and had 
>> it work who would share the commands they used?  I've looked at the 
>> thread from back in January and the HM screencasts about the backend 
>> on youtube, and those have been a great help in getting this far, but 
>> unfortunately I don't know enough about cURL to come up with the 
>> string I need on my own. At least not so far.
>> Thanks,
>> Mary Willoughby
>> Digital Library of Georgia
>> _______________________________________________
>> Archivesspace_Users_Group mailing list
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
