[Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Custer, Mark mark.custer at yale.edu
Thu Mar 19 10:02:52 EDT 2015


Huzzah for mass exports!  Speaking of which, I've been wanting to build (or have built) an ArchivesSpace plugin that'll run on a periodic basis to export recently updated and/or published records as EAD/C files directly to a GitHub repository.  I'd like to do that primarily for four (and probably more) reasons:  

1) to more easily share records with ArchiveGrid
2) to just plain share our public finding aids with everyone
3) to finally have a good system that'll keep track of the revisions to our description over time
4) to include a data license alongside our EAD files

Would anyone else find this useful?  And more importantly, I suppose, has anyone else already done this or at least started work on such a thing?  If so, please let me know!

Mark



-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Ben Goldman
Sent: Thursday, March 19, 2015 9:42 AM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Mary,

I have little new to add except to say that the directions Noah provided are spot on. As a result of that conversation back in January, we were able to mass export 1900 finding aids for republishing.

Good luck!

-Ben



Ben Goldman
Digital Records Archivist
Penn State University Libraries
University Park, PA
814-863-8333
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.libraries.psu.edu_psul_speccolls.html&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=s7ciGQfUJeaV_ryx908hbeXDoU9aqDwDN0Z0VbfsJ3Y&m=3Rvi4dskQ-yVyIRfWo308PoRxVF_yT3eJcc5gwrToJ8&s=fZJyoG9y4qysAPLVrkOj1X02LNrO6Qti9_FHS7FpPN0&e=  



----- Original Message -----
From: "Noah Huffman" <noah.huffman at duke.edu>
To: "Archivesspace Users Group" <archivesspace_users_group at lyralists.lyrasis.org>
Sent: Wednesday, March 18, 2015 2:00:22 PM
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Mary,

Below are some instructions I wrote up (pasted from a txt file) for batch exporting EAD through the API using Curl in Windows Powershell.

You can use one call to export all the resources; you just have to obtain the entire list of resource IDs first and then include it in the call in this format (no spaces between IDs): "{1,11,21,31,...}".

Hope this helps.

-Noah

Steps for Batch Exporting EAD from ArchivesSpace using CURL (Windows Powershell) and REST API

1. Obtain Session Token from ASpace backend (9089) Using CURL

Command: curl -Fpassword=admin "[backend-url]/users/admin/login"

2. Copy Token from response and store as the variable $TOKEN

Command: $TOKEN = "8e5813109906328fd4ba1cf68be3435cb3b763b056f3d9ca2d992ccac9db794d"

3. Obtain a list of resource record identifiers in the appropriate ASpace repository and store as a variable $IDs

Command: $IDs= curl -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resources?all_ids=1"

4. Replace brackets in list with braces using Powershell regex find and replace and re-save as $IDs variable

Command: $IDs = $IDs -replace '^\[(.*)\]$', "{`$1}"

5. Batch Export EADs to current directory by passing list of resource IDs stored as $IDs variable.

Command: curl --output "resource_#1.xml" -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resource_descriptions/$IDs.xml?numbered_cs=true&include_daos=true&include_unpublished=true"

The --output option writes each file to the current directory. curl replaces #1 with the resource ID matched by the {...} list in the URL, so each file in the batch is named resource_[ID].xml.
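
For anyone not on PowerShell, the five steps above can be sketched in a POSIX-style shell roughly as follows. This is only a sketch under assumptions, not a tested recipe: the function names (extract_session, ids_to_glob, export_all) are made up for illustration, the login response is assumed to be JSON containing a "session" key, and the all_ids response is assumed to be a flat JSON array such as [1,11,21]. The sed parsing is deliberately crude; a real JSON parser would be safer.

```shell
#!/usr/bin/env bash
# Sketch only. All function names here are hypothetical helpers,
# not part of ArchivesSpace or curl.

# Read a login response on stdin and print the value of its "session" key.
# Assumes the response is JSON shaped like {"session":"...", ...}.
extract_session() {
  sed -n 's/.*"session"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Read a JSON array of IDs on stdin, e.g. [1,11,21], and print it as a
# curl URL glob, e.g. {1,11,21}. Spaces and newlines are stripped first,
# because curl would treat them as part of each ID.
ids_to_glob() {
  tr -d ' \n' | sed -e 's/^\[/{/' -e 's/]$/}/'
}

# Log in, fetch every resource ID in one repository, then export each
# resource as EAD into the current directory as resource_<ID>.xml.
# Arguments: backend URL, repository number, username, password.
export_all() {
  local backend=$1 repo=$2 user=$3 pass=$4 token ids

  token=$(curl -s -F "password=$pass" "$backend/users/$user/login" | extract_session)

  ids=$(curl -s -H "X-ArchivesSpace-Session: $token" \
    "$backend/repositories/$repo/resources?all_ids=true" | ids_to_glob)

  # curl expands the {..} list into one request per ID; #1 in the
  # output name is replaced by the ID used for that request.
  curl -s --output "resource_#1.xml" \
    -H "X-ArchivesSpace-Session: $token" \
    "$backend/repositories/$repo/resource_descriptions/${ids}.xml?numbered_cs=true&include_daos=true&include_unpublished=true"
}
```

Nothing runs until export_all is called, for example with a backend URL, repository number 2, and an admin login of your own (all placeholder values here): export_all "[backend-url]" 2 admin admin. Keeping the token and ID handling in small functions makes it easy to check the string conversion on its own before pointing the script at a live backend.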

-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Mary Willoughby
Sent: Wednesday, March 18, 2015 1:46 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Thanks! I'll try out the script approach first before wading further into cURL.

On 3/18/2015 1:24 PM, Steven Majewski wrote:
>
>
> See this thread from January: [Archivesspace_Users_Group] curl help 
> <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-January/001059.html>
>
> The API call you want is :
>
> GET  /repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}
>
> ( where PARAMS may be something like: 
> "include_daos=true&numbered_cs=true" )
>
>
> There isn't one call to export all resources: You have to first do a 
> call to GET  /repositories/$REPO_ID/resources?all_ids=true
> and loop thru the id's returned with something like:
>
>
> for ID in $( curl -s -H "X-ArchivesSpace-Session: $session" \
>     "$REPO/repositories/$REPO_ID/resources?all_ids=true" | tail -1 | tr '[],' ' ' )
> do
> curl  [ . .  . ]
>
>
>
> If you can directly login to the server, running the ead_export script 
> may be easier.
> I have seen problems though if there is anything wrong with the 
> exported EAD, you will get incomplete data when Nokogiri silently 
> chokes on it. If you use the API calls, you will get a complete copy 
> of the bad XML.  ( I saw this in the case I noted where ASpace inserts 
> <p> tags incorrectly and exports malformed XML. )
>
> - Steve Majewski
>
>
>
> On Mar 18, 2015, at 12:52 PM, Mary Willoughby <smirk at uga.edu 
> <mailto:smirk at uga.edu>> wrote:
>
>> Hi everyone,
>> I'm trying to bulk export EAD as xml using cURL to communicate with 
>> the backend of our ArchivesSpace instance. I've gotten through the 
>> very basic steps-- can connect, get session token, export session 
>> token, login, and get details on specific repositories etc. What I'm 
>> a little confused about is the specific syntax required to do a bulk 
>> export of all the EAD from a given repository. Does anyone know of 
>> any documentation/examples of this, or has anybody tried it and had 
>> it work who would share the commands they used?  I've looked at the 
>> thread from back in January and the HM screencasts about the backend 
>> on youtube, and those have been a great help in getting this far, but 
>> unfortunately I don't know enough about cURL to come up with the 
>> string I need on my own. At least not so far.
>>
>> Thanks,
>> Mary Willoughby
>>
>> Digital Library of Georgia
>> _______________________________________________
>> Archivesspace_Users_Group mailing list 
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> <mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
>
>
>
_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
https://urldefense.proofpoint.com/v2/url?u=http-3A__lyralists.lyrasis.org_mailman_listinfo_archivesspace-5Fusers-5Fgroup&d=AwICAg&c=-dg2m7zWuuDZ0MUcV7Sdqw&r=s7ciGQfUJeaV_ryx908hbeXDoU9aqDwDN0Z0VbfsJ3Y&m=3Rvi4dskQ-yVyIRfWo308PoRxVF_yT3eJcc5gwrToJ8&s=bv7NScv_FU4fa1k_TBsgzyjYseGutAZkZPDXvJ68qmA&e=


