[Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Ben Goldman bmg17 at psu.edu
Thu Mar 19 09:42:06 EDT 2015


Mary,

I have little new to add except to say that the directions Noah provided are spot on. As a result of that conversation back in January, we were able to mass export 1900 finding aids for republishing.

Good luck!

-Ben



Ben Goldman 
Digital Records Archivist 
Penn State University Libraries 
University Park, PA 
814-863-8333 
http://www.libraries.psu.edu/psul/speccolls.html 



----- Original Message -----
From: "Noah Huffman" <noah.huffman at duke.edu>
To: "Archivesspace Users Group" <archivesspace_users_group at lyralists.lyrasis.org>
Sent: Wednesday, March 18, 2015 2:00:22 PM
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Mary,

Below are some instructions I wrote up (pasted from a txt file) for batch exporting EAD through the API using cURL in Windows PowerShell.

You can use one call to export all the resources; you just have to obtain the entire list of resource IDs and include it in the call in this format: "{1,11,21,31,...}".

Hope this helps.

-Noah

Steps for Batch Exporting EAD from ArchivesSpace using CURL (Windows Powershell) and REST API

1. Obtain Session Token from ASpace backend (9089) Using CURL

Command: curl -Fpassword=admin "[backend-url]/users/admin/login"

2. Copy Token from response and store as the variable $TOKEN

Command: $TOKEN = "8e5813109906328fd4ba1cf68be3435cb3b763b056f3d9ca2d992ccac9db794d"

3. Obtain a list of resource record identifiers in the appropriate ASpace repository and store as a variable $IDs

Command: $IDs= curl -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resources?all_ids=1"

4. Replace brackets in list with braces using Powershell regex find and replace and re-save as $IDs variable

Command: $IDs = $IDs -replace '^\[(.*)\]$', "{`$1}"

5. Batch Export EADs to current directory by passing list of resource IDs stored as $IDs variable.

Command: curl --output "resource_#1.xml" -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resource_descriptions/$IDs.xml?numbered_cs=true&include_daos=true&include_unpublished=true"

The --output option writes each file to the current directory; #1 is replaced by the resource ID currently being fetched from the {...} list, so each file in the batch is named resource_[ID].xml.
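
For anyone on a Unix-like shell rather than PowerShell, steps 2 and 4 above might be sketched roughly as follows. This is only an illustration, not part of the original write-up: the function names extract_session and ids_to_glob are made up, $BACKEND and $REPO_ID are placeholders, and the sketch assumes the login response is JSON with a "session" field.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative and not part of
# ArchivesSpace or curl.

# Read a login response on stdin and print the value of its "session" field.
# Assumption: the response is JSON containing "session":"<token>".
extract_session() {
  sed -n 's/.*"session"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Read a JSON ID array such as [1,11,21] on stdin and print it as a
# curl URL-glob list, {1,11,21}. Whitespace is removed because curl
# would treat it as part of the URL.
ids_to_glob() {
  tr -d '[:space:]' | sed -e 's/^\[/{/' -e 's/]$/}/'
}

# Possible usage (BACKEND and REPO_ID are placeholders you set yourself):
#   TOKEN=$(curl -s -F password=admin "$BACKEND/users/admin/login" | extract_session)
#   IDS=$(curl -s -H "X-ArchivesSpace-Session: $TOKEN" \
#           "$BACKEND/repositories/$REPO_ID/resources?all_ids=true" | ids_to_glob)
#   curl --output "resource_#1.xml" -H "X-ArchivesSpace-Session: $TOKEN" \
#     "$BACKEND/repositories/$REPO_ID/resource_descriptions/${IDS}.xml?numbered_cs=true&include_daos=true"
```

Keeping the two text transformations in small functions makes them easy to check against a sample response before running the real export.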

-----Original Message-----
From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Mary Willoughby
Sent: Wednesday, March 18, 2015 1:46 PM
To: Archivesspace Users Group
Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Thanks! I'll try out the script approach first before wading further into cURL.

On 3/18/2015 1:24 PM, Steven Majewski wrote:
>
>
> See this thread from January: [Archivesspace_Users_Group] curl help 
> <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-January/001059.html>
>
> The API call you want is :
>
> GET  /repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}
>
> ( where PARAMS may be something like: 
> "include_daos=true&numbered_cs=true" )
>
>
> There isn't one call to export all resources: you first have to
> call GET  /repositories/$REPO_ID/resources?all_ids=true
> and loop through the IDs returned with something like:
>
>
> for ID in $( curl -s -H "X-ArchivesSpace-Session: $session" "$REPO/repositories/$REPO_ID/resources?all_ids=true" | tail -1 | tr '[],' ' ' )
> do
> curl  [ . .  . ]
>
>
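
One way the loop might be filled out is sketched below. It is an assumption-laden illustration rather than the command elided above: the function names split_ids, ead_url and export_all are invented, the output file naming is arbitrary, and the query parameters are simply the ones mentioned earlier in this message.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Read a JSON ID array like [1,11,21] on stdin and print the IDs
# separated by spaces (same tr approach as in the loop above).
split_ids() {
  tr '[],' '   '
}

# Print the EAD export URL for one resource.
# Arguments: backend URL, repository ID, resource ID.
ead_url() {
  printf '%s/repositories/%s/resource_descriptions/%s.xml?include_daos=true&numbered_cs=true\n' \
    "$1" "$2" "$3"
}

# Export every resource in a repository, one request per ID, writing
# resource_<ID>.xml files to the current directory.
# Arguments: backend URL, repository ID, session token.
export_all() {
  local backend="$1" repo_id="$2" session="$3" id
  for id in $(curl -s -H "X-ArchivesSpace-Session: $session" \
                "$backend/repositories/$repo_id/resources?all_ids=true" | split_ids)
  do
    curl -s -H "X-ArchivesSpace-Session: $session" \
      -o "resource_${id}.xml" "$(ead_url "$backend" "$repo_id" "$id")"
  done
}
```

Calling export_all with the backend URL, repository number and session token would then issue one export request per resource ID.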
>
> If you can log in directly to the server, running the ead_export script
> may be easier.
> I have seen problems, though: if there is anything wrong with the
> exported EAD, you will get incomplete data when Nokogiri silently
> chokes on it. If you use the API calls, you will get a complete copy
> of the bad XML. (I saw this in the case I noted, where ASpace inserts
> <p> tags incorrectly and exports malformed XML.)
>
> - Steve Majewski
>
>
>
> On Mar 18, 2015, at 12:52 PM, Mary Willoughby <smirk at uga.edu 
> <mailto:smirk at uga.edu>> wrote:
>
>> Hi everyone,
>> I'm trying to bulk export EAD as xml using cURL to communicate with 
>> the backend of our ArchivesSpace instance. I've gotten through the 
>> very basic steps-- can connect, get session token, export session 
>> token, login, and get details on specific repositories etc. What I'm 
>> a little confused about is the specific syntax required to do a bulk 
>> export of all the EAD from a given repository. Does anyone know of 
>> any documentation/examples of this, or has anybody tried it and had 
>> it work who would share the commands they used?  I've looked at the 
>> thread from back in January and the HM screencasts about the backend 
>> on youtube, and those have been a great help in getting this far, but 
>> unfortunately I don't know enough about cURL to come up with the 
>> string I need on my own. At least not so far.
>>
>> Thanks,
>> Mary Willoughby
>>
>> Digital Library of Georgia
>> _______________________________________________
>> Archivesspace_Users_Group mailing list 
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> <mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>
>
>


