[Archivesspace_Users_Group] cURL for bulk export of EAD in xml?

Steven Majewski sdm7g at virginia.edu
Thu Mar 19 10:24:06 EDT 2015


+1: That’s also been on my to-do list: something that, on publish, would export EAD
into a repo. (We’re using Subversion now, but would like to switch to git.)
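A minimal sketch of the "export EAD into a repo" step. It assumes git is installed and that the directory holding the exported EAD files is already a git working copy. The function name commit_ead_exports is hypothetical, and the commit author comes from your own git configuration.

```shell
# Hypothetical helper: commit whatever changed in a directory of exported EAD.
# Assumes $1 is already a git working copy (git init / git clone done beforehand).
commit_ead_exports() {
  local dir="$1"
  local msg="${2:-EAD export $(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  # Stage new, modified and deleted files.
  git -C "$dir" add -A || return 1
  # Nothing staged means nothing changed since the last export: skip the commit.
  if git -C "$dir" diff --cached --quiet; then
    echo "no changes"
    return 0
  fi
  git -C "$dir" commit -q -m "$msg"
}
```

Run after each export (e.g. from cron), it only creates a commit when some file actually changed, so the git history doubles as a revision log of the finding aids.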

Since there are still a few export glitches, we would probably want to add validation on export.
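One way to add that validation, sketched under the assumption that xmllint (part of libxml2) is installed. check_ead_files is a hypothetical helper that prints every file that is not well-formed XML and returns non-zero if it found any.

```shell
# Hypothetical helper: report exported EAD files that are not well-formed XML.
# Assumes xmllint is installed. This checks well-formedness only; validating
# against the EAD schema would also need xmllint's --schema option and a local
# copy of the schema file (an assumption about your setup).
check_ead_files() {
  local bad=0 f
  for f in "$@"; do
    if ! xmllint --noout "$f" 2>/dev/null; then
      echo "malformed: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, `check_ead_files resource_*.xml` after an export lists the files to fix before they are committed or republished.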

— Steve Majewski


On Mar 19, 2015, at 10:13 AM, Kevin Clair <Kevin.Clair at du.edu> wrote:

> Hi Mark,
> 
> That (or something like it) has been on my list for a while, but I haven't been able to start working on it. I'd definitely be interested.  -k
> ________________________________________
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [archivesspace_users_group-bounces at lyralists.lyrasis.org] on behalf of Custer, Mark [mark.custer at yale.edu]
> Sent: Thursday, March 19, 2015 8:02 AM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?
> 
> Huzzah for mass exports!  Speaking of which, I've been wanting to build (or have built) an ArchivesSpace plugin that'll run on a periodic basis to export recently updated and/or published records as EAD/C files directly to a GitHub repository.  I'd like to do that primarily for four (and probably more) reasons:
> 
> 1) to more easily share records with ArchiveGrid
> 2) to just plain share our public finding aids with everyone
> 3) to finally have a good system that'll keep track of the revisions to our description over time
> 4) to include a data license alongside our EAD files
> 
> Would anyone else find this useful?  And more importantly, I suppose, has anyone else already done this or at least started work on such a thing?  If so, please let me know!
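For the "recently updated" part, a periodic job could ask the backend only for resources changed since its previous run. This sketch rests on an unverified assumption: that the all_ids listing accepts a modified_since parameter (a Unix timestamp). changed_ids_url, read_last_run and the stamp file name are hypothetical.

```shell
# Hypothetical helpers for a periodic "export what changed" job.
# ASSUMPTION: the all_ids listing accepts modified_since (a Unix timestamp);
# check the API documentation for your ArchivesSpace version before relying on it.
changed_ids_url() {
  # $1 = backend URL, $2 = repository number, $3 = timestamp of the last run
  printf '%s/repositories/%s/resources?all_ids=true&modified_since=%s\n' "$1" "$2" "$3"
}

# Print the timestamp saved by the previous run, or 0 if there was none.
read_last_run() {
  if [ -f "$1" ]; then cat "$1"; else echo 0; fi
}

# Usage sketch (BACKEND, TOKEN and repository 2 are placeholders):
#   now=$(date +%s)
#   since=$(read_last_run .last_ead_export)
#   curl -s -H "X-ArchivesSpace-Session: $TOKEN" "$(changed_ids_url "$BACKEND" 2 "$since")"
#   ...export each returned ID, then, only if that succeeded:
#   echo "$now" > .last_ead_export
```

Taking the timestamp before the export starts means records edited while the job is running are picked up again on the next run rather than skipped.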
> 
> Mark
> 
> 
> 
> -----Original Message-----
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Ben Goldman
> Sent: Thursday, March 19, 2015 9:42 AM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?
> 
> Mary,
> 
> I have little new to add except to say that the directions Noah provided are spot on. As a result of that conversation back in January, we were able to mass export 1900 finding aids for republishing.
> 
> Good luck!
> 
> -Ben
> 
> 
> 
> Ben Goldman
> Digital Records Archivist
> Penn State University Libraries
> University Park, PA
> 814-863-8333
> http://www.libraries.psu.edu/psul/speccolls.html
> 
> 
> 
> ----- Original Message -----
> From: "Noah Huffman" <noah.huffman at duke.edu>
> To: "Archivesspace Users Group" <archivesspace_users_group at lyralists.lyrasis.org>
> Sent: Wednesday, March 18, 2015 2:00:22 PM
> Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?
> 
> Mary,
> 
> Below are some instructions I wrote up (pasted from a txt file) for batch exporting EAD through the API using cURL in Windows PowerShell.
> 
> You can use one call to export all the resources; you just have to obtain the entire list of resource IDs and include it in the call in this format: "{1,11,21,31,...}". curl expands the braces into one request per ID.
> 
> Hope this helps.
> 
> -Noah
> 
> Steps for Batch Exporting EAD from ArchivesSpace using cURL (Windows PowerShell) and the REST API
> 
> 1. Obtain a session token from the ASpace backend (port 8089 by default) using cURL
> 
> Command: curl -Fpassword=admin "[backend-url]/users/admin/login"
> 
> 2. Copy the token (the "session" value in the JSON response) and store it in the variable $TOKEN
> 
> Command: $TOKEN = "8e5813109906328fd4ba1cf68be3435cb3b763b056f3d9ca2d992ccac9db794d"
> 
> 3. Obtain the list of resource record identifiers in the appropriate ASpace repository and store it in the variable $IDs
> 
> Command: $IDs = curl -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resources?all_ids=1"
> 
> 4. Replace the square brackets in the list with braces using a PowerShell regex find-and-replace, and save the result back to $IDs
> 
> Command: $IDs = $IDs -replace '^\[(.*)\]$', "{`$1}"
> 
> 5. Batch export the EADs to the current directory by passing the list of resource IDs stored in $IDs.
> 
> Command: curl --output "resource_#1.xml" -H "X-ArchivesSpace-Session: $TOKEN" "[backend-url]/repositories/[repository number]/resource_descriptions/$IDs.xml?numbered_cs=true&include_daos=true&include_unpublished=true"
> 
> The --output option writes each file to the current directory; #1 is replaced by the current resource ID from the glob, so each file is named resource_<ID>.xml.
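For Linux or macOS, the same steps can be sketched in bash with curl, sed and tr. The helper names extract_session and ids_to_glob, and the BACKEND, REPO_ID and PASSWORD variables in the usage lines, are placeholders for illustration; the usage lines need a running backend.

```shell
# Step 2: print the "session" value from the login response JSON.
# (jq -r .session does the same job if jq is installed.)
extract_session() {
  printf '%s\n' "$1" | sed -n 's/.*"session" *: *"\([^"]*\)".*/\1/p'
}

# Step 4: turn an all_ids response such as [1, 11, 21] into the curl glob {1,11,21}.
# All whitespace is stripped so the glob contains no spaces.
ids_to_glob() {
  printf '%s' "$1" | tr -d '[:space:]' | sed 's/^\[\(.*\)]$/{\1}/'
}

# Usage against a running backend (steps 1, 3 and 5):
#   TOKEN=$(extract_session "$(curl -s -F password="$PASSWORD" "$BACKEND/users/admin/login")")
#   IDS=$(ids_to_glob "$(curl -s -H "X-ArchivesSpace-Session: $TOKEN" \
#     "$BACKEND/repositories/$REPO_ID/resources?all_ids=true")")
#   curl -s -H "X-ArchivesSpace-Session: $TOKEN" --output "resource_#1.xml" \
#     "$BACKEND/repositories/$REPO_ID/resource_descriptions/$IDS.xml?numbered_cs=true&include_daos=true&include_unpublished=true"
```

The braces are left inside double quotes so bash does not expand them; curl's own URL globbing does, and fills #1 in the output name.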
> 
> -----Original Message-----
> From: archivesspace_users_group-bounces at lyralists.lyrasis.org [mailto:archivesspace_users_group-bounces at lyralists.lyrasis.org] On Behalf Of Mary Willoughby
> Sent: Wednesday, March 18, 2015 1:46 PM
> To: Archivesspace Users Group
> Subject: Re: [Archivesspace_Users_Group] cURL for bulk export of EAD in xml?
> 
> Thanks! I'll try out the script approach first before wading further into cURL.
> 
> On 3/18/2015 1:24 PM, Steven Majewski wrote:
>> 
>> 
>> See this thread from January: [Archivesspace_Users_Group] curl help
>> <http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-January/001059.html>
>> 
>> The API call you want is:
>> 
>> GET  /repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}
>> 
>> (where PARAMS may be something like "include_daos=true&numbered_cs=true")
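A small sketch for assembling PARAMS so that "?" appears only once, in front of the first option, and every later option is joined with a bare "&". Writing "&?include_daos=true" would instead send a parameter literally named "?include_daos". ead_query is a hypothetical helper name.

```shell
# Hypothetical helper: join export options into one query string with "&".
ead_query() {
  local IFS='&'       # "$*" joins the arguments with the first character of IFS
  printf '%s\n' "$*"
}

# Usage (session, REPO, REPO_ID and ID as in the GET call above):
#   PARAMS=$(ead_query include_daos=true numbered_cs=true)
#   curl -s -H "X-ArchivesSpace-Session: $session" \
#     "$REPO/repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}"
```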
>> 
>> 
>> There isn't one call to export all resources: you first have to call
>> GET  /repositories/$REPO_ID/resources?all_ids=true
>> and loop through the IDs returned with something like:
>> 
>> 
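>> for ID in $( curl -s -H "X-ArchivesSpace-Session: $session" "$REPO/repositories/$REPO_ID/resources?all_ids=true" | tail -1 | tr '[],' ' ' )
>> do
>>   curl -s -H "X-ArchivesSpace-Session: $session" "$REPO/repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}" > "resource_${ID}.xml"
>> done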
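A slightly more defensive version of the same loop, sketched with hypothetical names (ids_from_json, export_all) and assuming $session, $REPO (the backend URL), $REPO_ID and $PARAMS are set beforehand. curl -f makes an HTTP error count as a failure instead of quietly saving the server's error page as an .xml file, and every failed ID is reported.

```shell
# Hypothetical helper: print one ID per line from an all_ids response like [1,11,21].
ids_from_json() {
  printf '%s\n' "$1" | tr -cs '0-9' '\n' | grep -v '^$'
}

# Hypothetical loop with basic error handling; returns non-zero if any export failed.
export_all() {
  local ids ID failed=0
  ids=$(curl -sf -H "X-ArchivesSpace-Session: $session" \
    "$REPO/repositories/$REPO_ID/resources?all_ids=true") || return 1
  for ID in $(ids_from_json "$ids"); do
    # -f: exit non-zero on HTTP errors (e.g. 404, 500) instead of saving the error page.
    if ! curl -sf -H "X-ArchivesSpace-Session: $session" -o "resource_${ID}.xml" \
        "$REPO/repositories/$REPO_ID/resource_descriptions/${ID}.xml?${PARAMS}"; then
      echo "export failed for resource $ID" >&2
      failed=1
    fi
  done
  return "$failed"
}
```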
>> 
>> 
>> 
>> If you can log in to the server directly, running the ead_export script
>> may be easier. I have seen problems with it, though: if there is anything
>> wrong with the exported EAD, you will get incomplete data when Nokogiri
>> silently chokes on it. If you use the API calls, you will get a complete
>> copy of the bad XML. (I saw this in the case I noted where ASpace inserts
>> <p> tags incorrectly and exports malformed XML.)
>> 
>> - Steve Majewski
>> 
>> 
>> 
>> On Mar 18, 2015, at 12:52 PM, Mary Willoughby <smirk at uga.edu> wrote:
>> 
>>> Hi everyone,
>>> I'm trying to bulk export EAD as xml using cURL to communicate with
>>> the backend of our ArchivesSpace instance. I've gotten through the
>>> very basic steps-- can connect, get session token, export session
>>> token, login, and get details on specific repositories etc. What I'm
>>> a little confused about is the specific syntax required to do a bulk
>>> export of all the EAD from a given repository. Does anyone know of
>>> any documentation/examples of this, or has anybody tried it and had
>>> it work who would share the commands they used?  I've looked at the
>>> thread from back in January and the HM screencasts about the backend
>>> on youtube, and those have been a great help in getting this far, but
>>> unfortunately I don't know enough about cURL to come up with the
>>> string I need on my own. At least not so far.
>>> 
>>> Thanks,
>>> Mary Willoughby
>>> 
>>> Digital Library of Georgia
>>> _______________________________________________
>>> Archivesspace_Users_Group mailing list
>>> Archivesspace_Users_Group at lyralists.lyrasis.org
>>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>> 
>> 
>> 
>> 
