[Archivesspace_Users_Group] Data cleanup

Edgar Garcia edgar-garcia at northwestern.edu
Wed Feb 14 08:37:06 EST 2018


Then you’ll appreciate that this is waiting for my wife today at home: http://www.thinkgeek.com/product/kloj/?srp=3

Edgar
----------
Edgar Garcia
Senior Software Developer
Discovery Platform Services
Metadata & Discovery Services
Digital Strategies
Northwestern University Libraries
Northwestern University
Evanston, IL 60208
From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Olivia S Solis <livsolis at utexas.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Tuesday, February 13, 2018 at 5:09 PM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] Data cleanup

:) I'm a bit corgi obsessed.

On Tue, Feb 13, 2018 at 8:54 AM, Margaret Kidd <kiddm at vcu.edu> wrote:
Totally off topic but, Olivia, your title slide might be the best ever!

Margaret




Margaret Turman Kidd

Access and Electronic Records Archivist, Special Collections & Archives

VCU Libraries | Tompkins-McCaw Library for the Health Sciences

509 N. 12th Street / Box 980582, Richmond, VA 23298-0582

(804) 828-3152


On Tue, Feb 13, 2018 at 9:25 AM, Olivia S Solis <livsolis at utexas.edu> wrote:
Hi there,

Yes it is definitely possible to use OpenRefine to migrate data into the system. It's my primary tool for our data migration. One of the nice things about OpenRefine is its Templating export option. I was inspired by an extremely helpful University of Maryland Chaos to Order post:
https://icantiemyownshoes.wordpress.com/2015/11/20/how-i-learned-to-stop-worrying-and-love-the-api/

That is how we've been creating records, and we have developed a number of templates to export the JSON. There are a few differences in our process. For instance, the bash script was breaking records apart wherever the content contained spaces, so we added the line
IFS='|'

and made all of my templates end in a pipe, so each record is split on the pipe instead.
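A minimal sketch of the pipe-splitting idea, assuming an export file in which every record ends in `|`. Instead of changing IFS for a whole loop it uses `read -d '|'`, a swapped-in variant that splits on the same character but also keeps the shell from glob-expanding characters such as `[` inside the JSON. `each_record`, `post_record` and the `ASPACE_*` variables are hypothetical names; the `X-ArchivesSpace-Session` header is the one the ArchivesSpace backend API uses for authenticated requests.

```shell
#!/usr/bin/env bash
# Sketch: hand pipe-terminated JSON records (for example an OpenRefine
# Templating export whose row template ends in "|") to a handler one at a time.
# ASPACE_URL, ASPACE_ENDPOINT and ASPACE_SESSION are hypothetical variable names.
set -euo pipefail

# Call handler "$2" once for each "|"-terminated record in file "$1".
# read -d '|' splits only on the pipe, so spaces (and even newlines) inside a
# record stay intact, and nothing is glob-expanded. A final record without a
# trailing "|" is ignored, which is why every template must end in a pipe.
each_record() {
  local file="$1" handler="$2" record
  while IFS= read -r -d '|' record; do
    record="${record#$'\n'}"   # drop the newline left between exported rows
    if [ -n "$record" ]; then
      "$handler" "$record" < /dev/null
    fi
  done < "$file"
}

# Example handler: POST one JSON record to the ArchivesSpace backend,
# e.g. ASPACE_ENDPOINT=repositories/2/resources.
post_record() {
  curl -sS -H "X-ArchivesSpace-Session: ${ASPACE_SESSION:?set ASPACE_SESSION}" \
    --data-binary "$1" \
    "${ASPACE_URL:?set ASPACE_URL}/${ASPACE_ENDPOINT:?set ASPACE_ENDPOINT}"
  echo   # keep one API response per line in the output
}
```

With `ASPACE_URL`, `ASPACE_ENDPOINT` and `ASPACE_SESSION` set, running `each_record export.txt post_record` would post every record in turn; the session token itself comes from the backend's `/users/:username/login` route.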

To update records, we were inspired by a Duke Python script:
https://github.com/duke-libraries/archivesspace-duke-scripts/blob/master/python/duke_archival_object_metadata_adder.py
https://blogs.library.duke.edu/bitstreams/2016/09/21/archivesspace-api-fun/
as described in the blog post above.

So for instance, we migrated our EAD by using BaseX to extract elements. I created minimal resource records and incrementally added notes.
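A minimal sketch of adding one note to an existing resource by fetching its JSON, changing it, and posting the whole record back, which is the general pattern API update scripts follow. This is not the Duke script; it is a hypothetical shell version that assumes `jq` is installed and uses a scope and contents note purely as an example. `/users/:username/login`, `/repositories/:repo_id/resources/:id` and the `X-ArchivesSpace-Session` header are ArchivesSpace backend API routes; the note structure follows the multipart note JSON model as I understand it, so check it against your version's schema. `ASPACE_URL`, `aspace_login` and `add_scope_note` are illustrative names.

```shell
#!/usr/bin/env bash
# Sketch: fetch a resource, add one note, and post the whole record back.
# Assumes jq is installed; ASPACE_URL, aspace_login and add_scope_note are
# illustrative names, and the scope and contents note is only an example.
set -euo pipefail

# Log in to the backend API and print the session token.
aspace_login() {
  local user="$1" password="$2"
  # --form-string sends the value literally (no @file or <file handling).
  curl -sS --fail --form-string "password=${password}" \
    "${ASPACE_URL:?set ASPACE_URL}/users/${user}/login" | jq -r '.session'
}

# Append a scope and contents note to resource $3 in repository $2.
add_scope_note() {
  local token="$1" repo_id="$2" resource_id="$3" text="$4"
  local url="${ASPACE_URL:?set ASPACE_URL}/repositories/${repo_id}/resources/${resource_id}"
  local record updated

  # 1. Fetch the current JSON; it carries lock_version, which the update needs.
  record="$(curl -sS --fail -H "X-ArchivesSpace-Session: ${token}" "$url")"

  # 2. Add the note without touching anything else in the record.
  updated="$(jq --arg text "$text" '.notes += [{
      "jsonmodel_type": "note_multipart",
      "type": "scopecontent",
      "subnotes": [{"jsonmodel_type": "note_text", "content": $text}]
    }]' <<< "$record")"

  # 3. Post the full, modified record back to the same URL.
  curl -sS --fail -H "X-ArchivesSpace-Session: ${token}" \
    --data-binary "$updated" "$url"
}
```

Usage might look like `TOKEN="$(aspace_login admin "$PASSWORD")"` followed by `add_scope_note "$TOKEN" 2 42 "Letters, diaries and photographs."`, where the repository and resource ids are made up. Because the posted JSON still carries the `lock_version` from the fetch, an edit made by someone else in between should make the update fail rather than be silently overwritten.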

The process is somewhat laid out in the slides I made for a presentation:
https://docs.google.com/presentation/d/1cBrd8qzHK4i8S6SQ_vDlScERixX95lAbE__hd48yIDg/edit?usp=sharing

Hopefully, this helps.

-Olivia




On Mon, Feb 12, 2018 at 9:45 AM, Joan Curbow <CurbowJ at bvu.edu> wrote:
We are a new archives, so we have not had to import any existing data. Lucky us, right? But I’m all-too-human, and I’d now like to do some data cleanup. Is it possible to do data cleanup using OpenRefine? Theoretically, it seems possible to export “stuff” and pull it into OpenRefine, but I’ve only ever used OpenRefine in a classroom situation, where the data was already populated for us. We did not import the data back into an existing database, either, so my experience is limited to just the mechanics of OpenRefine. Has anyone used OpenRefine in a real-world situation with data that’s already in Aspace? Or is there a better method for data cleanup?

A further question/complication is that I’m a lone arranger, and my instance is hosted by Libraryhost, so any data cleanup may have to be done by them? My tech skills are rudimentary, so I’m not clear just how I could get this to work. I asked them once, but didn’t get a real answer.

Sincerely,

Joan Curbow
Reference Librarian and Archivist
Buena Vista University Library
Buena Vista University
610 West Fourth Street
Storm Lake, Iowa  50588
712.749.2094
www.library.bvu.edu


_______________________________________________
Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org
http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group



--
Olivia Solis, MSIS
Metadata Coordinator
Dolph Briscoe Center for American History
The University of Texas at Austin
2300 Red River St. Stop D1100
Austin TX, 78712-1426
(512) 232-8013


