[Archivesspace_Users_Group] Data cleanup

Olivia S Solis livsolis at utexas.edu
Tue Feb 13 18:33:00 EST 2018


The other thing I should have mentioned is that we chose to go the route of
the API, but there is always the option of importing directly into the
database. I'm curious whether other institutions have done this. It would
take an understanding of the table structures, which seemed quite complex to
me, though perhaps not to a MySQL guru. The JSON was easier for me to absorb
because it's actually simpler than XML, and I didn't have to wrap my head
around a complex table structure. You definitely need to spend time
understanding the JSON, but after a bit it wasn't that hard, particularly
if you create a dummy record with the fields you want to migrate to.
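
For anyone curious what the API route looks like, here is a minimal sketch
in Python. The host, repository URI, and field values are placeholders, and
you would first need to log in (a POST to /users/:user/login) to get the
session token; a real accession will likely need more fields than this dummy
record.

```python
import json
import urllib.request

# Placeholder host -- adjust for your own backend.
ASPACE_API = "http://localhost:8089"

def make_accession(identifier, title, accession_date):
    """Build the minimal JSON for one dummy accession record."""
    return {
        "id_0": identifier,
        "title": title,
        "accession_date": accession_date,  # ISO 8601: YYYY-MM-DD
    }

def post_record(session_token, repo_uri, record):
    """POST one record to the ArchivesSpace backend API."""
    req = urllib.request.Request(
        ASPACE_API + repo_uri + "/accessions",
        data=json.dumps(record).encode("utf-8"),
        headers={"X-ArchivesSpace-Session": session_token},
    )
    return urllib.request.urlopen(req)  # raises on HTTP errors

record = make_accession("2018-001", "Example Papers", "2018-02-13")
print(json.dumps(record, sort_keys=True))
```

Once you can build and post one dummy record like this, scaling up is mostly
a loop over your cleaned-up spreadsheet rows.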

I'm also wondering if we are in a unique situation. My observation (and
this might not be accurate) is that a lot of institutions were either 1)
already in Archon or some other archival management system or 2) starting
from scratch. We never had an archival management system to migrate from,
just a database for accessions and some EAD, though we've still got plenty
of paper records that fall outside both of those data sources. OpenRefine
was great for normalizing dates, disambiguating agent names, etc. In theory
you could develop an EAD template to export and then upload to ASpace, but
that would be an insanely huge project.
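
In OpenRefine that normalizing is GREL and clustering work, but purely to
illustrate the idea, the same kind of date cleanup can be sketched in
Python (the input formats here are assumptions about what messy legacy
accession data might look like):

```python
from datetime import datetime

# Candidate formats we might encounter in legacy data (assumed examples).
FORMATS = ["%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d", "%d %b %Y"]

def normalize_date(raw):
    """Try each known format; return an ISO 8601 string, or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unrecognized values (e.g. "circa 1950") for review

for messy in ["2/13/2018", "February 13, 2018", "13 Feb 2018", "circa 1950"]:
    print(messy, "->", normalize_date(messy))
```

Returning None for the stragglers is the point: fuzzy dates like "circa
1950" need a human, and a script should flag them rather than guess.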

I'd recommend a plugin for OpenRefine called BITS VIB. If you've got a
bunch of OpenRefine projects, it makes it super easy to combine projects
provided the records you want to join share a common identifier.
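
The plugin does the join inside OpenRefine itself; just as an illustration
of the idea, the same merge on a shared identifier can be sketched in plain
Python (the column names and data below are made up):

```python
import csv
import io

# Two hypothetical project exports that share an "id" column.
accessions = "id|title\nA1|Smith Papers\nA2|Jones Records\n"
locations = "id|shelf\nA1|Stacks 12\nA2|Vault 3\n"

def rows(text):
    """Parse a pipe-delimited export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

# Index one project by the shared identifier, then merge row by row.
by_id = {r["id"]: r for r in rows(locations)}
merged = [{**r, **by_id.get(r["id"], {})} for r in rows(accessions)]
print(merged)
```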

The other nice thing about OpenRefine is the Custom Tabular Export feature.
If you've got a project with a bazillion columns, as I tend to have, you
can select the columns you want to export, rearrange them, and set a custom
delimiter, among other options. You can also export just a subset of your
OpenRefine project depending on the facets and filters you've got selected.
I would imagine this would be useful if you are importing directly into the
database, though I've no experience there.
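
As an illustration of what that export amounts to (done outside the
OpenRefine UI, with made-up column names), picking, reordering, and
re-delimiting columns can be sketched in Python:

```python
import csv
import io

# A project with more columns than we want to export (names are made up).
project = [
    {"id": "A1", "title": "Smith Papers", "internal_note": "skip", "date": "1950"},
    {"id": "A2", "title": "Jones Records", "internal_note": "skip", "date": "1962"},
]

# Pick and reorder only the columns we need, with a custom delimiter.
columns = ["title", "date", "id"]
out = io.StringIO()
writer = csv.writer(out, delimiter="|")
writer.writerow(columns)
for row in project:
    writer.writerow([row[c] for c in columns])
print(out.getvalue())
```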

Thanks,
Olivia

On Tue, Feb 13, 2018 at 5:09 PM, Olivia S Solis <livsolis at utexas.edu> wrote:

> :) I'm a bit corgi-obsessed.
>
> On Tue, Feb 13, 2018 at 8:54 AM, Margaret Kidd <kiddm at vcu.edu> wrote:
>
>> Totally off topic but, Olivia, your title slide might be the best ever!
>>
>> Margaret
>>
>>
>> ------------------------------
>>
>> Margaret Turman Kidd
>>
>> Access and Electronic Records Archivist, Special Collections & Archives
>>
>> VCU Libraries | Tompkins-McCaw Library for the Health Sciences
>>
>> 509 N. 12th Street / Box 980582, Richmond, VA 23298-0582
>>
>> (804) 828-3152
>>
>>
>> On Tue, Feb 13, 2018 at 9:25 AM, Olivia S Solis <livsolis at utexas.edu>
>> wrote:
>>
>>> Hi there,
>>>
>>> Yes, it is definitely possible to use OpenRefine to migrate data into the
>>> system. It's my primary tool for our data migration. One of the nice things
>>> about OpenRefine is its Templating export option. I was inspired by an
>>> extremely helpful University of Maryland Chaos to Order post:
>>> https://icantiemyownshoes.wordpress.com/2015/11/20/how-i-learned-to-stop-worrying-and-love-the-api/
>>>
>>> That is how we've been creating records, and we have developed a number
>>> of templates to export the JSON. There are a few differences in our
>>> process. For instance, the bash script didn't like spaces in content, so we
>>> added the line
>>> IFS=$|
>>>
>>> and all of my templates end in a pipe.
>>>
>>> To update records, we were inspired by a Duke Python script:
>>> https://github.com/duke-libraries/archivesspace-duke-scripts/blob/master/python/duke_archival_object_metadata_adder.py
>>> which is described in this blog post:
>>> https://blogs.library.duke.edu/bitstreams/2016/09/21/archivesspace-api-fun/
>>>
>>> So for instance, we migrated our EAD by using BaseX to extract elements.
>>> I created minimal resource records and incrementally added notes.
>>>
>>> The process is somewhat laid out in the slides I made for a presentation:
>>> https://docs.google.com/presentation/d/1cBrd8qzHK4i8S6SQ_vDlScERixX95lAbE__hd48yIDg/edit?usp=sharing
>>>
>>> Hopefully, this helps.
>>>
>>> -Olivia
>>>
>>>
>>>
>>>
>>> On Mon, Feb 12, 2018 at 9:45 AM, Joan Curbow <CurbowJ at bvu.edu> wrote:
>>>
>>>> We are a new archives, so we have not had to import any existing data.
>>>> Lucky us, right? But I’m all-too-human, and I’d now like to do some data
>>>> cleanup. Is it possible to do data cleanup using OpenRefine? Theoretically,
>>>> it seems possible to export “stuff” and pull it into OpenRefine, but I’ve
>>>> only ever used OpenRefine in a classroom situation, where the data was
>>>> already populated for us. We did not import the data back into an existing
>>>> database, either, so my experience is limited to just the mechanics of
>>>> OpenRefine. Has anyone used OpenRefine in a real-world situation with data
>>>> that’s already in Aspace? Or is there a better method for data cleanup?
>>>>
>>>>
>>>>
>>>> A further question/complication is that I’m a lone arranger, and my
>>>> instance is hosted by Libraryhost, so any data cleanup may have to be done
>>>> by them? My tech skills are rudimentary, so I’m not clear just how I could
>>>> get this to work. I asked them once, but didn’t get a real answer.
>>>>
>>>>
>>>>
>>>> Sincerely,
>>>>
>>>>
>>>>
>>>> *Joan Curbow*
>>>>
>>>> Reference Librarian and Archivist
>>>>
>>>> Buena Vista University Library
>>>>
>>>> Buena Vista University
>>>>
>>>> 610 West Fourth Street
>>>>
>>>> Storm Lake, Iowa  50588
>>>>
>>>> 712.749.2094 <(712)%20749-2094>
>>>>
>>>> www.library.bvu.edu​​
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Archivesspace_Users_Group mailing list
>>>> Archivesspace_Users_Group at lyralists.lyrasis.org
>>>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>>>>
>>>>
>>>
>>>
>>> --
>>> Olivia Solis, MSIS
>>> Metadata Coordinator
>>> Dolph Briscoe Center for American History
>>> The University of Texas at Austin
>>> 2300 Red River St. Stop D1100
>>> Austin TX, 78712-1426
>>> (512) 232-8013 <(512)%20232-8013>
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Olivia Solis, MSIS
> Metadata Coordinator
> Dolph Briscoe Center for American History
> The University of Texas at Austin
> 2300 Red River St. Stop D1100
> Austin TX, 78712-1426
> (512) 232-8013 <(512)%20232-8013>
>



-- 
Olivia Solis, MSIS
Metadata Coordinator
Dolph Briscoe Center for American History
The University of Texas at Austin
2300 Red River St. Stop D1100
Austin TX, 78712-1426
(512) 232-8013