[Archivesspace_Users_Group] brainstorming -- data integrity checking

Callahan, Maureen maureen.callahan at yale.edu
Tue Jun 2 15:44:40 EDT 2015

Hi Nathan,

Yeah, that’s what I just suggested in my email. There are two problems with this idea. One, refids are only unique within a resource. This is fine if my query can also pull down the identifier of the resource that a component belongs to, but here’s where we get to our next problem… The resourcescomponents table in AT doesn’t reference the related resource unless the component is a top-level component. My colleague Steelsen Smith wrote some SQL procedures<https://github.com/SteelsenS/ATK_Tools/tree/master/databaseTools> to recursively find the parent of a component until it gets to a component that has a resource record ID, but this obviously won’t find anything if I have an orphan component (in fact, the procedures don’t envision this case, so MySQL hangs and then barfs). I think my best bet will probably be to modify Steelsen’s procedures to add error handling, so they can tell me explicitly that a component doesn’t have a parent resource.
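To make the parent-walking idea concrete, here is a minimal Python sketch of the same logic run outside the database. It is not Steelsen's procedure and does not use AT's real column names: it assumes the component rows have already been exported into a dictionary mapping a component id to a (parent component id, resource id) pair, where either value may be missing. The names find_resource and find_orphans are hypothetical. The walk stops and reports an orphan when a parent row is missing, when the chain runs out without reaching a resource, or when it loops back on itself, so it cannot hang.

```python
from typing import Dict, List, Optional, Tuple

# Assumed export shape: component_id -> (parent_component_id, resource_id).
# Either value may be None. These are illustrative names, not AT's schema.
Row = Tuple[Optional[int], Optional[int]]


def find_resource(component_id: int, rows: Dict[int, Row]) -> Optional[int]:
    """Walk up the parent chain; return the resource id, or None for an orphan."""
    seen = set()
    current: Optional[int] = component_id
    while current is not None and current not in seen:
        seen.add(current)
        row = rows.get(current)
        if row is None:
            # The parent id points at a row that no longer exists.
            return None
        parent_id, resource_id = row
        if resource_id is not None:
            return resource_id
        current = parent_id
    # Ran out of parents without finding a resource, or hit a cycle.
    return None


def find_orphans(rows: Dict[int, Row]) -> List[int]:
    """Return the ids of all components that cannot be traced to a resource."""
    return sorted(cid for cid in rows if find_resource(cid, rows) is None)


# Made-up sample data for illustration only.
rows = {
    1: (None, 100),  # top-level component of resource 100
    2: (1, None),    # child of 1
    3: (2, None),    # grandchild of 1
    4: (99, None),   # parent 99 was deleted, so this is an orphan
    5: (4, None),    # descendant of an orphan
    6: (7, None),    # 6 and 7 point at each other (a cycle)
    7: (6, None),
}
orphans = find_orphans(rows)
```

If the number of orphans found this way matched the gap between the two systems, that would support the deleted-parent explanation.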

Obviously, I’m open to more straightforward approaches! It would probably be good to know, as part of the migrator, not just that resources migrated but that all parts of all resources are accounted for.


On Jun 2, 2015, at 2:22 PM, Nathan Stevens <ns96 at nyu.edu> wrote:

Try a combination of ref_ids and call numbers (I am assuming call numbers get placed into the component unique id field in AT). A combination of those two fields should uniquely identify the resource components, as long as the call numbers were always filled out.
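As a rough sketch of that comparison, assuming both systems can be exported to lists of (ref_id, component unique id) pairs, something like the following Python would list the pairs present in AT but absent from ArchivesSpace. The function name and the sample values are made up for illustration. It uses a multiset difference rather than a plain set difference, so components that share the same pair (for example when the call number was left blank) are still counted individually.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed key shape: (ref_id, component_unique_id), with "" for a blank call number.
Key = Tuple[str, str]


def missing_from_target(at_keys: Iterable[Key], aspace_keys: Iterable[Key]) -> List[Key]:
    """Return keys that occur more often in AT than in ArchivesSpace."""
    diff = Counter(at_keys) - Counter(aspace_keys)  # keeps only positive counts
    return sorted(diff.elements())


# Made-up sample exports for illustration only.
at_keys = [("ref1", "MS 1"), ("ref2", "MS 1"), ("ref1", "MS 2"), ("ref9", "")]
aspace_keys = [("ref1", "MS 1"), ("ref1", "MS 2")]

missing = missing_from_target(at_keys, aspace_keys)
# missing == [("ref2", "MS 1"), ("ref9", "")]
```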

On Tue, Jun 2, 2015 at 1:59 PM, Callahan, Maureen <maureen.callahan at yale.edu> wrote:
Hi everyone,

We’re running our fourth of four migrations at Yale this week (WOOHOO), and right now I’m doing some data integrity checking between AT and ArchivesSpace to get slightly more detailed information than what the migrator provides.

Almost everything looks pretty good, except for one problem – we have 2603 fewer archival_objects in ArchivesSpace than we had components in Archivists’ Toolkit. This wasn’t a problem for migrations for other repositories. I am almost positive that this was the result of irresponsible SQL delete statements many moons ago, and that the components in AT are orphans, but I’d like to figure out a way to check. Does anyone have any ideas? Maybe comparing refids and call numbers? All ideas are welcome.


Maureen Callahan
Archivist, Metadata Specialist
Manuscripts & Archives
Yale University Library
maureen.callahan at yale.edu<mailto:maureen.callahan at yale.edu>

Webpage: web.library.yale.edu/mssa<http://web.library.yale.edu/mssa>
Collections: drs.library.yale.edu<http://drs.library.yale.edu/>

Archivesspace_Users_Group mailing list
Archivesspace_Users_Group at lyralists.lyrasis.org<mailto:Archivesspace_Users_Group at lyralists.lyrasis.org>

Nathan Stevens
Digital Library Technology Services
New York University

ns96 at nyu.edu<mailto:ns96 at nyu.edu>
