[Archivesspace_Users_Group] [EXTERNAL] Re: [EXTERNAL] Re: ASpace database constraints

Huebschen, Alan M ahueb2 at uis.edu
Fri Jan 10 11:36:12 EST 2020


I've stuck with uploading our modified records through the API in JSON format, but I'm running into some issues.


What I'm attempting to do is remigrate our data from an old Archon instance to a new ASpace instance. We have been using ASpace for some time now, though, so there are modified records that I need to carry over into the new migration. The reason for redoing the Archon -> ASpace migration is that there were some top container issues that were easy to fix in the Archon database before migrating. I'm using the API to iterate over the records that were created or modified by any account other than 'admin'. That account was only used for the migration, so anything done by any other account needs to be moved over. I have scripts written with ASnake that check the records and download the modified ones to JSON files, but I'm getting errors when I try to load those records into the newly migrated database.
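A minimal sketch of the selection step described above. It assumes the records have already been fetched as JSON dicts (for example with ASnake), and it relies on the created_by and last_modified_by fields that ArchivesSpace includes in record JSON. The helper names needs_migration and save_modified, and the <type>/<id>.json output layout, are hypothetical. One caveat: only the last modifier is recorded, so a record edited by staff and later saved again by 'admin' would not be selected.

```python
import json
from pathlib import Path

# The account that was used only for the Archon -> ASpace migration.
MIGRATION_USER = "admin"


def needs_migration(record, migration_user=MIGRATION_USER):
    """True if an account other than migration_user created or last modified the record."""
    users = {record.get("created_by"), record.get("last_modified_by")}
    users.discard(None)  # tolerate records that lack either field
    return bool(users - {migration_user})


def save_modified(records, out_dir):
    """Write each qualifying record to out_dir/<type>/<id>.json and return the paths written."""
    out_dir = Path(out_dir)
    written = []
    for record in records:
        if not needs_migration(record):
            continue
        # "/locations/542" -> locations/542.json
        # "/repositories/2/top_containers/35245" -> top_containers/35245.json
        kind, rec_id = record["uri"].rstrip("/").split("/")[-2:]
        path = out_dir / kind / f"{rec_id}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(record, indent=2))
        written.append(path)
    return written
```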


It appears that there are dependencies (for example, a top container record needs a location to attach to). I thought that uploading the modified records in dependency order would solve this, but I'm still getting errors such as:


Can't relate to non-existent record: /locations/542
./top_containers/35245.json


I have the location records going in first, and none of those seem to error out. When I was first testing this, the IDs didn't appear to change, so in theory the records should link up if they are uploaded in the correct order.


Now that I'm looking at the newly uploaded locations, it appears the IDs are different. Has anyone dealt with this before, and how might I keep IDs consistent so that the dependent records line up?
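Since the new database assigns its own IDs, one way around this is to record the new URI the backend returns for each location and rewrite the "ref" links in dependent records before posting them. A minimal sketch under these assumptions: client is an authorized ASnakeClient; a successful create returns HTTP 200 with JSON containing the new "id"; and the fields in SYSTEM_FIELDS can safely be dropped before re-posting. The helper names remap_refs, prepare_for_post and post_locations are hypothetical.

```python
# Assumption: fields the backend generates itself, dropped before re-POSTing a record.
SYSTEM_FIELDS = ("uri", "lock_version", "created_by", "last_modified_by",
                 "create_time", "system_mtime", "user_mtime")


def remap_refs(node, uri_map):
    """Return a copy of node with every "ref" value found in uri_map (old -> new URI) replaced."""
    if isinstance(node, dict):
        return {
            key: (uri_map.get(value, value)
                  if key == "ref" and isinstance(value, str)
                  else remap_refs(value, uri_map))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [remap_refs(item, uri_map) for item in node]
    return node


def prepare_for_post(record, uri_map):
    """Copy a downloaded record, rewrite its links, and drop top-level system fields."""
    body = remap_refs(record, uri_map)  # builds new dicts/lists, so record is left untouched
    for field in SYSTEM_FIELDS:
        body.pop(field, None)
    return body


def post_locations(client, records):
    """POST each downloaded location record and return {old_uri: new_uri}."""
    uri_map = {}
    for record in records:
        response = client.post("locations", json=prepare_for_post(record, {}))
        if response.status_code != 200:
            raise RuntimeError(f"{record['uri']}: {response.text}")
        # The new database picks the id, so build the new URI from the response.
        uri_map[record["uri"]] = f"/locations/{response.json()['id']}"
    return uri_map
```

With uri_map built from the locations, each top container could then be sent with something like client.post("repositories/2/top_containers", json=prepare_for_post(record, uri_map)). Any ref that is not in uri_map (a link to a record that was not re-created) is passed through unchanged.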


Thanks,


-Alan Huebschen

University of Illinois at Springfield
Brookens Library Information Systems

________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Mayo, Dave <dave_mayo at harvard.edu>
Sent: Monday, January 6, 2020 9:48 AM
To: Archivesspace Users Group
Subject: [EXTERNAL] Re: [Archivesspace_Users_Group] [EXTERNAL] Re: ASpace database constraints

You’re very welcome, and I’m real glad ASnake is working well for you.

Re: JSONModel vs EAD export – the JSONModel version is authoritative – that is, it’s “the record as the system understands it.”  Import/export to EAD are best effort and defined by the code doing the conversion.  So there can be differences, and those differences can be lossy; if you’re writing code to make scripted changes using the API, you’re much better off altering the JSON and reuploading it than trying to do so via EAD.
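A minimal sketch of the "alter the JSON and reupload it" pattern, assuming client is an authorized ASnakeClient and that POSTing a record's full JSON (including the lock_version it was fetched with) back to its own URI updates the record. update_record is a hypothetical helper name, not part of ASnake.

```python
def update_record(client, uri, changes):
    """Fetch a record's JSON, apply field changes, and POST it back to the same URI.

    client is assumed to be an authorized asnake.client.ASnakeClient; changes is a
    dict of top-level fields to overwrite, e.g. {"title": "New title"}.
    """
    record = client.get(uri).json()
    # Keep the fetched lock_version: the backend uses it to reject an update
    # if someone else saved the record in the meantime.
    record.update(changes)
    response = client.post(uri, json=record)
    if response.status_code != 200:
        raise RuntimeError(f"Update of {uri} failed: {response.text}")
    return response.json()
```

For example, update_record(client, "/repositories/2/resources/42", {"title": "Revised title"}) would change only the title and leave the rest of the record as the system returned it.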

However:

If you do want/need to upload EAD, there’s a plugin that provides an API route for doing so: https://github.com/lyrasis/aspace-jsonmodel-from-format

And if you want to get EAD out of the system:

Looking at the script, it appears to just call an API method, specifically this one: https://archivesspace.github.io/archivesspace/api/#get-export-metadata-for-a-resource-description

The definition of “export” in the Ruby file called by the scripts gives us the params being used:

def export(id)
  params = "include_unpublished=false&include_daos=true&numbered_cs=true"
  url = URI("#{AppConfig[:backend_url]}/repositories/#{repo_id}/resource_descriptions/#{id}.xml?#{params}")
  get(url, :xml)
end

So, using the example values of 2 for repo_id and 42 for the resource ID, in ASnake this looks like:

# assumes asnake_client is an authorized ASnakeClient
repo_id = 2
resource_id = 42
resp = asnake_client.get(f'repositories/{repo_id}/resource_descriptions/{resource_id}.xml',
                         params={'include_unpublished': False, 'include_daos': True, 'numbered_cs': True})
if resp.status_code == 200:
    ead = resp.text  # resp.text gives the EAD as a string, resp.content as bytes


--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of "Huebschen, Alan M" <ahueb2 at uis.edu>
Reply-To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Date: Monday, December 23, 2019 at 10:42 AM
To: Archivesspace Users Group <archivesspace_users_group at lyralists.lyrasis.org>
Subject: Re: [Archivesspace_Users_Group] [EXTERNAL] Re: ASpace database constraints


Thanks Dave!



ASnake is incredibly helpful, I've written scripts to automatically sort through our records to delete what needs to be removed for our database merge.

It looks like it's possible to upload JSON records through the ASpace API. Initially I was planning to upload our modified records in EAD format, but it would be nice if I could streamline the entire process through the API.



Does anyone know if the JSON records obtained through the API differ from EAD records obtained using the ead_export script that comes with ASpace? At first glance it appears there might be some differences.

Or is there a way to incorporate EAD export/import using ASnake and the ASpace API?


-Alan Huebschen

University of Illinois at Springfield
Brookens Library Information Systems



________________________________
From: archivesspace_users_group-bounces at lyralists.lyrasis.org <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of Mayo, Dave <dave_mayo at harvard.edu>
Sent: Friday, December 20, 2019 9:02 AM
To: Archivesspace Users Group
Subject: [EXTERNAL] Re: [Archivesspace_Users_Group] ASpace database constraints

Hi Alan,

So, it’s _possible_ to do a cascade delete of records in ASpace, but for a couple of reasons, I think it’s not necessarily a real good idea.  Solr is unhappy not just because there’s leftover info in other tables, but also probably because it can’t inherently see the bulk deletions in MySQL.

I think it’d probably be safer to delete things via the ArchivesSpace API; this way Solr will stay consistent throughout, and subsidiary records that depend on Resource _should_ all get deleted as well (I’m not quite comfortable saying “will,” but if not, you can also clean those up via the API).

There’s an actively maintained API client for ArchivesSpace, ArchivesSnake - https://github.com/archivesspace-labs/archivessnake; and a lot of example scripts are linked in the documentation.  I think in principle, this would be fairly straightforward; you’d need to collect the ids of the resources you want to delete, and then delete them.

So, just as a simplified example, if you wanted to delete all resources in repository 2:

from asnake.client import ASnakeClient

client = ASnakeClient(username="admin", password="admin", baseurl="http://path.to.backend")
client.authorize()
resources_response = client.get('repositories/2/resources', params={"all_ids": True})
if resources_response.status_code != 200:
    raise RuntimeError("Something went wrong!")
for res_id in resources_response.json():
    delete_response = client.delete('repositories/2/resources/{}'.format(res_id))
    if delete_response.status_code != 200:
        print("Failed to delete {}".format(res_id))

If you decide to try this and run into trouble, please feel free to email me, I’d be happy to help walk you through setup/troubleshooting.
--
Dave Mayo (he/him)
Senior Digital Library Software Engineer
Harvard University > HUIT > LTS

From: <archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of "Huebschen, Alan M" <ahueb2 at uis.edu>
Reply-To: Archivesspace Users Group <Archivesspace_Users_Group at lyralists.lyrasis.org>
Date: Friday, December 20, 2019 at 9:44 AM
To: Archivesspace Users Group <Archivesspace_Users_Group at lyralists.lyrasis.org>
Subject: [Archivesspace_Users_Group] ASpace database constraints


Good morning all,



I've been working with some test instances of our database and I need to remove some records in bulk. We are currently running ASpace against a MySQL instance and I am attempting to remove all traces of specific records from the tables.



I can easily delete the records from the resource table after disabling foreign key checks; however, it appears that information is left over in other tables, making Solr an unhappy camper. I created an EER diagram in MySQL Workbench to try to figure out which records are tied together, but as someone who is fairly new to database work, it's a bit of a headache to wrap my mind around.



From the research I've done, some records can be set as a parent with a cascade setting, so that the child records in other tables are removed when the parent is removed. I've looked at some of the table settings, but I haven't been able to figure out what needs to be removed to clean up the db or what the proper order of removal would be.



Has anyone here removed resource table entries and their associated records with success? How can I go about figuring out what I need to remove and/or how to remove it?



-Alan Huebschen

University of Illinois at Springfield

Brookens Library Information Systems