[Archivesspace_Users_Group] Import EAD via the API?

Mon Apr 6 16:35:13 EDT 2015

What I had been doing was running EADConverter locally on a batch of files, saving the JSON
output if the conversion succeeded, and posting those JSON files to the batch_imports endpoint:
http://archivesspace.github.io/archivesspace/doc/file.API.html#post-repositoriesrepoidbatchimports
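
As a rough sketch of that second stage (not the exact script I ran), posting one saved
JSON file could look something like the following. The helper names, the base URL, and
the way the session token is obtained are all assumptions for illustration; the
X-ArchivesSpace-Session header is how the backend API normally expects an
authenticated session to be passed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the endpoint URI for a given repository.
def batch_import_uri(base_url, repo_id)
  URI.join(base_url, "/repositories/#{repo_id}/batch_imports")
end

# Hypothetical helper: build (but do not send) the POST request.
# session_token is assumed to come from an earlier login call.
def build_batch_import_request(uri, session_token, json_body)
  request = Net::HTTP::Post.new(uri)
  request['X-ArchivesSpace-Session'] = session_token
  request['Content-Type'] = 'application/json'
  request.body = json_body
  request
end

# Send the JSON saved from a successful EADConverter run.
def post_batch_import(base_url, repo_id, session_token, json_path)
  uri = batch_import_uri(base_url, repo_id)
  request = build_batch_import_request(uri, session_token, File.read(json_path))
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Keeping the request construction separate from the sending makes it easy to check
what would be posted before pointing the script at a real backend.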

( This two-stage process was also very useful in earlier versions, when the error reporting
  lacked context: ArchivesSpace would tell you what was missing, but not where it was
  missing from. You could inspect the JSON and look for the null or missing value. )

You want to:

	converter = EADConverter.new( eadxml )
	converter.run
and then do something (move, copy, or send directly to the batch_imports API) with:
	converter.get_output_path

and if you wrap it in a begin/rescue block, you can catch and report the errors. 
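
A minimal sketch of that begin/rescue wrapper, looping over a batch of files. The
method name convert_all and the shape of the results hash are made up for this
example. The converter class is passed in as a parameter so the sketch does not
depend on the ArchivesSpace load path; in practice you would pass EADConverter,
which is assumed here to respond to new(path), run, and get_output_path as above.

```ruby
# Convert each EAD file, collecting output paths and error messages.
# converter_class is expected to behave like EADConverter.
def convert_all(ead_paths, converter_class)
  results = { converted: {}, failed: {} }
  ead_paths.each do |path|
    begin
      converter = converter_class.new(path)
      converter.run
      # Only reached if run did not raise.
      results[:converted][path] = converter.get_output_path
    rescue StandardError => e
      # Record the error and carry on with the next file.
      results[:failed][path] = "#{e.class}: #{e.message}"
    end
  end
  results
end
```

One bad file then only adds an entry to results[:failed] instead of stopping the
whole batch.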

I’ve experimented with a couple of variations on the error catching and processing.
For example, if you move the JSON output in the ensure clause ( begin/rescue/ensure ),
you can save the JSON to inspect even if it’s not complete enough to successfully 
import with /batch_imports, but you might not want to mix “good” and “bad” JSON in
your output files.  
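
One way to keep the incomplete JSON without mixing it with the importable JSON is to
have the ensure clause choose a destination directory. This is only a sketch: the
method name convert_and_sort and the good_dir/bad_dir arguments are invented for the
example, and it assumes get_output_path can still be called safely after a failed run.

```ruby
require 'fileutils'

# Run one conversion and always move whatever JSON was written:
# into good_dir on success, into bad_dir on failure.
def convert_and_sort(ead_path, converter_class, good_dir, bad_dir)
  converter = nil
  succeeded = false
  begin
    converter = converter_class.new(ead_path)
    converter.run
    succeeded = true
  rescue StandardError => e
    warn "#{ead_path}: #{e.class}: #{e.message}"
  ensure
    # converter is nil if the constructor itself raised.
    output = converter && converter.get_output_path
    if output && File.exist?(output)
      target_dir = succeeded ? good_dir : bad_dir
      FileUtils.mkdir_p(target_dir)
      name = File.basename(ead_path, '.*') + '.json'
      FileUtils.mv(output, File.join(target_dir, name))
    end
  end
  succeeded
end
```

Only the files in good_dir would then be posted to /batch_imports; the ones in
bad_dir are there for inspection.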

More recently, I’ve been experimenting with using an alternate EAD importer with 
looser schema validation rules. 

One problem with importing thousands of EAD files by this batch method is that
we have had problems with “namespace pollution” of the controlled vocabulary lists
for extents and containers. These values are controlled in the webapp and editor,
but importing from EAD adds new values to the database. If you import a few
EAD files at a time, it’s not difficult to merge and clean up these values, but
importing several thousand EAD files whose values aren’t well controlled
led to an explosion of entries that makes the drop-down lists for those values unusable.

See a previous message about this: 
http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-March/001216.html

— Steve Majewski / UVA Alderman Library 

On Apr 6, 2015, at 3:08 PM, Dallas Pillen <djpillen at umich.edu> wrote:

> Hello all,
> 
> I was curious if anyone has had any success starting EAD import jobs via the API?
> 
> I was thinking this could be done using POST /repositories/:repo_id/jobs_with_files described here: http://archivesspace.github.io/archivesspace/doc/file.API.html#post-repositoriesrepoidjobswithfiles
> 
> However, I am not entirely sure how the job and file parameters should be sent in the POST request, and I haven't seen anyone ask this question before or give an example of how it might work. I've tried sending the POST request several different ways and each time I am met with: {"error":{"job":["Parameter required but no value provided"],"files":["Parameter required but no value provided"]}}. 
> 
> I suppose it's worth mentioning that the reason I want to do this is that, at some point, we will be importing several thousand EADs into ArchivesSpace. We're doing a lot of preliminary work to make our EADs import successfully, but know there will likely be some that will fail. Right now, the only way to do a batch import of EADs is to do a batch as a single import job. If one EAD in that job has an error, the entire job fails. For that reason, I would like to be able to import each EAD as a separate job so that the EADs that will import successfully will do so without being impacted by the EADs with errors. However, starting several thousand individual import jobs would be very tedious, and I'm looking for a way to automate that process. If anyone else has come up with any creative solutions or knows of a better way to do that than the API, I would be very interested to know.
> 
> The end goal would be to have a script that would batch start the import jobs, get the ID for each job, check up on the jobs every so often and, once there are no longer any active jobs, output some information about each of the jobs that failed. I've figured out how to do most of that using the API, but I'm stumped on how to get the whole process started.
> 
> Thanks!
> 
> Dallas
> 
> 
> -- 
> Dallas Pillen
> Project Archivist
> 
> 
>   Bentley Historical Library
>   1150 Beal Avenue
>   Ann Arbor, Michigan 48109-2113 
>   734.647.3559
