[Archivesspace_Users_Group] Import EAD via the API?

Thu Apr 9 13:33:13 EDT 2015

Hi Mark,

Adding "charset=utf-8" did the trick!

This plugin + batch_imports will allow me to do essentially what I was
hoping to do by batch importing EADs.
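
For anyone finding this thread later, here is a minimal sketch of what
"adding charset=utf-8" can look like in Ruby. It only builds the request and
does not send it. The host and port, repository id 2, the session token, the
media type, and the payload are all placeholders assumed for illustration;
the thread does not say which request the parameter was added to.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST whose Content-Type header declares UTF-8.
# media_type and session_token are supplied by the caller.
def build_utf8_post(url, body, media_type, session_token)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request['Content-Type'] = "#{media_type}; charset=utf-8"
  request['X-ArchivesSpace-Session'] = session_token
  # Transcode the payload to UTF-8 if it arrived in another encoding.
  request.body = body.encode(Encoding::UTF_8)
  request
end

# Hypothetical values; the payload is a placeholder, not a valid import record.
request = build_utf8_post(
  'http://localhost:8089/repositories/2/batch_imports',
  "{\"title\": \"Caf\u00E9\"}",
  'application/json',
  'hypothetical-session-token'
)
puts request['Content-Type']
```

Sending it would be a matter of passing the request to Net::HTTP.start for
the same host and port.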

Thanks so much for the help, and for offering to take a further look into
the error.

Thanks,

Dallas

On Wed, Apr 8, 2015 at 3:05 PM, Mark Cooper <mark.cooper at lyrasis.org> wrote:

>  Hi Dallas,
>
>
>  Have you tried adding "charset=utf-8" to the content-type headers? Also
> feel free to send me an EAD file that produces the error and I'll try to
> look into it over the weekend.
>
>
>  Best,
>
> Mark
>
>
>    Mark Cooper
>  Technical Lead, Hosting and Support
> LYRASIS
> email: mark.cooper at lyrasis.org
> skype: mark_c_cooper
>    ------------------------------
> *From:* archivesspace_users_group-bounces at lyralists.lyrasis.org <
> archivesspace_users_group-bounces at lyralists.lyrasis.org> on behalf of
> Dallas Pillen <djpillen at umich.edu>
> *Sent:* Tuesday, April 7, 2015 9:09 AM
> *To:* Archivesspace Users Group
> *Subject:* Re: [Archivesspace_Users_Group] Import EAD via the API?
>
>  Thanks for the replies!
>
>  Noah: we have talked about first importing our EADs into AT and then
> migrating to ArchivesSpace, but we're trying to do a lot of preliminary
> data cleanup and have made some modifications to the AS EAD importer in an
> attempt to make our legacy data as clean and AS-friendly as possible before
> migrating. Ideally, we'll account for most of the errors we've identified
> during our EAD cleanup and won't need to rely on the more forgiving nature
> of the AT importer. But it's something we're open to looking into at
> some point, at least as a supplemental error discovery/identification tool.
>
>  Mark and Steve: Something along those lines just might work. As part of
> our EAD cleanup we are going through our extents, subjects, and other
> potential sources of namespace pollution so those sorts of issues will be
> controlled for in our EADs before our actual migration to AS. This should
> also make a pre-migration conversion to the AS JSONModel more feasible
> since, when we are ready to do so, our EADs will theoretically convert to
> the AS JSONModel with minimal errors.
>
>  Mark, I tried your plugin on a batch of EADs and there were a few cases
> in which the conversion failed due to something like the following:
>
>  (UndefinedConversionError) ""\xC3\xA9"" from UTF-8 to US-ASCII
>
>  The EAD that threw this particular error has some 'é' characters that
> the AS EAD converter imports without issue. Any thoughts on that? Other
> than that, converting our EADs to JSON and then bulk importing each of the
> JSON files would essentially accomplish what I was thinking of doing by
> importing EAD through the API.
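
The bytes \xC3\xA9 in that message are the UTF-8 encoding of 'é', and Ruby
raises this class of error when a string is transcoded to US-ASCII. A small
sketch that reproduces the error class in isolation follows. Where the
transcoding happens inside the plugin is not stated in this thread, so the
helper name try_ascii is purely illustrative.

```ruby
# Return the US-ASCII transcoding of text, or the name of the error raised.
def try_ascii(text)
  text.encode(Encoding::US_ASCII)
rescue Encoding::UndefinedConversionError => e
  e.class.name
end

puts try_ascii('plain')        # pure ASCII transcodes without trouble
puts try_ascii("caf\u00E9")    # 'é' has no US-ASCII equivalent

# The two UTF-8 bytes that make up 'é', printed as escapes.
puts "caf\u00E9".bytes.last(2).map { |b| format('\\x%02X', b) }.join
```

Keeping the data labelled as UTF-8 from end to end avoids the transcoding
step altogether.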
>
>  Still, like Noah, I'd be interested in knowing if anyone else has
> figured out how to start EAD import jobs through the API, partially out of
> curiosity and also to be able to compare some of these potential
> conversion/import strategies.
>
>  Thanks again for all the helpful suggestions!
>
>  Dallas
>
> On Mon, Apr 6, 2015 at 4:35 PM, Steven Majewski <sdm7g at virginia.edu>
> wrote:
>
>>
>>  What I had been doing was running EADConverter locally on a batch of
>> files, saving the JSON
>> output if successful, and posting those JSON files to
>>
>> http://archivesspace.github.io/archivesspace/doc/file.API.html#post-repositoriesrepoidbatchimports
>>
>>  ( This two stage process was also very useful in earlier versions when
>> the error reporting
>>   was missing the context: ArchivesSpace would tell you what was missing,
>> but it didn’t point
>>   to where it was missing from. It was possible to inspect the JSON and
>> look for the null or
>>   missing value. )
>>
>>  You want to:
>>
>>  converter = EADConverter.new( eadxml )
>> converter.run
>>  and then do something (move, copy or send directly to batch_imports
>> API) with:
>>  converter.get_output_path
>>
>>  and if you wrap it in a begin/rescue block, you can catch and report
>> the errors.
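
A minimal sketch of that begin/rescue wrapper follows. StubConverter is a
hypothetical stand-in that only mimics the run and get_output_path calls
quoted above so the example runs on its own. In a real ArchivesSpace
environment the converter class would take its place, and its exact
behaviour should be checked against the installed version.

```ruby
# Hypothetical stand-in for the converter; it fails for any path that
# contains "bad" and otherwise pretends to have written a JSON file.
class StubConverter
  def initialize(path)
    @path = path
  end

  def run
    raise ArgumentError, "cannot convert #{@path}" if @path.include?('bad')
  end

  def get_output_path
    "#{@path}.json"
  end
end

# Convert each file, collecting output paths and error messages separately
# so that one failure does not stop the rest of the batch.
def convert_batch(paths, converter_class)
  converted = []
  errors = {}
  paths.each do |path|
    begin
      converter = converter_class.new(path)
      converter.run
      converted << converter.get_output_path
    rescue StandardError => e
      errors[path] = "#{e.class}: #{e.message}"
    end
  end
  [converted, errors]
end

converted, errors = convert_batch(%w[good.xml bad.xml], StubConverter)
puts "converted: #{converted.inspect}"
errors.each { |path, message| puts "failed: #{path} (#{message})" }
```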
>>
>>
>>  I’ve experimented with a couple of variations on the error catching and
>> processing.
>> For example, if you move the JSON output in the ensure clause (
>> begin/rescue/ensure ),
>> you can save the JSON to inspect even if it’s not complete enough to
>> successfully
>> import with /batch_imports, but you might not want to mix “good” and
>> “bad” JSON in
>> your output files.
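
One way to sketch the ensure variation while keeping "good" and "bad" JSON
apart is shown below. The convert argument is a hypothetical callable
standing in for the real conversion step, and the good/ and bad/ directory
names are assumptions made for the example.

```ruby
require 'fileutils'
require 'tmpdir'

# Run convert (a callable that writes to output and may raise), then file
# whatever was written under good_dir or bad_dir depending on the outcome.
def convert_and_file(output, good_dir, bad_dir, convert)
  succeeded = false
  begin
    convert.call(output)
    succeeded = true
  rescue StandardError => e
    warn "conversion failed: #{e.message}"
  ensure
    # Runs on success and on failure, so partial output is kept as well.
    FileUtils.mv(output, succeeded ? good_dir : bad_dir) if File.exist?(output)
  end
  succeeded
end

Dir.mktmpdir do |dir|
  good = File.join(dir, 'good')
  bad = File.join(dir, 'bad')
  FileUtils.mkdir_p([good, bad])

  # Hypothetical converters: one completes, one writes partial JSON and fails.
  complete = ->(out) { File.write(out, '{"title": "ok"}') }
  partial = lambda do |out|
    File.write(out, '{"title": null}')
    raise 'title is missing'
  end

  convert_and_file(File.join(dir, 'a.json'), good, bad, complete)
  convert_and_file(File.join(dir, 'b.json'), good, bad, partial)
  puts "good: #{Dir.children(good).inspect}"
  puts "bad: #{Dir.children(bad).inspect}"
end
```

Only the contents of good/ would then be posted to /batch_imports, while
bad/ is kept for inspection.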
>>
>>  More recently, I’ve been experimenting with using an alternate EAD
>> importer with
>> looser schema validation rules.
>>
>>  One problem with importing thousands of EAD files by this batch method
>> is that
>> we have had problems with “namespace pollution” of the controlled vocab
>> lists
>> for extents and containers. These values are controlled from the webapp
>> and editor,
>> but importing from EAD adds to the values in the database. If you import
>> a few
>> EAD files at a time, it’s not difficult to merge and clean up these
>> values, but
>> importing several thousand EAD files that aren’t very controlled for
>> those values
>> led to an explosion that makes the drop down lists of those values
>> unusable.
>>
>>  See a previous message about this:
>>
>> http://lyralists.lyrasis.org/pipermail/archivesspace_users_group/2015-March/001216.html
>>
>>
>>
>>  — Steve Majewski / UVA Alderman Library
>>
>>
>>   On Apr 6, 2015, at 3:08 PM, Dallas Pillen <djpillen at umich.edu> wrote:
>>
>>    Hello all,
>>
>>  I was curious if anyone has had any success starting EAD import jobs
>> via the API?
>>
>>  I was thinking this could be done using POST
>> /repositories/:repo_id/jobs_with_files described here:
>> http://archivesspace.github.io/archivesspace/doc/file.API.html#post-repositoriesrepoidjobswithfiles
>>
>>  However, I am not entirely sure how the job and file parameters should
>> be sent in the POST request, and I haven't seen anyone ask this question
>> before or give an example of how it might work. I've tried sending the POST
>> request several different ways and each time I am met with:
>> {"error":{"job":["Parameter required but no value
>> provided"],"files":["Parameter required but no value provided"]}}.
>>
>>  I suppose it's worth mentioning that the reason I want to do this is
>> that, at some point, we will be importing several thousand EADs into
>> ArchivesSpace. We're doing a lot of preliminary work to make our EADs
>> import successfully, but know there will likely be some that will fail.
>> Right now, the only way to do a batch import of EADs is to do a batch as a
>> single import job. If one EAD in that job has an error, the entire job
>> fails. For that reason, I would like to be able to import each EAD as a
>> separate job so that the EADs that will import successfully will do so
>> without being impacted by the EADs with errors. However, starting several
>> thousand individual import jobs would be very tedious, and I'm looking for
>> a way to automate that process. If anyone else has come up with any
>> creative solutions or knows of a better way to do that than the API, I
>> would be very interested to know.
>>
>>  The end goal would be to have a script that would batch start the
>> import jobs, get the ID for each job, check up on the jobs every so often
>> and, once there are no longer any active jobs, output some information
>> about each of the jobs that failed. I've figured out how to do most of that
>> using the API, but I'm stumped on how to get the whole process started.
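
The "check up on the jobs every so often" part of such a script could be
sketched as below. The status names (queued, running, failed) are assumed
rather than taken from this thread, and fetch_status is a hypothetical
callable that would wrap whatever API request returns a job's status. Here
it is faked with canned responses so the example runs without a server.

```ruby
# Statuses treated as "still active"; these names are an assumption.
ACTIVE_STATUSES = %w[queued running].freeze

# Poll until no job is active, then return a hash of the jobs that failed.
# fetch_status is any callable mapping a job id to a status string.
def wait_for_jobs(job_ids, fetch_status, interval: 30)
  loop do
    statuses = job_ids.map { |id| [id, fetch_status.call(id)] }.to_h
    active = statuses.select { |_, status| ACTIVE_STATUSES.include?(status) }
    return statuses.select { |_, status| status == 'failed' } if active.empty?

    sleep(interval)
  end
end

# Hypothetical canned responses standing in for real status requests:
# each job reports an active status once and then its final status.
responses = { 101 => %w[running completed], 102 => %w[queued failed] }
fetch = lambda do |id|
  responses[id].length > 1 ? responses[id].shift : responses[id].first
end

failed = wait_for_jobs([101, 102], fetch, interval: 0)
puts "failed jobs: #{failed.keys.inspect}"
```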
>>
>>  Thanks!
>>
>>  Dallas
>>
>>
>>  --
>>
>> *Dallas Pillen *Project Archivist
>>
>>
>>    Bentley Historical Library <http://bentley.umich.edu/>
>>   1150 Beal Avenue
>>   Ann Arbor, Michigan 48109-2113
>>   734.647.3559
>>   Twitter <https://twitter.com/umichBentley> Facebook
>> <https://www.facebook.com/bentleyhistoricallibrary>
>>      _______________________________________________
>> Archivesspace_Users_Group mailing list
>> Archivesspace_Users_Group at lyralists.lyrasis.org
>> http://lyralists.lyrasis.org/mailman/listinfo/archivesspace_users_group
>

-- 

*Dallas Pillen* Project Archivist

  Bentley Historical Library <http://bentley.umich.edu/>
  1150 Beal Avenue
  Ann Arbor, Michigan 48109-2113
  734.647.3559
  Twitter <https://twitter.com/umichBentley> Facebook
<https://www.facebook.com/bentleyhistoricallibrary>