[Archivesspace_Users_Group] Batch imports [was: EAD Import - cryptic error messages]
Steven Majewski
sdm7g at virginia.edu
Tue Feb 18 15:59:51 EST 2014
On Feb 18, 2014, at 2:40 PM, Noah Huffman <noah.huffman at duke.edu> wrote:
> Hello,
>
> As others have mentioned, some of the EAD import error log messages are rather cryptic. Could anyone help decipher the two messages below? After running a few test imports, I’m getting the first “wrong number of arguments” error quite a bit on schema valid EAD files.
>
> Error: wrong number of arguments (6 for 4)
>
> Error: Unexpected Object Type in Queue: Expected archival_object got file_version
>
>
> Also, is there any method for batch importing EAD that will allow the entire batch to process even if one particular file fails?
>
> Thanks,
> -Noah
Here’s what we’ve hacked together:
[1] created a top-level env.sh file with PATH settings and env variables cribbed from scripts/jirb:
# cd to the archivesspace top-level directory and source this file
export BUNDLE_GEMFILE="$PWD/backend/Gemfile"
JAVA_OPTS="$JAVA_OPTS -Daspace.config.search_user_secret=devserver -Daspace.config.public_user_secret=devserver -Daspace.config.staff_user_secret=devserver"
export JAVA_OPTS
export RUBYLIB=$PWD/common/
PATH=$PWD/build/gems/bin:$PATH:$PWD/scripts:$PWD/backend/scripts
[2] created a Ruby script in backend/scripts/ead_parse.rb
( figured out the gist of this from looking at spec test: backend/spec/lib_ead_converter_spec.rb )
#!/usr/bin/env jruby
#
# Script attempts to parse files with EADConverter.
#
# You need to source env.sh to get settings for
# JAVA_OPTS, RUBYLIB, etc. before running the script.
#
require_relative '../spec/spec_helper'
require_relative '../app/converters/ead_converter'

ARGV.each do |eadxml|
  begin
    converter = EADConverter.new( eadxml )
    converter.run
    puts eadxml + " : ok."
    # if parse is successful, then write out json for later import
    outname = eadxml.slice(0..eadxml.rindex('.')) + 'json'
    puts "writing json to: " + outname
    # block form closes the file even if the write raises
    File.open( outname, 'w' ) do |out|
      out.write(IO.read(converter.get_output_path))
    end
  rescue Exception => e
    # report the failure and move on to the next file
    puts eadxml + " : failed: " + e.message
    $stderr.puts e.backtrace
  end
end
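One caveat: the output name in the script is built by slicing at the last '.', which assumes every path ends in an extension. A minimal sketch of a more defensive variant follows; json_name_for is a hypothetical helper name, not anything from ArchivesSpace.

```ruby
# Hypothetical helper: derive the .json output path from an EAD path.
# Uses File.extname so that a dot in a directory name is not mistaken
# for the start of the file extension.
def json_name_for(path)
  ext = File.extname(path)   # "" when the basename has no extension
  base = ext.empty? ? path : path[0...-ext.length]
  base + '.json'
end
```

In the script, this would stand in for the eadxml.slice(...) line, e.g. outname = json_name_for(eadxml).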
There will be a lot of stack traces on stderr, but stdout will just be a listing of successes and failures along with the JSON output filenames.
Running that script on a directory full of EAD xml files will leave behind .json files for the ones that successfully parse.
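The keep-going-on-failure pattern can also be pulled out into a small reusable piece that returns a summary instead of only printing. This is just a sketch: run_batch is a made-up name, and it rescues StandardError only, which is narrower than the rescue Exception used in the script.

```ruby
# Hypothetical helper: run the block once per path, recording the outcome
# rather than aborting the whole batch on the first error.
# Returns { ok: [paths...], failed: { path => error message } }.
def run_batch(paths)
  results = { ok: [], failed: {} }
  paths.each do |path|
    begin
      yield path
      results[:ok] << path
    rescue StandardError => e
      # note: narrower than `rescue Exception`
      results[:failed][path] = e.message
    end
  end
  results
end

# Possible use inside the script (EADConverter as in ead_parse.rb):
#   summary = run_batch(ARGV) { |ead| EADConverter.new(ead).run }
#   puts "#{summary[:ok].size} ok, #{summary[:failed].size} failed"
```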
You can then import the json files with something like ( where $N = repository number ) :
for JSON in *.json
do
  curl_as_osx admin admin -d @"$JSON" http://localhost:8089/repositories/$N/batch_imports
done
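If you would rather do the POST from Ruby than from curl, the request could be built roughly as below. This is a sketch under assumptions: batch_import_request is a made-up name, and it assumes the backend accepts a session token in an X-ArchivesSpace-Session header (which is presumably what curl_as_osx arranges after logging in). Obtaining that token is left out.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a POST of one JSON file
# to a repository's batch_imports endpoint.
# Assumption: the session token goes in the X-ArchivesSpace-Session header.
def batch_import_request(backend_url, repo_id, session, json_path)
  uri = URI.join(backend_url, "/repositories/#{repo_id}/batch_imports")
  req = Net::HTTP::Post.new(uri.request_uri)
  req['X-ArchivesSpace-Session'] = session
  req.body = File.read(json_path)
  [uri, req]
end

# Sending it, one file at a time:
#   uri, req = batch_import_request('http://localhost:8089', 2, token, 'foo.json')
#   res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
#   puts "foo.json : #{res.code}"
```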
An advantage of the separate parse & import is that you can do all of the parsing on your development laptop,
but post to another server’s backend port.
— Steve Majewski