[Archivesspace_Users_Group] Curious! : Import Job says it failed but resources imported [SQLException: Lock wait timeout exceeded; try restarting transaction]

Majewski, Steven Dennis (sdm7g) sdm7g at eservices.virginia.edu
Fri Feb 5 14:06:23 EST 2016


More details:

It appears that the job is hitting a MySQL lock wait timeout.
This timeout is internal to MySQL and separate from AppConfig[:job_timeout_seconds]; changing that AppConfig value has no effect on the result. The error does not occur when running against the Derby demo database.

It also appears that the MySQL error does not cause a rollback of the transaction. The error seems to be raised when the transaction is closed, so all of the resources created by the job are left intact. But the job is left with a “running” status, and it is eventually relaunched as a “stale” job on another thread. ( This is where AppConfig[:job_timeout_seconds] comes in: the only effect of increasing that value is a longer wait before the job is considered stale and relaunched. ) When the job is relaunched, it starts again with the first file, hits the duplicate ID error, and that finally causes the status to be updated to ‘failed’.
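For reference, the setting I mean lives in config/config.rb. A minimal sketch, assuming the stock AppConfig syntax (the 600 is just an illustrative value, and per the above it only delays the stale-job relaunch rather than preventing the MySQL timeout):

    # config/config.rb -- illustrative value only
    # Raising this lengthens the window before a "running" job is treated
    # as stale and relaunched; it does not change MySQL's lock wait timeout.
    AppConfig[:job_timeout_seconds] = 600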

From googling the errors, it appears that the MySQL variable that needs to be changed is @@innodb_lock_wait_timeout.
However, I haven’t tested that yet, and I’m not sure it is the right way to fix this problem. It might be better to limit the transaction to a single file import rather than the whole job, but perhaps the whole-job transaction is there to keep the job status locked and held by the processing thread.
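For anyone who wants to experiment on the MySQL side, this is the sort of thing I have in mind (untested, and the 600-second value is arbitrary). Note that SET GLOBAL only affects connections opened after the change, so the backend would need to reconnect or be restarted to pick it up:

    -- Check the current value (the InnoDB default is 50 seconds).
    SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

    -- Raise it for connections opened after this point.
    SET GLOBAL innodb_lock_wait_timeout = 600;

    -- Or make it persistent in my.cnf, under [mysqld]:
    --   innodb_lock_wait_timeout = 600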

I guess the immediate solution is just to run smaller import jobs. But I was investigating this because of the ambiguous result: I wasn’t sure whether the resources produced by the failed job could be trusted to be intact and valid. If my interpretation above is correct, the job actually ran to completion; it was only the job status that was not updated correctly.


[ extracts from the log file, with stack traces truncated and some extra debug output added to backend/app/lib/background_job_queue.rb ]

     [java] D, [2016-02-03T07:47:40.754000 #18042] DEBUG -- : Thread-5162: Run_Pending_Job: #<Job:0x2318025e> {
     [java] D, [2016-02-03T07:47:40.762000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:47:45.769000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:47:50.778000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:47:55.786000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:00.794000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:05.800000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:10.806000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:15.812000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:20.816000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:25.823000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:48:30.827000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:49:22.939000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:50:14.025000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:51:06.112000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:51:58.214000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:52:50.316000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:53:42.428000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:54:34.505000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:55:26.607000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] D, [2016-02-03T07:56:18.710000 #18042] DEBUG -- : Thread-5164: Running job Job:1
     [java] E, [2016-02-03T07:58:16.577000 #18042] ERROR -- : Thread-5162: Job 1 failed: Java::JavaSql::SQLException: Lock wait timeout exceeded; try restarting transaction ["com.mysql.jdbc.SQLError.createSQLException(com/mysql/jdbc/SQLError.java:1086)", "com.mysql.jdbc.MysqlIO.checkErrorPacket(com/mysql/jdbc/MysqlIO.java:4237)", "com.mysql.jdbc.MysqlIO.checkErrorPacket(com/mysql/jdbc/MysqlIO.java:4169)", "com.mysql.jdbc.MysqlIO.sendCommand(com/mysql/jdbc/MysqlIO.java:2617)", "com.mysql.jdbc.MysqlIO.sqlQueryDirect(com/mysql/jdbc/MysqlIO.java:2778)", "com.mysql.jdbc.ConnectionImpl.execSQL(com/m
     [java] D, [2016-02-03T07:58:21.603000 #18042] DEBUG -- : Thread-5162: Stale_Job:#<Job:0x7a0115ee> {
     [java] D, [2016-02-03T07:58:21.614000 #18042] DEBUG -- : Thread-5162: Run_Pending_Job: #<Job:0x7a0115ee> {
     [java] D, [2016-02-03T07:58:21.616000 #18042] DEBUG -- : Thread-6580: Running job Job:1
     [java] D, [2016-02-03T07:58:26.622000 #18042] DEBUG -- : Thread-6580: Running job Job:1
     [java] D, [2016-02-03T07:58:31.626000 #18042] DEBUG -- : Thread-6580: Running job Job:1
     [java] D, [2016-02-03T07:58:36.630000 #18042] DEBUG -- : Thread-6580: Running job Job:1
     [java] D, [2016-02-03T07:58:41.633000 #18042] DEBUG -- : Thread-6580: Running job Job:1
     [java] E, [2016-02-03T07:58:45.759000 #18042] ERROR -- : Thread-5162: Job 1 failed: Problem creating 'University of Virginia Hospital Executive Director's Office (HEDO) Collection': id_0 That ID is already in use, ead_id Must be unique ["/projects/Archivespace/dcs-archivesspace/build/gems/gems/sequel-4.20.0/lib/sequel/model/base.rb:1545:in `save'", "/projects/Archivespace/dcs-archivesspace/build/gems/gems/sequel-4.20.0/lib/sequel/model/base.rb:148:in `create'", "/projects/Archivespace/dcs-archivesspace/backend/app/model/ASModel_crud.rb:345:in `create_from_json'", "/projects/Archivespace/d
     [java] D, [2016-02-03T07:58:46.641000 #18042] DEBUG -- : Thread-5162: Completed job Job:1






On Feb 1, 2016, at 1:28 PM, Majewski, Steven (sdm7g) <sdm7g at eservices.virginia.edu> wrote:


Just noting that I’m still seeing this phenomenon with 1.4.2 on a clean install, starting from a db:nuke’d database
and a clean archivesspace/data directory.

It only seems to occur when importing a large number of EAD files in one batch.
If the import is broken up into smaller batches it works as expected (no problems with a batch of 10).
With a batch of 50 or more, it imports all of the guides, then attempts to import the first guide a second time,
and the batch fails with an error message about a duplicate ID. However, if I click on ‘Browse Resources’, all
of the EAD files appear to have been imported.

In this last case, I selected 64 files, the import job lists 64 files, and I get 64 resources.
Viewing the Failed background job shows 64 files.
The data/shared/job_files/import_job_1/ directory contains 64 hex-renamed xml files plus output.log.
/repositories/4/jobs/1 lists 64 filenames.
But the log shows that after processing 64 files, it attempts to import the first one on the list again
and fails.


— Steve.


On Jan 12, 2016, at 5:56 PM, Majewski, Steven (sdm7g) <sdm7g at eservices.virginia.edu> wrote:


Very curious error observed:

Trying to move 50 resources from test server to production.

Same version: ArchivesSpace v1.4.2 running on all servers.

Exported all 50 from test server as EAD.
Imported all 50 onto another test server as initial test.
Had to redo the export when I found that I needed to add a parameter to export unpublished notes.
But both imports completed without complaint.

Attempted to repeat the same import on production server.

The import appeared to be running successfully through an uncounted number of files, but gave this error message at the end:

    ================================================== 54-Mss_77-1.xml ==================================================
    1. STARTED: Reading JSON records
    1. DONE: Reading JSON records
    2. STARTED: Validating records and checking links
    2. DONE: Validating records and checking links
    3. STARTED: Evaluating record relationships
    3. DONE: Evaluating record relationships
    4. STARTED: Saving records: cycle 1
    Created: /revision_statement/import_42935b5d-613c-42fa-a297-d87bd61a77f3
    Created: /revision_statement/import_d0d44810-bbf2-448a-803e-59e4abe479b2
    Created: /revision_statement/import_b900aae8-c51e-45d6-afd3-31d8cd45b35e
    Created: /revision_statement/import_42b9bd89-2058-462a-9425-8b317bacb633
    5. STARTED: Cleaning up
    5. DONE: Cleaning up
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    IMPORT ERROR
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    The following errors were found:
    id_0 : That ID is already in use
    ead_id : Must be unique
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    Full Error Message: Problem creating 'Papers of Frederick D. G. Ribble': id_0 That ID is already in use, ead_id Must be unique
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


However, clicking on the Browse Resources link for that repository shows that all of the resources appear to be there.
Double-checking the status of that background import job shows the Job Status as “Failed”.

I stopped the server, deleted the solr index and indexer_state files, and restarted, just to be sure of a consistent state.
No change: the job has failed, but it appears I have all 50 imported resources.

The file in the error message: 54-Mss_77-1.xml, was the first file on the list of Import files in the Batch Job page.
It is not listed twice in that file list, but it appears both at the start and the end of the import log,
with that failure message at the end.

I believe that every other time I’ve had a batch import job fail on a single file, NONE of the files were imported.



— Steve Majewski











