Wednesday, December 06, 2006

testing issues with threaded ActiveRecord commits

I need to create a more generalized test in order to better exhibit the problems I experienced when spinning up a bunch of threads that all accessed a synchronized block of code that invoked ActiveRecord::Base#save & ActiveRecord::Base#destroy with use_transactional_fixtures = true. Below is the investigation and conversation with Jamis:


(12:27:33) russ: Jamis, can I plead with you yet again for more assistance? Name your price.
(12:31:34) jamisbuck: haha :) what's up?
(12:32:06) russ: I'm having some deadlocking issues with some thread-safe code I'm _attempting_ to write, and I've narrowed my issue to what I'm doing with ActiveRecord
(12:32:13) russ: here's the test:
(12:32:18) russ:
def test_provider_is_threadsafe
Thread.abort_on_exception = true

@ids = []
threads = []

#obtain three threads, each getting 10 ids
begin
3.times do

thread = Thread.new do
10.times do
sleep(0.2)
id = IdServiceHelper.instance.next_id

if (@ids.include?(id))
fail("Duplicate ID!")
else
@ids << id
end

end
end
threads << thread

end
threads.each {|thread| thread.join}

rescue Exception => ex
fail("Exception raised! " + ex.message + ex.backtrace.join("\n"))
end
end
(12:32:39) russ: and the culprit code is this:
(12:32:45) russ:
def next_id

next_id = -1

synchronize do
#get the next seed
orig_seed = Seed.new
orig_seed.save
seed_val = orig_seed.id

#destroy the used seed
orig_seed.destroy

next_id = @@provider_class.id_for_seed(seed_val)
end

return next_id
end
(12:33:04) russ: however, when I run the tests I get deadlock errors
(12:33:10) russ: and I've spent several hours trying different variations
(12:33:19) jamisbuck: what does the 'synchronize' method look like?
(12:33:39) russ: mixing in MonitorMixin
(12:33:54) russ: if I remove the .save and .destroy on the Seed models then my deadlocks disappear
(12:33:58) russ: but of course that fails the test
(12:34:14) jamisbuck: deadlocks from where?
(12:34:15) jamisbuck: AR?
(12:34:33) russ: that's my issue, not sure exactly
(12:34:39) jamisbuck: can you show me the error?
(12:34:41) russ: Loaded suite test/unit/id/id_service_helper_test
Started
.deadlock 0xb7d73748: sleep:J(0xb73e2204) (main) - test/unit/id/id_service_helper_test.rb:57
deadlock 0xb73e1a0c: sleep:- - /usr/lib/ruby/1.8/thread.rb:100
deadlock 0xb73e209c: sleep:- - /usr/lib/ruby/1.8/monitor.rb:266
deadlock 0xb73e2204: sleep:- - /usr/lib/ruby/1.8/monitor.rb:266

(12:34:46) jamisbuck: ok, not AR
(12:35:03) jamisbuck: what's your OS?
(12:35:05) jamisbuck: and ruby version?
(12:35:06) russ: Gentoo
(12:35:13) russ: 1.8.4
(12:35:47) russ: so even if I remove the synchronization block, which I have done, I get the same errors
(12:35:47) jamisbuck: where is MonitorMixin defined?
(12:35:55) jamisbuck: is that in the stdlib?
(12:35:57) russ: think so
(12:36:12) jamisbuck: ok, I see it
(12:36:21) jamisbuck: to be honest, I've never used monitor mixin
(12:36:24) jamisbuck: I always just do
(12:36:26) jamisbuck: require 'thread'
(12:36:29) jamisbuck: mutex = Mutex.new
(12:36:34) jamisbuck: mutex.synchronize { ... }
(12:36:37) russ: done that too
(12:36:38) russ: same results
(12:36:39) russ: :(
(12:36:54) russ: perhaps I should approach this differently
(12:37:03) jamisbuck: it's almost like Seed#save or Seed#destroy are trying to do something thread-wise
(12:37:11) russ: right, my thoughts also
(12:37:17) russ: I already have ActiveRecord::Base.allow_concurrency = true
(12:37:22) russ: which I thought would solve my problem
(12:37:29) russ: (since there is an underlying "freeze")
(12:37:33) jamisbuck: well
(12:37:37) jamisbuck: that only allows multiple connections
(12:37:42) jamisbuck: one per thread
(12:37:44) russ: gotcha
(12:37:46) jamisbuck: AR is not thread-safe
(12:37:48) jamisbuck: by itself
(12:37:58) jamisbuck: but still
(12:38:03) jamisbuck: save/destroy shouldn't be problematic
(12:38:17) russ: k, so if I approach this differently, the problem being that I want to fetch a new Seed, get the id and then destroy it
(12:38:36) russ: and lock anything else from getting the same value
(12:38:38) russ: any other ideas?
(12:38:42) jamisbuck: is your ruby version built with phtread support?
(12:38:50) russ: good question, I'll check
(12:39:23) russ: there is a USE flag for -threads
(12:39:31) russ: not sure if I have it enabled or not
(12:39:38) jamisbuck: k, well, you don't want it, regardless
(12:39:42) russ: k
(12:39:49) jamisbuck: try looking at what the ruby binary is linked against
(12:39:51) jamisbuck: ldd
(12:40:10) russ: ldd /usr/bin/ruby
linux-gate.so.1 => (0xffffe000)
libruby18.so.1.8 => /usr/lib/libruby18.so.1.8 (0xb7e3b000)
libdl.so.2 => /lib/libdl.so.2 (0xb7e21000)
libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7df3000)
libm.so.6 => /lib/tls/libm.so.6 (0xb7dd0000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7cb6000)
/lib/ld-linux.so.2 (0xb7f05000)
(12:40:17) jamisbuck: k, looks good
(12:40:35) jamisbuck: I really have no idea on this
(12:40:49) russ: then maybe the issue is that freeze I saw
(12:40:51) jamisbuck: this is outside my area of expertise
(12:41:11) russ: well you've broadened my horizons
(12:41:15) jamisbuck: :)
(12:41:18) russ: I'll keep digging in
(12:41:22) jamisbuck: k, good luck
(12:41:31) russ: you let me know if you have an Amazon list
(12:41:39) jamisbuck: :) I don't, but thanks anyway
(12:41:43) russ: your time is valuable and I appreciate the info
(12:41:53) jamisbuck: fwiw, ruby threads are awful..if at all possible, you should consider using fork instead
(12:41:59) jamisbuck: not always possible, but definitely preferable
(12:42:04) russ: that's good to know
(12:42:06) russ: I'll explore that also
(12:42:12) russ: have a good one
(12:42:15) jamisbuck: you too


I changed my code to try a more sql-native approach with the following:

#version 2

connection = ActiveRecord::Base.connection

orig_seed_val = Seed.find(:first).id
connection.begin_db_transaction

seed_val = orig_seed_val+1
connection.delete("delete from seeds where id=#{orig_seed_val};");
connection.commit_db_transaction
Seed.connection.execute("insert into seeds values(#{seed_val});");

next_id = @@provider_class.id_for_seed(seed_val)

Using this method I was then getting a native MySQL error regarding a lock timeout. This was curious and lead to the following:

(14:05:01) russ: wanna hear the gruesome details?
(14:05:20) jamisbuck: let's hear it
(14:05:38) russ: the locking was at the MySQL level
(14:05:48) russ: I tried to bypass AR
(14:05:53) russ: and execute straight sql
(14:06:03) russ: only to get a Lock timeout from mysql
(14:06:12) russ: investigated by "show innodb status"
(14:06:20) russ: and it gave me the locks/threads
(14:06:25) russ: followed that to my log
(14:06:30) russ: and whattayaknow
(14:06:40) jamisbuck: really! the exception you posted didn't look like one from the database, so I discarded that hypothesis
(14:06:48) russ: well, here's the culprit: self.use_transactional_fixtures = true
(14:07:24) jamisbuck: ah, ok
(14:07:26) jamisbuck: very wierd
(14:07:27) russ: the top level transaction had the lock!
(14:07:28) jamisbuck: what mysql version?
(14:07:57) russ: Gentoo Linux mysql-4.1.20
(14:08:05) jamisbuck: wierd! good catch
(14:08:13) russ: anyhow, changing back and then using AR
(14:08:15) russ: everything is good again
(14:08:21) jamisbuck: awesome, glad to hear it
(14:08:27) russ: so I lose, for that specific test, the benefits of transactional fixtures
(14:08:30) russ: oh well it's a toss up
(14:08:34) jamisbuck: yah
(14:08:43) jamisbuck: if your test uses transactions, you can't use transactional fixtures
(14:08:46) russ: so I suppose the theory learned is:
(14:09:03) russ: you can't write your own thread-safe code using AR in a test with self.use_transactional_fixtures=true
(14:09:19) jamisbuck: well, not even thread-safe
(14:09:21) jamisbuck: transactional
(14:09:23) russ: right
(14:09:27) russ: anyhow, back to work
(14:09:30) jamisbuck: cool beans
(14:09:32) jamisbuck: good find

No comments: