Discussion: Deadlock and "back off"
John Flynn
2008-09-29 21:32:31 UTC
Hi.
In reading up about ASE deadlocks I have seen advice on some websites that
says, if you choose to have the 1205 victim recover by resubmitting its
transaction, you should first "back off" some period of time. I don't
understand this advice. I understand how you theoretically could deadlock
again when you retry, but it seems to me that's just as likely whether you
retry immediately or wait a little while (or a year). For best throughput
you should immediately retry the transaction without delay. Maybe the "other
guy" is still in his transaction and still has one of your tables locked,
but in that case you will experience an ordinary block (that will clear
according to ASE's normal lock handling) but you won't DEADlock with him
again. [For the same basic reason, there's no benefit in counting up how
many times you've retried a transaction due to 1205, and giving up after
some number of retries.]

Is my logic right? I couldn't find any info about this in any Sybase doc
(other than the general advice to "resubmit the transaction").

Thanks.
- John.
Mark A. Parsons
2008-09-29 22:15:53 UTC
It really depends on your situation and your knowledge of why the deadlock is occurring.

I've seen some sites get into repeatable deadlocks between the same 2 processes ... the loser keeps resubmitting its
transaction as fast as it can, just to get deadlocked again. In one case I counted up 50 deadlocks within a 3-second
interval.

At other sites there are no problems with the immediate resubmission of a transaction (by the loser).

So, do you put a small delay into the deadlock retry logic of your applications? *shrug* Yes, no, maybe; it depends on
how long the application/user can wait for a retry. YMMV ...
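
For illustration, here is a minimal client-side sketch of that retry logic in Python. It is not from any Sybase doc; run_transaction and the errno attribute are placeholders for whatever your driver actually exposes. The optional delay and the retry cap are exactly the knobs being debated here:

    import time

    DEADLOCK_ERR = 1205   # error number returned to the deadlock victim

    def run_with_retry(run_transaction, max_retries=5, backoff_seconds=0.0):
        # Resubmit a transaction that loses a deadlock, with an optional delay.
        attempt = 0
        while True:
            try:
                return run_transaction()
            except Exception as exc:
                if getattr(exc, "errno", None) != DEADLOCK_ERR:
                    raise                        # not a deadlock: don't retry
                attempt += 1
                if attempt > max_retries:
                    raise                        # give up after repeated losses
                if backoff_seconds:
                    time.sleep(backoff_seconds)  # the "back off" under debate

Setting backoff_seconds=0.0 gives the immediate-retry policy John describes; any positive value gives the "back off" policy.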
Post by John Flynn
Hi.
In reading up about ASE deadlocks I have seen advice on some websites that
says, if you choose to have the 1205 victim recover by resubmitting its
transaction, you should first "back off" some period of time. I don't
understand this advice. I understand how you theoretically could deadlock
again when you retry, but it seems to me that's just as likely whether you
retry immediately or wait a little while (or a year). For best throughput
you should immediately retry the transaction without delay. Maybe the "other
guy" is still in his transaction and still has one of your tables locked,
but in that case you will experience an ordinary block (that will clear
according to ASE's normal lock handling) but you won't DEADlock with him
again. [For the same basic reason, there's no benefit in counting up how
many times you've retried a transaction due to 1205, and giving up after
some number of retries.]
Is my logic right? I couldn't find any info about this in any Sybase doc
(other than the general advice to "resubmit the transaction").
Thanks.
- John.
John Flynn
2008-09-30 20:11:51 UTC
Post by Mark A. Parsons
I've seen some sites get into repeatable deadlocks between the same 2
processes ... the loser keeps resubmitting its transaction as fast as it
can, just to get deadlocked again. In one case I counted up 50
deadlocks within a 3-second interval.
That would have to be a case where the winner is in some sort of loop,
right? Submitting the same type of transaction over and over. In that
scenario, I agree that raw probability would make it likely that another
process could repeatedly deadlock against him, losing each time. But each
time it would be in the context of a NEW winner's transaction, right? I
guess what I'm asking is, if the winner starts one transaction, does some
stuff, then commits the one transaction, is it possible for another process
to deadlock and lose against that, then resubmit, and deadlock again?

The scenario I'm thinking of is:

Process 1 starts a transaction.
Process 2 starts a transaction.
Process 1 locks resource A.
Process 2 locks resource B.
Process 1 tries to lock resource B but blocks.
Process 2 tries to lock resource A but blocks.
ASE kills process 2's transaction.
Process 1 immediately locks resource B.
Process 2 restarts his transaction.
Process 2 again tries to lock resource B but blocks.

This is not a deadlock. Process 1 will be allowed to complete his
transaction, and process 2 will wait until process 1 is done, no matter how
long it takes. There is nothing gained by process 2 waiting before
restarting his transaction. How can the two processes possibly deadlock
AGAIN? Unless they do it within the context of a NEW unrelated transaction.
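
For what it's worth, this scenario can be emulated roughly in Python with two threads, using a lock timeout to stand in for ASE's deadlock detection and victim selection (which of course works differently). It demonstrates the point above: after the victim rolls back and retries, it merely blocks on B until process 1 commits; it does not deadlock again:

    import threading, time

    lock_a, lock_b = threading.Lock(), threading.Lock()

    def process_1():
        lock_a.acquire()                     # Process 1 locks resource A
        time.sleep(0.2)
        lock_b.acquire()                     # blocks until P2 is "killed", then proceeds
        time.sleep(0.5)                      # rest of the transaction
        lock_b.release(); lock_a.release()   # commit

    def process_2():
        lock_b.acquire()                     # Process 2 locks resource B
        time.sleep(0.2)
        if not lock_a.acquire(timeout=0.5):  # would deadlock on A; the timeout
            lock_b.release()                 # emulates being the 1205 victim: roll back
            lock_b.acquire()                 # restart: a plain block on B until P1 commits
            lock_a.acquire()
        lock_a.release(); lock_b.release()   # commit

    t1 = threading.Thread(target=process_1)
    t2 = threading.Thread(target=process_2)
    t1.start(); t2.start(); t1.join(); t2.join()
    print("both transactions completed")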
Bret Halford
2008-09-30 20:41:02 UTC
Post by John Flynn
Post by Mark A. Parsons
I've seen some sites get into repeatable deadlocks between the same 2
processes ... the loser keeps resubmitting its transaction as fast as it
can, just to get deadlocked again. In one case I counted up 50
deadlocks within a 3-second interval.
That would have to be a case where the winner is in some sort of loop,
right? Submitting the same type of transaction over and over. In that
scenario, I agree that raw probability would make it likely that another
process could repeatedly deadlock against him, losing each time. But each
time it would be in the context of a NEW winner's transaction, right? I
guess what I'm asking is, if the winner starts one transaction, does some
stuff, then commits the one transaction, is it possible for another process
to deadlock and lose against that, then resubmit, and deadlock again?
Process 1 starts a transaction.
Process 2 starts a transaction.
Process 1 locks resource A.
Process 2 locks resource B.
Process 1 tries to lock resource B but blocks.
Process 2 tries to lock resource A but blocks.
ASE kills process 2's transaction.
Process 1 immediately locks resource B.
Process 2 restarts his transaction.
Process 2 again tries to lock resource B but blocks.
This is not a deadlock. Process 1 will be allowed to complete his
transaction, and process 2 will wait until process 1 is done, no matter how
long it takes. There is nothing gained by process 2 waiting before
restarting his transaction. How can the two processes possibly deadlock
AGAIN? Unless they do it within the context of a NEW unrelated transaction.
The deadlock wouldn't happen on the same resource, but
it can certainly happen again on a different resource.
A "resource" can be a page or a row, so there can be many
opportunities.

Imagine a big table whose pages are arranged in the unusually regular
order of 1,2,3,4,....97,98,99.

Process A is doing updates and scanning the table forward,
process B is doing a select with an order by that is being
handled by doing a backward scan of the table.

For the first deadlock, B has started at 99 and scanned all the
way to page 55 and will be wanting a lock on page 54. A has
just started, has an exclusive lock on page 54 and will want one
on page 55.

Deadlock: B has done a little less activity and is the victim.

A gets the lock on page 55. B restarts at 99 and scans
backward to page 70 while A is scanning forward to 69...
John Flynn
2008-09-30 21:02:30 UTC
Post by Bret Halford
Imagine a big table whose pages are arranged in the unusually regular
order of 1,2,3,4,....97,98,99.
Process A is doing updates and scanning the table forward,
process B is doing a select with an order by that is being
handled by doing a backward scan of the table.
For the first deadlock, B has started at 99 and scanned all the
way to page 55 and will be wanting a lock on page 54. A has
just started, has an exclusive lock on page 54 and will want one
on page 55.
Deadlock: B has done a little less activity and is the victim.
A gets the lock on page 55. B restarts at 99 and scans
backward to page 70 while A is scanning forward to 69...
Ah, that's just the kind of counter example I was looking for, thanks.

So, I wonder, in your scenario would there be a benefit in process B
"backing off" prior to restarting? I guess it would be "giving process A a
chance to finish"? But how can it know how much to back off? It has no idea
whether A just started or was almost finished. It seems like it would have
to be a totally arbitrary guess. And if it backs off too much, then process
B suffers an unnecessary and arbitrary delay in its processing time. And what
would be the inherent flaw in NOT backing off, even granting that an
additional deadlock might occur?
Bret Halford
2008-09-30 21:47:22 UTC
Post by John Flynn
Post by Bret Halford
Imagine a big table whose pages are arranged in the unusually regular
order of 1,2,3,4,....97,98,99.
Process A is doing updates and scanning the table forward,
process B is doing a select with an order by that is being
handled by doing a backward scan of the table.
For the first deadlock, B has started at 99 and scanned all the
way to page 55 and will be wanting a lock on page 54. A has
just started, has an exclusive lock on page 54 and will want one
on page 55.
Deadlock: B has done a little less activity and is the victim.
A gets the lock on page 55. B restarts at 99 and scans
backward to page 70 while A is scanning forward to 69...
Ah, that's just the kind of counter example I was looking for, thanks.
So, I wonder, in your scenario would there be a benefit in process B
"backing off" prior to restarting? I guess it would be "giving process A a
chance to finish"? But how can it know how much to back off? It has no idea
whether A just started or was almost finished. It seems like it would have
to be a totally arbitrary guess. And if it backs off too much, then process
B suffers an unnecessary and arbitrary delay in its processing time. And what
would be the inherent flaw in NOT backing off, even granting that an
additional deadlock might occur?
Well, the additional deadlocks would cause additional logging.
And there is a chance that the "cpu accounting flush interval"
could kick in and flush the accumulated cpu stats of the longer
running process, causing it to be the victim next time...

But I agree in general, the circumstances would have to be pretty
specific to justify the suggestion.
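
If a delay is used at all, a common general-purpose answer to the "how much to back off" question (nothing ASE-specific, just the usual retry pattern) is an exponentially growing delay with random jitter, so two colliding processes don't keep retrying in lockstep and the victim never waits arbitrarily long. A minimal sketch:

    import random, time

    def backoff_delay(attempt, base=0.05, cap=2.0):
        # Delay in seconds before the attempt-th resubmission: exponential
        # growth, capped, with random jitter so two colliding processes
        # don't keep retrying in lockstep.
        return random.uniform(0, min(cap, base * (2 ** attempt)))

    # e.g. time.sleep(backoff_delay(n)) before the n-th resubmission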
Carl Kayser
2008-09-30 10:12:48 UTC
Post by John Flynn
Hi.
In reading up about ASE deadlocks I have seen advice on some websites that
says, if you choose to have the 1205 victim recover by resubmitting its
transaction, you should first "back off" some period of time. I don't
understand this advice. I understand how you theoretically could deadlock
again when you retry, but it seems to me that's just as likely whether you
retry immediately or wait a little while (or a year). For best throughput
you should immediately retry the transaction without delay. Maybe the
"other guy" is still in his transaction and still has one of your tables
locked, but in that case you will experience an ordinary block (that will
clear according to ASE's normal lock handling) but you won't DEADlock with
him again. [For the same basic reason, there's no benefit in counting up
how many times you've retried a transaction due to 1205, and giving up
after some number of retries.]
Is my logic right? I couldn't find any info about this in any Sybase doc
(other than the general advice to "resubmit the transaction").
Thanks.
- John.
I see your view (as stochastic processes) and agree with one exception. The
winner in a deadlock situation is the one that has consumed the most CPU
time. (But how is this measured? Total session time? Total time for
current transaction? Something else?) If the average CPU time for the
winning processes is "short", then waiting is probably not beneficial; past
histories don't determine future (e.g., immediately following) histories.
If it is "long", then waiting is more likely to be beneficial, because the
winning process is likely to still be running and will win any new deadlock
situations.
Bret Halford
2008-09-30 15:35:45 UTC
Post by John Flynn
Hi.
In reading up about ASE deadlocks I have seen advice on some websites that
says, if you choose to have the 1205 victim recover by resubmitting its
transaction, you should first "back off" some period of time. I don't
understand this advice. I understand how you theoretically could deadlock
again when you retry, but it seems to me that's just as likely whether you
retry immediately or wait a little while (or a year). For best throughput
you should immediately retry the transaction without delay. Maybe the "other
guy" is still in his transaction and still has one of your tables locked,
but in that case you will experience an ordinary block (that will clear
according to ASE's normal lock handling) but you won't DEADlock with him
again. [For the same basic reason, there's no benefit in counting up how
many times you've retried a transaction due to 1205, and giving up after
some number of retries.]
Is my logic right? I couldn't find any info about this in any Sybase doc
(other than the general advice to "resubmit the transaction").
Thanks.
- John.
The advice could make sense in specific cases where the developers know
the application is prone to deadlocks due to processes that are
known to try to take locks in opposite orders (but presumably
they just can't do anything to fix that), and that one side
is rarely run but takes some time to run.

-bret
John Flynn
2008-09-30 20:11:50 UTC
Post by Bret Halford
The advice could make sense in specific cases where the developers
know the application is prone to deadlocks due to processes that are
known to try to take locks in opposite orders (but presumably
they just can't do anything to fix that), and that one side
is rarely run but takes some time to run.
But only if the long-running side breaks it up into many transactions,
right?

I mean, if it's just one long transaction, then when the deadlock loser
comes back and resubmits his quick transaction, he will tend to get BLOCKED
again by the winner, but it won't tend to be a DEADLOCK. The original winner
will be allowed to complete his transaction, and then the original loser
gets his turn and completes his transaction, no problem. Both transactions
complete. That's ordinary concurrency. There's no functional difference
between that and the losing process choosing to "back off" and wait 60
seconds before resubmitting his transaction.

But if the winner was actually in a loop submitting many separate
transactions, each one of which was prone to cause deadlock individually,
then I see the point.
Derek Asirvadem
2008-10-01 14:42:17 UTC
Good discussion.

You have a good handle on the idea that the loser will retry and lose
again. I agree: do not wait or "back off".

But Carl is right, the winner will keep winning.

The common thing about deadlocks is that the same deadlocks happen in
the same scenarios. E.g. app XYZ always gets a few deadlocks at
10:05am: most often Alice has started her batch process, which takes 20
minutes, and Bob gets a deadlock on his sales report. It is not the
people, it is the two pieces of badly written app code that predictably
deadlock. The point is that the deadlock is written into the app code,
and that is what needs to be examined and changed.

At one site two years ago, when they moved from 11.5 to 12.5.4 [don't
ask, it was a third-party app and the third party went bankrupt years
ago], the deadlocks, which were a Big Problem because they crashed the
app server but which had been few and manageable (3 or 4 per week), went
right through the roof (30 per day). What they did not understand is
that since ASE was running MUCH faster, but the physical structures
were unchanged and slow, blocks and deadlocks (the result of faster
processing) showed up much faster. They were brought back down to a
manageable level by increasing deadlock_checking_period. But the real
solution was to rebuild the server and reload all tables and all indices:
this removed fragmentation and increased the speed of the physical
structures. They have had virtually no deadlocks since.

The other, unfortunately common, scenario is that many sites have parallelism
turned off. They have found that when they turn it on, the app
deadlocks itself silly: not only updating resources in non-standard
sequences, but long transactions, long sequences, and long sequences in
the same table. This is fairly serious because they cannot use the
basic performance features and functions of Sybase. If you can take
anything from my post, it is to go back to the code of the two
transactions and recode them to avoid the blocking locks (which can
escalate into a deadlock) by doing the following (a rough sketch
follows the list):
• updating resources/tables in the same (overall, across the db) sequence
• updating rows in the same table in a likewise prescribed sequence.
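
A rough sketch of the first point in application code (Python with a generic DB-API cursor; the table and column layout is made up): sort the rows to be updated by table and key before touching any of them, so every transaction takes its locks in the same global order. Two such transactions can still block each other, but they cannot deadlock:

    def apply_updates(cursor, updates):
        # updates maps (table_name, primary_key) -> {column: new_value}.
        # Sorting by (table, key) fixes one global lock-acquisition order
        # for every transaction built this way.
        for (table, pk), values in sorted(updates.items()):
            assignments = ", ".join(f"{col} = ?" for col in values)
            cursor.execute(
                f"UPDATE {table} SET {assignments} WHERE id = ?",
                list(values.values()) + [pk],
            )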
--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
Quality Standards = Zero Maintenance + Zero Surprises
Performance Standards = Predictability + Scaleability