Planning transactions with N-tier, distributed transactions, multiple databases

Hi Forum

I would like to know if there are some best practices for planning/documenting transactions in order to avoid deadlocks.

We unfortunately have the bad habit of going into production before discovering deadlocks.

The problem is that we have an N-tier architecture where developers can choose to start transactions that spans not just multiple tables but also multiple databases. So it is distributed transactions.

It gets quite complex when many people implement transactions, not necessarily calling the resources in the same order.

What should I as an architect do?

- Should I put together a document and have the developers document all transactions, including isolation level and in what order they are calling resources? Does anyone know of a standard template for this?

- Should we just try to find problems with load tests and then tweak isolation level and re-order calls as we detect deadlocks during testing.

I'm hoping to find a simple and effective solution that will not impact our day-to-day work too much.

What are you guys during? any articles?

I hope someone can help me get my head around this.

[1858 byte] By [Siteadm] at [2007-12-27]
# 1

I would start with the basics: make sure the dev team knows what different isolation levels really mean and how things like autocommit settings on ODBC connections work. I still like Tim Ewald's "Transactional COM+" book even though we use .NET. We actually built a small test app that created transactions under different isolation levels and intentionally deadlocked. That way, programmers who want to refresh their skills or test out scenarios never had to start from scratch. I know it seems trivial, but remember to include some fringe situations like nesting a transaction with a different isolation level within an existing transaction (XP used to act differently than Server 2003 when this happened, BTW). Also, play with how timeouts work when a process waits for data to become available. The DTC is pretty unforgiving (with a 30-second default) and when it has a problem, and the error messages are pretty confusing. Finally, make sure your team understands exception handling and transactions. We had problems when a programmer would call a transactional routine in a loop contained in a try/catch block. When an error occured, the outer transaction was doomed, but the code kept on going.

Pick one or two standard transaction patterns that all of your components are built around. One well-known phenomenon is that programmers will copy something they think is similar and works. Make sure they know which code is the best reference going forward. But I think it's important that individual routines remain "transaction agnostic" meaning they attach to any existing transaction. Then make sure you have well-defined points that instigate the transaction (like the outer-most edge of the app server). Obviously, make sure that the transaction instigator is also the transaction finalizer and that commits aren't explicitly separated. We never allowed callers (the client tier) to control the transaction boundary on the server. Also, in .NET 2.0 the transaction scope features in the framework are really useful because programmers can control locks more intuitively. I would code around those constructs.

To determine our patterns, we basically copied the COM+ approach except we allowed individual operations to set their own isolation level requirements. Most of our code worked fine with READ_COMMITTED isolation and we always used any existing transaction if it was present. But some operations that accrue figures -- like inventory or GL balances -- required REPEATABLE_READ isolation because we had to lock the record from the moment we get it until we commit the change. But in those situations we had to watch out that all of the *other* data we used to validate/process wouldn't get locked up. We also had to pay special attention to tables that potentially lots of processes would hit at once. For example, certain financial transactions require an "audit ID" that MUST be gapless and sequential. We had to make sure that incrementing those values required a single SQL statement and we separated that update transactionally.

Finally, see if the new snapshot isolation features (READ_COMMITTED_SNAPSHOT) in SQL 2005 (I know -- Oracle's had row versioning for a long time). It might be relatively safe to enable and might alleviate some of your current issues.

erikj at 2007-9-5 > top of Msdn Tech,Architecture,Architecture General...
# 2

you should try to refactor your solution and avoid distributed transactions as much as possible

Dan Pritchett blogged about it recently : http://www.addsimplicity.com/adding_simplicity_an_engi/2006/12/2pc_or_not_2pc_.html

Arnon

ArnonRotemGalOz at 2007-9-5 > top of Msdn Tech,Architecture,Architecture General...
# 3

Hi Siteadm,

I've found these for you, enlisted in order of relevance:

Hope that, complemented with my colleagues gave you, helps

DiegoDagum at 2007-9-5 > top of Msdn Tech,Architecture,Architecture General...