SOAP WS-AT Durable2PC: network outage after successful PREPARE phase
Hi,
Given the following WS-AT 2PC txn scenario: -
1) All remote WS-AT Transaction Managers have responded PREPARED to the coordinating msdtc TM, and the outcome of the transaction is COMMIT
2) The network link to one of the participating TMs is broken before a COMMIT command could be sent.
Does the WS-Atomic Transaction protocol stipulate that it is the Coordinating TM's responsibility to eventually communicate the transaction's outcome to the remote subordinate TM?
If so, what rules/algorithm does DTC adhere to when deciding how often to retry the connection to the remote TM? Can this be configured?
Or is it incumbent on the subordinate to contact the Coordinator to discover the outcome?
Is heuristic transaction resolution part of the WS-AT protocol? (I think IBM DB2 is big on that?)
My database is hung! My loyal RMs are preventing access to the "in-doubt" rows. Surely we all agree ACID is the only true test of a true 2PC? Please help me plan for the possibility of manual/forced resolution (or otherwise) of my transactions. Please help me predict msdtc behaviour here.
What governs this situation and is it configurable?
Regards Richard Maher
Hi Richard,
The WS-AT spec is somewhat open in that regard: there is no specific "participant MUST at all costs try to re-establish connections to ..." verbiage, but the issue weighs heavily on the minds of all vendors involved. So while the spec does not stipulate retries, an implementation can certainly provide them. Although this is an implementation detail, all vendors we've tested interop against have supported retries -- though I cannot speak to their continued support for such a feature, only to MSDTC's, described below.
In your scenario above either participant is allowed to (nothing in the spec says you're not allowed to) retry. For instance the subordinate TM can resend its PREPARED. Likewise, the superior TM can resend its COMMIT. MSDTC uses a backoff algorithm to determine when it should retry all of its messages and each message has its own set of highly configurable tunables. Things like the initial delay, subsequent backoff percentages, and max number of messages can all be configured for prepare, prepared, and commit notifications.
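As a rough illustration of how those three tunables interact, here is a minimal Python sketch. The class, field names, and default values are all hypothetical; they are not MSDTC's actual settings, which are not given in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetryPolicy:
    # Hypothetical names and defaults, NOT MSDTC's real configuration values.
    initial_delay_s: float = 1.0   # wait before the first resend
    backoff_percent: float = 50.0  # each delay grows by this percentage
    max_messages: int = 5          # number of resends before giving up


def retry_delays(policy: RetryPolicy) -> List[float]:
    """Return the delay, in seconds, that precedes each resend."""
    delays: List[float] = []
    delay = policy.initial_delay_s
    for _ in range(policy.max_messages):
        delays.append(delay)
        # Grow the next delay by the configured percentage.
        delay *= 1.0 + policy.backoff_percent / 100.0
    return delays
```

In this model, one such policy would exist per message type (prepare, prepared, commit), which matches the description of each message having its own set of tunables.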
More specifically for the example above, if both ends use MSDTC and its default settings, the subordinate TM will resend first -- which will trigger the superior to also resend its notification of commit. If that communication fails for some reason, the superior TM will eventually resend its commit and hopefully all participants in the system will resolve themselves.
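The exchange described above can be sketched as a toy Python model. The class and method names are invented for illustration and the "network" is just a boolean flag; this is not how MSDTC is implemented.

```python
from typing import Dict, Optional


class Superior:
    """Toy coordinator that remembers the outcome of decided transactions."""

    def __init__(self) -> None:
        self.outcomes: Dict[str, str] = {}  # txn id -> "COMMIT" once decided

    def on_prepared(self, txn_id: str) -> Optional[str]:
        # A resent PREPARED for a decided txn triggers a resend of the outcome.
        return self.outcomes.get(txn_id)


class Subordinate:
    """Toy participant that stays in doubt until it learns the outcome."""

    def __init__(self, txn_id: str) -> None:
        self.txn_id = txn_id
        self.state = "PREPARED"  # in doubt: locks are still held

    def resend_prepared(self, superior: Superior, link_up: bool) -> None:
        if not link_up:
            return  # message lost; remain in doubt and retry later
        if superior.on_prepared(self.txn_id) == "COMMIT":
            self.state = "COMMITTED"
```

While the link is down the subordinate stays in the PREPARED state, which is why the in-doubt rows remain locked; the first resend that gets through resolves it.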
Heuristic resolution is not part of WS-AT as this spec only deals with purely atomic outcome. The WS-BA spec defines partial outcome but is not part of WCF v1.
Hi Jesse,
Thanks very much for the reply.
> The WS-AT spec is somewhat open in that regard in that there
> is no specific "participant MUST at all costs try to re-establish
> connections to ..." verbiage, but it weighs heavily in the minds
> of all vendors involved.
I have to admit to being curious as to why the WS-AT protocol did not seek to address this issue specifically, especially given its importance to any 2PC protocol and, in particular, when contrasted with TIP's QUERY/RECONNECT mechanisms. Still, that's neither here nor there. One's a Connection-Oriented, context-rich protocol and the other's an extremely flexible Service (SOAP) Oriented protocol. One's an Apple and the other's an Orange, and there the similarity ends :-)
Can I at least assume that regardless of retry strategy, the coordinating TM in a WS-AT txn "MUST NOT" forget a txn until all participants have formally acknowledged the COMMIT command?
> MSDTC uses a backoff algorithm to determine when it should retry
> all of its messages and each message has its own set of highly
> configurable tunables.
> Things like the initial delay, subsequent backoff percentages, and max
> number of messages can all be configured for prepare, prepared, and
> commit notifications.
This (when combined with the WS-RM spec) is exactly what I've been looking for! How will a System/Network Manager (or programmer for that matter) be able to configure the values for such things as Inactivity Timeout, Retransmission Time, Exponential Backoff? Does the granularity go from Server level down to even individual Transaction level? Please, don't be afraid to say "Just RTFM!" :-) I've only just skimmed through it this arvo and it's probably all in there.
> More specifically for the example above, if both ends use MSDTC
> and its default settings, the subordinate TM will resend first -- which
> will trigger the superior to also resend its notification of commit.
> If that communication fails for some reason, the superior TM
> will eventually resend its commit and hopefully all participants
> in the system will resolve themselves.
Let me come clean and tell you why I’m really asking. It is to do with a problem I’m having with TIP rather than WS-AT and I am here under false pretences :-) For background information, please see: -
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=612311&SiteID=1
It’s just that you talk about msdtc-wide, or global, settings as if they could be brought into play for protocols other than just WS-AT and in particular TIP. Is this in fact the case? Or is TIP subject to its own retry regime?
> Heuristic resolution is not part of WS-AT as this spec only deals with purely atomic outcome.
Glad to hear it!
Regards Richard Maher
> Can I at least assume that regardless of retry strategy, the coordinating TM in a WS-AT txn "MUST NOT" forget a
> txn until all participants have formally acknowledged the COMMIT command?
Yes, of course. That is true for every correct embodiment of the 2PC protocol.
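To make that rule concrete, here is a minimal Python sketch of the bookkeeping a coordinator needs. The class and method names are hypothetical, and a real TM would keep this state in a durable log rather than in memory.

```python
from typing import Dict, Iterable, Set


class CoordinatorLog:
    """Toy record of committed transactions still awaiting acknowledgements."""

    def __init__(self) -> None:
        self._pending_acks: Dict[str, Set[str]] = {}

    def record_commit(self, txn_id: str, participants: Iterable[str]) -> None:
        # Once COMMIT is decided, every participant owes an acknowledgement.
        self._pending_acks[txn_id] = set(participants)

    def on_ack(self, txn_id: str, participant: str) -> bool:
        """Process one acknowledgement; return True once the txn is forgotten."""
        pending = self._pending_acks.get(txn_id)
        if pending is None:
            return True  # already forgotten
        pending.discard(participant)
        if not pending:
            del self._pending_acks[txn_id]  # safe to forget only now
            return True
        return False

    def remembers(self, txn_id: str) -> bool:
        return txn_id in self._pending_acks
```

A participant cut off by a network outage simply never calls `on_ack`, so the transaction stays in the log and its outcome can still be resent when the link returns.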
> How will a System/Network Manager (or programmer for that matter) be able configure the values for such things
> as Inactivity Timeout, Retransmission Time, Exponential Backoff?
To be honest with you, our expectation is that the defaults will work for most scenarios of interest. It would be very unusual for an administrator to need to change these settings. I would expect that this would only occur in cases where high trust and tight coupling are maintained (as required for distributed transactions) but the topology is unusual - e.g. a very low bandwidth or highly lossy network.
To address your concrete question, there are registry values that can be used to control these settings for WS-AT. To the best of my knowledge, these settings have not been publicly documented and are not directly accessible through the wsatconfig.exe tool or the WS-AT configuration UI in the WCF SDK.
> Does the granularity go from Server level down to even individual Transaction level?
The settings Jesse mentions are global to the WS-AT protocol service and cannot be made specific to individual transactions.
> Or is TIP subject to its own retry regime?
Yes, I'm afraid it is. That said, I need to add the standard disclaimer here: TIP is a deprecated protocol that should not be used by anyone. Is there a particular reason why you are interested in solutions involving TIP?