TIP Txn resolution (*ONLY* failed to notify)
Hi,
It appears that DTC is not fulfilling its end of the bargain when it acts as coordinator in a distributed TIP transaction.
I deliberately killed my Transaction Manager after it had successfully returned a PREPARED response to the DTC. I then pulled the plug (literally) on the network connection. After bringing my TM back up, it began waiting t3$tip_unresolved_search_detent seconds between attempts to obtain the outcome of the txn from DTC.
I then restored the network connection, and to my astonishment, DTC just sat there content that my loyal Resource Managers (Rdb) were enforcing the ACID properties of a true 2PC and locking all affected rows. After many minutes of inaction, a net stop/start msdtc was able to force a commit message down to my TM, but I don't think one can rely on this sort of procedure in production.
Have I got some configuration parameter wrong? I certainly don't recall this happening in earlier versions. I'm on Windows 2000 5.00.2195 service pack 4.
(The curious thing was that the stop/start flushed the COMMIT message but still left the transaction in the "Transaction List" with that sublimely pragmatic message (Only failed to notify); a subsequent txn appeared to remove it.)
Any ideas? How can I force MTS/DTC to re-try the COMMIT message in a timely fashion?
Regards Richard Maher
Based on the info you provide, this sounds like expected behavior.
If your TM is the only subordinate to MSDTC, then MSDTC would have performed a single-phase commit, sending your TM a COMMIT message and remaining in the Enlisted state. It then becomes the responsibility of the subordinate to commit or abort the transaction and report the outcome (COMMITTED or ABORTED) to its superior. If your TM responded PREPARED to the COMMIT message, then MSDTC would log a message that it received a bad message, but would remain in the Enlisted state.
When the connection from MSDTC is lost to the subordinate TM, MSDTC would drop its connection and the transaction would be in-doubt. According to the TIP protocol (http://www.ietf.org/rfc/rfc2371.txt), when COMMIT is issued to the superior’s only subordinate while in the Enlisted state, “the sender will not be involved in any transaction recovery process”. MSDTC will thus wait for the subordinate TM to issue a QUERY call to initiate recovery. From your post, it appears that your TM is not performing recovery.
When you restarted MSDTC, MSDTC enters recovery and will attempt to determine the outcome of all in-doubt transactions. MSDTC will establish a connection with your TM (IDENTIFY/IDENTIFIED) and then will send a RECONNECT call to your TM. Your TM must have responded with RECONNECTED, and MSDTC would then move the transaction to the Prepared state. It would then issue a COMMIT call to the TM, and the transaction would remain in the “Failed to Notify” state until your TM responds with a COMMITTED message. Evidentially, your TM never sends the COMMITTED call.
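The restart sequence described above can be sketched as a small step table. This is only an illustration of the order given in this post (IDENTIFY, then RECONNECT, then COMMIT); the names RecoveryStep, commandFor and nextStep are made up for the sketch and say nothing about how MSDTC is actually coded.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the superior's recovery steps as described in the post.
enum class RecoveryStep { Identify, Reconnect, Commit, Done };

// TIP command the superior sends at each step.
inline std::string commandFor(RecoveryStep s) {
    assert(s != RecoveryStep::Done);  // nothing left to send once the outcome is acknowledged
    switch (s) {
        case RecoveryStep::Identify:  return "IDENTIFY";
        case RecoveryStep::Reconnect: return "RECONNECT";
        default:                      return "COMMIT";
    }
}

// Advance only on the expected reply; any other reply (or none) leaves the step unchanged.
// Staying at Commit corresponds to the "Failed to Notify" display.
inline RecoveryStep nextStep(RecoveryStep s, const std::string& reply) {
    switch (s) {
        case RecoveryStep::Identify:
            return reply == "IDENTIFIED" ? RecoveryStep::Reconnect : s;
        case RecoveryStep::Reconnect:
            return reply == "RECONNECTED" ? RecoveryStep::Commit : s;
        case RecoveryStep::Commit:
            return reply == "COMMITTED" ? RecoveryStep::Done : s;
        default:
            return s;
    }
}
```

In this model a transaction that never receives COMMITTED simply stays at the Commit step, which matches the "Failed to Notify" entry described above.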
-Richard
Hi Richard,
(Thanks for the reply! I was beginning to get lonely :-)
If we can leave what you imagine my TM to be doing to one side for one moment (I will revisit that next) could someone please answer a straightforward, direct question: -
"When msdtc encounters a network error when sending a COMMIT command to a remote TM, what parameters, asynchronous events and environmental variables (if any) control how long it takes for msdtc to retry sending the command?"
This is *not* and never has been a one-phase commit. This also doesn't have to be the TIP protocol and could equally apply to WS-AT DurablePC transactions. msdtc sends a COMMIT, it gets a network time-out, when will it retry to transmit the COMMIT command?
I also have anecdotal (I'm sure I saw it :-) evidence that prior to Service Pack 4 the retry was almost instantaneous once the link was back up.
Getting back to your reply, there are two TMs involved; my hotTIP TM and msdtc's coordinating TM. If the insert into my Rdb database violates a commit-time evaluated constraint (you don't have those do you ;-) then the insert into the Northwind.employees table must not be committed even though it successfully prepared. Needless to say a one-phase commit to my TM would hardly be the most prudent course of action and, anyway, is simply not what MTS/DTC is doing.
Firstly, hotTIP will *never* be involved in a one-phase-commit as it explicitly rejects them. Secondly, DTC *has* sent me a PREPARE command and I have voted PREPARED and waited till it was sent before breaking the link. Thirdly, I have sent numerous QUERY commands, all of which msdtc has replied to with QUERIEDFOUND (or was it QUERIEDEXISTS? I cater for both Windows options :-). Finally, I wait and wait and. . . zzzzz
Receiving a QUERY command from a previously known dodgy IP address is the latest point at which I'd personally expect msdtc to spring into action. What's going on?
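A minimal sketch of how a restarted subordinate might act on the reply to its QUERY polling. It assumes the response names from RFC 2371 (QUERIEDEXISTS, QUERIEDNOTFOUND) and reads QUERIEDNOTFOUND as "presume abort"; the enum and function names are hypothetical and this is not hotTIP's code.

```cpp
#include <cassert>
#include <string>

// What to do with an in-doubt (prepared) transaction after one QUERY round trip.
enum class InDoubtAction { KeepWaiting, PresumeAbort, ReportError };

inline InDoubtAction actionForQueryReply(const std::string& reply) {
    // "QUERIEDFOUND" is accepted only because the post mentions catering for it;
    // RFC 2371 itself names the response QUERIEDEXISTS.
    if (reply == "QUERIEDEXISTS" || reply == "QUERIEDFOUND")
        return InDoubtAction::KeepWaiting;   // superior still knows the txn: hold locks, poll again later
    if (reply == "QUERIEDNOTFOUND")
        return InDoubtAction::PresumeAbort;  // assumption: superior has forgotten it, so it never committed
    return InDoubtAction::ReportError;       // ERROR or anything unexpected
}
```

The KeepWaiting branch is exactly where the scenario above gets stuck: the subordinate can learn that the transaction still exists, but only the superior can deliver the outcome.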
Regards Richard Maher
> Evidentially, your TM never sends the COMMITTED call.
Depends on what the court is willing to accept as evidence; I think we're a long way from DNA here :-) I send the COMMITTED command; evidently msdtc doesn't have the transaction log maintenance code as part of its startup procedures along with the retry-committed code?
Please explain why, if I hadn't sent a COMMITTED command, msdtc would remove its transaction log record when it receives the next (totally different) transaction? Is there anyone who can have a quick look at the code and see where log-file cleanup takes place?
If MSDTC sent PREPARE and your TM replied PREPARED and then dropped the connection, MSDTC will immediately try to reconnect and retry every second.
In MS05-051 (http://www.microsoft.com/technet/security/Bulletin/MS05-051.mspx), several changes were made to the behavior of TIP. Among others, TIP is now disabled by default on W2K, and the TIP TM URL and port supplied by the partner TM are verified. I assumed, based on the fact that you’re performing TIP on W2K, that either MS05-051 is not installed on your machines or that you are aware of it and have re-enabled TIP. You can determine if MS05-051 is installed by checking Add/Remove Programs for KB902400. The FAQ will also explain the mitigations and how to disable them, if necessary.
I cannot explain the behavior you experienced based on the information you provide. I suggest that you turn on tracing on MSDTC (http://support.microsoft.com/?kbid=899115) and use a tool such as netmon to capture any TCP/IP traffic; that may help you determine what is working and what is not. Checking the event log may also provide some insight.
I hope this helps.
-Richard
Hi Richard,
Thanks for your reply. I did have DTC Trace on but it wasn't telling me much. I've attached the relevant bits below in case you have better luck. I've also attached the TraceCMErr output and my Registry settings. (I have some Registry entries that are only relevant to Windows2003 and XP but I was testing the changes with MS05-051 (that have been installed) on this test box and was pulling a few levers at the time.)
I think it's important to point out at this stage that if there is no induced network failure and I simply crash my TM after it has sent the PREPARED back to msdtc then, as soon as I start my TM up again, msdtc finds hotTIP, I accept the connection, we do the IDENTIFYs, the transaction is RECONNECTed and gets COMMITted in a timely fashion. That is to say, if the network is there but nobody is listening at port 3372 then Windows/msdtc appear happy to tolerate the failure, but if it's a network issue then there appears to be some maximum number of retries before giving up?
Something else worthy of note is the fact that when I look at my sleeping txn via Component Services->Transaction List->Properties it says there is (No Parent). This may be just because the local windows TM has told SQL Server to commit and has moved on and Forgotten its side of the txn, but I include it for completeness.
You said> If MSDTC sent PREPARE and your TM replied PREPARED and then dropped the connection, MSDTC will immediately try to reconnect and retry every second.
"Fail->wait 1sec->try-again" Can I take that to the bank as the definitive msdtc strategy? Is the wait not configurable and is there no "give-up" maximum number of tries? I'll take your word for it, but there is clearly a down-side to that algorithm. I'm told that these msdtc Retry modules are Protocol specific but I can see no reason for the underlying algorithm to vary between different 2PC protocols such as TIP and WS-AT(DurablePC). Does WSATDPC :-) do it differently?
Thanks for your (and anyone else's) help.
Regards Richard Maher
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC]
"MaxLogSize"=dword:00000200
"DisableTipTmIdVerification"=dword:00000000
"DisableTipTmIdPortVerification"=dword:00000000
"DisableTipBeginCheck"=dword:00000000
"DisableTipPassThruCheck"=dword:00000000
"TraceTxFlags"=dword:ffffffff
"TraceCMErr"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\MTxOCI]
"OracleXaLib"="xa73.dll"
"OracleSqlLib"="SQLLib18.dll"
"OracleOciLib"="ociw32.dll"
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Security]
"NetworkDtcAccessTip"=dword:00000001
"NetworkDtcAccess"=dword:00000001
"NetworkDtcAccessTransactions"=dword:00000001
"NetworkDtcAccessAdmin"=dword:00000001
"NetworkDtcAccessClients"=dword:00000001
"NetworkDtcAccessInbound"=dword:00000001
"NetworkDtcAccessOutbound"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Setup]
"MajorVersion"=dword:00030000
"MinorVersion"=dword:00000d5d
"Progman Folder"="Microsoft Transaction Server"
"Source Drive Type"="CD-ROM"
"Source Path"="D:\\"
"InstallState"=dword:00000000
"Install Path"="C:\\WINNT\\System32"
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\Setup\Silent]
MsDtcTxTrace.txt
: : :
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:14.395 ;seq=51 ;eventid=TRANSACTION_BEGUN ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"transaction got begun, description : Gateway.Distributed.1"
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:15.427 ;seq=52 ;eventid=RM_ENLISTED_IN_TRANSACTION ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"resource manager #1 enlisted as transaction enlistment #1001. RM guid = {D6B0741F-A78E-4072-8146-7044C82DCF57}"
pid=1152 ;tid=1240 ;time=08/11/2006-10:41:16.808 ;seq=53 ;eventid=RECEIVED_COMMIT_REQUEST_FROM_BEGINNER ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"received request to commit the transaction from beginner"
pid=1152 ;tid=1240 ;time=08/11/2006-10:41:16.808 ;seq=54 ;eventid=RM_ISSUED_PREPARE ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"prepare request issued to resource manager #1 for transaction enlistment #1001"
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:16.808 ;seq=55 ;eventid=RM_VOTED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"resource manager #1 voted commit for transaction enlistment #1001"
pid=1152 ;tid=348 ;time=08/11/2006-10:41:16.929 ;seq=56 ;eventid=TRANSACTION_COMMITTED ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"transaction has got committed"
pid=1152 ;tid=348 ;time=08/11/2006-10:41:16.929 ;seq=57 ;eventid=RM_ISSUED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"commit request issued to resource manager #1 for transaction enlistment #1001"
pid=1152 ;tid=2564 ;time=08/11/2006-10:41:16.929 ;seq=58 ;eventid=RM_ACKNOWLEDGED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"received acknowledgement of commit request from the resource manager #1 for transaction enlistment #1001"
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:14.395 ;seq=59 ;eventid=TRANSACTION_BEGUN ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"transaction got begun, description : Gateway.Distributed.1"
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:15.427 ;seq=60 ;eventid=RM_ENLISTED_IN_TRANSACTION ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"resource manager #1 enlisted as transaction enlistment #1001. RM guid = {D6B0741F-A78E-4072-8146-7044C82DCF57}"
pid=1152 ;tid=1240 ;time=08/11/2006-10:41:16.808 ;seq=61 ;eventid=RECEIVED_COMMIT_REQUEST_FROM_BEGINNER ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"received request to commit the transaction from beginner"
pid=1152 ;tid=1240 ;time=08/11/2006-10:41:16.808 ;seq=62 ;eventid=RM_ISSUED_PREPARE ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"prepare request issued to resource manager #1 for transaction enlistment #1001"
pid=1152 ;tid=1296 ;time=08/11/2006-10:41:16.808 ;seq=63 ;eventid=RM_VOTED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"resource manager #1 voted commit for transaction enlistment #1001"
pid=1152 ;tid=348 ;time=08/11/2006-10:41:16.929 ;seq=64 ;eventid=TRANSACTION_COMMITTED ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"transaction has got committed"
pid=1152 ;tid=348 ;time=08/11/2006-10:41:16.929 ;seq=65 ;eventid=RM_ISSUED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"commit request issued to resource manager #1 for transaction enlistment #1001"
pid=1152 ;tid=2564 ;time=08/11/2006-10:41:16.929 ;seq=66 ;eventid=RM_ACKNOWLEDGED_COMMIT ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"received acknowledgement of commit request from the resource manager #1 for transaction enlistment #1001"
MtsDtcCMErr.txt
08-11-2006 10:41 16:929: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 16:939: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 16:939: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x000006d9, Exception raised in the rpc call C_BuildContext, .\iomgrclt.cpp (561)
08-11-2006 10:41 31:980: CM Error Value = 0x80000171, BuildContextWrapper call failed. This is usually due to network configuration issues., .\iomgrclt.cpp (336)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
08-11-2006 10:41 32:020: CM Error Value = 0x000006d9, Exception raised in the rpc call C_PokeW , .\iomgrclt.cpp (716)
08-11-2006 10:41 32:020: CM Error Value = 0x80000171, PokeWrapper call failed, .\iomgrclt.cpp (648)
Last Messages from DTC Trace
pid=2668 ;tid=2520 ;time=08/11/2006-10:53:20.850 ;seq=1 ;eventid=AT_RESTART_COMMIT_REDELIVERY_NEEDED_TO_RM ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"commit redelivery needed for transaction resource #0 at dtc restart. RM guid = {00000000-0000-0000-0000-000000000000}"
pid=2668 ;tid=2520 ;time=08/11/2006-10:53:20.850 ;seq=2 ;eventid=AT_RESTART_COMMITTED_TRANSACTION_FOUND ;tx_guid=C47E6FE4-8C5E-4018-9AD4-E1CF4010C214 ;"committed transaction found during dtc restart. Description = Gateway.Distributed.1"
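For anyone sifting a longer MsDtcTxTrace.txt, the lines above split cleanly on ';'. A rough sketch, assuming every line follows the key=value layout seen here with one trailing quoted message; parseTraceLine and the "description" key are names invented for the sketch and are not part of any MSDTC tooling.

```cpp
#include <cassert>
#include <map>
#include <string>

// Trim spaces and tabs from both ends.
static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Split one trace line into fields. The trailing quoted text is stored
// under the (made-up) key "description".
std::map<std::string, std::string> parseTraceLine(const std::string& line) {
    std::map<std::string, std::string> fields;
    std::size_t pos = 0;
    while (pos < line.size()) {
        const std::size_t start = line.find_first_not_of(" \t", pos);
        if (start == std::string::npos) break;
        if (line[start] == '"') {
            // The quoted message runs to the end of the line and may contain '='.
            std::string text = trim(line.substr(start + 1));
            if (!text.empty() && text.back() == '"') text.pop_back();
            fields["description"] = text;
            break;
        }
        std::size_t semi = line.find(';', start);
        if (semi == std::string::npos) semi = line.size();
        const std::string token = trim(line.substr(start, semi - start));
        const std::size_t eq = token.find('=');
        if (eq != std::string::npos)
            fields[token.substr(0, eq)] = token.substr(eq + 1);
        pos = semi + 1;
    }
    return fields;
}
```

Filtering on the tx_guid field and sorting by seq would then give the per-transaction event order without reading the file by eye.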
Sorry to take so long to respond. You’re right that the trace logs aren't telling us much.
The trace log shows a subordinate RM is getting the prepare and commit requests and is responding committed. But the communication with your TIP TM is handled by the gateway, and this communication is not traced. The CM tracing is showing that you lost communication with some RPC endpoints when the network went down; that’s expected and unrelated.
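As a side note on the CM error file: 0x000006d9 is decimal 1753, the Win32 code EPT_S_NOT_REGISTERED (no more endpoints available from the endpoint mapper), which fits lost RPC endpoints. A small sketch for pulling the value out of each line, should anyone want to tally them; the function name cmErrorValue is made up, and the sketch assumes the "CM Error Value = 0x..." layout shown in the attachment.

```cpp
#include <cassert>
#include <string>

// Return the numeric "CM Error Value" from one MtsDtcCMErr.txt line,
// or 0 if the line does not carry one.
unsigned long cmErrorValue(const std::string& line) {
    const std::string tag = "CM Error Value = ";
    const std::size_t at = line.find(tag);
    if (at == std::string::npos) return 0;
    // With base 16, std::stoul accepts the leading "0x" and stops at the comma.
    return std::stoul(line.substr(at + tag.size()), nullptr, 16);
}
```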
Since you have not set DisableTipBeginCheck to non-zero, I am assuming you have an application that uses ITransactionDispenser::BeginTransaction() to begin the transaction. It also appears that you have two subordinates: 1) an RM connected to MSDTC that receives commit and responds commit done; and 2) your TIP TM. To repro this failure, my understanding is that you crash your TIP TM when the COMMIT arrives, and then you disconnect the network. When the network is restored and the TIP TM is brought back online, no RECONNECT commands are being sent from MSDTC. It would be interesting to note whether MSDTC sends IDENTIFY commands to your TM’s previous URL before the network is brought down, and whether those IDENTIFY commands stop after the network is disconnected and reconnected.
If my understanding above is correct, then MSDTC is not following specification and this appears to be a bug in MSDTC. You will probably want to get in touch with Product Support to get this fixed.
WRT your question about WS-AtomicTransactions: Yes, they are handled very differently; their specifications are very different from TIP. Also, WS-AtomicTransactions are more secure than TIP transactions, and widespread use of the former has much more growth potential than the latter.
I hope this helps.
-Richard
Hi Richard,
> Sorry to take so long to respond.
No worries. I wasn't going anywhere :-)
> Since you have not set DisableTipBeginCheck to non-zero,
Richard, what is it about one-phase-commits that is preoccupying you? hotTIP doesn't support them, msdtc/TIP doesn't support them, and a BEGIN TIP command has for all intents and purposes been ruled illegal. Therefore, the TipBeginCheck stays, but is honestly (please trust me on this) totally irrelevant to the discussion.
> I am assuming you have an application that uses ITransactionDispenser::BeginTransaction() to begin the transaction.
I'm more than happy to attach a small (370K PDF) file "Tier3 Development Manual" (CHPT 7 is about hotTIP) but people would probably accuse me of spamming so below are a couple more snippets from what I'm doing. (Well it's not actually me; the Windows code is down to a brilliant developer (and good friend of mine) Franco Cravero): -
> It also appears that you have two subordinates: 1) an RM connected to MSDTC that receives commit and responds commit done; and 2) your TIP TM.
Yes.
> To repro this failure, my understanding is that you crash your TIP TM when the COMMIT arrives, and then you disconnect the network.
Yes. (Technically, I crash it after sending the PREPARED. DTC gets it and tells SQL Server to commit, but could the code-path differ if no successful COMMIT command had ever been sent to my TM? I think not.)
> When the network is restored and the TIP TM is brought back online, no RECONNECT commands are being sent from MSDTC.
Yes.
> If my understanding above is correct, then MSDTC is not following specification and this appears to be a bug in MSDTC.
That's my conclusion :-(
> You will probably want to get in touch with Product Support to get this fixed.
Will report back if/when I have more.
I'll take up the WS-AT comments under a separate post to keep this one OT.
Cheers Richard Maher
No Tier3 or hotTIP specific software needs to be installed on the client node, nor does hotTIP have any control over, or place any restrictions on, the client API to the Transaction Internet Protocol on that node. The documentation for your client operating system will discuss its own TIP API in detail, but as Windows2000 MTS/DTC is currently the only supported transaction coordinator, a brief description of the actions that your client application will need to perform in order to be able to successfully push a transaction to hotTIP will be included here.
First of all, the CoGetObjectContext function can be used to check whether an object is being run within COM+, and hence whether transactions are handled declaratively or whether the DTC needs to be contacted directly. The following code shows how to perform this check:
#include <ComSvcs.h>
. . .
HRESULT hr;
IObjectContextInfo *pObjectContextInfo;
hr = CoGetObjectContext(IID_IObjectContextInfo, (void **)&pObjectContextInfo);
if (SUCCEEDED(hr)) {
    // object being run within COM+, get transaction if any
} else {
    // object not within COM+, start a transaction
}
The HRESULT type is declared in winnt.h, and the SUCCEEDED() macro is declared in winerror.h, which also contains a description of most error codes. For the sake of brevity, checks for failure against the codes returned by any of the function calls in the following code are omitted.
To get a handle to the current transaction when run within COM+ the following code can be used:
IUnknown *pTransUnknown;
ITransaction *pTransaction;
hr = pObjectContextInfo->GetTransaction(&pTransUnknown);
hr = pTransUnknown->QueryInterface(IID_ITransaction, (void **)&pTransaction);
On the other hand, to start a new distributed transaction explicitly the code is:
#include <XoleHlp.h> // declares DtcGetTransactionManager (requires xolehlp.lib)
. . .
ITransactionDispenser *pTransactionDispenser;
ITransaction *pTransaction;
hr = DtcGetTransactionManager(
    NULL, // pszHost
    NULL, // pszTmName
    IID_ITransactionDispenser, // ID of the interface
    0, // Reserved: must be null
    0, // Reserved: must be null
    0, // Reserved: must be null
    (void **)&pTransactionDispenser // the interface
);
hr = pTransactionDispenser->BeginTransaction(
    0, // Must be null
    ISOLATIONLEVEL_ISOLATED, // Isolation level
    ISOFLAG_RETAIN_DONTCARE, // Isolation flags
    0, // Transaction options
    &pTransaction); // The transaction object
Once a handle to the transaction has been obtained, a remote transaction manager can be contacted to enlist any remote services in the transaction as follows:
ITipTransaction *pTipTransaction;
char *remoteTxUrl;
hr = pTransaction->QueryInterface(IID_ITipTransaction, (void **)&pTipTransaction);
hr = pTipTransaction->Push("tip://remotehost.co.uk/", &remoteTxUrl);
Local work and remote work can now be carried out within the scope of the transaction provided that all the data sources involved have a resource manager that is enlisted with each node’s transaction manager. In this case, the application would send the remoteTxUrl over to the remote application for enlisting with hotTIP and DECdtm.
Note that when using ADO to perform work against a local database, only COM+ transactions can be joined as ADO automatically checks for the context and enlists the database with the transaction manager (the DTC in this case). Transactions that are started explicitly through the DTC cannot be joined, as the underlying OLE DB interface for doing so is not surfaced by ADO.
Once all the work is complete the transaction can be terminated by either committing or rolling back the changes. The following code can be used to commit a transaction explicitly:
hr = pTransaction->Commit (FALSE, XACTTC_ASYNC, 0);
For transactions under COM+ control the composite object ‘votes’ for the transaction outcome as follows:
IObjectContext *pObjectContext;
hr = CoGetObjectContext(IID_IObjectContext, (void **)&pObjectContext);
hr = pObjectContext->SetComplete();
The COM+ services will then commit or roll back the transaction according to how all the objects involved in the transaction have voted (a COM+ transaction can be declared to span multiple compound objects).
Once again, it should be stressed that what has been described above are standard Windows2000 APIs for which complete documentation can be found on the Microsoft Developer Network. No Tier3 or hotTIP specific software is required at the coordinating node.
I apologize if I offended you. It’s not a question of honesty; I trust that you are being honest. Your posts, however, were not clear with respect to your RM-TM topology and I was trying to decipher that topology based on the info included in your posts. I wanted to understand that topology and how to reproduce the problem, because whether or not you pursue fixing this through Product Support, we always want to enhance our internal testing infrastructure in order to catch these problems earlier.
A correction and a comment:
- MSDTC does support 1PC for TIP; this is part of the TIP spec.
- Personally, I want to thank you for using our product.
-Richard
Hi Richard,
> I apologize if I offended you.
You didn't. Sorry if I took my frustration out on you. I'm a little bit focused at the moment.
> MSDTC does support 1PC for TIP; this is part of the TIP spec.
I was just referring to MS05-051 from: -
http://support.microsoft.com/kb/908620/en-us
DisableTipBeginCheck REG_DWORD
0 (default). TIP BEGIN commands are always rejected.
A non-zero value. TIP BEGIN commands are enabled.
In most TIP scenarios, transaction managers do not use the BEGIN command in communications. For example, MS DTC does not use this command. If you use TIP only with MS DTC and if you set this value to 0, you do not disable any functionality.
Would I have been correct if I had said "msdtc will never *initiate* a 1PC for TIP"?
Anyway, if anyone does send a BEGIN command to my hotTIP TM, I simply respond NOTBEGUN (that's also part of the TIP spec :-)
> Personally, I want to thank you for using our product.
No problem, MSDTC is a fabulous product! And is bearing all sorts of fruits from the investment and commitment that Microsoft has made in it over recent years. Sadly, it reminds me of what DECdtm could have been :-(
All politics and philosophy aside, has Microsoft ever thought about bringing out a UNIX version of the DTC? Sure it would be a break from tradition, and could upset IBM and BEA, but you really have a great product and, to date, I perceive a total void/vacuum in the Generic-Bundled-TM space over there. The space between Tuxedo, Encina and WebLogic, WebSphere is wide open! (Must mention Oracle iAS and whatever it is that IONA is doing)
Will add more to: -
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=639493&SiteID=1
But if you will allow me to re-state/paraphrase your position, it is that msdtc doesn't use anything like the same "backoff" algorithm, for TIP retries, that it uses for WS-AT DurablePC? And that the retry module for the msdtc TIP protocol does nothing more than "retry<->fail<->wait 1 second<->retry". Sound about right?
Thanks for your interest and help.
Cheers Richard Maher
Thank you for the kind words about our product. I’ll pass them on.
Our discussion about single-phase commit is orthogonal to whether MSDTC will accept a BEGIN call from a TIP participant or not. When we say “MS DTC will not use this command”, this means that the MSDTC proxy will not send a TIP BEGIN call to start a transaction. When the proxy needs a new transaction, it will create a native transaction on its TM. By setting DisableTipBeginCheck to non-zero, a TIP participant can also begin a new transaction on the MSDTC TM; if the registry value is missing or set to zero, then a transaction cannot be begun using TIP BEGIN.
Single phase commit is an optimization that MSDTC will employ if it only has one durable enlistment, irrespective of how the transaction was begun. If your TIP TM had been the only subordinate to MSDTC, i.e. if there had been no second enlistment on SQL, then MSDTC should have sent your TM a COMMIT instead of a PREPARE. When your TM went down and/or the network connection was lost, MSDTC would have told the client’s proxy that the transaction is In Doubt, and it would have been the responsibility of your TM to do the QUERY. [Note: of course, if MSDTC goes down it will try to recover the in-doubt transaction when it comes back up]
With respect to WS-AtomicTransactions, the protocol is much different. For one thing, there is no single phase commit. When a TM is sending COMMIT to a 2PC durable participant, it must retain knowledge of the committed transaction until it receives a COMMITTED or READONLY call from each participant; but there is no obligation on the TM’s part to have to seek out COMMITTED notifications from the participants. I guess this is what you mean by “back-off”. With TIP, however, the TM is obligated to RECONNECT with the down-stream participants and to try to solicit the COMMITTED command. The authors of WS-AtomicTransactions wanted to avoid that obligation.
With respect to the retry every second algorithm: yes that is our current behavior; but that is an implementation detail which could change in the future should we so choose.
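For readers following along, the "fail, wait one second, try again" behavior can be sketched as below. This is a generic illustration and not MSDTC source; the give-up limit (maxAttempts) and the injectable wait callback are assumptions added so the trade-off is visible and the loop can be exercised without real sleeping. Nothing in this thread says MSDTC has such a limit.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct RetryResult {
    bool delivered;     // true if trySend eventually succeeded
    unsigned attempts;  // number of calls made to trySend
};

// Call trySend until it succeeds, waiting a fixed interval between attempts.
// maxAttempts == 0 means "never give up" (hypothetical knob).
RetryResult retryFixedInterval(
        const std::function<bool()>& trySend,
        const std::function<void(std::chrono::milliseconds)>& wait,
        std::chrono::milliseconds interval = std::chrono::seconds(1),
        unsigned maxAttempts = 0) {
    assert(trySend && wait);  // both callbacks must be set
    unsigned attempts = 0;
    for (;;) {
        ++attempts;
        if (trySend()) return {true, attempts};
        if (maxAttempts != 0 && attempts >= maxAttempts) return {false, attempts};
        wait(interval);       // fixed delay: no back-off, no jitter
    }
}
```

In real use the wait callback would wrap something like std::this_thread::sleep_for. With no limit the loop keeps probing forever, which is the behavior wanted for a prepared subordinate holding locks; with a limit, something else has to restart recovery, which resembles the stall reported earlier in the thread.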
I think this wraps it all up. Thanks again for using our product. Cheers.
-Richard