Numerous errors in the Application Event Log when TFS host is rebooted
I'm not sure that there's anything "wrong" here, but it seems worth asking about - I noticed in the event log for my TFS host that quite a number of errors and warnings were logged within the first 15-20 seconds of operation following a reboot. Subsequent to that, everything's working just fine.
Off hand, it looks like most of the issues stem from Tfs services spinning up and attempting to (directly or indirectly) use SQL Server before it's ready to accept connections, but that's really just a guess, not the result of any detailed analysis.
Of course, in the long run, I'd like to see a reboot that doesn't log bogus exceptions from services stumbling over each other at startup, but for 1.0, is this expected/typical behavior?
I can post actual event log messages if anyone's interested in taking a closer look...
Several things happen at startup that may log warnings and errors. The system connects to the database and Active Directory if you are in a domain environment. It loads data from the database and begins some background processing tasks. It will log any configuration issues that might cause errant behavior later on. Please post the actual messages and we may be able to help give diagnosis.
I'd like to see the event log entries -- please post.
Can you recommend a tool for getting a usable extract from the event log? The system event viewer doesn't cut it - exporting entries doesn't include the details, and viewing details doesn't include the basic event information in the selectable text.
Looking at a couple of reboots, it looks like various things are trying to communicate with the database too soon, and/or the database is trying to communicate with the domain controller too soon. In all cases, after the system was fully up and running, TFS was working fine.
Carl,
If you press the "clipboard icon", all of the entry details will be available. This icon is below the "down" arrow.
OK, I've got a file of events from my last reboot (on 5/2/2006). I don't see any way to attach files here, so here's a rough rundown of what happened. I can send the zipped output from LogParser directly if you're interested in more details:
A reboot was initiated at 13:19:06. There's the normal flood of events from SQL Server (etc) during startup, but at 13:19:21 I see:
MSSQLSERVER "Login failed for user '{my-domain}\tfsservice'. [CLIENT: 192.168.1.247]"
and then a series of errors from Sharepoint and various Tfs services that appear to be the result of failed database connection attempts.
After the reboot was completed, I was able to access everything - Sharepoint, work item tracking, source control, reports, etc. No user credentials have been changed. No network configurations have been changed. Nothing's been changed.
The only thing I can see that seems suspicious is that the domain controller was rebooted at 13:25:58, but I believe I had successfully used everything in Tfs before that (I was installing updates and rebooting machines - nothing seemed out of the ordinary).
I'll reboot again some time in the next month, I'm sure - I'll keep a close eye on that reboot - maybe this was just noise.
I rebooted again last night - not quite as many errors logged, but still a few, and all seemingly caused by login failures. I see failure audit events for both the tfsservice account and the built-in local system account! It's as if SQL Server is trying to log in before the local security authority is able to authenticate anyone. In all cases, after the system is fully booted, everything works fine. But it's still unsettling.
I'd like to bring this topic back from the dead, as I'm seeing it too. I've searched this forum, and this is really the best/only thread that talks about it, but with no real explanation.
I've got a single-server setup, and installed Foundation Server 2005 as per the instructions (both downloaded from Microsoft's site just a week ago or so). Verision is 8.0.50727.42, this is the 180-day trial. I'm just testing around with this, learning the product, so I can reboot at will. And when I do reboot, I get these errors in the Application event log consistently.
As with Carl, my VSTS seems to be working fine. But the errors are unsettling, and even if harmless they raise unnecessary warnings and alerts that could cause trouble and be annoying (especially if MOM is paging/emailing based on alerts being logged).
I will provide some of the errors seen here, in the order I see them after a reboot. I can't copy/paste direct, so I'll paraphrase:
- 8:58:44 AM - ERROR Event ID 17187, source MSSQLSERVER - "SQL Server is not ready to accept new client connections; the connection has been closed. Wait a few minutes before trying again.....CLIENT:[Lists IP address of foundation server]" This error shows up twice in a row with the same time stamp.
- 8:58:45 AM - ERROR Event ID 1000, source Windows SharePointService - "#50070: Unable to connect to the database STS_Config_TFS on [servername]. Check the database connection information and make sure that the database server is running".
- 8:58:51 AM - FAILURE AUDIT Event ID 18456, source MSSQLSERVER - "Login failed for user 'DOMAIN\TfsService [Client: serverIPaddress]".
- 8:58:51 AM - ERROR Event ID 1000, source Windows SharePointService - "#50070: Unable to connect to the database STS_Config_TFS on [servername]. Check the database connection information and make sure that the database server is running".
- 8:58:54 AM - FAILURE AUDIT Event ID 18456, source MSSQLSERVER - "Login failed for user 'DOMAIN\[SQL Service Account] [Client: localmachine]".
- 8:58:54 AM - ERROR Event ID 107, source Report Server Windows - "Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database".
After all that, no errors, seemingly no problems (although we haven't tried everything on this yet!). I think that at a minumum, giving off false alarms for a production monitored enterprise system would be a very bad thing. Our Ops guys get paged enough as it is. :)
Thanks!
On a reboot, since we don't have service-based dependency information, you get these kinds of errors when, for instance, IIS has started up and your first TFS request comes in but SQL Server hasn't started yet. This is no different than if the SQL Server were remote and wasn't available.
Inside MS, we disable monitoring of the TFS instance when we have known outages like this. Since it sounds like this is a controlled reboot, you should take the same kind of precaution. Most of the time ops guys have ways of temporary disabling monitoring, as you may need reboots for things like applying patches to the OS.
In the extreme case, you could either try to introduce a service dependency or change IIS to manual startup then have some service or boot-time program that started IIS once you knew SQL Server was up and going. FWIW, most of our "enterprise" customers have dual-server setups - as such, this is a problem that we can't deal with locally on the app tier - the data tier reboot can happen at any time.
TFS V1 SP1 and later will include a "TfsQuiesce" tool that will help shut down the TFS instance, so that may be the route you decide to go for avoiding these spurious event log entries.
Thanks for your response, James. I guess that's just the way it is, and disabling MOM and other monitoring tools during planned outages (once this gets deployed to a live production/enterprise system) is the way to go. I guess I just wanted that warm fuzzy feelings from you guys that all is indeed ok. :)
SP1 sounds like it will certainly help this though, so that's good to know.