Answer Re: Error Report processing delay

Hi,

I've signed up for WER and done some of the important steps by now:

1. Obtained Verisign ID

2. Set up a proxy caching Symbol Store to hold our symbols and also cache the msdl ones.

3. Source indexing on all my binaries to automatically extract the correct source code from svn for the relevant crash dumps.

4. Mapped my products using the Product Feedback mapping tool.

I've managed to get my application to not respond several times. The Fault report dialog popped up and I clicked Send which succeeded. I've also managed to get a crash to happen which I also sent.

The Crash32 event has now shown up in WER, but there is still no sign of the hangs. My questions are:

1. Does an application hang also show up in WER?

2. What is the turnaround from when the user clicks Send to when the event appears in WER? Do you do a batch process every day? Or less often?

3. How does WER decide to only register a few details about the event, or to get a full minidump? My crash only has a few details, and reports that it has requested a CAB.

Thanks,

Brian

[1159 byte] By [brokenn] at [2008-2-15]
# 1

Hi Brian,

You have done the right things (symbols and source), and it looks like you are all set. Here are the answers you are looking for...

1. Does an application hang also show up in WER?

Application Hangs from Windows Vista will show up, but not from down-level operating systems. The reason we don't include the older hang events is largely due to the way the WER client grouped hang problems... by process name. This dumped too many failure types into a single event ID and became impossible to manage. This issue also made it problematic to set a response since you could only set a singular response for the event although there may have been many issues in that event. Windows Vista does a better job of uniquely identifying hang events, and they are actionable to you.
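
To illustrate the grouping problem described above, here is a minimal sketch. The report fields are hypothetical: the "signature" string is an illustrative stand-in for whatever finer-grained data identifies a hang, not the actual parameters WER uses.

```python
from collections import defaultdict

# Hypothetical hang reports: (process name, hang signature).
# The signature values are made up for illustration only.
reports = [
    ("myapp.exe", "ui_thread_waiting_on_lock"),
    ("myapp.exe", "blocked_network_call"),
    ("myapp.exe", "ui_thread_waiting_on_lock"),
]


def bucket(items, key):
    """Group items into buckets using the supplied key function."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return dict(buckets)


# Grouping by process name alone puts every hang into one event.
by_process = bucket(reports, key=lambda r: r[0])

# A finer key keeps the two distinct problems in separate events.
by_signature = bucket(reports, key=lambda r: r)
```

With the coarse key, a single response would have to cover both problems; with the finer key, each event can get its own response.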

2. What is the turnaround from when the user clicks Send to when the event appears in WER? Do you do a batch process every day? Or less often?

It depends on how we receive the report, and the data we are working with. If the report is the first one seen by our servers, then it gets logged into our SQL DB right away, but the cab file is processed in batch (if a cab file was requested by the server; I'll explain that more later). If it is not the first report we have seen, then we count the hit to the event path URL in batch. With that said, you also should know that there are many different event types (Crash32, Crash64, Managed Code Failures, Mobile OS events, etc.). Each event type gets processed in its own way, and the batch may complete at different times based on the daily volume of that type of event from around the world. It is possible that Crash32 events from yesterday will be visible today but the Hung Application events from yesterday are still being processed. This happens often, and sometimes there can be a 3-5 day lag between the report and the finished process. We constantly work to improve the processing times.
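
The flow described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the actual WER backend: the class, method names, and event IDs are all hypothetical. It only shows why a first report appears immediately while repeat hits wait for the batch for their event type.

```python
from collections import defaultdict


class ReportPipeline:
    """Toy model: first reports are logged at once, repeat hits are batched."""

    def __init__(self):
        self.visible_hits = {}            # (event_type, event_id) -> hit count
        self.pending = defaultdict(list)  # event_type -> queued event ids

    def receive(self, event_type, event_id):
        key = (event_type, event_id)
        if key not in self.visible_hits:
            # First report for this event: logged right away.
            self.visible_hits[key] = 1
        else:
            # Repeat report: counted later, in the batch for its type.
            self.pending[event_type].append(event_id)

    def run_batch(self, event_type):
        # Each event type has its own batch, which may finish at a
        # different time from the others.
        for event_id in self.pending.pop(event_type, []):
            self.visible_hits[(event_type, event_id)] += 1


pipeline = ReportPipeline()
for _ in range(2):
    pipeline.receive("Crash32", "event-1")
    pipeline.receive("AppHang", "event-2")

# Only the Crash32 batch has completed so far.
pipeline.run_batch("Crash32")
```

After this run the Crash32 event shows both hits, while the hang event still shows only its first report until its own batch completes.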

3. How does WER decide to only register a few details about the event, or to get a full minidump? My crash only has a few details, and reports that it has requested a CAB

The servers decide (ultimately) what gets sent across the wire to Microsoft. We don't collect every cab file, and we don't collect cabs on the first report in most cases. I suspect that in your case we haven't finished processing the reported cabs, so the service is telling you that we will be collecting cabs for you. There are some changes (they were a long time in the works) we are making to cab download to make it easy for you to identify whether cabs are available and to make it easier to download them from the service. You should see these changes soon (in a few months).

Okay, I said I would explain more about cabs being requested by the server, so here it is. When files are mapped in the WER service, we automatically set up collection rules for those files. Those rules request cabs for events related to the mapped files. This works out nicely for new events, but for old existing events there are explicit 'no collect' rules in place. We periodically prune these existing rules based on the new registrations, but the work is not trivial so we do not do this all the time. This is why you can sometimes see events that say there are no cabs for the files you mapped. We are working on a process that will automatically mark mapped events without cabs to collect more based on some triggers, but we are still working out the rules, so I don't have anything concrete to share with you on this right now. I envision a threshold trigger based on growth and volume; if the event does not have cabs but the event is hitting the growth threshold defined, then we will mark it to proactively request cabs.
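
Since the rules for that trigger are not yet defined, the following is only a guess at its general shape. The function name, parameters, and the default thresholds (100 hits, 50% growth) are all assumptions chosen for illustration.

```python
def should_request_cabs(has_cabs, hits_last_period, hits_this_period,
                        min_hits=100, min_growth=0.5):
    """Decide whether an event without cabs should be marked for collection.

    min_hits and min_growth are hypothetical thresholds, not values
    taken from the WER service.
    """
    if has_cabs:
        return False  # nothing to do, cabs are already available
    if hits_this_period < min_hits:
        return False  # not enough volume to matter yet
    if hits_last_period == 0:
        return True   # new in this period and already above the volume bar
    growth = (hits_this_period - hits_last_period) / hits_last_period
    return growth >= min_growth
```

An event with no cabs that doubles from 100 to 200 hits would be marked for collection under these assumed thresholds, while one that grows from 100 to 110 would not.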

Our cab space is not infinite, and we are required to purge cab data periodically (for many reasons, not just space), so we only keep about 6 months' worth of cabs overall. After that it's a FIFO process. That means that we could have had some cab files for you to download last week, but they were old and now they are purged and we need to collect more. This works out, since events that need to be fixed because they are a current problem will always yield more cabs within a few days of a request.
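
A FIFO purge of this kind could look something like the sketch below. The 183-day window is an approximation of "about 6 months", and the function name and file names are hypothetical.

```python
from collections import deque
from datetime import date, timedelta

# Assumed retention window, roughly six months.
RETENTION = timedelta(days=183)


def purge_old_cabs(cabs, today):
    """Remove cabs older than RETENTION from the front of the queue.

    cabs is a deque of (collected_on, name) tuples, oldest first.
    Returns the list of purged entries.
    """
    purged = []
    while cabs and today - cabs[0][0] > RETENTION:
        purged.append(cabs.popleft())
    return purged


stored = deque([
    (date(2007, 1, 1), "old.cab"),
    (date(2007, 8, 1), "recent.cab"),
])
removed = purge_old_cabs(stored, today=date(2007, 9, 12))
```

Here the January cab falls outside the window and is purged, while the August cab stays available for download.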

I hope this helps you Brian! Thanks for using the service!

Kind Regards,

-Jason

JasonHardester at 2007-9-12 > top of Msdn Tech,Microsoft ISV Community Center Forums,Windows Error Reporting for ISVs...
# 2

Hi Jason,

Thank you very much for the prompt and detailed response! Try as I might, the website won't allow me to respond unless I quote your reply, so here it is.

I have now received the cabs for my frequent crashes and still have only 1 event hit for the less frequent ones. It seems as if all is working as it should.

Thanks,

Brian

brokenn at 2007-9-12