Windows Vista – CPU usage shoots to ~50% and stays there until suspended

This is the Windows Vista problem, seen, I think, mostly on dual-processor laptops, where CPU usage suddenly shoots to about 45% and stays there.  Once this has happened, CPU usage never returns to normal on its own, though, curiously, putting the box to sleep for a few seconds will reset the condition.

I’ve seen this happen many times (Sony VGN-CS215J laptop with Intel dual core CPU) when the box is sitting there doing nothing, with only the normal 2-3% background CPU, then suddenly — BOOM!

"Process Explorer" shows that the CPU in one of these episodes is being consumed by "Interrupts", rather than any specific program.

It is definitely the case that this condition is "real", and not just a problem with CPU metering.  When it occurs the box slows down, and sometimes particular applications slow to a crawl (e.g., tasks that would take ten seconds take ten minutes).  In addition, on my laptop the fan takes off at high speed.
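For anyone wanting to pin down when these episodes start and how long they last, here is a rough Python sketch that scans a performance counter log for sustained stretches of high interrupt time. It assumes the log was produced by something like `typeperf "\Processor(_Total)\% Interrupt Time" -si 5 -o interrupts.csv`, i.e. a CSV with one header row, a timestamp in the first column and the counter value in the second. The function names, the 20% threshold and the three-sample minimum are my own illustrative choices, not anything taken from the post.

```python
import csv
import io


def read_samples(csv_text):
    """Parse typeperf-style CSV text into (timestamp, percent) pairs."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row of counter names
    samples = []
    for row in rows:
        if len(row) < 2:
            continue
        try:
            samples.append((row[0], float(row[1])))
        except ValueError:
            continue  # blank or non-numeric sample, ignore it
    return samples


def find_episodes(samples, threshold=20.0, min_samples=3):
    """Return (first_timestamp, last_timestamp, sample_count) for every run
    of at least min_samples consecutive samples above threshold.

    threshold and min_samples are assumed values; tune them to your box.
    """
    episodes = []
    run = []
    for stamp, value in samples:
        if value > threshold:
            run.append(stamp)
            continue
        # A sample at or below the threshold ends the current run.
        if len(run) >= min_samples:
            episodes.append((run[0], run[-1], len(run)))
        run = []
    # Flush a run that is still open at the end of the log.
    if len(run) >= min_samples:
        episodes.append((run[0], run[-1], len(run)))
    return episodes
```

Typical use would be `find_episodes(read_samples(open("interrupts.csv").read()))`. Timestamps are kept as plain strings so the sketch does not depend on the log's date format.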

Google searches show that this is a fairly common problem, and many supposed "causes" have been "identified", though they always turn out to be false leads.  The problem tends to come and go (it appears that the likelihood of it varies from IPL to IPL, i.e. from boot to boot, with some boots hardly ever doing it and others doing it every ten minutes), so it’s easy to get the false impression that the problem has been "cured", only to have it come back.

Microsoft, of course, denies all knowledge of the problem, even though it occurs on several different brands of system.

One clue I have is that it doesn’t seem to happen when my laptop is running on battery (though of course with the variability of the symptom it’s hard to say this with any certainty).  But I tried playing with the CPU speed controls (under advanced power options) and that didn’t cure it.

Update 1:

I’ve checked several times, and there aren’t any new drivers available for my box.  (There is a new display driver, but Sony hasn’t respun it with their special hooks, so it won’t work on this box.)

I don’t see that "walking the stack" would do any good since the "looping" is in interrupts, not any specific process.  I suppose I could try to do an interrupt trace, but it would likely take a lot of time that I don’t have.

Update 2:

Today I experienced the failure while running on battery, the first time that has happened.  So I know of no conditions that prevent the failure.

Re turning off Windows services such as search indexing, I did that a long time ago.

Update 3: (5/21/11)

On a whim I unplugged the network cable and have been running wireless at home and at work for the past two days.  (I don’t generally like to run wireless if I don’t need to since I figure there’s already too much RF pollution.)  No episodes have occurred.  Weird.

Update 4: (5/30/11)

I’ve been running for the past 11 or so days, using wireless only.  (Not something I normally like to do, since I feel there’s too much RF pollution already and no need to add more when a wired connection is available.)  And for the past 11 days I’ve not had an "incident" — by far the longest incident-free time I’ve seen.  In a day or two I’ll start plugging in again and see what happens.

Update 5: (6/2/11)

As a result of a wireless router outage at work, I had to use the wired connection there for two days, and the old behavior (40% or so "events" after 30-60 minutes of up-time) returned.  Curious thing, though: On both days, when I brought the laptop home and connected to wireless, the problem would recur within a few minutes.  But once I did a "sleep" and "reawaken" the problem would be permanently gone.

To bring the laptop home I’d sleep it, but somehow the "bug" survived through that.  Or, quite possibly, the wired interface didn’t get reset until after reawakening, and it did something nasty during those few seconds.

Just for reference, the wired adapter is a "Marvell Yukon 88E8040 PCI-E Fast Ethernet Controller".  It would be interesting to know if the same adapter is associated with other cases of this problem.

Update 6: (6/6/11)

I’m beginning to suspect that somehow the wireless adapter is the culprit.  When it’s turned off it can somehow corrupt the system.  I say this because the router at work is a little "funky" and I sometimes have to turn the wireless off and back on (via a mechanical switch on the front of the laptop) to get a connection.  When I do this, inevitably within a few minutes (not immediately) I get the interrupts back.  Sleeping and reawakening the laptop clears the interrupt problem, seemingly permanently (until the next time the wireless is turned off).  For the record, the wireless adapter is an "Intel(R) WiFi Link 5100 AGN", though it could be more of a problem with the way the switch is implemented.

Update 7: (7/5/11)

I’ve been running for over a month now on the wireless network adapter (vs hardwired) and the problem has essentially gone away.  A few times (due to losing connectivity for some reason) I’ve turned the adapter off for several seconds and then back on, to reset it.  In all but one of these cases, as best I can remember, I got the 50% CPU problem after the off/on cycle, though, curiously, in several cases the problem didn’t appear for 30 minutes or more after the off/on.

Update 8: (7/18/13)

About 10 months ago I had to completely restore my system from backup, and since then I’ve not seen the 50% CPU problem.  (I haven’t tried to deliberately provoke it, but the radio has been accidentally turned off on several occasions.)  Of course, no Windows bug ever goes away completely, so now I have a problem with Open Office crashing, but I guess I can live with that.

Solution:

Take a look at the Windows Performance Toolkit: http://blogs.msdn.com/b/pigscanfly/archive/2009/08/06/stack-walking-in-xperf.aspx
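Since the time is going to interrupts rather than a process, the ISR/DPC side of the toolkit is probably the part to look at. Below is a minimal Python sketch that wraps the three xperf steps (start a kernel trace with interrupt and DPC events, stop it into a file, produce a per-module ISR/DPC summary). It assumes xperf is installed, on the PATH, and run from an elevated prompt. The file names and the `wait_for_episode` callback are hypothetical, and the flags are as I understand them, so check them against `xperf -help` before relying on this.

```python
import subprocess


def xperf_commands(etl_path="interrupts.etl", report_path="dpcisr.txt"):
    """Build the xperf command lines as argument lists (file names are
    placeholders chosen for this sketch)."""
    # PROC_THREAD+LOADER lets xperf map addresses back to driver modules.
    start = ["xperf", "-on", "PROC_THREAD+LOADER+INTERRUPT+DPC"]
    # -d stops the kernel trace and writes the merged trace file.
    stop = ["xperf", "-d", etl_path]
    # The dpcisr action summarizes ISR and DPC usage from the trace.
    report = ["xperf", "-i", etl_path, "-o", report_path, "-a", "dpcisr"]
    return start, stop, report


def run_trace(wait_for_episode, runner=subprocess.check_call):
    """Start tracing, block in wait_for_episode(), then stop and report.

    runner is injectable so the sequence can be checked without xperf.
    """
    start, stop, report = xperf_commands()
    runner(start)
    try:
        wait_for_episode()
    finally:
        runner(stop)  # always stop the trace, even if the wait fails
    runner(report)
```

For example, `run_trace(lambda: input("Press Enter once the CPU has spiked... "))` would keep the trace running until you see an episode. The report should then show which driver's interrupt routines are eating the time.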

My money’s on crappy drivers.    

I had this happen with crappy Broadcom (that’s redundant) network drivers.