Auto Update Stats
Today I spent several hours troubleshooting a long-running query in a (thankfully) test environment. Yesterday, everything was working fine. In addition, the same executable could be used successfully against a different instance of the database. However, in one particular database, the application would freeze at certain points, sometimes ending in an 'unspecified error', other times in a timeout.
The usual suspect in these cases is blocking, so that's the first thing I looked for. No luck though, the process shows as Suspended, but not blocked by any other process. The wait type for the process is PAGEIOLATCH_SH, but the waittime is always just a few milliseconds. The CPU and IO for the process are off the charts however, and increasing with every refresh. So it's doing a whole lot of reading, but there is no clear indication of what. Even more strange, once I was hung up on a certain point in the processing, other users could get past that point, but would hang up in different spots.
So I trace to find the last query executed by the application and run that in the query analyzer. Sure enough, it returns immediately. So it's reading a huge amount of data, but I only asked for a few rows, what is the server doing? And why is it different from the application versus the query analyzer? So I turned on all SP and query-related events in the profiler, and now the culprit is revealed. An SP:Statement event shows that what is happening is this:
SELECT StatMan([SC0], [SB0000]) FROM (SELECT TOP 100 PERCENT [SC0], step_direction([SC0] over (order by NULL) AS [SB0000] FROM (SELECT SUBSTRING([Image_Back], 1, 100)++substring([Image_Back], case when datalength([Image_Back])<=200 then 101 else datalength[Image_Back])-99 end, datalength([Image_Back])) AS [SC0] FROM [dbo].[CheckItemImage] WITH (READUNCOMMITTED, SAMPLE 6.712329e+001 PERCENT) ) AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SB0000]) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 1)
This is happening during a recompile of a view involved in the select. The table in question only contains a few hundred thousand rows, but they are check images, so there is a huge amount of binary data involved, and this statistics query is basically reading the whole thing.
Updating statistics on the database did not resolve the issue, the server still tries to build the statistics to optimize the query. So I turned off Auto Create Statistics, and now the query runs instantly (turning off Auto Update Statistics did not work).
Of course, I don't want to turn off these features, they are great features. And I'm not faulting SQL Server in the slightest on this one. Query optimization is a difficult problem, and statistics are a great way to help prune the set of potential plans, but I will from this point on immediately suspect this scenario whenever I see a process suspended with a waittype of PAGEIOLATCH_SH.
To re-enable these features, I'll need to determine why the stats query is taking so long. I may be able to build those particular statistics manually, or it may involve actually archiving some of the older data in the table.
So why didn't it hang in the query analyzer? Turns out it did if the application wasn't already hanging on the same query, I just never tried that until after I understood what the problem was. I suspect this is because if the server is already computing a particular statistic for another plan, then it will just use the existing statistic or compile the query without statistics. In this case, that resulted in almost instant execution times.
In a production database, this would have meant several hours of downtime. I hope this post helps others that may be seeing the same behavior find the solution more quickly than I did.
Bryn Rhodes
Alphora
Comments
This helped me a lot, I have the same problem. Thanks for posting!
Posted by: Stefan | April 21, 2009 05:50 AM
Thanks Bryn,
We are facing a similar issue here in our test environment and after struggling for two days I came accross this article.
I was looking for anything related to STATISTIC updates that could cause PAGEIOLATCH.
however, in our case everything was working fine and since we converted to SQL 2008 we always run update statistic manually everytime we restore a database and we never faced this problem in production. But one fine day in test the application started freezing up. The only change we had done was increase tempdb data file into multiple files ( same as # of cpu ) for parallelism. Once I removed them it initally had to update statistics once but its been fine since then.
We still have other servers with multiple tempdb files where app freezes occasionally.
With this what I am assuming is the "update statistics query" while using tempdb is outgrowing tempdb and slowing this process. The culprit was exact same query that you have.
"SELECT StatMan([SC0], [LC0], [SB0000])
FROM
(SELECT TOP 100 PERCENT [SC0], [LC0],
step_direction([SC0]) over (order by NULL) AS [SB0000]
FROM (SELECT SUBSTRING ([Image_Back], 1, 100)++substring([Image_Back],
case when datalength([Image_Back]) then 101 else datalength([Image_Back])-99 end,
datalength([Image_Back])) AS [SC0], datalength([Image_Back]) AS [LC0]
FROM [dbo].[CheckItemImage] TABLESAMPLE SYSTEM (3.774856e+001 PERCENT) WITH (READUNCOMMITTED) )
AS _MS_UPDSTATS_TBL_HELPER ORDER BY [SC0], [SB0000] ) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 1)
"
Posted by: manish | August 21, 2009 12:55 PM