Category Archives: SQL Server Tools

Who blew up my TempDB?

Who hasn’t experienced a server all of a sudden running out of TempDB space and seeing the following error message?

Could not allocate a new page for database 'tempdb' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

The difficulty with investigating a TempDB issue like this is that, in general, the chosen fix for a full TempDB is to restart the SQL Server instance. Since the instance has quite often locked up completely and the customer wants their server working again ASAP, a restart is almost inevitable.

A restart is an option to combat the symptom, but not the cause. It also (in)conveniently removes the evidence of the transaction that caused TempDB to fill up in the first place.

To see how to deal with TempDB (or any other database) filling up, or filling up the disk it is on, please take a look at this blog post.

Returning to the question at hand: Who filled my TempDB?

One solution I looked into, and have implemented for this exact reason, uses Extended Events.

Extended Events is an event processing framework built into SQL Server since SQL Server 2008. It allows you to run very lightweight background monitoring of system events, and in this case can capture information about which command/transaction caused a file growth event to occur.

For a full introduction to Extended Events, please take a look at the 31 Days of Extended Events series from Jonathan Kehayias ( b | t ).

Catch the culprit

To catch the offending process/user, we will create an Extended Events session on the server; the following code does just that:

CREATE EVENT SESSION [Database_Growth_Watchdog] ON SERVER
ADD EVENT sqlserver.database_file_size_change (
 ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_name, sqlserver.session_nt_username, sqlserver.sql_text )
 WHERE ( [database_id] = ( 2 ) ) -- We filter on database_id=2 to get TempDB growth only
)
ADD TARGET package0.event_file ( SET filename = 'D:\Temp\Database_Growth_Watchdog.xel',
 max_file_size = ( 10 ),
 max_rollover_files = ( 5 ) ) -- the default, made explicit: at most 5 rollover files
WITH ( MAX_MEMORY = 4096 KB,
 EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
 MAX_DISPATCH_LATENCY = 1 SECONDS,
 MAX_EVENT_SIZE = 0 KB,
 MEMORY_PARTITION_MODE = NONE,
 TRACK_CAUSALITY = OFF,
 STARTUP_STATE = ON )
GO
ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER STATE = START

Some things to note about the code:

  • We are monitoring for data file and log file growth. The event sqlserver.database_file_size_change fires for any file size change. We do this because we want to be informed of any and all file growth just to make sure we don’t miss anything.
  • If you have multiple data files for TempDB (as you may have in a multi-core environment) you will see one event fire for each file that is growing. E.g. if you have 4 data files and the database grows, you will see 4 entries in the Extended Events output.
  • The session is set to flush events to the output file in 1 second intervals (MAX_DISPATCH_LATENCY). This is done to ensure we lose as few entries as possible. If TempDB fills up, the entire instance can often stop working completely. We want to catch as much information as possible before this happens, so we flush to the output file in very short intervals.
  • We start the session at instance startup (STARTUP_STATE). This ensures the event session is active immediately after a server restart (a quick way to confirm it is running is shown after this list). As this only monitors file growth events, it remains very lightweight, so we don’t have to worry about swamping the system with Extended Events processing.
  • We limit the output file to 10MB and allow it to roll over into a total of 5 files. This means we need at most 50MB for the files in the output directory and won’t spam the folder with too much data.
  • When the event fires, we collect information about the query/command that caused the file growth to occur. This includes NT username, Hostname, origin database of the query, command text and application name.
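
To confirm that the watchdog session really is running (for example after an instance restart), a quick look at the Extended Events DMVs will tell you. This is only a sketch of one possible check; the session name matches the one created above:

-- Check that the growth watchdog session is currently active and see which target it writes to
SELECT s.name AS session_name,
 s.create_time,
 t.target_name
FROM sys.dm_xe_sessions s
INNER JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address
WHERE s.name = N'Database_Growth_Watchdog';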

The information collected by this session can be vital in pinpointing the cause for the TempDB filling up. However, there is the possibility of false positives in this setup. TempDB may have been almost completely filled by another previous transaction and the transaction causing the growth event is an innocent bystander. This is an unavoidable situation, but needs to be kept in mind when analysing the data. If you don’t catch the exact transaction this way, you are still on the right track.

Analyse the data

Once the data has been collected, we need to load and parse the output files to make sense of what has happened. The following code parses the XML output held in the (up to) 5 files.

DECLARE @TraceFileLocation NVARCHAR(255) = N'D:\Temp\Database_Growth_Watchdog*.xel';

-- Read the .xel files and convert each event into an XML document we can query
WITH FileSizeChangedEvent
AS (
    SELECT object_name Event,
           CONVERT(XML, event_data) Data
    FROM sys.fn_xe_file_target_read_file(@TraceFileLocation, NULL, NULL, NULL)
)
-- The ordinal positions in the XPath expressions below correspond to the order in which
-- the data fields and actions appear in the event XML
SELECT Data.value('(/event/@timestamp)[1]', 'DATETIME') EventTime,
       Data.value('(/event/data/value)[7]', 'BIGINT') GrowthInKB,
       Data.value('(/event/action/value)[2]', 'VARCHAR(MAX)') ClientUsername,
       Data.value('(/event/action/value)[4]', 'VARCHAR(MAX)') ClientHostname,
       Data.value('(/event/action/value)[5]', 'VARCHAR(MAX)') ClientAppName,
       Data.value('(/event/action/value)[3]', 'VARCHAR(MAX)') ClientAppDBName,
       Data.value('(/event/action/value)[1]', 'VARCHAR(MAX)') SQLCommandText,
       Data.value('(/event/data/value)[1]', 'BIGINT') SystemDuration,
       Data.value('(/event/data/value)[2]', 'BIGINT') SystemDatabaseId,
       Data.value('(/event/data/value)[8]', 'VARCHAR(MAX)') SystemDatabaseFileName,
       Data.value('(/event/data/text)[1]', 'VARCHAR(MAX)') SystemDatabaseFileType,
       Data.value('(/event/data/value)[5]', 'VARCHAR(MAX)') SystemIsAutomaticGrowth,
       Data
FROM FileSizeChangedEvent;

Please take note of the variable @TraceFileLocation. The example uses a wildcard to allow loading/parsing of multiple files; this is particularly useful if you really do roll over into multiple files.

The results from the query provide a range of information regarding who the offending process belonged to, what command they had last submitted, and a set of information about the client.

So, now we have a starting point to discover who/what is causing TempDB to grow and can discuss this with application owners.
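
If you dump the parsed rows into a temporary table (for example with a SELECT ... INTO #GrowthEvents added to the query above), a simple aggregation quickly shows which statements caused the most growth. The table name #GrowthEvents is just an illustrative placeholder:

-- Assuming the parsed results were stored in a temp table called #GrowthEvents,
-- rank the captured statements by the total amount of TempDB growth they caused
SELECT SQLCommandText,
 ClientHostname,
 ClientAppName,
 COUNT(*) AS GrowthEvents,
 SUM(GrowthInKB) AS TotalGrowthInKB
FROM #GrowthEvents
GROUP BY SQLCommandText, ClientHostname, ClientAppName
ORDER BY TotalGrowthInKB DESC;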

Other ideas

It is extremely simple to extend the session to monitor all databases and run a separate monitoring solution to inform you of such growth events. In fact, I saw Gianluca Sartori ( b | t ) talk about streaming Extended Event data and processing it in near real time at his SQLSat Oslo session. I am certain that this would be a fantastic way of doing some sort of real-time monitoring of growth events.
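
As a starting point, widening the existing session to all databases only requires dropping the filtered event and re-adding it without the predicate. This is a sketch of that change; consider the extra event volume before running it:

-- Sketch: remove the TempDB-only filter so growth events from every database are captured
ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER
DROP EVENT sqlserver.database_file_size_change;

ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER
ADD EVENT sqlserver.database_file_size_change (
 ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_name, sqlserver.session_nt_username, sqlserver.sql_text )
 -- no WHERE clause: all databases are now monitored
);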

Some homework for you: What could you use this Extended Events session for? How could you modify/improve on it to help with your system monitoring and administration? Let us know in the comments.

The case of the non-starting SQL Agent Job

I ran into an interesting situation recently that involved a SQL Agent job that would not/could not run. Looking at the job settings everything seemed fine, yet the SQL Agent deemed the job “un-runnable”.

As we can see here “My Test Job” is enabled, and scheduled, but has no “Next Run” and is deemed not “Runnable”.

Job Activity Monitor

Taking a closer look at the job properties, we can further see that the job is enabled:

Job Properties - General_3

We also see that there really is a schedule for the job:

Job Properties - Schedules

And on further investigation I saw something that I have never really looked at before. The “Targets” properties:

There are no entries here, but more importantly, the radio button “Target local server” is not selected. This turned out to be the cause of the job not running!

If I try and exit the properties with “OK” (which you should never do unless you have intentionally changed a setting somewhere), then we are presented with a clear error message informing us about the missing Target:

Job Save Error

“You must specify the Target Servers on which this multi server job will execute.”

The “Targets” section of a SQL Agent Job is something that I have never delved into in any detail, but it is a long-standing feature that allows for administration of large server environments. You can read more about Multiserver Environment Administration on MSDN.
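
If you want to check whether any other jobs on an instance are missing a target server (and will therefore never run), a query along these lines against the msdb job tables should surface them (a simple sketch):

-- Sketch: find jobs that have no target server assigned in msdb
SELECT j.name AS job_name,
 j.enabled
FROM msdb.dbo.sysjobs j
LEFT JOIN msdb.dbo.sysjobservers js ON j.job_id = js.job_id
WHERE js.server_id IS NULL;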

The setting was missing because of an incomplete scripting process for the affected job. If you script out a SQL Agent Job in SQL Server Management Studio, you will see that one of the last steps in the script is a call to the system stored procedure “msdb.dbo.sp_add_jobserver”:

EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

This stored procedure associates a job with a job server, which in my experience has always been “(local)”. Obviously, this will be different if you are using master servers and target servers in your environment.

As soon as I had set the target server to local for the job, everything was back to normal and the job ran again according to its schedule.
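
For reference, the same fix can be applied in T-SQL rather than through the GUI. A minimal sketch, assuming the affected job is called “My Test Job” and should run on the local server:

-- Sketch: associate the job with the local server so the SQL Agent will run it again
EXEC msdb.dbo.sp_add_jobserver
 @job_name = N'My Test Job',
 @server_name = N'(local)';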

SQL Server 2012 SP2 – Making SIMPLE recovery simple again

Yesterday morning I blogged about how SQL Server 2012 RTM and SP1 are both flawed with regards to the model database and SIMPLE recovery model usage. The error is fixed in a Cumulative Update, but many customers cannot/do not want to install CUs as they are less tested than Service Packs.

In that post I mentioned that SQL Server 2012 SP2 would include the fix for the issue I mentioned, but there were no details on when it would be released.

It seems Microsoft read my blog, because they released SQL Server 2012 SP2 late last night/early this morning!

This rolls up an entire set of CUs that have been released since the last SP in November 2012. So go out, download the SP and install it on test machines and then, after testing, onto your production boxes.

SQL Server Consolidation – Resistance is Futile: Remote presentation for SQL Southwest, 21st November 2013

I am booked to speak at the SQL Southwest User Group in the UK on the 21st of November at some time from 18:00 (GMT) onwards.

This will be a new session on SQL Server consolidation titled “SQL Server Consolidation – Resistance is Futile!”. The session will cover the reasons for and against consolidation and hopefully help you to understand how consolidation can be performed in your environment. Due to the subject there won’t be many demos (if any) and it will be more of a high-level discussion.

The session will be run remotely using a yet to be decided medium (GoToMeeting, WebEx or something similar), which will be a first for me.  A suitable link or login details will be made available via the SQL Southwest website and I will tweet the details ahead of time too.  I have been informed that there will be some sort of question/answer capability which will also be communicated in advance and during the session.

I hope to see you there!

Reducing data footprints on test systems

Picture the scene:

You are a DBA. You have to make everyone happy, including DEVs (yes, including them too). Part of that is to ensure that your DEVs have a fresh copy of your production data so that they can test their programming with current data and structures. The issue with that is your production database is multiple hundreds of GB in size and your test system doesn’t have the capacity to store a full copy of your database. Your IT Budget will also not stretch to purchasing more storage! What can you do?

I had a similar situation recently and took a look at the possibilities; here is what I came up with:

1. Try out tools like SQL Virtual Restore to allow you to “mount” a backup from a remote server, avoiding the space issue altogether. This is, of course, a solution that requires a license (no budget maybe?)

2. Use the great features offered to you by SQL Server Developer Edition: data compression to the rescue!

So of course, I chose door number 2.

As you may know, Developer Edition offers the same features as Enterprise Edition, with the caveat that the instance cannot be used for production purposes in any way. This means that the seriously powerful and useful data compression feature becomes available for your test servers! This counts even if you only use Standard Edition in production – joy of joys! The good news is that Developer Edition only costs around $50 (or less if you have an MSDN subscription) – even more joys!

If you have never seen/used compression, take a quick look over on MSDN to see what it is all about (Data Compression). Basically, you can compress data in indexes and tables at the storage level, avoiding any query rewrites while still profiting from the storage savings. This can be a major advantage in terms of raw storage needs, but it can also benefit you in high-read environments with low RAM sizes. The data remains compressed when held in the buffer pool and is only decompressed when being accessed. This means that you can keep more data in the buffer pool and reduce hard disk I/O (obviously not for free: compressing and decompressing costs CPU cycles). This may be acceptable on a test system with extremely constrained storage space.
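
Applying compression to a single object is a one-liner per table or index. A minimal sketch with placeholder object names:

-- Sketch with placeholder names: page-compress a table and one of its non-clustered indexes
ALTER TABLE dbo.MyLargeTable REBUILD WITH (DATA_COMPRESSION = PAGE);

ALTER INDEX IX_MyLargeTable_SomeColumn ON dbo.MyLargeTable REBUILD WITH (DATA_COMPRESSION = PAGE);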

The usage in my example scenario is now an even better proposition: not only can I reduce my storage footprint, I can also potentially increase test system performance on the I/O side of the equation (who doesn’t have I/O issues, especially on a test box!).

The next hurdle is, of course, identifying which indexes and tables in the database you want to squeeze down. This is possible via SSMS’ Object Explorer, but only if you want to spend an eternity doing so! The best way is to take a look at the metadata tables/views/functions to interrogate the system objects. These are really interesting on their own (at least they are to me), but after writing my Index Script Creator, I realised the potential for using the information in these system objects to automate certain tasks like this one.

Similarly to the Index Script Creator, my Index Compressor Script (uploaded as a .doc, but it is really a .sql file – stupid WordPress restrictions!) runs off and investigates all tables and indexes (down to the partition level) and then generates an index rebuild command to compress any currently non-compressed partitions of indexes/tables. The resulting commands can then be run and space can be reclaimed – just be aware that rebuilding indexes does cause log activity.
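
To give an idea of the approach (this is a much-simplified sketch, not the actual Index Compressor Script), the system views can be queried to generate a rebuild command for every index that still has an uncompressed partition:

-- Much-simplified sketch: generate PAGE compression rebuilds for indexes with uncompressed partitions
SELECT DISTINCT 'ALTER INDEX ' + QUOTENAME(i.name) + ' ON '
 + QUOTENAME(SCHEMA_NAME(t.schema_id)) + '.' + QUOTENAME(t.name)
 + ' REBUILD WITH (DATA_COMPRESSION = PAGE);' AS RebuildCommand
FROM sys.tables t
INNER JOIN sys.indexes i ON i.object_id = t.object_id
INNER JOIN sys.partitions p ON p.object_id = i.object_id AND p.index_id = i.index_id
WHERE i.index_id > 0 -- clustered and non-clustered indexes (heaps need ALTER TABLE ... REBUILD)
 AND p.data_compression = 0; -- 0 = NONE, i.e. not yet compressed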

After compressing your tables and indexes you should have a much smaller amount of data (depending on how well your data compresses) and be able to reclaim some disk space.

I realise that this method will mean that you have to have a large enough disk to have the database in its uncompressed state to begin with, but you can end up with a cost effective solution to a budget-constrained test environment.

Happy zipping!

Good News / Bad News

Given the choice, most people want to hear the bad news before the good:

The Bad News

I recently found out that the owner of Atlantis Interactive has decided to throw in the towel with regards to his absolutely amazing SQL Server Tools and IDE.

Matt Whitfield has produced (on his own) a set of tools that puts a lot of full-blown companies to shame! His IDE, SQL Everywhere, is a complete replacement for SSMS, but boasts a number of improvements/extra features that seem blindingly obvious once you see them in SQL Everywhere. My favourites are “extract as CTE”, where you mark some code and allow SQL Everywhere to create a CTE for you, and the smart renaming of variables/aliases etc.

When I heard that he had made this decision, I could understand it, but still find it a shame. He was obviously in a hard market to start with, with some very firm competition. I am still surprised that the tools haven’t made any noticeable headway – there are loads of people looking for good tools, even in the niche of T-SQL development.

The Good News

Matt has not decided to throw away all that he created. He has unlocked all the tools and set them free! His blood, sweat and tears (well, not so much blood – I hope!) have not been wasted.

If you have had enough of the limitations of SSMS intellisense, or general SSMS strangeness, take a good look at SQL Everywhere. Whilst you are there, have a look at the other great tools he created, you will be surprised at what you find.

My hat goes off to him and I hope that this setback doesn’t stop him from giving it another go in the future.

@Matt – if you read this, I’m looking forward to meeting you at SQLBits. I would love to hear about any other ideas you have on the back burner 🙂