Tag Archives: administration

You didn’t know that?! But that’s so obvious!

Rant-mode activated

Today I noticed a tweet from Andrew Prewski (@DBAFromTheCold)

Loving all the passive aggressive comments about my article on SQL on linux, keep ’em coming!

Andrew’s tweet piqued my interest and I went to look at his blog-post “Killing databases in SQL Server on Linux“. In the post, Andrew investigates how files are handled inside Linux, in particular how database files are handled. Go and read it, it is really interesting and a possible eye-opener for anyone who has no experience with Linux.

Now, the reason for this rant is the comments that came through from people who do have experience with Linux/Unix (go and read those too). Many of them boil down to basically scoffing at how Andrew didn’t know that Linux handles files differently to Windows and has done for years/decades. The tone of these comments were aimed to belittle Andrew for not knowing about this behaviour.

This pisses me off!

Maybe I’m just having a bad day or something, but it seems wrong to me that someone who is trying to learn should be made to feel in any way stupid for exploring something. It seems even wronger for this sort of behaviour to come from an area of the IT world which generally prides itself with the idea of “hacking”. The whole premise of taking a technology that you don’t understand and prodding, poking, dissembling and rebuilding – that is the “way of the hacker”, or at least how I have always understood it.

I for one hope that Andrew won’t be deterred by this shitty behaviour and also would hope that anyone reading this would think twice about how they respond to a similar situation. Everyone starts out at zero, not even Linus Torvalds was born being able to code.

Please remember: Don’t be condescending. Offer tips, advice or references to help, not ridicule.

Who blew up my TempDB?

Who blew up my TempDB?

Who hasn’t experienced a server all of a sudden running out of TempDB space and seeing the following error message?

Could not allocate a new page for database 'tempdb' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

The difficulty about investigating a TempDB issue like this is that in general the solution chosen to fix a full TempDB is to restart the SQL Server instance. Seeing as how the instance has quite often locked up completely and the customer wants their server to work again ASAP, then a restart is almost inevitable.

A restart is an option to combat the symptom, but not the cause. It also (in)conveniently removes the evidence of the transaction that caused TempDB to fill up in the first place.

To see how to deal with TempDB (or other databases) filling up/filling up the disk it is on please take a look at this blogpost.

Returning to the question at hand: Who filled my TempDB?

One solution I looked into, and have implemented for this exact reason, uses Extended Events.

Extended Events is an event processing framework built inside SQL Server since SQL 2008. It allows you to run very lightweight background monitoring of system events, and in this case can capture information about which command/transaction caused a file growth event to occur.

For a full introduction to Extended Events, please take a look at the 31 Days of Extended Events series from Jonathan Kehayias ( b | t ).

Catch the culprit

To be able to catch the offending process/user, we will create an Extended Events session on the server and the following code does just that:

CREATE EVENT SESSION [Database_Growth_Watchdog] ON SERVER
ADD EVENT sqlserver.database_file_size_change (
 ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_name, sqlserver.session_nt_username, sqlserver.sql_text )
 WHERE ( [database_id] = ( 2 ) ) -- We filter on database_id=2 to get TempDB growth only
)
ADD TARGET package0.event_file ( SET filename = 'D:\Temp\Database_Growth_Wathdog.xel',
 max_file_size = ( 10 ) )
WITH ( MAX_MEMORY = 4096 KB,
 EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
 MAX_DISPATCH_LATENCY = 1 SECONDS,
 MAX_EVENT_SIZE = 0 KB,
 MEMORY_PARTITION_MODE = NONE,
 TRACK_CAUSALITY = OFF,
 STARTUP_STATE = ON )
GO
ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER STATE = START

Some things to note about the code:

  • We are monitoring for data file and log file growth. The event sqlserver.database_file_size_change fires for any file size change. We do this because we want to be informed of any and all file growth just to make sure we don’t miss anything.
  • If you have multiple data files for TempDB (like you may for a multi-core environment) you will see one event fire for each file that is growing. E.g. You have 4 data files and the database grows, you will see 4 entries in the extended events output.
  • The session is set to flush events to the output file in 1 second intervals (MAX_DISPATCH_LATENCY). This is done to ensure we lose as few entries to the output file as possible. If TempDB fills up, the entire instance can often stop working completely. We want to catch as much information as possible before this happens, so we flush to the output file in very short intervals.
  • We start the session at instance startup (STARTUP_STATE). This ensures we have the event active immediately on server startup. As this is monitoring file growth events, we should remain very lightweight so not have to worry about swamping our system with extended events processing.
  • We limit the output file to 10MB and allow it to deliver to a total of 5 files. This means we have to have 50MB for the files in the output directory and won’t spam the folder with too much data.
  • When the event fires, we collect information about the query/command that caused the file growth to occur. This includes NT username, Hostname, origin database of the query, command text and application name.

The information collected by this session can be vital in pinpointing the cause for the TempDB filling up. However, there is the possibility of false positives in this setup. TempDB may have been almost completely filled by another previous transaction and the transaction causing the growth event is an innocent bystander. This is an unavoidable situation, but needs to be kept in mind when analysing the data. If you don’t catch the exact transaction this way, you are still on the right track.

Analyse the data

Once the data has been collected, we need to load and parse the output files to make sense of what has happened. The following code will parse the xml output that is in the (up to) 5 files.

DECLARE @TraceFileLocation NVARCHAR(255)= N'D:\Temp\Database_Growth_Watchdog*.xel';
WITH FileSizeChangedEvent
 AS (
 SELECT object_name Event,
 CONVERT(XML, event_data) Data
 FROM sys.fn_xe_file_target_read_file(@TraceFileLocation, NULL, NULL, NULL)
 )
 SELECT Data.value('(/event/@timestamp)[1]', 'DATETIME') EventTime,
 Data.value('(/event/data/value)[7]', 'BIGINT') GrowthInKB,
 Data.value('(/event/action/value)[2]', 'VARCHAR(MAX)') ClientUsername,
 Data.value('(/event/action/value)[4]', 'VARCHAR(MAX)') ClientHostname,
 Data.value('(/event/action/value)[5]', 'VARCHAR(MAX)') ClientAppName,
 Data.value('(/event/action/value)[3]', 'VARCHAR(MAX)') ClientAppDBName,
 Data.value('(/event/action/value)[1]', 'VARCHAR(MAX)') SQLCommandText,
 Data.value('(/event/data/value)[1]', 'BIGINT') SystemDuration,
 Data.value('(/event/data/value)[2]', 'BIGINT') SystemDatabaseId,
 Data.value('(/event/data/value)[8]', 'VARCHAR(MAX)') SystemDatabaseFileName,
 Data.value('(/event/data/text)[1]', 'VARCHAR(MAX)') SystemDatabaseFileType,
 Data.value('(/event/data/value)[5]', 'VARCHAR(MAX)') SystemIsAutomaticGrowth,
 Data
 FROM FileSizeChangedEvent;

Please take note of the variable @TraceFileLocation. The example uses a wildcard to allow loading/parsing of multiple files, this is particularly useful if you really do rollover into multiple files.

The results from the query provide a range of information regarding to who the offending process was, what command they had submitted last and a set of information about the client.

So, now we have a starting point to discover who/what is causing TempDB to grow and can discuss this with application owners.

Other ideas

It is extremely simple to to extend the session to monitor all databases and run a separate monitoring solution to inform you of such growth events. In fact, I saw Gianluca Satori ( b | t ) talk about streaming Extended Event data and processing them in near real time at his SQLSat Oslo session. I am certain that this would be a fantastic way of doing some sort of real time monitoring of growth events.

Some homework for you: What could you use this Extended Events session for? How could you modify/improve on it to help with your system monitoring and administration? Let us know in the comments.

Full disks and databases

How do you deal with a full hard disk/volume/partition?

I have recently had a number of discussions about dealing with hard drives or databases themselves filling up on SQL Servers. There seems to be a few different opinions on how to deal with this and I’d like to share those here and give my preferred solution.

N.B. (For brevity, I will refer to disks, but mean either disk, volume or partition where applicable)

The main error that people are confronted with when either a disk or a database file has filled to the brim is something like this:

Could not allocate a new page for database 'DatabaseName' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

There are a few reasons this could happen (not an exhaustive list):

  1. The database file is full but the disk holding the database file does not have enough room to accommodate the file growth required for auto-growth to occur.
  2. The database file is full but the disk holding the database file has a quota activated that has been reached, so does not have enough room to accommodate the file growth required for auto-growth to occur.
  3. The database file is full but database does not have auto-growth setup.

The problem with the first two situations I mentioned is that they are generally not easy/quick situations to fix. If a disk is full, then extending that disk may not be possible, or only possible with a lot of administrative overhead. The same applies for disk quotas, even more so with quotas, as quotas usually have some sort of “internal politics” attached to them.

The third reason occurs surprisingly often. For some reason, certain companies just don’t want their databases to grow automatically. They want to keep everything under human control, so they turn off all sorts of automatic support options like auto-growth. I also heard a customer mention that they preferred the data files to be static in size and it was easier for them to extend these than the disk (again confirming my point about disk/partition/volume extensions being difficult).

Regardless of the reasoning for databases/database files actually filling up and the error occurring, what can be done in an attempt to help prevent this from happening, or at least allow for a speedier resolution of the issue?

Dummy/Buffer files

If you have a volume size limitation and need to ensure that you can quickly arrange for more disk space very quickly, one option is to use “dummy” or “buffer” files. These are files that are of a certain size, but have no actual content. They are created on the disk that you want to have “insurance” for and kept around until disk space is either low (a pre-defined threshold) or the disk is actually full. When either of these situations occurs, you simply delete one or more of these “dummy” files and you immediately have more disk space.

To create these files you can use fsutil a great little tool that is bundled with windows and allows sysadmins to do a number of things with disks. You call fsutil from the command line and provide it with the correct parameters and an “empty” file will be created.

Syntax of fsutil (length is in bytes)
fsutil file createnew <filename> <length>

Example to create a single 50MB file
fsutil file createnew d:\temp\dummyfile 52428800

dummy_file

Once the file(s) has been created you have a buffer of space so that you can quickly react to a disk filling up (even with some sort of scripted automation). Of course, with the consideration that the disk now has less space available for “real” data, meaning a full disk can actually occur quicker.

Using Max Size Limits for a Database

Among the settings that a database can have, you can control in what size “steps” a database should grow by if auto-growth is activated. You can also set a maximum size for the database file(s) to grow to.  This is similar to the idea of a disk quota, but on a database file level. Setting this limit can be useful to prevent a database from growing so large that it will completely fill a disk. I actually prefer the idea of setting an upper limit on the database file(s) rather than allowing unlimited growth. The reason for my preference is down to the fact that it is easier to change the maximum growth limitation than it is to extend a disk.

Setting the maximum size for a database is a simple operation, either in SSMS via the GUI:

database_autogrowth

Or via T-SQL:

ALTER DATABASE [sandbox] MODIFY FILE ( NAME = N'sandbox', MAXSIZE = 102400KB )

With this soft-limit for a database file, you are able to control file growth with greater precision. The main benefit is, that you can increase the maxsize at a moment’s notice and allow normal operations to continue if a database file should accidentally/suddenly fill up. This is not an ideal solution, but still improves control and allows normal operations to resume.

I like dummies!

I prefer the dummy file solution out of the two that I have explained above. Using a dummy file allows you to have auto-growth turned on (to cover those immediate growth requirements), while stilling offering you the chance to also avoid a disk from filling up completely through an auto-growth event.

Using the maxsize solution is ok, so long as you are able to connect to the instance to extend the maxsize. This is often possible for user databases, but what about TempDB? I have experienced systems where the system has locked up because TempDB is full and attempts to connect were fruitless (Remote DAC was not activated and no local access to the SQL instance), so it was not possible to extend TempDB.

Having a few dummy files around would allow me to quickly supply TempDB with the required disk space for an auto-growth and for me to connect to the instance again and take charge of the situation. The other alternative (which customers often chose in the past), was to reboot the server / restart the instance. The issue with this being, the culprit for the “crash” is no longer identifiable, unless you have a solution implemented to do just that (which I will blog about soon!).

Conclusion

In my experience most installations / customers do not have a solution implemented to counteract a disk/database filling up. The two solutions I have explained are good starting points to help avoid catastrophe in future. Please take these ideas and see if you can improve the stability and availability of your systems using them, or even better, provide your ideas/solutions in the remarks of this blogpost.

Thanks for reading and have fun with your diskspace!

The case of the non-starting SQL Agent Job

I ran into an interesting situation recently that involved a SQL Agent job that would not/could not run. Looking at the job settings everything seemed fine, yet the SQL Agent deemed the job “un-runnable”.

As we can see here “My Test Job” is enabled, and scheduled, but has no “Next Run” and is deemed not “Runnable”.

Job Activity Monitor

Taking a closer look at the job properties, we can further see that the job is enabled:

Job Properties - General_3

We also see that there really is a schedule for the job:

Job Properties - Schedules

And on further investigation I saw something that I have never really looked at before. The “Targets” properties:

There are no entries here, but more importantly, the radio button “Target local server” is not selected. This turned out to be the cause of the job not running!

If I try and exit the properties with “OK” (which you should never do unless you have intentionally changed a setting somewhere), then we are presented with a clear error message informing us about the missing Target:

Job Save Error

“You must specify the Target Servers on which this multi server job will execute.”

The “Targets” section of a SQL Agent Job is something that I have never delved into in any detail, but is a long standing feature to allow for administration of large server environments. You can read more into Multiserver Environment Administration on MSDN.

The reason for the setting being missing was through an incomplete scripting process for the affected job. If you script out a SQL Agent Job in SQL Server Management Studio, you will see that one of the last steps in the script is a system stored procedure “msdb.dbo.sp_add_jobserver”:

EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

This stored procedure associates a job to a jobserver, which in my experience has always been “(local)”. Obviously, this will be different if you are using master servers and target servers in your environment.

As soon as I had set the target server to local for the job, everything was back to normal and the job ran again according to it’s schedule.

Beitrag bei “Informatik Aktuell” zu AlwaysOn

Am 16.06. ist mein Artikel zu SQL Server AlwaysOn bei Informatik Aktuell erschienen. Dort habe ich versucht zu erklären wie die Technologien von Microsoft entwickelt wurden und wie sie eingesetzt werden können. Schaut euch das mal an, und wenn es Fragen oder Anmerkungen gibt, bitte einfach eine Nachricht an mich schicken.

 


I wrote an article explaining about the high availabilty features within SQL Server are, which was published on the 16.06 on the German language website “Informatik Aktuell”. For those of you that speak German, please take a look and let me know what you think.

SQLSaturday Exeter – Third time a charm!

sqlsat372_web

SQLSaturday Exeter

SQLSaturday Exeter 2015 – I’m there!

I have the great pleasure of announcing that I will be presenting at SQLSaturday Exeter 2015 in April 2015.

SQLSaturday Exeter is an extra special event for me. Not only is it a fantastically well run event, it is also the first conference that I ever presented at, way back in 2013. I had never spoken to a larger audience before and was a little nervous of speaking. However, once my session was over, I had found a new area for me to improve on and now love presenting on SQL Server. Since then, I have spoken at a variety of conferences and user groups around Europe and (hopefully) improved my presentation skills.

This time around, I have been asked to present a new topic: “Stories from the Trenches: Upgrading SQL with Minimal Downtime“, where I plan on showing how to use the high availability features of SQL Server to help perform upgrades whilst keeping downtime to a minimum.

But wait, there’s more!

As is the tradition of the organisers of SQLSaturday Exeter, they like to mix things up a bit and also try new speakers out (hence my chance back in 2013). This has lead them to also choose me to present a pre-conference training day session!

Check out the full list of speakers/sessions in this video created by the SQLSaturday Exeter team:

 

Being chosen for a pre-conference training day is another first for me and I am excited/scared/humbled/ecstatic to have been chosen to do this – many thanks to the SQLSaturday Exeter organisers, you brave fools!! 🙂

I will be presenting “SQL Server: An Introduction” and as the title suggests, this will be an introductory day, covering a set of basic information on SQL server. It is designed for accidental/occasional DBAs that have been using SQL Server for <12 months.

You can find a full description/abstract for my training day here: SQL Server: An Introduction

The training days are full day training sessions on one topic, priced at £150 per attendee (early bird rate).

The full list of available sessions is here: SQLSaturday Exeter Training Day – if you don’t fancy my session, there are 7 other sessions to choose from. I dare you to not find a session that could be useful to you! 🙂

See you in April!

I’m off to prepare my sessions and look forward to seeing you in Exeter in April.

Speaking at SQL Server Days 2014 in Belgium

I’m Speaking at SQL Server Days 2014 in Belgium

Back in the spring I met two Belgian guys when I spoke at SQL Saturday Exeter, those guys were Pieter Vanhove ( b | t ) and Steve Verschaeve ( b | t ) and they told me about a conference that they are involved in: SQL Server Days

This annual conference is one of the larger SQL Server conferences in Europe, with over 300 attendees and has a whole host of world class speakers providing full day pre-con sessions and the standard 60 minute sessions.

Over a beer (or three) Pieter and Steve convinced me that this conference is one of the “must attend” conferences, so when their call for speakers was announced I submitted a couple of sessions.

A few weeks ago, the first (pre-con) speakers were announced. I saw the names and realised that Pieter and Steve meant business – the pre-con line-up is extremely good. Some of the biggest names in the SQL Server community are in attendance and there are subjects to account for anyone’s taste.

After seeing the pre-con speaker names I wasn’t going to hold my breath on being chosen to speak. However, last night I got the confirmation that my session “Consolidation – Resistence is Futile” has made the list!

I look forward to visiting Belgium for the first time (I’ve driven through Belgium often enough, but never stopped) and have been promised that Belgian beer is the best in the world…… we’ll see……

So if you are thinking of attending, make sure to book soon, The early bird offer (expires 15.07) will give you a discount and the site says that the first 100 registrations are eligible for a buy two get one free offer.

So don’t wait, get registered now and I’ll see you there at the end of September.