Introducing Open Query Store

Many of us may have heard about the amazing new feature in SQL Server 2016 called Query Store. However, there are a lot of SQL Servers out there that are not running 2016, nor will they ever be upgraded to it.

What is Open Query Store?

Open Query Store is our attempt at copying the capabilities of the Query Store, but for SQL 2005 – 2014. As the name suggests, this is an open source project with the MIT license, meaning you can use it for fun or profit (or both).

After a few meetings (read: SQL Server events and beers), Enrico van de Laar ( b | t ) and I got the idea of creating a Query Store for the rest of us!

This quickly became what is now known as Open Query Store.

The first release (Updated release 2017-07-03) was published at the end of June 2017 and provides a background collection of query execution statistics in a database of your choice. The v1.0 release supports SQL Server from 2008 to 2014 and all editions (Express, Standard, Enterprise). There is also a PowerShell installer for those that are so inclined, which will grab the code from GitHub and install OQS into a specified database.

There is also a custom report which can be installed in SSMS (2016 and 2017), which will display information that OQS has collected.

OQS_Report

What is the Future of Open Query Store?

The future of OQS is to get as close to feature parity with “real” Query Store as possible. We’re not sure yet exactly how close that will be, but we’ll do our best!

We have an overview of the current features on GitHub and will be adding features as time goes by.

If you have a specific feature that you want to add, then please provide feedback/suggestions via the Issues tab in GitHub.

Thanks for reading and we hope OQS can help you with query tuning in the future.

Presenting: Presentation Mode!

As a presenter at events I am constantly trying to improve the experience of showing information in slides and transitioning back and forth to demos.

ZoomIt: An OK solution for a bad problem?

The most challenging aspect of this is making sure that demo code is visible to the audience. The fantastic ZoomIt allows a presenter to (surprise, surprise) zoom into portions of the screen and highlight/annotate code or information for the audience:

ZoomIt

ZoomIt is not without its problems, though. First of all, the act of zooming can be disorienting to the audience. There is a flurry of zoom and scrolling activity to get to where you want to be on the screen. After this, the actual presentation of the zoomed content usually works nicely enough. However, the zoom out must occur before moving back into the PowerPoint slide deck to continue with the next portion of the presentation.

This has been the only way to give a consistent and clear overview to an audience, particularly when SSMS was being used for demos. The issue revolves around the fact that although the T-SQL code editor window can resize fonts, the remainder of the SSMS interface is set in a single font type and size.

Many of you may have noticed that Microsoft made a key change in their deployment strategy with regards to SSMS when SQL Server 2016 was released. SSMS was decoupled from the core engine components and follows a much shorter release cycle. Since SQL Server 2016 was released to market, there have been at least 6 versions of SSMS released. This is fantastic: we no longer have to wait for the next release of SQL Server (whether a full version or a service pack) for SSMS to get bug-fixes or feature additions.

This is now extremely important when we look at the issue around font sizes and types. Microsoft has paid attention and with their current Release Candidate (RC) for SSMS 17 they included a very important release note entry…..

Presentation Mode!

If we read the release notes, we see that there are three new tasks available via Quick Launch inside SSMS.

  • PresentEdit – This allows the fonts for presentation mode to be set
  • PresentOn – This turns on presentation mode
  • RestoreDefaultFonts – Reverts SSMS back to the default fonts after activating presentation mode

All three tasks are pretty easy to understand, although the final task highlights that a task to specifically turn off the presentation mode is currently missing (this is an RC after all).

The “Quick Launch” field can be found in the top right corner of SSMS 17.0 RC3 and begins searching as soon as we start to type in it:

Present

Choosing “PresentEdit” opens an XML file in a new tab in SSMS, showing us the options that we can change to make SSMS look different when presentation mode is activated.

PresentEdit

We are presented with the option to choose font family and font size for both the text editor and, more importantly, for the environment in general (menus, object explorer etc.). This is where we can play around and find the fonts that work best in our presentations.

Using the values in my screenshot and launching PresentOn made a huge difference in font readability inside SSMS. The image below shows SSMS on the left in “standard” mode and in presentation mode on the right.

PresentOnSize

The difference is quite clear: all environment fonts are much larger and easier to read in presentation mode. This is great for demoing SSMS during a presentation!

However, the biggest improvement is when we are querying data. In previous versions of SSMS the grid results were tiny when projected onto a wall. The only ways to make the results readable were to return them as text (which has the downside of running off the right-hand side of the screen for larger result sets), or to use ZoomIt and give people motion sickness.

Now, with presentation mode on, the results grid is included in the font resizing:

PresentOnGridResultsSize

Praise be to the spaghetti monster! No more motion sickness required and attendees can concentrate their contempt at all the bullet points in the slide deck instead.

So if you are a presenter, or want to have more control over the fonts in SSMS, your wait is almost over…… or is over now if you are brave enough to install the RC of SSMS 17 🙂

Happy font-changing

Database objects can DIE in SQL 2016 without using the KILL command

When developing database code for SQL Server one issue that has got in the way of smooth deployment has been a simple check for the existence of an object before creation.

The issue being, whenever an object should be created, we need to be sure that there isn’t an object already in the database that has the same name. If there is, we need to do something beforehand otherwise the CREATE command is going to fail like this:

Msg 2714, Level 16, State 6, Line 16
There is already an object named {objectname} in the database.

DIE Database Object, DIE!

Up until SQL Server 2016 the only way of making sure an object wasn’t already present was by adding a snippet of code directly before the CREATE statement. This could be achieved in multiple ways, but essentially all solutions just checked for an object with the name provided.

Something along the lines of this:

IF OBJECT_ID('MyTable') IS NOT NULL
DROP TABLE MyTable

While that is a very short piece of code, it is still code that needs to be typed, checked and tested. Of special note is the fact that the object name needs to be correctly typed twice. Let us also not forget that, because there are multiple ways of achieving this existence check, a new developer may not immediately understand what any given variant is doing.
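Just to illustrate that variety, here is a sketch of another common pre-2016 pattern that checks the system catalog directly (the schema and table names here are just placeholders):

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'MyTable' AND schema_id = SCHEMA_ID('dbo'))
 DROP TABLE dbo.MyTable;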

You may have noticed that a couple of paragraphs ago I wrote “Up until SQL Server 2016….”. That is because, with the release of SQL Server 2016, Microsoft has made our lives a little bit easier in this respect. It is now possible to do the existence check and the drop in one command rather than two.

The long-winded and more error prone example above simply becomes:

DROP TABLE IF EXISTS MyTable

Wow! Drop Table If Exists, or DIE for short. Super-short (only one chance of mistyping the object name) and super easy for anyone to understand, right?

But wait! There’s more!

This existence check is not limited to tables. It covers many more objects, as outlined on the SQL Server Database Engine Blog.
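For example, a quick sketch with placeholder object names (all of these work on SQL Server 2016):

DROP PROCEDURE IF EXISTS dbo.MyProcedure;
DROP VIEW IF EXISTS dbo.MyView;
DROP FUNCTION IF EXISTS dbo.MyFunction;
DROP INDEX IF EXISTS IX_MyIndex ON dbo.MyTable;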

But wait! There’s even more!

If you want to be really efficient and only write the (not really) verbose DROP TABLE IF EXISTS once for all the tables that you want to drop, you can!

This code will work flawlessly:

DROP TABLE IF EXISTS MyTable, MyOtherTable, YetAnotherTable

The elegance of DROP TABLE IF EXISTS is that if any (or all) of the three tables above exist, they will be deleted. If none of them exist, then nothing will happen.

And here is the catch

Unfortunately, the pluralised usage of DROP TABLE IF EXISTS doesn’t seem to work for all object types. I tried to do the same thing with database users:

DROP USER IF EXISTS User1, User2

This would end in a syntax error.

So there we have it. Objects can now DIE inside SQL Server, without the KILL command 🙂

 

At the time of writing I have not checked all of the supported objects of DIE, but will update this post once I have found the time to do so.

Who blew up my TempDB?

Who hasn’t experienced a server all of a sudden running out of TempDB space and seeing the following error message?

Could not allocate a new page for database 'tempdb' because of insufficient disk space in filegroup 'PRIMARY'. Create the necessary space by dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.

The difficulty with investigating a TempDB issue like this is that, in general, the solution chosen to fix a full TempDB is to restart the SQL Server instance. Seeing as the instance has quite often locked up completely and the customer wants their server to work again ASAP, a restart is almost inevitable.

A restart is an option to combat the symptom, but not the cause. It also (in)conveniently removes the evidence of the transaction that caused TempDB to fill up in the first place.

To see how to deal with TempDB (or other databases) filling up/filling up the disk it is on please take a look at this blogpost.

Returning to the question at hand: Who filled my TempDB?

One solution I looked into, and have implemented for this exact reason, uses Extended Events.

Extended Events is an event-processing framework that has been built into SQL Server since SQL Server 2008. It allows you to run very lightweight background monitoring of system events, and in this case can capture information about which command/transaction caused a file growth event to occur.

For a full introduction to Extended Events, please take a look at the 31 Days of Extended Events series from Jonathan Kehayias ( b | t ).

Catch the culprit

To be able to catch the offending process/user, we will create an Extended Events session on the server and the following code does just that:

CREATE EVENT SESSION [Database_Growth_Watchdog] ON SERVER
ADD EVENT sqlserver.database_file_size_change (
 ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_name, sqlserver.session_nt_username, sqlserver.sql_text )
 WHERE ( [database_id] = ( 2 ) ) -- We filter on database_id=2 to get TempDB growth only
)
ADD TARGET package0.event_file ( SET filename = 'D:\Temp\Database_Growth_Watchdog.xel',
 max_file_size = ( 10 ), -- 10MB per file
 max_rollover_files = ( 5 ) ) -- at most 5 rollover files (this is also the default)
WITH ( MAX_MEMORY = 4096 KB,
 EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
 MAX_DISPATCH_LATENCY = 1 SECONDS,
 MAX_EVENT_SIZE = 0 KB,
 MEMORY_PARTITION_MODE = NONE,
 TRACK_CAUSALITY = OFF,
 STARTUP_STATE = ON )
GO
ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER STATE = START

Some things to note about the code:

  • We are monitoring for data file and log file growth. The event sqlserver.database_file_size_change fires for any file size change. We do this because we want to be informed of any and all file growth just to make sure we don’t miss anything.
  • If you have multiple data files for TempDB (like you may have for a multi-core environment) you will see one event fire for each file that is growing. For example, if you have 4 data files and the database grows, you will see 4 entries in the Extended Events output.
  • The session is set to flush events to the output file in 1 second intervals (MAX_DISPATCH_LATENCY). This is done to ensure we lose as few entries to the output file as possible. If TempDB fills up, the entire instance can often stop working completely. We want to catch as much information as possible before this happens, so we flush to the output file in very short intervals.
  • We start the session at instance startup (STARTUP_STATE). This ensures the session is active immediately on server startup. As it is only monitoring file growth events, it remains very lightweight, so we don’t have to worry about swamping the system with Extended Events processing.
  • We limit each output file to 10MB and allow rollover to a total of 5 files. This means we need at most 50MB for the files in the output directory and won’t spam the folder with too much data.
  • When the event fires, we collect information about the query/command that caused the file growth to occur. This includes NT username, Hostname, origin database of the query, command text and application name.

The information collected by this session can be vital in pinpointing the cause for the TempDB filling up. However, there is the possibility of false positives in this setup. TempDB may have been almost completely filled by another previous transaction and the transaction causing the growth event is an innocent bystander. This is an unavoidable situation, but needs to be kept in mind when analysing the data. If you don’t catch the exact transaction this way, you are still on the right track.

Analyse the data

Once the data has been collected, we need to load and parse the output files to make sense of what has happened. The following code will parse the XML output that is in the (up to) 5 files.

DECLARE @TraceFileLocation NVARCHAR(255) = N'D:\Temp\Database_Growth_Watchdog*.xel';

WITH FileSizeChangedEvent
AS (
    SELECT object_name Event,
           CONVERT(XML, event_data) Data
    FROM sys.fn_xe_file_target_read_file(@TraceFileLocation, NULL, NULL, NULL)
)
SELECT Data.value('(/event/@timestamp)[1]', 'DATETIME') EventTime,
       Data.value('(/event/data/value)[7]', 'BIGINT') GrowthInKB,
       Data.value('(/event/action/value)[2]', 'VARCHAR(MAX)') ClientUsername,
       Data.value('(/event/action/value)[4]', 'VARCHAR(MAX)') ClientHostname,
       Data.value('(/event/action/value)[5]', 'VARCHAR(MAX)') ClientAppName,
       Data.value('(/event/action/value)[3]', 'VARCHAR(MAX)') ClientAppDBName,
       Data.value('(/event/action/value)[1]', 'VARCHAR(MAX)') SQLCommandText,
       Data.value('(/event/data/value)[1]', 'BIGINT') SystemDuration,
       Data.value('(/event/data/value)[2]', 'BIGINT') SystemDatabaseId,
       Data.value('(/event/data/value)[8]', 'VARCHAR(MAX)') SystemDatabaseFileName,
       Data.value('(/event/data/text)[1]', 'VARCHAR(MAX)') SystemDatabaseFileType,
       Data.value('(/event/data/value)[5]', 'VARCHAR(MAX)') SystemIsAutomaticGrowth,
       Data
FROM FileSizeChangedEvent;

Please take note of the variable @TraceFileLocation. The example uses a wildcard to allow loading/parsing of multiple files, which is particularly useful if you really do roll over into multiple files.

The results from the query provide a range of information about who the offending process was, what command they had submitted last, and a set of information about the client.

So, now we have a starting point to discover who/what is causing TempDB to grow and can discuss this with application owners.

Other ideas

It is extremely simple to extend the session to monitor all databases and run a separate monitoring solution to inform you of such growth events. In fact, I saw Gianluca Sartori ( b | t ) talk about streaming Extended Event data and processing it in near real time at his SQLSat Oslo session. I am certain that this would be a fantastic way of doing some sort of real-time monitoring of growth events.
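As a rough sketch of such a modification (assuming the watchdog session created above is in place), dropping the database_id predicate is all it takes to capture growth events for every database on the instance:

ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER
DROP EVENT sqlserver.database_file_size_change
GO
ALTER EVENT SESSION [Database_Growth_Watchdog] ON SERVER
ADD EVENT sqlserver.database_file_size_change (
 ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.database_name, sqlserver.session_nt_username, sqlserver.sql_text )
 -- No WHERE clause: growth events are now captured for all databases
)
GO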

Some homework for you: What could you use this Extended Events session for? How could you modify/improve on it to help with your system monitoring and administration? Let us know in the comments.

The case of the non-starting SQL Agent Job

I ran into an interesting situation recently that involved a SQL Agent job that would not/could not run. Looking at the job settings everything seemed fine, yet the SQL Agent deemed the job “un-runnable”.

As we can see here “My Test Job” is enabled, and scheduled, but has no “Next Run” and is deemed not “Runnable”.

Job Activity Monitor

Taking a closer look at the job properties, we can further see that the job is enabled:

Job Properties - General_3

We also see that there really is a schedule for the job:

Job Properties - Schedules

And on further investigation I saw something that I have never really looked at before. The “Targets” properties:

There are no entries here, but more importantly, the radio button “Target local server” is not selected. This turned out to be the cause of the job not running!

If I try to exit the properties with “OK” (which you should never do unless you have intentionally changed a setting somewhere), then we are presented with a clear error message informing us about the missing target:

Job Save Error

“You must specify the Target Servers on which this multi server job will execute.”

The “Targets” section of a SQL Agent Job is something that I have never delved into in any detail, but it is a long-standing feature that allows for the administration of large server environments. You can read more about Multiserver Environment Administration on MSDN.

The reason the setting was missing was an incomplete scripting process for the affected job. If you script out a SQL Agent Job in SQL Server Management Studio, you will see that one of the last steps in the script is a call to the system stored procedure “msdb.dbo.sp_add_jobserver”:

EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

This stored procedure associates a job with a job server, which in my experience has always been “(local)”. Obviously, this will be different if you are using master servers and target servers in your environment.
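If you hit the same situation, re-creating that association for the local server is all that is needed. A minimal sketch, using the job name from the screenshots above:

EXEC msdb.dbo.sp_add_jobserver @job_name = N'My Test Job', @server_name = N'(local)'
GO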

As soon as I had set the target server to local for the job, everything was back to normal and the job ran again according to its schedule.

SQL Server Simple Recovery Model Requires Log Backups

“SQL Server Simple Recovery Model never requires a log backup!!!” I hear you say. Well sit back, relax and let me show you that it does…… sometimes…… although it shouldn’t!

Issue Description

I ran into a confusing situation yesterday which has a published solution, but I had never heard about it. It is one of those things that should stick in your mind considering how much havoc this may cause.

I am talking about what happens if you set up your model database so that new databases are created the way you want them. If you stray from the SQL Server defaults, then this bug might catch you out. The issue I am talking about is the recovery model of the model database. By default SQL Server has the model database set to FULL recovery and with (in my opinion) vastly outdated initial size and auto-growth settings.

My department’s policy is to set the model database to SIMPLE and to set initial size and auto-growth to something more sensible than 1MB and 10% (for argument’s sake 128MB initial size and 128MB growth).

With SQL Server 2012 (RTM and SP1), setting model to SIMPLE looks fine on the outside, but under the covers that is not what happens. A new database will be created from model with SIMPLE recovery, and you won’t immediately notice any issues. However, as soon as you start running DML you can see that something isn’t quite right by interrogating sys.databases and looking at the log_reuse_wait_desc column for the database.

You can tag along for the ride by running the following code on a TEST!!! 2012 RTM or SP1 instance

SET NOCOUNT ON
GO

USE master
GO

IF EXISTS (SELECT * FROM sys.databases AS D WHERE name ='MyTestDB')
 DROP DATABASE MyTestDB
GO

ALTER DATABASE model SET RECOVERY SIMPLE
GO

-- Create a test database using defaults from model
CREATE DATABASE MyTestDB
GO

-- Verify that the database is in SIMPLE recovery model and log reuse is 'NOTHING'
SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';

-- Create a table in the test db and add some data (creating log file entries)
USE MyTestDB
GO
CREATE TABLE dbo.TestTable
 (Col1 int NOT NULL IDENTITY (1,1),
 Col2 char(256) NOT NULL
 DEFAULT REPLICATE('a', 256))
GO

INSERT INTO dbo.TestTable
 (Col2)
VALUES (default)
GO 2000

-- Now look at the log reuse information again - all is fine......
SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';
GO

-- Perform a backup of the database, because that is what we DBAs do regularly (here to nul so we don't fill up disk space)
BACKUP DATABASE MyTestDB TO DISK = 'nul'
GO

-- Add yet more test data
INSERT INTO dbo.TestTable
 (Col2)
VALUES (default)
GO 2000

-- Now look at the log reuse information again - oh look..... apparently we need a log backup!!!
SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';
GO

As you can see from running that code block, the log reuse description is now ‘LOG_BACKUP’, which should not be possible for a database in the SIMPLE recovery model! In fact, you can run a log backup if you like too; it will run successfully (as we will see below), which should not even be allowed for a SIMPLE recovery database!! So if you don’t do something about this, you will either fill your log file (if it isn’t set to auto-grow), or you will fill your disk drive where your log file resides. How fast this will happen depends solely on what activity is going on in that particular database. Your monitoring processes for drive space will warn you, but if the growth is fast enough, even these early warning systems may not be enough.
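If you want a quick way to check whether any existing databases on an instance are already affected, a simple sketch against sys.databases does the job (a database in SIMPLE recovery should never be waiting on a log backup):

SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE recovery_model_desc = 'SIMPLE'
 AND log_reuse_wait_desc = 'LOG_BACKUP';
GO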

Solutions to the problem

As I mentioned at the beginning of this post, there is a solution to the problem (or rather, there are multiple solutions).

  1. Don’t set model to SIMPLE in the first place; instead, switch each new database to SIMPLE after creating it (for databases that don’t require FULL recovery).
  2. Continue with model set to SIMPLE, but incorporate a process that “fixes” your databases after creation.
  3. Install at least CU4 for SQL Server 2012 SP1 or CU7 for SQL Server 2012 RTM, both of which incorporate the bug-fix for this problem (along with a whole host of other fixes).

The steps required to follow option 2 would be:

  • Set the affected database to FULL recovery model
  • Perform a FULL backup of the database
  • Set the database back to SIMPLE recovery model

Here is the code to achieve that and also prove the title of this blog post:

-- Now look at the log reuse information again - oh look..... apparently we need a log backup!!!
SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';
GO

-- Perform the "required" log backup
BACKUP LOG MyTestDB TO DISK = 'nul'

-- The log has been cleared...... So we have FULL recovery in disguise
SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';
GO

-- How about we fix this permanently. Change to FULL, run a backup, change back to SIMPLE

ALTER DATABASE MyTestDB SET RECOVERY FULL
GO

BACKUP DATABASE MyTestDB TO DISK='nul'
GO

ALTER DATABASE MyTestDB SET RECOVERY SIMPLE
GO

-- Now fill our table with more data and check the log reuse column again
INSERT INTO dbo.TestTable
 (Col2)
VALUES (default)
GO 2000

SELECT name,
 recovery_model_desc,
 log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyTestDB';
GO

As for option 3: I realise that using RTM is something people just shouldn’t be doing, but the fact that this bug is in SP1 is a nasty one. Also, not everyone will install Cumulative Updates (CUs) because they are not as rigorously tested as Service Packs (SPs). I know of companies that have policies strictly forbidding installing CUs on production machines for this reason alone. This makes it all the more serious that it is a bug that managed to get through to SP1. Obviously, as soon as SP2 for SQL 2012 comes out (but when will that be?) this bug-fix will be available, but until that time, you have to live with CUs or other workarounds.
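If you are unsure which build an instance is currently running (and therefore whether the fix is already in place), a quick check against SERVERPROPERTY will tell you; compare the result with the build numbers listed in the relevant CU knowledge base articles:

SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
 SERVERPROPERTY('ProductLevel') AS ProductLevel,
 SERVERPROPERTY('Edition') AS Edition;
GO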

I have to say again, I am surprised I hadn’t heard about this issue before – it flew under my radar or was just not picked up by the “normal news channels”. Either way, I thought I’d throw this out there to help the google juices for anyone who has similar issues.

And so, that is how the SIMPLE recovery model sometimes does need a log backup……… sort of.

Have fun 🙂

Windows Failover Cluster Patching using PowerShell

PowerShell is the new standard scripting and automation language for Microsoft products. In this post I will be describing how I used two improved PowerShell commands to help semi-automate the installation of Windows Updates on a Windows Server 2012 Failover Cluster.

I run a number of clusters on both Windows Server 2008 and Windows Server 2008R2 and I must say that although Windows Clustering was a new thing to me about 5 years ago, my overall experience has been pretty painless during that time. From what I hear/read Server 2008 was the first outing of a much updated Windows Failover Clustering (including PowerShell integration), so I managed to body-swerve the more painful versions before it.

Be that as it may, Windows Server 2008 was always lacking in terms of scripting/automation and didn’t get much better with Windows Server 2008R2. The previous incarnation of the automation tooling was “cluster.exe” which was a command line tool that allowed for quite detailed control of a cluster. This was a tool that has “grown” over time to cover many areas of Windows Clustering, but was not designed to be as flexible or programmable as PowerShell. The PowerShell command-set for Clustering covered just about everything that “cluster.exe” covered and then some more – a good overview can be found at the Windows File Server Team Blog.

As Windows has evolved, the PowerShell offerings for each feature within Windows have also evolved. I recall hearing that with Windows Server 2012 you can control the entire O/S from within PowerShell and that the GUI basically constructs and runs these PowerShell commands in the background (my brain may be playing tricks on me though).

With this in mind, I am preparing a server migration/replacement which will move a cluster from Windows Server 2008 to Windows Server 2012. I took another look at the PowerShell commands available to me on Server 2012 and was rather pleased to see two improvements in the Failover Clustering command-set. The major improvement that I discovered, and had to write about here, was to the “Suspend-ClusterNode” command and its brother “Resume-ClusterNode”.

Suspend-ClusterNode allows you to temporarily “pause” a cluster node. This basically removes any currently running resources/services from the specified node and the cluster will not assign any workload to that node until it has been re-instated as an active cluster participant. Resume-ClusterNode brings a previously suspended node back online within the cluster.

You may ask “Why would you want to do this?” or be thinking “This was possible with Windows Server 2008R2”; well dear reader, let’s take a look at those two points.

“Why would you want to do this?”

The short answer: Server maintenance with minimal downtime.

The slightly longer answer: Imagine you have a 3-node cluster. It is patch day and you want to cleanly patch the nodes, one after the other, with a minimum of downtime. You can afford for one node of the three to be offline for maintenance at any one time. This means that you can suspend node 3 while nodes 1 and 2 remain online. The cluster then knows that node 3 is down, but that it is not down due to a failure (so will not start panicking). You can then patch the node, reboot as necessary (who am I kidding, this is a Windows server, you’ll have to reboot it!) and then the node is ready to re-join the cluster as an active/available member. You are then able to repeat this process on the remaining nodes to install updates across the whole cluster.

“This was possible with Windows Server 2008R2”

The short answer: Yes it was, but required some manual intervention.

The long answer: Windows Server 2008R2 offered the ability to suspend a cluster node, but without the ability to control how the cluster dealt with any resources on the node being suspended. This is where the real magic in the new PowerShell command “Suspend-ClusterNode” comes into play.

Let’s take a quick look at the syntax of both so we can compare:

Suspend-ClusterNode

Old Syntax

Suspend-ClusterNode [[-Name] <StringCollection>] [[-Cluster] <String>]

New Syntax

Suspend-ClusterNode [[-Name] <StringCollection>] [[-Cluster] <String>] [[-TargetNode] <String>] [-Drain] [-ForceDrain] [-Wait] [-Confirm] [-WhatIf]

As we can see, the new syntax offers quite a few extra parameters over the old; the main ones to note are [TargetNode] and [Drain]. [TargetNode] allows us to specify where any resources should be moved to, and [Drain] initiates the move of the resources/services. This allows for much finer control over resources within a cluster during maintenance operations. With the new command it is really easy to perform the 3-node cluster maintenance outlined earlier. We suspend one node after the other, moving any resources they hold to another node of our choosing, and can then resume each node after the maintenance has completed. If we now take a look at Resume-ClusterNode, we will see another level of brilliance that further eases node maintenance work:

Resume-ClusterNode

Let’s compare old vs. new:

Old Syntax

Resume-ClusterNode [[-Name] <StringCollection>] [-Cluster <String>]

New Syntax

Resume-ClusterNode [[-Name] <StringCollection>] [-Cluster <String>] [[-Failback] {Immediate | NoFailback | Policy}]

Again, we can see that there is more to decide upon with the new syntax. When you resume a suspended node, you can decide what happens to the resources that were previously running on that node.

The parameter [Failback] has three options:

  • “Immediate” is pretty obvious and will immediately take control of the resources that were previously running on that node before suspension.
  • “NoFailback” will resume the node but leave all resources where they currently are – this is a good idea if you have already failed over to an updated node and don’t want another service outage in this maintenance period.
  • Finally, “Policy” would postpone any failback until a pre-defined failover timeframe is reached.

Once we see how Suspend-ClusterNode and Resume-ClusterNode have been extended, we can understand how the extensions open up better scripting control of clusters and their resources. I have prepared a script that can be run in our maintenance window that will suspend a node and push all resources to another node. The suspended node can then receive all Windows Updates and be rebooted if necessary, and finally a script can be run to bring the node back online. The same set of scripts is then run against the second node, reversing the flow back to the node we have just patched. Using Suspend-ClusterNode and Resume-ClusterNode only reduces the overall code in the PowerShell scripts by a few lines, but the complexity is reduced drastically. This makes code maintenance easier and the code is just easier on the eye.
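To give an idea of what such a script boils down to, here is a rough sketch of the per-node workflow (the node and cluster names NODE1, NODE3 and SQLCLU01 are placeholders, and the actual update installation is left to whatever mechanism you use, e.g. WSUS/SCCM or manual installation):

Import-Module FailoverClusters

# Drain NODE3: move its resources/services to NODE1 and wait for the moves to complete
Suspend-ClusterNode -Name "NODE3" -Cluster "SQLCLU01" -TargetNode "NODE1" -Drain -Wait

# ...install Windows Updates on NODE3 here and reboot it if required, for example:
# Restart-Computer -ComputerName "NODE3" -Wait -For PowerShell -Force

# Bring NODE3 back into the cluster, leaving the resources where they currently are
Resume-ClusterNode -Name "NODE3" -Cluster "SQLCLU01" -Failback NoFailback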

After seeing the improvement to my local code, I am certain that the extension to the PowerShell command-set was internally driven at Microsoft. Imagine how many clusters they have running that needed this extension to improve their automation. As far as I can see it, this is an obvious knock-on effect of their drive to “The Cloud” and a very positive one at that!

As always, think before running any scripts and please don’t blindly setup scripts to suspend and resume nodes in a cluster. This is an aid to your daily work and not a robot you should build to run any/all updates onto a cluster 🙂

Have fun!