
Windows Failover Cluster Patching using PowerShell

PowerShell is the new standard scripting and automation language for Microsoft products. In this post I will describe how I used two new PowerShell commands to help semi-automate the installation of Windows Updates on a Windows Server 2012 Failover Cluster.

I run a number of clusters on both Windows Server 2008 and Windows Server 2008R2 and I must say that although Windows Clustering was a new thing to me about 5 years ago, my overall experience has been pretty painless during that time. From what I hear/read, Server 2008 was the first outing of a much-updated Windows Failover Clustering (with PowerShell integration following in 2008R2), so I managed to body-swerve the more painful versions before it.

Be that as it may, Windows Server 2008 was always lacking in terms of scripting/automation and things didn’t get much better with Windows Server 2008R2. The previous incarnation of the automation tooling was “cluster.exe”, a command line tool that allowed quite detailed control of a cluster. It had “grown” over time to cover many areas of Windows Clustering, but was not designed to be as flexible or programmable as PowerShell. The PowerShell command-set for Clustering covers just about everything that “cluster.exe” covered and then some; a good overview can be found at the Windows File Server Team Blog.

As Windows has evolved, the PowerShell offerings for each feature within Windows have also evolved. I recall hearing that with Windows Server 2012 you can control the entire O/S from within PowerShell and that the GUI basically constructs and runs these PowerShell commands in the background (my brain may be playing tricks on me though).

With this in mind, I am preparing a server migration/replacement which will move a cluster from Windows Server 2008 to Windows Server 2012.  I took another look at the PowerShell commands available to me on Server 2012 and was rather pleased to see two improvements in the Failover Clustering command-set. The major command addition that I discovered, and had to write about here, was the “Suspend-ClusterNode” command and its brother “Resume-ClusterNode”.
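If you want to take the same look at the command-set yourself, something along these lines should work. It is only a sketch and assumes the Failover Clustering PowerShell tools are installed on the machine you run it from.

```powershell
# Load the Failover Clustering module (needs the clustering tools installed).
Import-Module FailoverClusters

# List the cmdlets that act on cluster nodes.
Get-Command -Module FailoverClusters -Name '*-ClusterNode'

# Show the full syntax and parameter descriptions for the two commands.
Get-Help Suspend-ClusterNode -Full
Get-Help Resume-ClusterNode -Full
```

Running the Get-Help lines on a Server 2008R2 box and on a Server 2012 box is a quick way to see the differences for yourself.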

Suspend-ClusterNode allows you to temporarily “pause” a cluster node. This basically removes any currently running resources/services from the specified node and the cluster will not assign any workload to that node until it has been re-instated as an active cluster participant. Resume-ClusterNode brings a previously suspended node back online within the cluster.

You may ask “Why would you want to do this?” or be thinking “This was possible with Windows Server 2008R2”; well dear reader, let’s take a look at those two points.

“Why would you want to do this?”

The short answer: Server maintenance with minimal downtime.

The slightly longer answer: Imagine you have a 3 node cluster. It is patchday and you want to cleanly patch the nodes, one after the other, with a minimum of downtime. You can afford for one node of the three to be offline for maintenance at any one time. This means that you can suspend node 3 while nodes 1 and 2 remain online. The cluster then knows that node 3 is down, but that it is not down due to a failure (so it will not start panicking). You can then patch the node, reboot as necessary (who am I kidding, this is a Windows server, you’ll have to reboot it!) and then the node is ready to re-join the cluster as an active/available member. You are then able to repeat this process on the remaining nodes to install updates across the whole cluster.

“This was possible with Windows Server 2008R2”

The short answer: Yes it was, but required some manual intervention.

The long answer: Windows Server 2008R2 offered the ability to suspend a cluster node, but without the ability to control how the cluster dealt with any resources on the node being suspended. This is where the real magic in the new PowerShell command “Suspend-ClusterNode” comes into play.

Let’s take a quick look at the syntax of both so we can compare:

Suspend-ClusterNode

Old Syntax

Suspend-ClusterNode [[-Name] <StringCollection>] [[-Cluster] <String>]

New Syntax

Suspend-ClusterNode [[-Name] <StringCollection>] [[-Cluster] <String>] [[-TargetNode] <String>] [-Drain] [-ForceDrain] [-Wait] [-Confirm] [-WhatIf]

As we can see, the new syntax offers quite a few extra parameters over the old; the main ones to note are [TargetNode] and [Drain]. [TargetNode] allows us to specify where any resources should be moved to and [Drain] initiates the move of the resources/services. This allows for much finer control over resources within a cluster during maintenance operations. With the new command it is really easy to perform the 3 node cluster maintenance outlined earlier. We suspend one node after the other, moving any resources each one hosts to another node of our choosing, and can then resume the node after the maintenance has completed. If we now take a look at Resume-ClusterNode, we will see another level of brilliance that becomes available to us that further eases node maintenance work:

Resume-ClusterNode

Let’s compare old vs. new:

Old Syntax

Resume-ClusterNode [[-Name] <StringCollection>] [-Cluster <String>]

New Syntax

Resume-ClusterNode [[-Name] <StringCollection>] [-Cluster <String>] [[-Failback] {Immediate | NoFailback | Policy}]

Again, we can see that there is more to decide upon with the new syntax. When you resume a suspended node, you can decide what happens to the resources that were previously running on that node.

The parameter [Failback] has three options:

  • “Immediate” is pretty obvious and will immediately take control of the resources that were previously running on that node before suspension.
  • “NoFailback” will resume the node but leave all resources where they currently are – this is a good idea if you have already failed over to an updated node and don’t want another service outage in this maintenance period.
  • Finally, “Policy” hands the decision to the failback policy configured for each resource group, so any failback is postponed until the pre-defined failback timeframe is reached.
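To make the two commands a little more concrete, here is a minimal sketch of one suspend/resume cycle on a Server 2012 cluster. The node names “node1” and “node3” are made up for illustration and the commands are assumed to run on a cluster node (otherwise add -Cluster with your cluster name).

```powershell
Import-Module FailoverClusters

# Pause node3 and move its resources to node1.
# -Wait blocks until the drain has completed.
Suspend-ClusterNode -Name 'node3' -Drain -TargetNode 'node1' -Wait

# ... install updates on node3 and reboot it ...

# Bring node3 back and leave the resources where they are now.
Resume-ClusterNode -Name 'node3' -Failback NoFailback

# Alternatives, depending on what should happen to the resources:
# Resume-ClusterNode -Name 'node3' -Failback Immediate
# Resume-ClusterNode -Name 'node3' -Failback Policy
```

Adding -WhatIf to the Suspend-ClusterNode line is a cheap way to see what would happen before doing it for real.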

Once we see how Suspend-ClusterNode and Resume-ClusterNode have been extended, we can understand how the extensions open up better scripting control of clusters and their resources. I have prepared a script that can be run in our maintenance window that will suspend a node and push all resources to another node. The suspended node can then receive all Windows Updates and be rebooted if necessary, and finally a script can be run to bring the node back online. The same set of scripts is then run against the second node, reversing the flow back to the node we have just patched. Using Suspend-ClusterNode and Resume-ClusterNode reduces the overall code in the PowerShell scripts by only a few lines, but the complexity is reduced drastically. This makes code maintenance easier and the code is just easier on the eye.
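The scripts themselves are specific to my environment, but the general shape can be sketched roughly as follows. This is not my production code: the function names Start-NodeMaintenance and Stop-NodeMaintenance and the node names are hypothetical, and the checks are deliberately simple.

```powershell
# Sketch only: function names and node names are hypothetical.
Import-Module FailoverClusters

function Start-NodeMaintenance {
    param(
        [Parameter(Mandatory = $true)] [string] $Node,
        [Parameter(Mandatory = $true)] [string] $TargetNode
    )

    # Pause the node and drain its resources to $TargetNode; -Wait blocks until done.
    Suspend-ClusterNode -Name $Node -Drain -TargetNode $TargetNode -Wait -ErrorAction Stop | Out-Null

    # Warn if any group is still owned by the node we are about to patch.
    $remaining = @(Get-ClusterGroup | Where-Object { $_.OwnerNode.Name -eq $Node })
    if ($remaining.Count -gt 0) {
        $names = ($remaining | ForEach-Object { $_.Name }) -join ', '
        Write-Warning "Node $Node still owns: $names"
    }
}

function Stop-NodeMaintenance {
    param(
        [Parameter(Mandatory = $true)] [string] $Node
    )

    # NoFailback leaves the resources where they are, avoiding a second outage.
    Resume-ClusterNode -Name $Node -Failback NoFailback -ErrorAction Stop | Out-Null

    # Warn if the node did not come back as an active member.
    $state = (Get-ClusterNode -Name $Node).State
    if ($state -ne 'Up') {
        Write-Warning "Node $Node is in state $state after resume"
    }
}
```

In a two node maintenance window this would be used as Start-NodeMaintenance -Node 'node1' -TargetNode 'node2', then patching and rebooting node1, then Stop-NodeMaintenance -Node 'node1', and finally the same again with the node names swapped.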

After seeing the improvement to my local code, I am certain that the extension to the PowerShell command-set was internally driven at Microsoft. Imagine how many clusters they have running that needed this extension to improve their automation. As far as I can see it, this is an obvious knock-on effect of their drive to “The Cloud” and a very positive one at that!

As always, think before running any scripts and please don’t blindly set up scripts to suspend and resume nodes in a cluster. This is an aid to your daily work and not a robot you should build to run any/all updates onto a cluster 🙂

Have fun!

Automating SQL Server installations / maintenance

I am laying down the plans for a standard server installation at work. I want to have all production systems set up the same way to reduce the installation and maintenance overhead for each server. Ideally, I won’t have to do anything by hand until the instance is set up and SQL Server is running.

Of course, I am a firm believer in not re-inventing the wheel, so my first course of action was to see what the great encyclopaedia called the internet could offer up... and it didn’t fail me!

Ola Hallengren’s MaintenanceSolution

The first resource that I found comes from Ola Hallengren (http://ola.hallengren.com).

He has developed a set of maintenance scripts that do Integrity Checks, Database Backups and intelligent Index Reorgs/Rebuilds. I took a first look at these scripts quite some time ago and proceeded to set them up on one of my production systems (mainly for the index maintenance). I must say that the code is nicely written and commented and allows you to easily see what Ola is intending to do. I made a few small changes to fit my environment and the jobs have run happily ever since.

SQL Server FineBuild

The second resource that I found was SQL Server FineBuild from Ed Vassie (http://sqlserverfinebuild.codeplex.com).

This is a tool to help standardise SQL Server Installations (ha-ha! just what I want). After taking a look at the very detailed documentation and taking a few hours going through the installation options I had a completed configuration file for the installation.  I then took a brand new test server and set FineBuild on its way and waited to see the results.

FineBuild warned me that the installation could take 40 minutes or more, so I got on with something else. When I looked back to see how FineBuild was doing, it had created a summary .txt file showing that it had finished installing in 20 minutes and everything had been successful. I now had an all-singing, all-dancing SQL Server installation with extra tools installed, all DB-Files separate from LOG-Files, and a plethora of added extras to boot.

I then went about taking this installation and modifying the config file to further optimise the installation for my environment.  This was not necessary, but I wanted some of the cosmetic folder naming schemes to be kept and so on.

I am still tinkering with this config and integrating Ola’s scripts into the mix and will be setting up 5 identical servers for a replication system in the near future.  This will easily save me a couple of days setup and configuration.

Conclusion

Ola’s Scripts are great for all DBAs, but especially for those new/accidental DBAs who have little experience/knowledge, but have to look after SQL Server.  They have an easy entry point, but allow experienced users to configure the scripts to fit their requirements.

FineBuild is definitely for the more advanced user/DBA. The wide range of options and add-ons means that a greater base knowledge of SQL Server is required before even thinking of using this tool. If, however, you are comfortable setting up and administering SQL Server, you will find this tool extremely helpful in reducing your admin overhead. Once the settings are finalised (normally a one-off task), you can install any number of servers, secure in the knowledge that each one will be identical in its basic structure.

A big thank you goes out to Ed Vassie and Ola Hallengren, both of whom have created resources that will save lots of time for DBAs everywhere!