SQL Server Best Practices, Part II: Virtualized Environments

Rsdaa 29/11/2021 4051

This article is part of the series "SQL Server Best Practices". Check out the rest:

It is 2016 and some people still think SQL Server cannot be run on a virtual machine. SQL Server can successfully run in a VM but SQL is resource-intensive by nature and so if you are going to virtualize SQL then you simply must adhere to best practices. Not following best practices can be the difference between poor vs exceptional virtual SQL Server performance. Please see my previous blog post on general SQL server best practices as these apply in a virtualized environment also.

Power Management

The physical VM host should be set to high performance in the BIOS to ensure that it is firing on all cylinders which in turn will allow the hypervisor to allocate the abstracted resources as it sees fit.

Hate computers professionally? Try Cards Against IT.

Power management should always be set to high performance within Windows VMs. Balanced is a setting for laptops which need to reserve power. VMs can have serious performance issues if not configured correctly. In some environments VM power management settings can be controlled by the hypervisor but when resource intensive apps such as SQL server are in play, make sure that Windows power management is set to high performance.

Always Use SLAT Compatible Server Hardware

Although it might not be the case with older hardware, most modern servers have x64 processors which support SLAT (Second Level Address Translation).

VMware and Hyper-V hosts should run 64-bit x64 processors (AMD or Intel). It is absolutely vital that the host processor supports SLAT. SLAT goes by several aliases.

Intel calls it Extended Page TablesAMD calls it Nested Page Tables or Rapid Virtualization Indexing

SLAT enables the CPU to maintain the mapping between the virtual memory used by the VMs and the physical memory on the hypervisor host. If the CPU cannot perform this memory mapping then it would fall to the hypervisor to do so. Performance and scalability are both improved by having the CPU perform the memory mapping.

Microsoft studies have proved that SLAT:

Considerably reduces the host processing overhead to about 2 percentReduces the host memory requirements by about 1MB per running VM

Don’t think too much about it – just make sure that the underlying VM host’s hardware supports SLAT.

Do Not Overcommit the VM Host CPU

I cannot stress this point enough. If you overcommit the VM host, and have resource intensive applications such as SQL server running on the VMs on that VM host, then you will encounter performance issues sooner or later. It is not a problem if you have a bunch of low resource usage web / app servers sharing resources as the hypervisor can easily keep up with which VM needs which resources but when you bring resource intensive apps into the mix it is a recipe for disaster.

If your virtualized SQL Server workload is highly intensive, then make sure you are running the latest version of Hyper-V or vSphere as each iteration comes with new maximums for scalability.

Best practice for the initial sizing of a VM, especially one that will host a resource intensive application such as SQL server, is to make sure that the total number of virtual CPUs assigned to the VM does not exceed the number of physical CPU sockets (as opposed to the logical cores) available on host machine.

CPU Ready

This is not something you want to encounter as it is indicative of an overprovisioned VM and/or host. CPU ready is the amount of time a VM is ready (needs) CPU clock cycles on the physical host but has to wait to get time because other VMs are already using the resources.

Calculating ready time can be a pain because it depends on the polling interval for the metric presented on the VM host e.g. 20 seconds (20,000 milliseconds):

(CPU ready value / 20,000 ms) x 100% = Percentage performance impact per 20 second interval.

If you extrapolate over time you can quickly see how this would cause performance degradation, especially if running high performance applications such as SQL server.

Ready values <5% per vCPU are generally OK. Ready values >5% per vCPU are a warning and you are likely already experiencing performance degradation.

Without any misconfiguration it is not at all difficult to find CPU ready values of >=10% due to some large VMs with several vCPUs running on a few physical cores or a similar disproportion of vCPU to pCPU.

If the VM itself is overprovisioned e.g. a VM with 8 vCPUs must wait for all 8 pCPUs on the underlying VM host to be free before getting any clock cycles. This is where right sizing comes into pay. If the VM truly needs a large number of vCPUs then by all means add them. If you are sizing for a new application then only add vCPUs as you monitor performance. Windows task manager is not a great indicator of performance in a virtualized environment and so monitor from the VM host side. If all vCPUs are maxed then it likely needs more vCPUs. If not then leave well enough alone. I’ve seen situations where removing vCPUs from a VM actually improved the performance of the applications with databases hosted on that virtual SQL server.

If the VM host is overprovisioned then there are several VMs running on that host which are all competing for resources. If this is the case you should migrate some VMs to other hosts to alleviate the resource contention issues.

The equivalent to CPU ready on Hyper-V is the Perfmon counter Hyper-V Hypervisor Virtual Processor\CPU Wait Time Per Dispatch which is available since Windows Server 2012.

Hyper-threading

Hyper-threading is an Intel technology that exposes two hardware contexts (threads) from a single physical core. These threads are referred to as logical CPUs. It is a common misconception that hyper-threading doubles the number of CPUs or cores. This is simply not the case. Hyper-threading improves the overall host throughput from 10-30% by keeping the processor pipeline busier and allowing the hypervisor more opportunities to schedule CPU clock cycles and so you should definitely take advantage of hyper-threading by enabling it in the BIOS of the VM host machine.

SQL Server Best Practices, Part II: Virtualized Environments

Cores per Socket

NUMA (Non-Uniform Memory Access) allocates each CPU its own local memory. The CPU and memory combined are known as a NUMA node. The advantages of NUMA is that it enables a processor to access its own local memory faster than it would non-local memory. Both Windows and SQL are fully NUMA aware and make scheduling decisions for threads based on the NUMA topology.

vNUMA presents the physical VM host’s NUMA architecture directly to the VM guest OS. The vNUMA topology of a VM can span across multiple physical NUMA nodes. After a vNUMA-enabled VM is powered on, the architecture presented to the OS cannot be altered. This is actually a positive thing because altering the vNUMA architecture can cause instabilities in the OS. This restriction can however cause problems if an attempt is made to migrate the VM to another VM host which has a different NUMA architecture.

vNUMA is enabled by default for VMs which have more than 8 vCPUs (regardless of the combination of sockets and cores which makes up the number of vCPUs in play).

Best practice:

The number of virtual sockets should equal the number of vCPUs you want (single core per socket).

This is the default setting when creating a VM. This config is known as wide and flat, and vNUMA will present the optimal vNUMA topology to the guest operating system, based on the underlying physical VM host server’s NUMA topology. If a VM’s config is not wide and flat then vNUMA will not be able to automatically pick the best NUMA configuration, and will instead simply match whatever config you entered, which can lead to a NUMA topology mismatch which detrimentally affects performance.

Licensing constraints are the most common reason why admins chose to go against these best practices. If you must use do so then make sure that you mirror the physical VM host’s NUMA topology at least.

CPU Hot-Add

This setting can be a bit of a catch 22 – there are pros and cons to enabling and disabling.

Pros

CPU hot plug allows VM admins to add CPUs on the fly to VMs without needing to shut down the VM first. CPU hot plug allows for dynamic resource management and the ability to add CPUs when vNUMA is not required (usually smaller VMs).

Cons

When CPU hot add is enabled on a VM it automatically disables vNUMA. SQL servers which are wider than the NUMA architecture of the physical server they reside on cannot see the underlying NUMA architecture, which results in performance degradation.

Whether or not to enable CPU hot-add comes down to a question of how wide your VM will be. My recommendation is to disable CPU hot-add for larger VMs which require vNUMA. Prevention is always better than cure and so take the time to right size the SQL server VM’s CPU rather than relying on CPU hot-add as a fall back.

CPU Affinity

I do not recommend using CPU affinity on production machines because it limits the hypervisor’s ability to efficiently schedule vCPUs on the physical server.

Do not overcommit the VM host memory

Again cannot stress this one enough. When initially sizing a SQL VM make sure that the host is not and will not be overcommitted when the SQL VM is powered on. Don’t forget that the VM host machine has its own memory overhead to run the hypervisor operating system too!

Memory Reservation

SQL server is a memory hog and so whatever memory you throw its way will be used and not released. It might make sense to therefore set the memory reservation for the SQL VM to equal the provisioned memory minus 4-6GB for Windows to function. This will significantly reduce the likelihood of ballooning and swapping, and will guarantee that the virtual machine gets the memory it needs for optimum performance. Memory reservations can however prevent the migration of VMs between hosts if the target host does not have unreserved memory equal to or greater than the size of the memory reservation.

To calculate the amount of memory to provision for a virtual machine:

VM Memory = SQL Max Server Memory + ThreadStack + OS Mem + VM OverheadThreadStack = SQL Max Worker Threads * ThreadStackSizeThreadStackSize = 1MB on x86, 2MB on x64, and 4MB on IA64

Dedicated vs Dynamic memory

Yes, this goes against the fundamentals of virtualization and so you may lose this fight with your Virtualization admin but it is worth arguing for. Dedicated resources mean that you can take a fight for resources on the VM host out of the equation.

Memory is one of the biggest factors when it comes to SQL Server performance. SQL uses memory for its internal buffer (recently used data) and procedure caches (recently executed T-SQL commands). These buffers mean that SQL Server can get the data and commands it requires from the caches instead of having to go to disk and incur the associated I/O overhead. SQL Server can automatically manage and grow its buffer and procedure caches based on the requirements of the workload and the memory that’s available. If there is no available memory then performance will be impacted.

If your Virtualization admin has ruled dedicated memory out of the question then ask about Hyper-V Dynamic Memory or VMware memory overcommit configurations.

VMware treats memory as a sharable resource, and each megabyte of memory is considered an individual share. Memory overcommit is an automated dynamic process which takes shares from VMs which are not using them and allocates those shares to other VMs as required.

Memory is reclaimed from VMs that have less proportional shares to give to the VMs with more proportional shares and so make sure the SQL VM has a high enough weighting of shares.

Hyper-V Dynamic memory also dynamically distributes unused memory. In both technologies VMs retain 25% of unused memory as a cushion in case it suddenly requires more memory.

It is worth noting that Datacenter or Enterprise editions of SQL 2008 or later are required to support hot-add RAM. Microsoft server operating systems have been hot-add compatible since W2K3r2sp2.

Do Not Store Files on the Same Disk

OS files, SQL data files, SQL log files, SQL backups, etc… will all end up on the same VHD if you build a VM with the default settings and install SQL with the default settings.

SQL Server binaries, data files should be placed on separate VMDKs.

Use RAID 10 for user data, log files, and TempDB files for best performance and availability.

Check out my previous post on SQL server best practices in relation to tempdb sizing.