Did your software vendor indicate that you can virtualize their application, but only if you dedicate one or more CPU cores to it? Not clear on what happens when you assign CPUs to a virtual machine? You are far from alone.
Note: This article was originally published in February 2014. It has been fully updated to be relevant as of November 2019.
Like all other virtual machine “hardware”, virtual CPUs do not exist. The hypervisor uses the physical host’s real CPUs to create virtual constructs to present to virtual machines. The hypervisor controls access to the real CPUs just as it controls access to all other hardware.
Make sure that you understand this section before moving on. Assigning 2 vCPUs to a system does not mean that Hyper-V plucks two cores out of the physical pool and permanently marries them to your virtual machine. You cannot assign a physical core to a VM at all. So, does this mean that you just can’t meet that vendor request to dedicate a core or two? Well, not exactly. More on that toward the end.
Let’s kick this off by looking at how CPUs are used in regular Windows. Here’s a shot of my Task Manager screen:
Nothing fancy, right? Looks familiar, doesn’t it?
Now, back when computers never, or almost never, shipped as multi-CPU, multi-core boxes, we all knew that computers couldn’t really multitask. They had one CPU and one core, so there was only one possible active thread of execution. But aside from the fancy graphical updates, Task Manager then looked pretty much like Task Manager now. You had a long list of running processes, each with a metric indicating what percentage of the CPU’s time it was using.
Then, as now, each line item you see represents a process (or, new in the recent Task Manager versions, a process group). A process consists of one or more threads. A thread is nothing more than a sequence of CPU instructions (keyword: sequence).
Starting with Windows 95 and NT, the operating system stops a running thread, preserves its state, and then starts another thread. After a bit of time, it repeats those operations for the next thread. We call this pre-emptive multitasking, meaning that the operating system decides when to suspend the current thread and switch to another. You can set priorities that affect how a process rates, but the OS is in charge of thread scheduling.
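If it helps to see pre-emption in miniature, here is a toy Python sketch of a single core taking turns among three threads. It uses plain round-robin with a fixed time slice, which is my simplification; the real Windows scheduler also weighs priorities and a good deal more. The function name `round_robin` and the thread names are made up for illustration.

```python
from collections import deque

def round_robin(work_units, time_slice):
    """Simulate pre-emptive round-robin scheduling on one core.

    work_units maps a thread name to the units of CPU time it still needs.
    Returns the list of (thread, units_run) slices in execution order.
    """
    queue = deque(work_units.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        # The scheduler, not the thread, decides when the slice ends.
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        remaining -= ran
        if remaining > 0:
            # Unfinished thread goes to the back of the queue to wait its turn.
            queue.append((name, remaining))
    return timeline

timeline = round_robin({"A": 3, "B": 1, "C": 2}, time_slice=1)
print([name for name, _ in timeline])  # ['A', 'B', 'C', 'A', 'C', 'A']
```

No thread gets to finish in one go unless it fits inside a single slice; everything else is interleaved.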
Today, almost all computers have multiple cores, so Windows can truly multitask.
Because of its role as a thread manager, Windows can be called a “supervisor” (very old terminology that you really never see anymore): a system that manages processes that are made up of threads. Hyper-V is a hypervisor: a system that manages supervisors that manage processes that are made up of threads.
Task Manager doesn’t work the same way for Hyper-V, but the same thing is going on. There is a list of partitions, and inside those partitions are processes and threads. The thread scheduler works pretty much the same way, something like this:
[Figure: Hypervisor Thread Scheduling]
Of course, a real system will always have more than nine threads running. The thread scheduler will place them all into a queue.
You probably know that you can affinitize threads in Windows so that they always run on a particular core or set of cores. You cannot do that in Hyper-V. Doing so would have questionable value anyway; dedicating a thread to a core is not the same thing as dedicating a core to a thread, which is what many people really want to try to do. You can’t prevent a core from running other threads in the Windows world or the Hyper-V world.
So how does a thread get matched to a physical core? The simplest answer is that Hyper-V makes the decision at the hypervisor level. It doesn’t really let the guests have any input. Guest operating systems schedule the threads from the processes that they own. When they choose a thread to run, they send it to a virtual CPU. Hyper-V takes it from there.
The image that I presented above is necessarily an oversimplification, as it’s not simple first-in-first-out. NUMA plays a role, for instance. Really understanding this topic requires a fairly deep dive into some complex ideas. Few administrators require that level of depth, and exploring it here would take this article far afield.
The first thing that matters: affinity aside, you never know where any given thread will execute. A thread that was paused to yield CPU time to another thread may very well be assigned to another core when it is resumed. Did you ever wonder why an application consumes right at 50% of a dual-core system and each core looks like it’s running at 50% usage? That behavior indicates a single-threaded application. Each time the scheduler executes it, it consumes 100% of the core that it lands on. The next time it runs, it stays on the same core or goes to the other core. Whichever core the scheduler assigns it to, it consumes 100%. When Task Manager aggregates its performance for display, that’s an even 50% utilization — the app uses 100% of 50% of the system’s capacity. Since the core not running the app remains mostly idle while the other core tops out, they cumulatively amount to 50% utilization for the measured time period. With the capabilities of newer versions of Task Manager, you can now instruct it to show the separate cores individually, which makes this behavior far more apparent.
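You can reproduce that 50% arithmetic with a toy simulation. This sketch assumes one always-busy, single-threaded app that lands on a random core for each time slice. The random placement and the function name `simulate_single_thread` are my assumptions for illustration, not how the real scheduler picks a core.

```python
import random

def simulate_single_thread(cores=2, slices=10_000, seed=0):
    """Return (per-core busy %, whole-system busy %) for one always-busy thread."""
    rng = random.Random(seed)
    busy_slices = [0] * cores
    for _ in range(slices):
        # The thread lands on one core and uses that entire slice there.
        busy_slices[rng.randrange(cores)] += 1
    per_core = [100 * b / slices for b in busy_slices]
    # Whole-system utilization is the average across cores.
    system = sum(per_core) / cores
    return per_core, system

per_core, system = simulate_single_thread()
print(per_core, system)
```

Each core ends up near 50% busy and the system total sits at 50%, even though the thread saturated whichever core it was on during every single slice.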
Now we can move on to a look at the number of vCPUs assigned to a system and priority.
You should first notice that you can’t assign more vCPUs to a virtual machine than you have logical processors in your host.
[Figure: Invalid CPU Count]
So, a virtual machine’s vCPU count means this: the maximum number of threads that the VM can run at any given moment. I can’t set the virtual machine from the screenshot to have more than two vCPUs because the host only has two logical processors. Therefore, there is nowhere for a third thread to be scheduled. But, if I had a 24-core system and left this VM at 2 vCPUs, then it would only ever send a maximum of two threads to Hyper-V for scheduling. The virtual machine’s thread scheduler (the supervisor) will keep its other threads in a queue, waiting for their turn.
Yes, the total number of vCPUs across all virtual machines can exceed the number of physical cores in the host. It’s no different than the fact that I’ve got 40+ processes “running” on my dual-core laptop right now. I can only run two threads at a time, but I will always have far more than two threads scheduled. Windows has been doing this for a very long time now, and Windows is so good at it (usually) that most people never see a need to think through what’s going on. Your VMs (supervisors) will bubble up threads to run and Hyper-V (hypervisor) will schedule them the way (mostly) that Windows has been scheduling them ever since it outgrew cooperative scheduling in Windows 3.x.
So what ratio of vCPUs to physical cores should you use? This is the question that’s on everyone’s mind. I’ll tell you straight: in the generic sense, this question has no answer.
Sure, way back when, people said 1:1. Some people still say that today. And you know, you can do it. It’s wasteful, but you can do it. I could run my current desktop configuration on a quad 16 core server and I’d never get any contention. But, I probably wouldn’t see much performance difference. Why? Because almost all my threads sit idle almost all the time. If something needs 0% CPU time, what does giving it its own core do? Nothing, that’s what.
Later, the answer was upgraded to 8 vCPUs per 1 physical core. OK, sure, good.
Then it became 12.
And then the recommendations went away.
They went away because no one really has any idea. The scheduler will evenly distribute threads across the available cores. So then, the number of physical CPUs needed doesn’t depend on how many virtual CPUs there are. It depends entirely on what the operating threads need. And, even if you’ve got a bunch of heavy threads going, that doesn’t mean their systems will die as they get pre-empted by other heavy threads. The necessary vCPU/pCPU ratio depends entirely on the CPU load profile and your tolerance for latency. Multiple heavy loads require a low ratio. A few heavy loads work well with a medium ratio. Light loads can run on a high ratio system.
I’m going to let you in on a dirty little secret about CPUs: Every single time a thread runs, no matter what it is, it drives the CPU at 100% (power-throttling changes the clock speed, not workload saturation). The CPU is a binary device; it’s either processing or it isn’t. When your performance metric tools show you that 100% or 20% or 50% or whatever number, they calculate it from a time measurement. If you see 100%, that means that the CPU was processing during the entire measured span of time. 20% means it was running a process for one-fifth of the time and sat idle for the other four-fifths. This means that a single thread doesn’t consume 100% of the CPU, because Windows/Hyper-V will pre-empt it when it wants to run another thread. You can have multiple “100%” CPU threads running on the same system. Even so, a system can only act responsively when it has some idle time, meaning that most threads will simply let their time slice go by. That allows other threads to access cores more quickly. When you have multiple threads always queuing for active CPU time, the overall system becomes less responsive because threads must wait. Using additional cores will address this concern as it spreads the workload out.
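To make the “it’s a time measurement” point concrete, here is a small Python sketch. It assumes you already have a list of the intervals during which a core was busy inside a measurement window; the function name and the sample numbers are hypothetical.

```python
def utilization_percent(busy_intervals, window_ms):
    """Percentage of the window during which the core was executing something.

    busy_intervals is a list of (start_ms, end_ms) pairs that do not overlap.
    """
    busy_ms = sum(end - start for start, end in busy_intervals)
    return 100 * busy_ms / window_ms

# Busy for 50 + 50 + 100 = 200 ms out of a 1000 ms window.
print(utilization_percent([(0, 50), (200, 250), (400, 500)], window_ms=1000))  # 20.0
```

During each of those busy intervals the core was fully occupied. The 20% only appears once you divide by the length of the window.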
The upshot: if you want to know how many physical cores you need, then you need to know the performance profile of your actual workload. If you don’t know, then start from the earlier 8:1 or 12:1 recommendations.
I don’t recommend that you tinker with CPU settings unless you have a CPU contention problem to solve. Let the thread scheduler do its job. Just like setting CPU priorities on threads in Windows can cause more problems than it solves, fiddling with hypervisor vCPU settings can make everything worse.
Let’s look at the config screen:
The first group of boxes is the reserve. The first box represents the percentage of its allowed number of vCPUs to set aside. Its actual meaning depends on the number of vCPUs assigned to the VM. The second box, the grayed-out one, shows the total percentage of host resources that Hyper-V will set aside for this VM. In this case, I have a 2 vCPU system on a dual-core host, so the two boxes will be the same. If I set 10 percent reserve, that’s 10 percent of the total physical resources. If I drop the allocation down to 1 vCPU, then 10 percent reserve becomes 5 percent physical. The second box will be auto-calculated as you adjust the first box.
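The relationship between the two boxes can be sketched in a few lines of Python. The formula is inferred from the example numbers above rather than taken from Hyper-V documentation, and the function name is made up.

```python
def host_reserve_percent(vm_reserve_percent, vcpus, logical_processors):
    """Approximate the grayed-out box: share of total host CPU set aside for the VM."""
    return vm_reserve_percent * vcpus / logical_processors

# 2 vCPUs on a dual-core host: both boxes match.
print(host_reserve_percent(10, vcpus=2, logical_processors=2))  # 10.0
# Drop to 1 vCPU and the same 10 percent covers half as much of the host.
print(host_reserve_percent(10, vcpus=1, logical_processors=2))  # 5.0
```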
The reserve is a hard minimum… sort of. If the total of all reserve settings of all virtual machines on a given host exceeds 100%, then at least one virtual machine won’t start. But, if a VM’s reserve is 0%, then it doesn’t count toward the 100% at all (seems pretty obvious, but you never know). But, if a VM with a 20% reserve is sitting idle, then other processes are allowed to use up to 100% of the available processor power… until such time as the VM with the reserve starts up. Then, once the CPU capacity is available, the reserved VM will be able to dominate up to 20% of the total computing power. Because time slices are so short, it’s effectively like it always has 20% available, but it does have to wait like everyone else.
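As a rough sketch of that 100% rule, the check below adds up host-level reserve percentages (the second box) and reports whether one more VM would still fit. The function name is hypothetical, and treating the limit as a simple sum is my reading of the paragraph above, not a documented algorithm.

```python
def can_start(running_host_reserves, new_host_reserve):
    """True if the new VM's reserve fits within what the host has left."""
    return sum(running_host_reserves) + new_host_reserve <= 100

print(can_start([40, 40], 20))  # True: exactly 100% reserved
print(can_start([40, 40], 30))  # False: 110% cannot be honored
print(can_start([40, 40, 0], 0))  # True: zero reserves never count against the total
```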
So, that vendor that wants a dedicated CPU? If you really want to honor their wishes, this is how you do it. You enter whatever number in the top box that makes the second box show the equivalent processor power of however many pCPUs/cores the vendor thinks they need. If they want one whole CPU and you have a quad-core host, then make the second box show 25%. Do you really have to? Well, I don’t know. Their software probably doesn’t need that kind of power, but if they can kick you off support for not listening to them, well… don’t get me in the middle of that. The real reason virtualization densities never hit what the hypervisor manufacturers say they can do is because of software vendors’ arbitrary rules, but that’s a rant for another day.
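If you would rather not find that number by trial and error, this sketch works backward from the number of cores the vendor asked for. It relies on the same box relationship inferred from the reserve example earlier, so treat it as an approximation; `reserve_for_cores` is a made-up helper name.

```python
def reserve_for_cores(cores_wanted, vcpus, logical_processors):
    """Return (top-box %, second-box %) that reserves roughly cores_wanted cores."""
    if cores_wanted > vcpus:
        raise ValueError("a VM cannot reserve more cores than it has vCPUs")
    vm_percent = 100 * cores_wanted / vcpus
    host_percent = 100 * cores_wanted / logical_processors
    return vm_percent, host_percent

# Vendor wants one whole core; the VM has 2 vCPUs on a quad-core host.
print(reserve_for_cores(1, vcpus=2, logical_processors=4))  # (50.0, 25.0)
```

With those inputs you would enter 50 in the top box and expect the second box to show 25.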
The next two boxes are the limit. Now that you understand the reserve, you can understand the limit. It’s a resource cap. It keeps a greedy VM’s hands out of the cookie jar. The two boxes work together in the same way as the reserve boxes.
The final box is the priority weight. As indicated, this is relative. Every VM set to 100 (the default) has the same pull with the scheduler, but they’re all beneath all the VMs that have 200 and above all the VMs that have 50, so on and so forth. If you’re going to tinker, weight is safer than fiddling with reserves because you can’t ever prevent a VM from starting by changing relative weights. What the weight means is that when a bunch of VMs present threads to the hypervisor thread scheduler at once, the higher-weighted VMs go first.
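Taking that description literally, the ordering looks like the Python sketch below. It is a deliberate oversimplification: the real scheduler does far more than sort by weight, and the VM names and the `dispatch_order` function are invented for the example.

```python
def dispatch_order(vms):
    """Order VMs by relative weight, highest first.

    vms is a list of (name, weight) pairs. Python's sort is stable,
    so VMs with equal weight keep their original order.
    """
    return sorted(vms, key=lambda vm: vm[1], reverse=True)

vms = [("web", 100), ("sql", 200), ("test", 50), ("file", 100)]
print([name for name, _ in dispatch_order(vms)])  # ['sql', 'web', 'file', 'test']
```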
Hyper-Threading allows a single core to operate two threads at once — sort of. The core can only actively run one of the threads at a time, but if that thread stalls while waiting for an external resource, then the core operates the other thread. You can read a more detailed explanation below in the comments section, from contributor Jordan. AMD has recently added a similar technology.
To kill one major misconception: Hyper-Threading does not double the core’s performance ability. Synthetic benchmarks show a high-water mark of a 25% improvement. More realistic measurements show closer to a 10% boost. An 8-core Hyper-Threaded system does not perform as well as a 16-core non-Hyper-Threaded system. It might perform almost as well as a 9-core system.
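The arithmetic behind that 9-core comparison is simple enough to sketch. This treats the Hyper-Threading benefit as a flat percentage on top of the physical core count, which is a rough model of my own, using the 10% and 25% figures quoted above.

```python
def effective_cores(physical_cores, boost_percent):
    """Rough non-Hyper-Threaded core equivalent for a Hyper-Threaded CPU."""
    return physical_cores * (1 + boost_percent / 100)

print(effective_cores(8, 10))  # realistic boost: a little under 9 cores
print(effective_cores(8, 25))  # synthetic best case: 10.0
```

Either way, an 8-core Hyper-Threaded host lands nowhere near 16 cores.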
With the so-called “classic” scheduler, Hyper-V places threads on the next available core as described above. With the core scheduler, introduced in Hyper-V 2016, Hyper-V now prevents threads owned by different virtual machines from running side-by-side on the same core. It will, however, continue to pre-empt one virtual machine’s threads in favor of another’s. We have an article that deals with the core scheduler.
I know this is a lot of information. Most people come here wanting to know how many vCPUs to assign to a VM or how many total vCPUs to run on a single system.
Personally, I assign 2 vCPUs to every VM to start. That gives it at least two places to run threads, which gives it responsiveness. On a dual-processor system, it also ensures that the VM automatically has a presence on both NUMA nodes. I do not assign more vCPUs to a VM until I know that it needs them (or an application vendor demands it).
As for the ratio of vCPU to pCPU, that works mostly the same way. There is no formula or magic knowledge that you can simply apply. If you plan to virtualize existing workloads, then measure their current CPU utilization and tally it up; that will tell you what you need to know. Microsoft’s Assessment and Planning Toolkit might help you. Otherwise, you simply add resources and monitor usage. If your hardware cannot handle your workload, then you need to scale out.
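Here is one way to do that tally, as a minimal Python sketch. It assumes you have an average CPU utilization figure for each existing server and that cores are roughly comparable across machines. It ignores peaks, clock-speed differences, and hypervisor overhead, so read the result as a floor, not a sizing guarantee. The sample numbers are invented.

```python
import math

def cores_needed(workloads):
    """Sum the core-equivalents that a set of measured workloads actually use.

    workloads is a list of (core_count, average_utilization_percent) pairs.
    """
    return sum(cores * pct / 100 for cores, pct in workloads)

# Three existing servers: 4 cores at 10%, 8 cores at 25%, 2 cores at 50%.
measured = [(4, 10), (8, 25), (2, 50)]
total = cores_needed(measured)
print(total, math.ceil(total))  # about 3.4 core-equivalents, so at least 4 cores
```

Fourteen physical cores of existing hardware turn out to be doing about three and a half cores’ worth of work, which is exactly why a fixed ratio tells you so little.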