CPU Ready and vCPU over-subscription

I’ve been doing a bit of performance tuning on some of the clusters I look after at work and started looking deeper into CPU Ready times. This particular metric has always been something I’m aware of and it’s impact on performance but I had never gone looking for issues relating to it. Mostly because I’d never had a host or cluster that was that over-subscribed!

Anyway, I thought I’d do a quick post on what CPU Ready times mean, how you can measure them and how you can help reduce them…here goes.

What is CPU Ready???

The term CPU Ready is a bit confusing at first as one would assume that it refers to how much CPU is ready to be used, but this is not the case. The lower the CPU Ready time the better!

CPU Ready is the percent of time a process is ready to run but is waiting for the CPU scheduler to allow that process to run on a physical processor. I.e. “I’m ready to go but I can’t do anything yet!”.

So now that we have a better understanding of what CPU Ready is, lets look at what can cause this value to increase and hurt your VM’s performance.

What causes CPU Ready times to increase???

1. Over-commitment/Over-subscription of physical CPU

This would be the most common cause and can happen when you have committed too many vCPU’s in relation to the number of physical CPU cores in your host.

From what I have read it seems that for best performance you should keep your pCPU:vCPU ratio equal to or less than 1:3. So in other words, if your host has a total of four CPU cores, you should not allocate more than a total of 12 vCPU to the VM’s on that host. This isn’t to say you can’t have more but you may run into performance problems doing so.

2. Using CPU affinity rules

Using CPU affinity rules across multiple VM’s can cause high CPU Ready times as this can restrict how the CPU scheduler balances load. Unless specifically required I would not recommend using CPU affinity rules.

3. Using CPU limits on virtual machines

Another potential cause of CPU Ready is using CPU limits on virtual machines. Again, from what I have read I would suggest that you do not use CPU limits unless absolutely necessary. CPU limits can prevent the scheduler from allocating CPU time to a VM if it were to violate the limit set, hence causing ready times to increase.

4. When Fault Tolerance is configured on a VM

The last scenario could be where you have deployed a VM using FT and the primary and secondary VM can’t keep up with the synchronisation of changes. When this happens the CPU can be throttled causing higher ready times.

Now that we’ve covered what can cause CPU Ready times to increase, lets look at how to measure them and reduce them. For this example I’ve used the most common cause, over-provisioning.

How do I look for CPU Ready issues???

Take the example below; it is a VM that has been configured with four vCPU. Looking at the last days CPU usage you can see this particular VM is doing almost nothing (it is a test VM).

Image

When I then look at the CPU Ready times for the same period I see that the summation value is around 9200ms. Remember that both of these charts are the last day roll up.Image

Now you are probably thinking, what the hell does that mean? Well, we can convert this summation into a percentage to make things a little easier to quantify.

The formula is simply this:

CPU Ready % = (Summation value / (chart update interval in seconds x 1000)) x 100

Each of the available update intervals are listed below (refer to KB article 2002181: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2002181)

  • Realtime: 20 seconds
  • Past Day: 5 minutes (300 seconds)
  • Past Week: 30 minutes (1800 seconds)
  • Past Month: 2 hours (7200 seconds)
  • Past Year: 1 day (86400 seconds)

So returning to our previous chart of the last day we get:

(9200 / (300 x 1000)) x 100 = 3.06%

Now this isn’t a bad CPU Ready time percentage but it will do for the purposes of this example. VMware recommends that for best performance CPU Ready % should be less than 5%.

Based on the fact that my virtual machine is by no means busy and it has been given 4 vCPU I will now drop this back to 2. Yes I could drop it back even further to 1 but for the purposes of this example I’ll bring it back to 2 🙂

After powering off the VM, changing the vCPU and powering back on I get a significant drop in CPU Ready time as seen below in a Real-time chart.

Image

Running a new calculation on this value of around 54ms (a very rough guesstimate average :-P) we get this:

(54 / (20 x 1000)) x 100 = 0.27%

As you can see the average CPU Ready time has decreased quite significantly by simply lowering committed resource to the VM. Obviously this would only be practical on VM’s that are not vCPU constrained.

In my experience most people (myself included) over allocate vCPU, particularly when translating vendor hardware requirements into virtual machine requirements! Some of the worst I’ve seen are when sizing some of the Microsoft System Center products. The sizing guides often suggest dual quad-core physical servers, but this does not mean you should give your VM eight vCPU.

I think the best approach is to size lower and adjust accordingly if you are hitting a CPU resource limit. Spend some time looking over your environment and see where you might be able to tune your performance, you might be surprised at how much you can improve it!

Advertisements

About Ben Loveday
My name is Ben Loveday and I am working as a Systems Architect in New Zealand. I have a keen interest in VMware products and are VCAP5-DCD, VCAP5-DCA and Microsoft MCITP certified. I am studying towards VCDX5 certification…I hope! My areas of focus are the virtualisation of manufacturing automation systems with the aim to improve traditional automation/SCADA system design and improve their availability and reliability. I am married with three kids and my hobbies include playing the guitar (less often than I’d like) and listening to music, mostly 80′s Metal and Rock/Blues :-) Oh..and I'm a PC gamer!

4 Responses to CPU Ready and vCPU over-subscription

  1. Pingback: Ceiba3D Studio | Ready Queue Causes and Mitigation

  2. Francisco says:

    Great info about the myth of ratio 1:1 or 1:2 or 1:3 when you are making server sizing, hyperthreading is 1:2 so 1:3 is okay

    • Ben Loveday says:

      Thanks Francisco!
      Yes it’s an interesting process determining host CPU requirements. I’ve done some work recently comparing Opteron and Xeon and they both implement CPU cores differently and thus your over-commitment ratios will differ between them.

      Your own environment may give you more or less mileage also based on it’s actual workload. I’ve seen in dev environments pushing this ratio much higher and in other prod cases much lower.

      I do think that a 1:2/1:3 ratio is typically a safe bet though.

      Cheers,
      Ben

  3. Pingback: Entendendo o problema do excesso de provisionamento de vCPUs no VMWARE – Blog Pessoal

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: