CPU Ready and vCPU over-subscription

I’ve been doing a bit of performance tuning on some of the clusters I look after at work and started looking deeper into CPU Ready times. I’ve always been aware of this particular metric and its impact on performance, but I had never gone looking for issues relating to it, mostly because I’d never had a host or cluster that was that over-subscribed!

Anyway, I thought I’d do a quick post on what CPU Ready times mean, how you can measure them and how you can help reduce them…here goes.

What is CPU Ready???

The term CPU Ready is a bit confusing at first as one would assume that it refers to how much CPU is ready to be used, but this is not the case. The lower the CPU Ready time the better!

CPU Ready is the percentage of time a virtual machine is ready to run but is waiting for the CPU scheduler to allow it to run on a physical processor. I.e. “I’m ready to go but I can’t do anything yet!”.

So now that we have a better understanding of what CPU Ready is, let’s look at what can cause this value to increase and hurt your VM’s performance.

What causes CPU Ready times to increase???

1. Over-commitment/Over-subscription of physical CPU

This would be the most common cause and can happen when you have committed too many vCPUs in relation to the number of physical CPU cores in your host.

From what I have read it seems that for best performance you should keep your pCPU:vCPU ratio equal to or less than 1:3. In other words, if your host has a total of four CPU cores, you should not allocate more than a total of 12 vCPUs to the VMs on that host. This isn’t to say you can’t have more, but you may run into performance problems doing so.
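To make that rule of thumb concrete, here is a minimal Python sketch. The function names and the default ratio of 3 are my own assumptions for illustration; the 1:3 figure is only the guideline mentioned above, not a hard VMware limit.

```python
from typing import Iterable


def max_recommended_vcpus(physical_cores: int, ratio: int = 3) -> int:
    """Total vCPUs a host can carry under a 1:ratio pCPU:vCPU rule of thumb.

    The default ratio of 3 is an assumption taken from the guideline in the
    text, not an enforced limit.
    """
    if physical_cores < 1 or ratio < 1:
        raise ValueError("physical_cores and ratio must both be at least 1")
    return physical_cores * ratio


def is_oversubscribed(physical_cores: int,
                      vcpus_per_vm: Iterable[int],
                      ratio: int = 3) -> bool:
    """Return True if the vCPUs allocated across all VMs exceed the guideline."""
    return sum(vcpus_per_vm) > max_recommended_vcpus(physical_cores, ratio)


if __name__ == "__main__":
    # A four-core host with three 4-vCPU VMs sits exactly at the 12 vCPU guideline.
    print(max_recommended_vcpus(4))
    print(is_oversubscribed(4, [4, 4, 4]))
    # Adding one more 2-vCPU VM pushes the host past it.
    print(is_oversubscribed(4, [4, 4, 4, 2]))
```

Treat the result as a prompt to go and look at the CPU Ready charts rather than a verdict; a host over the guideline may still perform fine if its VMs are mostly idle.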

2. Using CPU affinity rules

Using CPU affinity rules across multiple VMs can cause high CPU Ready times as this can restrict how the CPU scheduler balances load. Unless specifically required I would not recommend using CPU affinity rules.

3. Using CPU limits on virtual machines

Another potential cause of CPU Ready is using CPU limits on virtual machines. Again, from what I have read I would suggest that you do not use CPU limits unless absolutely necessary. CPU limits can prevent the scheduler from allocating CPU time to a VM if it were to violate the limit set, hence causing ready times to increase.

4. When Fault Tolerance is configured on a VM

The last scenario could be where you have deployed a VM using FT and the primary and secondary VM can’t keep up with the synchronisation of changes. When this happens the CPU can be throttled causing higher ready times.

Now that we’ve covered what can cause CPU Ready times to increase, let’s look at how to measure them and reduce them. For this example I’ve used the most common cause, over-provisioning.

How do I look for CPU Ready issues???

Take the example below; it is a VM that has been configured with four vCPU. Looking at the last day’s CPU usage you can see this particular VM is doing almost nothing (it is a test VM).

Image

When I then look at the CPU Ready times for the same period I see that the summation value is around 9200ms. Remember that both of these charts are the last day roll up.

Image

Now you are probably thinking, what the hell does that mean? Well, we can convert this summation into a percentage to make things a little easier to quantify.

The formula is simply this:

CPU Ready % = (Summation value / (chart update interval in seconds x 1000)) x 100

The available update intervals are listed below (refer to KB article 2002181: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2002181)

  • Realtime: 20 seconds
  • Past Day: 5 minutes (300 seconds)
  • Past Week: 30 minutes (1800 seconds)
  • Past Month: 2 hours (7200 seconds)
  • Past Year: 1 day (86400 seconds)

So returning to our previous chart of the last day we get:

(9200 / (300 x 1000)) x 100 = 3.07%

Now this isn’t a bad CPU Ready time percentage but it will do for the purposes of this example. VMware recommends that for best performance CPU Ready % should be less than 5%.
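If you want to run this conversion for a bunch of values, the formula is easy to script. The following is only a rough sketch in Python: the dictionary keys and the function name are names I made up for illustration, while the interval lengths are the ones listed above.

```python
# Chart update intervals in seconds, as listed above.
# The key names are hypothetical labels chosen for this sketch.
UPDATE_INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}


def cpu_ready_percent(summation_ms: float, interval: str) -> float:
    """Convert a CPU Ready summation value (ms) into a percentage.

    Implements: (summation / (interval in seconds x 1000)) x 100
    """
    seconds = UPDATE_INTERVAL_SECONDS[interval]
    return summation_ms / (seconds * 1000) * 100


if __name__ == "__main__":
    # The past-day example from the chart: around 9200 ms of ready time.
    print(f"{cpu_ready_percent(9200, 'past_day'):.2f}%")
```

For the past-day chart above this works out to roughly 3.07%, which you can then compare against the 5% guideline.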

Based on the fact that my virtual machine is by no means busy and it has been given 4 vCPU I will now drop this back to 2. Yes I could drop it back even further to 1 but for the purposes of this example I’ll bring it back to 2 🙂

After powering off the VM, changing the vCPU and powering back on I get a significant drop in CPU Ready time as seen below in a Real-time chart.

Image

Running a new calculation on this value of around 54ms (a very rough guesstimate average :-P) we get this:

(54 / (20 x 1000)) x 100 = 0.27%

As you can see the average CPU Ready time has decreased quite significantly by simply lowering the resources committed to the VM. Obviously this would only be practical on VMs that are not vCPU constrained.

In my experience most people (myself included) over allocate vCPU, particularly when translating vendor hardware requirements into virtual machine requirements! Some of the worst I’ve seen are when sizing some of the Microsoft System Center products. The sizing guides often suggest dual quad-core physical servers, but this does not mean you should give your VM eight vCPU.

I think the best approach is to size lower and adjust accordingly if you are hitting a CPU resource limit. Spend some time looking over your environment and see where you might be able to tune your performance; you might be surprised at how much you can improve it!

vSphere Home Lab: Part 1

After getting the VCAP5-DCD exam out of the way I started to work out what hardware I’d buy for creating a new home lab for my DCA study. Up till now I have used my main gaming rig as a home lab, running an 8-core AMD FX CPU and 32GB of RAM. While this has served me well it isn’t ideal and doesn’t have the flexibility I’d like.

I started trawling through numerous blogs about other home labs and liked the idea of using the Supermicro uATX motherboards that support the E3 Xeons and IPMI. However, after a lot of looking mostly on Amazon (here in NZ the only place I could find boards from was going to cost me almost $400 NZD per board…) I gave up. It was going to be too risky ordering pc gear from overseas and not have the confidence I’d get the right memory modules, etc. Don’t get me wrong, I’d love to have some, in particular the MBD-X9SCM-iiF as it has the two onboard 82574L LAN ports as well as the dedicated IPMI port. But for what I needed I could not justify almost doubling my budget, particularly as the E3 Xeons, such as the E3-1230 would set me back almost $400 NZD a piece too.
Instead I opted for more AMD based gear 🙂
Here is the spec I came up with:

3 x AMD FX 6100 Six-core CPU 3.3GHz – $180 NZD each

3 x Gigabyte GA-78LMTUSB3 – Nice uATX form factor, supports FX cpus, can take up to 32GB DDR3 with support for ECC un-buffered RAM – $115 NZD each

3 x Coolermaster 343 uATX cases (these are pretty cheap and are reasonably small) – $97 NZD each

6 x OCZ Vertex2 120GB SSDs – I got these because they were on special for $114 NZD each 🙂

6 x 8GB DDR3 1333MHz non-ECC – These were about $65 NZD each. Couldn’t afford to go with ECC and didn’t feel I really needed it…when money permits I’ll be upgrading each host to 32GB RAM

3 x HP NC364T 4 port GbE NICs – I’m using some spare ones from work

2 x HP ProCurve 2910al-48G switches – Another loaner from work 😛 We had these surplus and aren’t planning on deploying them anywhere

3 x HP P4000 VSA licenses – Yet another thing I was lucky to get from work. We had three licenses we purchased a while back and ended up putting physical P4300 SANs in, so I figured these would be perfect in a home lab!

Here’s a few pics of the gear so far. Excuse the poor quality photos…my HTC Sensation’s camera is not working that well running a beta JB ROM 🙂

HP Procurve 2910al-48G

HP switches – sweet!!!!

My three vSphere hosts

Cool, VCAP-DCA here I come!

The guts

Cheap and cheerful, no frills at its best! Notice I haven’t installed the additional NIC card or the SSDs…where’s my adapters!!!!!

All up I’ve spent close to $2500 NZD which isn’t too bad, but certainly not a cheap exercise…oh well, it’s going to be a great tool for learning so it’s worth it for that!

Bear in mind that most of these parts won’t be on the VMware HCL but this isn’t a production environment, and as such they don’t need to be.

So, I’ve got all the gear mostly built other than waiting on some 2.5″ to 3.5″ SSD drive adapters (the cases don’t have 2.5″ bays 😦 ) and I screwed up with one of the cases. I accidentally purchased the wrong model (I initially purchased only one case as a test) and didn’t realise that the power supply included didn’t have a 4+4 12v molex plug for the CPU power…argh! I’ve got an adapter cable coming that will fix the problem though. I also have three 4GB USB sticks on order for the hypervisor to boot from. This will mean I can allocate as much of the SSD storage as possible to the VSAs.

At this stage I think I’ll configure the VSA cluster volumes using NRAID5 (for those of you who haven’t used the HP Lefthand gear, it supports various network RAID levels when using multiple nodes) as this will give me close to 400GB of SSD storage. I’ll enable thin provisioning on both the datastores and in the VSAs so I should get a reasonable number of VMs on it.

If you are wondering “but what about TRIM support?” I have thought about this. It seems that vSphere does not support the TRIM command but to be honest I don’t really care. I figure it will probably take me a while to kill them and they do have a three year warranty :-). At one stage I was going to build a FreeNAS server or similar with all the SSDs (which does support TRIM) but I thought I’d get more out of running the VSAs. Since I use P4300 SANs at work this would give me more opportunity to play around with the software and different configurations.

As for the network configuration, I haven’t quite decided my layout yet. I am probably going to trunk two NICs for Management, vMotion, FT and VM traffic, possibly leaving two NICs for iSCSI. I probably won’t get the same benefit out of using two NICs per host for iSCSI as I would with a physical SAN as the VSA only supports one virtual network adapter (I think…it’s been a long time since I looked at it) but I will still be able to simulate uplink failure, etc.

Anyway, I better get back to trying to configure these switches…went to plug my rollover cable into them and realised my pc doesn’t have a serial port…doh!
Stay tuned for part 2, building the hosts and setting up the VSA 😉

Using VMware Update Manager to upgrade ESX/ESXi 4 to ESXi 5

Today I spent some time configuring and testing upgrading some ESX and ESXi 4 hosts using VUM. We’ve got a project coming up that will involve upgrading about ten remote hosts. As they are connected via relatively low-bandwidth WAN links I was unsure how well this would work, hence my testing in the office :-). Luckily our office has the same speed link as most of our remote sites so it provided a good test scenario.

We had a few dev/test hosts not doing much so I chose one ESX and one ESXi install, both on HP DL385 G6 hardware. Both had different patch revisions and had old HP offline bundle extensions installed, further making them a good cross section of variables.

The first thing that was required was to upload the ESXi 5 image into VUM via the Admin View. The ESXi Images tab contains a link that you click to import an ESXi Image. This allows you to select an ESXi ISO image and import it into the VUM patch database. Once this has been uploaded you can now create an upgrade baseline using this image.

Remember that this type of baseline must be set as a Host Upgrade before you can select the ESXi image.

Now that you have a baseline you can apply this to a baseline group or host directly. From the Update Manager tab within the host view you can attach this baseline or baseline group and scan the host to check for compliance against this baseline.

All going well you should have a Non-compliant baseline meaning that the host upgrade is compatible but not currently applied to the host.

Clicking remediate will initialise the remediation wizard as shown below.

Working through the wizard you need to accept the EULA before you come to the next important step. Here you can select whether any incompatible third-party extensions are removed before remediation. Select this if you have extensions such as the HP offline bundles that I have on my HP hosts. Bear in mind that the upgrade procedure is not reversible!

Any host extensions that you require after the upgrade can either be integrated into the ESXi image using the custom image builder or applied as a separate remediation task using VUM. I chose the latter because I didn’t have time to create a custom image 😛

Continuing through the remaining options you can finally choose to remediate the host. For me this process took about 20 minutes over a 10Mb WAN link.

When the host remediation has completed you should be presented with an upgraded 5.0 host! Yay! One thing to note is that the host will require re-licensing which is simply done via the Licenses option within the vSphere client.

I did encounter a few things that were not major but are worth keeping in mind. My ESX host upgrade at first appeared to fail, but this was actually the result of the temporary host license having expired. I was able to apply a new license and reconnect the host. The next thing I noticed was a host alarm saying that system logging was not enabled on my host.

After a bit of reading I found that under the Advanced host settings the syslog default datastore location for logs (Syslog.global.logDir) was blank! Setting this to []/scratch/log fixed the issue. If a different datastore location for your logs is desired this can be changed, for example: [mydatastore]/logs.

After all this I had two fully functional ESXi 5 hosts that were both previously ESX and ESXi 4!

One last thing to remember is to upgrade any VMFS volumes to v5. This can be done online and takes a matter of seconds from the datastore view or host storage view. Take note that any existing block size will be retained, whereas a new VMFS datastore will always be created with a 1MB block size.

The next step as I mentioned earlier is to apply any host extensions. In my case I applied my HP offline bundles (make sure you select the baseline as a host extension) and now I can see all my hardware on the Hardware Status tab 🙂

You can normally tell when the HP bundles aren’t applied as you only see a basic list of hardware, which does not show components such as the Storage and iLO devices.

Anyway, hope this helps! Thanks for reading.

VMware VCAP v5

It’s been a long time since my first post and I thought I’d start to share my thoughts on my studies for the VMware VCAP exams. I passed my VCP5 back in December last year which was really good. My study gave me an insight into the new features vSphere 5 has to offer.

Now I have started to focus on the next stage of my development in my ultimate goal of achieving VCDX 🙂

I’ve thought a lot about whether I should start studying for the DCA or DCD exam over the last few months and have struggled to make my mind up! After much thought and some discussion with a friend of mine I decided to go with the DCD as my first milestone. Since I am working towards the v5 exams it does make things a little harder since there isn’t a lot of information about them yet. I have heard a few things about the recent beta of the DCD but I figure I’ll keep going regardless.

I have built a home lab mostly consisting of a desktop PC running an AMD FX 8 core CPU (stop laughing! I like AMD!) and just today received my 32GB of DDR3, yay! With an OCZ SSD and some larger spindle drives I have a pretty good setup whereby I can virtualise MOST of what I need. I figure over time I might need to up the RAM to 64GB, etc. but it’s a good start.

I’ve built some template 2k8R2 VMs using Sammy Bogaert’s great guide to building a lab using Workstation (http://boerlowie.wordpress.com/2011/11/30/building-the-ultimate-vsphere-lab-part-1-the-story/). I’ll eventually configure a vCenter server along with Auto Deploy and Update Manager, then move onto deploying two or three virtual ESXi guests. I’m also keen to have a real play with View, as I have some potential applications for it within my work environment that I’ll be exploring.

So, stay tuned and hopefully I’ll keep this ball rolling and post some more soon around my lab and any interesting things I find 🙂