How I identified over-sizing in my VMs with vPV

There are blogs, white papers and KBs from VMware and VMware enthusiasts all across the web providing steps to resolve high CPU ready utilization in VMware guests. The problems they cover generally fall under the following topics:

 

table-analysis-BLUF.png

 

I have searched across the web for solutions to the above problems, but I haven’t found the answers I need, especially for the case where everything appears to be normal and yet ready utilization is really high.

 

So here is what I have found up until now...

 

Before I proceed, here is a quick definition of ready utilization. This VMware counter reports the percentage of time in the last interval that the VM was in a ‘ready-to-run’ state but did not actually run because it did not get CPU time from the host. Ready time is a kind of wait time for the VM, but it must not be confused with wait time caused by IO waits within the guest OS. This is primarily a scheduling problem, and it is important to keep ready utilization at or below 5 percent: anything from 0 to 5 percent is a warning, and anything above 5 percent is something to look into immediately.
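As a rough illustration of this definition, the sketch below turns a raw ready-time sample into a percentage and applies the 5 percent threshold mentioned above. It assumes the sample is a ready-time summation in milliseconds over a 20-second real-time interval, which is how vCenter commonly reports it; the function names, the `num_vcpus` averaging and the choice to treat exactly 0 percent as "ok" are my own assumptions, not something vPV exposes.

```python
def ready_percent(ready_ms, interval_s=20.0, num_vcpus=1):
    """Convert a ready-time summation (milliseconds) into a percentage.

    Assumption: ready_ms was accumulated over interval_s seconds. For a
    VM-level summation across all vCPUs, pass num_vcpus to get the
    per-vCPU average.
    """
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * num_vcpus) * 100.0


def classify_ready(pct):
    """Apply the thresholds from the text: above 5 percent needs attention."""
    if pct > 5.0:
        return "investigate"
    if pct > 0.0:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical sample: 2000 ms of ready time in a 20 s interval.
    pct = ready_percent(2000)
    print(f"{pct:.1f}% ready -> {classify_ready(pct)}")
```

The interval length matters: the same millisecond value means a very different percentage on a 20-second real-time chart than on a longer historical roll-up, so check which interval your data source uses before applying the threshold.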

 

CPU Ready Util (%RDY for esxtop fans) can be high for the following reasons:

 

  • Resource pool limits that work against the guest OS’s need for processor time. For example, the VM admin puts your VMs into a resource pool that has a really low limit on CPU utilization.
  • VMs allocated a vCPU configuration that does not match well with the core count of the host processors. For example, running 3 or more 8-vCPU VMs on a host with 6-core physical CPUs.
  • VMs of varying configuration (1-core, 2-core, 4-core, 8-core, higher, and combinations thereof) all running on one host while CPU over-commitment is also in place. If you have a server with two 4-core processors and a mix of 1-vCPU, 2-vCPU and 4-vCPU VMs allocated beyond the 8 available cores, there’s a good chance that you will see VMs with high ready util. With just over-commitment, but not a mix of different VM sizes, ready util due to core contention will be present but is expected to stay within reasonable limits.
  • Excess allocation of vCPUs to VMs beyond what the guest OS/application needs.

Here’s a case study. I am running a 2-host VMware cluster. I find a few VMs in my cluster running constantly at high CPU-ready utilization (>5 percent). These show up in HP’s free tool Virtualization Performance Viewer (vPV) as seen below – for more details about this free tool go to http://www.hp.com/go/vpv.

 

NOTE: See the VMs I have marked out below in the graphic (click to enlarge) and note their ready utilization (%RDY).

 

tree-map-outlines.png

 

TIP: Notice how the VMs with ready utilization appear in bright yellow above. The standard vPV settings mark all VMs with less than 5 percent ready util in green. This can be changed so that anything above 0 percent ready util falls on the green/yellow/red colour spectrum (with VMs nearing 10 percent ready util showing as deep red). The setting is in /opt/OV/newconfig/OVPM/VCENTER_GC_Integration.xml. Just comment out line 95 in this file as shown below (there is no need to restart vPV; reloading the page is enough):

 

 

    92             <METRIC Name="rdyCPU" ColorCaption="readyCPUPct" SizeCaption="availvCPUs">
    93                 <COLOR_CLASS>VCENTER_GUEST</COLOR_CLASS>
    94                 <COLOR_METRIC>CPU_ready_percent</COLOR_METRIC>
    95                 <!-- <COLOR_METRIC_MIN_VAL>5</COLOR_METRIC_MIN_VAL> -->
    96                 <COLOR_METRIC_MAX_VAL>10</COLOR_METRIC_MAX_VAL>
    97                 <SIZE_CLASS>VCENTER_GUEST_CONFIG</SIZE_CLASS>
    98                 <SIZE_METRIC>numCpu</SIZE_METRIC>
    99             </METRIC>
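If you would rather script this change than edit the file by hand, here is a minimal sketch of the same edit. It assumes the file contains the `rdyCPU` METRIC block laid out as in the snippet above, with the `COLOR_METRIC_MIN_VAL` element on a line of its own; the function name is hypothetical, and you should back up the file before writing anything to it.

```python
import re

# The rdyCPU metric block, from its opening tag to the first closing tag.
METRIC_BLOCK = re.compile(r'<METRIC Name="rdyCPU".*?</METRIC>', re.DOTALL)

# An uncommented COLOR_METRIC_MIN_VAL element sitting alone on a line.
MIN_VAL_LINE = re.compile(
    r"^([ \t]*)(<COLOR_METRIC_MIN_VAL>[^<]*</COLOR_METRIC_MIN_VAL>)[ \t]*$",
    re.MULTILINE,
)


def comment_out_ready_min(xml_text):
    """Wrap the rdyCPU minimum colour value in an XML comment.

    Other METRIC blocks are left untouched, and a line that is already
    commented out does not match, so the edit is safe to repeat.
    """
    def fix_block(match):
        return MIN_VAL_LINE.sub(r"\1<!-- \2 -->", match.group(0))

    return METRIC_BLOCK.sub(fix_block, xml_text)


if __name__ == "__main__":
    # Path taken from the tip above; writing back is left to the reader.
    path = "/opt/OV/newconfig/OVPM/VCENTER_GC_Integration.xml"
    with open(path, encoding="utf-8") as handle:
        print(comment_out_ready_min(handle.read()))
```

The script only prints the modified text, so you can compare it with the original before deciding to overwrite the file.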

 

 

 

OK, back to the problem. Of the boxes highlighted above, note that 2 VMs have 8 vCPUs each allocated, and the VM with the highest ready utilization has 2 vCPUs allocated to it. Also note that these VMs do not really have high CPU utilization, as can be seen from the tree-map below showing the same cluster (in expanded form). It is mostly green, and the lightest of the green boxes (indicating the busiest VM) shows 15.74 percent CPU utilization.

 

low-cpu-util-overview.png 

The first step is to check the configuration of the host running these VMs. The vPV report gives us an indication of the level of over-commitment.

 

 

Here’s another vPV tip – to find out the host a VM is running on, open up the VM->Status report – the host name is mentioned in the location details.

 

 

host-overcommits-report.png

 

 

The over-allocation of CPU at host-level is only 161 percent of available CPUs – while this is definitely over-committing the available CPUs, this level is really not very high considering that with multi-CPU VMs running it is considered okay to go up to and beyond 200 percent.
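For readers who want to reproduce this kind of figure outside vPV, the sketch below shows one plausible way to compute an over-allocation percentage: total vCPUs allocated to the VMs divided by the physical cores on the host. Whether vPV counts cores or logical (hyper-threaded) CPUs in the denominator is an assumption here, and the VM sizes and core count in the example are made up, not taken from my host.

```python
def overcommit_percent(allocated_vcpus, physical_cores):
    """Return the vCPUs allocated as a percentage of the physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return allocated_vcpus / physical_cores * 100.0


if __name__ == "__main__":
    # Hypothetical host: 16 cores running VMs of mixed sizes.
    vm_vcpus = [8, 8, 4, 4, 2]
    pct = overcommit_percent(sum(vm_vcpus), 16)
    print(f"CPU allocation: {pct:.1f}% of physical cores")
```

Anything above 100 percent means more vCPUs are allocated than there are cores, which by itself is normal in a virtualized setup; it becomes interesting only in combination with the ready and co-stop figures.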

 

Also note that the CPU utilization for this host is really low to moderate (<20 percent), as seen in the vPV workbench. An interesting insight is the high IOPS (especially writes) on the host.

 

workbench-view-host.png

  

As a next step I used the vPV host-configuration report, which, along with other information, also shows the allocation of vCPUs to the guests. Sure enough, I found a lot of VMs with varied vCPU allocations in this setup. However, this is not causing much of a problem, because core contention is either very low or zero on most occasions on the server (I ascertained this from the vSphere client).

 

co-stop-vSphere-client.png

 

NOTE: Core contention is a situation wherein a VM is unable to run due to co-scheduling constraints. It comes up because an 8-vCPU or 4-vCPU VM does not always get all the physical CPUs it needs to run at the same time. It has been documented and expounded in several blogs that VMware has addressed this with ‘relaxed co-scheduling’.

 

VM-configs-report.png

 

So we’ve ruled out host-level over-commitment and core-count contention as causes of the high ready utilization. Also, I am not setting any limits for my VMs, either at VM level or at resource-pool level, so my VMs cannot be getting constrained by such limits. However, there’s still one thing to check: co-stop (%CSTP).

 

NOTE: Co-stop is a counter that applies to SMP virtual machines. It measures the amount of time a vCPU is stopped so that the other vCPUs can catch up. This is not the physical world, so vCPUs can be delayed and drift into a skew. On VMs that were allocated multiple vCPUs that are not all used, the running vCPU may move forward while the other vCPUs are left behind, so they need to catch up at some point.

 

This is where I had to turn to esxtop to confirm the %CSTP values. The VMs above with high ready utilization had really high co-stop values too. The simple recommendation here is to reduce the number of vCPUs allocated to these VMs, so that their ready time reduces accordingly.
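The triage so far boils down to a simple rule: a multi-vCPU VM with high ready time, high co-stop and low CPU utilization is a candidate for fewer vCPUs. Here is a minimal sketch of that rule. The 5 percent ready threshold comes from the definition earlier in this post; the co-stop and utilization thresholds, the `VmSample` type and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    """One observation per VM, e.g. values read off esxtop or vPV."""
    name: str
    vcpus: int
    cpu_util_pct: float
    ready_pct: float
    costop_pct: float


def oversized_candidates(samples, ready_limit=5.0, costop_limit=3.0,
                         util_limit=20.0):
    """Return names of SMP VMs that look over-sized.

    ready_limit follows the 5 percent rule from the text; costop_limit and
    util_limit are illustrative assumptions and should be tuned per site.
    """
    return [
        s.name
        for s in samples
        if s.vcpus > 1
        and s.ready_pct > ready_limit
        and s.costop_pct > costop_limit
        and s.cpu_util_pct < util_limit
    ]


if __name__ == "__main__":
    # Hypothetical samples, not the values from my cluster.
    samples = [
        VmSample("vm-a", vcpus=8, cpu_util_pct=6.0, ready_pct=7.5, costop_pct=9.0),
        VmSample("vm-b", vcpus=1, cpu_util_pct=40.0, ready_pct=1.0, costop_pct=0.0),
        VmSample("vm-c", vcpus=4, cpu_util_pct=70.0, ready_pct=6.0, costop_pct=4.0),
    ]
    print(oversized_candidates(samples))
```

A busy VM with high ready time (like the third sample) is deliberately not flagged: it may genuinely need its vCPUs, and the fix there lies elsewhere.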

 

 

esxtop-putty.png

 

vPV is the tool that helped to identify high-ready time on the VMs in the cluster and triage the problem, setting the ball rolling for me.

 

My next step therefore is to work with the owner of these VMs to reduce the vCPUs allocated to them; that is actually the reason I wrote this blog. I plan on using another tool from the HP software arsenal, Service Health Optimizer (SHO), to suggest a right size for the vCPUs of these VMs based on the VM demand trend. In my setup, I already see SHO marking these VMs as ‘over-sized’, which fits well with my case study, but more on this later.

 


 

Comments
Patrik Batsching | ‎02-26-2013 02:02 AM

Hi Ram,

excellent technical blog describing how to identify "cpu-ready" VMs by using Virtualization Performance Viewer.

The power of vPV is really amazing and this feature is really a money-saver!

 

I like the possibility to click on your embedded screen shots to get enlarged pictures, where the graphic details and text are perfectly readable.

 

Thank you.

 

  Patrik Batsching

 

About the Author(s)
  • Ramkumar Devanathan works in the IOM-Customer Assist Team (CAT) providing technical assistance to HP Software pre-sales and support teams with Operations Management products including vPV, SHO, VISPI. He has experience of more than 12 years in this product line, working in various roles ranging from developer to product architect.