Although vPV is primarily an operational monitoring tool for drilling down into and triaging performance problems, its workbench and reports go beyond presenting immediate data. They help the user navigate across time and find problems that span more than a day or a week.
In this document, we look at how the cluster host-distribution and utilization report can be used to rate the efficiency of a cluster, and what steps can be taken to balance load among the hosts in a cluster.
First, let's look at the host distribution chart, which gives an idea of the utilization level of a VMware cluster.
The above chart shows CPU and memory utilization across the hosts in a VMware cluster. Line markers divide the chart into critical, major and low-usage zones.
Keep in mind that this is a scatter chart, not a time-series chart. Each data point is an aggregate over the report duration (such as a day, week or month), plotted on a 0-100 (percentage) scale.
Memory utilization markers are denoted as squares.
CPU utilization markers are denoted as diamonds.
For each host, the chart shows not just the average utilization but also the peak utilization and the 90th-percentile value.
NOTE: To find the 90th-percentile value, the utilization samples are sorted from lowest to highest and split into 100 equal segments; the value at the boundary of the 90th segment is the 90th percentile. In other words, the 90th percentile is the value below which 90 percent of the data points fall.
The idea is to show the spread of values from the average (mean) up to peak demand. Because the peak value for a resource's utilization might be a one-off sample that never recurs, or that occurs only sparsely in the data set, we also show the 90th percentile, which is in some respects a better indicator of sustained demand.
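To make the difference between the three markers concrete, here is a minimal sketch that computes the average, 90th percentile and peak for one host's samples. It assumes the nearest-rank definition of the percentile; vPV may use a different (for example, interpolating) method, and the sample values are purely illustrative.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of the samples are less than or equal to it."""
    ordered = sorted(samples)                   # lowest to highest
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def summarize(samples):
    """The three per-host values plotted on the scatter chart."""
    return {
        "avg": statistics.mean(samples),
        "p90": percentile(samples, 90),
        "peak": max(samples),
    }

# Illustrative CPU utilization samples (%) for one host over a report period,
# with a single one-off spike to 97%.
cpu = [22, 25, 24, 30, 28, 26, 31, 27, 29, 97]
print(summarize(cpu))  # {'avg': 33.9, 'p90': 31, 'peak': 97}
```

The single spike drives the peak to 97% and pulls the average up, while the 90th percentile stays close to the host's typical load.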
If all the diamonds for a cluster fall in the low-use zone, the cluster is under-used with respect to its CPU capacity: the host using the least CPU can be removed from the cluster and added to another cluster where there is demand.
If all the diamonds for a cluster fall in the major-use zone, the cluster is over-used with respect to its CPU capacity.
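The two rules above can be sketched as a simple classifier over each host's three CPU markers. The zone boundaries below are hypothetical placeholders, not vPV's actual marker values, and treating the critical zone as "at least major" is our own interpretation; the host names are also illustrative.

```python
# Hypothetical zone boundaries (percent); vPV's actual marker values may differ.
LOW_USE_MAX = 30.0
MAJOR_MIN = 70.0
CRITICAL_MIN = 90.0

def zone(util_pct):
    """Map one utilization value (%) to a chart zone."""
    if util_pct >= CRITICAL_MIN:
        return "critical"
    if util_pct >= MAJOR_MIN:
        return "major"
    if util_pct < LOW_USE_MAX:
        return "low"
    return "normal"

def assess_cluster_cpu(hosts):
    """hosts maps host name -> (avg, p90, peak) CPU utilization in percent,
    i.e. the three diamond markers plotted for that host."""
    markers = [zone(v) for values in hosts.values() for v in values]
    if all(z == "low" for z in markers):
        least = min(hosts, key=lambda h: hosts[h][0])  # lowest average CPU
        return f"under-used: consider moving {least} to a cluster with more demand"
    # Assumption: a critical marker counts as at least as busy as a major one.
    if all(z in ("major", "critical") for z in markers):
        return "over-used with respect to CPU capacity"
    return "mixed: review individual hosts"

print(assess_cluster_cpu({"esx01": (12.0, 15.0, 22.0), "esx02": (8.0, 11.0, 19.0)}))
```

Checking all three markers, rather than only the average, follows the "all diamonds" wording: a single high peak keeps a cluster out of the under-used category.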
The above picture represents a cluster that has more processing power (CPUs/cores) than it needs and slightly less memory than it needs. Note that the memory usage value shown for the hosts is derived from the vCenter 'mem.consumed' counter.
The three memory markers for a host sit close together because the mem.consumed metric stays nearly constant over time. For this reason, the peak, average and 90th-percentile markers for some hosts appear close to, or on top of, one another.
Next, the report shows the trend of memory and CPU utilization of the hosts in the cluster as a stacked area chart. This helps identify how utilization changes over time and, specifically, which host is used more or less during which part of the month. These charts deliberately use raw values ('MHz used' and 'GB used') rather than percentages.
NOTE: While the scatter chart shows the range of utilization values that each host reached, the chart below shows the utilization trend. The trend is consistent with the scatter chart and shows which hosts are constantly high resource users.
The cluster administrator can use this to balance utilization among the hosts in this DRS cluster. This particular DRS cluster has full automation for vMotion turned on, which results in an almost equal distribution of load across the hosts shown below.
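One way to put a number on "balanced" is to convert each host's raw 'MHz used' into a percentage of that host's capacity and look at the spread across hosts. The sketch below uses the population standard deviation as the spread measure; the tolerance, host names and figures are hypothetical and are not a vPV or DRS setting.

```python
import statistics

# Hypothetical tolerance in percentage points; not a vPV or DRS setting.
BALANCE_TOLERANCE = 10.0

def host_utilization(hosts):
    """hosts maps host name -> (MHz used, MHz capacity); returns % used per host."""
    return {h: 100.0 * used / cap for h, (used, cap) in hosts.items()}

def balance_report(hosts):
    """Summarize how evenly CPU load is spread across the hosts."""
    pct = host_utilization(hosts)
    spread = statistics.pstdev(list(pct.values()))  # population std dev across hosts
    return {
        "spread": spread,
        "balanced": spread <= BALANCE_TOLERANCE,
        "busiest": max(pct, key=pct.get),
        "idlest": min(pct, key=pct.get),
    }

cluster_cpu = {"esx01": (12000, 40000), "esx02": (20000, 40000), "esx03": (4000, 40000)}
print(balance_report(cluster_cpu))
```

Here the hosts run at 30%, 50% and 10%, so the spread is about 16.3 points and the cluster is flagged as unbalanced, with esx02 the busiest and esx03 the idlest. Normalizing to percentages first matters because hosts in a cluster can have different capacities, so equal raw MHz does not mean equal load.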
The report also shows IO rates for the hosts' network cards and storage drives. The charts below show that a couple of hosts in the middle of the chart have high IO, especially disk IO, while the other hosts show moderate to low IO.
The cluster administrator can use this information to balance IO across the hosts in the cluster and their storage paths. For instance, VMs on host iwftesx0204 may be doing high IO, so it is recommended to move some of these VMs to other hosts that use different paths to the storage. An IO bottleneck could result if many high-IO VMs start using the same storage paths.
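Spotting the high-IO hosts can be sketched as a simple outlier check: flag any host whose IO rate exceeds some multiple of the cluster median. The factor of 2 and the sample rates are hypothetical, chosen only to illustrate the idea.

```python
import statistics

def high_io_hosts(io_rates, factor=2.0):
    """io_rates maps host name -> average disk IO rate (any consistent unit, e.g. KB/s).
    Flags hosts whose rate exceeds `factor` times the cluster median."""
    median = statistics.median(list(io_rates.values()))
    return sorted(h for h, rate in io_rates.items() if rate > factor * median)

# Illustrative rates: two hosts in the middle of the cluster doing heavy disk IO.
rates = {"esx01": 800, "esx02": 950, "esx03": 5200, "esx04": 4800, "esx05": 700}
print(high_io_hosts(rates))  # ['esx03', 'esx04']
```

The median is used instead of the mean because the busy hosts themselves would inflate a mean-based threshold and could hide one another.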
Finally, the report shows the memory savings in the cluster that result from virtualization and VMware's memory management techniques.
VMware's memory management allows identical memory pages to be shared between virtual machines on the same host, a technique known as Transparent Page Sharing (TPS). When the VMs on a host all run the same or similar workloads, the savings from page sharing can be very high. Low memory savings do not necessarily mean the cluster is inefficient, but the metric is still worth tracking to see where there is room for improvement.
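As a rough way to reproduce a savings figure from vCenter data, the sketch below estimates per-host page-sharing savings as mem.shared minus mem.sharedcommon (both reported in KB) and sums them across the cluster. This is a common way to estimate TPS savings, but the source does not say how vPV computes its figure, so treat the formula as an assumption; the counter values are illustrative.

```python
KB_PER_GB = 1024 * 1024

def tps_savings_kb(shared_kb, shared_common_kb):
    """Estimated memory saved by page sharing on one host, in KB.
    Assumes mem.shared = guest memory that is shared, and
    mem.sharedcommon = machine memory actually backing those shared pages."""
    return max(shared_kb - shared_common_kb, 0)

def cluster_tps_savings_gb(counters):
    """counters maps host name -> (mem.shared KB, mem.sharedcommon KB)."""
    total_kb = sum(tps_savings_kb(s, c) for s, c in counters.values())
    return total_kb / KB_PER_GB

mem_counters = {
    "esx01": (8 * KB_PER_GB, 2 * KB_PER_GB),  # 8 GB shared, backed by 2 GB
    "esx02": (4 * KB_PER_GB, 1 * KB_PER_GB),  # 4 GB shared, backed by 1 GB
}
print(cluster_tps_savings_gb(mem_counters))  # 9.0
```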
To summarize, this report, available for VMware clusters in vPV, is useful for assessing the efficiency of compute and IO resources and for identifying the most-used and least-used hosts in the cluster.
NOTE: Reports are not available in the free version of vPV. To turn on the reporting feature, purchase a vPV license; reports are also available in evaluation mode.
TIP: You can go directly to vPV's workbench view via this URL -