07-12-2012 12:12 AM
We stopped using the VSAs. We can't work with the product if it isn't stable and lacks proper support :(
VMware's last advice was to set the --iops parameter to 1.
Hope you have more luck
11-03-2012 08:00 AM
Let's do some new testing with ESXi 5.1 and LeftHand OS 10 as soon as it's GA (December 4th).
I hope the problems are fixed by now...
The new version 10 VSAs have 2 vCPUs and should perform a lot better.
11-03-2012 04:58 PM
I hope to see the write performance increase in 10.0 as I did notice a dip in throughput after upgrading to 9.5.
I think the one thing HP needs to do is add a CMC plugin for the vSphere Client. If I could manage my SANiQ cluster from the vSphere Client, that would be killer.
11-06-2012 07:37 AM
Of course, I have bonding in place before presenting to the VM (VSA), but it is failover type (active-passive), i.e. in real time I can see traffic flowing on only one NIC.
11-06-2012 12:24 PM
I don't really follow VMware much... can it not do something like LACP bonding at the host? I would have thought it could. If not, I guess the guest is the only option, but LACP or the equivalent requires the switches to accept that active-active link, so if you don't have that, you can't really solve your problem through the VSA anyway.
11-06-2012 01:26 PM
VMware can do NIC bonding, but LACP doesn't really give you twice the bandwidth unless you are talking to multiple destinations. It isn't the recommended way of doing things with VMware, as their built-in load balancing and redundancy work better.
Also, you won't have more than 1 Gbps unless you are using VMXNET3, which is what this whole thread is about. All of the other NICs are 1 Gbps, so bonding channels won't actually get you 2 Gbps of throughput out of a 1 Gbps virtual NIC.
02-20-2013 07:56 PM
I found that the cause of the high latency is setting the IOPS per path lower than the default of 1000, using the command:
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxx --iops 1 --type iops
Watching the device latency in esxtop shows that after applying this setting, the latency to my SANiQ volumes increases dramatically for any virtual machine that happens to be running on the same host as the gateway VSA. In some cases the latency is in the thousands of milliseconds.
Changing the IOPS setting on the fly and watching esxtop can reproduce or eliminate the high latency at will.
If someone else could try this and verify they see similar behavior that would be awesome.
My configuration mimics what HP recommends: separate vSwitch and (4) NICs for iSCSI, iSCSI port bindings, PSP set to round robin, etc.
Hope to hear back from some of you.
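For anyone who wants to try reproducing this, here is a rough sketch of checking the setting and reverting it to the default from the ESXi shell. The device ID naa.xxx is a placeholder; substitute your own from `esxcli storage nmp device list`.

```shell
# Show the current round-robin path-selection settings for a device.
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxx

# Revert to the default of 1000 IOPS per path.
esxcli storage nmp psp roundrobin deviceconfig set \
    --device=naa.xxx --iops=1000 --type=iops

# Then watch DAVG/cmd in esxtop (press 'u' for the disk-device view)
# while toggling the value between 1 and 1000.
esxtop
```

These commands only run on an ESXi host, so treat them as a configuration sketch rather than something to paste blindly.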
02-21-2013 02:11 PM
Just to clarify: when the default is used you see low latency, and when you set it to 1 you see high latency?
Have you tried somewhere down the middle, say 100?
02-21-2013 05:37 PM
I think the root cause of the latency is having a VSA portgroup sharing vmnics that are used for iSCSI port bindings. I am testing to prove that theory now and will report my findings.
03-06-2013 03:55 AM
For those of you reporting performance issues, how are your vSwitches configured? I ran into a huge latency issue (ESX 5.0 software iSCSI, SANiQ 9.5) when my vmk bound to iSCSI was sharing the same vSwitch as the VSA node... this was with the system largely idle. By simply moving the VSA to its own vSwitch and forcing iSCSI traffic out through the physical switch, we saw dramatic improvement. The latency was only being seen by the ESX host local to the VSA; the conditions were reproducible with other ESX hosts. Each time, the second ESX host in the cluster (remote to the first VSA) saw no latency issues when attached to the first VSA. A single physical switch with separate VLANs for iSCSI and DATA was used to connect the ESX hosts.
I've only deployed the VSA using the Flexible adapter, always keeping the VMXNET3 NIC disconnected in vSphere. The iSCSI VLAN is also always routed to allow the VSAs to communicate with email, NTP, CMC, etc. in the data network. HP VSA is a great piece of software IMO; I will be testing ESX 5.1 with SANiQ 10 in the coming weeks.
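If it helps anyone reproduce the fix, the split can be done from the ESXi shell roughly like this. The vSwitch name, portgroup name, and vmnic number below are example values; adjust them to your host.

```shell
# Create a dedicated vSwitch for the VSA with its own uplink.
esxcli network vswitch standard add --vswitch-name=vSwitch2
esxcli network vswitch standard uplink add --vswitch-name=vSwitch2 --uplink-name=vmnic4
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch2 --portgroup-name="VSA Network"
```

Then, in the vSphere Client, point the VSA VM's network adapter at the new "VSA Network" portgroup, leaving the iSCSI port bindings on the original vSwitch.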
03-06-2013 01:14 PM
"For those of you reporting performance issues, how are your vSwitches configured? I ran into a huge latency issue (ESX 5.0 software iSCSI, SANiQ 9.5) when my vmk bound to iSCSI was sharing the same vSwitch as the VSA node... this was with the system largely idle. By simply moving the VSA to its own vSwitch and forcing iSCSI traffic out through the physical switch, we saw dramatic improvement."
This is exactly what I am seeing. I wrote in my previous post that the iSCSI port bindings and the VSA on the same vSwitch seems to be the root cause of the majority of the latency. Setting the IOPS = 1 makes the problem much worse.
After separating the VSA from the iSCSI vSwitch, the latency improved dramatically. Changing the IOPS setting doesn't seem to make any difference with this configuration.
In some cases I see a VSA max out a 2 Gbps EtherChannel (3-node cluster). I would imagine there must be extreme resource contention when that volume of traffic is mixed with iSCSI traffic.
It seems separating the VSA and the iSCSI initiator is the way to go.
03-13-2013 02:51 PM
I still find it very hard to believe that HP hasn't fixed this issue yet.
03-13-2013 03:00 PM
You could do something like this: http://blog.davidwarburton.net/2010/10/25/rdm-mapp
But this is NOT supported by VMware. I personally won't run my production storage on an unsupported solution.
Some local storage can use RDMs out of the box; if you have that set up, then great, run with it. I would, because I also feel like RDMs should be at least a little bit faster. I have just never seen any documentation showing that they are. VMware will tell you there is no performance improvement, and a former director at LeftHand also told me there was no improvement.
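For reference, RDM pointer files are created with vmkfstools; a rough sketch follows. The device ID and datastore paths are placeholders, and doing this for local storage is the unsupported part.

```shell
# -r creates a virtual-compatibility RDM (SCSI commands are virtualized).
vmkfstools -r /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx \
    /vmfs/volumes/datastore1/vsa1/vsa1-rdm.vmdk

# -z creates a physical-compatibility (pass-through) RDM.
vmkfstools -z /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx \
    /vmfs/volumes/datastore1/vsa1/vsa1-rdmp.vmdk
```

The resulting .vmdk pointer file is then attached to the VSA VM like any other virtual disk.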
03-15-2013 11:38 AM
I tried to configure my VSAs to use RDMs and even carved out 5 additional LUNs on my RAID controller to do it. Unfortunately, the RAID controller didn't allow me to use them; I think the exact reason was that the RAID controller wasn't reporting a unique NAA number for each LUN. I read on these forums that RDMs cut down on the latency, which made me eager to give it a shot.
Now that I have eliminated as much latency as possible via network and iSCSI settings, I notice the read latency is still a little high for the VSA. With very low IO to the SAN (in the dozens of IOPS according to CMC), the VSA read latency hovers around 20 ms. The write latency is fine, likely due to the RAID controller cache. I see this same behavior on systems with 8 spindles and on systems with 25 spindles. I guess this is more or less normal, as even the best predictive read-ahead-and-cache algorithm won't help with random reads.
When you factor in how the VSA operates (for example, the gateway VSA may have to request blocks from the other VSAs, wait on that server's seek times, and then transfer the IO back through itself to the initiator), it makes sense that the latency is going to be a little higher than usual.
If HP would create their own NMP SATP/PSP for ESXi that functions similarly to how the DSM for Windows works, that would probably help with performance. If I understand correctly, the DSM for Windows maintains a connection to each VSA and accesses the appropriate VSA directly for any given block. Someone had a good post on here recently that called out the differences.
I can live with the latency because of what the VSA allows me to do. If space is a concern and you have extremely high consolidation ratios, the VSA is the best option out there.
08-23-2013 07:53 PM
I am using VMware ESXi 5.0.0 build-469512 (ESXi 5.0 base). I have a single-node 9.5 VSA deployed on this ESX server, and I have a vSwitch configured so that my iSCSI adapter vmhba33 is bound to the vmkernel port on vSwitch1. vSwitch1 is also where all VSA iSCSI traffic is configured to be.
I have an HP DL370 G6 with the integrated NC375i (quad-port network card) and a P410i Smart Array controller.
I created a volume in CMC and presented it to my ESXi 5.0 server via iSCSI. I created a datastore on this volume and use it for IO tests to try to duplicate the high latency issues.
I am unable to duplicate the issue you encountered and was hoping you could share more configuration details.
On the config above, a ~1GB file create using dd takes about 12 seconds, giving write throughput in the 85-100 MB/s range with latency in the sub-millisecond range.
/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # time dd if=/dev/zero of=testfile bs=1024 count=1024000
1024000+0 records in
1024000+0 records out
real 0m 12.07s
For a ~10GB file create, similar latencies are observed and I get:
/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # time dd if=/dev/zero of=testfile bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
real 1m 41.95s
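For reference, the dd numbers above work out as bs x count / elapsed time; a quick awk sketch (the `throughput` helper is just a name I made up for this):

```shell
# Compute write throughput in MB/s from dd's block size, block count,
# and the elapsed seconds reported by `time`.
throughput() {
    awk -v bs="$1" -v count="$2" -v secs="$3" \
        'BEGIN { printf "%.1f MB/s\n", bs * count / 1e6 / secs }'
}

throughput 1024 1024000  12.07    # ~1 GB test
throughput 1024 10240000 101.95   # ~10 GB test
```

The 10 GB run lands right around 100 MB/s, which lines up with the throughput described above.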
So I think I may be missing a key configuration item needed to duplicate these high-latency issues.
For my test, I decided to use a single vSwitch and keep everything (VSA, management network, and iSCSI traffic) on that same switch. So vmhba33 (the iSCSI initiator) is bound to only vmk0.
/vmfs/volumes/52180571-4a9dfce4-fab5-0025b3a8ec7a # esxcli iscsi logicalnetworkportal list -A vmhba33
Adapter  Vmknic  MAC Address        MAC Address Valid  Compliant
-------  ------  -----------------  -----------------  ---------
vmhba33  vmk0    00:25:b3:a8:ec:78  true               true
The vSwitch info looks like this:
Num Ports: 128
Used Ports: 5
Configured Ports: 128
CDP Status: listen
Beacon Enabled: false
Beacon Interval: 1
Beacon Threshold: 3
Beacon Required By:
Portgroups: VM Network, Management Network
09-06-2013 06:28 PM
Try doing two Storage vMotions: one from DAS over the network to the VSA,
and one from iSCSI to DAS.
Then run CDM (CrystalDiskMark, pure random mode) on 3 clients at the same time, 100% random read/write.
Watch the network stack crumble and the hosts doing the Storage vMotions skew their time.
03-20-2014 06:26 PM
I know that this is a very old thread, but I seem to be having this issue with ESX 5.x and VSA 11.0.
Was this ever resolved? Also, I do not see a way to change the VMXNET3 adapter to E1000 (at least not during the install).
Please advise. Thanks.
04-01-2014 04:18 PM
FYI: I am also having the same problem, very high cluster write latency, with ESXi 5.5 (build 1623387) and VSA 11.0, with the latest patches on everything.
Write latency on each node is fine; only the cluster write latency is bad, 50 ms to 150 ms.
It seems to be vSwitch-related, like this thread says. I will try to split my VSA and my software iSCSI ports as suggested. What a waste of NICs.