Performance issues when using VSA on ESX with VMXNET3 driver (12446 Views)
Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 1 of 77 (12,446 Views)

Performance issues when using VSA on ESX with VMXNET3 driver

Hi,

 

I want to share a big performance issue with you.

 

There is currently a serious problem when running HP P4000 VSAs on VMware with the VMXNET3 driver.

When the VSA is co-located on an ESX server with other VMs, and the gateway node of a SAN volume is the locally hosted VSA node, there is a huge performance problem whenever the ESX server itself uses the volume (for example, when deleting a snapshot).

Latency on the volume goes sky high (300+ ms) and IOs are very slow.

 

VMware also acknowledges this problem. There seems to be an issue with the TSO (TCP Segmentation Offload) of the VMXNET3 driver: it is bypassed by the ESX server, which causes severe performance degradation.

 

When you change the VSA's adapter from VMXNET3 to E1000 the problem goes away; however, I'm still waiting on a reply from HP as to whether using E1000 is supported.

 

I'll keep you updated.

Wvd
Occasional Advisor
Wvd
Posts: 5
Registered: ‎01-17-2012
Message 2 of 77 (12,426 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Please keep us posted on this issue.


We are experiencing something similar.

Our config is two VSA 9.5 nodes with flexible NICs on DL380 G7s running ESXi 4.1 U1.

 

The cluster performs normally for some time, but then suddenly one of the ESX hosts experiences heavy write latency (150+ ms) to its local disk. Due to Network RAID-10 this affects the whole P4000 cluster.

The only way to restore performance is to shut down the badly performing VSA node and reboot the ESXi server.

 

What's strange in our case is that local disk performance remains affected even after shutting down the node.

Adding a local disk on the VSA datastore to a virtual machine still shows bad write latency.

This leads me to believe that the write cache got disabled for some reason but the hardware status makes no mention of this.

 

Sounds like a hardware issue but we have seen the local write latency happen on both servers.

Firmware is up to date with the latest firmware DVD, 9.30.

 

Your story makes me reconsider the vmxnet3 driver as a suspect.

The NIC is configured as flexible, but the vmxnet3 driver is mentioned in the VSA's kernel.log:

 

Jan 16 09:19:09 vsa2 kernel: VMware vmxnet virtual NIC driver
Jan 16 09:19:09 vsa2 kernel: GSI 18 sharing vector 0xB9 and IRQ 18
Jan 16 09:19:09 vsa2 kernel: ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 19 (level, low) -> IRQ 185
Jan 16 09:19:09 vsa2 kernel: Found vmxnet/PCI at 0x14a4, irq 185.
Jan 16 09:19:09 vsa2 kernel: features: ipCsum zeroCopy partialHeaderCopy
Jan 16 09:19:09 vsa2 kernel: numRxBuffers = 100, numRxBuffers2 = 1
Jan 16 09:19:09 vsa2 kernel: VMware vmxnet3 virtual NIC driver - version 1.0.11.1-NAPI

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 3 of 77 (12,420 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Just got off the phone with HP support.

The E1000 driver is officially not supported by HP. But HP support advises me to use it anyway if it performs better in our case?!

 

They won't investigate the problem further because in their opinion it's a VMware problem and VMware should fix it.

 

I'm awaiting further information from VMware.

 

In the meantime, my opinion is that the LeftHand VSA is crippled at the moment and should not be used on ESX servers that also host VMs locally until this problem is fixed.

 

Using the flexible interface doesn't show the extreme behaviour of VMXNET3, but it also shows weird latencies at times.

 

I have tested this on several different hardware platforms, all with the same problem.

 

VMware also mentions the following possible workaround: create a separate vSwitch to which you connect only the VSA. This option needs additional hardware NICs, but this way the TSO of the VMXNET3 driver won't be bypassed.

 

Frequent Advisor
Tedh256
Posts: 57
Registered: ‎11-29-2011
Message 4 of 77 (12,417 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

"This way the TSO of the VMXNET3 driver wont be bypassed."

 

What is the "TSO"?

 

Also - best practice already dictates that a separate vSwitch be used for the VSA/iSCSI traffic - that should be no burden! If you are planning a virtual host, you need to incorporate enough interfaces for the host storage access and guest communication, but ...

 

I am not certain that I understand why/how having a separate vSwitch for the VSAs prevents the VMXNET3 "TSO bypass" - could you help me understand what's going on?

 

 

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 5 of 77 (12,412 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

TSO = TCP Segmentation Offload. So segmenting large TCP packets is done by hardware (the NIC) instead of the CPU.

 

iSCSI traffic should always be on a separate vSwitch indeed. But VMware meant a separate vSwitch for the VSA and another separate vSwitch for the VMkernel iSCSI network. So when you also want redundancy, you need four physical NICs this way: two for each vSwitch.

 

When using two vSwitches, VMware uses a different internal communication path, and this way the TSO of the VMXNET3 driver could function properly. (I haven't tested this possible workaround, however!)
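For anyone who wants to try that workaround, the host-side setup would look something like this (ESXi 4.x esxcfg-style commands; the vSwitch, portgroup, vmnic names, and IP are just examples, not from our setup):

```shell
# Separate vSwitch carrying only the VSA's virtual NIC:
esxcfg-vswitch -a vSwitch2                  # create the vSwitch
esxcfg-vswitch -L vmnic2 vSwitch2           # dedicate a physical uplink
esxcfg-vswitch -A "VSA-iSCSI" vSwitch2      # portgroup for the VSA's vNIC

# Separate vSwitch for the VMkernel iSCSI interface:
esxcfg-vswitch -a vSwitch3
esxcfg-vswitch -L vmnic3 vSwitch3
esxcfg-vswitch -A "VMkernel-iSCSI" vSwitch3
esxcfg-vmknic -a -i 10.0.0.11 -n 255.255.255.0 "VMkernel-iSCSI"
```

For uplink redundancy each vSwitch would need a second vmnic, which is where the four physical NICs mentioned above come from.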

 

Frequent Advisor
Tedh256
Posts: 57
Registered: ‎11-29-2011
Message 6 of 77 (12,407 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

huh

 

but this problem only applies to situations where you are running VMs (other than the VSA VMs themselves, I presume?) on local storage?

 

Why would you want to do that - if these hosts are running VSAs wouldn't you simply use up all local storage so that it can be presented as shared storage?

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 7 of 77 (12,403 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

No, all local storage is used by the HP VSA and is presented as an iSCSI volume/datastore to the ESX servers (a setup used in small enterprises).

When you have VMs hosted on the same ESX node as the VSA (which is used as gateway node for the volume), you hit this problem when, for example, deleting a VMware snapshot - in all cases where the ESX node itself communicates with the datastore.

 

Traffic from within VMs to the datastore doesn't suffer from this problem.

 

So the server's local storage is used only by the HP VSA.

 

Just test it for yourself:

Deploy a VSA (with a VMXNET3 NIC) on a single ESXi 4.1 server

Create a volume on the VSA and create a VMware datastore on it

Now let the ESXi server generate some traffic on the datastore by committing a snapshot - or, a much easier way:

 

Execute the following commands from an SSH shell on the ESXi node:

# cd /vmfs/volumes/[datastorename goes here]

# time dd if=/dev/zero of=testfile count=102400 bs=1024

 

This command creates a 100 MB test file on the datastore. Creating a 100 MB file should be a matter of 1-2 seconds, yet times can go up to minutes.

Also check the read and write latency of the datastore from the VI Client/vCenter (200+ ms as soon as you start creating the file!).
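One caveat with the dd test above: without a sync flag, dd can end up timing the buffer cache rather than the datastore. A variant that forces the data out before the timer stops (a sketch - it assumes your dd supports conv=fsync; the OUT path is an example):

```shell
# Same 100 MB write, but fsync before dd exits so the timing reflects
# the datastore and not just cached writes. OUT is an example path.
OUT="${OUT:-/tmp/testfile}"   # e.g. /vmfs/volumes/<datastorename>/testfile
time dd if=/dev/zero of="$OUT" bs=1024 count=102400 conv=fsync
wc -c "$OUT"                  # should report 104857600 bytes
```

Remember to delete the test file afterwards so it doesn't linger on the datastore.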

 

 

 

Frequent Advisor
RonsDavis
Posts: 56
Registered: ‎06-25-2010
Message 8 of 77 (12,371 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

On my 9.0 VSAs the NICs are set to flexible. Why are you using VMXNET3 anyway? Does it come standard on newer OVFs?

 

Advisor
virtualmatrix
Posts: 15
Registered: ‎07-20-2009
Message 9 of 77 (12,360 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

 

FWIW --

 

We saw similar symptoms a year or so ago, but it was reproducible with any virtual nic device and on both 10 GigE and 1 GigE networks.  With that said, perhaps it could be more prevalent with vmxnet3 or perhaps it was just a different issue altogether.

 

Do you see this problem with vmxnet2?

 

In our case, the cause was thought to be due to vmkernel race and locking issues across the multiple vmdk layers.  It was most easily triggered with operations such as cloning, snapshots, and zeroing... but it wasn't reproducible-on-demand.  We changed all of our VSAs to use RDMs to the local storage instead of VMDKs-on-VMFS and the problems immediately disappeared.

 

That was back with ESXi 4.x and VSAs at SAN/IQ 8.x.  We're now running ESXi 5.0 and San/IQ 9.5.  Some VSAs are using vmxnet2 -- no issues.  We haven't tried vmxnet3.

 

Using RDMs removes an "unnecessary" layer, since the only thing on the datastore is the data VMDKs for the VSA anyway.  It may be quicker and easier for new administrators to set up a VSA by just setting up VMDKs on a VMFS, but it sounds like you're quite comfortable getting around the ESXi shell.  To create the RDMs, we used vmkfstools.
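For anyone wanting to try the same thing, the vmkfstools step looks roughly like this (run in the ESXi shell; the device identifier and paths below are placeholders, not values from our setup):

```shell
# Find the local device's identifier (the naa.* name):
ls /vmfs/devices/disks/

# Create a physical-mode (passthrough) RDM pointer file, then attach the
# resulting .vmdk to the VSA VM. Use -r instead of -z for virtual mode:
vmkfstools -z /vmfs/devices/disks/naa.XXXXXXXXXXXXXXXX \
    /vmfs/volumes/local-ds/vsa1/vsa1-rdm.vmdk
```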

 

HTH

Regular Advisor
5y53ng
Posts: 160
Registered: ‎07-27-2011
Message 10 of 77 (12,340 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

This is very interesting. I experienced this behavior as well, but I was unaware of the root cause. I witnessed extremely high latency numbers and poor throughput when I used the VMXNET3 adapter on my VSA, and could only clear the symptoms by rebooting the host. Since I was unable to explain the cause of the latency problems, I abandoned further testing with the VMXNET3. When the VMXNET3 was working properly, I did not see a significant performance increase over the flexible adapter anyway.

 

 

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 11 of 77 (12,336 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

[ Edited ]

When you deploy a VSA from the OVF it has two NICs: one flexible and one VMXNET3.

I was told the flexible interface is for cases where management traffic is not possible over the same network as storage, so you can delete one of them. VMXNET3 should offer better performance than the flexible adapter, so I always remove the flexible one.

 

The case at HP support didn't work out. HP's statement is that it's a VMware problem and VMware should solve it?!

 

The case at VMware is making progress. They are actively investigating it. This morning I performed several test scenarios and collected log files for VMware to analyse.

 

I'll keep this thread updated on the progress.

Wvd
Occasional Advisor
Wvd
Posts: 5
Registered: ‎01-17-2012
Message 12 of 77 (12,331 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

HP is very vague about why there are suddenly two NICs in the VSA 9.5 OVF. This is very confusing, and no clear statement exists as to which adapter should be used for iSCSI traffic. HP should make a definitive statement and release a revised OVF with one adapter, or at least one TYPE of adapter...

Regarding performance issues, I am performing tests in my lab and seeing interesting results. Will post back soon...
Regular Advisor
5y53ng
Posts: 160
Registered: ‎07-27-2011
Message 13 of 77 (12,330 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver


M.Braak wrote: 

When using two vSwitches, VMware uses a different internal communication path, and this way the TSO of the VMXNET3 driver could function properly. (I haven't tested this possible workaround, however!)

 


Using two vSwitches as described above, the iSCSI traffic must traverse the physical network to reach the gateway VSA. This is strange, since we would expect to benefit from using the VMXNET3 with traffic that remains on the vSwitch and does not cross the physical network. I believed this to be where the VMXNET3 adapter would provide some benefit, but I guess it's time to refresh my memory and read up on the different types of virtual network adapters...

Regular Advisor
5y53ng
Posts: 160
Registered: ‎07-27-2011
Message 14 of 77 (12,326 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver


M.Braak wrote:

When you deploy a VSA from the OVF it has two NICs: one flexible and one VMXNET3.

I was told the flexible interface is for cases where management traffic is not possible over the same network as storage, so you can delete one of them. VMXNET3 should offer better performance than the flexible adapter, so I always remove the flexible one.


I was never able to get the CMC to connect to the second NIC on any of my VSAs. The CMC would only connect to the NIC that was set as the SANiQ interface. When I would change the SANiQ interface, I was unable to reach the VIP on my iSCSI network. Is there something special you have to do in order to use the second NIC for management?

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 15 of 77 (12,318 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver


5y53ng wrote:
I was never able to get the CMC to connect to the second NIC on any of my VSAs. The CMC would only connect to the NIC that was set as the SANiQ interface. When I would change the SANiQ interface, I was unable to reach the VIP on my iSCSI network. Is there something special you have to do in order to use the second NIC for management?

I never used two interfaces, so I can't tell you, but this is from the help function:

  • When configuring a management interface on a P4000 storage system, you must designate the storage interface as the SAN/iQ interface for that storage system in the CMC. This is done on the Communications tab in the TCP/IP configuration category for that storage system

Wvd
Occasional Advisor
Wvd
Posts: 5
Registered: ‎01-17-2012
Message 16 of 77 (12,292 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

[ Edited ]

I have finished my testing and have come to the following conclusion:

 

Any P4000 VSA 9.5 with a virtual machine hardware version above 4 is performance-impaired.

 

I have come to this conclusion by testing a lot of different configurations.

My test setup:

 

DL380 G7 with 12 450GB 10K SAS disks in RAID10

HP 2910AL switches

Dedicated iSCSI network and adapters

 

I tried a lot of different 9.5 VSA configurations but these are the most common:

 

  • VSA 9.5 with flexible adapter
  • VSA 9.5 with VMXNET3
  • VSA 9.0 with flexible adapter and upgraded to 9.5

Created a separate management group for all of them and created a volume that the ESXi server would connect to.

 

I created datastores on the volumes and deployed a clean VSA 9.5 OVF on each datastore. These would not be offering storage; they served simply as a quick test of virtual machine boot performance and latency to the datastores.

 

The results:

 

Booting the virtual machine resulted in latency spikes of 200-300 ms on all datastores except the 9.0-to-9.5 upgraded VSA with the flexible adapter; there, latency never went above 5 ms.

 

I also timed the boot to verify whether performance was indeed impacted.

The upgraded 9.0-to-9.5 VSA booted the virtual machine in 1 min 10 sec.

All others booted it in 1 min 30 sec.

 

The difference between an upgraded 9.0 to 9.5 and a newly deployed VSA 9.5 lies mainly in the fact that the upgraded VSA stays on VM hardware version 4.

My suspicion was confirmed when I upgraded the hardware version and the boot time of the test VM immediately went to 1m30s and high latency appeared.

 

As a final test I also tried using a raw device mapping to local storage on a new VSA 9.5, as mentioned in this thread. This improved performance (boot time went to 1 min 15 sec), but latency was still too high and spiky.

 

These tests were performed on ESXi 4.1 U1 and 5.0. It made no difference.

 

I am definitely keeping my VSAs on HW version 4.
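For anyone checking their own VSAs: the hardware version is recorded in the VM's .vmx file, so a quick look from the ESXi shell tells you where you stand (a sketch; the path is an example - adjust it to your datastore and VM name):

```shell
# Hardware version 4 shows up as virtualHW.version = "4"; a freshly
# deployed 9.5 OVF on ESXi 4.1/5.0 shows "7". VMX is an example path.
VMX="${VMX:-/vmfs/volumes/local-ds/vsa1/vsa1.vmx}"
[ -f "$VMX" ] && grep -i '^virtualHW.version' "$VMX" || echo "no vmx at $VMX"
```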

 

 

 

 

 

 

Occasional Visitor
yaodongxian
Posts: 1
Registered: ‎01-25-2012
Message 17 of 77 (12,202 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

How can you keep VM hardware version 4 if you deploy the 9.5 VSA OVF to ESXi 5?

 

I am testing a 6-node P4000 VSA cluster with ESXi 5, and the VM hardware version is 7.

 

I am also experiencing terrible performance. I have 6 nodes in the cluster; each node has 5 disks, with VMXNET3 and one 10G port as the vSwitch uplink. I only get about 30 MB/s throughput.

Wvd
Occasional Advisor
Wvd
Posts: 5
Registered: ‎01-17-2012
Message 18 of 77 (12,191 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver


yaodongxian wrote:

How can you keep VM hardware version 4 if you deploy the 9.5 VSA OVF to ESXi 5?

 


The only way is to deploy an older 9.0 VSA OVF and then use the CMC to upgrade it to 9.5.

Keep us posted on the results...

Regular Advisor
5y53ng
Posts: 160
Registered: ‎07-27-2011
Message 19 of 77 (12,185 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver


Wvd wrote:

Created a separate management group for all of them and created a volume that the ESXi server would connect to.

 

I created datastores on the volumes and deployed a clean VSA 9.5 OVF on each datastore.

 

Your results are interesting, but could you clarify the above quote for me? Did your test consist of a single ESXi host with three VSAs, each in its own cluster, serving up a single volume? I would like to duplicate your test as closely as possible to see if I get the same results.
Thanks.
Wvd
Occasional Advisor
Wvd
Posts: 5
Registered: ‎01-17-2012
Message 20 of 77 (12,167 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Correct: a single ESXi host with three VSAs, each in its own cluster, serving up a single volume.
Regular Advisor
5y53ng
Posts: 160
Registered: ‎07-27-2011
Message 21 of 77 (12,162 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

Your results reinforce my findings while testing. While my test configurations were much different from yours, I concluded there was something wrong with either the VSA or the iSCSI stack in ESXi 5. Write performance was terrible with this combination of VMware and the HP VSA.

 

Using ESXi 5 and VSA 9.5, I was unable to achieve more than 135 MB/sec in IOmeter with a sequential 64 KB 100% write workload. I had much better results with ESX 4.1 and VSA 9.0: roughly 230 MB/sec with the same IOmeter test profile.

 

 

Frequent Advisor
RonsDavis
Posts: 56
Registered: ‎06-25-2010
Message 22 of 77 (12,157 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

I'm going to build a pair of test VSA boxes and run through a set of tests. The variables I'm looking at right now are RDM vs VMDK, hardware version 4 vs 7 vs 8, 9.0 vs 9.5, and flexible vs VMXNET3. Anyone else have any other ideas I should test against?

What I'll basically do is set up IOmeter on a VM and add the entire available VSA storage to it as a second drive. I'll give it 4 CPUs, with 16 queued requests, since I'll have 8 drives in each node. The test will likely be "All in One", run for at least a couple of hours.

I'll post results when I'm done, which will be a week or so.

 

Occasional Advisor
dch15
Posts: 7
Registered: ‎05-27-2010
Message 23 of 77 (11,958 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

We are seeing similar problems with two DL380 G7s and the 9.5 VSA. We're running ESXi 5 from SD cards on the DL380s and then using the local storage for the VSA.

 

Have you heard back from HP yet on whether they have confirmed the problem, whether the fix is to use the E1000 driver, and/or whether it is related to the VM hardware version?

 

Thanks,

 

Dan

Frequent Advisor
M.Braak
Posts: 45
Registered: ‎10-20-2009
Message 24 of 77 (11,943 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

HP support told me that it is not their problem and VMware should solve it. They told me they do not support the E1000 driver, yet they tell me to use it if it fixes my problem!?
VMware has built a reproduction environment to investigate the issue, but I don't have an answer from them yet.
Using the flexible driver seems to be the best solution for now. :-(

I will keep you updated as soon as I have an answer from VMware.
Occasional Advisor
dch15
Posts: 7
Registered: ‎05-27-2010
Message 25 of 77 (11,903 Views)

Re: Performance issues when using VSA on ESX with VMXNET3 driver

HTH or others, can you explain how you got RDMs to work on local storage with the VSA? When I try to set up the VSA using RDMs, the option is grayed out. Do you know if this is supported by HP? It makes sense to me that an RDM is more "direct" than using VMDKs on VMFS, but I don't see anything about it in HP's docs.

 

We have a setup with two DL380s with local storage. We used SmartStart to set the drives up in a RAID 5 array with two logical disks (one small for the VSA itself and the other large for everything else on the array). Then we installed ESXi 5 on the small partition. When we go to set up the VSAs, how do we use RDMs with them?

 

Thanks,

 

Dan
