Regular Advisor
Mike O.
Posts: 104
Registered: 12-31-2008
Message 1 of 14

Network failover not working with ESXi5, hp Virtual Connect

We have a C7000 enclosure with a mix of BL465G5 and BL465G7 blades. We're using the 1/10 Virtual Connect Ethernet interconnect modules (no Flex-10). Our external switches are Cisco 6509 units. The blades are currently running VMware ESXi 5.

We've configured the systems using a shared uplink set and VLAN tagging (basically scenario 1:5 in the HP Virtual Connect Ethernet cookbook). The uplink set spans two interconnect modules, with each module's ports forming an LACP group to a different 6509 switch, so some ports are active and the others are in standby. Except for VMware updates and hardware firmware updates, this configuration has been pretty much unchanged for a couple of years.

Recently, during some other testing, we disconnected three of the four interconnects from the network, and the VMware blades dropped off the network. One interconnect was still connected, so the blades should have stayed connected.

After a lot of testing, configuration changes, etc., it appears that the VMware NICs aren't failing over when they should. We have "beacon probing" enabled on the vSwitch, but it doesn't seem to detect that there's no communication to the outside.

The blades all have four NICs. Our Virtual Connect configuration has two "Uplink Sets" defined, with the same VLANs in each. Two NICs go to each uplink set.

All four physical vmnics are attached to the vSwitch. vmnic0 and vmnic1 go to "uplink set 1", and vmnic2 and vmnic3 go to "uplink set 2".

The VMware "NIC Teaming" failover order is set to vmnic0, vmnic1, vmnic2, and vmnic3. For testing, we set "load balancing" to "Explicit failover order".

If "Uplink set 1" has a connection to the outside world, everything works OK. If we disconnect the connections on Uplink set 1 and only have Uplink set 2 working, VMware doesn't detect that vmnic0 and vmnic1 can't communicate, and we lose network connectivity.

If we change the vmnic failover order to have vmnic2 or vmnic3 first in the list, then it only works if Uplink Set 2 is connected.
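As a rough illustration of why the failover order decides the outcome, here is a minimal Python sketch. It assumes the vSwitch sends traffic out the first uplink in the explicit failover order that its failure detection still considers healthy; `select_active_uplink` and the `healthy` set are hypothetical names for this model, not a VMware API. If nothing marks vmnic0 as failed, traffic stays on vmnic0 even when Uplink Set 1 has no path to the outside.

```python
from typing import Iterable, Optional, Set


def select_active_uplink(failover_order: Iterable[str], healthy: Set[str]) -> Optional[str]:
    """Return the first uplink in the explicit failover order that the
    failure-detection method still considers healthy, or None if none are.

    Hypothetical model only: real ESXi teaming has more policies and states.
    """
    for nic in failover_order:
        if nic in healthy:
            return nic
    return None


ORDER = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]

# Uplink Set 1 has lost its external uplinks, but failure detection never
# marks vmnic0/vmnic1 as failed, so all four vmnics still look healthy.
undetected = select_active_uplink(ORDER, {"vmnic0", "vmnic1", "vmnic2", "vmnic3"})

# If vmnic0/vmnic1 were correctly marked as failed, traffic would move to vmnic2.
detected = select_active_uplink(ORDER, {"vmnic2", "vmnic3"})

print(undetected, detected)  # vmnic0 vmnic2
```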

 

 

We're going to open a ticket with VMware, but I was hoping someone out there has had a similar configuration and might have some insight.

Mike O.

Trusted Contributor
Hongjun Ma
Posts: 215
Registered: 02-07-2011
Message 2 of 14

Hi Mike,

What firmware version are your VC 1/10 modules running?

Also, could you post screen captures of the following:

1) SUS1 config
2) SUS2 config
3) A server profile config (main screen and Multiple Networks screen)
4) Stacking link status

BTW, with an active/standby design, when the active links in SUS1 fail, server traffic should cross the stacking link and go out through the newly active uplinks. This happens without any NIC failover on the server.
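A toy Python model of the active/standby behavior described above, under stated assumptions: the SUS1 port counts come from Mike's later post (four ports in bay 1, two in bay 2), but the selection rule in the hypothetical `pick_active_lag` (most linked ports wins, lowest bay breaks ties) is only an assumption, and the real VC criteria may differ. The point is that the server's downlink never changes state; only the egress path across the stacking link does.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lag:
    bay: int       # interconnect bay hosting this LACP group
    ports_up: int  # uplink ports in the group that currently have link


def pick_active_lag(lags: List[Lag]) -> Optional[Lag]:
    """Assumed rule: the LAG with the most linked ports is active and the
    lowest bay number breaks ties. Returns None if the SUS has no link at all."""
    candidates = [lag for lag in lags if lag.ports_up > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda lag: (lag.ports_up, -lag.bay))


def egress_path(server_nic_bay: int, active: Optional[Lag]) -> List[int]:
    """Bays a frame crosses from the server NIC's module to the active uplinks.
    The server's own downlink is never touched here, so the OS sees no failover."""
    if active is None:
        return []  # no uplink left anywhere in this SUS
    if active.bay == server_nic_bay:
        return [server_nic_bay]
    return [server_nic_bay, active.bay]  # crosses the stacking link


sus1 = [Lag(bay=1, ports_up=4), Lag(bay=2, ports_up=2)]
before = egress_path(1, pick_active_lag(sus1))  # LOM1 traffic exits via bay 1
sus1[0].ports_up = 0                            # every bay 1 uplink unplugged
after = egress_path(1, pick_active_lag(sus1))   # standby LAG in bay 2 takes over
print(before, after)  # [1] [1, 2]
```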

 

Please take a look at page 23 of this document:

http://hongjunma.wordpress.com/2011/11/28/hp-virtual-connect-technical-overview-presentation/

My VC blog: http://hongjunma.wordpress.com



Regular Advisor
Mike O.
Message 3 of 14

I don't have the screenshots available, but here's some info:

- We're running version 3.18 of the VC firmware.

1) SUS1 has bay 1 ports 1, 2, 3, and 4 (an LACP group to one switch) and bay 2 ports 1 and 2 (an LACP group to a different switch). About 8 VLAN networks are defined in the SUS.

2) SUS2 has bay 5 ports 1, 2, 3, and 4 (an LACP group to one switch) and bay 6 ports 1 and 2 (an LACP group to a different switch). The same VLAN networks are defined as in SUS1.

3) LOM1 and LOM2 go to SUS1; Mezz1 and Mezz2 go to SUS2.

The VMware team has all four vmnics. Load balancing is set to port ID, and failover detection is set to "beacon".

4) We have Ethernet interconnects in bays 1, 2, 5, and 6. We have a CX4 cable linking bays 1 and 5 and another between bays 2 and 6. All vertical and horizontal stacking links show OK.

- Within an uplink set (where some ports are active and some are standby), we do get the correct behavior when the active ones go down: the standby ones go live and traffic flows with almost no interruption. What we're missing is when all the ports in an uplink set (both active and passive) go down: VMware doesn't pick up that vmnic0 and vmnic1 aren't getting outside, so it keeps using those vmnics instead of failing over to the other vmnics in the team, which go to a different uplink set.

Respected Contributor
Psychonaut
Posts: 214
Registered: 08-31-2011
Message 4 of 14

Do you have Smart Link enabled on the SUSs?

Trusted Contributor
Hongjun Ma
Message 5 of 14

Hi Mike,

Your last statement helped me understand your problem better. I hadn't realized you were referring to the situation where you lose all uplinks for a SUS.

 

I think this is your problem when you use "beacon" with this topology of four stacked VC modules. Let's say you lose all of your uplinks for SUS1 in modules 1 and 2. Because of your stacking topology (which is set up correctly), the beacon heartbeat sent from the server to VC1 will see the stacking link to VC5, so it will be forwarded to VC5 and then to VC6 through the internal horizontal stacking link. From VC6, it will use the vertical stacking link again to flow back to VC2 and back to your vmnics. Remember, all modules and stacking links carry every vnet you define, even if you don't have any uplink defined on that module.
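To see why beacon probing can't catch this, here is a small Python sketch that treats the four VC modules as a graph of stacking links and checks whether a beacon from vmnic0 can reach vmnic1 without touching any external uplink. The bay-to-vmnic mapping (LOMs on bays 1/2, mezzanine NICs on bays 5/6) and the link list (internal 1-2 and 5-6, CX4 cables 1-5 and 2-6) are assumptions based on Mike's description; the model only checks reachability and says nothing about how ESXi times or evaluates beacons.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Assumed from the thread: LOMs (SUS1) on bays 1/2, mezzanine NICs (SUS2) on bays 5/6.
NIC_TO_BAY: Dict[str, int] = {"vmnic0": 1, "vmnic1": 2, "vmnic2": 5, "vmnic3": 6}

# Stacking: internal horizontal links 1-2 and 5-6, CX4 cables 1-5 and 2-6.
STACKING = [(1, 2), (5, 6), (1, 5), (2, 6)]


def reachable_bays(start: int, links: Iterable[Tuple[int, int]]) -> Set[int]:
    """Breadth-first search over VC modules joined by stacking links."""
    graph: Dict[int, Set[int]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        bay = queue.popleft()
        for nxt in graph.get(bay, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def beacon_heard(sender: str, receiver: str, links: Iterable[Tuple[int, int]]) -> bool:
    """True if a beacon from `sender` can reach `receiver` using only stacking
    links, i.e. without ever leaving the enclosure through an uplink."""
    return NIC_TO_BAY[receiver] in reachable_bays(NIC_TO_BAY[sender], links)


# All SUS1 uplinks are unplugged, yet vmnic0 and vmnic1 still hear each
# other's beacons, so neither looks failed to beacon probing.
print(beacon_heard("vmnic0", "vmnic1", STACKING))  # True
# Even without the internal 1-2 link, the path Hongjun describes (1-5-6-2) works.
print(beacon_heard("vmnic0", "vmnic1", [(1, 5), (5, 6), (6, 2)]))  # True
```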

 

Why are you using "beacon"? Is there a reason you can't use "link status" detection on the VMware side?

This assumes that you DON'T have "smartlink" enabled for all vnets, which is how cookbook scenario 1:5 is configured.

Try defining "smartlink" for all vnets and see whether it works with both beacon and link status detection. It should work. The function of "smartlink" is to shut down all server downlink ports on a given vnet if that vnet loses ALL of its uplinks.
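A hedged Python sketch of the Smart Link rule as stated above. The vnet names (`VLAN10-SUS1` and so on) and the helper `downlinks_to_disable` are hypothetical; the uplink counts follow Mike's SUS layout (six ports per SUS). One assumption is worth flagging: when a downlink carries several vnets, this model disables it only if every vnet on it has Smart Link enabled and has lost all of its uplinks. Check HP's documentation for the exact rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class VNet:
    name: str
    uplinks_linked: int       # uplink ports (active or standby) with link
    smart_link: bool = False


def downlinks_to_disable(vnets: Dict[str, VNet], nic_networks: Dict[str, List[str]]) -> Set[str]:
    """Server NICs whose downlinks Smart Link would turn off.

    Assumption: a downlink carrying several vnets goes down only when every vnet
    on it has Smart Link enabled and has lost ALL of its uplinks.
    """
    def lost_all_uplinks(name: str) -> bool:
        net = vnets[name]
        return net.smart_link and net.uplinks_linked == 0

    return {nic for nic, nets in nic_networks.items()
            if nets and all(lost_all_uplinks(n) for n in nets)}


SUS1_NETS = ["VLAN10-SUS1", "VLAN20-SUS1"]  # hypothetical vnet names
SUS2_NETS = ["VLAN10-SUS2", "VLAN20-SUS2"]
NIC_MAP = {"vmnic0": SUS1_NETS, "vmnic1": SUS1_NETS,
           "vmnic2": SUS2_NETS, "vmnic3": SUS2_NETS}


def sus1_unplugged(smart_link: bool) -> Dict[str, VNet]:
    """All six SUS1 uplinks disconnected; SUS2 keeps its six uplinks."""
    nets = {n: VNet(n, 0, smart_link) for n in SUS1_NETS}
    nets.update({n: VNet(n, 6, smart_link) for n in SUS2_NETS})
    return nets


print(sorted(downlinks_to_disable(sus1_unplugged(True), NIC_MAP)))   # ['vmnic0', 'vmnic1']
print(sorted(downlinks_to_disable(sus1_unplugged(False), NIC_MAP)))  # []
```

With Smart Link off, vmnic0 and vmnic1 keep link, which is exactly why "link status" detection alone could never trigger a failover in the original configuration.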

 

 


Regular Advisor
Mike O.
Message 6 of 14

Actually, I was just about to post some more info. This morning, we tried various combinations of uplink sets, smartlink, and/or beacon probing.

What we found was pretty much what you said: beacon probing didn't work when we had the uplink sets spanning interconnects, and we figured out that it was because of the stacking links. It did work when we had each uplink set isolated to a single interconnect bay.

We also tried enabling smartlink on the networks in the uplink set, with the uplink set spanning multiple interconnects (with the active/standby ports). This worked perfectly and did exactly what we wanted it to do. If the active ports went down, the standby ones came up and everything worked OK. We would lose one "ping", but VMware didn't mark any NICs as down. When we removed all the ports from the uplink set, VMware saw both NICs go down, and its teaming sent the traffic over the other NICs (attached to the other uplink set). The "outage" is a little bit longer (two or three ping responses), but certainly acceptable.

So with Smartlink enabled, we have full redundancy: as long as we have at least one connection on any of the interconnect modules, we can get network traffic to the VMware environment.
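Mike's redundancy claim can be checked exhaustively with a small Python model: enumerate all 16 combinations of connected and disconnected bays, and compare "VMs have an external path" with "at least one bay is connected", once with Smart Link and once without. The model assumes SUS1 on bays 1/2 and SUS2 on bays 5/6, link-status failure detection with failover order vmnic0 through vmnic3, and that VC's active/standby logic plus the stacking links get traffic to any connected bay within the SUS. `vm_has_external_path` is a hypothetical helper, not part of any HP or VMware tool.

```python
from itertools import product
from typing import Dict, Optional

# Assumed layout from the thread: SUS1 on bays 1 and 2, SUS2 on bays 5 and 6.
SUS_BAYS = {"SUS1": (1, 2), "SUS2": (5, 6)}
NIC_SUS = {"vmnic0": "SUS1", "vmnic1": "SUS1", "vmnic2": "SUS2", "vmnic3": "SUS2"}
ORDER = ["vmnic0", "vmnic1", "vmnic2", "vmnic3"]


def vm_has_external_path(bay_connected: Dict[int, bool], smart_link: bool) -> bool:
    # A SUS reaches the outside if any of its bays still has an uplink; VC's
    # active/standby logic and the stacking links carry traffic to that bay.
    sus_up = {sus: any(bay_connected[b] for b in bays) for sus, bays in SUS_BAYS.items()}
    # Link state as ESXi sees it: Smart Link drops the downlinks of a SUS with no
    # uplinks; without Smart Link every downlink stays up no matter what.
    link_up = {nic: (sus_up[sus] if smart_link else True) for nic, sus in NIC_SUS.items()}
    # Link-status failure detection: use the first NIC in the order with link.
    active: Optional[str] = next((nic for nic in ORDER if link_up[nic]), None)
    return active is not None and sus_up[NIC_SUS[active]]


for smart_link in (True, False):
    matches = 0
    for states in product([False, True], repeat=4):
        bay_connected = dict(zip((1, 2, 5, 6), states))
        # Mike's claim: connectivity if and only if at least one bay is connected.
        if vm_has_external_path(bay_connected, smart_link) == any(states):
            matches += 1
    print(f"smart_link={smart_link}: claim holds in {matches}/16 cases")
# smart_link=True: claim holds in 16/16 cases
# smart_link=False: claim holds in 13/16 cases
```

The three failing cases without Smart Link are the ones Mike hit: SUS1 fully disconnected while SUS2 still has at least one uplink.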

 

 

What I'm wondering about now is why the VC Cookbook, under scenario 1:5, specifically says that "Smartlink should NOT be enabled". I understand that in a "horizontal" failover with active/standby ports, Smartlink wouldn't be needed, but is there a problem with having it enabled?

Since having smartlink enabled seems to solve our issue and provides the most redundancy, I'd like to leave it enabled, but I don't want to cause any other issues...

Mike O.

 

By the way, we had been using "beacon" instead of "link status" because of the cookbook; it shows beacon in the ESX configuration section of scenario 1:5. That also seemed logical, since with Smartlink disabled (per the cookbook), it seemed like we would never have a link failure.

Regular Advisor
Mike O.
Message 7 of 14

Psychonaut wrote:
Do you have Smart Link enabled on the SUS's?

We did not, per VC Cookbook scenario 1:5. However, as part of our testing today, we did enable it, and the failovers work exactly as we want them to (see my other response). I'm still not sure why the cookbook specifically says "Smartlink should NOT be enabled". I can see that it wouldn't help in a failover with the active/standby ports, but will having it enabled cause any problems?

Mike O.

Trusted Contributor
Hongjun Ma
Message 8 of 14

Hi Mike,

Please keep "Smartlink" on; it won't do any harm. I'd say most VC deployments should have smartlink enabled to make sure we are not blackholing traffic.

The only case where you don't want to use "smartlink" is when you have internal communication across blades and you still want the server NICs to stay up even when all uplinks go down, for example some cluster configurations where you don't want to trigger a host failover. But in your topology, you should enable smartlink.

Also, try setting your vSwitch failure detection to "link status". This may give you a quicker failover, because you don't have to wait for several missed beacon heartbeats before failover is triggered. "Link status" is the default, and it should work just fine.

I believe the reason the VC cookbook uses "beacon" is that, back when it was written, the Smartlink feature didn't work consistently with some NIC firmware versions. Nowadays, with the latest firmware and drivers, smartlink works well, so you should leave the NIC side set to "link status" failover.

Take a look at the VC FlexFabric cookbook, which covers the latest VC module. In Scenario 5, you can see that the vSwitch uses "link status". Ignore the FCoE and FlexNIC parts, which don't apply to the VC 1/10 module; the basic Ethernet design and failover are the same.

http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02616817/c02616817.pdf

 


Regular Advisor
Mike O.
Message 9 of 14

Thanks, that's what I was hoping to hear: that Smartlink wouldn't cause any problems. I guess what concerned me was the way the cookbook worded it, that "Smartlink should NOT be enabled", with "NOT" in all caps. I didn't see how it would hurt, but the fact that they emphasized "NOT" made me wonder.

For VMware failure detection, once we re-enabled Smartlink, we were going to go ahead and use "link status" in VMware instead of beaconing.

Besides the issue with VC looping back the beacon packets, I can understand how beacon probing could, in theory, help detect upstream switch failures. But in our case, our blade chassis is connected directly to the top-level "core" switches in our data center; there's no other "upstream" switch for the beaconing to detect. If our core 6509 switch isn't talking to anything else, we have a whole lot more issues going on...

I have a copy of the FlexFabric cookbook, but I didn't really dig into it much, since we're not using the Flex-10 modules at this time.

Thanks again.

Mike O.

Respected Contributor
Psychonaut
Message 10 of 14

I've got 12 servers running with Smartlink and "link status" - works great.
Regular Advisor
Mike O.
Message 11 of 14

Psychonaut wrote:
I've got 12 servers running with Smartlink and "link status" - works great.

Just curious: are you using uplink sets that span the interconnect modules, with active/standby ports? That seems to be the configuration for which the cookbook says not to use smartlink.

I'd just really like to know why HP specifically says NOT to enable smartlink in that configuration...

Mike O.

Respected Contributor
Psychonaut
Message 12 of 14

My uplink sets are Active/Active, so my setup is along the lines of 1:6 in the Cookbook. I looked through 1:5 and read that "NOT" statement, and I don't understand it either. They word it as if the system will be fine because you have other "available" uplinks, but that doesn't help, because the OS doesn't know the link is down. From other discussions and research I've done, I've always been under the impression that unless you have a really good reason, Smartlink should always be enabled. That way, situations like this are avoided.

Perhaps one of the authors of that document is out there and can explain the reasoning.

Advisor
Steven McLean
Posts: 11
Registered: 09-13-2006
Message 13 of 14

Mike O./All,

I've read through your issue, and Hongjun is correct in stating that Link State should now be used with VMware. At the time that VC Cookbook was written, Link State was not supported. That is no longer the case, and Link State should be used.

Whether SmartLink should or shouldn't be enabled depends on how the VC networks are implemented.

The purpose of SmartLink is to turn OFF server downlinks that are connected to networks whose associated uplinks have all become disconnected or failed. Take a look at the examples below:

In Scenario 1:5, which is an Active/Standby design, when ALL links on Bay 1 are disabled or their cables unplugged, Virtual Connect will enable the STBY links on Bay 2 and fail all traffic over to the now-active links on Bay 2. The server will not realize the failover occurred; the internal stacking link will be used for Bay 1 NIC traffic. This is the behavior we want in an A/S network design.

In Scenario 1:5, if we enable SmartLink, the only time it will do anything is when ALL uplinks go down. For example, if ALL uplinks in this scenario connect to the same switch and that switch fails, all uplinks will be disconnected and we will lose external communication. SmartLink would then also shut down ALL connected downlinks, disconnecting ALL server NICs (on both bays), and even server-to-server communication within the enclosure would be lost. This is why SmartLink should NOT be used in an A/S VC config.
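The side effect described above can be shown with a toy Python model of a single A/S SUS. The blade and NIC names are hypothetical, and the model assumes blade-to-blade traffic is switched inside the VC domain, so it needs a live NIC on each blade but no uplink.

```python
from typing import Dict, List

# Hypothetical scenario 1:5 layout: one SUS spanning bays 1 and 2, with every
# server NIC (on both bays) mapped to that SUS's networks.
SERVER_NICS: Dict[str, List[str]] = {
    "blade1": ["blade1-lom1", "blade1-lom2"],
    "blade2": ["blade2-lom1", "blade2-lom2"],
}


def live_nics(uplinks_linked: int, smart_link: bool) -> Dict[str, List[str]]:
    """NICs whose downlinks stay enabled. With Smart Link on, losing ALL
    uplinks of the single SUS disables every downlink mapped to it."""
    disable_all = smart_link and uplinks_linked == 0
    return {blade: ([] if disable_all else nics) for blade, nics in SERVER_NICS.items()}


def blades_can_talk(uplinks_linked: int, smart_link: bool) -> bool:
    """Blade-to-blade traffic stays inside the VC domain, so it needs a
    live NIC on each blade but no uplink."""
    nics = live_nics(uplinks_linked, smart_link)
    return all(nics[blade] for blade in SERVER_NICS)


# The single upstream switch fails, so every uplink in the SUS goes down.
print(blades_can_talk(uplinks_linked=0, smart_link=False))  # True
print(blades_can_talk(uplinks_linked=0, smart_link=True))   # False
```

In an A/A layout like Mike's, the same rule only takes down the downlinks of the SUS that lost its uplinks, so the other NICs keep both internal and external traffic flowing.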

 

In Scenario 1:6, which is an Active/Active design, when ALL links on Bay 1 are disabled or their cables unplugged, ALL downlinks connected to those networks are shut down. This causes the server NIC to go offline, forcing NIC teaming or the vSwitch to move all load to the remaining active NIC. This is the behavior we want in an A/A network design.

I wrote that VC Cookbook, and, other than Link State now being supported, that's what I was thinking and what I meant by recommending that Smart Link NOT be used in Scenario 1:5, and why it should be used in 1:6.

As for your concern:

I've reviewed your VC config. This config is actually a merge of scenarios 1:5 and 1:6, and I would consider your design to be Active/Active, not A/S. What I mean by this is that you have TWO SUSs, each serving half of the server's NICs, which makes this an A/A design. The additional pair of STBY links in each SUS makes each SUS A/S, but the overall design is still A/A from the server's perspective. In this case, Link State and Smart Link should be used.

I would enable Smart Link and Link State and re-test. All should be fine.

Respected Contributor
Psychonaut
Message 14 of 14

Steven,

Thanks for hopping on and sharing.
The opinions expressed above are the personal opinions of the authors, not of HP.