06-08-2011 06:48 AM
1. We use a BL460C + VCFlex10 as the server.
2. The operating system is SuSE 11.0 + SP1.
* Some information :
We have run load tests (performance tests) with our software on some DL380 hardware; the DL380 could handle 600 concurrent TCP requests at the same time.
But the BL460C can't handle 600 concurrent TCP requests from our simulator. After investigating, we found that the BL460C TCP listening port behaves strangely. E.g. we send out a smooth 600 TPS every second, so in principle the BL460C TCP port should also receive 600 TPS smoothly (with perhaps 2% ~ 3% fluctuation). In fact the BL460C TCP port receives very low TPS (such as 200 TPS) in one second, but very high TPS (such as 1000 TPS) in another second. The number of accepted connections fluctuates strangely during the load test.
* Question :
1. Does the BL460C have a firmware update that solves this problem?
2. Is SuSE 11 + SP1 compatible with the BL460C? Or should we patch SuSE?
Thank you very much !
06-09-2011 09:00 AM
What are you using to generate this load?
Have you examined the networking statistics on either your load generator(s) or your BL460c? BTW, *which* BL460C - a "plain" (aka Gen1) or G5, 6 or 7? As you have VC Flex10 mentioned, have you defined any flexnics?
06-09-2011 08:21 PM
Thanks for your response.
1. 600 concurrent requests map to 600 connections (600 sockets), not one socket. There is a new TCP connection for each transaction.
2. 600 TPS smoothly: Yes, it initiates 600 transactions, then sleeps for a second, then initiates another 600, etc.
3. Generating this load: We developed our own C++ code as the simulator (no third-party tool). Our code is compiled on both the SPARC platform (Solaris) and the SuSE platform.
If we use DL380 + SuSE as the simulator platform and test DL380 + SuSE + our product, performance is very good!
If we use DL380 + SuSE as the simulator platform and test BL460C + SuSE + our product, performance is poor. The problems were described in the first post.
If we use SPARC + Solaris as the simulator platform and test BL460C + SuSE + our product, performance is stable but not high.
4. BL460C type: BL460C G7
5. VC Flex10: Yes, we used FlexNICs. The BL460C has two physical Ethernet interfaces, and VC Flex10 virtualizes another 6 Ethernet interfaces, for a total of 8 Ethernet interfaces on one BL460C.
We are now using eth0 for the performance test.
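Point 2 above describes a sender that initiates 600 transactions and then sleeps for a second. As a rough illustration of how that schedule differs from 600 evenly spaced sends per second, here is a minimal Python sketch. It is not the C++ simulator from this thread; the function names and the 10 ms observation window are assumptions chosen for illustration, and the burst is assumed to take negligible time.

```python
from fractions import Fraction

def burst_schedule(rate, seconds):
    """Send times when `rate` sends are issued back to back, then a 1 s sleep.
    Assumption: the burst itself takes negligible time."""
    return [Fraction(sec) for sec in range(seconds) for _ in range(rate)]

def paced_schedule(rate, seconds):
    """Send times when sends are spaced evenly, 1/rate seconds apart."""
    return [Fraction(i, rate) for i in range(rate * seconds)]

def peak_in_window(times, window):
    """Largest number of sends inside any half-open interval
    [t, t + window) that starts at a send time t."""
    times = sorted(times)
    best = 0
    right = 0
    for left, start in enumerate(times):
        # Advance the right edge while sends still fall inside the window.
        while right < len(times) and times[right] < start + window:
            right += 1
        best = max(best, right - left)
    return best

window = Fraction(1, 100)  # hypothetical 10 ms observation window
burst_peak = peak_in_window(burst_schedule(600, 2), window)
paced_peak = peak_in_window(paced_schedule(600, 2), window)
print("burst peak per 10 ms:", burst_peak)
print("paced peak per 10 ms:", paced_peak)
```

Both schedules average 600 per second, but the peak number of connection attempts arriving within a short window is very different, which is what matters for queues and buffers along the path.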
06-10-2011 09:26 AM
1) - a new TCP connection for each transaction is a model that went out with HTTP 1.0 in the late 1990s. Frankly, with just the one load generator, you are lucky that 600 TCP connections per second didn't cause issues with TIME_WAIT and local port number exhaustion. I trust you aren't using an "abortive close" ...
2) 600 transactions, sleep for one second, lather, rinse, repeat is most certainly *not* smooth. It is in fact rather bursty. Is the real-world usage of the system really going to behave like that, or is that bursty behaviour simply an artifact of how you coded-up the load generator?
Also, how "big" are the responses to each transaction?
3) Is your simulator one thread or 600 threads? Are you launching new threads for each transaction? Can you post some pseudo-code?
On the DL380 were you simply using the LOM ports or did you add cards to it?
Just how low is "not high" when you use Solaris as the load generation? There are at least two possibilities there - one is that Solaris and SPARC are just slow. The other is that on the Solaris platform, your attempt to churn through 600 TCP connections a second has indeed exhausted the local port space and that is slowing things down.
5) What bandwidths did you assign to each of the FlexNICs you created? What are you using as the link(s) between the Flex-10 Modules and the switch that then connects to your DL380 load generator?
It would be good to get the output of ifconfig on one of these flexnics, as well as ethtool -g
You should look at the netstat statistics on your load generator while it is running. Take two snapshots several seconds apart and then compare them with something like beforeafter from ftp://ftp.cup.hp.com/dist/networking/tools/
netstat -s > before
netstat -s > after
beforeafter before after > delta
and then look at delta. Probably good to do the same thing on the system under test (eg the BL460 G7) as well.
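If beforeafter is not to hand, the same idea can be approximated with a short script. The following is only a sketch in Python, not the beforeafter tool itself: it assumes the usual Linux `netstat -s` layout (unindented section headers ending in ":" followed by indented counter lines), and the function names and the two tiny sample snapshots are made up for illustration.

```python
import re

NUM = re.compile(r"\d+")

def parse_counters(text):
    """Map (section, line with numbers masked) -> list of numbers on that line."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: section headers such as "Tcp:" start in column 0.
        if not line[0].isspace() and line.rstrip().endswith(":"):
            section = line.strip()
            continue
        key = (section, NUM.sub("#", line.strip()))
        counters[key] = [int(n) for n in NUM.findall(line)]
    return counters

def counter_deltas(before_text, after_text):
    """Return only the counters that changed between the two snapshots."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    deltas = {}
    for key, after_values in after.items():
        before_values = before.get(key)
        if before_values is None or len(before_values) != len(after_values):
            continue
        diff = [a - b for a, b in zip(after_values, before_values)]
        if any(diff):
            deltas[key] = diff
    return deltas

# Invented sample snapshots, for illustration only (not measured data).
sample_before = "Tcp:\n    1000 active connections openings\n    50 segments retransmited\n    0 resets sent\n"
sample_after = "Tcp:\n    1600 active connections openings\n    62 segments retransmited\n    0 resets sent\n"
sample_delta = counter_deltas(sample_before, sample_after)
for (section, name), diff in sample_delta.items():
    print(section, name, diff)
```

For real use, read the `before` and `after` files produced by the commands above and pass their contents to `counter_deltas`. Counters such as retransmitted segments or listen queue overflows that grow during the run would be the interesting ones.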
From your descriptions thus far, I am guessing that there are some packet losses somewhere, triggered by the bursts of traffic, and the fluctuation stems from some older transactions being delayed by that.
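To put rough numbers on the TIME_WAIT and local port concern from point 1, here is a back-of-the-envelope sketch in Python. The figures are assumptions, not measurements from this setup: a 60 second TIME_WAIT (the fixed value in Linux) and a local port range of 32768 to 61000 (a common Linux default at the time). The real range on the load generator should be checked in /proc/sys/net/ipv4/ip_local_port_range, and TIME_WAIT is only held by whichever side closes the connection first.

```python
def time_wait_sockets(conn_per_sec, time_wait_sec):
    """Steady-state number of sockets sitting in TIME_WAIT."""
    return conn_per_sec * time_wait_sec

def ephemeral_ports(low, high):
    """Number of local ports in an inclusive range."""
    return high - low + 1

RATE = 600                           # connections per second, from the thread
TIME_WAIT_SEC = 60                   # assumption: Linux TIME_WAIT length
PORT_LOW, PORT_HIGH = 32768, 61000   # assumption: default local port range

needed = time_wait_sockets(RATE, TIME_WAIT_SEC)
available = ephemeral_ports(PORT_LOW, PORT_HIGH)
# Highest rate to one destination ip:port that never runs out of ports.
sustainable_rate = available / TIME_WAIT_SEC

print("sockets in TIME_WAIT at steady state:", needed)
print("local ports available:", available)
print("sustainable connections per second: %.1f" % sustainable_rate)
```

Under those assumptions, 600 new connections per second from a single client address to a single server port would need more local ports than the range provides, if the client is the side that ends up in TIME_WAIT.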
06-14-2011 01:47 AM
1. One TCP connection per transaction is required in some places, such as some telecom applications. We have changed Linux kernel parameters to avoid socket exhaustion. I agree that if we changed to "keep-alive", the BL460C performance level should be higher (but our requirement is "close-down"). One connection should be kept for 5 seconds.
2. Sorry, there is no sleep after 600 transactions. We send out 600 transactions in each second, and the time slot is 1/600 = 0.0017 second.
3. We use multiple processes (30 processes) to generate 600 transactions.
code: generate 30 threads, each thread is allocated 20 TPS
We did not add cards; we use the LOM on the DL380.
4. I use "Auto" to assign the bandwidth of each of the FlexNICs (also tried manual settings, such as a 1G connection). We use only one Ethernet interface (such as 10.170.7.168) to accept incoming requests.
5. ethtool -g eth0 (on BL460C)
Ring parameters for eth0:
RX Mini: 0
RX Jumbo: 0
Current hardware settings:
RX Mini: 0
RX Jumbo: 0
6. Other information:
After stopping the simulator, the BL460C still has some connections in ESTABLISHED, FIN_WAIT2 or TIME_WAIT state, as follows:
Port 10035 is the listening port on the BL460C, and we have stopped sending requests from the simulator.
netstat -an | grep 10035
tcp 0 0 10.170.7.168:10035 0.0.0.0:* LISTEN
tcp 0 0 10.170.7.168:10035 10.170.7.81:41498 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41517 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57430 FIN_WAIT2
tcp 0 0 10.170.7.168:10035 10.170.7.81:41515 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41540 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41507 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57431 FIN_WAIT2
tcp 0 0 10.170.7.168:10035 10.170.7.81:57620 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:41571 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57595 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:57657 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:41492 TIME_WAIT
We do not know whether this is related to the VCFlex10 configuration.
7. Network Connection:
BL460C <-> VC Flex10 <-> Cisco Switch (1G cable) <-> DL380 (simulator)
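The netstat listing in point 6 is easier to compare between runs when it is reduced to a count per TCP state. A minimal Python sketch, assuming the six-column `netstat -an` layout shown above (the function name is hypothetical; the sample lines are the ones from point 6):

```python
from collections import Counter

def tally_states(netstat_text, port):
    """Count TCP states for sockets whose local address ends in :port."""
    counts = Counter()
    suffix = ":" + str(port)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[3].endswith(suffix)):
            counts[fields[5]] += 1
    return counts

sample = """\
tcp 0 0 10.170.7.168:10035 0.0.0.0:* LISTEN
tcp 0 0 10.170.7.168:10035 10.170.7.81:41498 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41517 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57430 FIN_WAIT2
tcp 0 0 10.170.7.168:10035 10.170.7.81:41515 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41540 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:41507 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57431 FIN_WAIT2
tcp 0 0 10.170.7.168:10035 10.170.7.81:57620 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:41571 TIME_WAIT
tcp 0 0 10.170.7.168:10035 10.170.7.81:57595 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:57657 ESTABLISHED
tcp 0 0 10.170.7.168:10035 10.170.7.81:41492 TIME_WAIT
"""

state_counts = tally_states(sample, 10035)
for state, count in sorted(state_counts.items()):
    print(state, count)
```

Running the same tally every second during a test on both the BL460C and the simulator host would show whether connections pile up in one state while the accepted TPS fluctuates.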