08-19-2002 09:43 AM
When running 'netstat -s -p tcp' I noticed that we are dropping connections due to "no listener". See output below (which was taken 3 hours after a system reboot).
What I am trying to find out is what I should look at to determine why the connections are dropping, and whether it's related to the source quenching. Also, if it's not too much trouble, could someone explain what 'source quenching' is and what "connect requests dropped due to no listener" actually means? (I guess if I knew that, I would know where to start looking for issues.)
Additional info: The server is running 4 Sybase databases and 2 Oracle databases. The slow applications in question use two different databases.
I have attached a list of current patches, kmtune output, and a list of the non-default tcp parameters.
netstat -s -p tcp output
2440722 packets sent
2306605 data packets (1956282119 bytes)
133 data packets (39082 bytes) retransmitted
134117 ack-only packets (38357 delayed)
115 URG only packets
234 window probe packets
0 window update packets
47985 control packets
1457662 packets received
1127258 acks (for 1956296290 bytes)
16387 duplicate acks
0 acks for unsent data
577034 packets (89657644 bytes) received in-sequence
0 completely duplicate packets (0 bytes)
0 packets with some dup, data (0 bytes duped)
32 out of order packets (9694 bytes)
0 packets (0 bytes) of data after window
0 window probes
291139 window update packets
1 packet received after close
1 segment discarded for bad checksum
0 bad TCP segments dropped due to state change
1613 connection requests
21994 connection accepts
23607 connections established (including accepts)
24340 connections closed (including 819 drops)
541 embryonic connections dropped
1104276 segments updated rtt (of 1104276 attempts)
70 retransmit timeouts
0 connections dropped by rexmit timeout
234 persist timeouts
545 keepalive timeouts
458 keepalive probes sent
4 connections dropped by keepalive
0 connect requests dropped due to full queue
1421 connect requests dropped due to no listener
08-19-2002 10:01 AM
As for source quench, see RFC792, page 10:
A gateway may discard internet datagrams if it does not have the buffer space needed to queue the datagrams for output to the next network on the route to the destination network. If a gateway discards a datagram, it may send a source quench message to the internet source host of the datagram. A destination host may also send a source quench message if datagrams arrive too fast to be processed. The source quench message is a request to the host to cut back the rate at which it is sending traffic to the internet destination.
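To make the RFC 792 description concrete, here is a minimal Python sketch of the bytes a gateway or host would send: type 4, code 0, a checksum, 32 unused bits, then the original IP header plus the first 64 bits of the original datagram's data. The function names are hypothetical, and the sketch only builds the message; it does not send anything.

```python
import struct

ICMP_SOURCE_QUENCH = 4  # RFC 792: Source Quench is Type 4, Code 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def build_source_quench(original_datagram: bytes) -> bytes:
    """Build an ICMP Source Quench message about original_datagram.

    The body quotes the original IP header plus the first 8 bytes
    (64 bits) of its data, as RFC 792 specifies.
    """
    ihl = (original_datagram[0] & 0x0F) * 4  # IP header length in bytes
    quoted = original_datagram[: ihl + 8]
    # type, code, checksum placeholder (0), 32 unused bits (0)
    draft = struct.pack("!BBHI", ICMP_SOURCE_QUENCH, 0, 0, 0) + quoted
    checksum = internet_checksum(draft)
    return struct.pack("!BBHI", ICMP_SOURCE_QUENCH, 0, checksum, 0) + quoted
```

A receiver that re-runs internet_checksum over the whole message gets 0 when the checksum is valid, and it can use the quoted header and ports to find which connection it should slow down.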
08-19-2002 10:02 AM
for each machine (and NIC) which might be involved. The second page shows errors on the link so don't miss looking at it.
If you have errors, check that your NICs and switches all have the same duplex and speed hard-coded. Do not let them autonegotiate.
As for the "no listener" stat: presumably the box received a connection request for port x and there was no process waiting on port x to accept the connection. This could be caused by another PC trying to connect to the wrong port, or by traffic to a port whose process has died for some reason.
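What the client sees in this situation can be reproduced on a single machine. The minimal Python sketch below (the function name is hypothetical) asks the OS for a free port, releases it without listening, and then connects to it. The server-side stack answers the SYN with a reset, which the client reports as "connection refused". It assumes nothing grabs the port in the moment between closing the probe socket and connecting.

```python
import socket


def connect_refused_demo(host: str = "127.0.0.1") -> bool:
    """Return True if connecting to a port with no listener is refused."""
    # Ask the OS for a free port, then release it without ever listening.
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    probe.bind((host, 0))
    port = probe.getsockname()[1]
    probe.close()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        client.connect((host, port))
    except ConnectionRefusedError:
        # Our SYN was answered with a RST: nobody is listening on that port.
        return True
    finally:
        client.close()
    return False  # something was listening after all
```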
netstat -a | grep LISTEN
netstat -a | grep ESTABLISHED
will show you most of the ports that are listening or in use (netstat prints the states in upper case, so a lowercase pattern matches nothing). Without the grep you get a few pages of output you might not need, but it gives you the complete list.
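If you want to compare the listening ports against the ports that show up in resets, a small parser can pull them out of saved netstat -an output. This Python sketch assumes BSD/HP-UX-style lines such as "tcp 0 0 *.1521 *.* LISTEN", where the port follows the last '.' (or ':' on systems that print addresses that way); the function name and sample output are hypothetical.

```python
import re


def listening_tcp_ports(netstat_text: str) -> set[int]:
    """Collect local TCP ports in LISTEN state from `netstat -an` text."""
    ports = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local addr, foreign addr, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[-1].upper() != "LISTEN":
            continue
        port = re.split(r"[.:]", fields[3])[-1]  # port after last '.' or ':'
        if port.isdigit():
            ports.add(int(port))
    return ports


sample = """Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp        0      0  *.1521                 *.*                    LISTEN
tcp        0      0  *.4100                 *.*                    LISTEN
tcp        0      0  15.244.40.206.1521     15.244.44.58.58243     ESTABLISHED
udp        0      0  *.514                  *.*
"""
print(sorted(listening_tcp_ports(sample)))
```

Any port that clients keep trying to reach but that is missing from this set is a candidate source of the "no listener" drops.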
You might need to get tcpdump or use a net sniffer to find out what's going on but get rid of the source quench first so your network guy can run his pings to check the connection.
08-20-2002 07:30 AM
First, we have to ensure that the program running listen(fd, size of listen queue) sets the queue size to a reasonable number. Second, we set the global /dev/tcp parameter tcp_conn_request_max. Its default is 20; most web servers should set it to 1024 or greater.
Note that the process taking in the requests should be able to work through the full queue depth within the time the client waits before it retries. If not, you may get other errors.
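Both points can be sketched in Python. open_listener passes an explicit backlog to listen(); the kernel typically caps that value at a system-wide limit (tcp_conn_request_max on HP-UX, net.core.somaxconn on Linux), so raising one without the other may not help. queue_drains_in_time is a rough, hypothetical check of the drain-time rule above: a full queue divided by the accept rate must fit inside the client's timeout.

```python
import socket


def open_listener(backlog: int = 1024) -> tuple[socket.socket, int]:
    """Open a TCP listener on an OS-chosen localhost port.

    `backlog` is the listen() queue size; the kernel may silently
    reduce it to its own configured maximum.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", 0))
    srv.listen(backlog)  # size of the pending-connection queue
    return srv, srv.getsockname()[1]


def queue_drains_in_time(queue_depth: int, accepts_per_sec: float,
                         client_timeout_sec: float) -> bool:
    """Can the server accept every queued connection before the last
    queued client gives up and retries?"""
    return queue_depth / accepts_per_sec <= client_timeout_sec
```

For example, 1024 queued connections accepted at 200 per second take about 5.1 seconds to drain, which is fine for a 10-second client timeout; at 50 accepts per second it takes about 20 seconds, and the later clients would retry first.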
08-20-2002 08:09 AM
The short and sweet version: "connect requests dropped due to no listener" means the connect requests that came in were for sockets that had no one listening on them.
If you have some type of port redirector that redirects a single inbound port to a server with multiple port listeners on it, then its definition may be incorrect and include destination ports for which there are no listeners.
08-20-2002 08:48 AM
Yes, the NICs are statically set to 100FD AutoNeg off.
I originally looked at the tcp_conn_request_max (which is set at 1024), but I also noticed that 0 connections were dropped due to a full queue.
At least I now have a better understanding of what is causing the connection drops. I will try working with our LAN/WAN guys to take a closer look. I'll have to see if I can find some other ways to track down the source as well.
08-20-2002 09:35 AM
To find the source of these things, you could get a copy of tcpdump and run it on each interface in turn with a filter expression: either one that matches TCP segments with the RST bit set, or one that matches port numbers other than the ones listed in the output of netstat -an | grep LISTEN. The former is easier and would likely look like this:
"(tcp[13] & 4) != 0"
This means that the byte at offset 13 in the TCP header (the byte that holds the flags), bitwise ANDed with 4, is not zero; i.e. the RST bit is set. RST is the third-lowest bit of that byte, hence the value 4 (the same in decimal and hex).
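The same flag test can be written out in Python to show why 4 picks out RST. This sketch uses the standard TCP flag bit values in byte 13 of the header; the function names are illustrative only.

```python
# Flag bits in byte 13 of the TCP header.
TCP_FLAGS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}


def tcp_flag_names(tcp_header: bytes) -> list[str]:
    """Name every flag set in a raw TCP header."""
    flags = tcp_header[13]
    return [name for bit, name in TCP_FLAGS.items() if flags & bit]


def is_rst(tcp_header: bytes) -> bool:
    """The same test as the tcpdump filter (tcp[13] & 4) != 0."""
    return (tcp_header[13] & 0x04) != 0
```

Applied to the TCP header in the dump below, byte 13 is 0x14, i.e. RST (0x04) plus ACK (0x10), which is why tcpdump prints that segment as "R ... ack".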
By default, the HP-UX 11.x stacks place text in the RST segment explaining why the reset was sent. You can then decode the data in the RST segment with the help of the ascii(5) manpage.
It would look something like this:
# /usr/contrib/bin/tcpdump -x -i lan0 "(tcp[13] & 0x4) != 0"
tcpdump: listening on lan0
11:38:03.044279 sweb156.cup.hp.com.54321 > tardy.cup.hp.com.58243: R 0:11(11) ack 741047421 win 0 (DF)
4500 0033 5936 4000 4006 6c9f 0ff4 28ce
0ff4 2c3a d431 e383 0000 0000 2c2b 7c7d
5014 0000 ad5d 0000 4e6f 206c 6973 7465
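Those last few bytes are "No listener" in ASCII (the bytes shown, 4e6f 206c 6973 7465, spell "No liste"; the segment carries 11 bytes of data, per "R 0:11(11)"). Then you look at the addressing info (IP/host and ports) and go from there.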
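Decoding the addressing info by hand is tedious, so here is a minimal Python sketch that turns the tcpdump -x hex lines above back into bytes and pulls out the addresses, ports, RST flag and payload. It assumes an IPv4 packet carrying TCP with no link-layer header in the dump, as in the output shown; the function names are hypothetical.

```python
import struct


def parse_hex_dump(dump: str) -> bytes:
    """Turn tcpdump -x style hex lines ('4500 0033 ...') into bytes."""
    return bytes.fromhex("".join(dump.split()))


def decode_ipv4_tcp(packet: bytes) -> dict:
    """Extract addressing info and payload from an IPv4/TCP packet."""
    ihl = (packet[0] & 0x0F) * 4                      # IP header length
    total_len = struct.unpack("!H", packet[2:4])[0]   # IP total length
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    tcp = packet[ihl:]
    src_port, dst_port = struct.unpack("!HH", tcp[0:4])
    data_offset = (tcp[12] >> 4) * 4                  # TCP header length
    return {
        "src": (src_ip, src_port),
        "dst": (dst_ip, dst_port),
        "rst": bool(tcp[13] & 0x04),
        "payload_len": total_len - ihl - data_offset,  # bytes actually sent
        "payload": tcp[data_offset:],                  # bytes present in dump
    }


dump = """4500 0033 5936 4000 4006 6c9f 0ff4 28ce
0ff4 2c3a d431 e383 0000 0000 2c2b 7c7d
5014 0000 ad5d 0000 4e6f 206c 6973 7465"""
print(decode_ipv4_tcp(parse_hex_dump(dump)))
```

In a "no listener" reset, the source address and port are the local side that had no listener (here 15.244.40.206 port 54321), and the destination is the remote host that tried to connect (15.244.44.58 port 58243), which matches the sweb156/tardy line tcpdump printed.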
08-21-2002 10:10 AM