Frequent Advisor
ShishirG
Posts: 52
Registered: ‎06-20-2011
Message 1 of 10 (705 Views)
Accepted Solution

sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

We have a horizontally scaled system in our dev environment with basic sm.ini settings and no detailed logging parameters, but the sm log files (on all SM LB and app servers) are full of messages like:

 

I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

 

This message is written to the log every few seconds, filling it with unnecessary repeated information.

 

Any idea how to turn off these messages?

HP Expert
viprom
Posts: 321
Registered: ‎11-09-2011
Message 2 of 10 (692 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

Where the messages are coming from and what they mean:
- The message itself originates from the JGroups software that SM uses to manage cluster communications.
- The message indicates that an SM process (pid) was suspected of being unresponsive to the periodic heartbeats sent by all SM processes in the deployment. These heartbeats ensure cluster health and reliability in SM. When a suspect event occurs, the application verifies the suspect by sending additional messages asking whether the suspected member is really dead. The log message you see is essentially a record (a WARN-level message) of this verification having occurred. If the suspected member really is dead (unresponsive), it is shunned from the cluster membership to prevent it from damaging the existing members. If it is not dead, it is allowed to continue as a member of the cluster.
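
For reference, the failure-detection part of a JGroups stack typically looks like the sketch below. The protocol names (FD_SOCK, FD, VERIFY_SUSPECT) are standard JGroups; the timeout values are only illustrative, not SM's shipped defaults:

<!-- socket-based detection: a peer's closed socket raises a SUSPECT event -->
<FD_SOCK/>
<!-- heartbeat-based detection: suspect a member after max_tries missed heartbeats -->
<FD timeout="3000" max_tries="3"/>
<!-- double-check a SUSPECT before the member is shunned -->
<VERIFY_SUSPECT timeout="1500"/>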

If you are not experiencing any issues during the timeframe of these messages, this is benign. However, these messages can indicate that a few SM processes are under stress (possibly heavy resource utilization). When an SM process (pid) cannot respond to the periodic heartbeats described above, it may be because the pid is under heavy CPU or memory utilization.

This might be due to a Windows firewall issue.

If all servers (app and web) are running the Windows firewall, you may try configuring a firewall rule for the UDP traffic on all SM app servers. With this you can discover whether the JGroups failure-detection protocol (defined in the udpcluster.xml file by the empty <FD_SOCK/> element) is actually using TCP. It can be using a random TCP port, and there is no documentation in Service Manager that mentions the use of TCP for JGroups communication or how to configure the TCP port that it starts on.

To resolve this, set a "start_port" in the FD_SOCK settings: update the JGroups configuration file "udpcluster.xml", then open the following TCP port range in the Windows firewall inbound rules on all SM app servers:
TCP 65500-65535

Once you make the Windows firewall change, we will know if this fixes the problem.
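
As a concrete example, the inbound rule can be added from an elevated command prompt like this (the rule name is arbitrary):

netsh advfirewall firewall add rule name="SM JGroups FD_SOCK TCP" dir=in action=allow protocol=TCP localport=65500-65535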

 

-----
If you find this or other posts helpful, please do not forget to click the Kudo Star or to mark it as a Solution if you are the owner of the thread. Thanks :)
HP Expert
Amen16
Posts: 214
Registered: ‎11-01-2011
Message 3 of 10 (678 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

Hello,

 

You can check the following document from our knowledge base for some information about tracing these messages:

 

http://support.openview.hp.com/selfsolve/document/KM1364636

 

Regards,

Alex

HP Support

If you find that this or any post resolves your issue, please be sure to mark it as an accepted solution.
Frequent Advisor
ShishirG
Posts: 52
Registered: ‎06-20-2011
Message 4 of 10 (593 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

Hi Viprom,

 

I have opened up ports 65100 to 65535 on the firewall but am still getting suspect messages on ports 65111 and 62455. The port number changes when I restart the app server.

 

However, 65111 is already included in the inbound rule.

 

I have not made any changes to the udpcluster.xml file yet.

Can you please confirm the correct syntax to use?

 

Currently the line in the udpcluster.xml file is just <FD_SOCK/>

 

Shall I change it to something like:

<FD_SOCK start_port="65500-65535" />

 

Does the above setting limit the number of ports used by JGroups? At the moment I can see SM uses a very wide range of ports for this.

 

Also, how many ports do you think are enough for 10 listeners running on each box? We have 2 LBs (clustered) and 6 app servers running 10 listeners each, with medium-level traffic and 600 user connections.

 

Due to security policy, we are not allowed to open unnecessary ports, so any limited range would be helpful.

 

 

HP Expert
viprom
Posts: 321
Registered: ‎11-09-2011
Message 5 of 10 (561 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message


Update the JGroups configuration file "udpcluster.xml" as follows:
<FD_SOCK start_port="65500"/>
and open the following TCP port range on the firewall:

The default FD_SOCK port (when no value is set) is ephemeral and gets blocked by the firewall, which causes the communication failures among JGroups members.
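
On limiting the range: depending on the JGroups version bundled with your SM release, FD_SOCK also accepts a port_range attribute that bounds how many ports above start_port it will probe, e.g.:

<FD_SOCK start_port="65500" port_range="35"/>

Treat port_range as something to verify against your version; start_port is the documented attribute used above.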

 

For the rest of your questions, it's a matter of setup, but the main rule is: you need to have a free port for each listener.
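
As a rough check against your numbers: with 10 listeners per box, each box needs at least 10 free ports in the FD_SOCK range, and the suggested 65500-65535 range provides 36 ports per host, which covers 10 listeners with headroom.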

Hope this helps.

-----
If you find this or other posts helpful, please do not forget to click the Kudo Star or to mark it as a Solution if you are the owner of the thread. Thanks :)
Frequent Advisor
ShishirG
Posts: 52
Registered: ‎06-20-2011
Message 6 of 10 (554 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

I made that change in the file and the servers started OK, but now they cannot join the load balancer.

 

When running sm -reportlbstatus from the server, it complains that the LB is not available.

 

 

HP Expert
FrankRen
Posts: 23
Registered: ‎03-18-2014
Message 7 of 10 (498 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

This is an SM JGroups OOB (out-of-band) configuration issue; the SM horizontal-scaling (HS) cluster is based on JGroups. It was fixed by QCCR1E70834, "Servlets dropping on loadbalancer".

 

The fix is scheduled for SM7.11P22, SM9.21P9 and SM9.34.

Frequent Advisor
ShishirG
Posts: 52
Registered: ‎06-20-2011
Message 8 of 10 (495 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

Thanks everyone for your responses.
After tracing, I realized SM also uses UDP ephemeral ports along with TCP, so merely updating the FD_SOCK start_port setting did not resolve the issue for me. Opening up the entire ephemeral port range, from somewhere around 41000 to 65535, resolved the issue.
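
For anyone following along, inbound rules along these lines (names arbitrary) cover that range for both protocols:

netsh advfirewall firewall add rule name="SM JGroups ephemeral TCP" dir=in action=allow protocol=TCP localport=41000-65535
netsh advfirewall firewall add rule name="SM JGroups ephemeral UDP" dir=in action=allow protocol=UDP localport=41000-65535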
HP Expert
FrankRen
Posts: 23
Registered: ‎03-18-2014
Message 9 of 10 (482 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

In fact, you can solve this issue yourself if you cannot wait for the next patch I mentioned in my last post.

 

Edit the file udp.xml in the SM installation Server/RUN directory so that the UDP element reads as follows:

 

<config>
    <UDP
         mcast_addr="${jgroups.udp.mcast_addr:228.10.10.10}"
         mcast_port="${jgroups.udp.mcast_port:45588}"
         tos="8"
         ucast_recv_buf_size="20000000"
         ucast_send_buf_size="640000"
         mcast_recv_buf_size="25000000"
         mcast_send_buf_size="640000"
         loopback="false"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         use_incoming_packet_handler="true"
         ip_ttl="${jgroups.udp.ip_ttl:2}"
         enable_bundling="true"
         enable_diagnostics="true"
         thread_naming_pattern="cl"

         use_concurrent_stack="true"
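         <!-- the two thread pools below process incoming messages: thread_pool.*
              handles regular traffic, while oob_thread_pool.* handles out-of-band
              (OOB) messages such as heartbeats and suspect verification -->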

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="2"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="Run"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="2"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="true"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="Run"/>

In our lab tests, this basically eliminated the situations where a node in the JGroups cluster gets shunned or suspected.
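
After restarting the servers, a simple way to confirm the change (on the Windows environments discussed above) is to scan the sm.log files for further SUSPECT warnings, for example:

findstr /C:"I was suspected" sm.log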

Frequent Advisor
ShishirG
Posts: 52
Registered: ‎06-20-2011
Message 10 of 10 (466 Views)

Re: sm log is full of - I was suspected by xx.xx.xx.xx:xxxx; ignoring the SUSPECT message

Thanks for the info .. it will be helpful for people reading this thread.

 

However, I cannot try these steps, as the environment setup is now complete without issues after opening up the ephemeral UDP and TCP port ranges on the firewall.
