HP ITSM Insider Dispatch: How to Troubleshoot JGroups-related Issues

JGroups plays an important role in HP Service Manager (SM): SM relies on it to build a software cluster. With the software cluster in place, SM can evenly distribute the requests arriving through the SM load balancer (SMLB), keeping the load on each connected node in balance. In this article, you will learn how to:

  • implement JGroups in a way that avoids or minimizes potential issues
  • interpret common symptoms so you can recognize a particular JGroups issue
  • resolve two common JGroups issues you may encounter
  • for issues needing further investigation, conduct an initial investigation and collect network traffic and logs in preparation for further analysis by HP Support

Please note: In the following text, “LB” stands for the load balancer server, and “APP” stands for the application server.

 

 

JGroups implementation best practices

 

1.  If a single server has sufficient capacity, vertical scaling is preferable to horizontal scaling.

Justification: Deploying SM on a single piece of hardware can reduce the communication overhead associated with JGroups and reduce the possibility of network issues.

 

2.  Fewer computing nodes (servlets) are preferable to more. Always set the parameter ‘threadsperprocess’ to an optimized value of 50 or higher.

Justification: UDP traffic rises sharply as the number of nodes (servlets) grows.

 

3.  Always deploy horizontally scaled SM on a single subnet (to avoid any deployment across routers).

Justification: SM relies on UDP multicast to build the horizontal-scaling cluster. In general, multicast traffic is poorly suited to crossing subnet boundaries, and may cause a network storm if not handled properly.

 

4.  If possible, always upgrade to the latest version of SM.

Justification: In SM 9.31p2 and later releases, lock synchronization has been moved from JGroups to the database, which improves JGroups performance and significantly reduces UDP multicast traffic. Furthermore, SM 9.32GA adopted a newer version of JGroups (v3.2.0), which performs better.

 

5.  Always choose a unique groupname for each SM deployment. For example, give the Production and Test systems two different groupnames rather than sharing one.

Justification: Using the same groupname more than once in the same subnet adds overhead for JGroups and makes the cluster noisy.
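
To illustrate best practice 2 above, here is a minimal sketch of how the ‘threadsperprocess’ value affects the number of servlets a deployment needs. It assumes, purely for illustration, that each concurrent session occupies one thread; the function name and the session count are hypothetical and do not come from SM.

```python
import math


def servlets_needed(concurrent_sessions: int, threads_per_process: int) -> int:
    """Return how many servlets are needed to host the given sessions.

    Assumption (not from the SM documentation): one session uses one thread.
    """
    if concurrent_sessions < 0 or threads_per_process <= 0:
        raise ValueError("sessions must be >= 0 and threads per process > 0")
    return math.ceil(concurrent_sessions / threads_per_process)


if __name__ == "__main__":
    sessions = 600  # hypothetical load
    for threads in (10, 25, 50):
        count = servlets_needed(sessions, threads)
        print(f"threadsperprocess={threads}: {count} servlets")
```

Under that assumption, raising ‘threadsperprocess’ from 10 to 50 cuts the number of cluster members from 60 to 12 for the same load, which is the effect the justification for practice 2 is after.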

 

 

Common symptoms for JGroups issues

 

1.   “sm -reportlbstatus” doesn’t work properly.

Message:  “Error - Couldn't obtain loadbalancer info.”

 

2.   “sm -reportlic” doesn’t work properly.

Message:  “Error - Starting system: HPSM1.13080, but system: SM930SERVICE.13080 seems to be already running on this database.”

 

3.  “System Status” doesn’t work properly.

 

4.   Some servlets become unavailable.

 

5.   Users can’t log in to application servers.

Message:  “Error – Balancer couldn’t find any available nodes. Servers are too busy, please try again.”

 

 

Common causes for JGroups multicast-related issues

 

1.   UDP multicast is restricted on the network in which SM resides.

     Seek help from your network administrator.

 

2.   If SM was working fine for a while and then suddenly failed, the devices that run the network in which SM resides may have changed. Check whether a new switch has been installed.

 

3.   If SM runs in a virtualized environment, a vMotion migration may have occurred, changing the virtual network configuration in a way that restricts multicast.
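
For a quick first look at whether multicast traffic passes between two servers at all, a generic probe along the following lines can help. This is only a sketch using standard UDP sockets, not an HP-provided test, and the group address and port are hypothetical placeholders; substitute the values your deployment actually uses.

```python
import socket
import struct
import sys

GROUP = "239.255.0.1"  # hypothetical multicast address; use your own
PORT = 50000           # hypothetical port; use your own


def build_membership_request(group: str, interface_ip: str = "0.0.0.0") -> bytes:
    """Pack the group address and local interface for IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def receive(group: str, port: int, timeout: float = 30.0) -> bool:
    """Join the group and wait for one datagram; return True if one arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_membership_request(group))
        sock.settimeout(timeout)
        data, sender = sock.recvfrom(1024)
        print(f"received {data!r} from {sender[0]}")
        return True
    except socket.timeout:
        print("nothing received; multicast may be blocked on this path")
        return False
    finally:
        sock.close()


def send(group: str, port: int, ttl: int = 1) -> None:
    """Send one probe datagram; a TTL of 1 keeps it on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(b"sm-multicast-probe", (group, port))
    finally:
        sock.close()


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else ""
    if mode == "send":
        send(GROUP, PORT)
    elif mode == "receive":
        receive(GROUP, PORT)
    else:
        print("usage: probe.py send|receive")
```

Start the script with "receive" on one server, then run it with "send" on the other within the timeout. If nothing arrives, that points toward one of the network-level causes above, although a local firewall can produce the same result.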

 

Ideas to solve two common issues

 

1.   Failure running “sm -reportlbstatus” on the LB or APP server

       a.  There may be a multicast ability issue with the network.

            Please conduct a multicast ability test based on the following document:

            http://support.openview.hp.com/selfsolve/document/KM1304227

            If there is a multicast issue on the network, please ask for help from your network administrator.

 

       b.  The UDP Buffer Size value may not be optimized.

            Please search the SM installation guide for the phrase “UDP Buffer Sizing” and review all related content.

            Note in particular that the UDP Buffer Size value (for example, “net.core.rmem_max” on Linux systems) should be set to 4194304 (4 MB) or above. For better performance, choose a value larger than 4 MB if the server has enough memory available.

 

       c.  The Maximum Transmission Unit (MTU) setting on the server may be incorrect.

            If SM is deployed within a single subnet, make sure that “JUMBO packet” is not enabled as part of the MTU setting on the server. With “JUMBO packet” enabled, JGroups can’t sync the LB status from the LB server to the APP server, and running “sm -reportlbstatus” on the APP server may fail.

 

2.   “sm -reportlbstatus” succeeds on the LB or APP server when the total number of nodes is small, but fails when the total number of nodes is large.

     a.  Make sure the total number of nodes is reasonable. Too many nodes will degrade performance and cause “sm -reportlbstatus” to fail.

 

     b.  There may be an issue with the MTU setting on the server. Make sure the MTU setting is optimized on the server. To determine whether the MTU is optimized, see:

             Change your MTU under Vista, Windows 7 or Windows 8

          This process also works for Windows 2003 Server, Windows 2008 Server and Windows 2012 Server.

          If SM is not deployed on the Windows platform, search the web for how to check the MTU setting on your platform.
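
On Linux, the UDP buffer size and MTU checks described above can be sketched as follows. The 4194304 threshold comes from the text; treating any MTU above 1500 as a sign of jumbo packets is an assumption based on the usual Ethernet default, and the interface name "eth0" is a hypothetical placeholder.

```python
from pathlib import Path

RECOMMENDED_RMEM_MAX = 4194304  # 4 MB minimum, as stated in the text
STANDARD_MTU = 1500             # assumption: usual Ethernet default


def read_int(path: str):
    """Return the integer stored in a kernel pseudo-file, or None."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def buffer_is_sufficient(rmem_max: int) -> bool:
    """True when net.core.rmem_max meets the recommended minimum."""
    return rmem_max >= RECOMMENDED_RMEM_MAX


def looks_like_jumbo(mtu: int) -> bool:
    """True when the MTU is larger than the standard Ethernet value."""
    return mtu > STANDARD_MTU


def report(interface: str) -> list:
    """Collect human-readable findings for one network interface."""
    findings = []
    rmem_max = read_int("/proc/sys/net/core/rmem_max")
    if rmem_max is None:
        findings.append("net.core.rmem_max: could not be read")
    elif buffer_is_sufficient(rmem_max):
        findings.append(f"net.core.rmem_max: {rmem_max} (ok)")
    else:
        findings.append(f"net.core.rmem_max: {rmem_max} (below 4 MB)")
    mtu = read_int(f"/sys/class/net/{interface}/mtu")
    if mtu is None:
        findings.append(f"{interface} MTU: could not be read")
    elif looks_like_jumbo(mtu):
        findings.append(f"{interface} MTU: {mtu} (jumbo packets likely on)")
    else:
        findings.append(f"{interface} MTU: {mtu}")
    return findings


if __name__ == "__main__":
    for finding in report("eth0"):  # hypothetical interface name
        print(finding)
```

Run it on both the LB and APP servers and compare the results; a mismatch in MTU between the two is worth raising with your network administrator.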

 

How to conduct an initial investigation for JGroups issues needing resolution by HP Support

 

1.   Check the configuration files (sm.cfg and sm.ini)

The following parameters could be related to the JGroups situation you are investigating:

  • groupname
  • groupmcastaddress
  • groupport
  • grouplicenseip
  • groupbindaddress
  • groupsubnetaddress

Please refer to the help documents or contact HP Support to check whether they are properly configured.

 

2.   Conduct the following test to check whether or not an issue exists:

      a.  Add the debug line 'debugstartup' to the sm.ini files; it logs additional debugging information at startup.

 

      b.  Clear the log files and restart the server.

 

3.   Conduct the following test to collect the logs:

      a.  Add the following JGroups trace parameter to all three of the sm.ini files:
log4jDebug:com.hp.ov.sm.common.resource,com.hp.ov.sm.common.cluster,com.hp.ov.sm.common.org.jgroups

 

      b.  Perform the following sequence of tasks:

           1)   Stop the servers.

           2)   Back up or remove the sm.log files.

           3)   Start the servers.

 

      c.  Reproduce the issue. Screenshots are helpful, so include some if possible.

 

      d.  Send the sm.log files from all the servers to HP Support.

 

4.   Conduct the following test to check multicast ability. See:
      http://support.openview.hp.com/selfsolve/document/KM1304227

 

5.   Reproduce the issue and provide the Wireshark (or tcpdump on Linux) network captures to HP Support for further investigation.
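
To make step 1 easier to repeat across servers, the group parameters can be pulled out of each sm.ini with a short script such as the sketch below. It assumes that sm.ini holds one "name:value" entry per line, as in the log4jDebug line above, and that lines starting with "#" are comments; all values in the sample text are hypothetical.

```python
GROUP_PARAMETERS = (
    "groupname",
    "groupmcastaddress",
    "groupport",
    "grouplicenseip",
    "groupbindaddress",
    "groupsubnetaddress",
)

# Hypothetical sm.ini content, for illustration only.
SAMPLE_INI = """\
# production cluster
groupname:smprod
groupmcastaddress:239.255.0.1
groupport:50000
debugstartup
"""


def parse_ini(text: str) -> dict:
    """Parse "name:value" lines; bare flags get an empty string as value."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        settings[name.strip().lower()] = value.strip()
    return settings


def summarize_group_parameters(text: str) -> dict:
    """Map each group parameter to its value, or None when it is not set."""
    settings = parse_ini(text)
    return {name: settings.get(name) for name in GROUP_PARAMETERS}


def shares_groupname(ini_a: str, ini_b: str) -> bool:
    """True when both files set the same groupname."""
    name_a = parse_ini(ini_a).get("groupname")
    return name_a is not None and name_a == parse_ini(ini_b).get("groupname")


if __name__ == "__main__":
    for name, value in summarize_group_parameters(SAMPLE_INI).items():
        print(f"{name}: {value if value is not None else '(not set)'}")
```

Comparing the summaries from all servers side by side shows quickly whether the cluster members agree on their group settings, and the groupname comparison can flag a Production and Test system that share a name. Whether a given value is correct still needs to be confirmed against the help documents or with HP Support.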

 

 

NOTE: This content was submitted by Yu-Liang Xu, an HP Service Manager R&D Customer Assistant Team engineer

About the Author
A 25+ year veteran of HP, Yvonne is currently a Senior Product Manager of HP ITSM software including HP Service Anywhere and HP Service Man...