ServiceGuard problems when DNS server changed
Joe Geiger

I have a question concerning ServiceGuard and name resolution (i.e., DNS). The general problem is this: when we lose the DNS server listed first in our resolv.conf file, the cm* commands used to manage a ServiceGuard cluster take a very long time to complete.

 

During our last DR test, we performed a DNS server change in which virtual hostnames were swapped – that is, the IP addresses themselves were not changed, but the (virtual) hostnames that resolve to them were. For example, if pkgA and pkgB resolved to 10.1.1.1 and 10.1.1.2 respectively before the DNS switchover, they resolved to 10.1.1.2 and 10.1.1.1 respectively afterwards. The IP addresses and hostnames of the physical nodes in the two clusters did not change. Our nsswitch.conf contains a single line: “hosts: files dns”. The virtual hostnames for the packages are *not* in our hosts file. All servers are running HP-UX 11.31 and MC-SG rev 11.20.

 

When the network group performed the DNS change, we lost, for all practical purposes, the ability to manage the cluster via cm* commands. I say “for all practical purposes” because, for example, a cmviewcl would complete successfully, but only after about 10-12 minutes. In the end, we rebooted all nodes in the cluster, after which all was OK.
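
To put the 10-12 minute figure in perspective, here is a rough back-of-envelope sketch in Python. It assumes the whole delay comes from sequential name lookups that each wait out a resolver timeout; the 5-second stall per lookup is a made-up value for illustration, not something we measured or read from our configuration.

```python
# Back-of-envelope model. Only the 10-12 minute delay comes from what we
# observed; the per-lookup stall is a hypothetical value.

def stalled_lookups(observed_delay_s: float, stall_per_lookup_s: float) -> float:
    """Return how many sequential stalled lookups would add up to the delay."""
    if stall_per_lookup_s <= 0:
        raise ValueError("stall_per_lookup_s must be positive")
    return observed_delay_s / stall_per_lookup_s


ASSUMED_STALL_S = 5.0  # hypothetical wait per lookup, in seconds

for minutes in (10, 12):  # the range we saw for cmviewcl
    count = stalled_lookups(minutes * 60, ASSUMED_STALL_S)
    print(f"{minutes} min is roughly {count:.0f} stalled lookups "
          f"at {ASSUMED_STALL_S} s each")
```

Under that assumption, a delay of this size would take on the order of a hundred or more stalled lookups per command. A different stall value changes the count proportionally.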

 

Our initial investigation into this issue led to much discussion about how HP-UX does not cache DNS info – thus, any subsystem that read the resolv.conf file when it started must be restarted to recognize a change to DNS. That’s our take from this: http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=120&prodSeriesId=...
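
One way to check whether name resolution itself is slow from a freshly started process is to time a lookup directly. The following is a minimal sketch, assuming a Python 3 interpreter is available on the node (that is an assumption on my part); the helper name time_lookup is hypothetical and has nothing to do with ServiceGuard itself.

```python
import socket
import time


def time_lookup(hostname):
    """Resolve hostname once and return (address or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        # Uses the system resolver, so the OS name-service configuration applies.
        address = socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror is a subclass of OSError; treat any failure as "no answer".
        address = None
    elapsed = time.monotonic() - start
    return address, elapsed
```

For example, calling time_lookup("pkgA") and time_lookup("pkgB") from a new interpreter on each node shows what a newly started process sees and how long it waits. If those lookups come back quickly and correctly while cm* commands are still slow, that would be consistent with the idea that the long-running daemons are working from stale resolver settings, though it would not prove it.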

 

So, our specific question is this: with regard to ServiceGuard, what’s the best way to address a change to the resolv.conf file (for example, when the first DNS server listed in it drops)? Is there a way to recover cleanly and quickly without having to reboot? We thought one possible solution might be to restart the cmcld daemon but, for the life of us, we cannot figure out how to do that.

 

Has anyone run up against this and/or come up with any goodness to deal with it?  Thanks in advance!
