12-23-2013 03:53 AM
I am facing an issue: I am able to start the package, but the cmviewcl output does not show it as running. Any idea how to fix this? Your help is much appreciated.
# cmrunpkg -v USTLCSWA401
Running package USTLCSWA401 on node dmoco1
Successfully started package USTLCSWA401 on node dmoco1
cmrunpkg: All specified packages are running
ustlsswa421:/# cmviewcl -vp USTLCSWA401
PACKAGE        STATUS     STATE      AUTO_RUN     NODE
USTLCSWA401    down       failed     disabled     unowned

ITEM       STATUS     NODE_NAME    NAME
Subnet     up         dmoco1       172.25.211.0
Subnet     up         dmoco3       172.25.211.0

NODE_TYPE    STATUS     SWITCHING    NAME
Primary      up         disabled     dmoco1
Alternate    up         enabled      dmoco3
12-23-2013 08:14 AM
This would imply that the package is starting, but then immediately failing. You can confirm this by checking the syslog file; if the cause of the failure is not obvious from syslog, check the package log file.
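A minimal sketch of that first step, assuming the usual HP-UX syslog location /var/adm/syslog/syslog.log (pass another file if yours differs). The function name pkg_messages is hypothetical; it just prints the most recent log lines that mention the package, which typically point at the package log file.

```shell
# Hypothetical helper: show the last N lines of a log file that mention a package.
# Defaults assume the HP-UX syslog path and 20 lines; both can be overridden.
pkg_messages() {
    pkg=$1
    logfile=${2:-/var/adm/syslog/syslog.log}
    count=${3:-20}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    # -F matches the package name literally rather than as a regular expression
    grep -F -- "$pkg" "$logfile" | tail -n "$count"
}

# Example: pkg_messages USTLCSWA401
```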
12-26-2013 05:57 AM
12-26-2013 09:21 PM
There is a problem while executing the scripts. Do you have log files for 'master_control_script.sh' and 'startstop_sgdb.ksh'?
Are you sure the package configuration files were not modified after you ran the 'cmapplyconf' command?
Try checking the configuration again and then re-applying it.
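For a modular package, the check-then-apply sequence could be wrapped as below. cmcheckconf and cmapplyconf with -P (package configuration file) are the standard Serviceguard commands; the wrapper name reapply_pkg_conf and the file path in the example are placeholders.

```shell
# Hypothetical wrapper: validate a package configuration file and apply it
# only if the check passes.
reapply_pkg_conf() {
    conf=$1
    if [ ! -r "$conf" ]; then
        echo "cannot read package configuration: $conf" >&2
        return 1
    fi
    # Check first; stop here if cmcheckconf reports a problem.
    cmcheckconf -v -P "$conf" || {
        echo "cmcheckconf failed; not applying $conf" >&2
        return 1
    }
    # Apply the checked configuration to the cluster.
    cmapplyconf -v -P "$conf"
}

# Example (placeholder path): reapply_pkg_conf /path/to/USTLCSWA401.conf
```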
12-26-2013 11:10 PM
According to syslog, the most specific log file in this case would be:
Is that the "cluster log file.txt" you attached?
The script /u01/app/oracle/home/re10.1.4/scripts/startstop_sg
INVALID PACKAGE NAME
/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh: test: argument expected
/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh: test: argument expected
My first guess would be that the script requires some variable values that are not getting initialized properly for some reason.
The last two errors include a line number, so you should look at what is supposed to happen on lines 28 and 48 of startstop_sgdb.ksh.
On those lines there is probably a literal "test" command or a "[ ... ]" block that includes some variables. Make sure that those variable names are spelled correctly (for variables provided by Serviceguard), or that the variables have been initialized with appropriate values at some earlier point in the script (if they are created and used by this script only).
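One defensive pattern for the "initialized at some earlier point" case is the standard ${VAR:?message} parameter expansion: if VAR is unset or empty, a non-interactive shell prints the message and exits, so the script stops with a readable error instead of failing later in a "[ ... ]" test. A sketch with illustrative names (check_args is hypothetical); the function body runs in a subshell so only that subshell exits.

```shell
# Hypothetical argument guard for a start/stop script.
# ${VAR:?message} aborts the (sub)shell with "message" when VAR is unset or empty.
check_args() (
    ACTION=$1
    SGPACKAGE=$2
    : "${ACTION:?missing action (expected start or stop)}"
    : "${SGPACKAGE:?missing package name}"
    echo "ACTION=$ACTION SGPACKAGE=$SGPACKAGE"
)

# Example: check_args start USTLCSWA401
```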
12-27-2013 06:31 AM
Thanks for your help...
Yes, that is the "cluster log file.txt" I attached.
Please find the database script in the attached txt file. Let me know if you find anything wrong in it.
12-27-2013 01:32 PM
I see two problems.
The first is that your script produces errors because it is not provided with appropriate parameters.
In this case, I think the script is designed to start the database when run with this command:
/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh start USTLCSWA401
... and to stop the database when run with this command:
/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh stop USTLCSWA401
But the current Serviceguard configuration will run the script without any parameters at all, like this:
/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh
Because there are no parameters, $ACTION and $SGPACKAGE will be empty strings. This is why the 'case "$SGPACKAGE" in ... esac' clause will output "INVALID PACKAGE NAME", and the tests on lines 'if [ $ACTION = start ]' and 'elif [ $ACTION = stop ]' will both produce "test: argument expected" error messages.
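The quoting difference is easy to see in isolation. With ACTION empty, an unquoted [ $ACTION = start ] collapses to [ = start ], which the shell cannot parse as a comparison (hence the "test: argument expected" messages); with "$ACTION" quoted, an empty argument stays in place and the test is simply false. A minimal sketch of a quoted dispatch with an explicit usage message; the function name dispatch and its output strings are hypothetical.

```shell
# Hypothetical dispatch on the action keyword.
# Quoting "$action" keeps each test well-formed even when the argument is empty,
# and the final branch reports the problem instead of failing silently.
dispatch() {
    action=$1
    if [ "$action" = start ]; then
        echo "would start"
    elif [ "$action" = stop ]; then
        echo "would stop"
    else
        echo "usage: startstop_sgdb.ksh start|stop PACKAGE (got '$action')" >&2
        return 2
    fi
}
```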
The second problem is in the design of your script. The startstop_sgdb.ksh script will either start or stop the specified database, and then exit. When Serviceguard starts an application or a script as a service, it expects the application/script to keep running indefinitely, until killed. If the application/script dies on its own, this will be interpreted as a service failure, and depending on the package configuration, either the service will be restarted or a package failover will be triggered. Your startstop_sgdb.ksh is fundamentally unsuitable to be run as a service script, but with some small modifications, it could be usable as an external_script.
An external_script will be run with the "start" parameter when a package is started, and this script is expected to complete before the package is considered completely started. Likewise, when halting a package, the external_script will be run with the "stop" parameter, and Serviceguard will wait for it to complete before unmounting the package disks.
The external_script will also be run with the "verify" parameter whenever the cmapplyconf command is used (more on this below).
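A skeleton of that calling convention, as a sketch only: start_db, stop_db and verify_setup are hypothetical placeholders for the real work in startstop_sgdb.ksh. This thread calls the third keyword "verify"; the external_script template shipped with Serviceguard may name it "validate", so the sketch accepts both. Check the template that comes with your Serviceguard version.

```shell
# Hypothetical external_script entry point. The keyword arrives as $1, and the
# script's exit status tells Serviceguard whether the step succeeded (0 = success).
start_db()     { echo "starting database"; }       # placeholder for the real start logic
stop_db()      { echo "stopping database"; }       # placeholder for the real stop logic
verify_setup() { echo "checking prerequisites"; }  # placeholder checks

external_main() {
    case "$1" in
        start)           start_db ;;
        stop)            stop_db ;;
        verify|validate) verify_setup ;;  # keyword name depends on your version
        *)
            echo "unknown entry point: '$1'" >&2
            return 1
            ;;
    esac
}

# In the real script the last line would be:  external_main "$1"; exit $?
```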
You might consider writing another service script for monitoring the database: it should run in an infinite loop, sleeping for a suitable time, then performing a simple sanity check on the database, and repeating this loop for as long as the database passes the sanity check. If the sanity check fails, the service script should simply exit: for Serviceguard, that will be the signal that the database has failed and a failover is necessary.
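Such a monitoring service could be sketched as follows. The sanity check is passed in as a command so the loop itself stays generic. The example check (looking for an Oracle pmon background process named ora_pmon_&lt;SID&gt;) and the names monitor_db, pmon_running and ORCL are assumptions; replace them with whatever check suits your database.

```shell
# Hypothetical service loop: run a check every INTERVAL seconds and keep running
# while it passes. When the check fails, return non-zero so the service script
# exits and Serviceguard treats the service as failed.
monitor_db() {
    interval=$1
    shift
    while "$@"; do
        sleep "$interval"
    done
    echo "sanity check failed: $*" >&2
    return 1
}

# Example check (assumption): is the pmon process for the given instance running?
# The [o] bracket keeps grep from matching its own command line.
pmon_running() {
    ps -ef | grep "[o]ra_pmon_$1" > /dev/null
}

# In the service script the last line would be, for example:
#   monitor_db 30 pmon_running ORCL; exit $?
```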
To make your startstop_sgdb.ksh work as an external_script, your script needs two modifications:
- first, it must be made to work without the second command-line parameter. The simplest way to do that would be to create multiple copies of the script, one for each database, and replace the "SGPACKAGE=$2" at the beginning of the script with a hard-coded package name, e.g. "SGPACKAGE=USTLCSWA401" for this package. I'm sure that there are other, more clever solutions too.
- second, in addition to start and stop, the script must be suitable to run with a third keyword: verify. Your script will actually fall through without doing anything and without producing an error if called as ".../startstop_sgdb.ksh verify", so no changes are actually required for that. However, it would be good style to add some tests, e.g. verifying that the other required files and scripts, like /etc/oratab and $SCRIPTS/switchdb.ksh exist and are readable when the script is called with the "verify" keyword.
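Both modifications can be sketched together. The package name falls back from the command line to a hard-coded default (the "one copy per database" approach). SG_PACKAGE in the middle of the chain is an assumption: Serviceguard is expected to export package attributes such as SG_PACKAGE to external scripts, but confirm that for your version before relying on it. The verify helper checks that files such as /etc/oratab and $SCRIPTS/switchdb.ksh exist and are readable; the function names resolve_package and verify_files are hypothetical.

```shell
# Package name: explicit $1 if given, else SG_PACKAGE from the environment
# (assumed to be set by Serviceguard), else a hard-coded per-copy default.
resolve_package() {
    echo "${1:-${SG_PACKAGE:-USTLCSWA401}}"
}

# Checks for the "verify" keyword: every listed file must be a readable regular
# file. All problems are reported, then non-zero is returned if any were found.
verify_files() {
    rc=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "verify: missing or unreadable: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In the script, for example:
#   SGPACKAGE=$(resolve_package "$2")
#   [ "$ACTION" = verify ] && { verify_files /etc/oratab "$SCRIPTS/switchdb.ksh"; exit $?; }
```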
After making these changes, the next step is to edit your package configuration file to call your script using the external_script keyword instead of the service keyword, and then use the cmapplyconf command to re-apply the package configuration.
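In a modular package configuration file, the change might look like the excerpt below. The service names are hypothetical placeholders, db_monitor.ksh stands for a monitoring script you would still have to write, and the rest of the file is not shown; keep it as it is.

```
# Before: the start/stop script was run as a service (remove or comment out)
#service_name      USTLCSWA401_db
#service_cmd       "/u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh"

# After: the same script is run as an external script
external_script    /u01/app/oracle/home/re10.1.4/scripts/startstop_sgdb.ksh

# Optional: a separate monitoring script (hypothetical name) as the service
#service_name      USTLCSWA401_mon
#service_cmd       "/u01/app/oracle/home/re10.1.4/scripts/db_monitor.ksh"
```

Any other service_* lines that belong to the removed service (for example service_restart or service_halt_timeout) would go with it.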