07-27-2011 02:57 AM
I have a problem starting an OM agent on a host.
I can start the core, but when I start coda, I get the following error in System.txt:
ovcd (21541/3): (ctrl-41) Timeout while starting component 'coda'.
coda.txt shows no errors:
0: INF: Wed Jul 27 11:39:22 2011: coda (21639/1): Open File (/var/opt/OV/datafiles/coda00000)
0: INF: Wed Jul 27 11:39:22 2011: coda (21639/1): Open File (/var/opt/OV/datafiles/coda00001)
0: INF: Wed Jul 27 11:39:22 2011: coda (21639/1): Starting CODA(10.51.265:14:0:/Hewlett-Packard/OpenView/Coda
0: INF: Wed Jul 27 11:39:23 2011: coda (21639/1): Waiting for requests...
When I start the agent, the output is:
root@lnvx0030:/opt/OV/bin # ./ovc -start -verbose
Starting control service.
ovbbccb:Component is already running.
But I can see that all processes are running:
root@lnvx0030:/opt/OV/bin # ps -ef | grep OV
root 6401 17997 0 11:22:36 pts/1 0:00 tail -f /var/opt/OV/log/coda.txt
root 22989 21541 0 11:41:14 ? 0:03 /opt/OV/lbin/eaagt/opcmsga
root 7173 15352 0 11:23:52 pts/2 0:00 tail -f /var/opt/OV/log/System.txt
root 21542 21541 0 11:39:07 ? 0:00 /opt/OV/bin/ovbbccb -nodaemon
root 28441 21541 0 11:49:14 ? 0:00 /opt/OV/lbin/eaagt/opcmona
root 21541 1 0 11:39:06 ? 0:15 /opt/OV/bin/ovcd
root 25836 21541 0 11:45:14 ? 0:00 /opt/OV/lbin/eaagt/opcmsgi
root 24498 21541 0 11:43:14 ? 0:00 /opt/OV/lbin/eaagt/opcacta
root 27278 21541 0 11:47:14 ? 0:01 /opt/OV/lbin/eaagt/opcle -std
root 21543 21541 0 11:39:07 ? 0:04 /opt/OV/lbin/conf/ovconfd
root 21639 21541 0 11:39:14 ? 0:00 /opt/OV/lbin/perf/coda
A ping from another node also shows that coda is running:
root@lnvx0031:/opt/OV/bin # ./ovcodautil -ping -n lnvx0030
Ping of 'OvBbcCb' at: 'http://lnvx0030:383/Hewlett-Packard/OpenView/BBC/
Ping of 'Coda' at: 'http://lnvx0030:383/Hewlett-Packard/OpenView/Coda
MWA, which also uses coda, has no problems, so the cause must be a misconfiguration in the communication between the core and coda/the agents.
Does anyone have any tips for me?
Rick de Haan
07-27-2011 08:14 AM
Try stopping all the processes: OVPA, ovc, and opcagt. Double-check that no processes are still running. Remove the coda data files. Restart ovc, check that all processes are running, then restart OVPA.
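The procedure above can be sketched as a small POSIX shell script. This is a minimal sketch, not an HP-supplied tool: the install paths /opt/OV/bin and /opt/perf/bin are assumptions for a typical UNIX node, and the helper names remove_coda_datafiles, no_agent_processes_left and reset_coda are hypothetical. The datafiles path is the one shown in coda.txt earlier in the thread.

```shell
# Hypothetical helper script; paths are assumptions, adjust for your node.
OV_BIN=/opt/OV/bin                   # Operations agent binaries (ovc, opcagt)
PERF_BIN=/opt/perf/bin               # Performance Agent binaries (ovpa), assumed path
CODA_DATA_DIR=/var/opt/OV/datafiles  # location of coda00000, coda00001, ...

# Delete every coda* file in the given directory (default: CODA_DATA_DIR).
remove_coda_datafiles() {
    dir=${1:-$CODA_DATA_DIR}
    for f in "$dir"/coda*; do
        [ -e "$f" ] || continue      # glob matched nothing
        rm -f "$f"
    done
}

# Print any agent processes still running; succeed only if there are none.
no_agent_processes_left() {
    ps -ef | grep -e ' /opt/OV/' -e ' /opt/perf/' | grep -v grep
    [ $? -ne 0 ]
}

# Stop everything, clear the coda data files, then restart in order.
reset_coda() {
    "$PERF_BIN/ovpa" stop
    "$OV_BIN/opcagt" -stop
    "$OV_BIN/ovc" -kill              # stops all components, including the core
    if ! no_agent_processes_left; then
        echo "Agent processes are still running; stop them first." >&2
        return 1
    fi
    remove_coda_datafiles
    "$OV_BIN/ovc" -start
    "$OV_BIN/ovc" -status            # check that all processes are running
    "$PERF_BIN/ovpa" start
}
```

Source the script as root and run reset_coda. The process check matches command lines that start with /opt/OV/ or /opt/perf/, so a leftover `tail -f /var/opt/OV/log/...` does not block the reset.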
07-27-2011 10:38 PM
It seems your coda data file has become corrupted. Stop the Performance Agent and the Operations agent on that node, then delete all coda* files from the datafiles directory.
Then start the Operations agent and check its status. Afterwards, start the Performance Agent as well.
Hope this helps
07-28-2011 01:04 AM
Thanks for the replies.
Before posting my problem I searched the forum for solutions, and I had already tried this one.
After restarting, the process created a new coda001 file, but the problem is still there.
The strange thing is that the subagents show a status of Aborted, yet I still receive their messages in OM.
So it looks to me as if the only problem is the communication between ovc and the agents/coda.
Rick de Haan
07-28-2011 02:09 AM
Have you checked whether coda data collection is happening?
Run the following command on your managed node to check whether coda is collecting data:
ovcodautil -dumpds CODA
The output shows the data from the last collection cycle. Check its date and time, as well as the per-metric data.
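To see whether new data keeps arriving, rather than reading a single dump, you can compare two dumps taken some time apart. This is a rough heuristic, not a documented HP check: it assumes the `ovcodautil -dumpds CODA` output changes from one collection cycle to the next (it includes the last cycle's date and time), and the 300-second default is a placeholder that should be longer than your collection interval. The function names are hypothetical.

```shell
# Hypothetical helper: one dump of the CODA data source (command from the thread).
coda_dump() {
    ovcodautil -dumpds CODA
}

# Take two dumps $1 seconds apart (default 300, an assumed placeholder).
# Returns 0 if the output changed, i.e. coda appears to be collecting,
# and 1 if both dumps were identical.
coda_collecting() {
    interval=${1:-300}
    first=$(coda_dump 2>&1)
    sleep "$interval"
    second=$(coda_dump 2>&1)
    [ "$first" != "$second" ]
}
```

For example, `coda_collecting 600 && echo "new data" || echo "no new data"`. If the dumps stay identical across several intervals, look at the last cycle's timestamp to see when collection stopped.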
As for the subagents not running:
Remove all deployed policies from that node, then check the agent status. At that point none of the subagents will be running, because without interceptor policies no daemons are started.
Now redeploy the same policies one type at a time: for example, deploy only the measurement threshold policies first, then check the subagent status, then move on to the next type.
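The "check the subagent status" step can be repeated after each policy type with a small helper. A minimal sketch, assuming that `ovc -status` prints the word Aborted for components in that state (as reported earlier in the thread) and that ovc lives in /opt/OV/bin; ovc_status and check_no_aborted are hypothetical names.

```shell
# Hypothetical wrapper around the agent status command (path assumed).
ovc_status() {
    /opt/OV/bin/ovc -status
}

# Print any components reported as aborted.
# Returns 0 if there are none, 1 if at least one is aborted.
check_no_aborted() {
    aborted=$(ovc_status 2>&1 | grep -i 'aborted')
    if [ -n "$aborted" ]; then
        printf 'Aborted components:\n%s\n' "$aborted"
        return 1
    fi
    echo "No aborted components."
    return 0
}
```

After deploying, say, only the measurement threshold policies, run check_no_aborted; if it lists a component, the policy type just deployed is the first suspect.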
Hope this helps.