09-18-2013 02:38 PM
I have an Alpha Server DS15 on a cluster with another DS15 server. The first machine restarts randomly while the second machine is ok. I would like to know what could cause a server to restart on its own.
09-18-2013 03:29 PM
I am trying to read the pagefile.sys file to find out what the system error was, if any. Can someone advise how to read this file? I have done the following.
09-18-2013 05:18 PM
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
%SDA-F-OPENIN, error opening SYS$COMMON:[SYSEXE]SYSDUMP.DMP; as input
-RMS-E-FNF, file not found
Later I searched the hard disk for any dump files, but none were found:
$ dir dka100:[000000...]*sysdump*.dmp
%DIRECT-W-NOFILES, no files found
It seems like the utility is off or something. How can I restart the system error log utility, so that when it happens again I can capture the error?
09-18-2013 08:03 PM
Eddy, let's step back a bit. You're saying that one system "just resets" and ... Does it reboot fine, run for a while, and then reset again? Do you have a dumb terminal connected to the system's console? If so, you'd be better off using a terminal emulator, for now, to capture and SAVE what your system does when it "resets." We can't help you without knowing AT LEAST what happens when it resets, and the console output is crucial. If it goes through a system crash, we'll need the data from the console to start guessing. Once we know what it does, we can decide whether your problem *requires* a dumpfile or not. If it just resets without a system crash, then the dumpfile is unnecessary...for now.
We need the console output first. If you're getting a crashdump then we'll recommend actions to go further based on what that output shows.
09-19-2013 01:50 AM
Analyzing the errorlog can help in this case. By default, this file is sys$errorlog:errlog.sys (root specific).
Which version of OpenVMS are you running at your site?
I suppose you are running a release of OpenVMS equal to or greater than OpenVMS v7.3-1 + VMS731_CPU2208-v0100 (the minimum version that supports the Alpha DS15 class servers).
If needed, I can analyze the errorlog for you.
09-19-2013 03:09 AM
If there is no SYS$SYSTEM:SYSDUMP.DMP, OpenVMS will try to write the crash dump to PAGEFILE.SYS. To analyze a crash dump in the pagefile, use ANALYZE/CRASH SYS$SYSTEM:PAGEFILE.SYS
09-19-2013 03:39 AM
>>> If there is no SYS$SYSTEM:SYSDUMP.DMP, OpenVMS will try to write the crash dump to PAGEFILE.SYS
If SYS$SYSTEM:SYSDUMP.DMP does not exist, and there is no DOSD device configured (Dump-Off-System-Disk), the operating system writes the dump of physical memory into SYS$SYSTEM:PAGEFILE.SYS, the primary system page file.
If the SAVEDUMP system parameter is set, the dump is retained in PAGEFILE.SYS when the system is booted after a system failure. If the SAVEDUMP parameter is not set, which is the default, OpenVMS uses the entire page file for paging and any dump written to the page file is lost.
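The rules in this post can be written down as a small decision function. This is only a sketch in Python, not something that runs on the Alpha: the function and field names are made up for illustration, and checking SYSDUMP.DMP before DOSD is an assumption of the sketch, since the post only describes the case where neither exists.

```python
from typing import NamedTuple


class DumpOutcome(NamedTuple):
    location: str                 # where the crash dump is written
    retained_after_reboot: bool   # whether it is still there after the reboot


def dump_outcome(sysdump_exists: bool, dosd_configured: bool,
                 savedump_set: bool) -> DumpOutcome:
    """Summarize the dump-file rules described in the post above."""
    # Assumption of this sketch: SYSDUMP.DMP is checked before DOSD.
    if sysdump_exists:
        return DumpOutcome("SYS$SYSTEM:SYSDUMP.DMP", True)
    if dosd_configured:
        return DumpOutcome("DOSD device", True)
    # Neither exists: the dump goes to the primary page file and survives
    # the reboot only if SAVEDUMP is set (it is not set by default).
    return DumpOutcome("SYS$SYSTEM:PAGEFILE.SYS", savedump_set)


# The situation reported in this thread: no SYSDUMP.DMP found; DOSD and
# SAVEDUMP are assumed here to be at their defaults.
print(dump_outcome(sysdump_exists=False, dosd_configured=False,
                   savedump_set=False))
```

With those inputs the dump lands in PAGEFILE.SYS and is not retained, which would explain why nothing is left to analyze after the reboot.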
09-19-2013 06:41 AM
BUT...IF this DS15 is really just resetting (as implied by the OP), then you might not have anything in errlog OR in any dump or pagefile at all. Not all system resets are crashes and, I'll admit, not all of them are those odd resets where the system looks like it was power cycled and restarts. The only way to get that information is to have the console connected to something that can record the output from the event (or non-event, should this turn out to be something other than a crash) and go from there.
09-19-2013 10:38 AM - edited 09-19-2013 10:40 AM
Unfortunately, this case was not well described and we have little information to work with.
Post 1 states: "I have an Alpha Server DS15 on a cluster with another DS15 server. The first machine restarts randomly while the second machine is ok".
The suggestion submitted with post 10 is absolutely valid, but in my opinion even the analysis of the system errorlog (using the right tools) may help in cases like this one. It can tell us:
1- whether the root cause of this behavior is a system crash or not (CLUEXIT bugcheck...????)
2- whether the system suffers from hardware problems
3- whether this DS15 is really just resetting
and so on
Purely Personal Opinion
09-19-2013 07:38 PM
Thanks For Replying.
I will upload the operator log file the next time the machine restarts. Maybe then we will know where to look for the cause of the restart.
The server is running VMS version 7.3.
09-22-2013 05:15 PM
Hello All, the server restarted again this Saturday at 11.09 am. I had a look at the operator log file for that day and time. Please refer below.
%%%%%%%%%%% OPCOM 21-SEP-2013 09:32:03.89 %%%%%%%%%%%
Message from user TCPIP TELNET on LBAWB3
TELNET Logout Request from Remote Host: 10.100.30.111 Port: 59569
%%%%%%%%%%% OPCOM 21-SEP-2013 10:14:39.23 %%%%%%%%%%%
Logfile time stamp
The server restarted here at 11.09, but not much was logged.
%%%%%%%%%%% OPCOM 21-SEP-2013 11:14:39.25 %%%%%%%%%%%
Logfile time stamp
%%%%%%%%%%% OPCOM 21-SEP-2013 12:08:24.07 %%%%%%%%%%%
Logfile has been initialized by operator _LBAWB3$OPA0:
Logfile is LBAWB3::SYS$SYSROOT:[SYSMGR]OPERATOR.LOG;1440
Also, while going through the log file, I found this from when the server was booting:
%%%%%%%%%%% OPCOM 21-SEP-2013 12:08:32.04 %%%%%%%%%%%
Message from user SYSTEM on LBAWB3
%LICENSE-E-NOAUTH, DEC OPENVMS-ALPHA use is not authorized on this node
-LICENSE-F-EXCEEDED, attempted usage exceeds active license limits
-LICENSE-I-SYSMGR, please see your system manager
However, the machine booted, has been online since then, and is working fine.
LBAWB3> sh dev
Device                  Device           Error  Volume            Free Trans Mnt
 Name                   Status           Count  Label           Blocks Count Cnt
DSA0:                   Mounted              0  WBDATA        65389230   116   2
$1$DKA0:      (LBAWB3)  ShadowCopying        0  (copy trgt DSA0: 52% copied)
$1$DKA100:    (LBAWB3)  Mounted              0  AXPVMSSYS     61614378   397   2
$1$DQA0:      (LBAWB3)  Online               0
$1$DQA1:      (LBAWB3)  Offline              1
$1$DQB0:      (LBAWB3)  Offline              1
$1$DQB1:      (LBAWB3)  Offline              1
$2$DKA0:      (LBAWB4)  ShadowSetMember      0  (member of DSA0:)
$2$DKA100:    (LBAWB4)  Mounted              0  ALPHA_0722    65796399     1   2
$2$DQA0:      (LBAWB4)  Online               0

Device                  Device           Error
 Name                   Status           Count
OPA0:                   Online               0
OPA2:                   Online               0
OPA3:                   Online               0
ASN0:                   Online               0
FTA0:                   Offline              0
LTA0:                   Offline mounted      0
LTA2:                   Online               0
LTA3:                   Online               0
LTA4:                   Online               0
LTA5:                   Online               0
LTA101:                 Online spooled       0
LTA102:                 Online               0
LTA103:                 Online               0
LTA104:                 Online               0
LTA5022:                Online               0
LTA5033:                Online               0
LTA5034:                Online               0
LTA5035:                Online               0
RTA0:                   Offline              0
RTB0:                   Offline              0
TNA0:                   Online               0
TNA5:                   Online               0
TNA6:                   Online               0
TNA7:                   Online               0
TTA0:                   Online               0

Device                  Device           Error
 Name                   Status           Count
LRA0:                   Online               0

Device                  Device           Error
 Name                   Status           Count
EIA0:                   Online               0
EIA2:                   Online               0
EIA5:                   Online               0
EIA6:                   Online               0
EIA7:                   Online               0
EIA9:                   Online               0
EIB0:                   Online               0
EIB2:                   Online               0
EIB4:                   Online               0
MPA0:                   Online               0
PEA0:                   Online               0
PKA0:                   Online               0
PKB0:                   Online               0
PPP0:                   Online               0
SMA0:                   Online               0
LBAWB3> sh error
Device                           Error Count
$1$DQA1:      (LBAWB3)                     1
$1$DQB0:      (LBAWB3)                     1
$1$DQB1:      (LBAWB3)                     1
Thanks for the support
09-22-2013 11:11 PM
OPERATOR.LOG almost never contains information about a crash or the reason for a restart.
You need to capture the console (OPA0:) output from such a 'restart'. The ERRLOG.SYS file or a system dump file may contain additional information, if the 'restart' is caused by a system crash. Unfortunately, you need a tool like DECevent to decode the ERRLOG.SYS file on OpenVMS Alpha V7.3.
Also consider setting the console variable AUTO_ACTION to RESTART if it is currently set to BOOT. You can retrieve the current setting with WRITE SYS$OUTPUT F$GETENV("AUTO_ACTION")
09-23-2013 05:58 AM
You need to get this system configured to create a dump file. Or, confirm that in fact it is properly configured to create a dump file. Please read the OpenVMS System Manager's Manual - look for sections that discuss the System Dump and generating crash dumps. Until you get the dump file properly configured, you are wasting your time. (If you need help doing this, there are plenty of people around who can do this - many responding in this thread, our company included)
Just out of curiosity - how do you connect to the console of this server? Is there a console server involved? You also need to set up your console so that output to the console is captured and preserved, somewhere.
Software Concepts International
09-24-2013 06:14 AM
Using a USB serial adapter, connect a laptop to the console port and use a PuTTY session to capture the console output. Be sure to set the PuTTY session scroll back buffer to a large value. I use 20000.
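Once a capture exists, a first pass over it can tell a crash from a plain reset. This is a rough sketch in Python, assuming PuTTY's session logging has written the console output to a text file (the name putty.log is hypothetical). The marker strings are only guesses taken from terms used in this thread, so adjust them to whatever the console actually prints.

```python
from pathlib import Path

# Assumed markers, taken from terms mentioned in this thread. They are
# not a complete list of what an OpenVMS console can print.
CRASH_MARKERS = ("BUGCHECK", "MACHINECHK")


def classify_console_log(text, markers=CRASH_MARKERS):
    """Return a verdict plus the (line number, line) pairs that matched."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(marker.lower() in line.lower() for marker in markers):
            hits.append((number, line.strip()))
    verdict = "crash" if hits else "no crash marker found"
    return verdict, hits


def classify_file(path):
    """Read a captured console log, tolerating stray non-text bytes."""
    return classify_console_log(Path(path).read_text(errors="replace"))


# Example call (hypothetical file name):
# verdict, hits = classify_file("putty.log")
```

A "no crash marker found" verdict around the time of a restart would point toward a reset or power problem rather than a software crash, though that is only as reliable as the marker list.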
09-24-2013 07:59 AM - edited 09-24-2013 08:00 AM
As I have said several times in this thread, the analysis of the system errorlog (not operator.log) might help us to better understand the root cause of your problem.
On the DS15 where the problem occurs:
$ create/dir sys$sysdevice:[temp_errlog]
$ copy/log sys$errorlog:errlog.sys sys$sysdevice:[temp_errlog]*
Transfer the sys$sysdevice:[temp_errlog]errlog.sys from Alpha to your PC in binary mode.
When you are done, contact me offline. We can then arrange a way for me to analyze the errorlog for you.
09-24-2013 08:09 AM
If the problem is a restart-crash (like MACHINECHK etc.), copying and analyzing the ERRLOG.SYS file does not help if AUTO_ACTION is NOT set to RESTART and there is no valid SYSDUMP.DMP file set up.
Capturing the console output is the only way to find out why your server is restarting unexpectedly. Once you know that, more analysis may be required (dumpfile, errorlog etc.).