01-03-2013 12:22 PM
Hopefully somebody can help, as I've been trawling the internet to see if I can find the cause of a system boot error, with no luck.
I am running an old Alphaserver 3000 Model 700, using OpenVMS 7.1-1, nothing fancy attached, two internal hard drives and an external tape drive.
Everything was fine until sometime over the Christmas period when the system failed with a "Machine Check" error. When I now try to reboot the system, it goes through all the power up tests ok, and gets to the >>> prompt. I try to boot from here, and the system reports all the INIT phases ok, and then drops out with;
?07 MCHK FR PAL
PC = 00000000.00054023
And won't go any further.
Repeated attempts to boot the system mostly give the above error, but occasionally I get;
?06 MCHK DBL
It would be most welcome if anyone can shed any light on the possible cause of these error messages!
Thanks very much
01-03-2013 03:04 PM
Paul, I hate to say it but that combination of error messages usually means that something in Alphaland has died. I can't remember, because there were SO many different systems in the DEC3000 family, if yours is a "desktop" or "deskside" unit. In either case you might try reseating everything that "plugs in" (the desktop has one main board and everything plugs into it where the deskside has a more insidious design and there are two modules that connect through the middle metal bulkhead between the sides of the system...one side with peripherals and the other side with memory module carriers). The first error, the PAL failure, implies to me that something on the CPU board has probably died...or, if you're sorta lucky, you might have "just" lost the battery and settings.
With Google I did a search for "DEC3000 ?07 MCHK FR PAL" and got several pointers. One you might want to dig into is the service manual at: http://manx.classiccmp.org/collections/mds-199909/
The ?06 MCHK DBL basically is saying that you suffered a machine check while the system was taking a machine check. This is a Pretty Bad Thing. Need more info, more output from the console during POST or any other output you get from trying to boot. Just keep in mind that this sounds like it'll be expensive (if you have no maintainer on contract especially). If reseating "stuff" doesn't make it any better then you're going to have a difficult time finding what broke.
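If it helps with gathering "more output", here is a rough Python sketch for tallying the halt lines out of a captured console session (for example, a terminal emulator log of several boot attempts). It is only an illustration: the function name `summarize_halts` and the `NOTES` table are made up for this post, and the notes just restate what is said in this thread, not anything from DEC documentation.

```python
import re
from collections import Counter

# Notes restate this thread only; they are NOT taken from DEC manuals.
NOTES = {
    "MCHK FR PAL": "described in this thread as a PAL failure",
    "MCHK DBL": "machine check taken while already handling a machine check",
}

# Matches console halt lines such as "?07 MCHK FR PAL" or "?06 MCHK DBL".
HALT_RE = re.compile(r"^\?(\d{2})\s+(MCHK\b.*?)\s*$")


def summarize_halts(log_text):
    """Count each (code, message) halt line found in a captured console log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HALT_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical capture assembled from the messages quoted in this thread.
    sample = "?07 MCHK FR PAL\nPC = 00000000.00054023\n?06 MCHK DBL\n?07 MCHK FR PAL\n"
    for (code, message), seen in sorted(summarize_halts(sample).items()):
        print(code, message, seen, NOTES.get(message, "no note"))
```

A tally like this makes it easier to say how often each halt shows up across repeated boot attempts, rather than "mostly" and "occasionally".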
01-06-2013 10:58 AM
Thanks Steven and Bob for your responses!
Believe it or not, in this modern day, this machine is/was being used to run a business!
It has been running a bespoke DBL application for many years, and I was brought in to assist with migrating to a new system, which, for various reasons has not yet been completed. As such, it is now a priority of mine to try and get it working again!
I believe the problem is either with the main board, or the I/O board so am currently looking to see if I can get replacements! Or, failing that, another VMS system.
So, if any readers have anything available I would most appreciate contact from you! This is not a massive system, the system that has just failed was supporting 5 users, and the application, plus data was on a 2gb HDD.
Thanks again for any input!
01-07-2013 07:09 AM
If the software works, why fix it? Consider moving to an Alpha emulator to upgrade the hardware and keep the software.
Disclosure: We sell Avanti, an Alpha hardware emulator.
01-09-2013 12:51 PM
Whilst waiting for a new server to be built to host an Alpha emulation package, I've taken the Alpha 3000 to pieces with a view to putting it all back together, just in case something had worked loose.
In doing so I have established that there is a bent pin on the I/O board, in the SCSI connector. Hopefully tomorrow I will be able to get this pin straightened, but I'm not too sure how this could cause my problems as it's obviously been like this for a while, and no-one has touched the machine for at least 12 months.
Question for the panel; are all the pins on a SCSI connector actually used? The bent one is one of the very end ones, next to the securing clips.
01-10-2013 03:52 AM
01-14-2013 01:05 AM
Bought a refurbished I/O board for a model 600, not a 700, but I understand they are interchangeable.
Everything plugged in, put back together, I'm now getting an NVR failure on the self tests.
Typing TEST NVR at the >>> prompt I get:
T-STS-NVR - NVR_REG TEST
? T-ERR-NVR - VRT BIT FAILURE, FINAL CHECK
T-STS-NVR - FAILED, status = 20
?? 002 NVR 0x0020
Dan, thanks for your offer, but from the above I don't think it is my I/O board now. I'm based in Slough, Berkshire, UK.
01-14-2013 04:48 AM
According to my copy of the Dec 3000 Model 700 Service Addendum manual this error is reporting a problem with the I/O board, with solution 1 to reseat the board, which I have done, and option 2 to replace the board.
Looking at my Dec 3000 Model 300 Service Guide, it says the fault is the system module.
Part code for the I/O module is 54-21813-02
Part code for the system module is 54-23153-05
Do you have either/both of those?
01-16-2013 03:17 AM
I have sourced another I/O module, and am now back to my original error.
Machine goes through the power up tests ok but fails on attempting to boot with the error;
**** System Machine Check Abort has occurred *****
This leads me to believe the problem is with the system board, so, if anyone has either a 3000/600 or 3000/700 system board just hanging around, I'd like to hear from them!
I believe the part number for the 3000/700 system board is 54-23153-05 and I'm not too sure what the part number for a 3000/600 is!
01-22-2013 05:59 PM
What I'm hearing here is technically known as "flogging a dead horse".
A DEC3000 is a 1st generation Alpha system and must be coming up to 20 years old. It's slow, small capacity, hot and very power hungry by today's standards. Rather than continue eating time and money trying to fix whatever is broken this time, and realising that even if you manage to fix it, something else will almost certainly break in the near future, why not take the cheaper, simpler and ultimately longer term solution of moving your application to an emulator? You won't believe how much faster it runs, and how much less power it consumes.