06-24-2005 12:07 PM
I have a quorum disk defined as a virtual disk
on an EVA5000. (VMS V7.3-2)
I want to delete and recreate that disk,
but I'm not clear on the steps. (The HP
docs I've seen so far are not clear.) My
question is similar to "how would I replace
a failed Quorum disk", which has got to
be a somewhat common situation so I'm surprised that I haven't found explicit
docs on this.
Is one approach:
1) Shut down cluster
2) Delete and recreate disk on EVA
with same Unit ID
3) Boot cluster
4) Init disk from VMS side
5) Reboot cluster so cluster file
will get created on new disk
Will the above work?
Is there a simpler way?
06-24-2005 12:51 PM
1) DISMOUNT/CLUSTER the quorum disk, assuming it is not a page/swap, DOSD or system disk and has no other open files.
2) Delete and recreate on EVA
3) Reinitialize disk
4) Reboot one node
06-24-2005 07:54 PM
dismounting the quorum disk in a running cluster works (tested on V7.3-1), so you could start with DISM/CLUSTER qdsk. Access to the quorum disk will be temporarily lost but will be re-established immediately.
Assuming that your votes are set up so that the cluster can maintain quorum without the QDSKVOTES, you could then delete and re-create the quorum disk on the EVA. This will cause access to the quorum disk to be lost, but the cluster should continue if 2 nodes are up (assuming 2x VOTES=1 and QDSKVOTES=1, i.e. QUORUM = 2).
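The vote arithmetic for that two-node example can be checked with a small sketch. It assumes the usual OpenVMS formula QUORUM = (EXPECTED_VOTES + 2) / 2, truncated; the function and variable names are made up for illustration and are not part of any VMS interface.

```python
def quorum(expected_votes: int) -> int:
    """Quorum formula: (EXPECTED_VOTES + 2) / 2, truncated to an integer."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present are enough to keep the cluster running."""
    return votes_present >= quorum(expected_votes)


# Assumed configuration from the post: two nodes with VOTES=1 each, QDSKVOTES=1.
NODE_VOTES = [1, 1]
QDSKVOTES = 1
EXPECTED_VOTES = sum(NODE_VOTES) + QDSKVOTES  # 3

print(quorum(EXPECTED_VOTES))                        # 2
# Quorum disk removed, both nodes still up: 2 votes present.
print(has_quorum(sum(NODE_VOTES), EXPECTED_VOTES))   # True
# Quorum disk removed AND one node lost: only 1 vote present.
print(has_quorum(NODE_VOTES[0], EXPECTED_VOTES))     # False
```

The last line shows why no node may be lost while the quorum disk is away: the remaining votes exactly equal quorum, with nothing to spare.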
You may need to do SYSMAN> IO AUTO or IO SCSI_PATH_VERIFY after re-creating the disk unit.
Then INIT and MOUNT/SYSTEM the new quorum disk. This will allow the QUORUM.DAT file to be created by CLUSTER_SERVER, and the connection to the 'quorum disk' will be re-established.
06-24-2005 11:40 PM
just the fact that the QDSK _IS_ a disk with a virtual hardware name makes this much easier than the case with a physical disk.
You just have to make sure of two things:
- you have to re-create the exact same-named unit
- during the period from removal through re-creation of the unit you have no "headroom" in quorum voters, so you have to make as sure as you can that you do not lose any voters
If you think you need to change the device NAME of the quorum disk, then a cluster shutdown is the only simple way.
(I think it should be possible to do it in a rolling way as well, but that requires thorough planning, several reboots and voting manipulations. Not for the faint of heart, nor for the inexperienced. I even doubt whether any such route would be supported.)
Just removing the old unit first, and then creating a new one with the same name, will be your best route.
Have one on me.
06-28-2005 09:00 AM
your approach would be okay, but point 3 can be simpler: boot just one node, or boot the VMS CD, to do the VMS INIT.
A minimum boot beyond that makes no sense, because QUORUM.DAT will not be created that way.
If the quorum disk is made unavailable to the quorum disk watcher nodes, a cluster state transition will occur.
So far this is no problem, but I'm really interested whether the "let all nodes keep running the whole time" way of Volker and Jan is a totally smooth one as far as the activities of the connection manager are concerned. Has anyone already done it this way?
06-28-2005 09:05 PM
would you accept a test on a V7.3-1 single cluster node with a local SCSI quorum disk as a proof-of-concept, that you can swap the quorum disk in a running cluster - if you can provide enough votes to keep the cluster running or are willing to use the IPC interrupt (or AMDS) to recalculate quorum ?
The attached file shows a simple test on how this can be done - and it does work !
The different steps, in order, are:
1) boot a single cluster node with VOTES=1, EXPECTED_VOTES=1, QDSKVOTES=1 and DISK_QUORUM=DKC500 - no quorum file exists yet.
2) mount the designated quorum disk (DKC500). This will cause QUORUM.DAT to be created automatically by CLUSTER_SERVER - even if you only mount that disk privately.
3) dismount the quorum disk.
4) unplug the quorum disk.
5) as dynamic QUORUM is 2, unplugging the disk will cause quorum to be lost (in this simple config), but it can easily be regained using the IPC> Q interrupt.
6) plug the physical quorum disk into the DKC400 slot and delete QUORUM.DAT (if I had had an empty new disk, I could have used it and just done an INIT).
7) plug the disk back into DKC500 (note: there is NO QUORUM.DAT file on that disk anymore!)
8) mount the 'new' quorum disk again. CLUSTER_SERVER will create QUORUM.DAT and the quorum disk will become active again.
NO REBOOTS needed at all. And even if your cluster were to lose QUORUM when the quorum disk dies, you could use IPC/DECamds and recover without any reboot.
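Why dynamic QUORUM ends up at 2 in this single-node test, although EXPECTED_VOTES=1, can be followed in a rough model. The sketch assumes the connection manager never lowers quorum on its own and takes the maximum of the current value and (n + 2) / 2 applied to both EXPECTED_VOTES and the total votes present; the function names are hypothetical, and the forced recalculation only stands in for what IPC> Q or AMDS does.

```python
def quorum_from(votes: int) -> int:
    """(n + 2) / 2, truncated: the basic quorum formula."""
    return (votes + 2) // 2


def dynamic_quorum(current: int, expected_votes: int, total_votes: int) -> int:
    """Simplified model: quorum only ever rises as voters join."""
    return max(current, quorum_from(expected_votes), quorum_from(total_votes))


def forced_recalculation(total_votes: int) -> int:
    """Model of a forced recalculation: quorum is rebuilt from the votes still present."""
    return quorum_from(total_votes)


VOTES, EXPECTED_VOTES, QDSKVOTES = 1, 1, 1

q = dynamic_quorum(0, EXPECTED_VOTES, VOTES)              # node boots alone
print(q)                                                  # 1
q = dynamic_quorum(q, EXPECTED_VOTES, VOTES + QDSKVOTES)  # quorum disk becomes active
print(q)                                                  # 2
print(VOTES >= q)                                         # False: disk unplugged, quorum lost
q = forced_recalculation(VOTES)                           # quorum recalculated by hand
print(VOTES >= q)                                         # True: node continues
```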
06-29-2005 04:38 AM
IPC is the Interrupt Control Program.
You enter it at the console (used to be ^P ; nowadays whatever the specific hardware requires)
>>> D SIRR C
deposits hex C in the SIRR register, meaning set IPL 12
IPC> Q
at IPC force Quorum recalculation
IPC> C
Continue normal operation.
--- in a cluster, this HAS to be COMPLETED within RECNXINTERVAL
It has always been around, AFAIK, although the VAX syntax was slightly different.
And this is what AMDS can do for you, and quickly, when you ask it to force quorum.
Have one on me.
06-29-2005 05:18 PM
to exit the IPC> interrupt, you need to enter
The IPC (short for IPL C interrupt, where 0xC is IPL 12.) is described in the System Manager's Manual, Volume 1: Essentials,
chapter: Using Interrupt Priority Level C (IPC)
06-30-2005 07:36 AM
I wouldn't have had any doubt in such a configuration, but many thanks for your proof of concept. The reason I asked is that at the moment I cannot test the whole scenario with the re-creation of a virtual quorum disk. So I can only "believe" that the same will work.
One other point raised by looking at your logfile: it can be handled in a smooth way only as long as MVTIMEOUT has not been reached, right?
06-30-2005 08:02 AM
| deposits hex C in the SIRR register, meaning set IPL 12
Hm, SIRR is the Software Interrupt Request Register - my understanding is that it requests an IPL 12 interrupt, but does not set the IPL to 12. Imagine what happens if the processor is currently running at a higher IPL - not a good idea!
Extra points if you know what fork processes are for ;-)
The next step would be:
| IPC> Q
| at IPC force Quorum recalculation
| IPC> C
| Continue normal operation.
Volker has already commented that "C" is used to cancel a mount verification.
| --- in a cluster, this HAS to be COMPLETED within RECNXINTERVAL
Right, and there must not be a bug in the VAX-8600 ;-)
| It has always been around, AFAIK, although
| the Vax syntax was slightly different.
>>> D/I 14 C
06-30-2005 09:30 AM
what I miss is the possibility to use a quorum file on a HBVShadowed disk :-)
Why is that not supported (and never will be !!):
Suppose a cluster with two equal halves (for ease of concept, take a two-site cluster, but the principle is general).
So, each half has n votes, and there is a quorum disk vote. Expected_votes 2n + 1, quorum n + 1.
If the halves lose sight of each other, one half has the qdsk, has quorum, and can continue; the other half loses quorum.
The cluster integrity is guarded.
Now, suppose the qdsk is shadowed.
Again, the halves lose contact.
That could well mean that the shadow members lose contact.
Now EACH half sees its member of the qdsk shadow set as THE qdsk.
Both halves maintain quorum, and within a few handfuls of IOs (say, milliseconds?) your data is seriously corrupted.
HPUX has a good descriptive name for that situation:
A "SPLIT BRAIN CLUSTER".
So: _NO_ shadowed quorum disk. Period.
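The argument can be put into numbers with a small sketch. It assumes the quorum formula (EXPECTED_VOTES + 2) / 2, truncated, and a quorum disk worth one vote; the function names and the half labels "A" and "B" are invented for illustration only.

```python
def quorum(expected_votes: int) -> int:
    """(EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def halves_with_quorum(n: int, qdsk_seen_by: set) -> list:
    """Return which halves keep quorum after the two halves lose contact.

    n            -- votes held by the nodes in each half
    qdsk_seen_by -- halves that still see (what they believe is) the quorum disk
    """
    needed = quorum(2 * n + 1)  # equals n + 1
    survivors = []
    for half in ("A", "B"):
        votes = n + (1 if half in qdsk_seen_by else 0)
        if votes >= needed:
            survivors.append(half)
    return survivors


# Single quorum disk: only the half that still reaches it continues.
print(halves_with_quorum(3, {"A"}))        # ['A']
# Shadowed quorum disk: each half counts its own member as THE qdsk.
print(halves_with_quorum(3, {"A", "B"}))   # ['A', 'B']
```

Two survivors in the second case is exactly the partitioned cluster described above: both halves believe they hold n + 1 votes.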
Have one on me.
06-30-2005 12:16 PM
sure, you're absolutely right, it will never be supported, but nevertheless I wish I COULD use a shadowset (located at one site) as a quorum disk volume.
Also, with the current (and long existing) VMS implementation I can already set up the whole cluster in an unsupported way so that cluster partitioning can occur (at least at boot time)!
So I still say: I would like to HAVE the possibility to use a shadowset as a quorum volume. If I set up the configuration in a way that can lead to trouble then this is my configuration failure in each case.
Cheers (and a proost)
06-30-2005 12:44 PM
- sysgen parameter quorum_disk remains as it is referring to a special disk
- this disk is a member of a shadowset
This would mean: no change to the current behaviour (but probably a lot of re-writing VMS code).
Simple(?) question: am I right or totally wrong ?
06-30-2005 06:01 PM
Setting SIRR to C requests an IPL 12. interrupt, which then will issue the IPC> prompt (running at IPL 12.)
Using the IPC mechanism in an SMP system may lead to CPUSPINWAIT or CPUSANITY crashes, so AMDS is definitely the better choice ;-)
If the quorum disk has gone dead and MVTIMEOUT has expired, you can still DISMOUNT/ABORT it, IF there are no (other) open files on that disk.
Shadowsets can split as well, and what if each side of the cluster then continues with its local member?! Just use a small quorum node, put it in a safe place and forget about it.
07-03-2005 09:22 PM
Purely Personal Opinion
07-04-2005 12:50 AM
Shadowing the q disk is not allowed for the case that the 2 disks would start to live separately.
But mirroring is allowed because the controller hides the fact from VMS.
So, shadowing 2 disks behind 1 (dual) controller should also be possible.
07-05-2005 11:28 AM
I simply wish to use a quorum disk as part of a shadowset just to be able to use this shadowset for more than quorum.dat, pagefiles etc. (and in general I prefer a quorum node).
If the quorum disk is defined all over the cluster as 1 dedicated member of a shadowset, then the case can not occur that this important part of a clustered system starts to live separately. I know, this would mean change of basic vms code as I've mentioned before, but this is another story.
And a last remark: sure, we use mirroring for our quorum disks ...
07-18-2005 04:08 AM
BUT if you use it for other purposes, the other purposes will govern whether you mirror it. If it is a SWAP/PAGE disk (ONLY) then you STILL should not mirror it. You are wasting WRITE operations on the mirror.
But if you are using a shared system disk, that disk can be the quorum disk. Ask yourself the purpose of a quorum disk. It is to prevent the system from coming up if you don't have enough to make it work. Well, if you are on a cluster with an even number of physical members AND the system disk is shared, it is the best candidate, bar none. 'cause if you don't have the shared system disk, you are SO hosed...
Now if you have distinct system disks, this isn't true. But if you want to do this "right", then consider some application disk without which you shouldn't be running your system. For example, if you have a separate disk for user home directories, make THAT your quorum disk. QUORUM.DAT doesn't really contain much data anyway.