Here is a document I wrote one evening for a friend to use in their Disaster Recovery Weekend Docs as a L1 section.
With a pair of mirrored root disks, you have some fail-over reliability. If one fails, the other carries on. As long as you don't reboot, HPUX knows which is the good one, and that it is good.
You don't want to pull the wrong one!
On the running system, run 'showboot' to see what is set as the PRIMARY path, and what is probably the ALT-PATH (not necessarily). Use ioscan -kfn to convert numbers into device names.
Don't worry, if you ever had to do this for real, the HP engineer would do it at the hardware level.
On the running system (or from an archived printout), obtain the output from ioscan -kfn. It helps you convert device names to and from device numbers.
Supposing c2t2d0 is the primary path, you want to know which physical disk that is. Simply READ a lot from the disk, see which light comes on, and label it with a sticky tab. Be certain though.
dd if=/dev/rdsk/c2t2d0 of=/dev/null
There is no need to do this for the other disk, as you will only remove the primary.
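The LED test above can be wrapped in a small helper so that the read stops on its own instead of running across the whole disk. This is only a sketch: the function name blink_disk and the default of 500 blocks are made up for illustration, and it assumes the raw device file for the disk lives under /dev/rdsk, as is usual on HPUX.

```shell
# Read a bounded amount from a disk so that its activity LED lights up.
# $1: device file to read from (for example /dev/rdsk/c2t2d0, an assumption)
# $2: number of 1 MB blocks to read (hypothetical default: 500)
blink_disk() {
    dev=$1
    blocks=${2:-500}
    # Data is thrown away; only the disk activity matters.
    dd if="$dev" of=/dev/null bs=1024k count="$blocks"
}
```

For example, blink_disk /dev/rdsk/c2t2d0 would read about 500 MB and then stop. Run it a few times while watching the cabinet if the light is hard to spot.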
It is worth remembering, that the entire disk is NOT primary or secondary, but that each part of the first disk is repeated or mirrored on the second disk. We call them that for sanity, but HPUX has no such concept. It has a selector variable to pick a disk to boot from, and LVM mirroring.
If you follow this test script, (on a different system), you will need to know the name of an LVOL, so that you can check the status of its PE's (Physical Extent = 4 M track, with independent versioning).
You may wish to get a list of all the LVOL's on the disk that you are removing (and also check everything on that PV is mirrored!)
vgdisplay -v /dev/vg00 will show you a list.
Run lvdisplay on a mirrored LVOL (on the disk you are looking at), and check that it says 'current' on both disks. Check syslog.
You can ONLY remove a disk if it is 'hot-pluggable'. ULN5 has its disks in the external Jamaica cabinets, with those grey/blue levers to pull and reseat disks.
Now remove one of the two disks, by pulling it out by 1/2".
Check syslog, dmesg and lvdisplay. You should see error messages, saying POWERFAIL (on the disk address) but the system still runs. lvdisplay should show errors.
Now firmly/gently put the disk back, and check syslog/dmesg until you see the POWERFAIL/RECOVERED message. That indicates that the disk has been seen and checked by the OS, as being there and functioning correctly.
Quickly check lvdisplay (rerunning several times) and you will see some physical extents change from ERROR/STALE to current. That is the LVM bringing both sides up-to-date.
When that is finished, the system is fully back to normal.
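Rather than rerunning lvdisplay by hand, the check can be looped. The following is a rough sketch, not a tested HPUX procedure: the function names and the 10 second pause are made up, and it assumes that lvdisplay -v prints the word "stale" on the line of each extent that is not yet up to date.

```shell
# Count the lines on stdin that mention a stale extent.
# Prints 0 when there are none.
count_stale() {
    grep -ci 'stale' || true
}

# Poll lvdisplay until no stale lines remain.
# $1: logical volume to watch (for example /dev/vg00/lvol11)
wait_for_resync() {
    lv=$1
    while :; do
        n=$(lvdisplay -v "$lv" | count_stale)
        [ "$n" -eq 0 ] && break
        echo "$n stale extent lines remaining on $lv"
        sleep 10   # hypothetical interval, adjust to taste
    done
    echo "$lv shows no stale extents"
}
```

Note that the loop also ends if lvdisplay prints nothing at all, so look at the real lvdisplay output once by eye before trusting it.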
If you reboot with a disk missing, HPUX cannot be certain that the disk that works has the proper data on it, so the LVM refuses to activate the LV's. When you boot via the alternate path, it succeeds in finding, loading and running the kernel, but when it then activates LVM, it fails.
If you had three equally mirrored disks (what HP calls two mirror copies), and two disks vouched for each-other (ie showed the same data revisions), then HPUX would believe the two, and boot without the third. HPUX calls that quorum (more than half).
Since you have two equally mirrored disks (one mirror copy), when one is down, the system won't boot, unless you tell it to skip the quorum check (using hpux -lq).
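The "more than half" rule is easy to check with a line of arithmetic. This is only a model of the rule as described here, not how the LVM implements it, and the function name has_quorum is made up for illustration.

```shell
# Succeeds (exit status 0) when strictly more than half of the disks are present.
# $1: number of disks present
# $2: total number of disks in the volume group
has_quorum() {
    present=$1
    total=$2
    [ $((present * 2)) -gt "$total" ]
}
```

With three disks and one missing, has_quorum 2 3 succeeds. With two disks and one missing, has_quorum 1 2 fails, because one is exactly half and not more than half. That is the case where you need hpux -lq.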
It is worth trying to see what happens when you don't type anything (ie don't boot with hpux -lq). It only takes 5 minutes, as a full reboot is not necessary; it quickly drops back into IPL mode.
As well as two mirrors (of each of the LVOLs), there are two boot tracks. If you boot without the alternate disk, hpux might not notice until it tries to activate the LVM. If you boot without the primary path, hpux will notice straight away.
Find out which is the primary path by running 'showboot'. Find out which is that physical disk by reading from it and checking the LED. (See above).
Shut down the machine, using shutdown -h, remove the primary path disk, and boot. Booting takes at least 20 minutes, because the system wants to check everything.
First let it try its own thing, to see what happens. It will detect the missing pri-path disk and should return to the BIOS command line.
To get here without removing that disk, press the space-bar during booting, when it says Press-Any-Key within 10 seconds. Options include:
HELP MENU
It's important to realise that there are two levels. There is the machine BIOS, which does not come from any disk, and there is an IPL-BIOS which comes from the disk. You have not read the disk yet. (Which is just as well, since you have removed it/one of them).
You can set the PRI and ALT paths from here, or you can leave them and boot from a named (numbered) path. You can also do that from the second stage loader, or from UNIX.
SEA, or 'search', will tell you the list of devices which the BIOS allows you to attempt to boot from. A recent copy of ioscan helps, so that you KNOW which is the tape, cdrom or disk. To boot an ignite tape, boot from that tape device.
(DO NOT RUN YET) BO ALT, or 'boot alternate', will attempt to boot from the configured alternate path. BO 8/8.8.0 will attempt to boot from that SCSI controller, that LUN device. (Don't do that now, unless you want to.)
The setting of the ALT path is not important, and may be wrong. You can change it if you wish, from here, from the IPL prompt or from a running UNIX.
BO, or 'boot', will probably offer you a choice of PRI/ALT bootable paths (disks), and also a chance to 'interact with the IPL'. Say 'Y' if you want to specify the -lq option later. If you said BO ALT, you won't see the first two options.
Boot from primary   - N
Boot from alternate - Y
Interact with IPL?  - Y
That leaves you in the ISL BIOS, loaded from the disk. Again try HELP and MENU. hpux show autofile is like showboot, but from BIOS not from UNIX. If the ALTPATH is already set, life is a tiny bit easier, but it is often set to the tape (which you want to boot from for ignite), or the CDROM (which you want to reinstall). Commands include:
help
hpux                       # to boot as normally
hpux show autofile         # like lifcp to screen
hpux -is boot              # boot to single user state
hpux -lq                   # boot without quorum check
hpux -lm                   # 3-38 - NO SWAP NO LVM # AVOID
hpux /stand/vmunix.BCKUP   # for the old kernel
hpux ....                  # see 2-7 for other kernels
hpux ....                  # also combinations of options
primpath 8/8.8.0           # permanently use different boot path
DO NOT RUN: hpux -lm. It will take you to maintenance mode, but you will have to reboot, so it's mostly for when you think you can recover the disk and immediately reboot.
hpux -lq will boot with no quorum check, which will get your entire machine working. You can also add the -is to go into single-user state, then do an init 4, but that might be confusing if you have no reason to do so.
If you have a second machine, up and running, try 'man hpux'.
Why is ISL also called IPL? The Initial Program Loader loads the Initial System Loader from disk, and then runs the ISL. It's similar to the way HP-CDROMS have one printed label on the media, and another label on the carton.
Boot using 'hpux -lq', and the machine will come up cleanly (presuming that disk was the only one missing, and other LVOLs won't have difficulties). Otherwise add the '-is' option and figure it out from there.
There will still be messages about the disk not being there, but the other mirrored parts make it non-fatal.
Quickly test that Informix is functioning at all, check syslog.
Now return the missing disk, by gently pushing the lever (over a catch). The LED will blink, and there may even be a SCSI bus reset as the controller detects it.
HPUX will not know it's back, or even that it exists! You need to run ioscan, without the -k option (which only reads from kernel memory), but probably with -fn and through pg.
The HPUX kernel now knows that the device exists, but the LVM is still running without it. (I'm presuming the replacement disk is the original disk, with the old LVM info on it. If not, you will need to check vgcfgrestore in the manual pages and the Admin-Tasks-Guide -- again -- if that ever really happens the HP engineer will do that with you).
vgsync is required to tell the LVM to look for, find and re-sync with the disk. Notice that you didn't need to do that in the earlier test, when you pulled the disk without rebooting. You must do it this time, because the LVM has completely forgotten about the absent disk, and is not polling to find it.
lvdisplay -v /dev/vg00/lvol11 will show you lots of physical extents in error or stale, but repeated running will show you the LVM bringing each PE back into CURRENT status.
You can now proceed, or if superstitious, you can do a full normal, unattended reboot, to be sure.