Chapter 9
Inode Duplication Facility
As dependency on computer systems increases, job stops caused by hardware errors have a greater impact.
To cope with this, the file system management information is duplicated, which reduces both the number
of job stops resulting from errors and the recovery work.
If a track fault causes an I/O error in the management area, all other I/O operations can still be
performed and operation continues, because a duplicate of the file system management area is kept
on another virtual disk.
The virtual disk containing the file system is called the master VD (virtual disk), while the other
virtual disk is called the copy VD.
Devices that can be duplicated include magnetic disks and striping disks. The file systems to be
duplicated are SFS and SFS/H. The management areas of these file systems are duplicated. A management area
to be duplicated includes the following information.
- Superblock
- Linked list block (SFS only)
- I-list
- VVTOC (for reallocation ON or SFS/H)
However, the following file systems cannot be duplicated.
- File system whose management area consists of two or more virtual disks
- File system created in the N7763 disk device
- File system created in XMU
- Root file system
Duplication is registered by specifying the master and copy VDs in dupconf(1M). The registration
information is stored in VVL. Closing information is also stored and updated in VVL during operation,
so duplication recovery is required if a VVL error occurs (described in Section 9.6.7).
Writing to duplicated areas is performed for both the master and copy VDs.
If an error occurs on one of them, subsequent I/O to the failing system stops
(the system is closed) and writing continues to the other, normal system.
Reading is performed from the master VD. However, if an I/O error occurs in the master VD, the
master VD is closed and reading is performed from the copy VD. If the master VD has already been
closed, reading is performed from the copy VD. If an error occurs on one system while the other is
already closed, an I/O error is reported and the failing system is not closed, so both systems are
never closed at the same time.
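The closing rules above can be sketched as a small state model. This is only an illustration of the behavior described in the text, not product code: the variable and function names (`master`, `copy`, `read_source`, `io_error`) are hypothetical.

```shell
#!/bin/bash
# Toy model of the closing rules for a duplicated area.
# All names here are illustrative; nothing touches a real device.
master=open   # open | closed
copy=open     # open | closed

# Echo the VD that a read is directed to: the master VD unless it is closed.
read_source() {
    if [ "$master" = open ]; then echo master; else echo copy; fi
}

# Apply the rule for an I/O error on one VD ("master" or "copy").
# Returns 0 if the failing VD was closed and I/O continues on the other VD.
# Returns 1 if the other VD was already closed: the I/O error is reported
# and the failing VD is left open.
io_error() {
    local failed=$1
    if [ "$failed" = master ]; then
        if [ "$copy" = closed ]; then return 1; fi
        master=closed
    else
        if [ "$master" = closed ]; then return 1; fi
        copy=closed
    fi
    return 0
}

io_error master && echo "master closed, reads now come from: $(read_source)"
io_error copy || echo "I/O error: copy VD failed while the master VD is closed"
echo "state: master=$master copy=$copy"
```

In this run the first error closes the master VD and redirects reads to the copy VD. The second error is reported as an I/O error and leaves the copy VD open.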
The following new commands are provided to enable use of the duplication facility.
devinfo(1M) can be used to reference the duplication definition information.
This section explains the setting procedure for the duplication facility. The command examples in
the explanations use the following configuration.
- Master VD -- SFS or SFS/H file system constructed in the virtual volume /dev/rdsk/100,
consisting of the virtual disks of /dev/rid/010 and /dev/rid/011
- Copy VD -- Virtual disk of /dev/rid/020
- When a new file system is created
Use the following procedure to create and duplicate a new file system.
- Procedure for file systems other than SFS/H
  1. Create a virtual volume. (The existing procedure is used up to the virtual
     volume attribute setting.)
  2. Obtain the size of the file system management area using
     the dupmkfs(1M) command.
     Example:
     # dupmkfs -i /dev/rdsk/100
  3. Create the virtual disk for the copy VD. (Create a virtual disk having a size that is equal
     to or greater than the size obtained in step 2.)
  4. Create the file system and set the duplication using the dupmkfs(1M) command.
     Example:
     # dupmkfs /dev/rdsk/100 /dev/rid/020
  5. Mount the file system, and then start the operation.
- Procedure for SFS/H
  1. Create the SFS/H file system. (The existing procedure is used.)
  2. Obtain the size of the file system management area using the
     dupconf(1M) command.
     Example:
     # dupconf -i /dev/rdsk/100
  3. Create a virtual disk for the copy VD.
     (Create a virtual disk of a size equal to or greater
     than the size obtained in step 2.)
  4. Set the duplication using the dupconf(1M) command.
     Example:
     # dupconf /dev/rdsk/100 /dev/rid/020
  5. Copy the duplicated area from the master VD to the copy VD using the
     dupcopy(1M) command.
     Example:
  6. Mount the file system and start the operation.
- When the existing file system is duplicated
The procedure varies depending on whether the reallocation facility is being used.
- Procedure when the reallocation facility is not being used
  1. Demount the file system.
  2. Obtain the size of the file system management area using the
     dupmkfs(1M) command.
     Example:
     # dupmkfs -i /dev/rdsk/100
  3. Create the virtual disk for the copy VD. (Make it the size obtained in step 2.)
  4. Set the duplication using the dupconf(1M) command.
     Example:
     # dupconf /dev/rdsk/100 /dev/rid/020
  5. Copy the duplicated area from the master VD to the copy VD using the
     dupcopy(1M) command.
     Example:
  6. Mount the file system and start the operation.
- Procedure when the reallocation facility is being used
  1. Save the file system.
  2. Demount the file system.
  3. Obtain the size of the file system management area using the
     dupmkfs(1M) command.
     Example:
     # dupmkfs -i /dev/rdsk/100
  4. Create the virtual disk for the copy VD. (Make it the size obtained in step 3.)
  5. Create the duplicated file system using the dupmkfs(1M) command.
     Example:
     # dupmkfs /dev/rdsk/100 /dev/rid/020
  6. Mount the file system.
  7. Restore the files and restart the operation.
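The four setting procedures differ mainly in which duplication commands they use. The following sketch prints the command lines for each case and executes nothing. The function name `setup_plan` and the case labels are hypothetical, the device paths are the example configuration of this section, and the arguments of dupcopy(1M) are left out because this section does not show them.

```shell
#!/bin/bash
# Print the duplication commands used by each setting procedure.
# Output only; nothing is executed. Paths are the example configuration.
MASTER_VV=/dev/rdsk/100
COPY_VD=/dev/rid/020

# setup_plan <new-sfs|new-sfsh|existing-norealloc|existing-realloc>
setup_plan() {
    case $1 in
        new-sfs)
            # New file system other than SFS/H: dupmkfs creates and duplicates.
            echo "dupmkfs -i $MASTER_VV"
            echo "dupmkfs $MASTER_VV $COPY_VD"
            ;;
        new-sfsh)
            # New SFS/H file system: dupconf sets duplication, dupcopy copies.
            echo "dupconf -i $MASTER_VV"
            echo "dupconf $MASTER_VV $COPY_VD"
            echo "dupcopy   # arguments not shown in this section"
            ;;
        existing-norealloc)
            # Existing file system, reallocation facility not in use.
            echo "dupmkfs -i $MASTER_VV"
            echo "dupconf $MASTER_VV $COPY_VD"
            echo "dupcopy   # arguments not shown in this section"
            ;;
        existing-realloc)
            # Existing file system, reallocation facility in use:
            # save first, then re-create with dupmkfs, then restore.
            echo "dupmkfs -i $MASTER_VV"
            echo "dupmkfs $MASTER_VV $COPY_VD"
            ;;
        *)
            echo "unknown case: $1" >&2
            return 1
            ;;
    esac
}

setup_plan new-sfsh
```

The steps without a command line in this section (creating virtual disks, demounting, mounting, saving, and restoring) are not printed and remain manual steps around these commands.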
The procedure for canceling a duplication follows.
- Demount the file system.
- Perform recovery if the master or copy VD has been closed. (The procedure is explained in
Sections 9.6.2 and 9.6.3.)
- Cancel the duplication using the dupconf(1M) command.
Example:
# dupconf -r /dev/rdsk/100
- Mount the file system.
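The cancel procedure can be wrapped in a guard that refuses to cancel while a VD is closed. This is a dry-run sketch under stated assumptions: the `run` and `cancel_duplication` helpers are hypothetical, the open/closed states are supplied by the operator (for example after checking with devinfo(1M), whose output is not parsed here), and demounting and mounting are left to the operator.

```shell
#!/bin/bash
# Dry-run sketch of the cancel procedure. With DRY_RUN=1 (the default)
# the command is only printed, never executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# cancel_duplication <virtual volume> <master state> <copy state>
# States are "open" or "closed" and are supplied by the operator.
cancel_duplication() {
    local vv=$1 master_state=$2 copy_state=$3
    if [ "$master_state" = closed ] || [ "$copy_state" = closed ]; then
        # A closed VD must be recovered before the duplication is canceled.
        echo "recover the closed VD first (Sections 9.6.2 and 9.6.3)" >&2
        return 1
    fi
    run dupconf -r "$vv"
}

cancel_duplication /dev/rdsk/100 open open
```

Keeping the dry-run mode as the default means the sketch can be reviewed on any machine before it is pointed at a real virtual volume.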
Duplication must be canceled and then reexecuted when a duplicated file system is reconstructed
or attributes are changed (reallocation facility on/off, cluster size change, etc.).
The reconstruction procedure follows.
- Cancel the duplication (explained in Section 9.4.2).
- Reconstruct the file system. (Use dupmkfs(1M) if mkfs(1M) must be executed.)
- Set the duplication (explained in Section 9.4.1).
If the tar, mtar, cpio, dump, restore, or dd commands are used for backup and restoration,
the duplicated areas remain matched after execution, so the user need not take duplication
into account for these operations.
If duplication is used, be particularly careful when executing the following commands.
- catdev(1M) -- An error results.
- initvvlm(1M) -- An error results.
- vvattr(1M) -- An error occurs when the reallocation facility is switched on or off or when
the cluster size is changed. Cancel the duplication before executing this command.
- mkvl(1M) -- The duplication information is cleared. Cancel the duplication before changing
information relating to the duplicated virtual volume.
- deldev(1M) -- The duplication information is cleared when the virtual volume configuration is canceled.
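The cautions above can be kept as a small lookup that an operator script consults before running a command against a duplicated virtual volume. This is a sketch only: the function name `duplication_caution` is hypothetical and the messages paraphrase the list above.

```shell
#!/bin/bash
# duplication_caution <command name>
# Print the caution that applies when the command is used on a duplicated
# virtual volume. Returns 0 if a caution applies, 1 otherwise.
duplication_caution() {
    case $1 in
        catdev|initvvlm)
            echo "$1: results in an error while duplication is set" ;;
        vvattr)
            echo "vvattr: cancel duplication before changing reallocation or cluster size" ;;
        mkvl)
            echo "mkvl: clears the duplication information; cancel duplication first" ;;
        deldev)
            echo "deldev: clears the duplication information" ;;
        *)
            return 1 ;;
    esac
}

for cmd in catdev vvattr mkvl; do
    duplication_caution "$cmd"
done
```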
Duplication reduces the number of job stops caused by errors and the error recovery work that follows.
However, only the file management area is duplicated; the data area is not. Regular backups must
therefore still be taken as usual, even when duplication is in use.
This section explains error isolation and recovery when errors occur in duplicated systems. The recovery
procedure conforms to that described in Chapter 8, Disk Recovery, with additional work to recover the
duplicated areas. This section describes the general procedure, centering on duplicated area recovery
(duplication information matching); see Chapter 8 for details of error correction. Because the recovery
procedure varies with the operation mode, each user site must examine the procedure for its own operation.
If an error occurs in a portion that is not duplicated, use the existing recovery method. The
command examples in the explanations are assumed to have the following configuration.
- Master VD -- SFS constructed in virtual volume /dev/rdsk/100, consisting of the virtual disks of /dev/rid/010 and /dev/rid/011
- Copy VD -- Virtual disk of /dev/rid/020
If an error occurs in a duplicated file system, the procedure to perform depends on where the error
occurred and on its details.
Errors are classified as follows.
- If the master VD is closed
- If the copy VD is closed
- If an error occurs in the copy VD while the master VD is closed
- If an error occurs in the master VD while the copy VD is closed
- If an error occurs outside the duplication range
- If an error occurs in VVL containing the duplication information
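The classification above maps onto the recovery sections of this chapter. The following sketch encodes that mapping as a lookup. The function name `recovery_section` and its argument labels are hypothetical. Section 9.6.2 is taken to be the master-VD-closed case, which is inferred from the reference to Sections 9.6.2 and 9.6.3 in the cancel procedure.

```shell
#!/bin/bash
# recovery_section <where> [<state of the other VD>]
#   where: master | copy | outside | vvl   (where the error occurred)
#   state: open | closed                   (default: open)
# Echo the section of this chapter that describes the recovery.
recovery_section() {
    local where=$1 other=${2:-open}
    case "$where:$other" in
        master:open)   echo 9.6.2 ;;  # master VD is closed (inferred number)
        copy:open)     echo 9.6.3 ;;  # copy VD is closed
        copy:closed)   echo 9.6.4 ;;  # copy VD error while master VD is closed
        master:closed) echo 9.6.5 ;;  # master VD error while copy VD is closed
        outside:*)     echo 9.6.6 ;;  # error outside the duplication range
        vvl:*)         echo 9.6.7 ;;  # error in VVL
        *)
            echo "unknown case: $where $other" >&2
            return 1 ;;
    esac
}

recovery_section copy closed
```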
These errors are distinguished using the following methods:
- Error messages
If an error occurs in a duplicated portion, the related system is closed and a closing message is output. The following closing messages are output.
- If the master VD is closed:
NOTICE:VVD-Virtual volume (xxx,xxx):MASTER VD(xxx,xxx) is turned to blocked state
- If the copy VD is closed:
NOTICE:VVD-Virtual volume (xxx,xxx):COPY VD(xxx,xxx) is turned to blocked state
If an I/O error message includes an error address, the user can determine whether the error lies in the duplicated area by specifying this address in the fsearch command.
- Confirming the closing
Use the devinfo(1M) command to check a closed system.
- Error log
The user can determine whether the error portion is the duplicated area by obtaining the error
address from the error log and by executing fsearch.
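A log-scanning script can recognize the two closing messages by pattern. This is a minimal sketch that assumes the messages appear exactly in the form quoted above, with the device numbers in place of the `(xxx,xxx)` placeholders. The function name `closed_vd` is hypothetical.

```shell
#!/bin/bash
# closed_vd <message line>
# Echo "master" or "copy" if the line is a closing message in the form
# quoted in this section; return 1 for any other line.
closed_vd() {
    case $1 in
        "NOTICE:VVD-Virtual volume "*":MASTER VD"*" is turned to blocked state")
            echo master ;;
        "NOTICE:VVD-Virtual volume "*":COPY VD"*" is turned to blocked state")
            echo copy ;;
        *)
            return 1 ;;
    esac
}

closed_vd 'NOTICE:VVD-Virtual volume (xxx,xxx):MASTER VD(xxx,xxx) is turned to blocked state'
```

The quoted parts of each pattern are matched literally and only the unquoted `*` acts as a wildcard, so the parentheses in the message need no escaping.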
If an error occurs in a duplicated master VD, this system is closed and the subsequent I/O operations
are performed only for the copy VD. The three recovery methods used if the master VD is closed are
presented in this section.
- When the duplicated areas are matched after track exchange
If an I/O error occurs in a part of the duplicated area and recovery is made by track exchange,
the duplicated area is copied from the copy VD after exchange.
The procedure follows.
- Demount the file system.
- Execute block exchange using the hdefix command of IOX.
- Copy the duplicated area from the copy VD to the master VD, and then release the master
VD from the closing using the dupcopy(1M) command.
Example:
The master VD is closed.
dupcopy performs copying from the copy VD to the master VD,
and then releases the master VD closing.
- Mount the file system and restart the operation.
If a track error occurs outside the duplicated area, perform track error recovery as usual.
- When a device is exchanged and recovery is performed using the backup copy
If a device error occurs, the procedures explained next are applied when the device is exchanged
and a previously made backup is restored.
- When backup and restoration is performed for each file:
- Demount the file system.
- Exchange the device.
- Re-create the virtual disk.
- Check the virtual disk by executing
disks -d.
- Release the master VD closing using the
dupcopy(1M) command.
In this case, the duplicated area is copied from the copy VD to the master VD.
However, it is cleared when the file system is created.
Example:
The master VD is closed.
dupcopy(1M) performs copying from the copy VD to the
master VD, and releases the master VD closing.
- Re-create the file system using the dupmkfs(1M) command.
Example:
# dupmkfs /dev/rdsk/100 /dev/rid/020
- Mount the file system.
- Perform restoration for each file.
- When the file system is reconstructed and recovery is performed:
- Demount the file system.
- Exchange the device (if necessary).
- Re-create the virtual disk.
- Check the virtual disk by executing disks -d.
- Release the master VD closing using the dupcopy(1M) command.
In this case, the duplicated area is copied from the copy VD to the master VD.
However, it is cleared when the file system is re-created.
Example:
- Re-create the file system using the dupmkfs(1M) command.
Example:
# dupmkfs /dev/rdsk/100 /dev/rid/020
- Mount the file system.
After completing these recovery procedures, execute the
fsck command to determine whether there is
any file system inconsistency. If an inconsistency is found, the conventional procedure
(fsck execution, file restoration, etc.) can be used. However, before attempting the correction, confirm that neither the
master nor copy VD is closed.
9.6.3 Recovery when the Copy VD Is Closed
If an error occurs in the area of the duplicated copy VD, the copy VD is closed and the subsequent
I/O operations are performed only for the master VD. The recovery methods used when the copy VD is
closed are discussed in this section.
- When recovery is performed from the backup for each file after device exchange:
- Exchange the device.
- Create the virtual disk for the copy VD and other file systems.
- Copy the duplicated area from the master VD to the copy VD using the
dupcopy(1M) command.
Example:
- Mount the file system, and then restart the operation.
- Restore the file.
9.6.4 Recovery when a Copy VD Error Occurs (Master VD Closed)
One of two recovery methods is used if an error occurs in the copy VD while the master VD is
closed. Determine the method to be used according to the error severity.
- When the recovery is performed based on the file management information in the copy VD:
- Demount the file system.
- Correct the error portions of the master and copy VDs.
- Copy the duplicated area from the copy VD to the master VD using the
dupcopy(1M) command.
Example:
- Check the file system with the fsck(1M) command, and then identify the affected files.
- Mount the file system.
- Restore the affected files, and then restart the operation.
- When recovery is performed by reconstructing a file system, restoring data, and resetting duplication
See Section 9.4 for details of file system reconstruction and duplication setting. For details of the procedure for restoring data,
see Section 9.6.2.
9.6.5 Recovery when a Master VD Error Occurs (Copy VD Closed)
One of two recovery methods can be used if an error occurs in the master VD while the copy VD is
closed. Determine the method to be used, according to the error severity.
- Recovery procedure using information remaining in the master VD:
- Demount the file system.
- Correct the error portions of the master and copy VDs.
- Copy the duplicated portion from the master VD to the copy VD using the
dupcopy(1M) command.
Example:
- Check the file system with the fsck command, and then identify the affected files.
- Mount the file system.
- Restore the affected files, and then restart the operation.
- When recovery is performed by reconstructing the file system, restoring data, and
resetting duplication
See Section 9.4 for details of file system reconstruction and duplication setting. For details of
the procedure for restoring data, see
Section 9.6.2.
9.6.6 Recovery when an Error Occurs Outside the Duplication Range
The existing recovery is performed regardless of the duplication.
9.6.7 Recovery when an Error Occurs in VVL
VVL stores duplication information. If an error occurs, perform recovery as shown next.
- When one VVL system is closed
To restart the operation, copy the data of the normal VVL to another location and set the
duplication again. If the closed VVL is the master, perform this processing immediately.
- Copy VVL to another location with vvlcopy(1M).
- Start the operation of VVL for both systems with vvldual(1M).
- When an error occurs in the other system while one system of VVL is closed
Copy the VVL of one of the systems (generally the one that was in use to the end) to another area,
copy it again, and then set the duplication. In addition, the following must be checked or redefined:
any virtual volume whose data in the error portion is invalid, and the duplication definition.
- Copy VVL onto another location by executing
vvlcopy(1M) and prepare the master and copy VVLs.
- Start both VVL systems by executing vvldual(1M).
- Identify the range affected by the error.
- Reset the affected portion.
As mentioned, if an error occurs for both VVL systems, recovery requires much time and labor.
Therefore, if one system is closed, perform recovery processing immediately.
- The following file systems cannot be duplicated:
- File systems created in the N7763 disk device
- File systems created in XMU
- File systems in which the management area consists of two or more virtual disks.
- Root file systems