Chapter 5


File System Construction

5.1 OVERVIEW

In SUPER-UX, many factors determine file system performance and ease of use, including the attributes of storage devices and virtual volumes, the file system type, and the characteristics of the applications to be executed. It is therefore not always easy to design and construct a file system that meets the operational goals of a given site. The system administrator must first become familiar with the purpose and usage of the system and continually work to satisfy users' needs. After constructing the file system, gather information during actual operation and repeat redesign (tuning), installation, and operation as many times as necessary to bring the file system gradually closer to what users want.

The operation of the ideal file system consists of four phases. These four phases are repeated, in order.

Figure 5-1 shows the relationship of these four phases.

Figure 5-1 Operation Cycle of the Ideal File System

This chapter explains the file system design method and the installation and construction methods, using actual examples and case studies. See Chapter 6, File System Management, for an explanation of the operation and maintenance phase.

Refer to the SUPER-UX System Administrator's Reference Manual for details of the administrator commands appearing in the examples of this chapter.

The file system and virtual volume facilities provided by SUPER-UX are independent of each other. However, if a file system is defined without considering the characteristics and limitations of these facilities, the expected performance may not be obtained.

To construct a target file system, note the elements listed next, analyze the purpose in advance, and design the file system accordingly.

5.2 FILE SYSTEM CONSTRUCTION PROCEDURE

This section describes the standard procedure for constructing a file system.

5.2.1 Creating Physical Device Special Files

At system installation, a special file must be created for each physical device used as a work volume. Special files can be created with the mknod(1M) command. SUPER-UX also supports the disks(1M) command, which automatically creates special files for magnetic disk devices and XMU virtual disk devices. When initializing work volumes or adding new devices, the disks(1M) command is easier to use.

Example:

5.2.2 Formatting Physical Devices

A physical device must be initialized before it is used. Initialization of a device (called formatting) includes steps such as setting the volume label for the physical device, writing zero data to the device, and setting the alternate tracks used to replace bad tracks. The fmthard(1M) command is used for formatting.

Example:

When the formatting of the physical device has completed, the volume label of the physical device is set and initial subpartition information for the physical device is prepared. A user can examine the contents of the information by using the prtvl(1M) command.

In the following example, prtvl(1M) shows the volume label of the disk device (/dev/ID01).

Example:

5.2.3 Setting Partitions

After formatting, a physical device may be divided into up to 16 partitions, and special file names are automatically assigned. The mkdev(1M) command with the -p option creates the physical device partition.

Example:

The partitioning status can be examined using the prtvl(1M) command.

Example:

5.2.4 Setting Virtual Volume Label Partitions

To construct a virtual volume, the master and copy VVLDEVs must be under normal dual management. When the address of a VVLDEV is changed, copy the master VVLDEV to the copy VVLDEV by using the vvlcopy(1M) command.

Example -- making a backup copy of the virtual disk /dev/rid/017:

When the copy procedure is completed, restart the system using the backup copy of the kernel, while observing the system operation status using the vvldual(1M) command with the -l option specified.

Example:

As shown in the preceding example, a status of OPEN for both MASTER and COPY means that the system is duplicated.

You can change the setting of the virtual volume label partition using the vvldual(1M) command, as shown in the following example.

Example:

5.2.5 Setting the Cache Capacity

To use a virtual volume with the cache enabled, the user must re-create the kernel to define the AM and XM cache capacities and the device for the XM cache.

The AM and XM cache capacities are defined using the config(1M) command and the ISL parameter. For the XM cache, the default value is zero, and the size must be specified in units of blocks. (The maximum size is 2,095,104 blocks.) The specified size is reserved in the XMU virtual disk device as the cache cylinder.

The more virtual volumes use the cache, the larger the cache that should be reserved.

5.2.6 Creating the Striping Disk Device

The catdev(1M) command with the -s option defines the striping disk device configuration. In the following example, the striping disk device isd0 is created from the disk devices ID10 and ID11.

Example:

Only magnetic disk devices and ultra high-speed disk devices that have been formatted in advance can be defined as striping devices. Note that XMU virtual disk devices are excluded. To cancel the definition of a striping disk device, use the deldev(1M) command.

Example:

5.2.7 Creating Virtual Volumes

Once partitions are set, a user is ready to create a virtual volume. Two phases are required when creating virtual volumes. First, the virtual volume configuration (catdev(1M) -v) is set and then attributes for the virtual volume are set.

5.2.7.1 SETTING VIRTUAL VOLUME CONFIGURATION

In this phase, the configuration of the partitions for a virtual volume must be determined. The partition configuration means the arrangement of the partitions (virtual disks) on a virtual volume. Use the following guidelines when defining virtual volume configuration.

The catdev(1M) command with the -v option configures a virtual volume. Virtual disks and the virtual volume name must be specified.

Example:

For the virtual disk group specified by the catdev(1M) command, the order of its partitions is the order in the command. If the specified special file is not found, a special file is created using the unit number of the special file name.

Therefore, the virtual volume /dev/rdsk/410 is configured with its partitions in the following order: /dev/rxd/00, /dev/rid/011, and /dev/rid/020. When defining a virtual disk with different device types, the user must be very careful about this order. To examine the configuration, use the vvinfo(1M) command.

Example:

5.2.7.2 SETTING ATTRIBUTES FOR VIRTUAL VOLUME

After the partitions of a virtual volume are defined, some attributes such as desired transfer speed and type of file system must be specified. To choose the attributes for a virtual volume, use vvattr(1M) with the -v option. Many attributes are available. However, you must determine some important attributes such as:

If these attributes need to be changed after the creation of the file system, the file system must be re-created using the mkfs(1M) command. In the following example, vvattr(1M) defines the virtual volume /dev/rdsk/410 with caching enabled, reallocation enabled, and a file system type of SFS.

Example:

When all these attributes are set, the file system can be created on the virtual volume. To observe the contents of the virtual volume, use the devinfo(1M) command.

Example:

A virtual volume that contains an N7763 magnetic disk unit must be used with the cache enabled.

5.2.8 Creating and Using File Systems

After a virtual volume is created, the mkfs(1M) command is used to create a file system.

The number of blocks in the file system can also be specified with the mkfs(1M) command. However, if this value exceeds the size of the virtual volume, mkfs(1M) reports an error. Before specifying the file system size, use devinfo(1M) to check the size of the virtual volume. When file system creation is completed, files can be accessed by mounting the file system according to its type. For the mount(1M) command, specify a block type virtual volume.

When file system construction is completed with mkfs(1M), a lost+found directory is automatically created.

Example:

5.3 USING XMU

5.3.1 Usage and Characteristics of XMU

The XMU virtual disk device is used in various ways. Because of its high-speed processing, however, its setting differs from that of a magnetic disk device. This section explains various interfaces provided by SUPER-UX to make efficient use of an XMU virtual disk device.

The XMU itself is a memory unit, and its contents may be lost when power to the unit is cut off. It is therefore not suitable for long-term data storage. However, it is hundreds or thousands of times as fast as a magnetic disk device, and its speed increases with the amount of data transferred in a single I/O operation. The time required for I/O to an XMU virtual disk is counted as CPU time. Therefore, the execution time of a process may be shortened while the I/O time for the process (kernel overhead time) may increase.

By using the config(1M) and swap(1M) commands, the system can be made to swap to the XMU. Because all I/O on an XMU is synchronous, swapping to a magnetic disk device might perform better than swapping to an XMU. Generally, the XMU should be used when the numbers of logged-in users and interactive jobs are large, and a magnetic disk device should be used as the swapping unit when several noninteractive jobs are running.

In the virtual volume cache provided by IAS, it is assumed that the XMU cache exists. This means that when the number of virtual volumes with a cache memory increases, expected efficiency may not be obtained unless the size of the XM cache is large enough. The XM cache cylinder size of the whole system can be determined by the ISL parameter. When changing cache sizes, the system must be rebooted using the new kernel.

5.3.2 Example of XMU Configuration

The following example shows how to configure the XMU whose size is 2 GB. Because the size of a cache cylinder in the example is 256 MB, the user cylinder becomes 1784 MB. The XMU is defined for /dev/MSX0 (2 GB). A swap file of 256 MB and two other virtual volumes are set to /dev/MSX0.
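The size budget of this example can be tallied as in the following sketch. It only restates the arithmetic: it assumes that 1 GB is 1024 MB and that the swap file is taken from the user cylinder, and the names are illustrative rather than SUPER-UX parameters. The 8 MB that is covered by neither the cache cylinder nor the user cylinder is not explained in this section.

```python
# Sizes in MB, taken from the example in the text.
# Assumption: 1 GB = 1024 MB.
XMU_TOTAL_MB = 2 * 1024
CACHE_CYLINDER_MB = 256
USER_CYLINDER_MB = 1784  # value stated in the text
SWAP_FILE_MB = 256


def unaccounted_mb(total_mb: int, cache_mb: int, user_mb: int) -> int:
    """Space covered by neither the cache cylinder nor the user cylinder."""
    return total_mb - cache_mb - user_mb


def left_for_volumes_mb(user_mb: int, swap_mb: int) -> int:
    """User-cylinder space left for the two other virtual volumes.

    Assumption: the swap file is allocated inside the user cylinder.
    """
    return user_mb - swap_mb


if __name__ == "__main__":
    print(unaccounted_mb(XMU_TOTAL_MB, CACHE_CYLINDER_MB, USER_CYLINDER_MB))
    print(left_for_volumes_mb(USER_CYLINDER_MB, SWAP_FILE_MB))
```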

The following is an example of the partition definition file. The size of the dynamic and static partition is defined by the proportion of user cylinders.

Example:

The following example shows the specification up to the virtual volume setting.

Example:

The command mkfs(1M) need not be issued for both swap files.

When mkfs(1M) processing is finished, mount the virtual volume for SFS.

The swap file may be activated.

5.4 SFS/H FILE SYSTEM

5.4.1 Structure

In SFS/H, the area that stores file management information such as superblocks and inodes and the area that stores ordinary file data are allocated in different partitions. I/O requests for file management information are usually issued in units of 4 KB. For devices suited only for transferring large amounts of data, the small size of these units causes performance to deteriorate. Performance deterioration can be avoided by allocating the file management section and data section to different devices.

The file management section begins with the area used to manage data section areas (this area is called a VVTOC). Ordinary file management information is stored in other areas of the file management section.

When a file is created on the SFS/H file system, data for the file is stored in the data section. However, if an inode is created for a file or directory, the directory entry is stored in the file management section.

Files created in this file system are called hybrid special files as opposed to regular files.

See Figure 5-3 for the structure of the SFS/H file system.

Figure 5-3 Structure of the SFS/H File System

The file management section is basically equivalent to an SFS with a cluster size of 4 KB. When a directory is created, a 4 KB area is allocated in the file management section.

When a hybrid special file (hereafter called a hybrid file) is created, an area is allocated in the data section. Unlike units of the file management section, the user must specify the size of units used in the data section. The value specified for units of the data section is called the data cluster size. It is specified when the SFS/H file system is constructed. See Section 5.4.3 for details.

5.4.2 Disk Striping

As mentioned in Section 4.2, Disk Striping Facility, I/O operations for IAS can be accelerated by striping physical disk devices. Similarly, I/O operations for SFS/H can be accelerated by striping physical disk devices.

When SFS/H is constructed by specifying multiple virtual disks in the data section, these virtual disks are concatenated. If striping is specified at SFS/H construction, however, I/O is issued to the virtual disks in striping units. When the virtual disks specified as the SFS/H data section reside on multiple physical devices, specifying striping distributes the I/O load and improves I/O performance.

The actual I/O is issued as follows.

Figure 5-4 SFS/H Striping

The striping size must be specified to construct an SFS/H whose data section is striped. I/O is performed on the virtual disks according to the specified striping size. When this size is larger than the size of one I/O, the I/O is performed on only one virtual disk and the striping facility has no effect. Specify the striping size so that one I/O is divided across multiple virtual disks. See Section 5.4.3 for details of striping size specification.
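The effect of the striping size on a single I/O can be estimated with a short sketch. It assumes round-robin striping and an I/O request that starts on a stripe boundary; neither assumption is stated in this section, and the function name is hypothetical.

```python
import math


def disks_touched(io_size_kb: int, strip_size_kb: int, n_disks: int) -> int:
    """Estimate how many virtual disks one I/O request is spread over.

    Assumes round-robin striping and a request aligned to a stripe boundary.
    """
    if io_size_kb <= 0 or strip_size_kb <= 0 or n_disks <= 0:
        raise ValueError("all arguments must be positive")
    # Each stripe unit goes to the next disk, so the request covers
    # ceil(io / strip) units, but never more disks than exist.
    return min(n_disks, math.ceil(io_size_kb / strip_size_kb))


if __name__ == "__main__":
    # Striping size larger than the I/O size: only one disk is used.
    print(disks_touched(io_size_kb=1024, strip_size_kb=2048, n_disks=4))
    # Striping size of a quarter of the I/O size: all four disks are used.
    print(disks_touched(io_size_kb=1024, strip_size_kb=256, n_disks=4))
```

A result of 1 corresponds to the case described above in which the striping facility does not operate.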

Notes for the SFS/H striping construction are as follows.

5.4.3 Construction

5.4.3.1 KERNEL CONFIGURATION

To use SFS/H, several config parameters must be specified when a kernel is created.

SFSHCNT
Specify the number of SFS/H file systems created on the system. When SFS/H is not used, set 0 (the default value).

5.4.3.2 PARTITION DEFINITION

The virtual volumes that constitute SFS/H are created by concatenating two or more partitions, like ordinary virtual volumes. The only difference is that, for virtual volumes constituting SFS/H, at least one virtual disk must be specified for each of the file management and data sections. That is, at least two virtual disks must be specified.

Like ordinary virtual volumes, up to 32 virtual disks can be specified in the file management section. Disk devices, XMU, and striping disk devices can be specified. Since the file management section is accessed in units of 4 KB, disk devices or XMU virtual disks should be specified.

Data sections can be specified on one through 96 virtual disks when creating concatenated SFS/H, and can be specified on two through eight virtual disks when creating striped SFS/H. Striping disk devices are not allowed in the data section. When two or more virtual disk devices are specified, they must have the same type and the same size.
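The limits in this section can be checked before a layout is committed. The following is a minimal sketch: the `VirtualDisk` record, the device type labels, and the function name are hypothetical and are not part of any SUPER-UX interface; only the numeric limits come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VirtualDisk:
    path: str
    dev_type: str      # hypothetical labels: "disk", "xmu", "striping"
    size_blocks: int


def check_sfsh_layout(mgmt: List[VirtualDisk],
                      data: List[VirtualDisk],
                      striped: bool) -> List[str]:
    """Return a list of rule violations (empty if the layout is acceptable)."""
    errors = []
    # File management section: at least 1, at most 32 virtual disks.
    if not 1 <= len(mgmt) <= 32:
        errors.append("file management section needs 1 to 32 virtual disks")
    # Data section: 1-96 disks when concatenated, 2-8 when striped.
    low, high = (2, 8) if striped else (1, 96)
    if not low <= len(data) <= high:
        errors.append(f"data section needs {low} to {high} virtual disks")
    # Striping disk devices are not allowed in the data section.
    if any(d.dev_type == "striping" for d in data):
        errors.append("striping disk devices are not allowed in the data section")
    # Two or more data disks must share the same type and size.
    if len({(d.dev_type, d.size_blocks) for d in data}) > 1:
        errors.append("data section disks must have the same type and size")
    return errors


if __name__ == "__main__":
    mgmt = [VirtualDisk("/dev/rxd/10", "xmu", 1000)]
    data = [VirtualDisk("/dev/rxd/20", "xmu", 5000),
            VirtualDisk("/dev/rxd/30", "xmu", 5000)]
    print(check_sfsh_layout(mgmt, data, striped=True))
```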

The virtual disks specified in the file management section and data section are created with the same procedure used to define ordinary virtual volumes. See Section 2.3.

5.4.3.3 FILE SYSTEM CREATION

The mksfsh(1M) command is used to create the SFS/H file system. The catdev, vvattr, and mkfs commands used to create ordinary file systems are not used. The mksfsh command performs all internal processing such as construction of a virtual volume from the specified virtual disks, attribute setting, and mkfs execution.

To execute mksfsh, specify the virtual disks (partitions) constituting the file management section and the data section, together with an attribute setting file. A plus sign (+) must be placed between the virtual disks (partitions) constituting the file management section and those constituting the data section to connect them. The attribute setting file is specified with the -s option.

Format:

As previously explained, up to 32 partitions can be specified in vdpart1 and one through 96 partitions can be specified in vdpart2. The partitions specified in vdpart2 must also have the same device type and size.

When the -p option is not specified, the striping function is not used.

Example:

In this example, virtual volume /dev/rdsk/010 is created under the following conditions: /dev/rxd/10 is specified as the partition constituting the file management section; /dev/rxd/20 and /dev/rxd/30 are specified as the partitions constituting the data section.

The attribute setting file is a text file. The sample of the attribute setting file is as follows. (All numbers represent kilobytes.)

If the striping function is not used, specify 0 in H-STRIP.

The attribute setting file must match the devices to be used and is created according to the above format. Users can modify the sizes specified in H-DCLST and H-STRIP. Set the physical block size (the minimum unit for I/O) of the device used for the data section in H-XMUBF. When specifying values for these parameters, take into account the physical subpartition size, cylinder size, and block size of the devices used by the data section, since they differ for each device. The physical subpartition size is a multiple of the cylinder size, and the cylinder size is a multiple of the physical block size. The following restrictions apply to these three sizes.

The cluster size and striping size can be specified either with options or in a definition file. The following restrictions apply to the cluster size and striping size that a user specifies.

When the striping function is not used:

When the striping function is used and the number of striping disk devices is n (max 8):

If any of the preceding conditions is not satisfied, an error occurs.

If the cluster size, striping size, and definition file are not specified, they are defined as follows.

Cluster size = cylinder size of the device
Striping size = nothing
XMU buffer size = physical I/O size of the device

The device information can be checked with the -t option. The following example outputs XMU device information.

The vvinfo and devinfo commands are used to check definitions.

Whenever this definition is modified, mksfsh(1M) must be executed to re-create the SFS/H file system.

5.4.3.4 USE

The SFS/H file system created using mksfsh can be mounted and used as an ordinary SFS file system. To mount virtual volume /dev/rdsk/010 on /sfsh, specify the following.

Example:

Hybrid files can be manipulated using the same operations as for regular files in this directory. When the -l option of the ls command is specified, h is displayed as the file type of hybrid files.

Example:

Ordinary read and write system calls can be used to perform I/O.

5.4.4 Preallocate Function

In SFS/H, the fcntl() system call can be used to allocate an area (a cluster of the data section) before data is actually written.

When the requested allocation size extends over two or more clusters, physically contiguous areas are allocated as far as possible. If the whole request is allocated as a single contiguous area, the system call returns 1. If the request is split across several areas, the number of areas is returned.

When areas are allocated, the file size is increased but data in unwritten areas is left undefined.

This function is available only for SFS/H.
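The meaning of the return value can be illustrated without calling fcntl() at all. The sketch below counts the physically contiguous runs in a list of cluster numbers; the list and the function name are hypothetical, and the fcntl() command used for preallocation is not shown because this section does not name it.

```python
from typing import Sequence


def count_contiguous_areas(clusters: Sequence[int]) -> int:
    """Count runs of consecutive cluster numbers in allocation order.

    A result of 1 corresponds to a completely contiguous allocation;
    a larger result corresponds to the number of separate areas.
    """
    if not clusters:
        return 0
    areas = 1
    for prev, cur in zip(clusters, clusters[1:]):
        if cur != prev + 1:
            areas += 1  # a gap starts a new area
    return areas


if __name__ == "__main__":
    print(count_contiguous_areas([10, 11, 12, 13]))
    print(count_contiguous_areas([10, 11, 40, 41, 42, 90]))
```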

5.5 MONITORING AND TUNING

5.5.1 Operation Management Issues

The system administrator should prepare several symbolically linked directories that belong to another SFS-type file system under each end user's home directory.

Example:

5.5.2 Monitoring

SUPER-UX supports a facility for monitoring the operation status of each layer (file system, virtual volume, actual device) constituting a file system.

5.5.2.1 FILE SYSTEM OPERATION STATUS

Use the -B or -d option of sar(1M) to confirm the operation status of the entire file system.

5.5.2.2 TUNING AND RECONFIGURATION

By actually mounting file systems, various information is obtained concerning virtual volumes. This information can be examined using the devinfo(1M) command as follows.

The devinfo(1M) command displays real-time information on the virtual volume when the virtual volume is mounted. Otherwise, it displays the information on the previously mounted file system. Options -t, -c, and -s give the summary of prior information. When other options are specified, the total of the information during the specified prior period is given (900).

Example:

You can use the devinfo(1M) command with the -f option to initialize the statistical information.

5.5.2.3 ACTUAL DEVICE OPERATION STATUS

Operation statistics for an actual device can be obtained by specifying -t or -d with errpt(1M). Check the device load to determine whether it is unevenly distributed.

5.5.3 Tuning

This section explains the main tuning parameters that influence file system performance. Before tuning, obtain the necessary information items, such as information collected by monitoring and the application execution results, and then perform a thorough examination of the file problems.

5.5.3.1 FILE SYSTEM BUFFER

Use sar -B to obtain statistical information relating to the buffer cache. SFS transfers most data without using buffers, so the number of SFS buffers (NSBUF) usually need not be changed. In most cases, SFS performance is affected by the virtual volume tuning discussed in Sections 5.5.3.2 and 5.5.3.3.

A small-cluster SFS, however, uses the SFS buffers frequently. If %srche is considerably lower than 90% or %swche is considerably lower than 65%, it is recommended that the NSBUF value be increased.
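The rule of thumb above can be written as a small check. This is a sketch only: the 90% and 65% thresholds come from the text, but the `margin_pct` parameter is an assumed way of expressing "considerably lower", and the function does not parse sar output.

```python
def should_increase_nsbuf(srche_pct: float,
                          swche_pct: float,
                          margin_pct: float = 5.0) -> bool:
    """Return True if the SFS buffer hit ratios suggest raising NSBUF.

    srche_pct and swche_pct are the %srche and %swche values reported by sar -B.
    margin_pct is an assumption: how far below the threshold counts as
    "considerably lower".
    """
    read_low = srche_pct < 90.0 - margin_pct
    write_low = swche_pct < 65.0 - margin_pct
    return read_low or write_low


if __name__ == "__main__":
    print(should_increase_nsbuf(srche_pct=95.0, swche_pct=70.0))
    print(should_increase_nsbuf(srche_pct=80.0, swche_pct=70.0))
```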

5.5.3.2 VIRTUAL VOLUME EXTENT

The cluster size has the greatest effect on performance. In a file system that runs many applications whose individual I/O requests are smaller than the cluster size, enlarging the cluster size without consideration merely reduces space efficiency. Optimize the size according to the nature of the applications. In a FORTRAN program, results improve as the F_SETBUF value more closely approximates the cluster size.

When a striping disk is used, the striping disk block size also affects performance. The I/O size issued to the virtual disk unit is limited to the cluster size, so making the striping disk block size larger than the cluster size is meaningless. Determine the striping disk block size according to the cluster size and the average size of I/O issued to the file system:

cluster size / number of striping disks / 32
    <= striping disk block size
    <= cluster size / number of striping disks

Also, the file system must be reconstructed when these extents are changed.
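The bounds on the striping disk block size can be computed directly. The sketch below assumes sizes in kilobytes and reads the two expressions above as the lower and upper limits; the function name is illustrative.

```python
from typing import Tuple


def striping_block_bounds(cluster_kb: float, n_striping_disks: int) -> Tuple[float, float]:
    """Return (lower, upper) bounds for the striping disk block size in KB.

    upper = cluster size / number of striping disks
    lower = upper / 32
    """
    if cluster_kb <= 0 or n_striping_disks <= 0:
        raise ValueError("arguments must be positive")
    upper = cluster_kb / n_striping_disks
    lower = upper / 32
    return lower, upper


if __name__ == "__main__":
    # Example values chosen for illustration: 4096 KB cluster, 4 striping disks.
    print(striping_block_bounds(4096, 4))
```

Within this range, the average I/O size issued to the file system guides the choice, as described above.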

5.5.3.3 VIRTUAL VOLUME CACHE

Monitor the XM cache hit ratio by using the devinfo command, and tune the system until the hit ratio reaches a sufficient level.

To increase the hit ratio, the XM cache capacity must be increased. Increase the XM cache size of each virtual volume (XMCACHESZ of the vvattr command) or the total XM cache (cachedev or the ISL parameter).

The main parameters relating to other caches are listed next.

Table 5-3 lists the ideal cache hit ratios. Perform tuning to approach these values.

Table 5-3 Cache Hit Ratios

Item     Small Cluster SFS    SFS
rxhit    85% or more          70% or more
wxhit    90% or more          80% or more
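Measured hit ratios can be compared with the targets in Table 5-3 as sketched below. The dictionary keys and the function name are hypothetical; the measured values would be read from devinfo output by hand or by a separate script.

```python
from typing import Dict, List

# Target hit ratios (percent) from Table 5-3.
TARGETS: Dict[str, Dict[str, float]] = {
    "small_cluster_sfs": {"rxhit": 85.0, "wxhit": 90.0},
    "sfs": {"rxhit": 70.0, "wxhit": 80.0},
}


def below_target(fs_kind: str, measured: Dict[str, float]) -> List[str]:
    """Return the names of the hit ratios that fall short of Table 5-3."""
    targets = TARGETS[fs_kind]
    return sorted(name for name, goal in targets.items() if measured[name] < goal)


if __name__ == "__main__":
    print(below_target("sfs", {"rxhit": 72.0, "wxhit": 75.0}))
```

An empty result means both ratios meet the table; any names returned point to where more XM cache may help.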

5.5.4 Case Studies

This section consists of problems and questions that are apt to occur in a file system, together with the corresponding countermeasures and answers.
