This chapter explains NQS batch operations and transaction methods.
Access rights are required to use NQS. You can check your access status with the qstata command. When executed without options, qstata displays one of the following messages, depending on your access status.
You are permitted to place requests in NQS.
You are not permitted to place requests in NQS.
Your group is not permitted to place requests in NQS.
See the system administrator if you are unable to access NQS. Specify the host name with -h to display the NQS access restrictions of a remote host.
For explanations of NQS commands, refer to the online descriptions in Chapter 3, User Commands.
To create a batch request, create a shell script file that contains the commands to be executed in batch. An NQS batch request shell script can perform only non-interactive operations because standard input (stdin), standard output (stdout), and standard error output (stderr) are linked to files.
An NQS batch request shell script is identical to a normal shell script, except that it must be non-interactive and any embedded NQS options must appear in the first comment block. Within the script, commands can generally be combined in any way. However, commands that require terminal input and output, such as stty, cannot be used, because NQS does not use terminal I/O. Users can specify bsh, csh, or another shell to interpret an NQS batch request shell script.
Submit the batch execution shell script to NQS, where it is executed as a batch request. The following is a sample batch request shell script.
Example:
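A minimal sketch of a batch request shell script follows; it is an illustration, not the sample from the original guide. The file name script1, the scratch directory, and the commands inside the script are assumptions. The sketch writes the script to a file and runs it locally with standard input detached, which is one way to check that a script is non-interactive before submitting it.

```shell
# Create a hypothetical batch request script named script1 in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/script1" <<'EOF'
# Hypothetical batch request script: it reads nothing from the terminal.
echo "request started"
ls / | wc -l > file_count    # any non-interactive command is allowed
echo "request finished"
EOF

# Dry run outside NQS: stdin comes from /dev/null, so a command that
# tried to read standard input would see end-of-file immediately.
output=$(cd "$workdir" && sh script1 < /dev/null)
echo "$output"
```

Under NQS the same file would be handed to qsub instead of sh; the dry run only confirms that the commands complete without user interaction.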
You can supply input data to the commands in a script in two ways. The first way is to prepare a data file that stores the input data for the command, as shown in Example 1, where the data to be sorted is stored in a file called input_data. The second way is to use double redirection (a here-document, written with <<), as shown in Example 2.
Example 1:
Example 2:
Sorting is carried out from the line immediately after sort until the line just before the end-of-file (EOF).
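The two approaches can be sketched as follows. The sample words are hypothetical; input_data is the file name used in the text, created here in a scratch directory so that the sketch is self-contained. The second form is a here-document, which is assumed to be what the text calls double redirection.

```shell
workdir=$(mktemp -d)

# Way 1: keep the input in a data file and redirect it into the command.
printf 'pear\napple\nfig\n' > "$workdir/input_data"
from_file=$(sort < "$workdir/input_data")

# Way 2: embed the input in the script itself; sort reads the lines
# between the command and the EOF marker.
from_heredoc=$(sort <<EOF
pear
apple
fig
EOF
)

echo "$from_file"
```

Both variables end up holding the same sorted lines, so the choice mainly depends on whether the data should live in the script or in a separate file.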
Submit batch request shell scripts to NQS with qsub. This command has many options, and you may want to associate some of them with a specific script. Typing them on the command line could mean entering several lines of options at each submission. To avoid this, you can embed options in the script as comment lines placed before the first executable statement.
The following rules apply to embedded options.
More than one option can be entered on a line. However, if an option is followed by a string, the string must be enclosed in quotation marks ("string"). In the following example, the batch job begins at 11:30 pm, is limited to 21 minutes and 10 seconds of CPU time, and is submitted to the batch1 queue.
Example:
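As a hedged sketch, a script matching the description above might look like the one created below. The exact spelling of the date-time and time-limit strings is an assumption (see qsub(1) for the accepted formats), as is the file name script1. The grep line only counts lines that start with '#' followed by "@$"; it is an illustration and not how NQS itself parses the script.

```shell
workdir=$(mktemp -d)
cat > "$workdir/script1" <<'EOF'
# Hypothetical batch request script with embedded NQS options.
# @$-a "11:30pm" -lt "21:10"
# @$-q batch1
echo "first executable statement"
EOF

# Count the embedded option lines ('#', optional blanks, then "@$").
embedded=$(grep -c '^#[[:space:]]*@\$' "$workdir/script1")
echo "$embedded embedded option line(s)"
```

The ordinary comment on the first line is not counted, because it lacks the "@$" marker.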
To keep an embedded option in the script as an ordinary comment line, that is, to disable it, put a '#' between the leading '#' and "@$".
You can put any characters except '#', '@', and newline between the two '#'s.
When you submit a request from a public-domain NQS (for example, the COSMIC version) to the NEC version of NQS (SUPER-UX NQS and UP NetShepherd), you can specify options that exist only in the NEC version by replacing "@$" with "NECOPT ".
Example:
You must put blank spaces between "NECOPT" and the options. The option must begin with '-'.
The local (public-domain) NQS does not treat options marked with "NECOPT" as NQS options. The remote (NEC version) NQS treats them as NQS options and interprets them.
An embedded "NECOPT" option can likewise be kept as an ordinary comment line by putting a '#' between the leading '#' and "NECOPT".
Use the qsub command to submit a batch job (shell script). Options can be specified on the command line and embedded in the shell script file. Options on the command line take precedence over embedded options. If you do not specify a shell script in the command, the commands to be processed can be read from standard input (stdin). The following is a batch request example.
Example:
Request 65.host1 submitted to queue: batch1.
This example shows a script1 shell script being submitted to the batch1 queue. The batch request ID assigned by NQS is 65.host1. The request ID includes the serial number from the host where the request was submitted and the name of that host machine. This name is unique within the network. In the following example, the commands to be processed are read from standard input.
Example:
Request 66.host1 submitted to queue: batch1.
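In scripts it can be useful to pick the request ID out of the submission message and split it into its two parts. The following sketch works on the message text from the first example; the function name request_id_of is hypothetical, and the message format is assumed to match the examples exactly.

```shell
# request_id_of MESSAGE: print the request ID found in a submission message.
request_id_of() {
    printf '%s\n' "$1" |
        sed -n 's/^Request \([^ ]*\) submitted to queue: .*$/\1/p'
}

message='Request 65.host1 submitted to queue: batch1.'
id=$(request_id_of "$message")
serial=${id%%.*}    # serial number assigned on the submitting host
host=${id#*.}       # name of the submitting host
echo "$id -> serial=$serial host=$host"
```

The saved ID can then be passed to commands that take a request ID, such as qstatr or qdel.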
Specify the following options when submitting a batch request.
See the qsub(1) command in this guide for more information.
Table 2-1 lists the output file options, which determine where the standard output (stdout) and standard error output (stderr) of a batch request are written.
Output from a batch request usually goes to an NQS spool file and gets copied to a specified file when the batch job ends. If the destination is not specified, then the batch request outputs to a file in the current directory.
| Option | Description |
|---|---|
| -e [machine:][[/]path/]stderr-filename | Specifies the standard error output file (stderr) of a batch request. This option cannot be used if users specify the -eo option. |
| -o [machine:][[/]path/]stdout-filename | Specifies the output file (stdout). |
| -eo | Directs output from the stderr file to the stdout file. |
| -ke | Retains a stderr file on the machine executing the request. |
| -ko | Retains a stdout file on the machine executing a request. This option is ignored if using -o to specify a destination. |
Specify -e and -o option arguments using the following format.
Examples follow.
Example:
The /usr/nqs/result.e file on HOST1 is the output file.
Example:
If the request was submitted on HOST1, the /usr/nqs/result file on HOST1 is the output file.
To specify the file name with a relative path, see the following example.
Example:
If the file name does not begin with a slash (/), it is interpreted as a relative path name. When the resulting file is created on the machine to which the request is submitted, the current directory at the time of submission is used as the reference directory. When the resulting file is created on a machine other than the machine to which the request is submitted, the user's home directory on that machine is used as the reference directory.
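The way a destination argument is read can be summarized in a small sketch. The function name describe_dest and the word "local" (standing for "no machine given") are hypothetical; the format [machine:][[/]path/]filename is the one shown in Table 2-1.

```shell
# describe_dest DEST: print "<machine> <path> <absolute|relative>".
describe_dest() {
    case $1 in
        *:*) machine=${1%%:*}; path=${1#*:} ;;   # text before the first ':' is the machine
        *)   machine=local;    path=$1 ;;
    esac
    case $path in
        /*) kind=absolute ;;
        *)  kind=relative ;;   # resolved against the reference directory
    esac
    printf '%s %s %s\n' "$machine" "$path" "$kind"
}

describe_dest HOST1:/usr/nqs/result.e
describe_dest result/out
```

The first call reports an absolute path on HOST1; the second reports a relative path with no machine given.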
A batch request can specify the resources it uses during execution. A resource limit forcibly terminates a batch request if it exceeds set limits such as CPU time, memory size, or file size allocated for a batch request.
Table 2-2 lists resource limit options. Default resource limits are values set by the system administrator. User-specified limits are compared with these set values.
If a specified value is larger than the set value, the batch request is rejected. Resource limits differ from machine to machine and may also differ by queue.
The valid resource limits for each machine can be displayed with the qlimit command. The values set for each queue can be displayed with the -f option of the qstatq command.
| Option | Description |
|---|---|
| -l0 limit[,warn-limit] | Sets a per-request file system group 0 (XMU) maximum and an optional warning. |
| -l1 limit[,warn-limit] | Sets a per-request file system group 1 maximum and an optional warning. |
| -l2 limit[,warn-limit] | Sets a per-request file system group 2 maximum and an optional warning. |
| -l3 limit[,warn-limit] | Sets a per-request file system group 3 maximum and an optional warning. |
| -ld size-limit[,warn-limit] | Sets a per-process data-segment size maximum and an optional warning. |
| -lD drives | Sets a per-request maximum tape drives limit. |
| -lf size-limit[,warn-limit] | Sets a per-process permanent-file size limit. |
| -lm size-limit[,warn-limit] | Sets a per-process memory size maximum and an optional warning. |
| -ln nice-value | Sets a per-process nice-execution value. |
| -lr time-limit[,warn-limit] | Sets a per-process maximum CPU resident time limit and an optional warning. |
| -le CPU-number | Sets the number of CPUs as the target of the per-process CPU resident time limit. |
| -lR time-limit[,warn-limit] | Sets a per-request maximum CPU resident time limit and an optional warning. |
| -lE CPU-number | Sets the number of CPUs as the target of the per-request CPU resident time limit. |
| -ls size-limit[,warn-limit] | Sets a per-process stack-segment size maximum and an optional warning. |
| -lt time-limit[,warn-limit] | Sets a maximum per-process CPU time limit and an optional warning. |
You can specify maximum and warning limits for a resource.
The warning limit must be smaller than the maximum limit. If no warning limit is specified, it defaults to the maximum limit.
If a batch request exceeds a warning limit, the signal associated with that limit is sent to the batch request. If the maximum limit is exceeded, execution terminates immediately. See setrlimit(2) in the SUPER-UX Programmer's Reference Manual for more information.
Resource limit values are specified as follows.
Example:
| Value | Meaning |
|---|---|
| 1234:58:21.29 | 1234 hours 58 minutes 21.29 seconds |
| 59:01 | 59 minutes 1 second |
| 12345 | 12345 seconds |
| 121.1 | 121.1 seconds |
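The time values above can be checked with a short conversion sketch. The function name limit_to_seconds is hypothetical, and the general form [[hours:]minutes:]seconds[.fraction] is inferred from the four examples rather than taken from a format definition.

```shell
# limit_to_seconds TIME: convert [[hours:]minutes:]seconds[.fraction] to seconds.
limit_to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F: '{
        total = 0
        # Each further colon-separated field is one unit smaller by a factor of 60.
        for (i = 1; i <= NF; i++) total = total * 60 + $i
        printf "%.2f\n", total
    }'
}

limit_to_seconds 1234:58:21.29
limit_to_seconds 59:01
```

For instance, 59:01 is 59 × 60 + 1 = 3541 seconds.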
Unit measurements are as follows. Byte is the default unit size.
| Unit | Meaning |
|---|---|
| b | Byte |
| w | Word |
| kb | Kilobyte |
| kw | Kiloword |
| mb | Megabyte |
| mw | Megaword |
| gb | Gigabyte |
| gw | Gigaword |
Example:
| Value | Meaning |
|---|---|
| 1234 | 1234 bytes |
| 1234kb | 1234 kilobytes |
| 1234.5gb | 1234.5 gigabytes |
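The following sketch separates a size value into its number and unit and applies the byte default described above. The function name split_size is hypothetical. It deliberately does not convert between units, because the word size and the exact multiplier behind "kilo" are not stated in this section.

```shell
# split_size VALUE: print "<number> <unit>", using b (byte) when no unit is given.
split_size() {
    case $1 in
        *[kmg][bw]) number=${1%??} ;;   # two-letter unit such as kb or gw
        *[bw])      number=${1%?} ;;    # one-letter unit: b or w
        *)          number=$1 ;;        # no unit suffix
    esac
    unit=${1#"$number"}
    printf '%s %s\n' "$number" "${unit:-b}"
}

split_size 1234
split_size 1234kb
split_size 1234.5gb
```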
These options send mail at the start and end of a batch job. By default, mail is addressed to the user who submitted the batch request. Table 2-3 lists mail options.
When a batch request ends, mail is not usually sent. However, if the request terminates abnormally, mail is always sent.
| Option | Description |
|---|---|
| -mb | Sends mail when the request begins execution. |
| -me | Sends mail when the request ends. |
| -mu name | Specifies that any mail concerning the request should be delivered to the user specified in name, which can be given without the @ character or in the form name@machine. For example, -mu user1 sends mail to user1 on the local host, and -mu user1@HOST1 sends mail to user1 on HOST1. |
When the execution of a batch request is completed, a JOR is created. You can specify where the JOR is sent through the use of options when submitting the batch request. The JOR is sent to one of the following.
The JOR can also be sent by mail. If these options are not used, the JOR is output as the NQS manager specifies.
Only one of the following options can be specified at a time.
Specify the -j option as follows.
[machine:][[/]path/]filename
Example:
The JOR is sent to /usr/nqs/jor on SX-4.
If a particular machine is not specified, the default is the host where the batch request is executed.
Example:
When the batch request is submitted to SX-4 and executed on nec1, the JOR is sent to /usr/nqs/jor on nec1.
Table 2-4 lists other options, such as specifying the exact time for a batch request execution, specifying the queue for submitting a batch request, and specifying the priority of the batch request.
| Option | Description |
|---|---|
| -a date-time | Suspends request execution until the specified date and time. Specification examples are as follows (see qsub(1) for details). |
| -ac acctcode | Specifies the account code of the request. |
| -c CPU-count | Specifies the CPU count of the request. |
| -nc | Declares that the batch request cannot be checkpointed. |
| -nr | Declares that the batch request is not restartable. |
| -p priority | Assigns an intra-queue priority to the request. The priority must be an integer from 0 to 63, inclusive, where a larger value means a higher priority. The default value is assigned if not specified. |
| -q queue | Specifies the batch request queue. If you omit this option, NQS looks for the QSUB_QUEUE environment variable. If that variable is not found, NQS submits the request to the default batch request queue defined by the system administrator. If the default queue is not defined, an error message appears and NQS does not accept the request. |
| -r req-name | Specifies the request name. If not specified, a request name is assigned: the request name is stdin if the script is entered from standard input; otherwise the name of the script file is used, without its absolute or leading path. If the request name begins with a digit, R is prefixed. All request names are truncated to a maximum length of 63 characters. |
| -s shell | Specifies the absolute path name of the shell that interprets the batch request shell script. Without this option, NQS uses one of three distinct shell choice strategies. Any one of the three strategies can be configured by a system administrator for each NQS machine. Specify this option if the default shell strategy is not suitable. The default shell strategy can be displayed with the qlimit command. The three shell strategies are as follows. |
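The default request name rule described for the -r option can be sketched as follows. The function name default_request_name and the sample paths are hypothetical, the order of the prefix and truncation steps is an assumption, and the standard-input case (where the name is stdin) is left out.

```shell
# default_request_name SCRIPT: derive a request name from a script file name.
default_request_name() {
    name=${1##*/}                  # drop any leading path
    case $name in
        [0-9]*) name=R$name ;;     # a leading digit gets an R prefix
    esac
    printf '%.63s\n' "$name"       # keep at most 63 characters
}

default_request_name /home/user1/jobs/3dmodel.sh
default_request_name script1
```

The first call yields R3dmodel.sh and the second yields script1.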
The qstatr(1) command is used to confirm the status of a submitted batch request. A request is identified by its request ID, so if you know the request ID, you can specify the request directly, as shown in the following example.
Example:
$ qstatr 72.HOST1
=======================================
NQS (Rxx.xx) BATCH REQUESTS HOST: HOST1
=======================================
REQUEST ID      NAME     OWNER    QUEUE    PRI NICE MEMORY   TIME STT    JID R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
72.HOST1        STDIN    user1    batch1     20   0   1617  0.157 RUN     18 -
------------------------------------------------------------------------------
The STT column displays the request status. These request statuses have the following meanings.
PRR and POR are available on the cluster system with NQS/MPI only.
See qstatr(1) for the contents of items other than STT.
You can also specify a request to the qstatr command by its request name instead of its request ID. To do so, add the -r option, as shown in the following example.
Example:
$ qstatr -r request-name
This request name is assigned to a request when it is submitted with the qsub command.
If you do not know the request ID, you can access information on any submitted batch requests with the -b option, as shown in the following example.
Example:
$ qstatr -b
=================================================
NQS (Rxx.xx) BATCH REQUEST HOST: HOST1
=================================================
REQUEST ID      NAME     OWNER    QUEUE    PRI NICE MEMORY   TIME STT    JID R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
72.HOST1        STDIN    user1    batch1     20   0   1617  0.157 RUN     18 *
73.HOST1        STDIN    user1    batch1     20  10               QUE        -
74.HOST1        STDIN    user1    batch2     20  10               QUE        -
------------------------------------------------------------------------------
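When the summary is processed by a script, note that the MEMORY, TIME, and JID columns can be blank, so the STT value is not always at the same field position. The sketch below is one hedged way to extract it; the function name request_status is hypothetical, the rows are copied from the example above, and the assumption that a status is always three capital letters is based only on the values RUN and QUE shown here.

```shell
# request_status ID: read qstatr summary rows on stdin, print the STT value for ID.
request_status() {
    LC_ALL=C awk -v id="$1" '
        $1 == id {
            # Skip ID, NAME, OWNER, and QUEUE, then take the first
            # token made of exactly three capital letters.
            for (i = 5; i <= NF; i++)
                if ($i ~ /^[A-Z][A-Z][A-Z]$/) { print $i; exit }
        }'
}

summary='72.HOST1 STDIN user1 batch1 20 0 1617 0.157 RUN 18 *
73.HOST1 STDIN user1 batch1 20 10 QUE -
74.HOST1 STDIN user1 batch2 20 10 QUE -'

printf '%s\n' "$summary" | request_status 72.HOST1
printf '%s\n' "$summary" | request_status 73.HOST1
```

In real use the rows would come from the qstatr output rather than from a variable.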
Specify the -f option to see more information about a request, as shown in the following example.
Example:
$ qstatr -f 72.host1
================================================
NQS (Rxx.xx) BATCH REQUEST: 72.HOST1
================================================
Name: STDIN State: running
Owner: user1
Group: group1
Created: Tue Jan 23 1996 Priority: 20
7:33:04 JOB ID: 18
Acctcode: acct1
Restricted: Already running
Scheduling priority: 1
QUEUE
Name: batch1@HOST1
RESOURCES LIMITS
Per-process
Core File Size = UNLIMITED <DEFAULT>
Data Segment = UNLIMITED <DEFAULT>
Permanent File Size = UNLIMITED <DEFAULT>
Memory Size = UNLIMITED <DEFAULT>
Stack Segment = UNLIMITED <DEFAULT>
CPU Time = UNLIMITED <DEFAULT>
Perm File Capacity = UNLIMITED <DEFAULT>
Open file = UNLIMITED <DEFAULT>
Number of CPU = UNLIMITED <DEFAULT>
CPU Resident Time = UNLIMITED <DEFAULT>
CPU Resident Number = 1 <DEFAULT>
Per-request
Temp File Capacity = UNLIMITED <DEFAULT>
Memory Size = UNLIMITED <DEFAULT>
CPU Time Limit = UNLIMITED <DEFAULT>
Tape Drives = UNLIMITED <DEFAULT>
Perm File Capacity = UNLIMITED <DEFAULT>
FSG 0 limit = UNLIMITED <DEFAULT>
FSG 1 limit = UNLIMITED <DEFAULT>
FSG 2 limit = UNLIMITED <DEFAULT>
FSG 3 limit = UNLIMITED <DEFAULT>
Open file = UNLIMITED <DEFAULT>
Process Number = UNLIMITED <DEFAULT>
CPU Resident Time = UNLIMITED <DEFAULT>
CPU Resident Number = 1 <DEFAULT>
SCHEDULING PARAMETER
Nice Value 0
Base Priority 80
Timeslice Value 1000
Memory Priority 0
Modification factor of CPU 2
Tick Count 0
Decay Factor 1
Decay Interval 1
Mrt Size Effect 30
Mrt Priority Effect 100
Aging Range 160
Mrt Minimum 2
Slave Priority 0
CPU Count 32
FILES MODE NAME
Stdout: SPOOL HOST1:/usr/nec/STDIN.o72
Stderr: SPOOL HOST1:/usr/nec/STDIN.e72
MAIL
Address user1@HOST1 When: NONE
MISC
Restartable Yes User Mask: 0
Restartstate No Orig.Owner: user1
Shell: DEFAULT Jor: NONE
Checkpoint: Yes
Resource Sharing Group: DEFAULT
To check the status of a request that has been transferred to a remote host, use either of the following methods: run the qstatr command with the -t option on the machine from which the request was submitted, or run the qstatr command on the host where the request exists.
Example:
qstatr -t 2 72.host1
Specifying the -t option enables display of the requests transferred to other hosts.
The number following -t indicates a level. To reference requests on the remote host, specify 1 or 2.
The qstat(1) command can also be used to confirm the status of a request. It differs from the qstatr command in usage and output format; use whichever command is more convenient. See Chapter 3, User Commands, for details about the qstat command.
qstatq can be used to confirm the status of a batch queue. qstatq with the -b option accesses information on all batch queues, as shown in the following example.
Example:
$ qstatq -b
========================================================
NQS (Rxx.xx) BATCH QUEUE SUMMARY HOST: HOST1
========================================================
QUEUE NAME      ENA STS PRI/BPR/ TMS /MPR RLM   TOT QUE RUN WAI HLD SUS ARR EXT
--------------- ------------------------------- -------------------------------
batch1          ENA RUN  20/ 80/ 1000/  0   2     3   1   1   1   0   0   0   0
batch2          ENA INA  30/ 80/ 2000/  0   3     2   0   0   1   1   0   0   0
--------------- ------------------------------- -------------------------------
<TOTAL>                                    10     5   1   1   2   1   0   0   0
--------------- ------------------------------- -------------------------------
The status appears under the ENA and STS columns, as follows.
Specify the batch queue with the qstatq command to access the status of an individual batch queue, as shown in the following example.
Example:
$ qstatq batch1
========================================================
NQS (Rxx.xx) BATCH QUEUE SUMMARY HOST: HOST1
========================================================
QUEUE NAME      ENA STS PRI/BPR/ TMS /MPR RLM   TOT QUE RUN WAI HLD SUS ARR EXT
--------------- ------------------------------- -------------------------------
batch1          ENA RUN  20/ 80/ 1000/  0   2     3   1   1   1   0   0   0   0
--------------- ------------------------------- -------------------------------
<TOTAL>                                     2     3   1   1   2   0   0   0   0
--------------- ------------------------------- -------------------------------
Specify the -f option with the qstatq command to access detailed information on a batch queue, as shown in the following example.
Example:
$ qstatq -f batch1
==========================================================
NQS (Rxx.xx) BATCH QUEUE: batch1@HOST1
==========================================================
Priority: 20 Status: [ENABLED , RUNNING]
Batch Base Priority: 80
Time Slice Value: 1000
Nice Value: 0
Memory Priority: 0
Mod Factor of CPU: 2
Tick Count: 0
Decay Factor: 1
Decay Interval: 1
Mrt Size Effect: 30
Mrt Pri Effect: 100
Aging Range: 160
Mrt Minimum: 2
Slave Priority: 0
CPU Count: 32
Scheduling Mode: TYPE-0
Continuous Scheduling Number: Undefined
Default Scheduling Priority: 1
Resource-occupy Wait: Undefined
Resource Sharing Group: DEFAULT
ENTRIES
Total: 3
Queued: 1 Running: 1 Waiting: 1
Held: 0 Suspending: 0 Arriving: 0
Exiting: 0
COMPLEX MEMBERSHIP
complex1, complex2
RUN LIMITS
Total run limit: 2
FSG0(XMU) run limit: Unlimited
FSG1 run limit: Unlimited
FSG2 run limit: Unlimited
FSG3 run limit: Unlimited
Memory run limit: Unlimited
User run limit: 2
Group run limit: Unlimited
RESOURCE LIMITS
Per-process
Core File Size Limit = UNLIMITED <DEFAULT>
Data Size Limit = UNLIMITED <DEFAULT>
Permanent File Size Limit = UNLIMITED <DEFAULT>
Memory Size Limit = UNLIMITED <DEFAULT>
Stack Size Limit = UNLIMITED <DEFAULT>
CPU Time Limit = UNLIMITED <DEFAULT>
Perm File Capacity Limit = UNLIMITED <DEFAULT>
Open File Limit = UNLIMITED <DEFAULT>
Number of CPU Limit = UNLIMITED <DEFAULT>
CPU Resident Time Limit = UNLIMITED <DEFAULT>
CPU Resident Number = 1 <DEFAULT>
Per-request
Tape Drives Limit = UNLIMITED <DEFAULT>
Memory Size Limit = UNLIMITED <DEFAULT>
CPU Time Limit = UNLIMITED <DEFAULT>
Temp File Capacity Limit = UNLIMITED <DEFAULT>
Perm File Capacity Limit = UNLIMITED <DEFAULT>
FSG 0 (XMU) Limit = UNLIMITED <DEFAULT>
FSG 1 Limit = UNLIMITED <DEFAULT>
FSG 2 Limit = UNLIMITED <DEFAULT>
FSG 3 Limit = UNLIMITED <DEFAULT>
Open File Limit = UNLIMITED <DEFAULT>
Process Number Limit = UNLIMITED <DEFAULT>
CPU Resident Time Limit = UNLIMITED <DEFAULT>
CPU Resident Number = 1 <DEFAULT>
ACCESS
Unrestricted access
ATTRIBUTE
LOADBALANCE ON
LOAD BALANCING PARAMETER
Keeping request number limit = 1
Delivery wait time = 30
FORCE RESTART MODE
When file open failed OFF
When file modified ON
CUMULATIVE TIME
System space time = 3.290 seconds
User space time = 0.483 seconds
To access the status on a remote host, specify the host with the -h option, as shown in the following example.
$ qstatr -h NEC1 -b
When you specify only a queue name, NQS assumes that the queue is on the local host. To refer to a queue on a remote host, enter qstatq with the -h option, as shown in the following examples.
Examples:
qstatq batch1
(Displays information about the local batch1 queue)
qstatq -h NEC1 batch1
(Displays information about the batch1 queue on NEC1)
The qstatq command can be used to confirm the status of a pipe queue. qstatq with the -p option displays information on all pipe queues in the system, as shown in the following example.
Example:
$ qstatq -p
===============================================================
NQS (Rxx.xx) PIPE QUEUE SUMMARY HOST: host1
===============================================================
QUEUE NAME      ENA STS PRI RLM      TOT QUE ROU WAI HLD ARR
--------------- -------------------- ------------------------------
pipe1           ENA INA  20   1        2   2   0   0   0   0
pipe2           DIS STP  30   2        0   0   0   0   0   0
netpipe1        ENA ROT  20   1        2   1   1   0   0   0
--------------- -------------------- ------------------------------
<TOTAL>                      10        4   3   1   0   0   0
--------------- -------------------- ------------------------------
Status appears under the ENA and STS columns, as follows.
The status of each pipe queue can also be accessed by specifying the pipe queue name, as shown in the following example.
Example:
$ qstatq pipe1
===============================================================
NQS (Rxx.xx) PIPE QUEUE SUMMARY HOST: host1
===============================================================
QUEUE NAME      ENA STS PRI RLM      TOT QUE ROU WAI HLD ARR
--------------- -------------------- ------------------------------
pipe1           ENA INA  20   1        2   2   0   0   0   0
--------------- -------------------- ------------------------------
<TOTAL>                       1        2   2   0   0   0   0
--------------- -------------------- ------------------------------
Specify the -f option to display information on the pipe queue, as shown in the following example. As with batch queues, you can also specify a host with the -h option.
Example:
$ qstatq -f pipe1
=======================================================================
NQS (Rxx.xx) PIPE QUEUE: pipe1@host1
=======================================================================
Priority: 20 Status: [ ENABLE , INACTIVE ]
Queue server: /usr/lib/nqs/pipeclient
ENTRIES
Total: 2
Queued: 2 Routing: 0 Waiting: 0
Held: 0 Arriving: 0
RUN LIMITS
Total run limit: 3
User run limit : Unlimited Group run limit : Unlimited
DESTINATIONS
batch1@host1, batch2@host1
ACCESS
Unrestricted access
ATTRIBUTE
BEFORECHECK OFF
STAYWAIT OFF
FREEDESTINATION OFF
LOADBALANCE ON
TRANSPARENT OFF
LOAD BALANCING PARAMETER
Reserved run limit = 1
Destination retry wait = 3600
CUMULATIVE TIME
System space time= 1.00 sec
User space time= 2.00 sec
The DESTINATIONS information is particularly important for a pipe queue. It lists the destination queues, that is, the queues to which a request submitted to the pipe queue is routed. More than one destination queue may be set in a pipe queue, as shown in the following example.
Example:
DESTINATIONS
batch1@host1,batch2@host1
In this case, an attempt is made to transfer a request to the batch1@host1 queue first. If the request cannot be transferred to that queue, it is transferred to the batch2@host1 queue. A transfer can fail, for example, because the destination queue is unable to accept requests. If a queue on a remote host is set as a destination, the pipe queue is called a network pipe queue.
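The try-in-order behavior can be pictured with a small sketch. It is only a model of the rule described above, not of how NQS is implemented: the function names accepts and route are hypothetical, and accepts is a stand-in that pretends batch1@host1 cannot take requests. The destination list is assumed to be comma-separated without blanks, as in the example.

```shell
# accepts DEST: hypothetical stand-in for "the destination queue accepts the request".
# For the demonstration, batch1@host1 is treated as unable to accept requests.
accepts() {
    [ "$1" != "batch1@host1" ]
}

# route DESTINATIONS: print the first destination, in listed order, that accepts.
route() {
    old_ifs=$IFS
    IFS=,
    set -f            # no pathname expansion while splitting the list
    set -- $1
    set +f
    IFS=$old_ifs
    for dest in "$@"; do
        if accepts "$dest"; then
            printf '%s\n' "$dest"
            return 0
        fi
    done
    return 1          # no destination accepted the request
}

route "batch1@host1,batch2@host1"
```

With the stand-in above, the call prints batch2@host1, the second destination in the list.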
Example:
DESTINATIONS
batch1@host2
In this example, the request is transferred to queue batch1 in remote host host2.
To access the pipe queue status on a remote host, specify the host with the -h option in the same way as for the batch queue status. The following example shows how to access the pipe queue status on the remote host nec1.
Example:
$ qstatq -h nec1 -p
The rules for specifying a queue name are the same for pipe queues as for batch queues.
You can change batch request attributes after submitting a request with qalter. The following example shows how to change the per-process CPU time limit of request ID 72.HOST1.
Example:
$ qalter -lt 1000 72.HOST1
Most request attributes can be changed while the request is in the queued state. However, once a request enters the running state, only some attributes can still be changed. Table 2-5 lists some of the qalter options. For all options, see the qalter(1) page in Chapter 3, User Commands.
You cannot alter attributes to values that exceed queue resource limits. An error occurs if the change cannot be made.
If a request is in a batch queue, resource limits that are not supported by the system cannot be changed. While a request is being routed in a pipe queue or executed in a batch queue, most of its attributes cannot be changed.
| Option | Explanation | Example |
|---|---|---|
| -a | Changes the request execution time | qalter -a 17:20 72.HOST1 |
| -c | Changes the CPU count value | qalter -c 10 72.HOST1 |
| -e | Changes the stderr output destination | qalter -e HOST1:/usr/nec/result.e 72.HOST1 |
| -lm | Changes the value of a resource limit (here, the per-process memory size limit) | qalter -lm 2kb 72.HOST1 |
| -mb | Changes whether mail is sent when the request begins execution | qalter -mb on 72.HOST1 |
| -me | Changes whether mail is sent when the request ends execution | qalter -me off 72.HOST1 |
| -mu | Changes the user to whom mail is sent | qalter -mu user2 72.HOST1 |
| -nr | Changes whether the request is restartable | qalter -nr off 72.HOST1 |
| -o | Changes the stdout output destination | qalter -o HOST1:/usr/nec/result.o 72.HOST1 |
| -p | Changes the request priority | qalter -p 25 72.HOST1 |
| -re | Changes the transmission mode of the stderr output file | qalter -re n 72.HOST1 |
| -ro | Changes the transmission mode of the stdout output file | qalter -ro s 72.HOST1 |
| -s | Changes the shell that executes the request | qalter -s /bin/sh 72.HOST1 |
Specify qdel with a request ID to delete a batch request that is queued, waiting, or holding, as shown in the following example.
Example:
$ qdel 72.HOST1
Request 72.HOST1 has been deleted.
If the request is being executed, a message to that effect is displayed, as shown in the following example. The request is not deleted.
Example:
$ qdel 73.HOST1
Request 73.HOST1 is running.
To delete a request on a remote host, execute the qdel(1) command on the host from which the request was submitted or on the host where the request exists.
The qdel(1) command can also identify a request by its request name. In this case, specify the -r option as shown in the following example.
Example:
qdel -r MAKE3
An executing request cannot be deleted with the methods described so far in this section. To delete an executing request, execute the qdel(1) command with the -k option and specify the request ID in the same way as before.
Example:
$ qdel -k 74.HOST1
Request 74.HOST1 is running, and has been signaled.
The -k option sends a SIGKILL signal to the request, which forcibly ends its execution. To send a different signal, specify the number of that signal as the option. In the following example, the SIGINT and SIGHUP signals are sent.
Example:
$ qdel -2 72.HOST1
$ qdel -1 72.HOST1
This means that -k and -9 are functionally the same. Specifying -k on a request that is not running also deletes it. Therefore, running and waiting requests can be deleted at the same time, as shown in the following example.
Example:
$ qdel -k 72.HOST1 73.HOST1
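To look up which signal a number stands for, the shell's own kill -l can be used, assuming the numbers given to qdel are the usual system signal numbers. This is a local lookup only; it does not contact NQS or send any signal.

```shell
# Print the signal names for the numbers used in this section.
for number in 1 2 9; do
    printf '%s SIG%s\n' "$number" "$(kill -l "$number")"
done
```

On a POSIX system this lists SIGHUP, SIGINT, and SIGKILL for 1, 2, and 9, matching the options -1, -2, and -9 (or -k) described above.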
To hold a queued or waiting batch request, use qhold. This command holds the request and makes it ineligible for execution until the user removes the hold. Use qhold and the request ID to hold a request, as shown in the following example.
Example:
$ qhold 72.HOST1
Request 72.HOST1 has been held.
When a batch request is held successfully, a message saying that the batch request has been held is written to stdout. When a batch request cannot be held, an error message describing the reason is written.
If a request is running and both the system and NQS permit checkpointing, NQS takes a checkpoint of the request and then holds it.
If the NQS manager permits specifying a restart file name, you can specify it with the -F path-name option. The format of path-name is as follows:
[[/]path/]file-name
If the -F option is omitted, a restart file is created in the current directory with the name request-name.hsequence-number.
If specifying a restart file name is not permitted, a restart file is automatically created in the directory /usr/spool/nqs/restart.
As with the qdel command, the qhold and qrls commands can identify a request by its request name and can hold or release a request on a remote machine.
To release a hold on a batch request, use qrls. This command releases a hold that was previously applied to a batch request. The effect of releasing a hold depends on the state of the request when the hold was applied.
Use qrls followed by the request ID to release the hold on a request, as shown in the following example.
Example:
$ qrls 75.HOST1
Request 75.HOST1 has been released.
If qrls is used on a request that has no hold applied to it, the message shown in the following example appears.
Example:
$ qrls 76.HOST1
Request 76.HOST1 is not holding.
As with the qdel command, the qhold and qrls commands can identify a request by its request name and can hold or release a request on a remote machine.
This section explains how to suspend and resume a batch request. To suspend or resume a batch request, use the qspnd(1) and qrsm(1) commands, respectively. The qspnd(1) command is valid only for a request that is being executed, that is, a request in the RUNNING state.
To suspend a request, specify the request ID of the request and execute the qspnd command as shown in the following example.
Example:
$ qspnd 72.host1
Request 72.host1 has been suspended.
If suspend processing is performed correctly, a message indicating the suspension state of the request is output.
To release the suspension state of a request, specify the request ID of the request and execute the qrsm command.
Example:
$ qrsm 72.host1
Request 72.host1 has been resumed.
If the release processing is performed correctly, a message indicating that the suspended request has been resumed is output. If the release processing fails, for example because the request was not suspended, the corresponding error message is output.
Example:
$ qrsm 73.host1
Request 73.host1 is not suspending.
As with the qdel command, the qspnd and qrsm commands can specify a request by its request name, and can suspend or resume a request on a remote machine.
To checkpoint or restart a batch request, use the qchk(1) and qrst(1) commands. A checkpoint applies only to a running batch request. Checkpointing does not stop the running batch request. You can restart a checkpointed request after the request execution finishes.
To checkpoint a request, specify the request ID and execute the qchk command as follows.
$ qchk 72.host1
Start to get checkpoint of request 72.host1. Please wait.....
Request 72.host1 has been restarted.
If checkpoint processing is performed correctly, a message appears.
If the NQS manager permits users to specify a restart file name, you can specify one with the -F path-name option. The format of the path-name is as follows:
[[/]path/]file-name
If the -F option is omitted, a restart file is created in the current directory with the name request-name.csequence-number.
If specifying a restart file name is not permitted, a restart file is automatically created in the directory /usr/spool/nqs/restart.
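The default naming rule can be illustrated with a small shell function. This is only a sketch: restart_file_name is a hypothetical helper, not an NQS command, and the request name myjob and sequence number 72 are made-up values. It covers the c (checkpoint) form described here and the h form that the hold description uses.

```shell
# Hypothetical helper (not part of NQS): build the default restart file
# name that is used when the -F option is omitted.
#   $1 = request name
#   $2 = sequence number
#   $3 = kind: "c" for a checkpoint restart file, "h" for a hold restart file
restart_file_name() {
    case "$3" in
        h|c) printf '%s.%s%s\n' "$1" "$3" "$2" ;;
        *)   echo "restart_file_name: kind must be h or c" >&2
             return 1 ;;
    esac
}

restart_file_name myjob 72 c
restart_file_name myjob 72 h
```

The function only formats a name; whether the file ends up in the current directory or in /usr/spool/nqs/restart still depends on the permission described above.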
To restart a request from its checkpoint, specify the request ID assigned to it and execute the qrst(1) command as follows.
$ qrst 72.host1
Request 72.host1 has been restarted.
If the restart processing is performed correctly, a message is output. If the restart has failed because the request was still running, the following message appears.
$ qrst 73.host1
Request 73.host1 is running.
As with the qdel command, the qchk and qrst commands can specify a request by its request name, and can checkpoint or restart a request on a remote machine.
A running batch request can be stopped and returned to the queue with qrerun. This command checks that the request can be returned to the queue by verifying that all of the running request's attributes are allowed in the queue.
If the attributes are valid, the request is killed and returned (submitted) to the queue with the same request ID. If any of the attributes are not allowed in the queue, the request continues to run and a message describing the outcome is displayed. If the request is rerun, the entire batch request starts at the beginning.
Include the request ID with the qrerun command, as shown in the following example.
Example:
$ qrerun 81.HOST1
Request 81.HOST1 has been rerun.
If the command is used on a request that is in any state other than RUNNING, it is not executed and a message appears as shown in the following example.
Example:
$ qrerun 81.HOST1
Request 81.HOST1 is not running.
As with the qdel command, the qrerun command can specify a request by its request name, and can rerun a request on a remote machine.
Use qmove to move a nonrunning batch request to another queue. Only requests that are held, waiting, or queued may be moved. The command first checks to see if the request attributes are allowable in the new queue. If they are, the request is moved to the new queue. The state of the request remains the same. If any request attributes are not allowed, the request is not moved and a message describing the outcome is displayed.
The qmove command allows you to move specific requests or all of the requests in a specified queue. Include the request ID with the command, followed by the new queue name to move a specific request, as shown in the following example.
Example:
$ qmove 81.HOST1 batch1
Request 81.HOST1 has been moved.
Use the -q option to move all eligible requests in a queue, as shown in the following example.
Example:
$ qmove -q batch1 batch2
Request 81.HOST1 has been moved.
Request 82.HOST1 has been moved.
Request 83.HOST1 has been moved.
If qmove is used on a running request, the command is not executed and the message shown in the following example appears.
Example:
$ qmove 84.HOST1 batch1
Request 84.HOST1 is running.
The qmove command can specify a request with the request name in the same way as the qdel command.
Use the qmsg command to send messages to batch requests. The messages are embedded in the request's output file.
Example:
$ qmsg 72.HOST1
Compile test.c
CTRL-D (EOF)
Specifying qmsg without options sends the message to both the standard output file and the standard error output file. The -o option sends the message to the standard output file only. The -e option sends the message to the standard error output file only.
Example:
$ qmsg -o 72.HOST1
$ qmsg -e 72.HOST1
The qmsg command can specify a request with the request name in the same way as the qdel command.
Valid resource limits are those supported by the host at the time of request execution. Although these request limits may be entered as request attributes, their specifications are ignored if they are not supported by the host that executes the request. The local host's valid resource limits can be confirmed by using the qlimit command without options, as shown in the following example.
Example:
$ qlimit
Core file size limit (-lc)
Data segment size limit (-ld)
Per-process permanent file size limit (-lf)
Per-process memory size limit (-lm)
Stack segment size limit (-ls)
Per-process cpu time limit (-lt)
.
.
Nice value (-ln)
Shell strategy = LOGIN
The qlimit command outputs valid resource limits for the local host. It also shows the options that can be specified with the qsub command. Finally, it outputs the shell strategies defined by the system administrator. To confirm valid resource limits on a remote host, specify the name of the remote host.
Example:
$ qlimit HOST2
Per-process corefile size limit (-lc)
Per-process data size limit (-ld)
Per-process permanent file size limit (-lf)
Per-process permanent file space limit (-lF)
Per-process stack size limit (-ls)
Per-process CPU time limit (-lt)
.
.
Nice value (-ln)
Shell strategy = FIXED
The qstatr command and other commands are used to confirm the termination of a batch request. The qwait(1) command is used to await the termination of a batch request and confirm its termination state. See the following examples.
Example 1: Request terminated with termination code 45.
$ qwait 123.host1
done 45
Example 2: Request terminated by SIGKILL.
$ qwait 124.host1
killed 9
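A script that waits for a request usually needs to act on the qwait result. The following sketch interprets the two output forms shown in the examples above. The function name describe_qwait is hypothetical, and it assumes the output is a single line consisting of a state word and a number; any other forms that qwait may print are reported as unrecognized.

```shell
# Hypothetical helper (not part of NQS): interpret one line of qwait
# output such as "done 45" or "killed 9".
describe_qwait() {
    qw_state=${1%% *}   # text before the first space
    qw_code=${1#* }     # text after the first space
    case "$qw_state" in
        "done")   echo "request finished with termination code $qw_code" ;;
        "killed") echo "request was killed by signal $qw_code" ;;
        *)        echo "unrecognized qwait output: $1"
                  return 1 ;;
    esac
}

describe_qwait "done 45"
describe_qwait "killed 9"
```

On a host where NQS is installed, the function could be fed the real output, for example with describe_qwait "$(qwait 123.host1)".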
You can also learn that a request has terminated from the mail sent at that time. To have mail sent when a request terminates, specify the -me option of the qsub command when submitting the request. However, if the request is canceled because of a failure, mail reporting the failure status is always sent.
When a batch request ends, its output is stored in files. Two output files are usually generated. The first file stores the contents output to the standard output (stdout) during script execution. The other file contains the contents output to the standard error output (stderr) during script execution. These files can be specified when submitting a request and they can also be altered prior to running them. You can specify a request name when submitting a request. If a request name is not specified, the script name becomes the request name. If the script comes from the standard input (stdin), and a request name is not entered, the request name becomes STDIN.
The format for naming the standard output file is as follows.
request-name.orequest-sequence-number
The format for naming the standard error file is as follows.
request-name.erequest-sequence-number
Example:
A request named batreg with request number 72 has the following error file name.
batreg.e72
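The naming rules for the two output files can be sketched as a shell function. The name output_file_names is hypothetical and not an NQS command. Passing an empty request name models only the case described above in which the script comes from standard input and no request name is entered.

```shell
# Hypothetical helper (not part of NQS): print the default standard
# output and standard error file names of a request, one per line.
#   $1 = request name (empty: script read from stdin with no name given)
#   $2 = request sequence number
output_file_names() {
    of_name=${1:-STDIN}
    printf '%s.o%s\n' "$of_name" "$2"
    printf '%s.e%s\n' "$of_name" "$2"
}

output_file_names batreg 72
```

For the request in the example, the second line printed is the error file name batreg.e72.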
A device request sends requests to printers and plotters. The sequence of device request operations is as follows.
To submit a device request, first create a file that contains the data to be output to the device. This data may also be read from standard input (stdin) when submitting a device request. Unlike the shell script of a batch request, this data file cannot contain embedded options. Enter a device request with the qpr command, as shown in the following example.
Example:
$ qpr -q device1 data1
Request 80.HOST1 submitted to queue: device1.
In this example, the data1 file is submitted to the device queue device1 as a device request. The entry is successful, and a message appears indicating that the request has been accepted. This message is the same as the one output during batch request entry.
As with the qsub command, you can specify more than one option with the qpr command. Table 2-6 shows some options that can be used with qpr. For information about all the options available, see the qpr(1) pages in Chapter 3, User Commands.
| Unique Option | Description |
|---|---|
| -f form-name | Limits the set of acceptable devices to those that are loaded with form-name. When this option is omitted, the qpr command only submits the request to a device that is loaded with the default forms. When a default form is not defined, the request is submitted to an appropriate device regardless of the forms configured for the device. In any case, only those devices associated with the chosen queue are considered. |
| -n copies | Specifies the number of copies to be printed. The default is one. |
Once a device request is accepted by the device queue, the request is output according to priority and entry order. If the output device is in a CLOSED or FAILED state, the device request is not processed. The qstatd command can be used to confirm the device status.
qstatd returns information about devices in a formatted output. Use this command, followed by the device name, to display information about a specific device, as shown in the following example.
Example:
$ qstatd dev1
==================================================
NQS (Rxx.xx) DEVICE SUMMARY HOST: HOST1
==================================================
DEVICE NAME     ENA STS    FORMS
--------------- ---------- -----------------------------------
dev1            ENA INA    form1
--------------- ---------- -----------------------------------
Use qstatd without a device name to obtain information on all devices on the local host, as shown in the following example.
Example:
$ qstatd
==================================================
NQS (Rxx.xx) DEVICE SUMMARY HOST: HOST1
==================================================
DEVICE NAME     ENA STS    FORMS
--------------- ---------- -----------------------------------
dev1            ENA INA    form1
dev2            ENA INA    form1 form2
dev3            DIS INA    form2
--------------- ---------- -----------------------------------
Specify the -f option for detailed information on devices, as shown in the following example.
Example:
$ qstatd -f dev1
====================================================
NQS (Rxx.xx) DEVICE: dev1@HOST1
====================================================
Status: [ ENABLE , INACTIVE ]
Fullname: /dev/lp
Server: /usr/lib/nqs/lpserver
FORMS
form1
RELATION DEVICE QUEUES
device1, device2, device3
Use the -h option to specify a host's device status, as shown in the following example.
Example:
$ qstatd -h NEC1
==================================================
NQS (Rxx.xx) DEVICE SUMMARY HOST: NEC1
==================================================
DEVICE NAME     ENA STS    FORMS
--------------- ---------- -----------------------------------
lpdev           ENA INA    lpform
texdev          ENA INA    texform
--------------- ---------- -----------------------------------
The qstatd command displays information about a local or remote device. The host can be changed with the -h option.
Example 1: Displays information on dev1 on the local host
qstatd dev1
Example 2: Displays information on dev1 on the remote host NEC1
qstatd -h NEC1 dev1
Use the qstatr command to confirm the status of a device request. The method for using this command is exactly the same as for batch requests.
Example 1 confirms the status of one device request. Example 2 confirms the status of all device requests.
Example 1:
$ qstatr 72.HOST1
==================================================
NQS (Rxx.xx) DEVICE REQUESTS HOST: HOST1
==================================================
REQUEST ID     NAME      OWNER    QUEUE    PRI  SIZE   STT
-------------- --------- -------- -------- ---- ------ ---
72.HOST1       nec       user1    device1    20   1106 RUN
------------------------------------------------------------
Example 2:
$ qstatr -d
===================================================
NQS (Rxx.xx) DEVICE REQUESTS HOST: HOST1
===================================================
REQUEST ID     NAME      OWNER    QUEUE    PRI  SIZE   STT
-------------- --------- -------- -------- ---- ------ ---
72.HOST1       nec       user1    device1    20   1106 RUN
73.HOST1       STDIN     user1    device1    30   2000 QUE
74.HOST1       nqs       user1    device2    20    500 WAI
------------------------------------------------------------
Display a specific device request status in detail using the qstatr -f option, as shown in the following example.
Example:
$ qstatr -f 72.HOST1
==================================================
NQS (Rxx.xx) DEVICE REQUEST: 72.host1
==================================================
Name: nec State: RUNNING
Owner: user1
Group: group1
Acctcode: account1
Created: Fri Apr 20 1990 Priority: 20
09:41:06 GMT
QUEUE
Name: device1@host1
FORMS
Name: form1
MAIL
Address user1@host1 When: END
MISC
Orig.Owner: user1
Size: 2145
Copies: 1
Display the status of all the device queues on the local host using the qstatq -d option, as shown in the following example.
Example:
$ qstatq -d
=======================================================
NQS (Rxx.xx) DEVICE QUEUE SUMMARY HOST: host1
=======================================================
QUEUE NAME     ENA STS          PRI TOT QUE RUN WAI HLD ARR
-------------- ---------------- ------------------------------
device1        ENA RUN           20   3   2   1   0   0   0
device2        ENA INA           30   2   0   0   1   1   0
-------------- ---------------- ------------------------------
<TOTAL>                               5   2   1   1   1   0
-------------- ---------------- ------------------------------
Display the status of a specific device queue with the qstatq command, as shown in the following example.
Example:
$ qstatq device1
=======================================================
NQS (Rxx.xx) DEVICE QUEUE SUMMARY HOST: host1
=======================================================
QUEUE NAME     ENA STS          PRI TOT QUE RUN WAI HLD ARR
-------------- ---------------- ------------------------------
device1        ENA RUN           20   3   2   1   0   0   0
-------------- ---------------- ------------------------------
<TOTAL>                               3   2   1   0   0   0
-------------- ---------------- ------------------------------
Display the status of a specific device queue in detail using the qstatq -f option, as shown in the following example.
Example:
$ qstatq -f device1
==========================================================
NQS (Rxx.xx) DEVICE QUEUE: device1@host1
==========================================================
Priority: 20 Status: [ ENABLED, RUNNING ]
ENTRIES
Total: 3 Queued: 2 Running: 1 Waiting: 0 Held: 0 arriving: 0
DEVICES
dev1, dev2
ACCESS
Unrestricted access
CUMULATIVE TIME
System space time= 35.35 sec
User space time= 24.57 sec
The methods for deleting a device request are the same as for a batch request (see Section 2.7).
When you issue the qmove command and specify an EXITING batch request and a network queue, you can move all network requests that are staging out stdout/stderr/JOR files for the specified batch request to the specified network queue. Using this method, you can change the target host for staging out the job output files. Staging out fails if the staging-out path, which is specified when the batch request is submitted, does not exist on the new target host.
You cannot use the -q option to move all network requests in the specified network queue and you cannot specify network requests directly.
When you issue the qdel command and specify an EXITING batch request, you can delete all network requests that are staging out stdout/stderr/JOR files for the specified batch request. Neither the -k option nor the -signo option is necessary. If the target network request is running, the output file staged out by the network request is placed in the request owner's home directory on the machine where the request was executed. You cannot specify network requests directly.
Use the qstatr command to confirm the status of a network request. The method for using this command is exactly the same as for batch requests. Using the qstatr command with the -N option displays network requests.
$ qstatr -N
==================================================
NQS (Rxx.xx) NETWORK REQUEST HOST: host1
==================================================
REQUEST ID      EVENT   NAME     OWNER    QUEUE NAME      PRI  STT PGRP
--------------- ------- -------- -------- --------------- ---- --- ------
396.host1       31(ERR) STDIN    user1    net1              31 QUE
396.host1       30(OUT) STDIN    user1    net1              31 QUE
-------------------------------------------------------------------------
Using the qstatr command with the -N -f options displays network requests in detail.
$ qstatr -N -f 395.host1
=================================================
NQS (Rxx.xx) NETWORK REQUEST: 395.host1
=================================================
Name: STDIN State: queued
Owner: user1 Priority: 31
Group: group1 Event: 30
Created: Mon Mar 28 1994
13:50:55
QUEUE
Name: net1@host1
STAGING FILE
Name: /home/nqs/STDIN.o395
MAIL
Address: user1@host1
MISC
Orig.Owner: user1
Using the qstatq command with the -N option displays network queues.
$ qstatq -N
========================================================
NQS (Rxx.xx) NETWORK QUEUE SUMMARY HOST: host1
========================================================
QUEUE NAME      DESTINATION MACHINE   ENA STS PRI RLM   TOT QUE RUN WAI
--------------- --------------------- ----------------- -----------------
DefaultNetQue   -                     ENA INA  -1  20     0   0   0   0
net1            host1                 ENA INA  20   1     4   0   1   3
net2            host2                 ENA STP  40   2     0   0   0   0
--------------- --------------------- ----------------- -----------------
<TOTAL>                                            20     4   0   1   3
--------------- --------------------- ----------------- -----------------
Using the qstatq command with the -f option displays network queues in detail.
$ qstatq -f net1
==========================================================
NQS (Rxx.xx) NETWORK QUEUE: net1@host1
==========================================================
Priority: 20 Status: [ENABLED , INACTIVE]
Queue server: /usr/lib/nqs/netclient
ENTRIES
Total: 4 Queued: 0 Running: 1 Waiting: 3
RUN LIMITS
Total run limit: 1
DESTINATIONS
MACHINE (MID)
host1 (100)
CUMULATIVE TIME
System space time = 5.20 seconds
User space time = 0.85 seconds
While an MPI process running on a single node can use normal batch queues, the NQS/MPI function is needed to run an MPI process across two or more nodes as a single NQS request. The NQS/MPI function allows users to associate queues on multiple nodes to execute them as a single MPI request. The MPI master queue definition specifies the slave queues (on other nodes) where spawned tasks will be executed.
A user can specify resource limits for an MPI request in the same way as for normal batch requests. The NQS/MPI function uses the specified limits to reserve resources on the slave queues, ensuring that the multi-node MPI request can obtain the resources it requests on every slave queue.
This section explains how to use the NQS/MPI function. For detailed information on setting the function, see Section 5.12.
As soon as the batch requests are sent, each slave request starts an MPI daemon to connect with the master request, and waits for the contact from the master request.
After checking that all the slave requests are in the RUNNING state, NQS executes the master request.
When mpisx(1) is called, the master request starts the MPI daemon to connect with the slave requests, connects to some MPI daemons on the slave queues specified by the configuration file or the option of mpisx(1), and executes the MPI processes.
When one mpisx(1) command is completed, the MPI daemon of the master request terminates. At that time, the MPI daemon of the slave queues is still waiting for the next connection.
If two or more mpisx(1) commands are called in one MPI request, the master request repeats the process detailed above for each mpisx(1) command.
When the master request has been executed, NQS terminates the execution of all the slave requests. At that time, the stdout, stderr, and JOR files of the slave requests are moved to the node where the master request has been executed (a master node), added to the files of the master request, and output in the same way as the ordinary batch requests.
MPI requests can be controlled by issuing NQS commands such as qdel(1) on the master request.
For detailed information about mpisx(1), see the SUPER-UX MPI/SX User's Guide.
Use the -x option of qstat(1) and the -f option of qstatq(1) to find out whether a batch queue is a master or slave queue.
$ qstat -x
mpiM@host1; type=BATCH; [ENABLED, RUNNING]; pri=20
0 exit; 1 run; 0 stage; 0 queued; 0 wait; 0 hold; 0 arrive;
Run_limit = 3;
:
:
Unrestricted access
MPI_master
MPI_queues = {MST(mpiM@host1), SLV(mpiS@host2)}
:
:
$ qstat -x
mpiS@host2; type=BATCH; [ENABLED, RUNNING]; pri=20
0 exit; 1 run; 0 stage; 0 queued; 0 wait; 0 hold; 0 arrive;
Run_limit = 3;
:
:
Unrestricted access
MPI_slave
:
:
$ qstatq -f
==========================================================
NQS (Rxx.xx) BATCH QUEUE: mpiM@host1
==========================================================
Priority: 20 Status: [ENABLED , RUNNING]
:
:
ATTRIBUTE
LOADBALANCE ON
MPI_MASTER ON
RELATED_MPI_QUEUES
MST(mpiM@host1), SLV(mpiS@host2)
:
:
$ qstatq -f
==========================================================
NQS (Rxx.xx) BATCH QUEUE: mpiS@host2
==========================================================
Priority: 20 Status: [ENABLED , RUNNING]
:
:
ATTRIBUTE
LOADBALANCE OFF
MPI_SLAVE ON
:
:
For more information, see Chapter 3, User Commands.
To execute an MPI request, submit a master request to a master queue using qsub(1), just as you would submit an ordinary batch request. The qsub(1) options apply to both master and slave requests.
You can submit a master request to a normal batch queue. However, distributed MPI processes are not treated as slave requests, so these processes cannot be controlled.
Normal batch requests should not be submitted to master queues, since doing so causes unnecessary slave requests to be executed.
NQS assigns a request ID to each master request, just as it does for other batch requests. NQS also assigns an ID to each slave request. This ID consists of the request ID of the master request followed by a slave number in parentheses. Each slave request in one MPI request is assigned a unique slave number; numbering starts at one.
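The ID format described above can be illustrated with two small shell functions. Both names, slave_request_id and master_of, are hypothetical helpers and not NQS commands; the IDs used are the ones that appear in the examples of this section.

```shell
# Hypothetical helpers (not part of NQS) for the slave request ID format:
# the master request ID followed by the slave number in parentheses.

# $1 = master request ID, $2 = slave number (numbering starts at 1)
slave_request_id() {
    printf '%s(%s)\n' "$1" "$2"
}

# $1 = master or slave request ID; prints the master request ID by
# removing everything from the first "(" onward, if there is one.
master_of() {
    printf '%s\n' "${1%%(*}"
}

slave_request_id 300.host1 1
master_of '300.host1(1)'
```

Because individual slave requests cannot be controlled directly, a script that sees a slave ID in qstatr output would normally pass the result of master_of to commands such as qdel.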
Until all slave requests have been sent and started, a master request is in the PRE-RUNNING state. PRE-RUNNING requests cannot be controlled, although they can be deleted and their status can be confirmed.
When all slave requests have started executing, the master request enters the RUNNING state and starts its execution.
During execution, a master request is affected by the resource limitations and scheduling parameters set on a submitted master queue or node (master node), while a slave request is affected by the resource limitations and scheduling parameters of a submitted slave queue or node (slave node).
When the execution of a master request terminates, the master request enters the POST-RUNNING state and stays there until the execution of every slave request has terminated. POST-RUNNING requests cannot be controlled, although they can be deleted and their status can be confirmed.
When all slave requests terminate, the master request enters the EXITING state and outputs the stdout, stderr, and JOR files.
Use qstat(1) and qstatr(1) to confirm MPI requests.
$ qstatr 300.host1
=======================================
NQS (Rxx.xx) BATCH REQUESTS HOST: host1
=======================================
REQUEST ID      NAME     OWNER    QUEUE    PRI NICE MEMORY   TIME STT    JID R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
300.host1       STDIN    user1    mpiM       20   0   1617  0.157 RUN     18 -
------------------------------------------------------------------------------
=======================================
NQS (Rxx.xx) BATCH REQUESTS HOST: host2
=======================================
REQUEST ID      NAME     OWNER    QUEUE    PRI NICE MEMORY   TIME STT    JID R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
300.host1(1)    STDIN    user1    mpiS       30   0   1510  0.143 RUN      5 -
------------------------------------------------------------------------------
When you specify a request ID as shown in the above example, the specified master request and all the related slave requests that are in the RUNNING state appear. Slave requests are numbered with the master request ID followed by a unique slave number in parentheses.
When you omit a request ID, only a master request or slave request on the node to which the command is issued appears.
When you confirm the number of requests with qstatq(1), the specified master requests and all the related slave requests that are in the RUNNING, PRE-RUNNING, and POST-RUNNING states appear.
See Chapter 3, User Commands for more information.
To delete MPI requests, use the qdel(1) command just as you would on normal requests.
$ qdel -k 300.host1
In the example above, the master request 300.host1 and all related slave requests will be deleted.
If a request is being sent via pipe, it cannot be deleted.
The -k option deletes PRE-RUNNING and POST-RUNNING requests.
Individual slave requests cannot be deleted.
To alter the attributes of MPI requests, use the qalter(1) command just as you would on normal requests.
Only the attributes of a master request can be altered. The attributes of slave requests will be unaffected by the command.
$ qalter -lf 2mb 300.host1
In the example above, the per-process permanent file size limit (-lf) of the master request 300.host1 will be changed to 2 megabytes.
The attributes of PRE-RUNNING and POST-RUNNING requests cannot be altered.
To move MPI requests to another queue, use the qmove(1) command just as you would on normal requests.
$ qmove 300.host1 batch1
In the example above, the master request 300.host1 will be moved to the queue batch1.
Only QUEUED and WAITING master requests can be moved. Individual slave requests cannot be moved.
If an MPI request is moved to a normal batch queue, its MPI processes cannot be controlled during execution.
To hold the MPI requests, use the qhold(1) command just as you would on normal requests.
$ qhold 300.host1
In the example above, the master request 300.host1 will be held.
Only QUEUED, RUNNING, SUSPEND, and WAITING master requests can be held. PRE-RUNNING and POST-RUNNING requests cannot be held. Individual slave requests cannot be held.
When a RUNNING or SUSPEND master request is specified, the master request and all related slave requests are checkpointed and held. However, if any related slave request has already terminated at that time, qhold aborts.
To release the MPI requests, use the qrls(1) command just as you would on normal requests.
$ qrls 300.host1
If the request was checkpointed when it was held, NQS restarts the master request and all related slave requests from the checkpoint. The restart fails unless every NQS needed to execute the master and slave requests is active at that time.
To suspend the MPI requests, use the qspnd(1) command just as you would on normal requests.
$ qspnd 300.host1
In the example above, the master request 300.host1 and all related slave requests will be suspended.
Neither PRE-RUNNING nor POST-RUNNING requests can be suspended. Individual slave requests cannot be suspended.
To resume the MPI requests, use the qrsm(1) command just as you would on normal requests.
$ qrsm 300.host1
To rerun MPI requests, use the qrerun(1) command just as you would on normal requests.
$ qrerun 300.host1
In the example above, the master request 300.host1 and all related slave requests will be rerun.
Non-RUNNING requests, including PRE-RUNNING and POST-RUNNING requests, cannot be rerun. Individual slave requests cannot be rerun.
To checkpoint MPI requests, use the qchk(1) command just as you would on normal requests.
$ qchk 300.host1
In the above example, the master request 300.host1 and all related slave requests will be checkpointed.
Non-RUNNING requests, including PRE-RUNNING and POST-RUNNING requests, cannot be checkpointed. If any slave request has already terminated, none of the related MPI requests can be checkpointed. Individual slave requests cannot be checkpointed.
To restart MPI requests from their checkpoint, use the qrst(1) command just as you would on normal requests.
$ qrst 300.host1
In the above example, the master request 300.host1 and all related slave requests will be restarted.
The restart fails unless every NQS needed to execute the master and slave requests is active at that time.
When an NQS that has RUNNING or SUSPEND MPI requests (master or slave) is shut down, it checkpoints those MPI requests together with all related master and slave requests and stops them automatically. These MPI requests are restarted automatically after every necessary NQS has been activated.
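The state restrictions that this section gives for qmove, qhold, qrerun, and qchk on master requests can be collected into one lookup. The sketch below is only a summary aid: mpi_op_allowed is a hypothetical function, not an NQS command, it encodes only the four commands and the states named in the text, and it treats any other command or state as not allowed.

```shell
# Hypothetical helper (not part of NQS): return status 0 if this section
# says command $1 can be applied to a master request in state $2.
# Only qmove, qhold, qrerun, and qchk are covered; anything else fails.
mpi_op_allowed() {
    case "$1:$2" in
        qmove:QUEUED|qmove:WAITING)                return 0 ;;
        qhold:QUEUED|qhold:RUNNING)                return 0 ;;
        qhold:SUSPEND|qhold:WAITING)               return 0 ;;
        qrerun:RUNNING|qchk:RUNNING)               return 0 ;;
        *)                                         return 1 ;;
    esac
}

for state in QUEUED RUNNING PRE-RUNNING; do
    if mpi_op_allowed qhold "$state"; then
        echo "qhold: allowed in $state"
    else
        echo "qhold: not allowed in $state"
    fi
done
```

A check of this kind only saves a round trip; NQS itself still decides whether a command is accepted, for example when a slave request has already terminated.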