Chapter 2


Operations

This chapter explains NQS batch operations and transaction methods.

Access rights are required to use NQS. You can check your access rights with the qstata command. If qstata is executed without options, one of the following messages is displayed, depending on your access status.

See the system administrator if you are unable to access NQS. Specify the host name with -h to display the NQS access restrictions of a remote host.

For explanations of NQS commands, refer to the online descriptions in Chapter 3, User Commands.

2.1 CREATING A BATCH REQUEST

To create a batch request, create a shell script file that contains the commands to be executed in batch. An NQS batch request shell script can perform only non-interactive operations, because standard input (stdin), standard output (stdout), and standard error output (stderr) are linked to files.

An NQS batch request shell script is identical to a normal shell script, except that it must be non-interactive and any embedded NQS options must appear in the first comment block. In a shell script, commands can usually be combined freely. However, commands that request terminal input or output, such as stty, cannot be used in an NQS shell script, because NQS does not use terminal I/O. You can specify bsh, csh, or another shell to interpret an NQS batch request shell script.

Submit the batch execution shell script to NQS, where it is executed as a batch request. The following is a sample batch request shell script.

Example:

You can supply input data to the commands in a shell script in two ways. The first way is to prepare a data file that holds the input for the command, as shown in Example 1, where the data to be sorted is stored in a file called input_data. The second way is to use double redirection (a here-document), as shown in Example 2.

Example 1:

Example 2:

The lines from the one immediately after the sort command up to the one just before the end-of-file (EOF) marker are sorted.
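As a minimal sketch of the second approach, the following script passes three sample lines to sort through a here-document. The function name sort_inline and the sample words are illustrative only and are not part of NQS.

```shell
#!/bin/sh
# sort_inline is a hypothetical helper name used only for this illustration.
# Everything between "<<EOF" and the closing "EOF" line becomes the
# standard input of sort, so no separate data file is needed.
sort_inline() {
sort <<EOF
pear
apple
mango
EOF
}

# Prints the three words in alphabetical order.
sort_inline
```

Because the input travels inside the script itself, a script written this way needs no terminal and no separate data file when it runs as a batch request.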

Submit batch request shell scripts to NQS with qsub. This command has many options, and you may want to associate some of them with a specific script. Doing so on the command line can mean typing several lines of options every time you submit the script. To avoid this, you can embed the options in the script as comment lines placed before the first executable statement.

The following rules apply to embedded options.

More than one option can be entered on a line. However, if the option is followed by a string, the string must be enclosed in quotation marks ("string"). The batch job in the following example begins at 11:30 pm and uses the CPU for 21 minutes and 10 seconds. It submits the request to the batch1 queue.

Example:

To leave an embedded option in the script as an ordinary comment, so that NQS does not interpret it, put another '#' between the leading '#' and "@$".

Any characters except '#', '@', and newline may appear between the two '#' characters.
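The following sketch shows what a script with embedded options might look like, together with a local check that lists only the active option lines. The option values follow the example described earlier (start at 11:30 pm, 21 minutes 10 seconds of CPU time, queue batch1). The spacing between '#' and "@$", the -mb line, and the function name list_embedded_options are assumptions made for illustration, and the grep pattern is only an approximation of how NQS recognizes these lines.

```shell
#!/bin/sh
# list_embedded_options is a hypothetical helper: it reads a script on
# standard input and prints the lines that start with '#', optional
# white space, and then "@$". A line such as "##@$-mb" is not printed,
# because the second '#' turns it into an ordinary comment.
list_embedded_options() {
    grep '^#[[:space:]]*@\$'
}

# Sample batch request script (illustrative content only).
list_embedded_options <<'EOF'
#!/bin/sh
# @$-a "11:30pm" -lt "21:10"
# @$-q batch1
##@$-mb
date
EOF
```

The here-document delimiter is quoted ('EOF') so that the shell does not expand "$-" inside the sample script.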

When you submit a request from a public domain NQS (e.g., the COSMIC version) to the NEC version of NQS (SUPER-UX NQS and UP NetShepherd), you can specify options that exist only in the NEC version by replacing "@$" with "NECOPT ".

Example:

You must put one or more blank spaces between "NECOPT" and the options. Each option must begin with '-'.

The local (public domain) NQS does not treat options marked with "NECOPT" as NQS options. The remote (NEC version) NQS treats them as NQS options and interprets them.

As with "@$", an embedded option marked with "NECOPT" can be left as an ordinary comment line by putting another '#' between the leading '#' and "NECOPT".

2.2 SUBMITTING A BATCH REQUEST

Use the qsub command to submit a batch job (shell script). Options can be specified on the command line and embedded in the shell script file. Options on the command line take precedence over embedded options. If you do not specify a shell script in the command, the commands to be processed can be read from standard input (stdin). The following is a batch request example.

Example:

This example shows the script1 shell script being submitted to the batch1 queue. The batch request ID assigned by NQS is 65.host1. The request ID consists of a serial number issued on the host where the request was submitted and the name of that host machine. This name is unique within the network. In the following example, the commands to be processed are read from standard input.

Example:

2.2.1 Entry Options

Specify the following options when submitting a batch request.

See the qsub(1) command in this guide for more information.

2.2.2 Output File Options

Table 2-1 lists output file options that determine the data destination from a batch request to standard output (stdout) or standard error output (stderr).

Output from a batch request usually goes to an NQS spool file and gets copied to a specified file when the batch job ends. If the destination is not specified, then the batch request outputs to a file in the current directory.

Table 2-1 Output File Options

Option                                    Description
-e [machine:][[/]path/]stderr-filename    Specifies the standard error output file (stderr) of a batch request. This option cannot be used if the -eo option is specified.
-o [machine:][[/]path/]stdout-filename    Specifies the standard output file (stdout) of a batch request.
-eo                                       Directs stderr output to the stdout file.
-ke                                       Retains the stderr file on the machine executing the request.
-ko                                       Retains the stdout file on the machine executing the request. This option is ignored if -o is used to specify a destination.

Specify -e and -o option arguments using the following format.

Examples follow.
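To illustrate the [machine:][[/]path/]filename form shown in Table 2-1, the sketch below splits a destination argument into its machine and path parts. The function name split_destination and the sample arguments are hypothetical. The sketch deliberately does not guess which machine NQS uses when none is given.

```shell
#!/bin/sh
# split_destination is a hypothetical helper, not an NQS command.
# It separates an -e or -o argument at the first ':' into the optional
# machine name and the path that follows it.
split_destination() {
    case $1 in
        *:*) machine=${1%%:*}; path=${1#*:} ;;
        *)   machine="(not given)"; path=$1 ;;
    esac
    printf 'machine=%s path=%s\n' "$machine" "$path"
}

split_destination HOST1:/usr/nec/result.o
split_destination result.o
```

The first call reports the machine HOST1 and the absolute path /usr/nec/result.o; the second reports only a relative file name.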

2.2.3 Resource Limit Options

A batch request can specify the resources it uses during execution. A resource limit forcibly terminates a batch request if it exceeds set limits such as CPU time, memory size, or file size allocated for a batch request.

Table 2-2 lists resource limit options. Default resource limits are values set by the system administrator. User-specified limits are compared with set values.

If specified values are larger than set values, then the batch request is rejected. Resource limits differ according to each machine and may also differ by queue.

The valid resource limits for each machine can be displayed with the qlimit command. Set values for each queue can be displayed with the qstatq -f option.

Table 2-2 Resource Limit Options

Option                          Description
-l0 limit[,warn-limit]          Sets a per-request file system group 0 (=XMU) maximum and an optional warning.
-l1 limit[,warn-limit]          Sets a per-request file system group 1 maximum and an optional warning.
-l2 limit[,warn-limit]          Sets a per-request file system group 2 maximum and an optional warning.
-l3 limit[,warn-limit]          Sets a per-request file system group 3 maximum and an optional warning.
-ld size-limit[,warn-limit]     Sets a per-process data-segment size maximum and an optional warning.
-lD drives                      Sets a per-request maximum tape drives limit.
-lf size-limit[,warn-limit]     Sets a per-process permanent-file size limit.
-lm size-limit[,warn-limit]     Sets a per-process memory size maximum and an optional warning.
-ln nice-value                  Sets a per-process nice-execution value.
-lr time-limit[,warn-limit]     Sets a per-process CPU resident maximum time limit and an optional warning.
-le CPU-number                  Sets the number of CPUs as the target of the per-process CPU resident time limit.
-lR time-limit[,warn-limit]     Sets a per-request CPU resident maximum time limit and an optional warning.
-lE CPU-number                  Sets the number of CPUs as the target of the per-request CPU resident time limit.
-ls size-limit[,warn-limit]     Sets a per-process stack-segment size maximum and an optional warning.
-lt time-limit[,warn-limit]     Sets a maximum per-process CPU time limit and an optional warning.

NOTE

A resource limit can specify a maximum limit and, optionally, a warning limit. The warning limit must be smaller than the maximum limit. If the warning limit is omitted, it defaults to the maximum limit.

If a batch request exceeds the warning limit for a resource, the signal associated with that limit is sent to the batch request. If the maximum limit is exceeded, execution terminates immediately.

See setrlimit(2) in the SUPER-UX Programmer's Reference Manual for more information.

Resource limit values are specified as follows.
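Section 2.1 mentions a CPU limit of 21 minutes and 10 seconds. Assuming that time limits are written in the colon-separated form [[hours:]minutes:]seconds (an assumption made here; qsub(1) is the authoritative description), the following sketch converts such a value to seconds, which can help when comparing a requested limit with a queue limit. The function name limit_to_seconds is hypothetical.

```shell
#!/bin/sh
# limit_to_seconds is a hypothetical helper, not an NQS command.
# Assumed input form: [[hours:]minutes:]seconds.
# Each ':'-separated field is folded in from left to right, so
# "21:10" becomes 21 * 60 + 10.
limit_to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

limit_to_seconds 21:10      # 21 minutes 10 seconds -> 1270
limit_to_seconds 1:45:00    # 1 hour 45 minutes     -> 6300
```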

2.2.4 Mail Options

These options send mail at the start and end of a batch job. By default, the mail is addressed to the user who submitted the batch request. Table 2-3 lists mail options.

NOTE

When a batch request ends, mail is not usually sent. However, in the case of an abnormal termination, mail is always sent.

Table 2-3 Mail Options

Option      Description
-mb         Sends mail when the request begins execution.
-me         Sends mail when the request ends.
-mu name    Specifies that any mail concerning the request should be delivered to the user specified in name, which can be given without the @ character or in the form name@machine. For example, -mu user1 sends mail to user1 on the local host, and -mu user1@HOST1 sends mail to user1 on HOST1.

2.2.5 JOR Options

When the execution of a batch request is completed, a JOR is created. You can specify where the JOR is sent through the use of options when submitting the batch request. The JOR is sent to one of the following.

The JOR can also be sent by mail. If these options are not used, the JOR is output as the NQS manager specifies.

Only one of the following options can be specified at a time.

Specify the -j option as follows.

Example:

If a particular machine is not specified, the default is the host where the batch request is executed.

Example:

When the batch request is submitted to SX-4 and executed on nec1, JOR is sent to /usr/nqs/jor on nec1.

2.2.6 Other Options

Table 2-4 lists other options, such as specifying the exact time for a batch request execution, specifying the queue for submitting a batch request, and specifying the priority of the batch request.

Table 2-4 Other Options

Option          Description
-a date-time    Suspends request execution until the specified date and time. Specification examples are as follows (see qsub(1) for details):

  • 01-Jan-1996 12am, GMT
  • Tuesday, 23:00:00
  • 11pm tues
  • tomorrow 23-GMT
                If the date and time consist of two or more tokens separated by spaces, enclose them in double quotes, as in -a "July 4,2000 12:31-GMT".
-ac acctcode    Specifies the account code of the request.
-c CPU-count    Specifies the CPU count of the request.
-nc             Declares that the batch request cannot be checkpointed.
-nr             Declares that the batch request is not restartable.
-p priority     Assigns an intra-queue priority to the request. The priority must be an integer in the ascending priority range of 0-63, inclusive. If this option is omitted, the default value is assigned.
-q queue        Specifies the batch request queue. If you omit this option, the environment is searched for the QSUB_QUEUE variable. If that variable is not found, NQS submits the request to the default batch request queue defined by the System Administrator. If no default queue is defined, an error message appears and NQS does not accept the request.
-r req-name     Specifies the request name. If this option is omitted, a request name is assigned automatically: stdin if the script is read from standard input, or otherwise the name of the script file without any absolute or leading path name.

                If the request name begins with a digit, R is prefixed. All request names are truncated to a maximum length of 63 characters.
-s shell        Specifies the absolute path name of the shell that interprets the batch request shell script. Without this option, NQS uses one of three distinct shell choice strategies, one of which is configured by the System Administrator for each NQS machine. Specify this option if the default shell strategy is not suitable. The default shell strategy can be displayed with the qlimit command. The three shell strategies are as follows.

  • Fixed causes the configured fixed shell to be executed to interpret all batch requests.

  • Free causes your login shell as defined in the password file to be executed. This in turn chooses and spawns the appropriate shell for interpreting the batch request script.

  • Login causes only your login shell to be executed to interpret the script.
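The default naming rules described for the -r option in Table 2-4 can be sketched as a small shell function. This is only an illustration of the stated rules and not the NQS implementation; the function name default_request_name is hypothetical, and applying the R prefix before the truncation is an assumption, since the order is not stated.

```shell
#!/bin/sh
# default_request_name is a hypothetical helper, not an NQS command.
# $1: path of the script file; leave empty when reading from stdin.
default_request_name() {
    if [ -z "${1:-}" ]; then
        name=stdin                  # script entered from standard input
    else
        name=$(basename "$1")       # drop any leading path name
    fi
    case $name in
        [0-9]*) name=R$name ;;      # names starting with a digit get an R prefix
    esac
    printf '%.63s\n' "$name"        # keep at most 63 characters (assumed to apply last)
}

default_request_name /home/user1/3dmodel.sh
default_request_name
```

With the sample arguments, the first call yields R3dmodel.sh and the second yields stdin.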

2.3 CONFIRMING BATCH REQUESTS

The qstatr(1) command is used to confirm the status of a submitted batch request. If you know the request ID, you can specify the request directly by that ID, as in the following example.

Example:

$ qstatr 72.HOST1
=======================================
NQS (Rxx.xx) BATCH REQUESTS  HOST: HOST1
=======================================
REQUEST ID       NAME     OWNER    QUEUE   PRI NICE MEMORY  TIME  STT  JID   R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
72.HOST1        STDIN    user1    batch1     20   0   1617  0.157 RUN    18  -
------------------------------------------------------------------------------

The STT column displays the request status. These request statuses have the following meanings.

PRR and POR are available on the cluster system with NQS/MPI only.

See qstatr(1) for the contents of items other than STT.

The qstatr command can also take a request name instead of a request ID. To specify a request by name, use the -r option, as in the following example.

Example:

$ qstatr -r request-name

This request name is assigned to a request when it is submitted with the qsub command.

If you do not know the request ID, you can access information on any submitted batch requests with the -b option, as shown in the following example.

Example:

$ qstatr -b
=================================================
NQS (Rxx.xx) BATCH REQUEST  HOST: HOST1
=================================================
REQUEST ID       NAME     OWNER    QUEUE   PRI NICE MEMORY  TIME  STT  JID   R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
72.HOST1        STDIN    user1    batch1     20   0   1617  0.157 RUN    18  *
73.HOST1        STDIN    user1    batch1     20  10               QUE        -
74.HOST1        STDIN    user1    batch2     20  10               QUE        -
------------------------------------------------------------------------------
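When many requests are listed, the summary can be filtered with standard tools. The sketch below reads text in the layout of the sample output above and prints the IDs of the requests whose line contains RUN. The function name running_request_ids is hypothetical, and matching on the text " RUN " is a simplification that assumes the column layout shown in the sample.

```shell
#!/bin/sh
# running_request_ids is a hypothetical helper, not an NQS command.
# It reads "qstatr -b"-style output on standard input and prints the
# first column of every request line that contains the status RUN.
running_request_ids() {
    awk '$1 ~ /^[0-9]+\./ && / RUN / { print $1 }'
}

# Sample rows copied from the output shown above.
running_request_ids <<'EOF'
REQUEST ID       NAME     OWNER    QUEUE   PRI NICE MEMORY  TIME  STT  JID   R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
72.HOST1        STDIN    user1    batch1     20   0   1617  0.157 RUN    18  *
73.HOST1        STDIN    user1    batch1     20  10               QUE        -
74.HOST1        STDIN    user1    batch2     20  10               QUE        -
EOF
```

On a system with NQS installed, the same function could presumably be fed from a pipe, as in qstatr -b | running_request_ids.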

Specify the -f option to see more information about a request, as shown in the following example.

Example:

$ qstatr -f 72.host1
================================================
NQS (Rxx.xx) BATCH REQUEST: 72.HOST1
================================================
        Name: STDIN                     State: running
        Owner: user1
        Group: group1
        Created: Tue Jan 23 1996        Priority: 20
                 7:33:04                JOB ID: 18
        Acctcode: acct1
        Restricted: Already running
        Scheduling priority: 1
QUEUE
        Name: batch1@HOST1
RESOURCES LIMITS
    Per-process
        Core File Size =        UNLIMITED  <DEFAULT>
        Data Segment =          UNLIMITED  <DEFAULT>
        Permanent File Size =   UNLIMITED  <DEFAULT>
        Memory Size =           UNLIMITED  <DEFAULT>
        Stack Segment =         UNLIMITED  <DEFAULT>
        CPU Time =              UNLIMITED  <DEFAULT>
        Perm File Capacity =    UNLIMITED  <DEFAULT>
        Open file =             UNLIMITED  <DEFAULT>
        Number of CPU =         UNLIMITED  <DEFAULT>
        CPU Resident Time =     UNLIMITED  <DEFAULT>
        CPU Resident Number =   1          <DEFAULT>
    Per-request
        Temp File Capacity =    UNLIMITED  <DEFAULT>
        Memory Size =           UNLIMITED  <DEFAULT>
        CPU Time Limit =        UNLIMITED  <DEFAULT>
        Tape Drives =           UNLIMITED  <DEFAULT>
        Perm File Capacity =    UNLIMITED  <DEFAULT>
        FSG 0 limit =           UNLIMITED  <DEFAULT>
        FSG 1 limit =           UNLIMITED  <DEFAULT>
        FSG 2 limit =           UNLIMITED  <DEFAULT>
        FSG 3 limit =           UNLIMITED  <DEFAULT>
        Open file =             UNLIMITED  <DEFAULT>
        Process Number =        UNLIMITED  <DEFAULT>
        CPU Resident Time =     UNLIMITED  <DEFAULT>
        CPU Resident Number =   1          <DEFAULT>

SCHEDULING PARAMETER
        Nice Value                  0
        Base Priority                      80
        Timeslice Value                  1000
        Memory Priority                     0
        Modification factor of CPU          2
        Tick Count                          0
        Decay Factor                        1
        Decay Interval                      1
        Mrt Size Effect                    30
        Mrt Priority Effect               100
        Aging Range                       160
        Mrt Minimum                         2
        Slave Priority                      0
        CPU Count                          32

FILES              MODE      NAME
        Stdout:   SPOOL     HOST1:/usr/nec/STDIN.o72
        Stderr:   SPOOL     HOST1:/usr/nec/STDIN.e72

MAIL
        Address         user1@HOST1      When: NONE

MISC
        Restartable     Yes             User Mask: 0
        Restartstate    No              Orig.Owner: user1
        Shell:          DEFAULT         Jor: NONE
        Checkpoint:     Yes
        Resource Sharing Group: DEFAULT

To display a request that has been transferred to a remote host, use either of the following methods: execute the qstatr command with the -t option on the machine from which the request was submitted, or execute the qstatr command on the host where the request now exists.

Example:

qstatr -t 2 72.host1

Specifying the -t option enables display of the requests transferred to other hosts.

The number following -t indicates a level. To reference requests on the remote host, specify 1 or 2.

The qstat(1) command can also be used to confirm the status of a request. It differs from the qstatr command in usage and output format. Use whichever command is more convenient. See Chapter 3, User Commands, for details about the qstatr command.

2.4 CONFIRMING BATCH QUEUE STATUS

qstatq can be used to confirm the status of a batch queue. qstatq with the -b option accesses information on all batch queues, as shown in the following example.

Example:

$ qstatq -b
========================================================
NQS (Rxx.xx) BATCH QUEUE SUMMARY   HOST: HOST1
========================================================
QUEUE NAME       ENA STS PRI/BPR/ TMS /MPR RLM  TOT QUE RUN WAI HLD SUS ARR EXT
--------------- ------------------------------- -------------------------------
batch1           ENA RUN  20/ 80/ 1000/  0   2    3   1   1   1   0   0   0   0
batch2           ENA INA  30/ 80/ 2000/  0   3    2   0   0   1   1   0   0   0
--------------- ------------------------------- -------------------------------
   <TOTAL>                                  10    5   1   1   2   1   0   0   0
--------------- ------------------------------- -------------------------------

The status appears under the ENA and STS columns, as follows.

To access the status of an individual batch queue, specify the queue name with the qstatq command, as shown in the following example.

Example:

$ qstatq batch1
========================================================
NQS (Rxx.xx) BATCH QUEUE SUMMARY   HOST: HOST1
========================================================
QUEUE NAME       ENA STS PRI/BPR/ TMS /MPR RLM  TOT QUE RUN WAI HLD SUS ARR EXT
--------------- ------------------------------- -------------------------------
batch1           ENA RUN  20/ 80/ 1000/  0   2    3   1   1   1   0   0   0   0
--------------- ------------------------------- -------------------------------
   <TOTAL>                                   2    3   1   1   2   0   0   0   0
--------------- ------------------------------- -------------------------------

Specify the -f option with the qstatq command to access detailed information on a batch queue, as shown in the following example.

Example:

$ qstatq -f batch1
==========================================================
NQS (Rxx.xx) BATCH QUEUE: batch1@HOST1
==========================================================
  Priority:   20              Status:  [ENABLED  , RUNNING]
  Batch Base Priority: 80     Time Slice Value: 1000
  Nice Value: 0               Memory Priority: 0
  Mod Factor of CPU: 2        Tick Count: 0
  Decay Factor: 1             Decay Interval: 1
  Mrt Size Effect: 30         Mrt Pri Effect: 100
  Aging Range: 160            Mrt Minimum: 2
  Slave Priority: 0           CPU Count: 32
  Scheduling Mode: TYPE-0
  Continuous Scheduling Number: Undefined
  Default Scheduling Priority: 1
  Resource-occupy Wait: Undefined
  Resource Sharing Group: DEFAULT

ENTRIES
   Total:      3
   Queued:     1     Running:    1     Waiting:    1
   Held:       0     Suspending: 0     Arriving:   0
   Exiting:    0

COMPLEX MEMBERSHIP
  complex1, complex2

RUN LIMITS
  Total run limit:   2   FSG0(XMU) run limit: Unlimited
  FSG1 run limit: Unlimited    FSG2 run limit: Unlimited
  FSG3 run limit: Unlimited    Memory run limit: Unlimited
  User run limit:    2   Group run limit: Unlimited

RESOURCE LIMITS
 Per-process
  Core File Size Limit =       UNLIMITED <DEFAULT>
  Data Size Limit =            UNLIMITED <DEFAULT>
  Permanent File Size Limit =  UNLIMITED <DEFAULT>
  Memory Size Limit =          UNLIMITED <DEFAULT>
  Stack Size Limit =           UNLIMITED <DEFAULT>
  CPU Time Limit =             UNLIMITED <DEFAULT>
  Perm File Capacity Limit =   UNLIMITED <DEFAULT>
  Open File Limit =            UNLIMITED <DEFAULT>
  Number of CPU Limit =        UNLIMITED <DEFAULT>
  CPU Resident Time Limit =    UNLIMITED <DEFAULT>
  CPU Resident Number =        1         <DEFAULT>
 Per-request
  Tape Drives Limit =          UNLIMITED <DEFAULT>
  Memory Size Limit =          UNLIMITED <DEFAULT>
  CPU Time Limit =             UNLIMITED <DEFAULT>
  Temp File Capacity Limit =   UNLIMITED <DEFAULT>
  Perm File Capacity Limit =   UNLIMITED <DEFAULT>
  FSG 0 (XMU) Limit =          UNLIMITED <DEFAULT>
  FSG 1 Limit =                UNLIMITED <DEFAULT>
  FSG 2 Limit =                UNLIMITED <DEFAULT>
  FSG 3 Limit =                UNLIMITED <DEFAULT>
  Open File Limit =            UNLIMITED <DEFAULT>
  Process Number Limit =       UNLIMITED <DEFAULT>
  CPU Resident Time Limit =    UNLIMITED <DEFAULT>
  CPU Resident Number =        1         <DEFAULT>

ACCESS
  Unrestricted access

ATTRIBUTE
  LOADBALANCE   ON

LOAD BALANCING PARAMETER
  Keeping request number limit = 1
  Delivery wait time = 30

FORCE RESTART MODE
  When file open failed   OFF
  When file modified      ON

CUMULATIVE TIME
  System space time = 3.290 seconds
  User space time =   0.483 seconds

To access the status on a remote host, specify the host with the -h option, as shown in the following example.

$ qstatr -h NEC1 -b

When you specify only a queue name, NQS assumes that the queue is on the local host. To refer to a queue on a remote host, enter qstatq with the -h option, as shown in the following examples.

Examples:

qstatq batch1 (Displays information about the local batch1 queue)
qstatq -h NEC1 batch1 (Displays information about the batch1 queue on NEC1)

2.5 CONFIRMING PIPE QUEUE STATUS

The qstatq command can be used to confirm the status of a pipe queue. qstatq with the -p option displays information on all pipe queues in the system, as shown in the following example.

Example:

$ qstatq -p
===============================================================
NQS (Rxx.xx) PIPE   QUEUE SUMMARY  HOST: host1
===============================================================
QUEUE NAME       ENA  STS  PRI  RLM   TOT  QUE  ROU  WAI  HLD  ARR
--------------- -------------------- ------------------------------
pipe1            ENA  INA  20    1     2    2    0    0    0    0
pipe2            DIS  STP  30    2     0    0    0    0    0    0
netpipe1         ENA  ROT  20    1     2    1    1    0    0    0
--------------- -------------------- ------------------------------
   <TOTAL>                       10    4    3    1    0    0    0
--------------- -------------------- ------------------------------

Status appears under the ENA and STS columns, as follows.

The status of each pipe queue can also be accessed by specifying the pipe queue name, as shown in the following example.

Example:

$ qstatq pipe1
===============================================================
NQS (Rxx.xx) PIPE   QUEUE SUMMARY  HOST: host1
===============================================================
QUEUE NAME       ENA  STS  PRI  RLM   TOT  QUE  ROU  WAI  HLD  ARR
--------------- -------------------- ------------------------------
pipe1            ENA  INA  20    1     2    2    0    0    0    0
--------------- -------------------- ------------------------------
   <TOTAL>                       1     2    2    0    0    0    0
--------------- -------------------- ------------------------------

Specify the -f option to display detailed information on a pipe queue, as shown in the following example. As with batch queues, you can also specify a host with the -h option.

Example:

$ qstatq -f pipe1
=======================================================================
NQS (Rxx.xx) PIPE   QUEUE: pipe1@host1
=======================================================================
   Priority:  20                 Status:  [ ENABLE , INACTIVE ]
   Queue server: /usr/lib/nqs/pipeclient

ENTRIES
   Total:        2
   Queued:       2       Routing:        0       Waiting:        0
   Held:         0       Arriving:       0

RUN LIMITS
   Total run limit:   3
   User run limit :   Unlimited      Group run limit : Unlimited


DESTINATIONS
      batch1@host1, batch2@host1

ACCESS
   Unrestricted access

ATTRIBUTE
   BEFORECHECK   OFF
   STAYWAIT      OFF
   FREEDESTINATION OFF
   LOADBALANCE   ON
   TRANSPARENT   OFF

LOAD BALANCING PARAMETER
   Reserved run limit =     1
   Destination retry wait = 3600

CUMULATIVE TIME
   System space time= 1.00 sec
   User space time=   2.00 sec

The DESTINATIONS information is particularly important for a pipe queue. It lists the destination queues, that is, the queues to which a request submitted to the pipe queue is routed. More than one destination queue may be set in a pipe queue, as shown in the following example.

Example:

DESTINATIONS
      batch1@host1,batch2@host1

In this case, an attempt is made to transfer a request to the batch1@host1 queue first. If the request cannot be transferred to that queue, it is transferred to the batch2@host1 queue. A transfer can fail, for example, when the destination queue is unable to accept requests. If a queue on a remote host is set as a destination, the pipe queue is called a network pipe queue.

Example:

DESTINATIONS
      batch1@host2

In this example, the request is transferred to the queue batch1 on the remote host host2.
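The destination lists above can be read mechanically. The following sketch prints the destinations in the order in which they would be tried and marks each one as local or remote relative to a given host name. The function name list_destinations and its output format are hypothetical; the queue and host names are taken from the examples above.

```shell
#!/bin/sh
# list_destinations is a hypothetical helper, not an NQS command.
# $1: comma-separated destination list in queue@host form
# $2: name of the local host
list_destinations() {
    local_host=$2
    n=0
    # Remove blanks, put one destination per line, then split at '@'.
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' |
    while IFS=@ read -r queue host; do
        n=$((n + 1))
        if [ "$host" = "$local_host" ]; then
            where=local
        else
            where=remote
        fi
        printf '%d: %s on %s (%s)\n' "$n" "$queue" "$host" "$where"
    done
}

list_destinations "batch1@host1, batch2@host1" host1
list_destinations "batch1@host2" host1
```

A list that contains at least one remote entry corresponds to what the text calls a network pipe queue.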

When accessing the pipe queue status on a remote host, specify the host with the -h option in the same way as for the batch queue status. The following example shows how to access the pipe queue status on the remote host nec1.

Example:

$ qstatq -h nec1 -p

The rules for specifying a queue name are the same for pipe queues as for batch queues.

2.6 ALTERING BATCH REQUEST ATTRIBUTES

After submitting a batch request, you can change its attributes with qalter. The following example shows how to change the per-process CPU time limit of request ID 72.HOST1.

Example:

$ qalter -lt 1000 72.HOST1

Most request attributes can be changed while the request is in the queued state. However, once a request enters the running state, only those attributes that are alterable during execution can be changed. Table 2-5 contains a list of some qalter options. For all options, see the qalter(1) page in Chapter 3, User Commands.

You cannot alter attributes to values that exceed queue resource limits. An error occurs if the change cannot be made.

If a request is in a batch queue, the queue resource limit values that are not supported by the system cannot be changed. While routing a request on the pipe queue or executing a request on the batch queue, you cannot change most request attributes.

Table 2-5 Batch Request Attributes

Option   Explanation                                                           Example
-a       Changes the request execution time                                    qalter -a 17:20 72.HOST1
-c       Changes the CPU count value                                           qalter -c 10 72.HOST1
-e       Changes the stderr output destination                                 qalter -e HOST1:/usr/nec/result.e 72.HOST1
-lm      Changes the value of a queue resource limit                           qalter -lm 2kb 72.HOST1
-mb      Changes the mode of sending mail when the request begins execution    qalter -mb on 72.HOST1
-me      Changes the mode of sending mail when the request ends execution      qalter -me off 72.HOST1
-mu      Changes the user to whom mail is sent                                 qalter -mu user2 72.HOST1
-nr      Changes whether the request is restartable                            qalter -nr off 72.HOST1
-o       Changes the stdout output destination                                 qalter -o HOST1:/usr/nec/result.o 72.HOST1
-p       Changes the request priority                                          qalter -p 25 72.HOST1
-re      Changes the transmission mode of the stderr output file               qalter -re n 72.HOST1
-ro      Changes the transmission mode of the stdout output file               qalter -ro s 72.HOST1
-s       Changes the shell that executes the request                           qalter -s /bin/sh 72.HOST1

2.7 DELETING A BATCH REQUEST

Specify qdel with a request ID to delete a batch request that is queued, waiting, or holding, as shown in the following example.

Example:

$ qdel 72.HOST1
Request 72.HOST1 has been deleted.

If the request is being executed, a message to that effect is displayed, as shown in the following example. The request is not deleted.

Example:

$ qdel 73.HOST1
Request 73.HOST1 is running.

To delete a request on a remote host, execute the qdel(1) command on the host from which the request was submitted or on the host where the request exists.

The qdel(1) command can also identify a request by its request name. In this case, specify the -r option as shown in the following example.

Example:

qdel -r MAKE3

The deletion methods described so far in this section cannot delete a request that is being executed. To delete an executing request, execute the qdel(1) command with the -k option. Specify the request ID as previously described in this section.

Example:

$ qdel -k 74.HOST1
Request 74.HOST1 is running, and has been signaled.

The -k option sends a SIGKILL signal to the request, which forcibly ends its execution. To send another signal, specify the number of that signal as the option. In the following example, the SIGINT and SIGHUP signals are sent.

Example:

$ qdel -2 72.HOST1
$ qdel -1 72.HOST1

This means that -k and -9 are functionally the same. Specifying -k on a request that is not running also deletes it. Therefore, running and waiting requests can be deleted at the same time, as shown in the following example.

Example:

$ qdel -k 72.HOST1 73.HOST1
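The numeric options used in this section are ordinary signal numbers, and the shell's own kill -l can translate them into names. The sketch below does this for the three numbers mentioned here. It does not call qdel, and the function name signal_name is hypothetical.

```shell
#!/bin/sh
# signal_name is a hypothetical wrapper around the standard "kill -l",
# which prints the name of the signal with the given number.
signal_name() {
    kill -l "$1"
}

# 1, 2, and 9 are the signal numbers mentioned in this section.
for n in 1 2 9; do
    printf 'qdel -%s sends SIG%s\n' "$n" "$(signal_name "$n")"
done
```

On typical systems this reports HUP for 1, INT for 2, and KILL for 9, which matches the statement that -k and -9 are functionally the same.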

2.8 HOLDING A BATCH REQUEST

To hold a queued or waiting batch request, use qhold. This command holds the request and makes it ineligible for execution until the user removes the hold. Use qhold and the request ID to hold a request, as shown in the following example.

Example:

$ qhold 72.HOST1
Request 72.HOST1 has been held.

When a batch request is held successfully, a message stating that the request has been held is printed to stdout. When a batch request cannot be held because of a particular condition, an error message corresponding to that condition is printed.

If a request is running and both the system and NQS permit checkpointing, NQS takes a checkpoint of the request and then holds it.

If the NQS manager permits users to specify a restart file name, you can specify it with the -F path-name option. The format of path-name is as follows:

[[/]path/]file-name

If the -F option is omitted, a restart file is created in the current directory with the name request-name.hsequence-number.

If specifying a restart file name is not permitted, a restart file is automatically created in the directory /usr/spool/nqs/restart.
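As a sketch of the default name used when -F is omitted, the function below builds request-name.hsequence-number from a request name and a request ID. It assumes that sequence-number is the numeric part of the request ID, by analogy with the STDIN.o72 and STDIN.e72 file names in the qstatr -f sample output; this reading is an assumption, and the function name restart_file_name is hypothetical.

```shell
#!/bin/sh
# restart_file_name is a hypothetical helper, not an NQS command.
# $1: request name, $2: request ID such as 72.HOST1
restart_file_name() {
    # ${2%%.*} keeps the part of the request ID before the first '.'
    printf '%s.h%s\n' "$1" "${2%%.*}"
}

restart_file_name STDIN 72.HOST1    # prints STDIN.h72
```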

Like the qdel command, the qhold and qrls commands can identify a request by its request name and can hold or release a request on a remote machine.

2.9 RELEASING A BATCH REQUEST

To release a hold on a batch request, use qrls. This command releases a hold previously applied to a batch request. The effect of releasing a hold depends on the state the request was in when the hold was applied.

Use qrls followed by the request ID to release the hold on a request, as shown in the following example.

Example:

$ qrls 75.HOST1
Request 75.HOST1 has been released.

If qrls is used on a request that has no hold applied to it, the message shown in the following example appears.

Example:

$ qrls 76.HOST1
Request 76.HOST1 is not holding.

Like the qdel command, the qhold and qrls commands can identify a request by its request name and can hold or release a request on a remote machine.

2.10 SUSPENDING AND RESTARTING A BATCH REQUEST

This section explains how to suspend and restart a batch request. To suspend or restart a batch request, use the qspnd(1) and qrsm(1) commands, respectively. The qspnd(1) command is valid only for a request that is being executed, that is, a request in the RUNNING state.

To suspend a request, specify the request ID of the request and execute the qspnd command as shown in the following example.

Example:

$ qspnd 72.host1
Request 72.host1 has been suspended.

If suspend processing is performed correctly, a message indicating the suspension state of the request is output.

To release the suspension state of a request, specify the request ID of the request and execute the qrsm command.

Example:

$ qrsm 72.host1
Request 72.host1 has been resumed.

If the release processing is performed correctly, a message indicating that the suspended request has been resumed is output. If the release processing fails, for example because the request was not suspended, the corresponding error message is output.

Example:

$ qrsm 73.host1
Request 73.host1 is not suspending.

Like the qdel command, the qspnd and qrsm commands can identify a request by its request name and can suspend or resume a request on a remote machine.

2.11 CHECKPOINTING AND RESTARTING A BATCH REQUEST

To checkpoint or restart a batch request, use the qchk(1) and qrst(1) commands. A checkpoint applies only to a running batch request. Checkpointing does not stop the running batch request. You can restart a checkpointed request after the request execution finishes.

To checkpoint a request, specify the request ID and execute the qchk command as follows.

$ qchk 72.host1
Start to get checkpoint of request 72.host1.
Please wait.....
Request 72.host1 has been restarted.

If checkpoint processing is performed correctly, a message appears.

If the NQS manager permits you to name the restart file, specify the name with the -F path-name option. The format of path-name is as follows:

[[/]path/]file-name

If the -F option is omitted, a restart file is created in the current directory with the name request-name.csequence-number.

If specifying a restart file name is not permitted, a restart file is automatically created in the directory /usr/spool/nqs/restart.

To restart a request from its checkpoint, specify the request ID assigned to it and execute the qrst(1) command as follows.

$ qrst 72.host1
Request 72.host1 has been restarted.

If the restart processing is performed correctly, a message is output. If the restart has failed because the request was still running, the following message appears.

$ qrst 73.host1
Request 73.host1 is running.

Like the qdel command, the qchk and qrst commands can identify a request by its request name and can checkpoint or restart a request on a remote machine.

2.12 RERUNNING A BATCH REQUEST

A running batch request can be stopped and returned to the queue with qrerun. This command checks that the request can be returned to the queue by verifying that all of the running request's attributes are allowed in the queue.

If the attributes are valid, the request is killed and returned (submitted) to the queue with the same request ID. If any of the attributes are not allowed in the queue, the request continues to run and a message describing the outcome is displayed. If the request is rerun, the entire batch request starts at the beginning.

Include the request ID with the qrerun command, as shown in the following example.

Example:

$ qrerun 81.HOST1
Request 81.HOST1 has been rerun.

If the command is used on a request that is in any other state than running, it is not executed and a message appears as shown in the following example.

Example:

$ qrerun 81.HOST1
Request 81.HOST1 is not running.

Like the qdel command, the qrerun command can identify a request by its request name and can rerun a request on a remote machine.

2.13 MOVING A BATCH REQUEST

Use qmove to move a nonrunning batch request to another queue. Only requests that are held, waiting, or queued may be moved. The command first checks to see if the request attributes are allowable in the new queue. If they are, the request is moved to the new queue. The state of the request remains the same. If any request attributes are not allowed, the request is not moved and a message describing the outcome is displayed.

The qmove command allows you to move specific requests or all of the requests in a specified queue. Include the request ID with the command, followed by the new queue name to move a specific request, as shown in the following example.

Example:

$ qmove 81.HOST1 batch1
Request 81.HOST1 has been moved.

Use the -q option to move all eligible requests in a queue, as shown in the following example.

Example:

$ qmove -q batch1 batch2
Request 81.HOST1 has been moved.
Request 82.HOST1 has been moved.
Request 83.HOST1 has been moved.

If qmove is used on a running request, the command is not executed and the message shown in the following example appears.

Example:

$ qmove 84.HOST1 batch1
Request 84.HOST1 is running.

The qmove command can specify a request with the request name in the same way as the qdel command.

2.14 SENDING MESSAGES TO A BATCH REQUEST

Use the qmsg command to send messages to batch requests. The messages are embedded in the request's output file.

Example:

$ qmsg 72.HOST1
Compile test.c
CTRL-D (EOF)

Running qmsg without the -o or -e option sends the message to both the standard output and standard error output files. The -o option sends the message to the standard output file. The -e option sends the message to the standard error output file.

Example:

$ qmsg -o 72.HOST1
$ qmsg -e 72.HOST1

The qmsg command can specify a request with the request name in the same way as the qdel command.

2.15 CONFIRMING VALID RESOURCE LIMITS

Valid resource limits are those supported by the host at the time of request execution. Although these request limits may be entered as request attributes, their specifications are ignored if they are not supported by the host that executes the request. The local host's valid resource limits can be confirmed by using the qlimit command without options, as shown in the following example.

Example:

$ qlimit
 Core file size limit  (-lc)
 Data segment size limit  (-ld)
 Per-process permanent file size limit (-lf)
 Per-process memory size limit (-lm)
 Stack segment size limit (-ls)
 Per-process cpu time limit (-lt)
      .
      .
 Nice value (-ln)

 Shell strategy = LOGIN

The qlimit command outputs valid resource limits for the local host. It also shows the options that can be specified with the qsub command. Finally, it outputs the shell strategies defined by the system administrator. To confirm valid resource limits on a remote host, specify the name of the remote host.

Example:

$ qlimit HOST2
 Per-process corefile size limit (-lc)
 Per-process data size limit (-ld)
 Per-process permanent file size limit (-lf)
 Per-process permanent file space limit (-lF)
 Per-process stack size limit (-ls)
 Per-process CPU time limit (-lt)
          .
          .
 Nice value (-ln)

 Shell strategy = FIXED
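
Because limits that the executing host does not support are ignored, it can be useful to extract the supported option letters from a qlimit listing before building a qsub command line. The following is a minimal sketch that assumes the listing format shown above, with each option in parentheses at the end of a line. The function name is hypothetical, and the sample lines are copied from the example rather than produced by a live qlimit call.

```sh
# Print the limit options, such as -lc, found in qlimit-style output on stdin.
# Assumes each supported limit is listed as "description (-lX)".
limit_options() {
    sed -n 's/.*(\(-l[A-Za-z]*\)).*/\1/p'
}

# Sample lines copied from the listing above; in real use: qlimit HOST2 | limit_options
limit_options <<'EOF'
 Per-process corefile size limit (-lc)
 Per-process data size limit (-ld)
 Nice value (-ln)

 Shell strategy = FIXED
EOF
```

For the sample input, the sketch prints -lc, -ld, and -ln on separate lines and skips the shell strategy line.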

2.16 TERMINATING A BATCH REQUEST

The qstatr command and other commands are used to confirm the termination of a batch request. The qwait(1) command is used to await the termination of a batch request and confirm its termination state. See the following examples.

Example 1: Request terminated with termination code 45.

$ qwait 123.host1
done 45

Example 2: Request terminated by SIGKILL.

$ qwait 124.host1
killed 9
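
A script that waits for a request can branch on the two output forms shown in the examples above. The following sketch handles only those two forms, since no others are described in this section. The function name is hypothetical, and the strings passed to it stand in for real qwait output.

```sh
# Interpret one line of qwait output in the two forms shown above:
#   "done <termination code>"  and  "killed <signal number>".
# Any other form is reported as unrecognized.
describe_qwait() {
    status=${1%% *}   # text before the first space
    value=${1#* }     # text after the first space
    case "$status" in
        "done")   echo "exited with termination code $value" ;;
        "killed") echo "terminated by signal $value" ;;
        *)        echo "unrecognized qwait output: $1"; return 1 ;;
    esac
}

# In real use the argument would come from qwait, for example:
#   describe_qwait "$(qwait 123.host1)"
describe_qwait "done 45"    # prints: exited with termination code 45
describe_qwait "killed 9"   # prints: terminated by signal 9
```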

You can also learn that a request has terminated from the mail sent at that time. To have mail sent when a request terminates, specify the -me option of the qsub command when you submit the request. If the request is canceled because of a failure, however, mail reporting the failure status is always sent.

2.17 BATCH REQUEST OUTPUT FILE

When a batch request ends, its output is stored in files. Two output files are usually generated. The first file stores the contents written to the standard output (stdout) during script execution. The other file stores the contents written to the standard error output (stderr) during script execution. These files can be specified when submitting a request, and they can also be altered before the request runs. You can specify a request name when submitting a request. If a request name is not specified, the script name becomes the request name. If the script comes from the standard input (stdin) and a request name is not entered, the request name becomes STDIN.

The format for naming the standard output file is as follows.

request-name.orequest-sequence-number

The format for naming the standard error file is as follows.

request-name.erequest-sequence-number

Example:

A request named batreg with request number 72 has the following error file name.
batreg.e72
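
The naming rule can be sketched as a shell function that derives both default file names from a request name and a request ID. It assumes that the request sequence number is the part of the request ID before the first dot (72 in 72.HOST1). The function name is hypothetical.

```sh
# Print the default stdout and stderr file names for a request.
# Assumption: the sequence number is the request ID up to the first dot.
output_file_names() {
    name=$1
    seq=${2%%.*}
    echo "${name}.o${seq}"
    echo "${name}.e${seq}"
}

output_file_names batreg 72.HOST1
# prints:
#   batreg.o72
#   batreg.e72
```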

2.18 SUBMITTING A DEVICE REQUEST

A device request sends output to a device such as a printer or plotter. The sequence of device request operations is as follows.

  1. Prepare output data.
    Prepares data to output to the device.

  2. Enter the device request.
    Enters data prepared in Step 1 to NQS as a device request.

  3. Confirm the request.
    Confirms the status, queue, and the device.

  4. Terminate a device request.
    A device request can be forcibly terminated at any point.

For a device request, create a file that contains the data to output to the device. The data can also be read from the standard input (stdin) when the device request is submitted. Unlike the shell script of a batch request, this data file cannot contain embedded options. Enter a device request with the qpr command, as shown in the following example.

Example:

$ qpr -q device1 data1
Request 80.HOST1 submitted to queue: device1.

In this example, the data1 file is submitted to a device1 device queue as a device request. The entry is successful, and a message appears indicating that the request has been accepted. This message is the same as the one output during batch request entry.

As with the qsub command, you can specify more than one option with the qpr command. Table 2-6 shows some options that can be used with qpr. For information about all the options available, see the qpr(1) pages in Chapter 3, User Commands.

Table 2-6 qpr Options

Unique Option    Description
-f form-name     Limits the set of acceptable devices to those that are loaded with form-name. When this option is omitted, the qpr command only submits the request to a device that is loaded with the default forms. When a default form is not defined, the request is submitted to an appropriate device regardless of the forms configured for the device. In any case, only those devices associated with the chosen queue are considered.
-n copies        Specifies the number of copies to be printed. The default is one.

Once a device request is accepted by the device queue, the request is output according to priority and entry order. If the output device is in a CLOSED or FAILED state, the device request is not processed. The qstatd command can be used to confirm the device status.

qstatd returns information about devices in a formatted output. Use this command, followed by the device name, to display information about a specific device, as shown in the following example.

Example:

$ qstatd dev1
==================================================
NQS (Rxx.xx) DEVICE SUMMARY  HOST: HOST1
==================================================

DEVICE NAME      ENA  STS   FORMS
--------------- ---------- -----------------------------------
dev1             ENA  INA   form1
--------------- ---------- -----------------------------------

Use qstatd without a device name to obtain information on all devices on the local host, as shown in the following example.

Example:

$ qstatd
==================================================
NQS (Rxx.xx) DEVICE SUMMARY  HOST: HOST1
==================================================

DEVICE NAME      ENA  STS   FORMS
--------------- ---------- -----------------------------------
dev1             ENA  INA   form1
dev2             ENA  INA   form1 form2
dev3             DIS  INA   form2
--------------- ---------- -----------------------------------
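
Because a device request is not processed while its device is unavailable, a script may want to pick out the devices whose ENA column shows DIS. The sketch below assumes the summary layout shown above, in which the second whitespace-separated field of a data row is the ENA column. The function name is hypothetical, and the sample rows are copied from the example.

```sh
# Print the names of devices whose ENA column is DIS (disabled).
# Assumes the qstatd summary layout shown above.
disabled_devices() {
    awk '$2 == "DIS" { print $1 }'
}

# Sample rows copied from the listing above; in real use: qstatd | disabled_devices
disabled_devices <<'EOF'
DEVICE NAME      ENA  STS   FORMS
--------------- ---------- -----------------------------------
dev1             ENA  INA   form1
dev2             ENA  INA   form1 form2
dev3             DIS  INA   form2
EOF
```

For the sample rows, the sketch prints dev3.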

Specify the -f option for detailed information on devices, as shown in the following example.

Example:

$ qstatd -f dev1
====================================================
NQS (Rxx.xx) DEVICE: dev1@HOST1
====================================================
   Status:  [ ENABLE   , INACTIVE  ]
   Fullname: /dev/lp
   Server:   /usr/lib/nqs/lpserver

FORMS
   form1

RELATION DEVICE QUEUES
   device1, device2, device3

Use the -h option to display the device status of a specified host, as shown in the following example.

Example:

$ qstatd -h NEC1
==================================================
NQS (Rxx.xx) DEVICE SUMMARY  HOST: NEC1
==================================================

DEVICE NAME      ENA  STS   FORMS
--------------- ---------- -----------------------------------
lpdev            ENA  INA   lpform
texdev           ENA  INA   texform
--------------- ---------- -----------------------------------

The qstatd command displays information about a local or remote device. The host can be changed with the -h option.

Example 1: Displays information on dev1 on the local host

qstatd dev1

Example 2: Displays information on dev1 on the remote host NEC1

qstatd -h NEC1 dev1

2.19 CONFIRMING DEVICE REQUEST STATUS

Use the qstatr command to confirm the status of a device request. The command is used in exactly the same way as for batch requests.

Example 1 confirms the status of one device request. Example 2 confirms the status of all device requests.

Example 1:

$ qstatr 72.HOST1
==================================================
NQS (Rxx.xx) DEVICE REQUESTS HOST: HOST1
==================================================
REQUEST ID      NAME      OWNER    QUEUE   PRI    SIZE  STT
-------------- --------- -------- -------- ----  ------ ---
72.HOST1        nec       user1    device1   20    1106 RUN
------------------------------------------------------------

Example 2:

$ qstatr -d
===================================================
NQS (Rxx.xx) DEVICE REQUESTS  HOST: HOST1
===================================================
REQUEST ID      NAME      OWNER    QUEUE   PRI    SIZE  STT
-------------- --------- -------- -------- ----  ------ ---
72.HOST1        nec       user1    device1   20    1106 RUN
73.HOST1        STDIN     user1    device1   30    2000 QUE
74.HOST1        nqs       user1    device2   20     500 WAI
------------------------------------------------------------
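
A listing like the one in Example 2 can be filtered by state with a short awk command. The sketch below assumes the column layout shown above, with STT as the last field and data rows beginning with an ID of the form number.host. The function name is hypothetical, and the sample rows are copied from Example 2.

```sh
# Print the IDs of the requests whose STT column equals the given state.
# Assumes STT is the last field and data rows start with <number>.<host>.
requests_in_state() {
    awk -v state="$1" '$1 ~ /^[0-9]+\./ && $NF == state { print $1 }'
}

# Sample rows copied from Example 2; in real use: qstatr -d | requests_in_state QUE
requests_in_state QUE <<'EOF'
REQUEST ID      NAME      OWNER    QUEUE   PRI    SIZE  STT
-------------- --------- -------- -------- ----  ------ ---
72.HOST1        nec       user1    device1   20    1106 RUN
73.HOST1        STDIN     user1    device1   30    2000 QUE
74.HOST1        nqs       user1    device2   20     500 WAI
EOF
```

For the sample rows, the sketch prints 73.HOST1.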

Display a specific device request status in detail using the qstatr -f option, as shown in the following example.

Example:

$ qstatr -f 72.HOST1
==================================================
NQS (Rxx.xx) DEVICE REQUEST: 72.host1
==================================================

Name:     nec              State:          RUNNING
Owner:    user1
Group:    group1
Acctcode: account1
Created:  Fri Apr 20 1990  Priority:       20
          09:41:06 GMT
QUEUE
          Name: device1@host1
FORMS
          Name:          form1
MAIL
          Address          user1@host1         When:      END
MISC
          Orig.Owner:          user1
          Size:                2145
          Copies:              1

Display the status of all the device queues on the local host using the qstatq -d option, as shown in the following example.

Example:

$ qstatq -d
=======================================================
NQS (Rxx.xx) DEVICE QUEUE SUMMARY HOST: host1
=======================================================

QUEUE NAME      ENA  STS  PRI    TOT  QUE  RUN  WAI  HLD  ARR
-------------- ---------------- ------------------------------
device1         ENA  RUN  20      3    2    1    0    0    0
device2         ENA  INA  30      2    0    0    1    1    0
-------------- ---------------- ------------------------------
   <TOTAL>                        5    2    1    1    1    0
-------------- ---------------- ------------------------------

Display the status of a specific device queue with the qstatq command, as shown in the following example.

Example:

$ qstatq device1
=======================================================
NQS (Rxx.xx) DEVICE QUEUE SUMMARY HOST: host1
=======================================================

QUEUE NAME      ENA  STS  PRI    TOT  QUE  RUN  WAI  HLD  ARR
-------------- ---------------- ------------------------------
device1         ENA  RUN  20      3    2    1    0    0    0
-------------- ---------------- ------------------------------
   <TOTAL>                        3    2    1    0    0    0
-------------- ---------------- ------------------------------

Display the status of a specific device queue in detail using the qstatq -f option, as shown in the following example.

Example:

$ qstatq -f device1
==========================================================
NQS (Rxx.xx) DEVICE QUEUE: device1@host1
==========================================================
   Priority:  20                  Status:  [ ENABLED, RUNNING ]

ENTRIES
   Total:        3
   Queued:       2       Running:        1       Waiting:        0
   Held:         0       arriving:       0

DEVICES
   dev1, dev2

ACCESS
   Unrestricted access

CUMULATIVE TIME
   System space time=  35.35 sec
   User space time=  24.57 sec

2.20 DELETING A DEVICE REQUEST

The methods for deleting a device request are the same as for a batch request (see Section 2.7).

2.21 MOVING NETWORK REQUESTS

When you issue the qmove command and specify an EXITING batch request and network queue, you can move all network requests that are staging out stdout/stderr/jor files for the specified batch request to the specified network queue. Using this method, you can change the target host for staging out the job output files. Staging out fails if the path for staging out, which is specified when the batch request is submitted, does not exist on the changed target host.

You cannot use the -q option to move all network requests in the specified network queue and you cannot specify network requests directly.

2.22 DELETING NETWORK REQUESTS

When you issue the qdel command and specify an EXITING batch request, you can delete all network requests that are staging out stdout/stderr/jor files for the specified batch request. Neither the -k option nor the -signo option is necessary. If the target network request is running, the output file being staged out by the network request is placed in the request owner's home directory on the machine where the request was executed. You cannot specify network requests directly.

2.23 CONFIRMING NETWORK REQUEST STATUS

Use the qstatr command to confirm the status of a network request. The method for using this command is exactly the same as for batch requests. Using the qstatr command with the -N option displays network requests.

$ qstatr -N
==================================================
NQS (Rxx.xx) NETWORK REQUEST  HOST: host1
==================================================
REQUEST ID       EVENT   NAME     OWNER    QUEUE NAME     PRI  STT  PGRP
--------------- ------- -------- -------- --------------- ---- --- ------
396.host1       31(ERR) STDIN    user1    net1              31 QUE
396.host1       30(OUT) STDIN    user1    net1              31 QUE
-------------------------------------------------------------------------

Using the qstatr command with the -N -f options displays network requests in detail.

$ qstatr -N -f 395.host1
=================================================
NQS (Rxx.xx) NETWORK REQUEST: 395.host1
=================================================
        Name: STDIN                     State: queued
        Owner: user1                    Priority: 31
        Group: group1                   Event: 30
        Created: Mon Mar 28 1994
                 13:50:55
QUEUE
        Name: net1@host1

STAGING FILE
        Name: /home/nqs/STDIN.o395
MAIL
        Address:        user1@host1
MISC
        Orig.Owner:     user1

Using the qstatq command with the -N option displays network queues.

$ qstatq -N
========================================================
NQS (Rxx.xx) NETWORK QUEUE SUMMARY    HOST: host1
========================================================
QUEUE NAME       DESTINATION MACHINE   ENA STS PRI RLM   TOT QUE RUN WAI
--------------- --------------------- ----------------- -----------------
DefaultNetQue            -             ENA INA  -1  20     0   0   0   0
net1             host1                 ENA INA  20   1     4   0   1   3
net2             host2                 ENA STP  40   2     0   0   0   0
--------------- --------------------- ----------------- -----------------
 <TOTAL>                                          20     4   0   1   3
--------------- --------------------- ----------------- -----------------

Using the qstatq command with the -f option displays network queues in detail.

$ qstatq -f net1
==========================================================
NQS (Rxx.xx) NETWORK QUEUE: net1@host1
==========================================================
  Priority: 20               Status:  [ENABLED  , INACTIVE]
  Queue server: /usr/lib/nqs/netclient

ENTRIES
   Total:      4
   Queued:     0     Running:    1     Waiting:    3

RUN LIMITS
 Total run limit:   1

DESTINATIONS MACHINE (MID)
  host1 (100)

CUMULATIVE TIME
  System space time = 5.20 seconds
  User space time =   0.85 seconds

2.24 USING THE NQS/MPI FUNCTION

While an MPI process running on a single node can use normal batch queues, the NQS/MPI function is needed to run an MPI process across two or more nodes as a single NQS request. The NQS/MPI function allows users to associate queues on multiple nodes to execute them as a single MPI request. The MPI master queue definition specifies the slave queues (on other nodes) where spawned tasks will be executed.

A user can specify resource limits for an MPI request in the same way as for normal batch requests. The NQS/MPI function uses the specified limits to reserve resources on the slave queues, ensuring that the multi-node MPI request can obtain the resources it requests on every slave queue.

This section explains how to use the NQS/MPI function. For detailed information on setting the function, see Section 5.12.

2.24.1 Outline of the NQS/MPI Function

A batch request calling mpisx(1) (a master request) is submitted to a master queue. A master queue is a batch queue prepared by the NQS manager for this function. When NQS starts a submitted master request, it creates batch requests to execute the distributed MPI processes (slave requests), and sends them to all the slave queues defined as members of the master queue.

As soon as the batch requests are sent, each slave request starts an MPI daemon to connect with the master request, and waits for the contact from the master request.

After checking that all the slave requests are in the RUNNING state, NQS executes the master request.

When mpisx(1) is called, the master request starts the MPI daemon to connect with the slave requests, connects to some MPI daemons on the slave queues specified by the configuration file or the option of mpisx(1), and executes the MPI processes.

When one mpisx(1) command is completed, the MPI daemon of the master request terminates. At that time, the MPI daemon of the slave queues is still waiting for the next connection.

If two or more mpisx(1) commands are called in one MPI request, the master request repeats the process detailed above for each mpisx(1) command.

When the master request has been executed, NQS terminates the execution of all the slave requests. At that time, the stdout, stderr, and JOR files of the slave requests are moved to the node where the master request has been executed (a master node), added to the files of the master request, and output in the same way as the ordinary batch requests.

MPI requests can be controlled by issuing NQS commands such as qdel(1) on the master request.

For detailed information about mpisx(1), see the SUPER-UX MPI/SX User's Guide.

2.24.2 Notes on Using the NQS/MPI Function

2.24.2.1 CONFIRMING MPI QUEUES

Use the -x option of qstat(1) and the -f option of qstatq(1) to find out whether a batch queue is a master or slave queue.

For more information, see Chapter 3, User Commands.

2.24.2.2 SUBMITTING MPI REQUESTS

To execute an MPI request, submit a master request to a master queue using qsub(1), just as you would submit an ordinary batch request. The qsub(1) options apply to both master and slave requests.

You can submit a master request to a normal batch queue. However, distributed MPI processes are not treated as slave requests, so these processes cannot be controlled.

Normal batch requests should not be submitted to master queues, since doing so causes unnecessary slave requests to be executed.

2.24.2.3 EXECUTING MPI REQUESTS

NQS assigns a request ID to each master request, just as it does for other batch requests. NQS also assigns an ID to each slave request. This ID is based on the request ID of the master request, with a slave number added in parentheses at the end. Each slave request in one MPI request is assigned a unique slave number; numbering starts at one.

Before all slave requests have been sent and started, a master request enters the PRE-RUNNING state. PRE-RUNNING requests cannot be controlled although they can be deleted and confirmed.

When all slave requests have started executing, the master request enters the RUNNING state and starts its execution.

During execution, a master request is affected by the resource limitations and scheduling parameters set on a submitted master queue or node (master node), while a slave request is affected by the resource limitations and scheduling parameters of a submitted slave queue or node (slave node).

When the execution of a master request terminates, the master request enters the POST-RUNNING state and remains there until every slave request has terminated. POST-RUNNING requests cannot be controlled, although they can be deleted and confirmed.

When all slave requests terminate, the master request enters the EXITING state and outputs the stdout, stderr, and JOR files.

2.24.2.4 CONFIRMING MPI REQUESTS

Use qstat(1) and qstatr(1) to confirm MPI requests.

$ qstatr 300.host1
=======================================
NQS (Rxx.xx) BATCH REQUESTS  HOST: host1
=======================================
REQUEST ID       NAME     OWNER    QUEUE   PRI NICE MEMORY  TIME  STT  JID   R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
300.host1       STDIN    user1    mpiM       20   0   1617  0.157 RUN    18  -
------------------------------------------------------------------------------
=======================================
NQS (Rxx.xx) BATCH REQUESTS  HOST: host2
=======================================
REQUEST ID       NAME     OWNER    QUEUE   PRI NICE MEMORY  TIME  STT  JID   R
--------------- -------- -------- -------- ---- --- ------ ------ --- ------ -
300.host1(1)     STDIN    user1    mpiS      30   0   1510  0.143 RUN     5  -
------------------------------------------------------------------------------

When you specify a request ID as shown in the above example, the specified master request and all the related slave requests that are in the RUNNING state appear. Slave requests are numbered with the master request ID followed by a unique slave number in parentheses.
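
A script that processes such listings may need to separate the master request ID from the slave number. The following sketch uses shell parameter expansion and assumes IDs of the form shown above, such as 300.host1(1). The function name is hypothetical.

```sh
# Split an ID such as "300.host1(1)" into master request ID and slave number.
# A plain master ID such as "300.host1" yields an empty slave number.
split_mpi_id() {
    case "$1" in
        *"("*")")
            master=${1%%"("*}     # text before the opening parenthesis
            slave=${1##*"("}      # text after the opening parenthesis
            slave=${slave%")"}    # drop the closing parenthesis
            ;;
        *)
            master=$1
            slave=
            ;;
    esac
    echo "master=$master slave=$slave"
}

split_mpi_id "300.host1(1)"   # prints: master=300.host1 slave=1
split_mpi_id "300.host1"      # prints: master=300.host1 slave=
```

Since individual slave requests cannot be controlled, the master ID recovered this way is the one to pass to commands such as qdel(1).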

When you omit a request ID, only a master request or slave request on the node to which the command is issued appears.

When you confirm the number of requests with qstatq(1), the specified master requests and all the related slave requests that are in the RUNNING, PRE-RUNNING, and POST-RUNNING states appear.

See Chapter 3, User Commands for more information.

2.24.2.5 DELETING MPI REQUESTS

To delete MPI requests, use the qdel(1) command just as you would on normal requests.

$ qdel -k 300.host1

In the example above, the master request 300.host1 and all related slave requests will be deleted.

NOTE

If a request is being sent via pipe, it cannot be deleted.

The -k option deletes PRE-RUNNING and POST-RUNNING requests.

Individual slave requests cannot be deleted.

2.24.2.6 ALTERING MPI REQUESTS ATTRIBUTES

To alter the attributes of MPI requests, use the qalter(1) command just as you would on normal requests.

Only the attributes of a master request can be altered. The attributes of slave requests will be unaffected by the command.

$ qalter -lf 2mb 300.host1

In the example above, the per-process permanent file size limit of the master request 300.host1 will be changed to 2 megabytes.

The attributes of PRE-RUNNING and POST-RUNNING requests cannot be altered.

2.24.2.7 MOVING MPI REQUESTS

To move MPI requests to another queue, use the qmove(1) command just as you would on normal requests.

$ qmove 300.host1 batch1

In the example above, the master request 300.host1 will be moved to the queue batch1.

Only QUEUED and WAITING master requests can be moved. Individual slave requests cannot be moved.

If an MPI request is moved to a normal batch queue, its MPI processes cannot be controlled during execution.

2.24.2.8 HOLDING/RELEASING MPI REQUESTS

To hold the MPI requests, use the qhold(1) command just as you would on normal requests.

$ qhold 300.host1

In the example above, the master request 300.host1 will be held.

Only QUEUED, RUNNING, SUSPEND, and WAITING master requests can be held. PRE-RUNNING and POST-RUNNING requests cannot be held. Individual slave requests cannot be held.

When a RUNNING or SUSPEND master request is specified, the master request and all related slave requests are checkpointed and held. However, if any related slave request has already terminated, qhold aborts.

To release the MPI requests, use the qrls(1) command just as you would on normal requests.

$ qrls 300.host1

If the request was checkpointed when it was held, NQS restarts the master request and all related slave requests from the checkpoint. The restart fails unless every NQS needed to execute the master and slave requests is active.

2.24.2.9 SUSPENDING/RESUMING MPI REQUESTS

To suspend the MPI requests, use the qspnd(1) command just as you would on normal requests.

$ qspnd 300.host1

In the example above, the master request 300.host1 and all related slave requests will be suspended.

Neither PRE-RUNNING nor POST-RUNNING requests can be suspended. Individual slave requests cannot be suspended.

To resume the MPI requests, use the qrsm(1) command just as you would on normal requests.

$ qrsm 300.host1

2.24.2.10 RERUNNING MPI REQUESTS

To rerun MPI requests, use the qrerun(1) command just as you would on normal requests.

$ qrerun 300.host1

In the example above, the master request 300.host1 and all related slave requests will be rerun.

Non-RUNNING requests, including PRE-RUNNING and POST-RUNNING requests, cannot be rerun. Individual slave requests cannot be rerun.

2.24.2.11 CHECKPOINTING/RESTARTING MPI REQUESTS

To checkpoint MPI requests, use the qchk(1) command just as you would on normal requests.

$ qchk 300.host1

In the above example, the master request 300.host1 and all related slave requests will be checkpointed.

Non-RUNNING requests, including PRE-RUNNING and POST-RUNNING requests, cannot be checkpointed. If any slave requests have already terminated, all related MPI requests cannot be checkpointed. Individual slave requests cannot be checkpointed.

To restart MPI requests from their checkpoint, use the qrst(1) command just as you would on normal requests.

$ qrst 300.host1

In the above example, the master request 300.host1 and all related slave requests will be restarted.

The restart fails unless every NQS needed to execute the master and slave requests is active.

When an NQS that has RUNNING or SUSPEND MPI requests (master or slave) is shut down, NQS automatically checkpoints and stops those requests together with all related master and slave requests. These MPI requests are restarted automatically after every necessary NQS is active again.
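
The state restrictions given in Sections 2.24.2.7 through 2.24.2.11 can be collected into a small lookup, sketched below. It records only the operations for which this section lists the permitted states of a master request (qmove, qhold, qrerun, and qchk). The function name is hypothetical, and the sketch does not query NQS.

```sh
# Return 0 if the operation is permitted for a master request in the given
# state, 1 if it is not, and 2 if this section gives no state list for it.
mpi_op_allowed() {
    op=$1
    state=$2
    case "$op" in
        qmove)       allowed="QUEUED WAITING" ;;
        qhold)       allowed="QUEUED RUNNING SUSPEND WAITING" ;;
        qrerun|qchk) allowed="RUNNING" ;;
        *)           echo "no rule recorded for $op" >&2; return 2 ;;
    esac
    # Match whole words only, so PRE-RUNNING does not match RUNNING.
    case " $allowed " in
        *" $state "*) return 0 ;;
        *)            return 1 ;;
    esac
}

mpi_op_allowed qhold RUNNING     && echo "qhold is permitted"       # prints: qhold is permitted
mpi_op_allowed qmove PRE-RUNNING || echo "qmove is not permitted"   # prints: qmove is not permitted
```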
