Messages are the means of communication within a Stream. All input and output under STREAMS is based on messages. The objects passed between Streams components are pointers to messages. All messages in STREAMS use two data structures to refer to the data in the message. These data structures describe the type of the message and contain pointers to the data of the message, as well as other information. Messages are sent through a Stream by successive calls to the put routine of each queue in the Stream using the appropriate utility routines. Messages may be generated by a driver, a module, or by the Stream head.
The Open Systems Interconnection (OSI) Reference Model, developed by the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT), provides an international standard seven-layer architecture for the development of communication protocols. UNIX System V adheres to this standard.
OSI and TCP/IP support the transport of expedited data (see note) for transmission of high-priority, emergency data. This data is useful for flow control, congestion control, routing, and various applications where immediate delivery of data is necessary.
Expedited data is mainly used for exceptional cases and transmission of control signals. Expedited data is processed immediately, ahead of normal data on the queue, but after STREAMS high-priority messages and after any expedited data already on the queue.
Expedited data flow control is unaffected by the flow control constraints of normal data transfer. Expedited data has its own flow control because it can easily run the system out of buffers if its flow is unrestricted.
Drivers and modules define separate high- and low-water marks for priority band data flow. (Water marks are defined for each queue and identify the upper and lower limit of bytes that can be contained on the queue.) The default water marks for priority band data and normal data are the same. The Stream head also ensures that incoming priority band data is not blocked by normal data already on the queue by associating a priority with the messages. This priority implies a certain ordering of the messages in the queue. See Section 5.3 for more information.
| Within the STREAMS mechanism and in this guide, expedited data is also referred to as priority band data. |
All messages are composed of one or more message blocks. A message block is a linked triplet of two structures and a variable length data buffer. The structures are a message block (msgb) and a data block (datab). The data buffer is a location in memory where the data of a message are stored.
See datab(D4) and msgb(D4) for fields that can be referenced in data and message blocks.
The field b_band determines where the message is placed when it is enqueued using the STREAMS utility routines. This field has no meaning for high priority messages and is set to zero for these messages. When a message is allocated via allocb, the b_band field will be initially set to zero. Modules and drivers may set this field if so desired.
| In System V Release 4.0, certain data structures fundamental to the kernel (for example, device numbers and user IDs) were enlarged to enable them to hold more information. This feature was referred to as Expanded Fundamental Types (EFT). Since some of this information was passed in STREAMS messages, there was a binary compatibility issue for pre-System V Release 4 drivers and modules. #ifdefs were added to the kernel to provide a transition period for these drivers and modules to be recompiled, and to allow the kernel to be built to use either the pre-System V Release 4 short data types or the System V Release 4 long data types. Support for short data types will be dropped in a future release. |
The message block is used to link messages on a message queue, link message blocks to form a message, and manage the reading and writing of the associated data buffer. The b_rptr and b_wptr fields in the msgb structure locate the data currently contained in the buffer. As shown in Figure 5-1, the message block (mblk_t) points to the data block of the triplet. The data block contains the message type, buffer limits, and control variables. STREAMS allocates message buffer blocks of varying sizes. db_base and db_lim are the fixed beginning and end (+1) of the buffer.
A message consists of one or more linked message blocks. Multiple message blocks in a message can occur, for example, because of buffer size limitations, or as the result of processing that expands the message. When a message is composed of multiple message blocks, the type associated with the first message block determines the message type, regardless of the types of the attached message blocks.
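The triplet and the linkage between blocks can be sketched in user space. The following is a minimal model, not the kernel definitions: the `model_` names, the two type constants, and the use of `malloc` are assumptions made for illustration, and only the fields discussed above are included. See msgb(D4) and datab(D4) for the real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two message types. */
enum model_type { MODEL_M_DATA = 0, MODEL_M_PROTO = 1 };

struct model_datab {
    unsigned char *db_base;   /* first byte of the buffer */
    unsigned char *db_lim;    /* one past the last byte of the buffer */
    int            db_type;   /* message type */
};

struct model_msgb {
    struct model_msgb  *b_cont;   /* next block of the same message */
    unsigned char      *b_rptr;   /* first unread byte in the buffer */
    unsigned char      *b_wptr;   /* first unwritten byte in the buffer */
    struct model_datab *b_datap;  /* data block of this triplet */
};

/* Allocate one triplet with an empty buffer of the given size (size > 0). */
struct model_msgb *model_alloc(size_t size, int type)
{
    struct model_msgb *bp = malloc(sizeof *bp);
    struct model_datab *dp = malloc(sizeof *dp);
    unsigned char *buf = malloc(size);

    if (bp == NULL || dp == NULL || buf == NULL) {
        free(bp);
        free(dp);
        free(buf);
        return NULL;
    }
    dp->db_base = buf;
    dp->db_lim = buf + size;
    dp->db_type = type;
    bp->b_cont = NULL;
    bp->b_rptr = bp->b_wptr = buf;   /* empty: both pointers at the base */
    bp->b_datap = dp;
    return bp;
}

/* The type of a message is the type of its first block. */
int model_msg_type(const struct model_msgb *mp)
{
    assert(mp != NULL);
    return mp->b_datap->db_type;
}

/* Total bytes of data currently held in all blocks of the message. */
size_t model_msg_size(const struct model_msgb *mp)
{
    size_t total = 0;

    for (; mp != NULL; mp = mp->b_cont)
        total += (size_t)(mp->b_wptr - mp->b_rptr);
    return total;
}
```

Linking an M_DATA block behind an M_PROTO block through `b_cont` yields a message whose type is reported as M_PROTO, which mirrors the rule that the first block determines the message type.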

A message may occur singly, as when it is processed by a put procedure, or may be linked on the message queue in a queue, waiting to be processed by the service procedure. Message 2, as shown in Figure 5-1, links to Message 1.
Note that a data block in Message 1 is shared between Message 1 and another message. Multiple message blocks can point to the same data block to conserve storage and to avoid copying overhead. For example, the same data block, with associated buffer, may be referenced in two messages, from separate modules that use separate protocol levels. Figure 5-2 illustrates the concept, but data blocks typically are not shared by messages on the same queue. The buffer can be retransmitted, if required, because of errors or timeouts, from either protocol level without replicating the data. The dupmsg utility routine does data block sharing. See dupmsg(D3). STREAMS maintains a count of the message blocks sharing a data block in the db_ref field.
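The reference counting behind data block sharing can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the dupmsg implementation: the `sh_` names are hypothetical, memory comes from `malloc`, and only a single block is duplicated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sh_datab {
    unsigned char *db_base;   /* start of the shared buffer */
    unsigned char *db_lim;    /* one past the end of the buffer */
    unsigned int   db_ref;    /* message blocks sharing this data block */
};

struct sh_msgb {
    unsigned char   *b_rptr;
    unsigned char   *b_wptr;
    struct sh_datab *b_datap;
};

/* Allocate a message block with a private data block (size > 0). */
struct sh_msgb *sh_alloc(size_t size)
{
    struct sh_msgb *bp = malloc(sizeof *bp);
    struct sh_datab *dp = malloc(sizeof *dp);
    unsigned char *buf = malloc(size);

    if (bp == NULL || dp == NULL || buf == NULL) {
        free(bp);
        free(dp);
        free(buf);
        return NULL;
    }
    dp->db_base = buf;
    dp->db_lim = buf + size;
    dp->db_ref = 1;
    bp->b_rptr = bp->b_wptr = buf;
    bp->b_datap = dp;
    return bp;
}

/* New message block, same data block and buffer: no data is copied. */
struct sh_msgb *sh_dup(struct sh_msgb *bp)
{
    struct sh_msgb *nbp;

    assert(bp != NULL);
    nbp = malloc(sizeof *nbp);
    if (nbp == NULL)
        return NULL;
    *nbp = *bp;               /* copies b_rptr, b_wptr, and b_datap */
    bp->b_datap->db_ref++;
    return nbp;
}

/* Free one message block; the buffer goes away with the last reference.
 * Returns the number of references that remain. */
unsigned int sh_free(struct sh_msgb *bp)
{
    struct sh_datab *dp = bp->b_datap;
    unsigned int left = --dp->db_ref;

    free(bp);
    if (left == 0) {
        free(dp->db_base);
        free(dp);
    }
    return left;
}
```

Each duplicate keeps its own read and write pointers, so two protocol levels can consume the same buffer at different rates in this model.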
STREAMS provides utility routines to assist in managing messages and message queues, and to assist in other areas of module and driver development. A utility routine should always be used when operating on a message queue or accessing the message storage pool. If messages are manipulated on the queue without using the STREAMS utilities, the message ordering may become confused and lead to inconsistent results.
| If you do not use the STREAMS utilities as they are defined by the Driver-Kernel Interface, the system may panic or deadlock. Non-Driver-Kernel Interface drivers are not supported. |
Most message types can be generated by modules and drivers. A few are reserved for the Stream head. The most commonly used messages are M_DATA, M_PROTO, and M_PCPROTO. These messages can also be passed between a process and the topmost module in a Stream with the same message boundary alignment maintained on both sides of the kernel. This allows a user process to function, to some degree, as a module above the Stream and maintain a service interface. M_PROTO and M_PCPROTO messages are intended to carry service interface information among modules, drivers, and user processes. Some message types can only be used within a Stream and cannot be sent or received from the user level.
Modules and drivers do not interact directly with any system calls except open(2) and close(2). The Stream head handles all message translation and passing between user processes and STREAMS components. Message transfer between processes and the Stream head can occur in different forms. For example, M_DATA and M_PROTO messages can be transferred in their direct form by the getmsg(2) and putmsg(2) system calls. Alternatively, write(2) causes one or more M_DATA messages to be created from the data buffer supplied in the call. M_DATA messages received at the Stream head are consumed by read(2) and copied into the user buffer. As another example, M_SIG causes the Stream head to send a signal to a process.
Any module or driver can send any message in either direction on a Stream. However, based on their intended use in STREAMS and their treatment by the Stream head, certain messages can be categorized as upstream, downstream, or bidirectional. M_DATA, M_PROTO, or M_PCPROTO messages, for example, can be sent in both directions. Other message types are intended to be sent upstream to be processed only by the Stream head. Messages intended to be sent downstream are silently discarded if received by the Stream head.
STREAMS enables modules to create messages and pass them to neighboring modules. However, the read(2) and write(2) system calls are not enough to enable a user process to generate and receive all such messages. First, read and write are byte-stream oriented with no concept of message boundaries. To support service interfaces, the message boundary of each service primitive must be preserved so that the beginning and end of each primitive can be located. Also, read and write offer only one buffer to the user for transmitting and receiving STREAMS messages. If control information and data were placed in a single buffer, the user would have to parse the contents of the buffer to separate the data from the control information.
The putmsg system call enables a user to create messages and send them downstream. The user supplies the contents of the control and data parts of the message in two separate buffers. The getmsg system call retrieves M_DATA or M_PROTO messages from a Stream and places the contents into two user buffers.
The format of putmsg is as follows:
int putmsg(int fd, struct strbuf *ctlptr, struct strbuf *dataptr, int flags)
where fd identifies the Stream to which the message is passed, ctlptr and dataptr identify the control and data parts of the message, and flags may be used to specify that a high-priority message (M_PCPROTO) should be sent. When a control part is present, setting flags to 0 generates an M_PROTO message. If flags is set to RS_HIPRI, an M_PCPROTO message is generated.
| The Stream head guarantees that the control part of a message generated by putmsg(2) is at least 64 bytes in length. This promotes reusability of the buffer. When the buffer is a reasonable size, modules and drivers may reuse the buffer for other headers. |
The strbuf structure is used to describe the control and data parts of a message, and has the following format:
struct strbuf {
    int maxlen;   /* maximum buffer length */
    int len;      /* length of data */
    char *buf;    /* pointer to buffer */
};
where buf points to a buffer containing the data, len specifies the number of bytes of data in the buffer, and maxlen specifies the maximum number of bytes the given buffer can hold. maxlen is significant only when retrieving information into the buffer using getmsg.
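How the two strbuf arguments and the flags value of putmsg select a message type can be modeled without calling into STREAMS. The sketch below rests on assumptions: `model_strbuf` is a local mirror of the layout shown above, `MODEL_RS_HIPRI` is a stand-in constant, and the result codes are hypothetical. The treatment of an absent part (a NULL pointer or a len of -1) follows the putmsg(2) manual page rather than this section.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the strbuf layout (hypothetical name). */
struct model_strbuf {
    int   maxlen;   /* unused when sending */
    int   len;      /* bytes to send, or -1 for "no such part" */
    char *buf;
};

#define MODEL_RS_HIPRI 0x01   /* stand-in for RS_HIPRI */

enum model_result {
    MODEL_NO_MESSAGE,   /* neither part present: nothing is sent */
    MODEL_M_DATA,
    MODEL_M_PROTO,
    MODEL_M_PCPROTO,
    MODEL_EINVAL        /* the call would fail */
};

/* A part is sent only if its pointer is non-NULL and len is 0 or more. */
static int part_present(const struct model_strbuf *sb)
{
    return sb != NULL && sb->len >= 0;
}

/* Which message type would putmsg generate for these arguments? */
enum model_result model_putmsg_type(const struct model_strbuf *ctl,
                                    const struct model_strbuf *dat,
                                    int flags)
{
    int has_ctl = part_present(ctl);
    int has_dat = part_present(dat);

    if (flags != 0 && flags != MODEL_RS_HIPRI)
        return MODEL_EINVAL;                 /* undefined flag value */
    if (flags == MODEL_RS_HIPRI)             /* needs a control part */
        return has_ctl ? MODEL_M_PCPROTO : MODEL_EINVAL;
    if (has_ctl)
        return MODEL_M_PROTO;
    return has_dat ? MODEL_M_DATA : MODEL_NO_MESSAGE;
}
```

A caller that wants a plain data message therefore passes no control part and a flags value of 0, while a service primitive carries its header in the control buffer.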
The getmsg system call retrieves M_DATA, M_PROTO, or M_PCPROTO messages available at the Stream head, and has the following format:
int getmsg(int fd, struct strbuf *ctlptr, struct strbuf
*dataptr, int *flagsp)
The arguments to getmsg are the same as those of putmsg except that the flagsp parameter is a pointer to an int.
putpmsg and getpmsg [see putmsg(2) and getmsg(2)] support multiple bands of data flow. They are analogous to the system calls putmsg and getmsg. The extra parameter is the priority band of the message.
putpmsg has the following interface:
int putpmsg(int fd, struct strbuf *ctlptr,
struct strbuf *dataptr, int band, int flags)
The parameter band is the priority band of the message to put downstream. The valid values for flags are MSG_HIPRI and MSG_BAND. MSG_BAND and MSG_HIPRI are mutually exclusive. MSG_HIPRI generates a high-priority message (M_PCPROTO) and band is ignored. MSG_BAND causes an M_PROTO or M_DATA message to be generated and sent down the priority band specified by band. The valid range for band is from 0 to 255, inclusive.
The call
putpmsg(fd, ctlptr, dataptr, 0, MSG_BAND);
is equivalent to the system call
putmsg(fd, ctlptr, dataptr, 0);
and the call
putpmsg(fd, ctlptr, dataptr, 0, MSG_HIPRI);
is equivalent to the system call
putmsg(fd, ctlptr, dataptr, RS_HIPRI);
If MSG_HIPRI is set and band is nonzero, putpmsg fails with EINVAL.
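The argument rules for putpmsg can be collected into a small checker. This is an illustrative sketch only: the `MODEL_` constants are stand-ins whose values are assumptions, the function names are hypothetical, and the real system call performs further checks (for example, on the control part) that are not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MSG_HIPRI 0x01   /* stand-in for MSG_HIPRI */
#define MODEL_MSG_BAND  0x04   /* stand-in for MSG_BAND */
#define MODEL_RS_HIPRI  0x01   /* stand-in for RS_HIPRI */

/* Returns 0 if (band, flags) is an acceptable combination, else EINVAL. */
int model_putpmsg_check(int band, int flags)
{
    /* Exactly one of the two mutually exclusive flags must be given. */
    if (flags != MODEL_MSG_HIPRI && flags != MODEL_MSG_BAND)
        return EINVAL;
    if (band < 0 || band > 255)
        return EINVAL;
    /* A high-priority message has no band. */
    if (flags == MODEL_MSG_HIPRI && band != 0)
        return EINVAL;
    return 0;
}

/* If an equivalent putmsg call exists (band 0), store the flags that
 * call would use and return 1; otherwise return 0. */
int model_putmsg_equivalent(int band, int flags, int *putmsg_flags)
{
    assert(putmsg_flags != NULL);
    if (model_putpmsg_check(band, flags) != 0 || band != 0)
        return 0;
    *putmsg_flags = (flags == MODEL_MSG_HIPRI) ? MODEL_RS_HIPRI : 0;
    return 1;
}
```

Only band 0 has a putmsg equivalent, because putmsg has no way to name a priority band.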
getpmsg has the following format:
int getpmsg(int fd, struct strbuf *ctlptr, struct strbuf *dataptr,
int *bandp, int *flagsp)
where bandp is the priority band of the message. This system call retrieves a message from the Stream. If *flagsp is set to MSG_HIPRI, getpmsg attempts to retrieve a high-priority message. If MSG_BAND is set, getpmsg tries to retrieve a message from priority band *bandp or higher. If MSG_ANY is set, the first message on the Stream head read queue is retrieved. These three flags (MSG_HIPRI, MSG_BAND, and MSG_ANY) are mutually exclusive. On return, if a high priority message was retrieved, *flagsp is set to MSG_HIPRI and *bandp is set to 0. Otherwise, *flagsp is set to MSG_BAND and *bandp is set to the band of the message retrieved.
The call
int band = 0;
int flags = MSG_ANY;
getpmsg(fd, ctlptr, dataptr, &band, &flags);
is equivalent to
int flags = 0;
getmsg(fd, ctlptr, dataptr, &flags);
If MSG_HIPRI is set and *bandp is nonzero, getpmsg fails with EINVAL.
The M_SETOPTS message allows a driver or module to exercise control over certain Stream head processing. An M_SETOPTS can be sent upstream at any time. The Stream head responds to the message by altering the processing associated with certain system calls. The options to be modified are specified by the contents of the stroptions structure contained in the message.
Six Stream head characteristics can be modified. Four characteristics correspond to fields contained in queue (minimum/maximum packet sizes and high/low water marks). The other two are discussed here.
The value for read options (so_readopt) corresponds to two sets of three modes a user can set using the I_SRDOPT ioctl [see streamio(7)] call. The first set deals with data and message boundaries:
Byte-stream mode approximates pipe data transfer. Message nondiscard mode approximates a TTY in canonical mode.
The second set deals with the treatment of protocol messages by the read(2) system call:
The value for write offset (so_wroff) is a hook to allow more efficient data handling. In every data message generated by a write(2) system call, and in the first M_DATA block of the data portion of every message generated by a putmsg(2) call, the Stream head leaves so_wroff bytes of space at the beginning of the message block. Expressed as a C language construct:
bp->b_rptr = bp->b_datap->db_base + so_wroff;
The write offset value must be smaller than the maximum STREAMS message size, STRMSGSZ. In certain cases (for example, if a buffer large enough to hold the offset+data is not currently available), the write offset might not be included in the block. To handle all possibilities, modules and drivers should not assume that the offset exists in a message, but should always check the message.
The intended use of write offset is to leave room for a module or a driver to place a protocol header before user data in the message instead of allocating and prepending a separate message.
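Because the offset is not guaranteed to be present, a module has to measure the space in front of the data before using it. The following user-space sketch shows that check. The `wo_` names are hypothetical and the structures carry only the fields needed here. A real module would also confirm that the data block is not shared before writing into it, and would fall back to allocating and prepending a separate block when the function reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct wo_datab {
    unsigned char *db_base;   /* start of the buffer */
    unsigned char *db_lim;    /* one past the end of the buffer */
};

struct wo_msgb {
    unsigned char   *b_rptr;  /* start of the user data */
    unsigned char   *b_wptr;  /* end of the user data */
    struct wo_datab *b_datap;
};

/* Place hdr_len header bytes directly in front of the data if the block
 * has enough room before b_rptr. Returns 1 on success and 0 if the
 * headroom is too small, in which case the block is left unchanged. */
int wo_prepend_in_place(struct wo_msgb *bp, const void *hdr, size_t hdr_len)
{
    size_t headroom;

    assert(bp != NULL && hdr != NULL);
    headroom = (size_t)(bp->b_rptr - bp->b_datap->db_base);
    if (headroom < hdr_len)
        return 0;
    bp->b_rptr -= hdr_len;
    memcpy(bp->b_rptr, hdr, hdr_len);
    return 1;
}
```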
Message queues grow when the STREAMS scheduler is delayed from calling a service procedure because of system activity, or when the procedure is blocked by flow control. When called by the scheduler, the service procedure processes enqueued messages in a FIFO manner. However, expedited data support and certain conditions require that associated messages (for example, an M_ERROR) reach their Stream destination as rapidly as possible. This is done by associating priorities with the messages. These priorities imply a certain ordering of messages on the queue as shown in Figure 5-2. Each message has a priority band associated with it. Ordinary messages have a priority of zero. High-priority messages are high priority by nature of their message type. Their priority band is ignored. By convention, they are not affected by flow control. The putq utility routine places high-priority messages at the head of the message queue followed by priority band messages (expedited data) and ordinary messages.

Figure 5-2 Message Ordering on a Queue
When a message is queued, it is placed after the messages of the same priority already on the queue (that is, FIFO within their order of queuing). This affects the flow control parameters associated with the band of the same priority. Message priorities range from 0 (normal) to 255 (highest). This provides up to 256 bands of message flow within a Stream. Expedited data can be implemented with one extra band of flow (priority band 1) of data. This is shown in Figure 5-3.
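The ordering rules can be modeled with a singly linked list. This is a simplified user-space sketch, not the putq implementation: the `pq_` names are hypothetical, a flag stands in for "the message type is high priority", and flow control accounting is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct pq_msg {
    struct pq_msg *next;
    int            hipri;   /* nonzero for a high-priority message type */
    unsigned char  band;    /* priority band 0..255; ignored when hipri */
    int            id;      /* label used only to observe the ordering */
};

struct pq_queue {
    struct pq_msg *first;
};

/* High-priority messages outrank every band; bands rank by number. */
static int pq_rank(const struct pq_msg *m)
{
    return m->hipri ? 256 : m->band;
}

/* Insert after all messages of the same or higher rank, which keeps
 * FIFO order among messages of equal priority. */
void pq_put(struct pq_queue *q, struct pq_msg *m)
{
    struct pq_msg **pp;

    assert(q != NULL && m != NULL);
    if (m->hipri)
        m->band = 0;        /* the band of a high-priority message is unused */
    pp = &q->first;
    while (*pp != NULL && pq_rank(*pp) >= pq_rank(m))
        pp = &(*pp)->next;
    m->next = *pp;
    *pp = m;
}

/* Remove and return the first message, or NULL if the queue is empty. */
struct pq_msg *pq_get(struct pq_queue *q)
{
    struct pq_msg *m = q->first;

    if (m != NULL) {
        q->first = m->next;
        m->next = NULL;
    }
    return m;
}
```

With one expedited band in use, messages queued in the order normal, band 1, high priority, normal, band 1, high priority come back as the two high-priority messages, then the two band 1 messages, then the two normal messages, each pair in arrival order.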

High-priority messages are not subject to flow control. When they are queued by putq, the associated queue is always scheduled (in the same way as any queue; following all other queues currently scheduled). When the service procedure is called by the scheduler, the procedure uses getq to retrieve the first message on queue, which will be a high-priority message, if present. service procedures must be implemented to act on high-priority messages immediately. The above mechanisms - priority message queuing, absence of flow control, and immediate processing by a procedure - result in rapid transport of high-priority messages between the originating and destination components in the Stream.
The following routines aid users in controlling each priority band of data flow:
flushband is discussed in Section 3.4.3.2, and
is discussed in Section 5.3.2.1.
The strqget routine allows modules and drivers to obtain information about a queue or particular band of the queue. This insulates the STREAMS data structures from the modules and drivers. The format of the routine is:
int strqget(queue_t *q, qfields_t what, unsigned char pri, long *valp)
The information is returned in the long referenced by valp. The fields that can be obtained are defined by Figure 5-4.
typedef enum qfields {
QHIWAT = 0, /* q_hiwat or qb_hiwat */
QLOWAT = 1, /* q_lowat or qb_lowat */
QMAXPSZ = 2, /* q_maxpsz */
QMINPSZ = 3, /* q_minpsz */
QCOUNT = 4, /* q_count or qb_count */
QFIRST = 5, /* q_first or qb_first */
QLAST = 6, /* q_last or qb_last */
QFLAG = 7, /* q_flag or qb_flag */
QBAD = 8
} qfields_t;
This routine returns 0 on success and an error number on failure.
The routine strqset allows modules and drivers to change information about a queue or particular band of the queue. This also insulates the STREAMS data structures from the modules and drivers. Its format is
int strqset(queue_t *q, qfields_t what, unsigned char pri, long val)
The updated information is provided by val. strqset returns 0 on success and an error number on failure. If the field is intended to be read-only, then the error EPERM is returned and the field is left unchanged. The following fields are currently read-only: QCOUNT, QFIRST, QLAST, and QFLAG.
The strqget and strqset routines must be bracketed by freezestr and unfreezestr.
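The accessor pattern, including the EPERM result for read-only fields, can be illustrated with a user-space model. Everything named `sq_` or `SQ_` below is hypothetical. The model covers a single queue with no bands and no freezing, and the use of EINVAL for an unknown field is an assumption of the sketch rather than documented behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef enum sq_field {
    SQ_HIWAT, SQ_LOWAT, SQ_MAXPSZ, SQ_MINPSZ, SQ_COUNT
} sq_field_t;

struct sq_queue {
    long hiwat, lowat, maxpsz, minpsz, count;
};

/* Read one field into *valp. Returns 0 on success or an error number. */
int sq_get(const struct sq_queue *q, sq_field_t what, long *valp)
{
    assert(q != NULL && valp != NULL);
    switch (what) {
    case SQ_HIWAT:  *valp = q->hiwat;  return 0;
    case SQ_LOWAT:  *valp = q->lowat;  return 0;
    case SQ_MAXPSZ: *valp = q->maxpsz; return 0;
    case SQ_MINPSZ: *valp = q->minpsz; return 0;
    case SQ_COUNT:  *valp = q->count;  return 0;
    }
    return EINVAL;              /* unknown field: assumption of this model */
}

/* Change one field. Read-only fields return EPERM and stay unchanged. */
int sq_set(struct sq_queue *q, sq_field_t what, long val)
{
    assert(q != NULL);
    switch (what) {
    case SQ_HIWAT:  q->hiwat = val;  return 0;
    case SQ_LOWAT:  q->lowat = val;  return 0;
    case SQ_MAXPSZ: q->maxpsz = val; return 0;
    case SQ_MINPSZ: q->minpsz = val; return 0;
    case SQ_COUNT:  return EPERM;   /* maintained by the queue itself */
    }
    return EINVAL;
}
```

Routing all access through two functions is what lets the underlying structure change without breaking modules, which is the insulation the text describes.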
The ioctls I_FLUSHBAND, I_CKBAND, I_GETBAND, I_CANPUT, and I_ATMARK support multiple bands of data flow. The ioctl I_FLUSHBAND allows a user to flush a particular band of messages and is discussed in more detail in Section 3.4.3.2.
The ioctl I_CKBAND allows a user to check if a message of a given priority exists on the Stream head read queue. Its interface is
ioctl(fd, I_CKBAND, pri);
This returns 1 if a message of priority pri exists on the Stream head read queue and 0 if no message of priority pri exists. If an error occurs, -1 is returned. Note that pri should be of type int.
The ioctl I_GETBAND allows a user to check the priority of the first message on the Stream head read queue. The interface is
ioctl(fd, I_GETBAND, prip);
This results in the integer referenced by prip being set to the priority band of the message on the front of the Stream head read queue.
The ioctl I_CANPUT allows a user to check if a certain band is writable. Its interface is
ioctl(fd, I_CANPUT, pri);
The return value is 0 if the priority band pri is flow controlled, 1 if the band is writable, and -1 on error.
The field b_flag of the msgb structure can have a flag MSGMARK that allows a module or driver to mark a message. This is used to support TCP's (Transmission Control Protocol) ability to show the user the last byte of out-of-band data. Once marked, a message sent to the Stream head causes the Stream head to remember the message. A user may check to see if the message on the front of its Stream head read queue is marked with the I_ATMARK ioctl. If a user is reading data from the Stream head and there are multiple messages on the read queue, and one of those messages is marked, read(2) terminates when it reaches the marked message and returns the data only up to that marked message. The rest of the data may be obtained with successive reads.
The ioctl I_ATMARK has the following format:
ioctl(fd, I_ATMARK, flag);
where flag may be either ANYMARK or LASTMARK. ANYMARK indicates that the user merely wants to check if the message is marked. LASTMARK indicates that the user wants to see if the message is the only one marked on the queue. If the test succeeds, 1 is returned. On failure, 0 is returned. If an error occurs, -1 is returned.
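The difference between the two flag values can be shown with a model of the read queue as a linked list. This is a sketch under assumptions: the `mk_` names and the constant values are hypothetical, and a single integer stands in for the MSGMARK bit of b_flag.

```c
#include <assert.h>
#include <stddef.h>

enum { MK_ANYMARK = 1, MK_LASTMARK = 2 };   /* stand-ins for ANYMARK, LASTMARK */

struct mk_msg {
    struct mk_msg *next;
    int            marked;   /* nonzero if the message carries the mark */
};

/* Model of the I_ATMARK test on a read queue whose first message is
 * `first`. Returns 1 if the test succeeds, 0 if it fails, and -1 for
 * an unknown flag value. */
int mk_atmark(const struct mk_msg *first, int flag)
{
    const struct mk_msg *m;

    if (flag != MK_ANYMARK && flag != MK_LASTMARK)
        return -1;
    if (first == NULL || !first->marked)
        return 0;            /* nothing queued, or the front is unmarked */
    if (flag == MK_ANYMARK)
        return 1;
    /* LASTMARK: succeed only if no later message is also marked. */
    for (m = first->next; m != NULL; m = m->next)
        if (m->marked)
            return 0;
    return 1;
}
```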
service procedures, message queues, message priority, and basic flow control are all intertwined in STREAMS. A queue generally does not use its message queue if there is no service procedure in the queue. The function of a service procedure is to process messages on its queue. Message priority and flow control are associated with message queues.
The operation of a queue revolves around the queue structure. See queue(D4) for details.
Queues are always allocated in pairs (read and write); one queue pair per module, driver, or Stream head. A queue contains a linked list of messages. When a queue pair is allocated, the following fields are initialized by STREAMS:
Copying values from module_info allows them to be changed in the queue without modifying the streamtab and module_info values.
q_count and qb_count are used in flow control calculations and represent the number of bytes in the various bands on the queue.
Modules and drivers should use STREAMS utility routines.
Modules and drivers can change q_ptr. Modules and drivers can read but should not change q_qinfo, q_bandp, and q_nband.
Modules and drivers need locks for their private data structures (just as the STREAMS code protects the q_next pointer, for example).
Modules and drivers can change q_ptr, q_minpsz, q_maxpsz, q_hiwat, and q_lowat. Modules and drivers can read but should not change q_qinfo, q_next, q_bandp, and q_nband.
See queue(D4) for a list of flags that you can test.
The queue flow information for each band is contained in a qband structure. This structure is not visible to a module/driver, although some information in it may be read and written using strqget and strqset.
qband includes a field analogous to the queue's q_count field. However, the field only applies to the messages on the queue in the band of data flow represented by the corresponding qband structure. (In contrast, q_count only contains information regarding normal and high-priority messages.)
Each band has a separate high- and low-water mark. These are initially set to the queue's q_hiwat and q_lowat respectively. Modules and drivers may change these values if desired through the strqset function.
The qband structures are not preallocated per queue. They are allocated when a message with a priority greater than zero is placed on the queue by putq, putbq, or insq. Because band allocation can fail, these routines return 0 on failure and 1 on success. Once a qband structure is allocated, it remains associated with the queue until the queue is freed. strqset and strqget will cause qband allocation to occur.
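On-demand band allocation can be sketched as follows. The model is a user-space illustration with hypothetical `qb_` names: it keeps the bands in a growable array rather than in whatever form the kernel uses, and it shows only the two behaviors described above, namely that every band up to the requested one is created and that new bands inherit the water marks of the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct qb_band {
    unsigned long hiwat, lowat, count;
};

struct qb_queue {
    unsigned long   hiwat, lowat;   /* water marks of the queue itself */
    struct qb_band *bands;          /* bands[i] describes band i + 1 */
    unsigned char   nband;          /* number of bands allocated so far */
};

/* Make sure bands 1..band exist. Returns 1 on success and 0 if memory
 * could not be obtained, in which case the queue is left unchanged. */
int qb_ensure(struct qb_queue *q, unsigned char band)
{
    struct qb_band *nb;
    unsigned int i;

    assert(q != NULL);
    if (band <= q->nband)
        return 1;                   /* band 0 or an existing band */
    nb = realloc(q->bands, (size_t)band * sizeof *nb);
    if (nb == NULL)
        return 0;
    for (i = q->nband; i < band; i++) {
        nb[i].hiwat = q->hiwat;     /* new bands start with the queue's marks */
        nb[i].lowat = q->lowat;
        nb[i].count = 0;
    }
    q->bands = nb;
    q->nband = band;
    return 1;
}
```

Existing bands keep any water marks that were changed after allocation, because only the newly created entries are initialized.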
Use the STREAMS utility routines when manipulating the fields in the qband structure. Use the routines strqset and strqget to access band information.
put procedures are generally required in pushable modules. service procedures are optional. If the put routine enqueues messages, you need a corresponding service routine to handle the enqueued messages. If the put routine does not enqueue messages, you do not need the service routine.
The general processing flow when both procedures are present is as follows:
This processing continues until the queue is empty (getq does not return a message) or flow control blocks further processing. The service procedure returns to the caller.
| A service or put procedure must never sleep since it has no user context. It must always return to its caller. |
If no processing is required in the put procedure, the procedure does not have to be explicitly declared. However, putq can be placed in the qinit structure declaration for the appropriate queue side to queue the message for the service procedure, for example:
static struct qinit winit = { putq, modwsrv, ...... };
Typically, put procedures will, at a minimum, process high-priority messages to avoid queuing them. If M_FLUSH messages are queued, there is a danger that a message queued after the M_FLUSH will be discarded when the M_FLUSH is processed.
The key attribute of a service procedure in the STREAMS architecture is delayed processing. When a service procedure is used in a module, the module developer is implying that there are other, more time-sensitive activities to be performed elsewhere in this Stream, in other Streams, or in the system in general. The presence of a service procedure is mandatory if the flow control mechanism is to be used by the queue.
The delay for STREAMS to call a service procedure varies with implementation and system activity.
If a module or driver wishes to recognize priority bands, the service procedure is written to the following algorithm:
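.
.
while ((bp = getq(q)) != NULL) {
#ifdef MT
    if (pcmsg(bp->b_datap->db_type)) {
#else
    if (bp->b_datap->db_type >= QPCTL) {
#endif
        /* High-priority message: forward it regardless of flow control. */
        putnext(q, bp);
#ifdef MT
    } else if (bcanputnext(q, bp->b_band)) {
#else
    } else if (bcanput(q->q_next, bp->b_band)) {
#endif
        /* The band is not flow controlled: forward the message. */
        putnext(q, bp);
    } else {
        /* The band is blocked: put the message back and stop. */
        putbq(q, bp);
        return;
    }
}
.
.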
The STREAMS flow control mechanism is voluntary and operates between the two nearest queues in a Stream containing service procedures (see Figure 5-5). Messages are generally held on a queue only if a service procedure is present in the associated queue. Flow control is applied per band. Each band has its own high- and low-water marks.
Messages accumulate on a queue when the queue's service procedure processing does not keep pace with the message arrival rate, or when the procedure is blocked from placing its messages on the following Stream component by the flow control mechanism. Pushable modules contain independent upstream and downstream limits. The Stream head contains a preset upstream limit (which can be modified by a special message sent from downstream) and a driver may contain a downstream limit.
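The interaction between the water marks and a blocked writer can be modeled in a few lines. This is a simplified sketch for one band, with hypothetical `fc_` names. The exact comparisons are assumptions of the model: the queue is treated as full once its byte count reaches the high-water mark, and a blocked writer is back-enabled once the count has fallen to the low-water mark or below.

```c
#include <assert.h>
#include <stddef.h>

struct fc_queue {
    size_t hiwat;          /* high-water mark in bytes */
    size_t lowat;          /* low-water mark in bytes */
    size_t count;          /* bytes currently on the queue */
    int    full;           /* set once count has reached hiwat */
    int    want_write;     /* a writer found the queue full */
    int    back_enables;   /* times a blocked writer was rescheduled */
};

/* Test made by the sender before forwarding an ordinary message. */
int fc_canput(struct fc_queue *q)
{
    assert(q != NULL);
    if (q->full) {
        q->want_write = 1;     /* remember that someone is waiting */
        return 0;
    }
    return 1;
}

/* Account for a message of nbytes placed on the queue. */
void fc_enqueue(struct fc_queue *q, size_t nbytes)
{
    q->count += nbytes;
    if (q->count >= q->hiwat)
        q->full = 1;
}

/* Account for a message of nbytes removed from the queue. */
void fc_dequeue(struct fc_queue *q, size_t nbytes)
{
    q->count -= (nbytes < q->count) ? nbytes : q->count;
    if (q->count <= q->lowat) {
        q->full = 0;
        if (q->want_write) {   /* wake the writer that was turned away */
            q->want_write = 0;
            q->back_enables++;
        }
    }
}
```

The gap between the two marks provides hysteresis in this model: a queue that has filled stays closed until it has drained substantially, so the sender is not rescheduled for every message removed.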

Flow control operates as follows:
Modules and drivers need to observe the message priority. High-priority messages, determined by the type of the first block in the message, are not subject to flow control. They are processed immediately and forwarded, as appropriate.
For ordinary messages, flow control must be tested before any processing is performed.
The
utility determines if the forward path from the queue is blocked by flow control.
This is the general flow control processing of ordinary messages:
The canonical representation of this processing within a service procedure is as follows:
while (getq != NULL)
    if (high priority message || no flow control)
        process message
        putnext
    else
        putbq
        return
Expedited data have their own flow control with the same general processing as that of ordinary messages.
provides modules and drivers with a way to test flow control in the given priority band. It returns 1 if a message of the given priority can be placed on the queue, returns 0 if the priority band is
flow controlled, and if the band does not yet exist on the queue in question, the routine returns 1.
Banded data has separate flow control. In other words, bands 1 through 255 operate totally independently. Any band greater than or equal to band 1, when flow controlled, will stop band 0 data (normal data).
| A service procedure must process all messages on its queue unless flow control prevents this. |
A service procedure continues processing messages from its queue until getq returns NULL. When an ordinary message is enqueued by putq, putq causes the service procedure to be scheduled only if the queue was previously empty and a previous getq call returned NULL (that is, the QWANTR flag is set). If there are messages on the queue, putq presumes the service procedure is blocked by flow control and the procedure is automatically rescheduled by STREAMS when the block is removed. If the service procedure cannot complete processing as a result of conditions other than flow control (for example, no buffers), it must ensure it will return later [for example, by use of the bufcall utility routine] or it must discard all messages on the queue. If this is not done, STREAMS never schedules the service procedure to be run unless the queue's put procedure enqueues a priority message with putq.
| High-priority messages are discarded only if there is already a high-priority message on the Stream head read queue. Only one high-priority message can be present on the Stream head read queue at any time. |
putbq replaces messages at the beginning of the appropriate section of the message queue by their priority. This might not be the same position at which the message was retrieved by the preceding getq. A later getq might return a different message.
putq only looks at the priority band in the first message. If a high-priority message is passed to putq with a nonzero b_band value, b_band is reset to 0 before placing the message on the queue. If the message is passed to putq with a b_band value that is greater than the number of qband structures associated with the queue, putq tries to allocate a new qband structure for each band up to and including the band of the message.
The above also applies to putbq and insq. If an attempt is made to insert a message out of order in a queue by insq, the message is not inserted and the routine fails.
putq will not schedule a queue if noenable(q) has been previously called for this queue. noenable instructs putq to enqueue the message when called by this queue, but not to schedule the service procedure. noenable does not prevent the queue from being scheduled by a flow control back-enable. The inverse of noenable is enableok(q).
Driver upstream flow control is explained next as an example. Although device drivers typically discard input when they are unable to send it to a user process, STREAMS allows driver read-side flow control, possibly for handling temporary upstream blockages, through a driver read service procedure that is disabled during the driver open with noenable. If the driver input interrupt routine determines messages can be sent upstream, it sends the message with putnext. Otherwise, it calls putq to queue the message. The message waits on the message queue (possibly with queue length checked when new messages are enqueued by the interrupt routine) until the upstream queue becomes unblocked. When the blockage abates, STREAMS back-enables the driver read service procedure, which then sends the messages upstream using the mechanisms described previously. This is similar to looprsrv (see Section 8.3) where the service procedure is present only for flow control.
qenable, another flow control utility, allows a module or driver to cause one of its queues to be scheduled. qenable might also be used when a module or driver wants to delay message processing for some reason. An example is a buffer module that gathers messages in its message queue and forwards them as a single, larger message. This module uses noenable to inhibit its service procedure and queues messages with its put procedure until a certain byte count or "in queue" time has been reached. When either condition is met, the module calls qenable to cause its service procedure to run.
Another example is a communication line discipline module that implements end-to-end (that is, to a remote system) flow control. Outbound data is held on the write-side message queue until the read-side receives a transmit window from the remote end of the network.
| STREAMS routines are called at different priority levels. Interrupt routines are called at the interrupt priority of the interrupting device. Service routines are called with interrupts enabled (hence, service routines for STREAMS drivers can be interrupted by their own interrupt routines). Write-side put procedures may also be interrupted by their own interrupt routines. |
STREAMS can implement a service interface between any two components in a Stream, and between a user process and the topmost module in the Stream. A service interface is defined at the boundary between a service user and a service provider. A service interface is a set of primitives and the rules that define a service and the allowable state transitions that result as these primitives are passed between the user and the provider. These rules are typically represented by a state machine. In STREAMS, the service user and provider are implemented in a module, driver, or user process. The primitives are carried bidirectionally between a service user and provider in M_PROTO and M_PCPROTO messages.
PROTO messages (M_PROTO and M_PCPROTO) can be multiblock, with the second through last blocks of type M_DATA. The first block in a PROTO message contains the control part of the primitive in a form agreed on by the user and provider. The block is not intended to carry protocol headers. (Although its use is not recommended, upstream PROTO messages can have multiple PROTO blocks at the start of the message. getmsg(2) compacts the blocks into a single control part when sending to a user process.) The M_DATA block(s) contains any data part associated with the primitive. The data part may be processed in a module that receives it, or it may be sent to the next Stream component along with any data generated by the module. The contents of PROTO messages and their allowable sequences are determined by the service interface.
PROTO messages can be sent bidirectionally (upstream and downstream) on a Stream and between a Stream and a user process. putmsg(2) and getmsg(2) system calls are analogous, respectively, to write(2) and read(2) except that the former allow both data and control parts to be (separately) passed, and they retain the message boundaries across the user-Stream interface. putmsg(2) and getmsg(2) separately copy the control part (M_PROTO or M_PCPROTO block) and data part (M_DATA blocks) between the Stream and user process.
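As a rough illustration of how a multiblock PROTO message maps onto the separate control and data parts, the following userland sketch walks a simplified block chain. `struct blk`, `enum blk_type`, and `msg_part_sizes` are hypothetical; real message blocks are `mblk_t` structures, and the Stream head does considerably more work than this.

```c
#include <stddef.h>

/* Stand-ins for the M_DATA, M_PROTO, and M_PCPROTO message types. */
enum blk_type { BLK_DATA, BLK_PROTO, BLK_PCPROTO };

struct blk {
    enum blk_type type;
    size_t        len;    /* bytes between the read and write pointers */
    struct blk   *cont;   /* next block in the same message */
};

/*
 * Leading PROTO blocks are counted as the control part (several are
 * compacted into one control part); the M_DATA blocks that follow
 * form the data part.
 */
static void msg_part_sizes(const struct blk *mp, size_t *ctl_len, size_t *data_len)
{
    *ctl_len = 0;
    *data_len = 0;

    while (mp != NULL && mp->type != BLK_DATA) {
        *ctl_len += mp->len;
        mp = mp->cont;
    }
    for (; mp != NULL; mp = mp->cont) {
        if (mp->type == BLK_DATA)
            *data_len += mp->len;
    }
}
```

For a message made of one 8-byte PROTO block followed by M_DATA blocks of 50 and 100 bytes, the sketch reports an 8-byte control part and a 150-byte data part.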
An M_PCPROTO message is normally used to acknowledge primitives composed of other messages. M_PCPROTO ensures that the acknowledgment reaches the service user before any other message. If the service user is a user process, the Stream head only stores a single M_PCPROTO message, and discards subsequent M_PCPROTO messages until the first one is read with getmsg(2).
A STREAMS message format has been defined to simplify the design of service interfaces. System calls, getmsg(2) and putmsg(2), are available for sending messages downstream and receiving messages that are available at the Stream head.
This section describes the system calls getmsg and putmsg in the context of a service interface example. First, a brief overview of STREAMS service interfaces is presented.
A principal advantage of the STREAMS mechanism is its modularity. From the user level, kernel-resident modules can be dynamically interconnected to implement any reasonable processing sequence. This modularity reflects the layering characteristics of contemporary network architectures.
One benefit of modularity is the ability to interchange modules of like functions. For example, two distinct transport protocols, implemented as STREAMS modules, may provide a common set of services. An application or higher layer protocol that requires those services can use either module. This ability to substitute modules enables user programs and higher level protocols to be independent of the underlying protocols and physical communication media.
Each STREAMS module provides a set of processing functions, or services, and an interface to those services. The service interface of a module defines the interaction between that module and any neighboring modules, and is a necessary component for providing module substitution. By creating a well-defined service interface, applications and STREAMS modules can interact with any module that supports that interface, as shown in Figure 5-6.

Figure 5-6 Protocol Substitution
By defining a service interface through which applications interact with a transport protocol, it is possible to substitute a different protocol below that service interface in a way completely transparent to the application. In this example, the same application can run over the Transmission Control Protocol (TCP) and the ISO transport protocol. Of course, the service interface must define a set of services common to both protocols.
The three components of any service interface are the service user, the service provider, and the service interface itself, as shown in Figure 5-7.

Figure 5-7 Service Interface
Typically, a user makes a request of a service provider using some well-defined service primitive. Responses and event indications are also passed from the provider to the user using service primitives.
Each service interface primitive is a distinct STREAMS message that has two parts: a control part and a data part. The control part contains information that identifies the primitive and includes all necessary parameters. The data part contains user data associated with that primitive.
An example of a service interface primitive is a transport protocol connect request. This primitive requests the transport protocol service provider to establish a connection with another transport user. The parameters associated with this primitive may include a destination protocol address and specific protocol options to be associated with that connection. Some transport protocols also allow a user to send data with the connect request. A STREAMS message would be used to define this primitive. The control part would identify the primitive as a connect request and would include the protocol address and options. The data part would contain the associated user data.
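The service interface library example presented in Figure 5-8 through Figure 5-13 includes four functions (inter_open, inter_close, inter_snd, and inter_rcv) that enable a user to open a Stream to the service provider and bind a protocol address to it, close the Stream, send data to a remote user, and receive data from a remote user.
Figure 5-8 shows the structure and constant definitions required by the library. These typically reside in a header file associated with the service interface.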
/*
* Primitives initiated by the service user.
*/
#define BIND_REQ 1 /* bind request */
#define UNITDATA_REQ 2 /* unitdata request */
/*
* Primitives initiated by the service provider.
*/
#define OK_ACK 3 /* bind acknowledgment */
#define ERROR_ACK 4 /* error acknowledgment */
#define UNITDATA_IND 5 /* unitdata indication */
/*
* The following structure definitions define the format of the
* control part of the service interface message of the above
* primitives.
*/
struct bind_req { /* bind request */
int PRIM_type; /* always BIND_REQ */
int BIND_addr; /* addr to bind */
};
struct unitdata_req { /* unitdata request */
int PRIM_type; /* always UNITDATA_REQ */
int DEST_addr; /* destination addr */
};
struct ok_ack { /* positive acknowledgment */
int PRIM_type; /* always OK_ACK */
};
struct error_ack { /* error acknowledgment */
int PRIM_type; /* always ERROR_ACK */
int UNIX_error; /* UNIX system error code */
};
struct unitdata_ind { /* unitdata indication */
int PRIM_type; /* always UNITDATA_IND */
int SRC_addr; /* source addr */
};
/* union of all primitives */
union primitives {
int type;
struct bind_req bind_req;
struct unitdata_req unitdata_req;
struct ok_ack ok_ack;
struct error_ack error_ack;
struct unitdata_ind unitdata_ind;
};
/* header files needed by library */
#include <stropts.h>
#include <stdio.h>
#include <errno.h>
Five primitives have been defined. The first two represent requests from the service user to the service provider. These are as follows:
| BIND_REQ | Asks the provider to bind a specified protocol address (that is, give it a name on the network). It requires an acknowledgment from the provider to verify that the contents of the request are syntactically correct. |
| UNITDATA_REQ | Asks the provider to send data to the specified destination address. It does not require an acknowledgment from the provider. |
The three other primitives represent acknowledgments of requests, or indications of incoming events, and are passed from the service provider to the service user. These are as follows:
| OK_ACK | Informs the user that a previous bind request was received successfully by the service provider. |
| ERROR_ACK | Informs the user that a nonfatal error was found in the previous bind request. It indicates that no action was taken with the primitive that caused the error. |
| UNITDATA_IND | Indicates that data destined for the user have arrived. |
The defined structures describe the contents of the control part of each service interface message passed between the service user and service provider. The first field of each control part defines the type of primitive being passed.
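Because the primitive type is always the first field, a receiver can identify a control part before it knows which structure to apply. The sketch below assumes the primitive values from Figure 5-8; `prim_name` is a hypothetical helper that is not part of the library example.

```c
#include <string.h>

/* Primitive values as defined in Figure 5-8. */
#define BIND_REQ     1
#define UNITDATA_REQ 2
#define OK_ACK       3
#define ERROR_ACK    4
#define UNITDATA_IND 5

/*
 * Return the name of the primitive held in a control part of len
 * bytes, or NULL if the part is too short or the type is unknown.
 */
static const char *prim_name(const void *ctl, int len)
{
    int type;

    if (len < (int)sizeof(int))      /* not enough to hold the type */
        return NULL;
    memcpy(&type, ctl, sizeof(type));

    switch (type) {
    case BIND_REQ:     return "BIND_REQ";
    case UNITDATA_REQ: return "UNITDATA_REQ";
    case OK_ACK:       return "OK_ACK";
    case ERROR_ACK:    return "ERROR_ACK";
    case UNITDATA_IND: return "UNITDATA_IND";
    default:           return NULL;
    }
}
```

The length is compared as a signed value so that a negative length (used here to mean "no control part") is rejected rather than converted to a large unsigned number.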
The first routine presented, inter_open, opens the protocol driver device file specified by path and binds the protocol address contained in addr so that it may receive data. On success, the routine returns the file descriptor associated with the open Stream; on failure, it returns -1 and sets errno to indicate the appropriate error value.
int inter_open(char *path, int oflags, int addr)
{
int fd;
struct bind_req bind_req;
struct strbuf ctlbuf;
union primitives rcvbuf;
struct error_ack *error_ack;
int flags;
if ((fd = open(path, oflags)) < 0)
return(-1);
/* send bind request msg down stream */
bind_req.PRIM_type = BIND_REQ;
bind_req.BIND_addr = addr;
ctlbuf.len = sizeof(struct bind_req);
ctlbuf.buf = (char *)&bind_req;
if (putmsg(fd, &ctlbuf, NULL, 0) < 0) {
close(fd);
return(-1);
}
After opening the protocol driver, inter_open packages a bind request message to send downstream. putmsg is called to send the request to the service provider. The bind request message contains a control part that holds a bind_req structure, but it has no data part. ctlbuf is a structure of type strbuf, and is initialized with the primitive type and address. Notice that the maxlen field of ctlbuf is not set before calling putmsg, because putmsg ignores this field. The dataptr argument to putmsg is set to NULL to indicate that the message contains no data part. Also, the flags argument is 0, which specifies that the message is not a high-priority message.
After inter_open sends the bind request, it must wait for an acknowledgment from the service provider, as shown in Figure 5-10.
/* wait for ack of request */
ctlbuf.maxlen = sizeof(union primitives);
ctlbuf.len = 0;
ctlbuf.buf = (char *)&rcvbuf;
flags = RS_HIPRI;
if (getmsg(fd, &ctlbuf, NULL, &flags) < 0) {
close(fd);
return(-1);
}
/* did we get enough to determine type */
if (ctlbuf.len < (int)sizeof(int)) { /* signed compare: len may be negative */
close(fd);
errno = EPROTO;
return(-1);
}
/* switch on type (first int in rcvbuf) */
switch(rcvbuf.type) {
default:
close(fd);
errno = EPROTO;
return(-1);
case OK_ACK:
return(fd);
case ERROR_ACK:
if (ctlbuf.len < (int)sizeof(struct error_ack)) {
close(fd);
errno = EPROTO;
return(-1);
}
error_ack = (struct error_ack *)&rcvbuf;
close(fd);
errno = error_ack->UNIX_error;
return(-1);
}
}
getmsg is called to retrieve the acknowledgment of the bind request. The acknowledgment message consists of a control part that contains either an ok_ack or error_ack structure, and no data part.
The acknowledgment primitives are defined as high-priority messages. Messages are queued in FIFO order within their priority at the Stream head; high-priority messages are placed at the front of the Stream head queue, followed by priority band messages and ordinary messages. The STREAMS mechanism allows only one high-priority message per Stream at the Stream head at one time; any further high-priority messages are freed until the message at the Stream head is processed. High-priority messages are particularly suitable for acknowledging service requests when the acknowledgment should be placed ahead of any other messages at the Stream head.
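The ordering rules at the Stream head can be sketched with a small array-based model. Everything here is hypothetical (`struct head_queue`, `head_enqueue`, the three-way `enum msg_class`); in particular, the model treats all priority bands as one class and is not how the Stream head is implemented.

```c
#include <stddef.h>

enum msg_class { MSG_NORMAL = 0, MSG_BAND = 1, MSG_HIPRI = 2 };

#define HEAD_MAX 16

struct head_queue {
    int            id[HEAD_MAX];    /* message identifiers, front first */
    enum msg_class cls[HEAD_MAX];   /* class of each queued message */
    size_t         count;
};

/* Returns 1 if the message was queued, 0 if it was discarded. */
static int head_enqueue(struct head_queue *h, int id, enum msg_class cls)
{
    size_t pos, i;

    if (h->count == HEAD_MAX)
        return 0;
    /* Only one high-priority message may wait at the Stream head. */
    if (cls == MSG_HIPRI && h->count > 0 && h->cls[0] == MSG_HIPRI)
        return 0;

    /* Skip messages of the same or higher class: FIFO within a class. */
    for (pos = 0; pos < h->count && h->cls[pos] >= cls; pos++)
        ;
    for (i = h->count; i > pos; i--) {
        h->id[i] = h->id[i - 1];
        h->cls[i] = h->cls[i - 1];
    }
    h->id[pos] = id;
    h->cls[pos] = cls;
    h->count++;
    return 1;
}
```

Queuing two ordinary messages, then a high-priority message, then a band message leaves the high-priority message at the front, followed by the band message and the two ordinary messages in arrival order; a second high-priority message is discarded.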
Before calling getmsg, this routine must initialize the strbuf structure for the control part. buf should point to a buffer large enough to hold the expected control part, and maxlen must be set to show the maximum number of bytes this buffer can hold.
Because neither acknowledgment primitive contains a data part, the dataptr argument to getmsg is set to NULL. The flagsp argument points to an integer containing the value RS_HIPRI. This flag indicates that getmsg should wait for a STREAMS high-priority message before returning. It is set because we want to catch the acknowledgment primitives that are priority messages. Otherwise, if the flag is zero, the first message is taken. With RS_HIPRI set, even if a normal message is available, getmsg will block until a high-priority message arrives.
On return from getmsg, the len field is checked to ensure that the control part of the retrieved message is an appropriate size. The example then checks the primitive type and takes appropriate actions. An OK_ACK indicates a successful bind operation, and inter_open returns the file descriptor of the open Stream. An ERROR_ACK indicates a bind failure, and errno is set to identify the problem with the request.
The next routine in the service interface library example is inter_close, which closes the Stream to the service provider.
int inter_close(int fd)
{
return close(fd);
}
The routine simply closes the given file descriptor. This routine causes the protocol driver to free any resources associated with that Stream. For example, the driver may unbind the protocol address that had previously been bound to that Stream, thereby freeing that address for use by some other service user.
The third routine, inter_snd, passes data to the service provider for transmission to the user at the address specified in addr. The data to be transmitted is contained in the buffer pointed to by buf, which holds len bytes. On successful completion, this routine returns the number of bytes of data passed to the service provider; on failure, it returns -1 and sets errno to an appropriate error value.
int inter_snd(int fd, char *buf, int len, int addr)
{
struct strbuf ctlbuf;
struct strbuf databuf;
struct unitdata_req unitdata_req;
unitdata_req.PRIM_type = UNITDATA_REQ;
unitdata_req.DEST_addr = addr;
ctlbuf.len = sizeof(struct unitdata_req);
ctlbuf.buf = (char *)&unitdata_req;
databuf.len = len;
databuf.buf = buf;
if (putmsg(fd, &ctlbuf, &databuf, 0) < 0) {
errno = EIO;
return(-1);
}
return(len);
}
In this example, the data request primitive is packaged with both a control part and a data part. The control part contains a unitdata_req structure that identifies the primitive type and the destination address of the data. The data to be transmitted is placed in the data part of the request message.
Unlike the bind request, the data request primitive requires no acknowledgment from the service provider. In the example, this choice was made to minimize the overhead during data transfer. If the putmsg call succeeds, this routine assumes all is well and returns the number of bytes passed to the service provider.
The final routine in this example, inter_rcv, retrieves the next available data. buf points to a buffer where the data should be stored, len shows the size of that buffer, and addr points to an integer where the source address of the data will be placed. On successful completion, inter_rcv returns the number of bytes in the retrieved data; on failure, it returns -1 and sets errno to the appropriate UNIX System error value.
int inter_rcv(int fd, char *buf, int len, int *addr)
{
struct strbuf ctlbuf;
struct strbuf databuf;
struct unitdata_ind unitdata_ind;
int retval;
int flagsp;
ctlbuf.maxlen = sizeof(struct unitdata_ind);
ctlbuf.len = 0;
ctlbuf.buf = (char *)&unitdata_ind;
databuf.maxlen = len;
databuf.len = 0;
databuf.buf = buf;
flagsp = 0;
if ((retval = getmsg(fd, &ctlbuf, &databuf, &flagsp)) < 0) {
errno = EIO;
return(-1);
}
if (retval) {
errno = EIO;
return(-1);
}
if (unitdata_ind.PRIM_type != UNITDATA_IND) {
errno = EPROTO;
return(-1);
}
*addr = unitdata_ind.SRC_addr;
return(databuf.len);
}
getmsg is called to retrieve the data indication primitive, where that primitive contains both a control and data part. The control part consists of a unitdata_ind structure that identifies the primitive type and the source address of the data sender. The data part contains the data itself.
In ctlbuf, buf must point to a buffer where the control information will be stored, and maxlen must be set to indicate the maximum size of that buffer. Similar initialization is done for databuf.
The integer pointed at by flagsp in the getmsg call is set to zero, indicating that the next message should be retrieved from the Stream head, regardless of its priority. Data will arrive in normal priority messages. If no message currently exists at the Stream head, getmsg will block until a message arrives.
The user's control and data buffers should be large enough to hold any incoming data. If both buffers are large enough, getmsg processes the data indication and returns 0, indicating that a full message was retrieved successfully. However, if either buffer is not large enough, getmsg only retrieves the part of the message that fits into each user buffer. The remainder of the message is saved for later retrieval (if in message non-discard mode), and a positive, nonzero value is returned to the user. MORECTL indicates that more control information is waiting for retrieval, MOREDATA indicates that more data is waiting for retrieval, and (MORECTL | MOREDATA) indicates that data from both parts of the message remains. In the example, if the user buffers are not large enough (that is, getmsg returns a positive, nonzero value), the function will set errno to EIO and fail.
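The partial-retrieval behavior can be illustrated with a userland model of a single message being read in pieces. This is a sketch only: `model_getmsg` and `struct model_msg` are hypothetical, `MODEL_MORECTL` and `MODEL_MOREDATA` are local stand-ins for the constants in <stropts.h> (their numeric values here are assumed for the model), and the model does not reproduce every getmsg(2) convention for the len fields.

```c
/* Local stand-ins for MORECTL and MOREDATA. */
#define MODEL_MORECTL  1
#define MODEL_MOREDATA 2

struct model_msg {
    int ctl_left;     /* control bytes not yet retrieved */
    int data_left;    /* data bytes not yet retrieved */
};

/*
 * Copy out as much of each part as fits (non-discard mode) and report
 * what remains, in the style of the getmsg return value.
 */
static int model_getmsg(struct model_msg *m, int ctl_max, int data_max,
                        int *ctl_got, int *data_got)
{
    int ret = 0;

    *ctl_got = (m->ctl_left < ctl_max) ? m->ctl_left : ctl_max;
    *data_got = (m->data_left < data_max) ? m->data_left : data_max;
    m->ctl_left -= *ctl_got;
    m->data_left -= *data_got;

    if (m->ctl_left > 0)
        ret |= MODEL_MORECTL;
    if (m->data_left > 0)
        ret |= MODEL_MOREDATA;
    return ret;
}
```

A caller that prefers not to fail on a short buffer, as inter_rcv does, could call the routine again while the return value is nonzero and append each piece to its own storage.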
The type of primitive returned by getmsg is checked to make sure it is a data indication (UNITDATA_IND in the example). The source address is then set and the number of bytes of data is returned.
The example presented is a simplified service interface. The state transition rules for such an interface were not presented for the sake of brevity. The intent was to show typical uses of the putmsg and getmsg system calls. See putmsg(2) and getmsg(2) for further details. For simplicity, this example also did not consider expedited data.
Figure 5-14 and Figure 5-15 show an example of part of a module that illustrates the concept of a service interface. The module implements a simple service interface and mirrors the service interface library example given earlier. The following rules pertain to service interfaces:
In addition, modules and drivers must observe the rule that high-priority messages are not subject to flow control, and forward them accordingly.
The service interface primitives are defined in the declarations as shown in Figure 5-14.
#include <sys/types.h>
#ifdef MT
#include "sys/mplock.h"
#endif
#include <sys/param.h>
#include <sys/stream.h>
#include <sys/errno.h>
/* Primitives initiated by the service user */
#define BIND_REQ 1 /* bind request */
#define UNITDATA_REQ 2 /* unitdata request */
/* Primitives initiated by the service provider */
#define OK_ACK 3 /* bind acknowledgment */
#define ERROR_ACK 4 /* error acknowledgment */
#define UNITDATA_IND 5 /* unitdata indication */
/*
* The following structures define the format of the
* stream message block of the above primitives.
*/
struct bind_req { /* bind request */
int PRIM_type; /* always BIND_REQ */
int BIND_addr; /* addr to bind */
};
struct unitdata_req { /* unitdata request */
int PRIM_type; /* always UNITDATA_REQ */
int DEST_addr; /* dest addr */
};
struct ok_ack { /* ok acknowledgment */
int PRIM_type; /* always OK_ACK */
};
struct error_ack { /* error acknowledgment */
int PRIM_type; /* always ERROR_ACK */
int UNIX_error; /* UNIX system error code */
};
struct unitdata_ind { /* unitdata indication */
int PRIM_type; /* always UNITDATA_IND */
int SRC_addr; /* source addr */
};
union primitives { /* union of all primitives */
int type;
struct bind_req bind_req;
struct unitdata_req unitdata_req;
struct ok_ack ok_ack;
struct error_ack error_ack;
struct unitdata_ind unitdata_ind;
};
struct dgproto { /* structure per minor device */
short state; /* current provider state */
int addr; /* net address */
};
/* Provider states */
#define IDLE 0
#define BOUND 1
In general, the M_PROTO or M_PCPROTO block is described by a data structure containing the service interface information. In this example, union primitives is that structure.
Two commands are recognized by the module:
| BIND_REQ | Give this Stream a protocol address (that is, give it a name on the network). After a BIND_REQ is completed, data from other senders will find their way through the network to this particular Stream. |
| UNITDATA_REQ | Send data to the specified address. |
Three messages are generated:
| OK_ACK | A positive acknowledgment (ack) of BIND_REQ. |
| ERROR_ACK | A negative acknowledgment (nak) of BIND_REQ. |
| UNITDATA_IND | Data from the network have been received (this code is not shown). |
The acknowledgment of a BIND_REQ informs the user that the request was syntactically correct (or incorrect if ERROR_ACK). The receipt of a BIND_REQ is acknowledged with an M_PCPROTO to ensure that the acknowledgment reaches the user before any other message. For example, a UNITDATA_IND could come through before the bind has completed, and the user would get confused.
The driver uses a per-minor device data structure, dgproto, which contains the following:
| state | Current state of the service provider IDLE or BOUND |
| addr | Network address that has been bound to this Stream |
It is assumed (though not shown) that the module open procedure sets the write queue q_ptr to point at the appropriate private data structure.
The write put procedure is shown in Figure 5-15.
int protowput(queue_t *q, mblk_t *mp)
{
union primitives *proto;
struct dgproto *dgproto;
int err;
#ifdef MT
extern mask_t lck[];
#endif
dgproto = (struct dgproto *) q->q_ptr;
switch (mp->b_datap->db_type) {
default:
/* don't understand it */
mp->b_datap->db_type = M_ERROR;
mp->b_rptr = mp->b_wptr = mp->b_datap->db_base;
*mp->b_wptr++ = EPROTO;
qreply(q, mp);
break;
case M_FLUSH:
/* standard flush handling goes here ... */
break;
case M_PROTO:
/* Protocol message -> user request */
proto = (union primitives *) mp->b_rptr;
switch (proto->type) {
default:
mp->b_datap->db_type = M_ERROR;
mp->b_rptr = mp->b_wptr = mp->b_datap->db_base;
*mp->b_wptr++ = EPROTO;
qreply(q, mp);
return 0;
case BIND_REQ:
#ifdef MT
mask_lock(lck);
#endif
if (dgproto->state != IDLE) {
err = EINVAL;
goto error_ack;
}
if (mp->b_wptr - mp->b_rptr != sizeof(struct bind_req)) {
err = EINVAL;
goto error_ack;
}
if ((err = chkaddr(proto->bind_req.BIND_addr)) != 0)
goto error_ack;
dgproto->state = BOUND;
dgproto->addr = proto->bind_req.BIND_addr;
#ifdef MT
mask_unlock(lck);
#endif
mp->b_datap->db_type = M_PCPROTO;
proto->type = OK_ACK;
mp->b_wptr = mp->b_rptr + sizeof(struct ok_ack);
qreply(q, mp);
break;
error_ack:
#ifdef MT
mask_unlock(lck);
#endif
mp->b_datap->db_type = M_PCPROTO;
proto->type = ERROR_ACK;
proto->error_ack.UNIX_error = err;
mp->b_wptr = mp->b_rptr + sizeof(struct error_ack);
qreply(q, mp);
break;
case UNITDATA_REQ:
#ifdef MT
mask_lock(lck);
#endif
if (dgproto->state != BOUND)
goto bad;
if (mp->b_wptr - mp->b_rptr != sizeof(struct unitdata_req))
goto bad;
if ((err = chkaddr(proto->unitdata_req.DEST_addr)) != 0)
goto bad;
/* start device or mux output ... */
#ifdef MT
mask_unlock(lck);
#endif
putq(q, mp);
break;
bad:
#ifdef MT
mask_unlock(lck);
#endif
freemsg(mp);
break;
} /* end switch (proto->type) */
} /* end switch (db_type) */
return 0;
}
The write put procedure switches on the message type. The only types accepted are M_FLUSH and M_PROTO. For M_FLUSH messages, the driver performs the canonical flush handling (not shown). For M_PROTO messages, the driver assumes the message block contains a union primitives and switches on the type field. Two types are understood: BIND_REQ and UNITDATA_REQ.
For a BIND_REQ, the current state is checked; it must be IDLE. Next, the message size is checked. If it is the correct size, the passed-in address is verified for legality by calling chkaddr. If everything checks, the incoming message is converted into an OK_ACK and sent upstream. If there is any error, the incoming message is converted into an ERROR_ACK and sent upstream.
For UNITDATA_REQ, the state is also checked; it must be BOUND. As above, the message size and destination address are checked. If there is any error, the message is simply discarded. If all is well, the message is put on the queue, and the lower half of the driver is started.
If the write put procedure receives a message type that it does not understand, either a bad b_datap->db_type or a bad proto->type, the message is converted into an M_ERROR message and sent upstream.
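The decisions made by the write put procedure for M_PROTO messages can be summarized as a small state function. This is a userland sketch under stated assumptions: `dg_step` and `enum dg_action` are hypothetical, and the `size_ok` and `addr_ok` flags stand in for the message size comparison and the chkaddr call in Figure 5-15.

```c
/* Provider states and primitives as defined in Figure 5-14. */
#define IDLE  0
#define BOUND 1

#define BIND_REQ     1
#define UNITDATA_REQ 2

/* What the put procedure does with the incoming message. */
enum dg_action {
    DG_OK_ACK,      /* convert to OK_ACK and send upstream */
    DG_ERROR_ACK,   /* convert to ERROR_ACK and send upstream */
    DG_QUEUE,       /* put on the queue for the lower half */
    DG_DISCARD,     /* free the message */
    DG_M_ERROR      /* convert to M_ERROR and send upstream */
};

static enum dg_action dg_step(short *state, int prim, int size_ok, int addr_ok)
{
    switch (prim) {
    case BIND_REQ:
        if (*state != IDLE || !size_ok || !addr_ok)
            return DG_ERROR_ACK;
        *state = BOUND;
        return DG_OK_ACK;
    case UNITDATA_REQ:
        if (*state != BOUND || !size_ok || !addr_ok)
            return DG_DISCARD;
        return DG_QUEUE;
    default:
        return DG_M_ERROR;
    }
}
```

Only a successful BIND_REQ changes the state; every other path leaves the provider where it was.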
The generation of UNITDATA_IND messages (not shown in the example) normally occurs in the device interrupt if this is a hardware driver or in the lower read put procedure if this is a multiplexer. The algorithm is simple: The data part of the message is prepended by an M_PROTO message block that contains a unitdata_ind structure and sent upstream.
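The prepend step might look like the following in a userland model. The sketch uses a hypothetical `struct blk` in place of `mblk_t` and calloc in place of allocb, so it shows only the shape of the algorithm, not driver code; `make_unitdata_ind` is likewise a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

#define UNITDATA_IND 5               /* value from Figure 5-14 */

struct unitdata_ind {
    int PRIM_type;                   /* always UNITDATA_IND */
    int SRC_addr;                    /* source addr */
};

enum blk_type { BLK_DATA, BLK_PROTO };   /* stand-ins for M_DATA, M_PROTO */

struct blk {
    enum blk_type  type;
    unsigned char *rptr, *wptr;      /* valid bytes are [rptr, wptr) */
    struct blk    *cont;             /* next block in the message */
    unsigned char  buf[64];
};

/*
 * Build a control block holding a unitdata_ind and link the data
 * part behind it. Returns NULL if the header cannot be allocated;
 * the caller then decides whether to drop the data or retry.
 */
static struct blk *make_unitdata_ind(struct blk *data, int src_addr)
{
    struct unitdata_ind ind;
    struct blk *hdr = calloc(1, sizeof(*hdr));

    if (hdr == NULL)
        return NULL;
    ind.PRIM_type = UNITDATA_IND;
    ind.SRC_addr = src_addr;

    hdr->type = BLK_PROTO;
    hdr->rptr = hdr->buf;
    memcpy(hdr->rptr, &ind, sizeof(ind));
    hdr->wptr = hdr->rptr + sizeof(ind);
    hdr->cont = data;                /* data part follows the control part */
    return hdr;
}
```

The resulting chain has the same layout that getmsg later splits into a control part and a data part for inter_rcv.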
The allocb utility routine allocates a message and the space to hold the data for the message. allocb returns a pointer to a message block containing a data buffer of at least the size requested, providing there is enough memory available. It returns null on failure. Note that allocb always returns a message of type M_DATA. The type may then be changed if required. b_rptr and b_wptr are set to db_base (see msgb and datab), which is the start of the memory location for the data.
allocb may return a buffer larger than the size requested. If allocb indicates buffers are not available [allocb fails], the put/service procedure may not call sleep to wait for a buffer to become available. Instead, the bufcall utility can defer processing in the module or the driver until a buffer becomes available.
If message space allocation is done by the put procedure and allocb fails, the message is usually discarded. If the allocation fails in the service routine, the message is returned to the queue. bufcall is called to enable the service routine when a message buffer becomes available, and the service routine returns.
The freeb utility routine releases (deallocates) the message block descriptor and the corresponding data block, if the reference count (see datab structure) is equal to 1. If the reference counter exceeds 1, the data block is not released.
The freemsg utility routine releases all message blocks in a message. It uses freeb to free all message blocks and corresponding data blocks.
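The reference-count rule can be made concrete with a userland model. The names below (`model_allocb`, `model_dupb`, `model_freeb`, `model_freemsg`, and the two structures) are hypothetical and use malloc rather than the STREAMS buffer pool; sharing a data block between two message blocks is shown the way the dupb utility does it.

```c
#include <stdlib.h>

struct dblk { int ref; unsigned char *base; };          /* data block */
struct mblk { struct mblk *cont; struct dblk *datap; }; /* message block */

static int freed_data_blocks;   /* demonstration counter only */

static struct mblk *model_allocb(size_t size)
{
    struct mblk *bp = malloc(sizeof(*bp));
    struct dblk *dp = malloc(sizeof(*dp));
    unsigned char *base = malloc(size ? size : 1);

    if (bp == NULL || dp == NULL || base == NULL) {
        free(bp); free(dp); free(base);
        return NULL;
    }
    dp->ref = 1;
    dp->base = base;
    bp->cont = NULL;
    bp->datap = dp;
    return bp;
}

/* New message block that shares the data block of bp. */
static struct mblk *model_dupb(struct mblk *bp)
{
    struct mblk *nbp = malloc(sizeof(*nbp));

    if (nbp == NULL)
        return NULL;
    nbp->cont = NULL;
    nbp->datap = bp->datap;
    bp->datap->ref++;
    return nbp;
}

/* Release the message block; release the data block on last reference. */
static void model_freeb(struct mblk *bp)
{
    if (--bp->datap->ref == 0) {
        free(bp->datap->base);
        free(bp->datap);
        freed_data_blocks++;
    }
    free(bp);
}

/* Release every block of a message. */
static void model_freemsg(struct mblk *mp)
{
    while (mp != NULL) {
        struct mblk *next = mp->cont;
        model_freeb(mp);
        mp = next;
    }
}
```

Freeing one of two message blocks that share a data block leaves the data in place for the other; the data block goes away only with the last reference.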
In Figure 5-16, allocb is used by the bappend subroutine that appends a character to a message block.
/*
* Append a character to a message block.
* If (*bpp) is null, it will allocate a new block
* Returns 0 when the message block is full, 1 otherwise
*/
#define MODBLKSZ 128 /* size of message blocks */
static int bappend(mblk_t **bpp, int ch)
{
mblk_t *bp;
if ((bp = *bpp) != NULL) {
if (bp->b_wptr >= bp->b_datap->db_lim)
return 0;
} else if ((*bpp = bp = allocb(MODBLKSZ, BPRI_MED)) == NULL)
return 1;
*bp->b_wptr++ = ch;
return 1;
}
bappend receives a pointer to a message block pointer and a character as arguments. If a message block is supplied (*bpp != NULL), then bappend checks if there is room for more data in the block. If not, it fails. If there is no message block, a block of at least MODBLKSZ is allocated through allocb.
If the allocb fails, bappend returns successfully, silently discarding the character. This may or may not be acceptable. For TTY-type devices, it is generally accepted. If the original message block is not full or the allocb is successful, bappend stores the character in the block.
Figure 5-17 shows the subroutine modwput, which processes all the message blocks in any downstream data (type M_DATA) messages. freemsg deallocates messages.
/* Write side put procedure */
static int modwput(queue_t *q, mblk_t *mp)
{
switch (mp->b_datap->db_type) {
default:
putnext(q, mp); /* Don't do these, pass them along */
break;
case M_DATA: {
register mblk_t *bp;
mblk_t *nmp = NULL, *nbp = NULL;
for (bp = mp; bp != NULL; bp = bp->b_cont) {
while (bp->b_rptr < bp->b_wptr) {
if (*bp->b_rptr == '\n')
if (!bappend(&nbp, '\r'))
goto newblk;
if (!bappend(&nbp, *bp->b_rptr))
goto newblk;
bp->b_rptr++;
continue;
newblk:
if (nmp == NULL)
nmp = nbp;
else linkb(nmp, nbp); /* link message block
to tail of nmp */
nbp = NULL;
}
}
if (nmp == NULL)
nmp = nbp;
else linkb(nmp, nbp);
freemsg(mp); /* de-allocate message */
if (nmp)
putnext(q, nmp);
break;
}
}
}
In Figure 5-17, data messages are scanned and filtered. modwput copies the original message into a new block(s), modifying as it copies; nbp points to the current new message block; and nmp points to the new message being formed as multiple M_DATA message blocks. The outer for loop goes through each message block of the original message, while the inner while loop goes through each byte. bappend is used to add characters to the current or new block; if it fails, the current new block is full. If nmp is NULL, nmp is pointed at the new block. If nmp is not NULL, the new block is linked to the end of nmp with the linkb utility.
At the end of the loops, the final new block is linked to nmp. The original message (all message blocks) is returned to the pool by freemsg. If a new message exists, it is sent downstream.
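The same copy-and-modify pattern can be tried outside the kernel with fixed-size chunks standing in for message blocks. This is a sketch, not a replacement for Figure 5-17: `struct chunk`, `struct chunk_msg`, `chunk_append`, and `crlf_filter` are hypothetical, and unlike bappend the append helper opens the next chunk itself, so the caller never has to retry a byte.

```c
#include <stddef.h>

#define CHUNK_SZ   4      /* deliberately tiny so chunks fill quickly */
#define MAX_CHUNKS 8

struct chunk     { char data[CHUNK_SZ]; size_t used; };
struct chunk_msg { struct chunk blk[MAX_CHUNKS]; size_t nblk; };

/* Append one byte, starting a new chunk when the current one is full.
 * Returns 0 only when no further chunk is available. */
static int chunk_append(struct chunk_msg *m, char ch)
{
    struct chunk *cur;

    if (m->nblk == 0 || m->blk[m->nblk - 1].used == CHUNK_SZ) {
        if (m->nblk == MAX_CHUNKS)
            return 0;
        m->blk[m->nblk].used = 0;
        m->nblk++;
    }
    cur = &m->blk[m->nblk - 1];
    cur->data[cur->used++] = ch;
    return 1;
}

/* Copy in[0..len) to out, inserting '\r' before every '\n'. */
static int crlf_filter(const char *in, size_t len, struct chunk_msg *out)
{
    size_t i;

    out->nblk = 0;
    for (i = 0; i < len; i++) {
        if (in[i] == '\n' && !chunk_append(out, '\r'))
            return 0;
        if (!chunk_append(out, in[i]))
            return 0;
    }
    return 1;
}
```

With a four-byte chunk size, the six-byte input "ab\ncd\n" expands to eight bytes and fills exactly two chunks.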
The bufcall utility can recover from an allocb failure. The call syntax is as follows:
bufcall(int size, int pri, int (*func)(), long long arg)
bufcall calls (*func)(arg) when a buffer of size bytes is available. When func is called, it has no user context and must return without sleeping. Also, because of interrupt processing, there is no guarantee that when func is called, a buffer will actually be available (someone else may steal it).
On success, bufcall returns a nonzero identifier that can be used as a parameter to unbufcall to cancel the request later. On failure, 0 is returned and the requested function will never be called.
| Be sure to avoid deadlock when holding resources while waiting for bufcall to call (*func)(arg). Use bufcall sparingly. |
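The contract between bufcall and unbufcall (a nonzero identifier on success, 0 when the request cannot be recorded, and cancellation by identifier) can be sketched with a small callback table. This is a userland model with hypothetical names (`model_bufcall`, `model_unbufcall`, `model_buffers_freed`, `note_arg`); the size and priority arguments of the real bufcall are omitted, and the kernel's bookkeeping is certainly different.

```c
#include <stddef.h>

#define MAX_PENDING 4

typedef void (*bc_func)(long arg);

struct pending { int id; bc_func func; long arg; };   /* id 0 = free slot */

static struct pending bc_table[MAX_PENDING];
static int bc_next_id = 1;

/* Record a request; returns a nonzero id, or 0 if it cannot be recorded. */
static int model_bufcall(bc_func func, long arg)
{
    size_t i;

    for (i = 0; i < MAX_PENDING; i++) {
        if (bc_table[i].id == 0) {
            bc_table[i].id = bc_next_id++;
            bc_table[i].func = func;
            bc_table[i].arg = arg;
            return bc_table[i].id;
        }
    }
    return 0;   /* on failure the function will never be called */
}

/* Cancel a pending request by its identifier. */
static void model_unbufcall(int id)
{
    size_t i;

    for (i = 0; i < MAX_PENDING; i++)
        if (id != 0 && bc_table[i].id == id)
            bc_table[i].id = 0;
}

/* Buffers became available: run every pending callback once. */
static int model_buffers_freed(void)
{
    size_t i;
    int ran = 0;

    for (i = 0; i < MAX_PENDING; i++) {
        if (bc_table[i].id != 0) {
            bc_func func = bc_table[i].func;
            long arg = bc_table[i].arg;

            bc_table[i].id = 0;     /* one-shot: clear before calling */
            func(arg);
            ran++;
        }
    }
    return ran;
}

/* Example callback that records the argument it was given. */
static long last_arg;
static void note_arg(long arg) { last_arg = arg; }
```

A caller that keeps the returned identifier can cancel the request, for example from its close routine, so that the callback is not run against state that no longer exists.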
Two examples, Figure 5-18 and Figure 5-19, are provided. Figure 5-18 is a device receive interrupt handler.
#include <sys/types.h>
#include <sys/param.h>
#include <sys/stream.h>
dev_rintr(int dev)
{
/* process incoming message ... */
/* allocate new buffer for device */
dev_re_load(dev);
}
/*
* Reload device with a new receive buffer
*/
dev_re_load(int dev)
{
mblk_t *bp;
if ((bp = allocb(DEVBLKSZ, BPRI_MED)) == NULL) {
cmn_err(CE_WARN, "dev: allocb failure (size %d) \n", DEVBLKSZ);
/*
* Allocation failed. Use bufcall to
* schedule a call to ourselves.
*/
(void) bufcall(DEVBLKSZ, BPRI_MED, dev_re_load, dev);
return;
}
/* pass buffer to device ... */
}
dev_rintr is called when the device has posted a receive interrupt. The code retrieves the data from the device (not shown). dev_rintr must then give the device another buffer to fill by a call to dev_re_load, which calls allocb. If allocb fails, dev_re_load uses bufcall to call itself when STREAMS determines a buffer is available.
| Because bufcall may fail, there is still a chance that the device may hang. A better strategy, if bufcall fails, is to discard the current input message and resubmit that buffer to the device. Losing input data is generally better than hanging. |
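The "discard and resubmit" strategy from the note above can be illustrated with a small user-space model. This is a sketch under simplifying assumptions, not driver code: struct rbuf, struct dev, rbuf_alloc, and dev_rx_complete are hypothetical, and the single alloc_should_fail flag stands in for the case where both allocb and bufcall have failed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical receive buffer and device state. */
struct rbuf {
    size_t len;                 /* bytes of input currently held */
    char   data[64];
};

struct dev {
    struct rbuf  *rx;           /* buffer currently owned by the device */
    unsigned long dropped;      /* inputs discarded for lack of buffers */
    unsigned long delivered;    /* inputs handed upstream */
};

/* Set to nonzero to simulate "no buffer and no way to wait for one". */
int alloc_should_fail;

struct rbuf *rbuf_alloc(void)
{
    return alloc_should_fail ? NULL : calloc(1, sizeof(struct rbuf));
}

/*
 * Model of receive completion. Returns the filled buffer to pass
 * upstream, or NULL if the input was dropped. In both cases the
 * device is left holding a usable buffer, so it cannot hang.
 */
struct rbuf *dev_rx_complete(struct dev *dp)
{
    struct rbuf *filled = dp->rx;
    struct rbuf *fresh = rbuf_alloc();

    if (fresh == NULL) {
        /* No replacement: discard the input and reuse the same buffer. */
        filled->len = 0;
        dp->dropped++;
        return NULL;
    }
    dp->rx = fresh;             /* device gets the new buffer */
    dp->delivered++;
    return filled;              /* caller sends this one upstream */
}
```

The design choice is that the device never ends up without a buffer: one message of input is lost in the failure case, but reception continues.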
Figure 5-19 is a write service procedure, mod_wsrv, which needs to prepend a header to each output message. mod_wsrv illustrates a case for potential deadlock.
static int mod_wsrv(queue_t *q)
{
    int qenable();    /* STREAMS utility, declared as a function */
    mblk_t *mp, *bp;

    while ((mp = getq(q)) != NULL) {
        /* check for priority messages and canput ... */

        /*
         * Allocate a header to prepend to the message. If
         * the allocb fails, use bufcall to reschedule.
         */
        if ((bp = allocb(HDRSZ, BPRI_MED)) == NULL) {
            if (!bufcall(HDRSZ, BPRI_MED, qenable, q)) {
                /* bufcall failed too: retry in two seconds */
                timeout(qenable, q, HZ*2);
            }
            /* Put the message back and exit, we will be re-enabled later */
            putbq(q, mp);
            return 0;
        }
        /* process message .... */
    }
    return 0;
}
If allocb fails, mod_wsrv wants to recover without loss of data and calls bufcall. In this example, the routine passed to bufcall is qenable. When a buffer is available, the service procedure is automatically reenabled. Before exiting, the current message is put back on the queue. This example deals with bufcall failure by resorting to the timeout operating system utility routine. This routine schedules the given function to be run with the given argument after the given number of clock ticks (there are HZ clock ticks per second). In this example, if bufcall fails, the system runs qenable after two seconds have passed.
The STREAMS buffer allocation mechanism enables the allocation of message and data blocks to point directly to a client-supplied (non-STREAMS) buffer. Message and data blocks allocated this way are indistinguishable (for the most part) from the normal data blocks. The client-supplied buffers are processed as if they were normal STREAMS data buffers.
Drivers may not only attach non-STREAMS data buffers, but also free them. To do so, the driver supplies a free routine and its argument in a free_rtn structure, shown below.
| The free routine must not reference any dynamically allocated data structures that become freed when the driver is closed, because messages can exist in a Stream after the driver is closed. This can occur, for example, when a Stream is closed down. The driver close routine is called and the driver's private data structure may be deallocated. If the driver sends a message created by esballoc upstream, that message may still be on the Stream head read queue. The Stream head read queue is then flushed, freeing the message and calling the driver's free routine after the driver has been closed. |
struct free_rtn {
    void (*free_func) ();    /* driver dependent free routine */
    char *free_arg;          /* argument for free_rtn */
};
typedef struct free_rtn frtn_t;
The structure has two fields: a pointer to a function and a location for any argument passed to the function. Instead of defining a specific number of arguments, free_arg is defined as a char *. Drivers can then pass pointers to structures if more than one argument is needed.
The STREAMS utility routine, esballoc, provides a common interface for allocating and initializing data blocks. It makes the allocation as transparent to the driver as possible and provides a way to change the fields of the data block, since modification should only be performed by STREAMS. The driver calls this routine when it wants to attach its own data buffer to a newly allocated message and data block. If the routine successfully completes the allocation and assigns the buffer, it returns a pointer to the message block. The driver is responsible for supplying the arguments to esballoc, namely, a pointer to its data buffer, the size of the buffer, the priority of the data block, and a pointer to the free_rtn structure. All arguments should be non-NULL.
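The relationship between a client-supplied buffer, the free_rtn structure, and the driver's free routine can be sketched in user space. Everything below is a model built on assumptions: free_rtn_model mirrors the two fields of free_rtn with a prototype added, and model_esballoc, model_freeb, struct ext_blk, and struct drv_buf are hypothetical names, not the STREAMS routines. The priority argument is left out of the model.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same two fields as free_rtn, with a prototype added for the sketch. */
struct free_rtn_model {
    void (*free_func)(char *);  /* driver dependent free routine */
    char  *free_arg;            /* argument passed to free_func */
};

/* Simplified stand-in for a message block wrapping a client buffer. */
struct ext_blk {
    unsigned char         *base;   /* client-supplied buffer */
    size_t                 size;   /* size of that buffer */
    struct free_rtn_model *frtn;   /* how to give the buffer back */
};

/* Model of esballoc: attach a client buffer to a newly allocated block. */
struct ext_blk *model_esballoc(unsigned char *base, size_t size,
                               struct free_rtn_model *fr)
{
    struct ext_blk *bp;

    if (base == NULL || fr == NULL)
        return NULL;            /* all arguments should be non-NULL */
    bp = malloc(sizeof(*bp));
    if (bp == NULL)
        return NULL;
    bp->base = base;
    bp->size = size;
    bp->frtn = fr;
    return bp;
}

/* Model of freeing such a block: call the driver's routine, then free it. */
void model_freeb(struct ext_blk *bp)
{
    bp->frtn->free_func(bp->frtn->free_arg);
    free(bp);
}

/* Hypothetical driver buffer with an in-use flag. */
struct drv_buf {
    int           in_use;
    unsigned char data[128];
};

/* Driver free routine: mark the buffer reusable. */
void drv_buf_release(char *arg)
{
    struct drv_buf *db = (struct drv_buf *)(void *)arg;

    db->in_use = 0;
}
```

Passing a pointer to the driver's own bookkeeping structure through free_arg is what lets a single char * carry as much context as the free routine needs. In this sketch the structure is assumed to be statically allocated, so the free routine remains safe to call after the driver has closed.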
All the STREAMS messages are defined in sys/stream.h. The messages differ in their intended purpose and their queuing priority. The contents of certain message types can be transferred between a process and a Stream by system calls.
Below, the message types are briefly described and classified according to their queuing priority.
Ordinary Messages (also called normal messages):
| Message type | Description |
|---|---|
| M_BREAK | Request to a Stream driver to send a "break" |
| M_CTL | Control/status request used for intermodule communication |
| M_DATA | User data message for I/O system calls |
| M_DELAY | Request a real-time delay on output |
| M_IOCTL | Control/status request generated by a Stream head |
| M_PASSFP | File pointer passing message |
| M_PROTO | Protocol control information |
| M_RSE | Reserved for internal use |
| M_SETOPTS | Set options at the Stream head, sent upstream |
| M_SIG | Signal sent from a module/driver to a user |
High Priority Messages:
| Message type | Description |
|---|---|
| M_COPYIN | Copy in data for transparent ioctls, sent upstream |
| M_COPYOUT | Copy out data for transparent ioctls, sent upstream |
| M_ERROR | Report downstream error condition, sent upstream |
| M_FLUSH | Flush module queue |
| M_HANGUP | Set a Stream head hangup condition, sent upstream |
| M_IOCACK | Positive ioctl(2) acknowledgment |
| M_IOCDATA | Data for transparent ioctls, sent downstream |
| M_IOCNAK | Negative ioctl(2) acknowledgment |
| M_PCPROTO | Protocol control information |
| M_PCRSE | Reserved for internal use |
| M_PCSIG | Signal sent from a module/driver to a user |
| M_READ | Read notification, sent downstream |
| M_START | Restart stopped device output |
| M_STARTI | Restart stopped device input |
| M_STOP | Suspend output |
| M_STOPI | Suspend input |
| Transparent ioctls support applications developed before the introduction of STREAMS. |
Defined STREAMS message types differ in their intended purposes, their treatment at the Stream head, and in their message queuing priority.
STREAMS does not prevent a module or driver from generating any message type and sending it in any direction on the Stream. However, established processing and direction rules should be observed. Stream head processing according to message type is fixed, although certain parameters can be altered.
The message types are classified according to their message queuing priority. Ordinary messages are described first, with high priority messages following. In certain cases, two message types may perform similar functions, differing only in priority. The use of the word "module" generally implies "module or driver."
Ordinary messages are also called normal or nonpriority messages. Ordinary messages are subject to flow control whereas high priority messages are not.
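The split between ordinary and high priority messages is often encoded in the numeric value of the message type, so that a single comparison classifies a message. The sketch below assumes such a convention. MODEL_QPCTL and the numeric values in the enum are illustrative assumptions, not the definitions in sys/stream.h on any particular system.

```c
/*
 * Assumed convention for this sketch: message types numbered at or
 * above MODEL_QPCTL are high priority. The values are illustrative.
 */
#define MODEL_QPCTL 0x80

enum model_msg_type {
    MODEL_M_DATA    = 0x00,     /* ordinary */
    MODEL_M_PROTO   = 0x01,     /* ordinary */
    MODEL_M_IOCTL   = 0x0e,     /* ordinary */
    MODEL_M_IOCACK  = 0x81,     /* high priority */
    MODEL_M_PCPROTO = 0x83,     /* high priority */
    MODEL_M_FLUSH   = 0x86      /* high priority */
};

/* Nonzero if the type belongs to the high priority class. */
int is_high_priority(int type)
{
    return type >= MODEL_QPCTL;
}

/* Ordinary messages are subject to flow control; high priority are not. */
int subject_to_flow_control(int type)
{
    return !is_high_priority(type);
}
```

Under this model, M_PROTO and M_PCPROTO carry the same kind of content but differ in class, which matches the remark that two message types may perform similar functions and differ only in priority.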
M_BREAK is sent to a driver to request that BREAK be transmitted on whatever media the driver is controlling.
The message format is not defined by STREAMS and its use is developer dependent. This message may be considered a special case of an M_CTL message. An M_BREAK message cannot be generated by a user-level process and is always discarded if passed to the Stream head.
M_CTL is generated by modules that want to send information to a particular module or type of module. M_CTL messages are typically used for intermodule communication, as when adjacent STREAMS protocol modules negotiate the terms of their interface. An M_CTL message cannot be generated by a user-level process and is always discarded if passed to the Stream head.
M_DATA is intended to contain ordinary data. Messages allocated by the allocb routine are type M_DATA by default. M_DATA messages are generally sent bidirectionally on a Stream and their contents can be passed between a process and the Stream head. In the getmsg(2) and putmsg(2) system calls, the contents of M_DATA message blocks are referred to as the data part. Messages composed of multiple message blocks will typically have M_DATA as the message type for all message blocks following the first.
M_DELAY is sent to a media driver to request a real-time delay on output. The data buffer associated with this message is expected to contain an integer giving the number of machine ticks of delay desired. M_DELAY messages are typically used to prevent transmitted data from exceeding the buffering capacity of slower terminals.
The message format is not defined by STREAMS and its use is developer dependent. Not all media drivers may understand this message. This message may be considered a special case of an M_CTL message. An M_DELAY message cannot be generated by a user-level process and is always discarded if passed to the Stream head.
M_IOCTL is generated by the Stream head in response to I_STR, I_LINK, I_UNLINK, I_PLINK, and I_PUNLINK [ioctl(2) STREAMS system calls, see streamio(7)], and in response to ioctl calls that contain a command argument value not defined in streamio(7). When one of these ioctls is received from a user process, the Stream head uses values supplied in the call and values from the process to create an M_IOCTL message containing them, and sends the message downstream. M_IOCTL messages are intended to perform the general ioctl functions of character device drivers.
For an I_STR ioctl, the user values are supplied in a structure of the following form, provided as an argument to the ioctl call [see I_STR in streamio(7)]:
struct strioctl
{
    int ic_cmd;       /* downstream request */
    int ic_timout;    /* ACK/NAK timeout */
    int ic_len;       /* length of data arg */
    char *ic_dp;      /* ptr to data arg */
};
where ic_cmd is the request (or command) defined by a downstream module or driver, ic_timout is the time the Stream head will wait for acknowledgment to the M_IOCTL message before timing out, and ic_dp is a pointer to an optional data buffer. On input, ic_len contains the length of the data in the buffer passed in and, on return from the call, it contains the length of the data, if any, being returned to the user in the same buffer.
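As a sketch of how a caller might fill in these fields before issuing an I_STR ioctl, consider the helper below. The structure is repeated under a different name so that the fragment compiles on its own. prep_istr and MOD_GET_STATS are hypothetical, and the units of the timeout value are whatever streamio(7) defines for ic_timout, which this sketch does not assume.

```c
#include <stddef.h>

/* Same layout as struct strioctl above, renamed for a standalone sketch. */
struct strioctl_sketch {
    int   ic_cmd;               /* downstream request */
    int   ic_timout;            /* ACK/NAK timeout */
    int   ic_len;               /* length of data arg */
    char *ic_dp;                /* ptr to data arg */
};

#define MOD_GET_STATS 0x4d01    /* hypothetical module-defined command */

/*
 * Fill in a request. buf is used for data in both directions: on input
 * it holds len bytes to send downstream, and on return the same buffer
 * receives any reply data, with ic_len updated to the reply length.
 */
void prep_istr(struct strioctl_sketch *sp, int cmd, int timeout,
               void *buf, int len)
{
    sp->ic_cmd = cmd;
    sp->ic_timout = timeout;
    sp->ic_len = len;
    sp->ic_dp = (char *)buf;
}
```

A real caller would then pass the address of the filled structure as the argument of the I_STR ioctl and, on success, read ic_len bytes of reply data from the buffer. Because the same buffer is reused, it must be large enough for the reply as well as the request.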
The M_IOCTL message format is one M_IOCTL message block followed by zero or more M_DATA message blocks. STREAMS constructs an M_IOCTL message block by placing an iocblk structure, defined in sys/stream.h, in its data buffer:
struct iocblk
{
    int ioc_cmd;           /* ioctl command type */
    int *ioc_cr;           /* full credentials :SX no use */
    uint ioc_id;           /* ioctl identifier */
    uint ioc_count;        /* byte count for ioctl data */
    int ioc_error;         /* error code for M_IOCACK or M_IOCNAK */
    int ioc_rval;          /* return value for M_IOCACK */
    long ioc_filler[4];    /* reserved for future use */
};
For an I_STR ioctl, ioc_cmd corresponds to ic_cmd of the strioctl structure. Its contents can be tested to determine if the user issuing the ioctl call is authorized to do so. For an I_STR ioctl, ioc_count is the number of data bytes, if any, contained in the message and corresponds to ic_len.
ioc_id is an identifier generated internally, and is used by the Stream head to match each M_IOCTL message sent downstream with response messages sent upstream to the Stream head. The response message that completes the Stream head processing for the ioctl is an M_IOCACK (positive acknowledgment) or an M_IOCNAK (negative acknowledgment) message.
For an I_STR ioctl, if a user supplies data to be sent downstream, the Stream head copies the data, pointed to by ic_dp in the strioctl structure, into M_DATA message blocks and links the blocks to the initial M_IOCTL message block. ioc_count is copied from ic_len. If there is no data, ioc_count is zero.
If the Stream head does not recognize the command argument of an ioctl, it creates a transparent M_IOCTL message. The format of a transparent M_IOCTL message is one M_IOCTL message block followed by one M_DATA block. The form of the iocblk structure is the same as above. However, ioc_cmd is set to the value of the command argument in the ioctl system call and ioc_count is set to TRANSPARENT, defined in sys/stream.h. TRANSPARENT distinguishes the case where an I_STR ioctl may specify a value of ioc_cmd equivalent to the command argument of a transparent ioctl. The M_DATA block of the message contains the value of the arg parameter in the ioctl call.
The first module or driver that understands the ioc_cmd request contained in the M_IOCTL acts on it. For an I_STR ioctl, this action generally includes an immediate upstream transmission of an M_IOCACK message. For transparent M_IOCTLs, this action generally includes the upstream transmission of an M_COPYIN or M_COPYOUT message.
Intermediate modules that do not recognize a particular request must pass the message on. If a driver does not recognize the request, or the receiving module cannot acknowledge it, an M_IOCNAK message must be returned.
M_IOCACK and M_IOCNAK message types have the same format as an M_IOCTL message and contain an iocblk structure in the first block. An M_IOCACK block may be linked to following M_DATA blocks. If one of these messages reaches the Stream head with an identifier that does not match that of the currently-outstanding M_IOCTL message, the response message is discarded. A common means of assuring that the correct identifier is returned is for the replying module to convert the M_IOCTL message into the appropriate response type and set ioc_count to 0, if no data is returned. Then, the qreply utility is used to send the response to the Stream head.
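The technique of converting the request into its own reply can be modeled as follows. This is a user-space sketch under stated assumptions: struct msg_model, struct iocblk_model, the T_* type tags, and CMD_RESET are hypothetical, and the final qreply step appears only as a comment because no queue exists in the model.

```c
#include <errno.h>

/* Hypothetical message type tags for the model. */
enum { T_IOCTL, T_IOCACK, T_IOCNAK };

/* Subset of the iocblk fields that matter for a reply. */
struct iocblk_model {
    int          ioc_cmd;
    unsigned int ioc_id;
    unsigned int ioc_count;
    int          ioc_error;
    int          ioc_rval;
};

struct msg_model {
    int                 type;   /* models the message type of the first block */
    struct iocblk_model ioc;    /* models the iocblk in the data buffer */
};

#define CMD_RESET 1             /* hypothetical command the driver understands */

/*
 * Driver-side handling. The request is rewritten in place, so ioc_id
 * is carried back to the Stream head unchanged.
 */
void drv_ioctl(struct msg_model *mp)
{
    switch (mp->ioc.ioc_cmd) {
    case CMD_RESET:
        /* ... perform the request ... */
        mp->type = T_IOCACK;
        mp->ioc.ioc_error = 0;
        mp->ioc.ioc_rval = 0;
        break;
    default:
        /* A driver that does not recognize the request must NAK it. */
        mp->type = T_IOCNAK;
        mp->ioc.ioc_error = EINVAL;
        break;
    }
    mp->ioc.ioc_count = 0;      /* no data is returned in either case */
    /* In a real driver, qreply would now send mp back upstream. */
}
```

Reusing the incoming message avoids both an allocation in the reply path and any chance of copying the wrong identifier.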
In an M_IOCACK or M_IOCNAK message, ioc_error holds any return error condition set by a downstream module. If this value is nonzero, it is returned to the user in errno. Note that both an M_IOCNAK and an M_IOCACK may return an error. However, only an M_IOCACK can have a return value. For an M_IOCACK, ioc_rval holds any return value set by a responding module. For an M_IOCNAK, ioc_rval is ignored by the Stream head.
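The rules above for ioc_error and ioc_rval amount to a small decision function at the Stream head. The following sketch models that decision in user space. struct ioc_reply and head_ioctl_result are hypothetical names. The treatment of an M_IOCNAK that carries no error code is an assumption of this sketch (it reports EINVAL), since the text does not say what the Stream head does in that case.

```c
#include <errno.h>

/* Hypothetical summary of a response that reached the Stream head. */
struct ioc_reply {
    int is_ack;                 /* nonzero for M_IOCACK, zero for M_IOCNAK */
    int ioc_error;              /* error set by the responding module */
    int ioc_rval;               /* return value; meaningful only for an ACK */
};

/*
 * Model of the result handed back to the user. Returns the value the
 * ioctl call would return and stores the errno value (0 if none) in *err.
 */
int head_ioctl_result(const struct ioc_reply *rp, int *err)
{
    if (rp->ioc_error != 0) {
        /* Either response type may carry an error. */
        *err = rp->ioc_error;
        return -1;
    }
    if (!rp->is_ack) {
        /* Assumption: a NAK with no error code is reported as EINVAL. */
        *err = EINVAL;
        return -1;
    }
    *err = 0;
    return rp->ioc_rval;        /* only an ACK can supply a return value */
}
```

Note that ioc_rval is never consulted for a NAK, which mirrors the statement that the Stream head ignores it in that case.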
If a module processing an I_STR ioctl wants to send data to a user process, it must use the M_IOCACK message that it constructs such that the M_IOCACK block is linked to one or more following M_DATA blocks containing the user data. The module must set ioc_count to the number of data bytes sent. The Stream head places the data in the address pointed to by ic_dp in the user I_STR strioctl structure.
If a module processing a transparent ioctl (that is, it received a transparent M_IOCTL) wants to send data to a user process, it can use only an M_COPYOUT message. For a transparent ioctl, no data can be sent to the user process in an M_IOCACK message. All data must have been sent in a preceding M_COPYOUT message. The Stream head will ignore any data contained in an M_IOCACK message (in M_DATA blocks) and will free the blocks.
No data can be sent with an M_IOCNAK message for any type of M_IOCTL. The Stream head will ignore and free any M_DATA blocks.
The Stream head blocks the user process until an M_IOCACK or M_IOCNAK response to the M_IOCTL (same ioc_id) is received. For an M_IOCTL generated from an I_STR ioctl, the Stream head will time out if no response is received within the ic_timout interval (the user may specify an explicit interval or specify use of the default interval). For M_IOCTL messages generated from all other ioctls, the default (infinite) is used.
M_PASSFP is used by STREAMS to pass a file pointer from the Stream head at one end of a Stream pipe to the Stream head at the other end of the same Stream pipe.
The message is generated as a result of an I_SENDFD ioctl [see streamio(7)] issued by a process to the sending Stream head. STREAMS places the M_PASSFP message directly on the destination Stream head's read queue to be retrieved by an I_RECVFD ioctl [see streamio(7)]. The message is placed without passing it through the Stream (that is, it is not seen by any modules or drivers in the Stream). This message should never be present on any queue except the read queue of a Stream head. Consequently, modules and drivers do not need to recognize this message, and it can be ignored by module and driver developers.
M_PROTO is intended to contain control information and associated data. The message format is one or more (see NOTE) M_PROTO message blocks followed by zero or more M_DATA message blocks as shown in the next figure. The semantics of the M_DATA and M_PROTO message blocks are determined by the STREAMS module that receives the message.
The M_PROTO message block will typically contain implementation dependent control information. M_PROTO messages are generally sent bidirectionally on a Stream, and their contents can be passed between a process and the Stream head. The contents of the first message block of an M_PROTO message is generally referred to as the control part, and the contents of any following M_DATA message blocks are referred to as the