Notes on X11 proxy
How encoding/decoding works
Each element in the XCB XML definition gets mapped to an object that knows how to both encode and decode the type, and can generate an example of the type for testing purposes. (This mapping is in the style of denotational semantics.)
There are two types of encoder/decoder object: types and StructParts.
Types are standalone in the sense that they can be encoded/decoded in a context-free manner. Encoding takes a value and writes its encoding to a stream, while decoding reads from a stream and returns a decoded value. Methods:
- decode(stream) -> val
- encode(stream, val)
- make_example() -> val
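As an illustration of this interface, here is a minimal sketch of a standalone type for a 16-bit unsigned integer. The class name Card16Type, the little-endian byte order and the use of io.BytesIO as the stream are assumptions made for the example, not details taken from the proxy's code.

```python
import io
import struct


class Card16Type:
    """Hypothetical standalone type: an unsigned 16-bit integer."""

    def decode(self, stream):
        # Read two bytes from the stream and return the decoded integer.
        (val,) = struct.unpack("<H", stream.read(2))
        return val

    def encode(self, stream, val):
        # Write the two-byte encoding of val to the stream.
        stream.write(struct.pack("<H", val))

    def make_example(self):
        # Any valid value will do for round-trip testing.
        return 0x1234


card16 = Card16Type()
buf = io.BytesIO()
card16.encode(buf, card16.make_example())
encoded = buf.getvalue()
decoded = card16.decode(io.BytesIO(encoded))
```

Encoding the example value and decoding the result gives back the original value, which is the kind of round-trip check that make_example() makes possible.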
StructParts are used within the context of a struct, which is represented in Python as a dictionary. A struct is made up of StructParts, each of which is responsible for encoding/decoding a single field or inserting padding. Encoding takes a dict and may write one of its fields to a stream. Decoding takes a stream and a dict and may add a field to the dict having read its value from the stream. Methods:
- decode_elt(stream, record)
- encode_elt(stream, record)
- get_derived_fields() -> list
- fill_out_length(record)
- fill_out_example(record)
- get_field_types() -> list
How lists are handled
Take this example from the QueryTree request in the core protocol:
    <field type="CARD16" name="children_len" />
    <list type="WINDOW" name="children">
      <fieldref>children_len</fieldref>
    </list>
In the Python representation of this request, the "children" field is a list of integers (since WINDOW is an integer ID). The "children_len" field does not appear in the Python representation: it is treated as a "derived field" because the information is represented by the length of the "children" list field.
When encoding, fill_out_length() is called on all the StructParts first. The StructPart for <list> fills out "children_len" in the dict based on "children", so that when encode_elt() is called on the StructPart for <field> there is a value for it to write.
When decoding, decode_elt() is called on all the StructParts in turn. The <field> StructPart fills out "children_len" in the dict. The <list> StructPart then reads "children_len" from the dict to determine how many values it should read from the stream. Lastly, StructDef removes all the fields returned by get_derived_fields() from the dict. The StructPart for <list> returns children_len from get_derived_fields().
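The encode and decode passes described above can be sketched as follows. This is a minimal sketch rather than the proxy's actual code: the class names FieldPart and ListPart, their constructor arguments, the encode/decode method names on StructDef and the little-endian byte order are all assumptions, and padding and alignment are ignored.

```python
import io
import struct


class FieldPart:
    """Hypothetical StructPart for a CARD16 <field>."""

    def __init__(self, name):
        self.name = name

    def get_derived_fields(self):
        return []

    def fill_out_length(self, record):
        pass  # A plain field derives nothing.

    def encode_elt(self, stream, record):
        stream.write(struct.pack("<H", record[self.name]))

    def decode_elt(self, stream, record):
        (record[self.name],) = struct.unpack("<H", stream.read(2))


class ListPart:
    """Hypothetical StructPart for a <list> of 32-bit IDs sized by a <fieldref>."""

    def __init__(self, name, length_field):
        self.name = name
        self.length_field = length_field

    def get_derived_fields(self):
        return [self.length_field]

    def fill_out_length(self, record):
        # Derive the length field from the list itself.
        record[self.length_field] = len(record[self.name])

    def encode_elt(self, stream, record):
        for item in record[self.name]:
            stream.write(struct.pack("<I", item))

    def decode_elt(self, stream, record):
        # The length field was filled in by an earlier StructPart.
        count = record[self.length_field]
        record[self.name] = [
            struct.unpack("<I", stream.read(4))[0] for _ in range(count)
        ]


class StructDef:
    def __init__(self, parts):
        self.parts = parts

    def encode(self, stream, record):
        record = dict(record)  # Do not mutate the caller's dict.
        for part in self.parts:
            part.fill_out_length(record)
        for part in self.parts:
            part.encode_elt(stream, record)

    def decode(self, stream):
        record = {}
        for part in self.parts:
            part.decode_elt(stream, record)
        # Derived fields are an encoding detail, so drop them.
        for part in self.parts:
            for name in part.get_derived_fields():
                del record[name]
        return record


struct_def = StructDef(
    [FieldPart("children_len"), ListPart("children", "children_len")])
buf = io.BytesIO()
struct_def.encode(buf, {"children": [10, 20, 30]})
encoded = buf.getvalue()
decoded = struct_def.decode(io.BytesIO(encoded))
```

The caller only ever sees the "children" list; "children_len" exists in the dict just long enough to be written or read.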
Streams
Decoding reads from an InputStream while encoding writes to an OutputStream. These stream objects have two interesting roles. Firstly, they perform alignment checking. If you try to read or write a 4-byte integer when the current position is not aligned to a 4-byte boundary, the stream raises an exception.
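A minimal sketch of the alignment check might look like this. The method names read_int and skip_padding, the AlignmentError exception and the little-endian byte order are hypothetical; only the InputStream name and the rule itself come from the description above.

```python
import struct


class AlignmentError(Exception):
    pass


class InputStream:
    """Hypothetical sketch of a stream that checks alignment on reads."""

    _FORMATS = {1: "<B", 2: "<H", 4: "<I"}

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read_int(self, size):
        # A size-byte integer must start on a size-byte boundary.
        if self._pos % size != 0:
            raise AlignmentError(
                "%d-byte read at unaligned offset %d" % (size, self._pos))
        chunk = self._data[self._pos:self._pos + size]
        if len(chunk) != size:
            raise EOFError("stream ended in the middle of a value")
        self._pos += size
        return struct.unpack(self._FORMATS[size], chunk)[0]

    def skip_padding(self, size):
        self._pos += size


stream = InputStream(bytes([1, 0, 0, 0, 7, 0, 0, 0]))
first = stream.read_int(1)       # Offset 0: fine.
try:
    stream.read_int(4)           # Offset 1: not 4-byte aligned.
    raised = False
except AlignmentError:
    raised = True
stream.skip_padding(3)           # Move to offset 4.
second = stream.read_int(4)
```

Failing early like this turns a missing padding element in a struct definition into an immediate error instead of silently misread data.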
Secondly, they provide a logging facility which allows the data to be annotated with encoding/decoding notes. This is implemented via the EncodingLogger object which maintains a stack of labels that can be pushed and popped. The effect is that it annotates the data with a parse tree. This is useful for debugging. If decoding fails, you can look at the parse tree to see if the data read so far was read correctly. The parse trees are saved to golden files in pretty-printed form so that if you refactor the code you can be sure that it hasn't changed the message encodings.
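To show how a stack of labels yields a parse tree, here is a small sketch. Only the EncodingLogger name comes from the notes above; the push, pop, note and pretty methods and the dict-based tree representation are assumptions made for illustration.

```python
class EncodingLogger:
    """Hypothetical sketch: builds a parse tree from pushed and popped labels."""

    def __init__(self):
        self.root = {"label": "root", "children": []}
        self._stack = [self.root]

    def push(self, label):
        # Open a new node under the current one and descend into it.
        node = {"label": label, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def pop(self):
        if len(self._stack) == 1:
            raise RuntimeError("pop without matching push")
        self._stack.pop()

    def note(self, text):
        # Attach a leaf annotation to the current node.
        self._stack[-1]["children"].append({"label": text, "children": []})

    def pretty(self):
        lines = []

        def walk(node, depth):
            lines.append("  " * depth + node["label"])
            for child in node["children"]:
                walk(child, depth + 1)

        walk(self.root, 0)
        return "\n".join(lines)


log = EncodingLogger()
log.push("QueryTreeReply")
log.push("children_len")
log.note("CARD16: 2")
log.pop()
log.push("children")
log.note("WINDOW: 10")
log.note("WINDOW: 20")
log.pop()
log.pop()
tree_text = log.pretty()
```

The pretty-printed text is stable across runs, which is what makes it suitable for comparison against a golden file.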
Handling X messages
There are six types of message in the X11 protocol: requests, replies, errors, events, setup requests and setup replies. Of these, setup requests and setup replies only appear at the very start of the connection; they can be considered a separate protocol. The remaining four can be grouped as follows:
- requests - messages sent from client to server
- responses: replies, errors, events - messages sent from server to client
"Response" is my term. I don't think it is used in the X specification.
Each message specifies its size in some way. Requests are variable size. Events and errors are 32 bytes, whereas replies are 32 bytes or more.
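The size rules can be sketched as below. The header layout used here is the core protocol's, not something stated in these notes: a request carries its total length in 4-byte units in bytes 2-3, and a reply (first byte 1) carries the number of extra 4-byte units beyond the first 32 bytes in bytes 4-7. The function names and the little-endian byte order are assumptions, and extensions that change these rules (such as BIG-REQUESTS or generic events) are ignored.

```python
import struct


def request_size(header):
    """Total size in bytes of a request, given at least its first 4 bytes."""
    (length_units,) = struct.unpack_from("<H", header, 2)
    return 4 * length_units


def response_size(header):
    """Total size in bytes of a response, given at least its first 8 bytes."""
    if header[0] == 1:
        # Reply: 32 bytes plus a variable-length tail.
        (extra_units,) = struct.unpack_from("<I", header, 4)
        return 32 + 4 * extra_units
    # Errors (first byte 0) and events are always 32 bytes.
    return 32


request_header = bytes([15, 0]) + struct.pack("<H", 2)
reply_header = bytes([1, 0]) + struct.pack("<H", 7) + struct.pack("<I", 3)
event_header = bytes([2, 0]) + struct.pack("<H", 7) + bytes(4)

sizes = (
    request_size(request_header),
    response_size(reply_header),
    response_size(event_header),
)
```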
All responses contain a sequence number (with one exception, KeymapNotify). Responses must be ordered by their sequence number. Sequence numbers are not allowed to decrease in the response stream (except when sequence numbers wrap around).
For replies and errors, the sequence number tells you which request caused the response. (Requests are implicitly assigned sequence numbers by their position in the message stream.) For events, the sequence number can also tell you which request caused the event, provided the event appears before any reply or error with the same sequence number. We will call those events prompted events. If an event appears after a reply or error with the same sequence number, it is an unprompted event, which means it was caused by a different X client or by user action (e.g. mouse movement).
KeymapNotify events are a special case as the only event that does not contain a sequence number. The field was used for something else, presumably in order to keep events at 32 bytes (an odd decision). However, you can take a KeymapNotify event's sequence number to be the same as the previous response, because KeymapNotify events are always unprompted.
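The prompted/unprompted rule, including the KeymapNotify special case, can be sketched as a single pass over the response stream. The function name and the (kind, seq) tuple representation are hypothetical, and the sketch assumes sequence numbers have already been widened so that they never wrap.

```python
def classify_events(responses):
    """Label each event in a response stream as prompted or unprompted.

    responses is a list of (kind, seq) tuples in stream order, where kind
    is "reply", "error" or an event name, and seq is None for KeymapNotify.
    """
    results = []
    last_answered = -1  # Highest seq for which a reply or error was seen.
    prev_seq = 0        # Sequence number of the previous response.
    for kind, seq in responses:
        if seq is None:
            # KeymapNotify: borrow the previous response's sequence number.
            results.append((kind, prev_seq, "unprompted"))
            continue
        prev_seq = seq
        if kind in ("reply", "error"):
            last_answered = seq
        elif seq > last_answered:
            # No reply or error with this sequence number has been seen yet.
            results.append((kind, seq, "prompted"))
        else:
            results.append((kind, seq, "unprompted"))
    return results


labels = classify_events([
    ("Expose", 5),
    ("reply", 5),
    ("MotionNotify", 5),
    ("KeymapNotify", None),
    ("Expose", 6),
])
```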
Requests generate at most one reply or error (with the exception of one request which can generate multiple replies). Whether a request has a reply is determined by the request type. However, errors can be generated by requests that do not produce replies. The only way to know that such a request has not produced an error is to see a later response with a higher sequence number. If you have not sent any later requests that require replies, there is no guarantee that you will see such a response. For this reason, it is sometimes necessary to send a dummy pipeline synchronisation request. X does not provide any request type specifically for this purpose, but you can use a relatively harmless request such as GetInputFocus, which does not have a side effect and produces a small reply.
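One way to track the outcome of requests that have no reply is sketched below. The PendingRequests class and its method names are hypothetical, and sequence numbers are assumed to be already widened so that they never wrap.

```python
class PendingRequests:
    """Hypothetical tracker for requests that do not produce replies."""

    def __init__(self):
        self._pending = []   # Sequence numbers awaiting an outcome.
        self.succeeded = []
        self.failed = []

    def sent(self, seq):
        self._pending.append(seq)

    def on_response(self, kind, seq):
        still_pending = []
        for pending_seq in self._pending:
            if kind == "error" and pending_seq == seq:
                self.failed.append(pending_seq)
            elif pending_seq < seq:
                # A later response arrived, so no error can follow.
                self.succeeded.append(pending_seq)
            else:
                still_pending.append(pending_seq)
        self._pending = still_pending


tracker = PendingRequests()
for seq in (1, 2, 3):
    tracker.sent(seq)            # Three requests without replies.
# Request 4 is a synchronisation request such as GetInputFocus.
tracker.on_response("error", 2)  # Resolves 1 (success) and 2 (failure).
tracker.on_response("reply", 4)  # Resolves 3 (success).
```

Without the synchronisation request, request 3 would stay pending until some unrelated response happened to arrive.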
Wraparound of sequence numbers
Sequence numbers in responses are 16-bit (unsigned) and can wrap around. At any given time, there will be a range of sequence numbers that the client is expecting to receive in the next response. So long as this range does not contain more than 2^16 (65536) sequence numbers, it will not be ambiguous which real sequence number a response refers to. This means that the client must make sure it has not sent 2^16 requests in a row that do not produce replies. (This could happen with drawing requests.) It can get around this by inserting a synchronisation request.
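Recovering the real sequence number from the 16-bit value on the wire can be sketched as follows. The function name is hypothetical, and the sketch relies on the two conditions stated above: responses never go backwards, and the expected range spans no more than 65536 sequence numbers.

```python
def extend_sequence(last_full_seq, wire_seq):
    """Widen a 16-bit wire sequence number to a full sequence number.

    Returns the smallest number that is >= last_full_seq and whose low
    16 bits equal wire_seq.
    """
    candidate = (last_full_seq & ~0xFFFF) | wire_seq
    if candidate < last_full_seq:
        # The 16-bit counter has wrapped since the last response.
        candidate += 0x10000
    return candidate


no_wrap = extend_sequence(0x1FFFE, 0xFFFF)
wrapped = extend_sequence(0x1FFFE, 0x0002)
same = extend_sequence(5, 5)
```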
In principle, the X server could notice when the client has got into this situation and generate an event which would reduce the client's expected-sequence-number range. But there are no such requirements in the X specification.
Is X11 "asynchronous" or just "pipelined"?
X is often referred to as an asynchronous protocol. But is it really, given that it does not allow out-of-order replies?
mseaborn: question: in a protocol where you can send a bunch of requests, pipelined, and get replies back later -- but the replies must be in the same order as the requests -- would you call this an asynchronous protocol?
James Ascroft-Leigh: Like HTTP? Not in general
mseaborn: hmm i forgot that HTTP does this
mseaborn: what term would you use for a protocol like this?
James Ascroft-Leigh: Pipelined?
mseaborn: yeah. what if the replies are reasonably bounded in size?
James Ascroft-Leigh: Still no. Only if the work required to serve each request was reasonably bounded in size.
James Ascroft-Leigh: Fully asynchronous means that the application (higher level) layer must handle message ordering, I think ...
mseaborn: so the client receiving the replies should be able to handle out-of-order replies
James Ascroft-Leigh: Yes
