This document is also available in these non-normative formats: XML.
Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document is the specification of the Efficient XML Interchange (EXI) format. EXI is a very compact representation for the Extensible Markup Language (XML) Information Set that is intended to simultaneously optimize performance and the utilization of computational resources. The EXI format uses a hybrid approach drawn from the information and formal language theories, plus practical techniques verified by measurements, for entropy encoding XML information. Using a relatively simple algorithm, which is amenable to fast and compact implementation, and a small set of data types, it reliably produces efficient encodings of XML event streams. The event production system and format definition of EXI are presented.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the Last Call Public Working Draft of the Efficient XML Interchange (EXI) Format 1.0. It is made available for review by W3C members and other interested parties. It has been produced by the Efficient XML Interchange (EXI) Working Group, which is part of the Extensible Markup Language (XML) Activity. A summary list of changes made to this document since the last publication is available.
The Working Group intends to advance this specification to W3C Recommendation status. In addition, the group has produced two draft notes, whose publication is among the criteria for this specification to enter Last Call status. These notes respectively analyze the impact of the new format on existing XML technologies [EXI Impacts Note] and evaluate the performance gains of the format against the criteria defined by the XBC Working Group [EXI Evaluation Note].
The features and algorithms described in this document are considered stable at the time of this writing. However, the mechanism described in section 7.3.3 Partitions Optimized for Frequent use of String Literals may be subject to change. This mechanism caps the amount of memory used for value partitions in string tables. It should be considered a feature at risk and may later be altered or replaced if (and only if) the Working Group identifies another mechanism that provides even better efficiency.
Any feedback on this specification is welcome. Please send comments about this document to public-exi-comments@w3.org (public archive). When preparing comments to send in, please provide a separate email message for each distinct issue to the extent possible. The Last Call review period for this document extends until 07 November 2008.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1. Introduction
1.1 History and Design
1.2 Notational Conventions and Terminology
2. Design Principles
3. Basic Concepts
4. EXI Streams
5. EXI Header
5.1 EXI Cookie
5.2 Distinguishing Bits
5.3 EXI Format Version
5.4 EXI Options
6. Encoding EXI Streams
6.1 Determining Event Codes
6.2 Representing Event Codes
6.3 Fidelity Options
7. Representing Event Content
7.1 Built-in EXI Datatype Representations
7.1.1 Binary
7.1.2 Boolean
7.1.3 Decimal
7.1.4 Float
7.1.5 Integer
7.1.6 Unsigned Integer
7.1.7 QName
7.1.8 Date-Time
7.1.9 n-bit Unsigned Integer
7.1.10 String
7.1.10.1 Restricted Character Sets
7.1.11 List
7.2 Enumerations
7.3 String Table
7.3.1 String Table Partitions
7.3.2 Partitions Optimized for Frequent use of Compact Identifiers
7.3.3 Partitions Optimized for Frequent use of String Literals
7.4 Datatype Representation Map
8. EXI Grammars
8.1 Grammar Notation
8.1.1 Fixed Event Codes
8.1.2 Variable Event Codes
8.2 Grammar Event Codes
8.3 Pruning Unneeded Productions
8.4 Built-in XML Grammars
8.4.1 Built-in Document Grammar
8.4.2 Built-in Fragment Grammar
8.4.3 Built-in Element Grammar
8.5 Schema-informed Grammars
8.5.1 Schema-informed Document Grammar
8.5.2 Schema-informed Fragment Grammar
8.5.3 Schema-informed Element Fragment Grammar
8.5.4 Schema-informed Element and Type Grammars
8.5.4.1 EXI Proto-Grammars
8.5.4.1.1 Grammar Concatenation Operator
8.5.4.1.2 Element Grammars
8.5.4.1.3 Type Grammars
8.5.4.1.3.1 SimpleType Grammars
8.5.4.1.3.2 Complex Type Grammars
8.5.4.1.3.3 Complex Ur-Type Grammar
8.5.4.1.4 Attribute Uses
8.5.4.1.5 Particles
8.5.4.1.6 Element Terms
8.5.4.1.7 Wildcard Terms
8.5.4.1.8 Model Group Terms
8.5.4.1.8.1 Sequence Model Groups
8.5.4.1.8.2 Choice Model Groups
8.5.4.1.8.3 All Model Groups
8.5.4.2 EXI Normalized Grammars
8.5.4.2.1 Eliminating Productions with no Terminal Symbol
8.5.4.2.2 Eliminating Duplicate Terminal Symbols
8.5.4.3 Event Code Assignment
8.5.4.4 Undeclared Productions
8.5.4.4.1 Adding Productions when Strict is False
8.5.4.4.2 Adding Productions when Strict is True
9. EXI Compression
9.1 Blocks
9.2 Channels
9.2.1 Structure Channel
9.2.2 Value Channels
9.3 Compressed Streams
10. Conformance
10.1 EXI Stream Conformance
10.2 EXI Processor Conformance
A References
A.1 Normative References
A.2 Other References
B Infoset Mapping
B.1 Document Information Item
B.2 Element Information Items
B.3 Attribute Information Item
B.4 Processing Instruction Information Item
B.5 Unexpanded Entity Reference Information item
B.6 Character Information item
B.7 Comment Information item
B.8 Document Type Declaration Information item
B.9 Unparsed Entity Information Item
B.10 Notation Information Item
B.11 Namespace Information Item
C XML Schema for EXI Options Header
D Initial Entries in String Table Partitions
D.1 Initial Entries in Uri Partition
D.2 Initial Entries in Prefix Partitions
D.3 Initial Entries in Local-Name Partitions
E Deriving Character Sets from XML Schema Regular Expressions
F Content Coding and Internet Media Type
F.1 Content Coding
F.2 Internet Media Type
G Example Encoding (Non-Normative)
H Schema-informed Grammar Examples (Non-Normative)
H.1 Proto-Grammar Examples
H.2 Normalized Grammar Examples
H.3 Complete Grammar Examples
I Recent Specification Changes (Non-Normative)
I.1 Changes from Fourth Public Working Draft
I.2 Changes from Third Public Working Draft
I.3 Changes from Second Public Working Draft
I.4 Changes from First Public Working Draft
J Acknowledgements (Non-Normative)
The Efficient XML Interchange (EXI) format is a very compact, high performance XML representation that was designed to work well for a broad range of applications. It simultaneously improves performance and significantly reduces bandwidth requirements without compromising efficient use of other resources such as battery life, code size, processing power, and memory.
EXI uses a grammar-driven approach that achieves very efficient encodings using a straightforward encoding algorithm and a small set of data types. Consequently, EXI processors are relatively simple and can be implemented on devices with limited capacity.
EXI is schema "informed", meaning that it can utilize available schema information to improve compactness and performance, but does not depend on accurate, complete or current schemas to work. It supports arbitrary schema extensions and deviations and also works very effectively with partial schemas or in the absence of any schema. The format itself also does not depend on any particular schema language, or format, for schema information.
[Definition:] A program module called an EXI processor, whether implemented in software or hardware, is used by application programs to encode their structured data into EXI streams and/or to decode EXI streams to make the structured data accessible to them. These two roles of an EXI processor are called [Definition:] EXI stream encoder and [Definition:] EXI stream decoder, respectively. This document not only specifies the EXI format, but also defines errors that EXI processors are required to detect and act upon.
The primary goal of this document is to define the EXI format completely and without ambiguity so that implementations can interoperate. As such, the document describes the design and features of the format in a systematic manner, often declaratively and with relatively few explanatory annotations and examples. Readers who prefer a step-by-step introduction to the EXI format design and features are encouraged to start with the non-normative [EXI Primer].
EXI is the result of extensive work carried out by the W3C's XML Binary Characterization (XBC) and Efficient XML Interchange (EXI) Working Groups. XBC was chartered to investigate the costs and benefits of an alternative form of XML, and to formulate a way to objectively evaluate the potential of a substitute format for XML. Based on XBC's recommendations, EXI was chartered, first to measure, evaluate, and compare the performance of various XML technologies (using metrics developed by XBC [XBC Measurement Methodologies]), and then, if it appeared suitable, to formulate a recommendation for a W3C format specification. The measurement results and analyses are presented elsewhere [EXI Measurements Note]. The format described in this document is the specification so recommended.
The functional requirements of the EXI format are those that were prepared by the XBC WG in their analysis of the desirable properties of a high performance encoding for XML [XBC Properties]. Those properties were derived from a very broad set of use cases also identified by the XBC working group [XBC Use Cases].
The design of the format presented here is largely based on the results of the measurements carried out by the group to evaluate the performance characteristics (mainly processing efficiency and compactness) of various existing formats. The EXI format is based on Efficient XML [Efficient XML], including for example the basic heuristic grammar approach, compression algorithm, and resulting entropy encoding.
EXI is compatible with XML at the XML Information Set [XML Information Set] level, rather than at the XML syntax level. This permits it to encapsulate an efficient alternative syntax and grammar for XML, while facilitating at least the potential for minimizing the impact on XML application interoperability.
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear EMPHASIZED in this document, are to be interpreted as described in RFC 2119 [IETF RFC 2119]. Other terminology used to describe the EXI format is defined in the body of this specification.
The terms event and stream are used throughout this document to denote EXI event and EXI stream, respectively, unless the words are explicitly qualified to mean otherwise.
This document specifies an abstract grammar for EXI. In grammar notation, all terminal symbols are represented in plain text and all non-terminal symbols are represented in italics. Grammar productions are represented as follows:
| LeftHandSide : Event NonTerminal |
A set of one or more grammar productions that share the same left-hand-side non-terminal symbol are often presented together along with event codes that uniquely identify events among the collocated productions as follows:
| LeftHandSide : | |
|---|---|
| Event 1 NonTerminal 1 | EventCode 1 |
| Event 2 NonTerminal 2 | EventCode 2 |
| Event 3 NonTerminal 3 | EventCode 3 |
| ... | |
| Event n NonTerminal n | EventCode n |
Section 8.1 Grammar Notation introduces additional notations for describing productions and event codes in grammars. Those additional notations facilitate concise representation of the EXI grammar system.
[Definition:] In this document, the term qname is used to denote a QName as defined in [XML Schema Datatypes]. When used to qualify terminal symbols in grammars (see Table 4-1 for notation), to identify built-in element grammars (see 8.4.3 Built-in Element Grammar) and global type grammars (see 8.5.4.1.3 Type Grammars), or to distinguish value channels in EXI compression (see 9.2.2 Value Channels), such uses of qname represent QName values, which are tuples of { uri, local-name }. Otherwise, a qname represents a QName value affixed with a prefix part to make a triplet of { prefix, uri, local-name }, where the absence of prefix is indicated by "" (an empty string). Two qnames are considered equal when they have the same uri and the same local-name, regardless of their prefix values.
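The equality rule for qnames can be illustrated with a minimal Python sketch. The class name `QName` and its field names are assumptions made for this illustration only; they are not defined by this specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QName:
    """Hypothetical qname triplet { prefix, uri, local-name }."""
    uri: str
    local_name: str
    # The prefix is excluded from equality and hashing; "" means no prefix.
    prefix: str = field(default="", compare=False)


a = QName("http://example.org/ns", "item", prefix="ex")
b = QName("http://example.org/ns", "item", prefix="")
c = QName("http://example.org/other", "item", prefix="ex")

same_name = (a == b)       # prefixes differ, uri and local-name match
different_name = (a == c)  # uri differs
```

Excluding the prefix from comparison also keeps hashing consistent, so such values could be used as dictionary keys for grammars or value channels keyed by { uri, local-name }.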
Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of or as part of a qname. The forms of terminal symbols involving qname wildcards used in grammars and their definitions are described in the table below.
| Wildcard | Definition |
|---|---|
| SE (*) | The terminal symbol that matches a start element (SE) event with any qname. |
| SE (uri : *) | The terminal symbol that matches a start element (SE) event with any local-name in namespace uri. |
| AT (*) | The terminal symbol that matches an attribute (AT) event with any qname. |
Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed; however, any prefixes can be used in practice if they are properly bound to the namespaces.
| Prefix | Namespace Name |
|---|---|
| exi | http://www.w3.org/2007/07/exi |
| xml | http://www.w3.org/XML/1998/namespace |
| xsd | http://www.w3.org/2001/XMLSchema |
| xsi | http://www.w3.org/2001/XMLSchema-instance |
In describing the layout of an EXI format construct, a pair of square brackets [ ] is used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.
In arithmetic expressions, the notation ⌈x⌉ where x represents a real number denotes the ceiling of x, that is, the smallest integer greater than or equal to x.
The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.
One of the primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.
To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones.
EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency.
EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail.
EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set [XML Information Set], without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.
EXI achieves broad generality, flexibility, and performance by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages.
The built-in EXI grammar accepts any XML document or fragment and may be augmented with productions derived from XML Schemas [XML Schema Structures][XML Schema Datatypes], RELAX NG schemas [ISO/IEC 19757-2:2003], DTDs [XML 1.0] or other sources of information about what is likely to occur in a set of XML documents. The EXI encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of events.
The encoder then represents the stream of events using a set of simple variable length codes called event codes. Event codes are similar to Huffman codes [Huffman Coding], but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to the EXI compression algorithm, which replaces frequently occurring event patterns to further reduce size.
When schemas are used, EXI also supports a user-customizable set of typed encodings for efficiently encoding typed values.
[Definition:] An EXI stream is an EXI header followed by an EXI body. [Definition:] It is the EXI body that carries the content of the document, while the EXI header amongst its roles communicates the options that were used for encoding the EXI body. Section 5. EXI Header describes the EXI header. Values in an EXI stream are packed into bytes most significant bit first.
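As an illustration of packing values into bytes most significant bit first, the following Python sketch accumulates bits and emits bytes. The `BitWriter` class and its method names are hypothetical helpers introduced only for this example, and zero-filling the final byte is an assumption of the sketch rather than a statement about the format.

```python
class BitWriter:
    """Hypothetical helper that packs values into bytes, most significant bit first."""

    def __init__(self):
        self._bits = []

    def write(self, value, width):
        """Append the low `width` bits of `value`, highest bit first."""
        if value < 0 or value >= (1 << width):
            raise ValueError("value does not fit in the given width")
        for shift in range(width - 1, -1, -1):
            self._bits.append((value >> shift) & 1)

    def to_bytes(self):
        """Return the packed bytes, zero-filling unused trailing bits (an assumption)."""
        padded = self._bits + [0] * (-len(self._bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


writer = BitWriter()
writer.write(0b10, 2)  # a two-bit value lands in the top two bits of the byte
writer.write(0b0, 1)   # a one-bit value follows immediately
writer.write(0, 5)     # a five-bit value completes the byte
packed = writer.to_bytes()
```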
[Definition:] The building block of an EXI body is an EXI event. An EXI body consists of a sequence of EXI events representing an EXI document or an EXI fragment.
The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar. As is the case with XML, the events occur with nesting pairs of matching start element and end element events where any pair does not intersect with another except when it is fully contained in the other. The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. The EXI grammar is formally specified in section 8. EXI Grammars.
The EXI grammars permit either a single root element or multiple root elements in an EXI body, depending on the top-level grammar used for processing the body. [Definition:] EXI documents are EXI bodies encoded using either Built-in Document Grammar (See 8.4.1 Built-in Document Grammar) or Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar), and are inherently restricted to each contain only a single root element as per the grammars. [Definition:] EXI fragments are EXI bodies encoded using either Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar), and are permitted to each contain multiple root elements.
[Definition:] When schema information is available to describe the contents of an EXI body, such an EXI stream is a schema-informed EXI stream, and either Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar) is used to process the EXI body. [Definition:] Otherwise, an EXI stream is a schema-less EXI stream, and either Built-in Document Grammar (See 8.4.1 Built-in Document Grammar) or Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar) is used to process the EXI body.
The following table summarizes the EXI events and associated content that occur in an EXI stream. The content items appear in an EXI stream in the order they are shown in the table. In addition, the table includes the grammar notation used to represent each event in this specification. Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document or an EXI fragment as a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI event type. Appendix B Infoset Mapping describes the mapping system in detail.
| EXI Event Type | Content | Grammar Notation | Information Item |
|---|---|---|---|
| Start Document | | SD | B.1 Document Information Item |
| End Document | | ED | |
| Start Element | qname | SE ( qname ) | B.2 Element Information Items |
| | | SE ( * ) | |
| | | SE ( uri : * ) | |
| End Element | | EE | |
| Attribute | qname, value | AT ( qname ) | B.3 Attribute Information Item |
| | | AT ( * ) | |
| Characters | value | CH | B.6 Character Information item |
| Namespace Declaration | uri, prefix, local-element-ns | NS | B.11 Namespace Information Item |
| Comment | text | CM | B.7 Comment Information item |
| Processing Instruction | name, text | PI | B.4 Processing Instruction Information Item |
| DOCTYPE | name, public, system, text | DT | B.8 Document Type Declaration Information item |
| Entity Reference | name | ER | B.5 Unexpanded Entity Reference Information item |
| Self Contained | | SC | |
Section 6. Encoding EXI Streams describes the algorithm used to encode events in the EXI stream. As indicated in the table above, there are some event types that carry content with their event instances while other event types function as markers without content.
SE events may be followed by a series of NS events. Each NS event either associates a prefix with a URI, assigns a default namespace, or, in the case of a namespace declaration with an empty URI, rescinds one such association in effect at the point of its occurrence. The association or disassociation caused by an NS event stays in effect until the corresponding EE event occurs.
As in XML, the namespace of a particular element may be specified by a namespace declaration preceding the element or a local namespace declaration following the element name. When the namespace is specified by a local namespace declaration, the local-element-ns flag of the associated NS event is set to true and the prefix of the element is set to the prefix of that NS event. When the namespace is specified by a previous namespace declaration, the local-element-ns flag of all local NS events is false and the prefix of the element is set according to the prefix component of the element qname. The series of NS events associated with a particular element may include at most one NS event with its local-element-ns flag set to true. The uri of an NS event with its local-element-ns flag set to true MUST match the uri of the associated SE event.
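The two constraints on local-element-ns stated above can be checked as in the following Python sketch. The `NSEvent` class and the `check_ns_events` function are hypothetical names used only for illustration; the sketch covers only these two constraints and not any other processing of NS events.

```python
from dataclasses import dataclass


@dataclass
class NSEvent:
    """Hypothetical in-memory form of an NS event's content items."""
    uri: str
    prefix: str
    local_element_ns: bool


def check_ns_events(se_uri, ns_events):
    """Validate the NS events that follow one SE event.

    Returns the element prefix supplied by a local namespace declaration,
    or None when no NS event has its local-element-ns flag set.
    """
    local = [ns for ns in ns_events if ns.local_element_ns]
    if len(local) > 1:
        raise ValueError("at most one NS event may set local-element-ns to true")
    if local and local[0].uri != se_uri:
        raise ValueError("uri of the local-element-ns NS event must match the SE uri")
    return local[0].prefix if local else None


events = [
    NSEvent("http://example.org/a", "a", True),
    NSEvent("http://example.org/b", "b", False),
]
element_prefix = check_ns_events("http://example.org/a", events)
```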
An SE event may be followed by an SC event, indicating the element is self-contained and can be read independently from the rest of the EXI body. Applications may use self-contained elements to index portions of the EXI body for random access.
Each item in the event content has a data type associated with it as shown in the following table. The content of each event, if any, is encoded as a sequence of items, each of which is encoded according to its data type, in order starting with the first item.
| Content item | Used in | Type |
|---|---|---|
| name | PI, DT, ER | 7.1.10 String |
| prefix | NS | 7.1.10 String |
| local-element-ns | NS | 7.1.2 Boolean |
| public | DT | 7.1.10 String |
| qname | SE, AT | 7.1.7 QName |
| system | DT | 7.1.10 String |
| text | CM, PI | 7.1.10 String |
| uri | NS | 7.1.10 String |
| value | CH, AT | According to the schema type (see 7. Representing Event Content) if any is in effect, otherwise 7.1.10 String |
Content items other than value have their inherent, fixed data types independent of their uses. The data type that governs each occurrence of the value item depends on the schema type, if any, that is in effect for the value in question. The type xsd:anySimpleType is used for values that do not have an associated schema type, are schema-invalid, or occur in mixed content. Section 7. Representing Event Content describes how each of the types listed above is encoded in an EXI stream.
| Editorial note | |
| The syntax and semantics of the NS event are formulated in favor of simplicity, so as not to incur the processing cost of operations such as sorting and conditional branching, while keeping the number of additional bits required across an EXI stream to a minimum, based on the observation that the number of namespace declarations in an EXI stream is generally small. | |
Each EXI stream begins with an EXI header. [Definition:] The EXI header can identify EXI streams, distinguish EXI streams from text XML documents, identify the version of the EXI format being used, and specify the options used to process the body of the EXI stream. The EXI header has the following structure:
| [ EXI Cookie ] | Distinguishing Bits | Presence Bit for EXI Options | EXI Format Version | [ EXI Options ] | [ Padding Bits ] |
The EXI Options field within an EXI header is optional. Its presence is indicated by the value of the presence bit that follows Distinguishing Bits. Presence and absence are indicated by the values 1 and 0, respectively.
When either compression is used, or the alignment used is one of byte-alignment or pre-compression as dictated by EXI Options, padding bits of the minimum length required to make the whole length of the header byte-aligned are added at the end of the header. The padding bits field can contain any bit values as its contents.
The details of EXI Cookie, Distinguishing Bits, EXI Format Version and EXI Options are described in the following sections.
[Definition:] An EXI header MAY start with an EXI Cookie, which is a four byte field that serves to indicate that the stream of which it is a part is an EXI stream. The four byte field consists of four characters " $ " , " E ", " X " and " I " in that order, each represented as an ASCII octet, as follows.
| ' $ ' | ' E ' | ' X ' | ' I ' |
This four byte sequence is particular to EXI and specific enough to distinguish EXI streams from a broad range of data types currently used on the Web. While the EXI cookie is optional, its use is RECOMMENDED in the EXI header when the EXI stream is exchanged in a context where longer, more solid content-based datatype identification is desired than what is provided by Distinguishing Bits whose role is rather narrowly focused on distinguishing EXI streams from XML documents.
[Definition:] The second part in the EXI header is the Distinguishing Bits, which is a two bit field of which the first bit contains the value 1 and the second bit contains the value 0, as follows.
| 1 | 0 |
Unlike the optional EXI cookie that MAY precede this field, the presence of Distinguishing Bits is REQUIRED in the EXI header. It is used to distinguish EXI streams from text XML documents in the absence of an EXI cookie. This two bit sequence is the minimum that suffices to distinguish EXI streams from XML documents since it is the minimum length bit pattern that cannot occur as the first two bits of a well-formed XML document represented in any one of the conventional character encodings, such as UTF-8, UTF-16, UCS-2, UCS-4, EBCDIC, ISO 8859, Shift-JIS and EUC, according to XML 1.0 [XML 1.0]. Therefore, XML Processors are expected to reject an EXI stream as soon as they read and process the first byte from the stream.
Systems that use EXI streams as well as XML documents can reliably look at the Distinguishing Bits to determine whether to interpret a particular stream as XML or EXI.
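A system that accepts both kinds of input might perform this check as in the following Python sketch. The function name `looks_like_exi` is an assumption for illustration; the sketch only inspects the optional cookie and the two Distinguishing Bits and does not validate the rest of the header.

```python
EXI_COOKIE = b"$EXI"  # the four ASCII octets '$', 'E', 'X', 'I'


def looks_like_exi(data):
    """Return True if `data` starts with an optional EXI cookie followed by bits 1 0."""
    if data.startswith(EXI_COOKIE):
        data = data[len(EXI_COOKIE):]
    if not data:
        return False
    # The two most significant bits of the first header byte must be 1 then 0.
    return (data[0] >> 6) == 0b10


with_cookie = looks_like_exi(b"$EXI\x80")
without_cookie = looks_like_exi(b"\x80")
xml_text = looks_like_exi(b"<?xml version='1.0'?>")
utf8_bom_xml = looks_like_exi(b"\xef\xbb\xbf<root/>")
```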
[Definition:] The fourth part in the EXI header is the EXI Format Version, which identifies the version of the EXI format being used. EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is 1 (one).
The first bit of the version field indicates whether the version is a preview or final version of the EXI format. A value of 0 indicates this is a final version and a value of 1 indicates this is a preview version. Final versions correspond to final, approved versions of the EXI format specification. An EXI processor that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements. Preview versions of the EXI format are useful for gaining implementation and deployment experience prior to finalizing a particular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.
Following the first bit of the version is a sequence of one or more 4-bit unsigned integers representing the version number. The version number is determined by summing this sequence of 4-bit unsigned values and adding 1 (one). The sequence is terminated by any 4-bit unsigned integer with a value in the range 0-14. As such, the first 15 version numbers are represented by 4 bits, the next 15 are represented by 8 bits, etc.
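Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with the version number initially set to 1.
1. Read the next 4 bits as an unsigned integer value.
2. Add the value that was just read to the version number.
3. If the value is 15, go to step 1; otherwise (that is, the value is in the range 0-14), use the current value of the version number as the EXI format version number.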
The following are example EXI format version numbers.
| EXI Format Version Field | Description |
|---|---|
| 1 0000 | Preview version 1 |
| 0 0000 | Final version 1 |
| 0 1110 | Final version 15 |
| 0 1111 0000 | Final version 16 |
| 0 1111 0001 | Final version 17 |
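The rows of the table above can be reproduced with a short Python sketch. The function names `decode_version_field` and `encode_version_field` are hypothetical, and the field is written as a string of 0 and 1 characters (spaces ignored) purely for readability; a real processor would read the bits from the stream.

```python
def decode_version_field(field):
    """Decode a version field written as in the table, e.g. "0 1111 0001".

    Returns a tuple (is_preview, version_number).
    """
    bits = field.replace(" ", "")
    if not bits:
        raise ValueError("empty version field")
    is_preview = bits[0] == "1"
    version = 1  # the version number starts at 1
    pos = 1
    while True:
        chunk = bits[pos:pos + 4]
        if len(chunk) < 4:
            raise ValueError("truncated version field")
        value = int(chunk, 2)
        version += value
        pos += 4
        if value != 15:  # any value in the range 0-14 terminates the sequence
            return is_preview, version


def encode_version_field(version, preview=False):
    """Produce the bit string for a version number (inverse of the decoder)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    remaining = version - 1
    groups = []
    while remaining >= 15:
        groups.append("1111")
        remaining -= 15
    groups.append(format(remaining, "04b"))
    return " ".join(["1" if preview else "0"] + groups)


final_17 = decode_version_field("0 1111 0001")
field_16 = encode_version_field(16)
```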
EXI processors conforming with the final version of this specification MUST use the 5-bit value 0 0000 as the version number.
[Definition:] The fifth part of the EXI header is the EXI Options, which provides a way to specify the options used to encode the body of the EXI stream. [Definition:] The EXI Options are represented as an EXI Options document, which is an XML document encoded using the EXI format described in this specification. This results in a very compact header format that can be read and written with very little additional software.
The presence of EXI Options in its entirety is optional in the EXI header, and it is predicated on the value of the presence bit that follows the Distinguishing Bits. When EXI Options are present in the header, an EXI Processor MUST observe the specified options to process the EXI stream that follows. Otherwise, an EXI Processor may obtain the EXI options using another mechanism. There are no fallback option values provided by this specification for use in the absence of the whole EXI Options part.
EXI processors MAY provide external means for applications or users to specify EXI Options when the EXI Options are absent from the header. Such EXI processors are typically used in controlled systems where the knowledge about the effective EXI Options is shared prior to the exchange of EXI streams. The mechanism to communicate out-of-band EXI Options and their representation used in such systems are implementation dependent.
The following table describes the EXI options specified in the options field.
| EXI Option | Description | Default Value |
|---|---|---|
| alignment | Alignment of event codes and content items | bit-packed |
| compression | EXI compression is used to achieve better compactness | false |
| strict | Strict interpretation of schemas is used to achieve better compactness | false |
| fragment | Body is encoded as an EXI fragment instead of an EXI document | false |
| preserve | Specifies whether comments, pis, etc. are preserved | all false |
| selfContained | Enables self-contained elements | false |
| schemaID | Identify the schema information, if any, used to encode the body | none |
| datatypeRepresentationMap | Identify datatype representations used to encode values in EXI body | none |
| blockSize | Specifies the block size used for EXI compression | 1,000,000 |
| valueMaxLength | Specifies the maximum string length of value content items to be considered for addition to the string table. | unbounded |
| valuePartitionCapacity | Specifies the total capacity of value partitions in a string table | unbounded |
| [user defined] | User defined options may be added | none |
Appendix C XML Schema for EXI Options Header provides an XML Schema describing the EXI Options document. This schema is designed to produce smaller headers for option combinations used when compactness is critical.
The EXI Options document is encoded as an EXI body informed by the above mentioned schema using the default options specified by the following XML document. An EXI Options document consists only of EXI body, and MUST NOT start with an EXI header.
<header xmlns="http://www.w3.org/2007/07/exi">
<strict/>
</header>
[Definition:] The alignment option is used to control the alignment of event codes and content items. The value is one of bit-packed, byte-alignment or pre-compression, of which bit-packed is the default value assumed when the "alignment" element is absent in the EXI Options document. When the value of compression option is set to true, the way event codes and associated contents are represented is governed by the rule specified in 9. EXI Compression instead of the alignment option value, thus the compression option value "true" effectively rescinds the effect of an alignment option value.
[Definition:] Alignment option value bit-packed indicates that the event codes and associated content are packed in bits without any padding in between.
[Definition:] Alignment option value byte-alignment indicates that the event codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.
[Definition:] Alignment option value pre-compression indicates that all steps involved in compression (see section 9. EXI Compression) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.
[Definition:] The compression option is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in the EXI Options document. When set to true, the event codes and associated content are compressed according to 9. EXI Compression regardless of the alignment option value.
[Definition:] The strict option is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in the EXI Options document. When set to true, NS, CM, PI, ER and SC events are pruned from EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas. The "strict" element MUST NOT appear in an EXI options document when the "preserve" element is present in the same options document.
[Definition:] The fragment option is a Boolean that indicates whether the EXI body is an EXI document or an EXI fragment. When set to true, the EXI body is an EXI fragment. Otherwise, the EXI body is an EXI document. Unlike EXI documents, EXI fragments are capable of representing multiple elements at the root level. They are analogous in concept to external general parsed entities in XML in that they consist of a sequence of elements, processing instructions and comments in containers of their own that are physically separate from the documents in which they are to be used. An EXI fragment is formally defined in terms of its grammar in Sections 8.4.2 Built-in Fragment Grammar and 8.5.2 Schema-informed Fragment Grammar. The XML Information Set onto which an EXI stream is mapped contains a document information item if the stream represents an EXI document, and does not contain one if the stream represents an EXI fragment. The order among elements, processing instructions and comments that appear at the root in an EXI fragment is deemed significant and MUST be preserved by EXI processors.
[Definition:] The preserve option is a set of Booleans that can be set independently to control whether certain information items are preserved in the EXI stream. 6.3 Fidelity Options describes the set of information items affected by the preserve option. The "preserve" element MUST NOT appear in an EXI options document when the "strict" element is present in the same options document.
[Definition:] The selfContained option is a Boolean used to enable the use of self contained elements in the EXI stream. Self contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI options document when the "compression" or "pre-compression" elements are present in the same options document.
[Definition:] The schemaID option may be used to identify the schema information used when encoding the EXI body. When the "schemaID" element in the EXI options document contains the xsi:nil attribute, no schema information was used when encoding the EXI body. When the value of the "schemaID" element is empty, no user defined schema information was used when encoding the EXI body; however, the built-in XML Schema types may have been used with the xsi:type attribute to specify element types. When the schemaID option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and this information MUST be communicated out of band. This specification does not dictate the syntax or semantics of other values specified in this field. An example schemaID scheme is the use of URI that is apt for globally identifying schema resources on the Web. The parties involved in the exchange are free to agree on the scheme of schemaID field that is appropriate for their use to uniquely identify the schema information.
[Definition:] The datatypeRepresentationMap option, represented by a "datatypeRepresentationMap" element, identifies datatype representations used to encode values in the EXI body as described in 7.4 Datatype Representation Map.
[Definition:] The blockSize option specifies the block size used for EXI compression. When the blockSize option is absent, the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.
[Definition:] The valueMaxLength option specifies the maximum length of string values representing value content items to be considered for addition to the string table. When the valueMaxLength option is absent, the maximum length is unbounded. String values representing value content items that have length larger than the valueMaxLength option value are excluded from further consideration on account of valuePartitionCapacity for addition to the string table.
[Definition:] The valuePartitionCapacity option specifies the total capacity of the global and all local value partitions of a string table, where the measurement unit of the capacity is the number of unique entries. When the valuePartitionCapacity option is absent, an unbounded capacity is assumed. A string representing a value content item that has length smaller than or equal to the valueMaxLength option value and is not found in the value partitions at the time of the value occurrence is to be added into the string table only when doing so would not cause the number of unique values in value partitions to exceed the capacity. The use of the valuePartitionCapacity option value and the way the number of unique values is counted for value partitions are described in 7.3.1 String Table Partitions.
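The defaults and mutual-exclusion rules stated in the option definitions above can be collected in a small data structure. The following is a minimal sketch and not part of the format definition: the `ExiOptions` class, its field names and `check_options` are hypothetical, and the presence of an element in the EXI Options document is approximated by a non-default value.

```python
from dataclasses import dataclass
from typing import Optional

ALIGNMENTS = ("bit-packed", "byte-alignment", "pre-compression")

@dataclass
class ExiOptions:
    """Hypothetical container; defaults follow the options table above."""
    alignment: str = "bit-packed"
    compression: bool = False
    strict: bool = False
    fragment: bool = False
    preserve: frozenset = frozenset()   # e.g. {"comments", "pis"}; empty means all false
    self_contained: bool = False
    block_size: int = 1_000_000
    value_max_length: Optional[int] = None          # None stands for "unbounded"
    value_partition_capacity: Optional[int] = None  # None stands for "unbounded"

def check_options(o):
    """Return a list of violated constraints; an empty list means none were found."""
    problems = []
    if o.alignment not in ALIGNMENTS:
        problems.append("unknown alignment value")
    if o.strict and o.preserve:
        problems.append("strict and preserve must not be combined")
    if o.self_contained and (o.compression or o.alignment == "pre-compression"):
        problems.append("selfContained must not be combined with compression or pre-compression")
    return problems
```

For instance, `check_options(ExiOptions(strict=True, preserve=frozenset({"comments"})))` reports one problem, while the all-default `ExiOptions()` reports none.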
The rules for encoding a series of events as an EXI stream are very simple and are driven by a declarative set of grammars that describes the structure of an EXI stream. Every event in the stream is encoded using the same set of encoding rules, which are summarized as follows:
Self-contained (SC), namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:
| SC | NS | NS | ... | NS | AT (xsi:type) | AT (xsi:nil) | AT | AT | ... | AT |
Namespace (NS) events occur in document order. AT(xsi:type) and AT(xsi:nil) occur before all other AT events. In a schema-less EXI stream, the remaining attribute (AT) events can occur in any order. In a schema-informed EXI stream, the remaining attribute (AT) events occur in lexical order sorted first by qname's local-name then by qname's URI.
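As an illustration of the attribute ordering for schema-informed streams, the sketch below sorts qnames so that xsi:type and xsi:nil come first and the rest follow by local-name and then URI. It assumes that "lexical order" means comparison by code point, which is how Python compares strings; the `(uri, local_name)` tuple shape and the function names are hypothetical.

```python
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def at_sort_key(qname):
    """Sort key for an attribute qname given as a (uri, local_name) tuple."""
    uri, local_name = qname
    if uri == XSI and local_name == "type":
        return (0, "", "")
    if uri == XSI and local_name == "nil":
        return (1, "", "")
    # Remaining attributes: first by local-name, then by URI.
    return (2, local_name, uri)

def order_attributes(qnames):
    """Return the qnames in the order their AT events would occur."""
    return sorted(qnames, key=at_sort_key)
```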
EXI uses the same simple procedure described above to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema.
[Definition:] An event code is a sequence of 1 to 3 non-negative integers called parts. Each production in a grammar has an event code that distinguishes its event from that of other productions that share the same left-hand-side non-terminal symbol.
Section 6.1 Determining Event Codes describes in detail how the grammar is used to determine the event code of an event. Section 6.2 Representing Event Codes describes in detail how event codes are represented as bits. Section 6.3 Fidelity Options describes available fidelity options and how they affect the EXI stream. Section 7. Representing Event Content describes how the typed event contents are represented as bits.
The structure of an EXI stream is described by the EXI grammars, which are formally specified in section 8. EXI Grammars. Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assigned event code for each event.
For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event provided there are four global elements defined in the schema and provide an event code for each event:
| Syntax | Event Code | ||
|---|---|---|---|
| DocContent | |||
| SE ("A") DocEnd | 0 | ||
| SE ("B") DocEnd | 1 | ||
| SE ("C") DocEnd | 2 | ||
| SE ("D") DocEnd | 3 | ||
| SE (*) DocEnd | 4.0 | ||
| DT DocContent | 4.1 | ||
| CH DocContent | 4.2 | ||
| CM DocContent | 4.3.0 | ||
| PI DocContent | 4.3.1 | ||
At the point in an EXI stream where the above grammar productions are in effect, the event code of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is 4.1, and so on.
Each event code is represented by a sequence of 1 to 3 parts that uniquely identify an event. Event code parts are encoded in order starting with the first part followed by subsequent parts.
When the value of compression option is false, and bit-packed alignment option is used for the current processing of the stream, the ith part of an event code is encoded using the minimum number of bits required to distinguish it from the ith part of the other sibling event codes in the current grammar. Specifically, the ith part of an event code is encoded as an n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), of which n is ⌈ log 2 m ⌉ where m is the number of distinct values used as the ith part of its own and all its sibling event codes in the current grammar. Two event codes are siblings at the ith part if and only if they share the same values in all preceding parts. All event codes are siblings at the first part.
On the other hand, when the value of compression option is true, or either byte-alignment or pre-compression alignment option is used, the ith part of an event code is encoded using the minimum number of bytes instead of bits required to distinguish it from the ith part of the other sibling event codes in the current grammar. Each part is encoded as an n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), of which n is ⌈ log 2 m ⌉ where m is the number of distinct values used as the ith part of its own and all its sibling event codes in the current grammar. The number of bytes used for the n-bit unsigned integer representation in this case is equal to ⌈ n / 8 ⌉.
Regardless of the values of compression option and alignment option, if there is only one distinct value for a given part, the part is omitted (i.e., encoded in log 2 1 = 0 bits = 0 bytes).
For example, the nine event codes shown in the DocContent grammar above have a value ranging from 0 to 4 for their first part. There are five distinct values needed to identify the first part of these event codes. Therefore, when the compression option is false and bit-packed alignment is used, the first part can be encoded in ⌈ log 2 5 ⌉ = 3 bits. In the same fashion, the number of bits used for encoding the second and third parts (if present) is calculated as ⌈ log 2 4 ⌉ = 2 bits and ⌈ log 2 2 ⌉ = 1 bit, respectively. On the other hand, when the compression option is true, or either byte-alignment or pre-compression alignment is used, the number of bytes used for each part is ⌈ 3 / 8 ⌉ = 1 byte for the first part, ⌈ 2 / 8 ⌉ = 1 byte for the second part and ⌈ 1 / 8 ⌉ = 1 byte for the third part.
The tables below illustrate how the event code of each event in the DocContent grammar above is encoded, first in bits and then in bytes.
| Event | Part values | | | Event Code Encoding | # bits |
|---|---|---|---|---|---|
| SE ("A") | 0 | | | 000 | 3 |
| SE ("B") | 1 | | | 001 | 3 |
| SE ("C") | 2 | | | 010 | 3 |
| SE ("D") | 3 | | | 011 | 3 |
| SE (*) | 4 | 0 | | 100 00 | 5 |
| DT | 4 | 1 | | 100 01 | 5 |
| CH | 4 | 2 | | 100 10 | 5 |
| CM | 4 | 3 | 0 | 100 11 0 | 6 |
| PI | 4 | 3 | 1 | 100 11 1 | 6 |
| # distinct values (m) | 5 | 4 | 2 | | |
| # bits per part (⌈ log 2 m ⌉) | 3 | 2 | 1 | | |
| Event | Part values | | | Event Code Encoding | # bytes |
|---|---|---|---|---|---|
| SE ("A") | 0 | | | 00000000 | 1 |
| SE ("B") | 1 | | | 00000001 | 1 |
| SE ("C") | 2 | | | 00000010 | 1 |
| SE ("D") | 3 | | | 00000011 | 1 |
| SE (*) | 4 | 0 | | 00000100 00000000 | 2 |
| DT | 4 | 1 | | 00000100 00000001 | 2 |
| CH | 4 | 2 | | 00000100 00000010 | 2 |
| CM | 4 | 3 | 0 | 00000100 00000011 00000000 | 3 |
| PI | 4 | 3 | 1 | 00000100 00000011 00000001 | 3 |
| # distinct values (m) | 5 | 4 | 2 | | |
| # bytes per part (⌈ ⌈ log 2 m ⌉ / 8 ⌉) | 1 | 1 | 1 | | |
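The width calculation described in this section can be reproduced with a short sketch. The function names and the tuple representation of event codes are hypothetical; `(m - 1).bit_length()` is used as an exact integer equivalent of ⌈ log 2 m ⌉ for m ≥ 1.

```python
# Event codes of the DocContent grammar above, written as tuples of parts.
DOC_CONTENT = {
    'SE("A")': (0,), 'SE("B")': (1,), 'SE("C")': (2,), 'SE("D")': (3,),
    'SE(*)': (4, 0), 'DT': (4, 1), 'CH': (4, 2),
    'CM': (4, 3, 0), 'PI': (4, 3, 1),
}

def part_widths(code, all_codes):
    """Number of bits n for each part of `code` among its sibling event codes."""
    widths = []
    for i in range(len(code)):
        # Siblings at part i share the same values in all preceding parts.
        distinct = {c[i] for c in all_codes if len(c) > i and c[:i] == code[:i]}
        widths.append((len(distinct) - 1).bit_length())
    return widths

def encode_bits(code, all_codes):
    """Bit-packed encoding as a string of bits, one group per part."""
    widths = part_widths(code, all_codes)
    return " ".join(format(v, "0{}b".format(w)) for v, w in zip(code, widths) if w > 0)

def encoded_byte_count(code, all_codes):
    """Bytes used when compression, byte-alignment or pre-compression is in effect."""
    return sum((w + 7) // 8 for w in part_widths(code, all_codes))
```

With `codes = list(DOC_CONTENT.values())`, `encode_bits(DOC_CONTENT['PI'], codes)` gives `100 11 1` and `encoded_byte_count(DOC_CONTENT['PI'], codes)` gives 3, matching the tables above.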
Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification [SOAP 1.2] prohibits the use of XML processing-instructions. In addition, there are many data-exchange use cases that do not require XML comments or DTDs.
Applications can use a set of fidelity options to specify the XML features they require. As specified in section 8.3 Pruning Unneeded Productions, EXI processors MUST use these fidelity options to prune the events that are not required from the grammars, improving compactness and processing efficiency.
The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on the EXI stream.
| Fidelity option | Effect |
|---|---|
| Preserve.comments | CM events are preserved |
| Preserve.pis | PI events are preserved |
| Preserve.dtd | DOCTYPE and ER events are preserved |
| Preserve.prefixes | NS events and namespace prefixes are preserved |
| Preserve.lexicalValues | Lexical form of element and attribute values is preserved in value content items |
The content of each event in an EXI body is represented according to its type (see Table 4-2). In the absence of external type information, attribute and character values are typed as String.
[Definition:] EXI defines a minimal set of datatype representations called Built-in EXI datatype representations that define how values are represented in EXI streams. When the preserve.lexicalValues option is false, values are represented according to their schema datatypes per Table 7-1 below using built-in EXI datatype representations as described in 7.1 Built-in EXI Datatype Representations. Otherwise, values are represented as Strings with restricted character sets (see Table 7-2 below). The following table lists the built-in EXI datatype representations, associated type identifiers and the XML Schema Language [XML Schema Datatypes] built-in datatypes each is used to represent by default.
| Built-in EXI Datatype Representation | EXI Datatype ID | XML Schema Datatypes |
|---|---|---|
| Binary | xsd:base64Binary | base64Binary |
| Binary | xsd:hexBinary | hexBinary |
| Boolean | xsd:boolean | boolean |
| Date-Time | xsd:dateTime | dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth |
| Decimal | xsd:decimal | decimal |
| Float | xsd:double | float, double |
| n-bit Unsigned Integer, Unsigned Integer or Integer | xsd:integer | integer, the representation of which depends on the facet values as follows. When the bounded range of integer is 4095 or smaller as determined by the values of minInclusive, minExclusive, maxInclusive and maxExclusive facets, use n-bit Unsigned Integer representation. Otherwise, when the integer satisfies one of the following, use Unsigned Integer representation. Otherwise, use Integer representation. |
| String | xsd:string | string, anySimpleType, anyURI, duration, all types derived by union |
| List | | All types derived by list, including IDREFS and ENTITIES |
| QName | | |
By default, datatypes derived from the XML Schema datatypes above are also represented according to the associated built-in EXI datatype representation. When a datatype is derived, directly or indirectly, from more than one of the XML Schema datatypes above, the closest ancestor is used to determine the built-in EXI datatype representation. For example, a value of XML Schema datatype xsd:int is represented according to the same built-in EXI datatype representation as a value of XML Schema datatype xsd:integer. Although xsd:int is derived indirectly from xsd:integer and also further from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.
Each EXI datatype identifier above is a qname. Datatype identifiers uniquely identify one of the built-in EXI datatype representations. They are used by Datatype Representation Map to designate XML Schema datatypes to built-in EXI datatype representations different from the ones that are associated by default. Not all built-in EXI datatype representations are assigned datatype identifiers. Only those that have identifiers are usable by Datatype Representation Map for designating alternative representations.
When the preserve.lexicalValues option is true, all values are represented as Strings. Some values that would have otherwise been designated to certain built-in EXI datatype representations are represented as Strings with restricted character sets as defined by the table below.
| EXI Datatype ID | Restricted Character Set |
|---|---|
| xsd:base64Binary | { #x9, #xA, #xD, #x20, +, /, [0-9], =, [A-Z], [a-z] } |
| xsd:hexBinary | { #x9, #xA, #xD, #x20, [0-9], [A-F], [a-f] } |
| xsd:boolean | { #x9, #xA, #xD, #x20, 0, 1, a, e, f, l, r, s, t, u } |
| xsd:dateTime | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
| xsd:decimal | { #x9, #xA, #xD, #x20, +, -, ., [0-9] } |
| xsd:double | { #x9, #xA, #xD, #x20, +, -, ., [0-9], E, F, I, N, a, e } |
| xsd:integer | { #x9, #xA, #xD, #x20, +, -, [0-9] } |
The restricted character set for the EXI List datatype representation is determined by the EXI datatype representation of the values in the List.
The rules used to represent values of String depend on the content items to which the values belong. The value representation of certain content items involves the use of string tables, while other content items are represented using the encoding rule described in 7.1.10 String without involvement of string tables. The content items that use string tables, and how each of them uses string tables to represent its values, are described in 7.3 String Table.
Schemas can provide one or more enumerated values for types. EXI exploits those pre-defined values when they are available to represent values of such types more efficiently than it otherwise would using built-in EXI datatypes. The encoding rule for representing a type of enumerated values is described in 7.2 Enumerations. Types that are derived from other types by union and their subtypes are always represented as String regardless of the availability of enumerated values. The representation of values whose schema type is QName, Notation or a type derived therefrom by restriction is also not affected by enumerated values, if any.
The following sections describe the encoding rules of built-in EXI datatype representations for representing values in EXI streams.
Values typed as Binary are represented as a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer).
In the absence of pattern facets in the schema datatype, values typed as Boolean are represented as n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is one (1) and the value zero (0) represents false and the value one (1) represents true.
Otherwise, when pattern facets are available in the schema datatype, the Boolean datatype representation is able to distinguish values not only arithmetically (0 or 1) but also between lexical variants ("0", "1", "false" and "true"). In this case, values typed as Boolean are represented as n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1", respectively.
Values typed as Decimal are represented as a Boolean sign (see 7.1.2 Boolean) followed by two Unsigned Integers (see 7.1.6 Unsigned Integer). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.
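The decomposition of a Decimal value into its three components can be sketched as follows. This is an illustration only: `decimal_components` is a hypothetical helper, it works on an already whitespace-trimmed lexical form, and it returns the sign and the two Unsigned Integer values rather than their encoded octets.

```python
def decimal_components(text):
    """Split a lexical decimal such as "-12.5" into (sign, integral, fractional)."""
    sign = 1 if text.startswith("-") else 0      # 1 = negative, 0 = positive
    digits = text.lstrip("+-")
    integral, _, fraction = digits.partition(".")
    integral_value = int(integral) if integral else 0
    # Reversing the fractional digits keeps leading zeros: "007" -> "700" -> 700.
    fractional_value = int(fraction[::-1]) if fraction else 0
    return sign, integral_value, fractional_value
```

For example, "3.007" yields (0, 3, 700); a decoder reverses the digits of 700 to recover the fractional part "007".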
Values typed as Float are represented as two consecutive Integers (see 7.1.5 Integer). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is -(2^63) to 2^63-1 and the range of the exponent is -(2^14-1) to 2^14-1. Values typed as Float with a mantissa or exponent outside the accepted range are represented as schema-invalid values.
The exponent value -(2^14) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2^14) with mantissa values 1 and -1 represents positive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2^14) with any other mantissa value represents NaN.
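The interpretation of a mantissa and exponent pair, including the special exponent value, can be sketched as below. `decode_float` is a hypothetical name, and mapping the result onto a Python `float` (an IEEE 754 double) is an assumption of this sketch; very large or very small exponents therefore overflow to infinity or underflow to zero.

```python
import math

SPECIAL_EXPONENT = -(2 ** 14)   # reserved for INF, -INF and NaN

def decode_float(mantissa, exponent):
    """Combine the two decoded Integers into a floating point value."""
    if exponent == SPECIAL_EXPONENT:
        if mantissa == 1:
            return math.inf
        if mantissa == -1:
            return -math.inf
        return math.nan
    # mantissa * 10^exponent, parsed from text to avoid OverflowError on large exponents
    return float("{}e{}".format(mantissa, exponent))
```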
A value represented as Float can be decoded by going through the following steps.
The Integer type supports signed integer numbers of arbitrary magnitude. Values typed as Integer are represented as a Boolean sign (see 7.1.2 Boolean) followed by an Unsigned Integer (see 7.1.6 Unsigned Integer). A sign value of zero (0) is used to represent positive integers and a sign value of one (1) is used to represent negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1.
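The mapping between an integer and its sign and magnitude components can be sketched as follows; the function names are hypothetical and the encoding of the resulting Unsigned Integer is left out.

```python
def integer_components(value):
    """Return (sign, unsigned) for a signed integer."""
    if value >= 0:
        return 0, value
    # Negative values store the magnitude minus 1, so -1 maps to (1, 0).
    return 1, -value - 1

def integer_from_components(sign, unsigned):
    """Inverse of integer_components."""
    return unsigned if sign == 0 else -(unsigned + 1)
```

Storing the magnitude minus 1 for negative values means that each (sign, unsigned) pair corresponds to exactly one integer, with no second representation of zero.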
The Unsigned Integer type supports unsigned integer numbers of arbitrary magnitude. Values typed as Unsigned Integer are represented using a sequence of octets. The sequence is terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of the octets as a sequence of 7-bit bytes, with the least significant byte first.
EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than 4294967296.
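The octet sequence described above can be produced and consumed as in the following sketch. It assumes that every octet other than the last has its most significant bit set to 1, which the termination rule implies; the function names are hypothetical.

```python
def encode_unsigned(value):
    """Encode a non-negative integer as 7-bit groups, least significant group first."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)   # more octets follow
        else:
            out.append(group)          # most significant bit 0 terminates the sequence
            return bytes(out)

def decode_unsigned(data, pos=0):
    """Decode one Unsigned Integer from `data` at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        octet = data[pos]
        pos += 1
        result |= (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            return result, pos
```

For example, the value 300 is encoded as the two octets 0xAC 0x02.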
A value represented as Unsigned Integer can be decoded by going through the following steps.
Values of type QName are encoded as a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when the preserve.prefixes option is set to true.
When the QName value is specified by a schema-informed grammar using the SE(qname) or AT(qname) terminal symbols, URI and local-name are implicit and are omitted. Similarly, when the URI of the QName value is derived from a schema-informed grammar using SE(uri: *) terminal symbols, URI is implicit thus omitted in the representation, and only the local-name component is encoded as a String (see 7.1.10 String). Otherwise, URI and local-name components are encoded as Strings. If the QName is in no namespace, the URI is represented by a zero length String.
When present, prefixes are represented as n-bit unsigned integers (7.1.9 n-bit Unsigned Integer), where n is ⌈ log2(N) ⌉ and N is the number of unique prefixes specified for the URI of the QName by preceding NS events in the EXI stream. Each unique prefix is assigned a unique n-bit integer (0 ... N-1) according to the order in which the associated NS event occurs in the EXI stream. If there are no prefixes specified for the URI of the QName by preceding NS events in the EXI stream, the prefix is undefined. An undefined prefix is represented using zero bits (i.e., omitted).
Given either an n-bit unsigned integer m that represents the prefix value or an undefined prefix, the effective prefix value is determined by following the rules described below in order. A QName is in error if it has an undefined prefix that cannot be resolved by the rules below.
Values typed as Date-Time are encoded as a sequence of values representing the individual components of the Date-Time. The following table specifies each of the possible date-time components along with how they are encoded.
| Component | Value | Type |
|---|---|---|
| Year | Offset from 2000 | Integer ( 7.1.5 Integer) |
| MonthDay | Month * 32 + Day | 9-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where day is a value in the range 1-31 and month is a value in the range 1-12. |
| Time | ((Hour * 60) + Minutes) * 60 + seconds | 17-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) |
| FractionalSecs | Fractional seconds | Unsigned Integer ( 7.1.6 Unsigned Integer) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros |
| TimeZone | TZHours * 60 + TZMinutes | 11-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) representing a signed integer offset by 840 ( = 14 * 60 ) |
| presence | Boolean presence indicator | Boolean (7.1.2 Boolean) |
The variety of components that constitute a value and their appearance order depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to Date-Time datatype. Items listed in square brackets are included if and only if the value of its preceding presence indicator (specified above) is set to true.
| XML Schema Datatype | Included Components |
|---|---|
| gYear | Year, presence, [TimeZone] |
| gYearMonth, date | Year, MonthDay, presence, [TimeZone] |
| dateTime | Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone] |
| gMonth, gMonthDay, gDay | MonthDay, presence, [TimeZone] |
| time | Time, presence, [FractionalSecs], presence, [TimeZone] |
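The component values for a dateTime can be computed as in the sketch below, which follows the formulas and component order given in the two tables above. The function name, the parameter names and the choice to pass the time zone as a signed number of minutes and the fractional seconds as a digit string are assumptions of this illustration; the bit-level encoding of each component is not shown.

```python
def date_time_components(year, month, day, hour, minute, second,
                         fraction_digits=None, tz_offset_minutes=None):
    """Return the (name, value) components of a dateTime in encoding order."""
    parts = [
        ("Year", year - 2000),                          # offset from 2000
        ("MonthDay", month * 32 + day),                 # fits in 9 bits
        ("Time", (hour * 60 + minute) * 60 + second),   # fits in 17 bits
    ]
    parts.append(("presence", fraction_digits is not None))
    if fraction_digits is not None:
        # Digits are reversed to preserve leading zeros, e.g. "05" -> 50.
        parts.append(("FractionalSecs", int(fraction_digits[::-1])))
    parts.append(("presence", tz_offset_minutes is not None))
    if tz_offset_minutes is not None:
        parts.append(("TimeZone", tz_offset_minutes + 840))   # offset by 14 * 60
    return parts
```

For 2008-09-19T13:45:30+09:00 this yields Year 8, MonthDay 307, Time 49530, a false presence indicator for fractional seconds, a true presence indicator for the time zone, and TimeZone 1380.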
When the value of compression option is false and the value bit-packed is used for alignment options, values of type n-bit Unsigned Integer are represented as an unsigned binary integer using n bits. Otherwise, they are represented as an unsigned integer using the minimum number of bytes required to store n bits. Bytes are ordered with the least significant byte first.
The n-bit unsigned integer is used for encoding event codes, the prefix component of QName (see 7.1.7 QName) as well as certain value content items, as described in the respective relevant parts of this document. As shown in Table 7-1, integers with bounded range size m equal to 4095 or smaller are encoded using n-bit unsigned integer with n being ⌈ log 2 m ⌉, as an offset from the minimum value in the range.
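The offset encoding of a bounded integer, and the byte layout used when bit-packing is not in effect, can be sketched as follows. The sketch assumes that the range size m is the number of integers in the inclusive range, i.e. maximum - minimum + 1; this reading, like the function names, is an assumption and is not stated in this section.

```python
def bounded_integer_code(value, minimum, maximum):
    """Return (offset, n): the offset from the minimum and the bit width n."""
    m = maximum - minimum + 1          # assumed definition of the range size
    n = (m - 1).bit_length()           # integer equivalent of ceil(log2(m))
    return value - minimum, n

def n_bit_as_bytes(offset, n):
    """Byte form of an n-bit unsigned integer, least significant byte first."""
    return offset.to_bytes((n + 7) // 8, "little")
```

For the range -10 to 10, the value 3 becomes offset 13 in a 5-bit field. For the range 0 to 4000, the value 2748 needs 12 bits, which occupy the two bytes 0xBC 0x0A when byte-oriented representation is used.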
Values of type String are represented as a length prefixed sequence of characters. The length indicates the number of characters in the string and is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer). If a restricted character set is defined for the string (see 7.1.10.1 Restricted Character Sets), each character is represented as an n-bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer). Otherwise, each character is represented by its UCS [ISO/IEC 10646] code point encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer).
EXI uses a string table to represent certain content items more efficiently. Section 7.3 String Table describes the string table and how it is applied to different content items.
If a string value is associated with a schema datatype and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.
First, determine the character set for each datatype in the datatype hierarchy of the string value that has one or more pattern facets according to section E Deriving Character Sets from XML Schema Regular Expressions. For each datatype with more than one pattern facet, compute the restricted character set based on the union of the regular expressions specified by its pattern facets. If the restricted character set for a datatype contains at least 255 characters or contains non-BMP characters, the character set of the datatype is not restricted and can be omitted from further consideration.
Then, compute the restricted character set for the string value as the intersection of all the character sets computed above. If the resulting character set contains fewer than 255 characters, the string value has a restricted character set and each character is represented using an n-bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer), where n is ⌈ log2(N + 1) ⌉ and N is the number of characters in the restricted character set.
The characters in the restricted character set are sorted by UCS [ISO/IEC 10646] code point and represented by integer values in the range (0 ... N-1) according to their ordinal position in the set. Characters that are not in this set are represented by the integer N followed by the UCS code point of the character represented as an Unsigned Integer.
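A minimal sketch of this mapping follows. It returns symbolic tokens rather than packed bits so that the index assignment and the escape value N are easy to inspect. The function name and the token shapes are hypothetical, the length prefix of the string is omitted, and the bit width is taken as ⌈ log2(N + 1) ⌉ computed with integer arithmetic.

```python
def encode_restricted(text, charset):
    """Map each character to its ordinal in the sorted set, or escape with N.

    Returns a list of tokens: (n, value) for an n-bit Unsigned Integer, and
    ("uint", code_point) for the Unsigned Integer that follows an escape.
    """
    ordered = sorted(set(charset))            # sorted by code point
    index = {ch: i for i, ch in enumerate(ordered)}
    count = len(ordered)                      # N
    n = count.bit_length()                    # equals ceil(log2(N + 1))
    tokens = []
    for ch in text:
        if ch in index:
            tokens.append((n, index[ch]))
        else:
            tokens.append((n, count))         # escape: the integer N
            tokens.append(("uint", ord(ch)))  # then the raw code point
    return tokens
```

With the ten decimal digits as the restricted character set, N is 10 and n is 4, so `encode_restricted("7x", "0123456789")` gives `[(4, 7), (4, 10), ("uint", 120)]`.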
The figure below illustrates an overview of the process for determining and using restricted character sets described in this section.

Figure 7-1. String Processing Model
Values of type List are encoded as a length prefixed sequence of values. The length is encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer) and each value is encoded according to its type (see 7. Representing Event Content).
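The List representation can be sketched with a per-item encoder supplied by the caller. The names are hypothetical, and the Unsigned Integer layout is again an assumption taken from section 7.1.6 (7-bit groups, least significant first), with byte-aligned output.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def encode_unsigned_integer(value: int) -> bytes:
    """7-bit groups, least significant first (assumed from section 7.1.6)."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # more octets follow
        else:
            out.append(group)
            return bytes(out)


def encode_list(values: Sequence[T], encode_item: Callable[[T], bytes]) -> bytes:
    """Item count as an Unsigned Integer, then each item in its own encoding."""
    out = bytearray(encode_unsigned_integer(len(values)))
    for value in values:
        out += encode_item(value)
    return bytes(out)
```

A list of three unsigned integers can then be encoded with `encode_list([1, 200, 3], encode_unsigned_integer)`.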
Values of enumerated types are encoded as n-bit Unsigned Integers (7.1.9 n-bit Unsigned Integer) where n = ⌈ log2 m ⌉ and m is the number of items in the enumerated type. The value assigned to each item corresponds to its ordinal position in the enumeration in schema-order starting with position zero (0).
The exceptions are schema types derived from others by union and their subtypes, as well as QName or Notation and types derived from them by restriction. Values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.
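For the ordinary case (not the exceptions just listed), the enumeration encoding reduces to a bit width and an ordinal. The sketch below uses a hypothetical function name and returns the pair rather than packing bits.

```python
def enumeration_code(items, value):
    """Return (n, ordinal): the bit width ceil(log2 m) and the schema-order index.

    `items` is the enumeration in schema order; `value` must be one of them.
    """
    m = len(items)
    if m == 0:
        raise ValueError("an enumerated type needs at least one item")
    n = (m - 1).bit_length()   # integer form of ceil(log2 m)
    return n, items.index(value)
```

For an enumeration of three items, n is 2 and the third item is encoded as the value 2. An enumeration with a single item needs zero bits.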
EXI uses a string table to assign "compact identifiers" to some string values. Occurrences of string values found in the string table are represented using the associated compact identifier rather than encoding the entire "string literal". The string table is initially pre-populated with string values that are likely to occur in certain contexts and is dynamically expanded to include additional string values encountered in the document. The following content items are encoded using a string table: uri, prefix, local-name and value.
When a string value is found in the string table, the value is encoded using the compact identifier and the string table is left unchanged. When a string value is not found in the string table, its string literal is encoded as a String without using a compact identifier. Only after that is the string table augmented with the string value and an assigned compact identifier, unless the string value represents a value content item and fails to satisfy the criteria in effect by virtue of the valuePartitionCapacity and valueMaxLength options.
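The hit and miss behavior can be sketched for a single partition as below. The class and its `may_add` parameter are hypothetical: `may_add` stands in for the criteria imposed by the valuePartitionCapacity and valueMaxLength options, which are not spelled out in this section. Assigning the next free index as the compact identifier is likewise an assumption made for illustration.

```python
class Partition:
    """One string table partition mapping string values to compact identifiers."""

    def __init__(self, initial=()):
        self._ids = {value: i for i, value in enumerate(initial)}

    def encode(self, value, may_add=lambda v: True):
        """Return ("id", identifier) on a hit, ("literal", value) on a miss."""
        if value in self._ids:
            return ("id", self._ids[value])    # hit: table is not modified
        event = ("literal", value)             # miss: the literal is emitted first
        if may_add(value):                     # hypothetical stand-in for the options
            self._ids[value] = len(self._ids)  # then the table is augmented
        return event
```

Encoding the same value twice yields a literal the first time and a compact identifier the second time, unless `may_add` rejects the value, in which case every occurrence is encoded as a literal.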
The string table is divided into partitions and each partition is optimized for more frequent use of either compact identifiers or string literals depending on the purpose of the partition. Section 7.3.1 String Table Partitions describes how the EXI string table is partitioned. Section 7.3.2 Partitions Optimized for Frequent use of Compact Identifiers describes how string values are encoded when the associated partition is optimized for more frequent use of compact identifiers. Section 7.3.3 Partitions Optimized for Frequent use of String Literals describes how string values are encoded when the associated partition is optimized for more frequent use of string literals.
The life cycle of a string table spans the processing of a single EXI stream. String tables are not represented in an EXI stream or exchanged between EXI processors. A string table cannot be reused across multiple EXI streams; therefore, EXI processors MUST use a string table that is equivalent to the one that would have been newly created and pre-populated with initial values for processing each EXI stream.
The string table is organized into partitions so that the indices assigned to compact identifiers can stay relatively small. A smaller number of indices improves both average compactness and the efficiency of table operations. Each partition has a separate set of compact identifiers and content items are assigned to specific partitions as described below.
Uri content items and the URI portion of qname content items are assigned to the uri partition. The uri partition is optimized for frequent use of compact identifiers and is pre-populated with initial entries as described in D.1 Initial Entries in Uri Partition. When a schema is provided, the uri partition is also pre-populated with the name of each target namespace URI declared in the schema, plus some of the namespace URIs used in wildcard terms (see section 8.5.4.1.7 Wildcard Terms for the condition), appended in lexicographical order.
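Pre-population of the uri partition can be sketched as follows. The initial entries of D.1 and the wildcard condition of 8.5.4.1.7 are not reproduced here, so both are taken as caller-supplied inputs. The function name is hypothetical. Skipping a schema URI that is already among the initial entries, and reading "lexicographical order" as code point order, are assumptions.

```python
def initial_uri_partition(initial_entries, schema_namespaces=()):
    """Initial entries first, then schema namespace URIs in sorted order.

    `initial_entries` stands for the entries listed in D.1;
    `schema_namespaces` stands for the target namespaces plus any
    qualifying wildcard namespaces, already selected by the caller.
    """
    partition = list(initial_entries)
    for uri in sorted(set(schema_namespaces)):
        if uri not in partition:      # assumption: no duplicate entries
            partition.append(uri)
    return partition
```

The position of each URI in the returned list is the index from which its compact identifier is derived.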
Prefix content items are assigned to partitions based on their associated namespace URI. Partitions containing prefix content items are optimized for frequent use of compact identifiers and the string table is pre-populated with entries as described in D.2 Initial Entries in Prefix Partitions.
The local-name portion of each qname content item is assigned to a partition based on the namespace URI of the qname content item of which the local-name is a part. Partitions containing local-names are optimized for frequent use of string literals and the string table is pre-populated with entries as described in D.3 Initial Entries in Local-Name Partitions. When a schema is provided, the string table is also pre-populated with the local name of each attribute, element and type declared in the schema, partitioned by namespace URI and sorted lexicographically.
Value content items are assigned simultaneously to the global value partition and to the "local" value partition that corresponds to the qname of the attribute or element in context at the time the string table is looked up. This assignment takes place when the string value is found in neither the global nor the local value partition. Partitions containing value content items are optimized for frequent use of string literals and are initially empty. [Definition:] All value partitions in a string table share a single variable valueAmount, the value of which is a non-negative integer that reflects the current number of unique values in value partitions. Its value is initially set to 0 (zero) and changes while processing an EXI stream per the rule described in 7.3.3 Partitions Optimized for Frequent use of String Literals.
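The relationship between the global partition, the local partitions and valueAmount can be sketched as a small data structure. Several points are assumptions made for illustration: the class and method names are hypothetical, the local partition is consulted before the global one, valueAmount is incremented once per newly added value, and the valuePartitionCapacity and valueMaxLength options are ignored. The normative rule for valueAmount is the one in 7.3.3.

```python
class ValuePartitions:
    """Global value partition plus one local partition per qname."""

    def __init__(self):
        self.global_values = []   # shared by all attributes and elements
        self.local_values = {}    # qname -> list of values seen in that context
        self.value_amount = 0     # number of unique values added so far

    def lookup_or_add(self, qname, value):
        """Return ("local", i), ("global", i), or ("miss", None) after adding."""
        local = self.local_values.setdefault(qname, [])
        if value in local:
            return ("local", local.index(value))
        if value in self.global_values:
            return ("global", self.global_values.index(value))
        # Found in neither partition: assign the value to both at once.
        self.global_values.append(value)
        local.append(value)
        self.value_amount += 1    # assumption: one increment per new value
        return ("miss", None)
```

A value first seen under one qname is a miss. Seen again under the same qname it is a local hit, and seen under a different qname it is a global hit that leaves both partitions unchanged.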
String table partitions that are expected to contain a relatively small number of entries used repeatedly throughout the document are optimized for the frequent use of compact identifiers. This includes the uri partition and all partitions containing prefix content items.
When a string value is found in a partition optimized for frequent use of compact identifiers, the string value is represented as the value (i+1) encoded as an n-bit Unsigned Integer (7.1.9 n-bit Uns