forked from friendica/friendica-addons
		
	
		
			
				
	
	
		
			1459 lines
		
	
	
	
		
			54 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			1459 lines
		
	
	
	
		
			54 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Network Working Group                                          C. Newman
 | 
						||
Request for Comments: 4790                              Sun Microsystems
 | 
						||
Category: Standards Track                                      M. Duerst
 | 
						||
                                                Aoyama Gakuin University
 | 
						||
                                                          A. Gulbrandsen
 | 
						||
                                                                    Oryx
 | 
						||
                                                              March 2007
 | 
						||
 | 
						||
 | 
						||
            Internet Application Protocol Collation Registry
 | 
						||
 | 
						||
Status of This Memo
 | 
						||
 | 
						||
   This document specifies an Internet standards track protocol for the
 | 
						||
   Internet community, and requests discussion and suggestions for
 | 
						||
   improvements.  Please refer to the current edition of the "Internet
 | 
						||
   Official Protocol Standards" (STD 1) for the standardization state
 | 
						||
   and status of this protocol.  Distribution of this memo is unlimited.
 | 
						||
 | 
						||
Copyright Notice
 | 
						||
 | 
						||
   Copyright (C) The IETF Trust (2007).
 | 
						||
 | 
						||
Abstract
 | 
						||
 | 
						||
   Many Internet application protocols include string-based lookup,
 | 
						||
   searching, or sorting operations.  However, the problem space for
 | 
						||
   searching and sorting international strings is large, not fully
 | 
						||
   explored, and is outside the area of expertise for the Internet
 | 
						||
   Engineering Task Force (IETF).  Rather than attempt to solve such a
 | 
						||
   large problem, this specification creates an abstraction framework so
 | 
						||
   that application protocols can precisely identify a comparison
 | 
						||
   function, and the repertoire of comparison functions can be extended
 | 
						||
   in the future.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 1]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
Table of Contents
 | 
						||
 | 
						||
   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
 | 
						||
     1.1.  Conventions Used in This Document  . . . . . . . . . . . .  4
 | 
						||
   2.  Collation Definition and Purpose . . . . . . . . . . . . . . .  4
 | 
						||
     2.1.  Definition . . . . . . . . . . . . . . . . . . . . . . . .  4
 | 
						||
     2.2.  Purpose  . . . . . . . . . . . . . . . . . . . . . . . . .  4
 | 
						||
     2.3.  Some Other Terms Used in this Document . . . . . . . . . .  5
 | 
						||
     2.4.  Sort Keys  . . . . . . . . . . . . . . . . . . . . . . . .  5
 | 
						||
   3.  Collation Identifier Syntax  . . . . . . . . . . . . . . . . .  6
 | 
						||
     3.1.  Basic Syntax . . . . . . . . . . . . . . . . . . . . . . .  6
 | 
						||
     3.2.  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . .  6
 | 
						||
     3.3.  Ordering Direction . . . . . . . . . . . . . . . . . . . .  7
 | 
						||
     3.4.  URIs . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
 | 
						||
     3.5.  Naming Guidelines  . . . . . . . . . . . . . . . . . . . .  7
 | 
						||
   4.  Collation Specification Requirements . . . . . . . . . . . . .  8
 | 
						||
     4.1.  Collation/Server Interface . . . . . . . . . . . . . . . .  8
 | 
						||
     4.2.  Operations Supported . . . . . . . . . . . . . . . . . . .  8
 | 
						||
       4.2.1.  Validity . . . . . . . . . . . . . . . . . . . . . . .  9
 | 
						||
       4.2.2.  Equality . . . . . . . . . . . . . . . . . . . . . . .  9
 | 
						||
       4.2.3.  Substring  . . . . . . . . . . . . . . . . . . . . . .  9
 | 
						||
       4.2.4.  Ordering . . . . . . . . . . . . . . . . . . . . . . . 10
 | 
						||
     4.3.  Sort Keys  . . . . . . . . . . . . . . . . . . . . . . . . 10
 | 
						||
     4.4.  Use of Lookup Tables . . . . . . . . . . . . . . . . . . . 11
 | 
						||
   5.  Application Protocol Requirements  . . . . . . . . . . . . . . 11
 | 
						||
     5.1.  Character Encoding . . . . . . . . . . . . . . . . . . . . 11
 | 
						||
     5.2.  Operations . . . . . . . . . . . . . . . . . . . . . . . . 11
 | 
						||
     5.3.  Wildcards  . . . . . . . . . . . . . . . . . . . . . . . . 12
 | 
						||
     5.4.  String Comparison  . . . . . . . . . . . . . . . . . . . . 12
 | 
						||
     5.5.  Disconnected Clients . . . . . . . . . . . . . . . . . . . 12
 | 
						||
     5.6.  Error Codes  . . . . . . . . . . . . . . . . . . . . . . . 13
 | 
						||
     5.7.  Octet Collation  . . . . . . . . . . . . . . . . . . . . . 13
 | 
						||
   6.  Use by Existing Protocols  . . . . . . . . . . . . . . . . . . 13
 | 
						||
   7.  Collation Registration . . . . . . . . . . . . . . . . . . . . 14
 | 
						||
     7.1.  Collation Registration Procedure . . . . . . . . . . . . . 14
 | 
						||
     7.2.  Collation Registration Format  . . . . . . . . . . . . . . 15
 | 
						||
       7.2.1.  Registration Template  . . . . . . . . . . . . . . . . 15
 | 
						||
       7.2.2.  The Collation Element  . . . . . . . . . . . . . . . . 15
 | 
						||
       7.2.3.  The Identifier Element . . . . . . . . . . . . . . . . 16
 | 
						||
       7.2.4.  The Title Element  . . . . . . . . . . . . . . . . . . 16
 | 
						||
       7.2.5.  The Operations Element . . . . . . . . . . . . . . . . 16
 | 
						||
       7.2.6.  The Specification Element  . . . . . . . . . . . . . . 16
 | 
						||
       7.2.7.  The Submitter Element  . . . . . . . . . . . . . . . . 16
 | 
						||
       7.2.8.  The Owner Element  . . . . . . . . . . . . . . . . . . 16
 | 
						||
       7.2.9.  The Version Element  . . . . . . . . . . . . . . . . . 17
 | 
						||
       7.2.10. The Variable Element . . . . . . . . . . . . . . . . . 17
 | 
						||
     7.3.  Structure of Collation Registry  . . . . . . . . . . . . . 17
 | 
						||
     7.4.  Example Initial Registry Summary . . . . . . . . . . . . . 18
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 2]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   8.  Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18
 | 
						||
   9.  Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19
 | 
						||
     9.1.  ASCII Numeric Collation  . . . . . . . . . . . . . . . . . 20
 | 
						||
       9.1.1.  ASCII Numeric Collation Description  . . . . . . . . . 20
 | 
						||
       9.1.2.  ASCII Numeric Collation Registration . . . . . . . . . 20
 | 
						||
     9.2.  ASCII Casemap Collation  . . . . . . . . . . . . . . . . . 21
 | 
						||
       9.2.1.  ASCII Casemap Collation Description  . . . . . . . . . 21
 | 
						||
       9.2.2.  ASCII Casemap Collation Registration . . . . . . . . . 22
 | 
						||
     9.3.  Octet Collation  . . . . . . . . . . . . . . . . . . . . . 22
 | 
						||
       9.3.1.  Octet Collation Description  . . . . . . . . . . . . . 22
 | 
						||
       9.3.2.  Octet Collation Registration . . . . . . . . . . . . . 23
 | 
						||
   10. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 23
 | 
						||
   11. Security Considerations  . . . . . . . . . . . . . . . . . . . 23
 | 
						||
   12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
 | 
						||
   13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
 | 
						||
     13.1. Normative References . . . . . . . . . . . . . . . . . . . 24
 | 
						||
     13.2. Informative References . . . . . . . . . . . . . . . . . . 24
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 3]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
1.  Introduction
 | 
						||
 | 
						||
   The Application Configuration Access Protocol ACAP [11] specification
 | 
						||
   introduced the concept of a comparator (which we call collation in
 | 
						||
   this document), but failed to create an IANA registry.  With the
 | 
						||
   introduction of stringprep [6] and the Unicode Collation Algorithm
 | 
						||
   [7], it is now time to create that registry and populate it with some
 | 
						||
   initial values appropriate for an international community.  This
 | 
						||
   specification replaces and generalizes the definition of a comparator
 | 
						||
   in ACAP, and creates a collation registry.
 | 
						||
 | 
						||
1.1.  Conventions Used in This Document
 | 
						||
 | 
						||
   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
 | 
						||
   in this document are to be interpreted as defined in "Key words for
 | 
						||
   use in RFCs to Indicate Requirement Levels" [1].
 | 
						||
 | 
						||
   The attribute syntax specifications use the Augmented Backus-Naur
 | 
						||
   Form (ABNF) [2] notation, including the core rules defined in
 | 
						||
   Appendix A.  The ABNF production "Language-tag" is imported from
 | 
						||
   Language Tags [5] and "reg-name" from URI: Generic Syntax [4].
 | 
						||
 | 
						||
2.  Collation Definition and Purpose
 | 
						||
 | 
						||
2.1.  Definition
 | 
						||
 | 
						||
   A collation is a named function which takes two arbitrary length
 | 
						||
   strings as input and can be used to perform one or more of three
 | 
						||
   basic comparison operations: equality test, substring match, and
 | 
						||
   ordering test.
 | 
						||
 | 
						||
2.2.  Purpose
 | 
						||
 | 
						||
   Collations are an abstraction for comparison functions so that these
 | 
						||
   comparison functions can be used in multiple protocols.  The details
 | 
						||
   of a particular comparison operation can be specified by someone with
 | 
						||
   appropriate expertise, independent of the application protocols that
 | 
						||
   use that collation.  This is similar to the way a charset [13]
 | 
						||
   separates the details of octet to character mapping from a protocol
 | 
						||
   specification, such as MIME [9], or the way SASL [10] separates the
 | 
						||
   details of an authentication mechanism from a protocol specification,
 | 
						||
   such as ACAP [11].
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 4]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   Here is a small diagram to help illustrate the value of this
 | 
						||
   abstraction:
 | 
						||
 | 
						||
   +-------------------+                         +-----------------+
 | 
						||
   | IMAP i18n SEARCH  |--+                      | Basic           |
 | 
						||
   +-------------------+  |                   +--| Collation Spec  |
 | 
						||
                          |                   |  +-----------------+
 | 
						||
   +-------------------+  |  +-------------+  |  +-----------------+
 | 
						||
   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
 | 
						||
   +-------------------+  |  | Registry    |  |  | Collation Spec  |
 | 
						||
                          |  +-------------+  |  +-----------------+
 | 
						||
   +-------------------+  |                   |  +-----------------+
 | 
						||
   | ...other protocol |--+                   |  | locale-specific |
 | 
						||
   +-------------------+                      +--| Collation Spec  |
 | 
						||
                                                 +-----------------+
 | 
						||
 | 
						||
   Thus IMAP, ACAP, and future application protocols with international
 | 
						||
   search capability simply specify how to interface to the collation
 | 
						||
   registry instead of each protocol specification having to specify all
 | 
						||
   the collations it supports.
 | 
						||
 | 
						||
2.3.  Some Other Terms Used in this Document
 | 
						||
 | 
						||
   The terms client, server, and protocol are used in somewhat unusual
 | 
						||
   senses.
 | 
						||
 | 
						||
   Client means a user, or a program acting directly on behalf of a
 | 
						||
   user.  This may be a mail reader acting as an IMAP client, or it may
 | 
						||
   be an interactive shell, where the user can type protocol commands/
 | 
						||
   requests directly, or it may be a script or program written by the
 | 
						||
   user.
 | 
						||
 | 
						||
   Server means a program that performs services requested by the
 | 
						||
   client.  This may be a traditional server such as an HTTP server, or
 | 
						||
   it may be a Sieve [14] interpreter running a Sieve script written by
 | 
						||
   a user.  A server needs to use the operations provided by collations
 | 
						||
   in order to fulfill the client's requests.
 | 
						||
 | 
						||
   The protocol describes how the client tells the server what it wants
 | 
						||
   done, and (if applicable) how the server tells the client about the
 | 
						||
   results.  IMAP is a protocol by this definition, and so is the Sieve
 | 
						||
   language.
 | 
						||
 | 
						||
2.4.  Sort Keys
 | 
						||
 | 
						||
   One component of a collation is a transformation, which turns a
 | 
						||
   string into a sort key, which is then used while sorting.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 5]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   The transformation can range from an identity mapping (e.g., the
 | 
						||
   i;octet collation Section 9.3) to a mapping that makes the string
 | 
						||
   unreadable to a human.
 | 
						||
 | 
						||
   This is an implementation detail of collations or servers.  A
 | 
						||
   protocol SHOULD NOT expose it to clients, since some collations leave
 | 
						||
   the sort key's format up to the implementation, and current
 | 
						||
   conformant implementations are known to use different formats.
 | 
						||
 | 
						||
3.  Collation Identifier Syntax
 | 
						||
 | 
						||
3.1.  Basic Syntax
 | 
						||
 | 
						||
   The collation identifier itself is a single US-ASCII string.  The
 | 
						||
   identifier MUST NOT be longer than 254 characters, and obeys the
 | 
						||
   following grammar:
 | 
						||
 | 
						||
     collation-char  = ALPHA / DIGIT / "-" / ";" / "=" / "."
 | 
						||
 | 
						||
     collation-id    = collation-prefix ";" collation-core-name
 | 
						||
                       *collation-arg
 | 
						||
 | 
						||
     collation-scope = Language-tag / "vnd-" reg-name
 | 
						||
 | 
						||
     collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )
 | 
						||
 | 
						||
     collation-arg   = ";" ALPHA *( ALPHA / DIGIT ) "="
 | 
						||
                       1*( ALPHA / DIGIT / "." )
 | 
						||
 | 
						||
 | 
						||
   Note: the ABNF production "Language-tag" is imported from Language
 | 
						||
   Tags [5] and "reg-name" from URI: Generic Syntax [4].
 | 
						||
 | 
						||
   There is a special identifier called "default".  For protocols that
 | 
						||
   have a default collation, "default" refers to that collation.  For
 | 
						||
   other protocols, the identifier "default" MUST match no collations,
 | 
						||
   and servers SHOULD treat it in the same way as they treat nonexistent
 | 
						||
   collations.
 | 
						||
 | 
						||
3.2.  Wildcards
 | 
						||
 | 
						||
   The string a client uses to select a collation MAY contain one or
 | 
						||
   more wildcard ("*") characters that match zero or more collation-
 | 
						||
   chars.  Wildcard characters MUST NOT be adjacent.  If the wildcard
 | 
						||
   string matches multiple collations, the server SHOULD attempt to
 | 
						||
   select a widely useful collation in preference to a narrowly useful
 | 
						||
   one.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 6]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
     collation-wild  =  ("*" / (ALPHA ["*"])) *(collation-char ["*"])
 | 
						||
                         ; MUST NOT exceed 254 characters total
 | 
						||
 | 
						||
3.3.  Ordering Direction
 | 
						||
 | 
						||
   When used as a protocol element for ordering, the collation
 | 
						||
   identifier MAY be prefixed by either "+" or "-" to explicitly specify
 | 
						||
   an ordering direction. "+" has no effect on the ordering operation,
 | 
						||
   while "-" inverts the result of the ordering operation.  In general,
 | 
						||
   collation-order is used when a client requests a collation, and
 | 
						||
   collation-selected is used when the server informs the client of the
 | 
						||
   selected collation.
 | 
						||
 | 
						||
     collation-selected =  ["+" / "-"] collation-id
 | 
						||
 | 
						||
     collation-order =  ["+" / "-"] collation-wild
 | 
						||
 | 
						||
3.4.  URIs
 | 
						||
 | 
						||
   Some protocols are designed to use URIs [4] to refer to collations
 | 
						||
   rather than simple tokens.  A special section of the IANA URL space
 | 
						||
   is reserved for such usage.  The "collation-uri" form is used to
 | 
						||
   refer to a specific named collation (the collation registration may
 | 
						||
   not actually be present).  The "collation-auri" form is an abstract
 | 
						||
   name for an ordering, a collation pattern or a vendor private
 | 
						||
   collator.
 | 
						||
 | 
						||
     collation-uri   =  "http://www.iana.org/assignments/collation/"
 | 
						||
                        collation-id ".xml"
 | 
						||
 | 
						||
     collation-auri  =  ( "http://www.iana.org/assignments/collation/"
 | 
						||
                        collation-order ".xml" ) / other-uri
 | 
						||
 | 
						||
     other-uri       =  <absoluteURI>
 | 
						||
                     ;  excluding the IANA collation namespace.
 | 
						||
 | 
						||
3.5.  Naming Guidelines
 | 
						||
 | 
						||
   While this specification makes no absolute requirements on the
 | 
						||
   structure of collation identifiers, naming consistency is important,
 | 
						||
   so the following initial guidelines are provided.
 | 
						||
 | 
						||
   Collation identifiers with an international audience typically begin
 | 
						||
   with "i;".  Collation identifiers intended for a particular language
 | 
						||
   or locale typically begin with a language tag [5] followed by a ";".
 | 
						||
   After the first ";" is normally the name of the general collation
 | 
						||
   algorithm, followed by a series of algorithm modifications separated
 | 
						||
   by the ";" delimiter.  Parameterized modifications will use "=" to
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 7]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   delimit the parameter from the value.  The version numbers of any
 | 
						||
   lookup tables used by the algorithm SHOULD be present as
 | 
						||
   parameterized modifications.
 | 
						||
 | 
						||
   Collation identifiers of the form *;vnd-hostname;* are reserved for
 | 
						||
   vendor-specific collations created by the owner of the hostname
 | 
						||
   following the "vnd-" prefix (e.g., vnd-example.com for the vendor
 | 
						||
   example.com).  Registration of such collations (or the name space as
 | 
						||
   a whole), with intended use of the "Vendor", is encouraged when a
 | 
						||
   public specification or open-source implementation is available, but
 | 
						||
   is not required.
 | 
						||
 | 
						||
4.  Collation Specification Requirements
 | 
						||
 | 
						||
4.1.  Collation/Server Interface
 | 
						||
 | 
						||
   The collation itself defines what it operates on.  Most collations
 | 
						||
   are expected to operate on character strings.  The i;octet
 | 
						||
   (Section 9.3) collation operates on octet strings.  The i;ascii-
 | 
						||
   numeric (Section 9.1) operation operates on numbers.
 | 
						||
 | 
						||
   This specification defines the collation interface in terms of octet
 | 
						||
   strings.  However, implementations may choose to use character
 | 
						||
   strings instead.  Such implementations may not be able to implement
 | 
						||
   e.g., i;octet.  Since i;octet is not currently mandatory to implement
 | 
						||
   for any protocol, this should not be a problem.
 | 
						||
 | 
						||
4.2.  Operations Supported
 | 
						||
 | 
						||
   A collation specification MUST state which of the three basic
 | 
						||
   operations are supported (equality, substring, ordering) and how to
 | 
						||
   perform each of the supported operations on any two input character
 | 
						||
   strings, including empty strings.  Collations must be deterministic,
 | 
						||
   i.e., given a collation with a specific identifier, and any two fixed
 | 
						||
   input strings, the result MUST be the same for the same operation.
 | 
						||
 | 
						||
   In general, collation operations should behave as their names
 | 
						||
   suggest.  While a collation may be new, the operations are not, so
 | 
						||
   the new collation's operations should be similar to those of older
 | 
						||
   collations.  For example, a date/time collation should not provide a
 | 
						||
   "substring" operation that would morph IMAP substring SEARCH into
 | 
						||
   e.g., a date-range search.
 | 
						||
 | 
						||
   A non-obvious consequence of the rules for each collation operation
 | 
						||
   is that, for any single collation, either none or all of the
 | 
						||
   operations can return "undefined".  For example, it is not possible
 | 
						||
   to have an equality operation that never returns "undefined", and a
 | 
						||
   substring operation that occasionally does.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 8]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
4.2.1.  Validity
 | 
						||
 | 
						||
   The validity test takes one string as argument.  It returns valid if
 | 
						||
   its input string is a valid input to the collation's other
 | 
						||
   operations, and invalid if not.  (In other words, a string is valid
 | 
						||
   if it is equal to itself according to the collation's equality
 | 
						||
   operation.)
 | 
						||
 | 
						||
   The validity test is provided by all collations.  It MUST NOT be
 | 
						||
   listed separately in the collation registration.
 | 
						||
 | 
						||
4.2.2.  Equality
 | 
						||
 | 
						||
   The equality test always returns "match" or "no-match" when it is
 | 
						||
   supplied valid input, and MAY return "undefined" if one or both input
 | 
						||
   strings are not valid.
 | 
						||
 | 
						||
   The equality test MUST be reflexive and symmetric.  For valid input,
 | 
						||
   it MUST be transitive.
 | 
						||
 | 
						||
   If a collation provides either a substring or an ordering test, it
 | 
						||
   MUST also provide an equality test.  The substring and/or ordering
 | 
						||
   tests MUST be consistent with the equality test.
 | 
						||
 | 
						||
   The return values of the equality test are called "match", "no-match"
 | 
						||
   and "undefined" in this document.
 | 
						||
 | 
						||
4.2.3.  Substring
 | 
						||
 | 
						||
   The substring matching operation determines if the first string is a
 | 
						||
   substring of the second string, i.e., if one or more substrings of
 | 
						||
   the second string is equal to the first, as defined by the
 | 
						||
   collation's equality operation.
 | 
						||
 | 
						||
   A collation that supports substring matching will automatically
 | 
						||
   support two special cases of substring matching: prefix and suffix
 | 
						||
   matching, if those special cases are supported by the application
 | 
						||
   protocol.  It returns "match" or "no-match" when it is supplied valid
 | 
						||
   input and returns "undefined" when supplied invalid input.
 | 
						||
 | 
						||
   Application protocols MAY return position information for substring
 | 
						||
   matches.  If this is done, the position information SHOULD include
 | 
						||
   both the starting offset and the ending offset for each match.  This
 | 
						||
   is important because more sophisticated collations can match strings
 | 
						||
   of unequal length (for example, a pre-composed accented character can
 | 
						||
   match a decomposed accented character).  In general, overlapping
 | 
						||
   matches SHOULD be reported (as when "ana" occurs twice within
 | 
						||
   "banana"), although there are cases where a collation may decide not
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                     [Page 9]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   to.  For example, in a collation which treats all whitespace
 | 
						||
   sequences as identical, the substring operation could be defined such
 | 
						||
   that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP
 | 
						||
   "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
 | 
						||
   SP SP "1" SP SP), since the four matches are, in a sense, the same
 | 
						||
   match.
 | 
						||
 | 
						||
   A string is a substring of itself.  The empty string is a substring
 | 
						||
   of all strings.
 | 
						||
 | 
						||
   Note that the substring operation of some collations can match
 | 
						||
   strings of unequal length.  For example, a pre-composed accented
 | 
						||
   character can match a decomposed accented character.  The Unicode
 | 
						||
   Collation Algorithm [7] discusses this in more detail.
 | 
						||
 | 
						||
   The return values of the substring operation are called "match", "no-
 | 
						||
   match", and "undefined" in this document.
 | 
						||
 | 
						||
4.2.4.  Ordering
 | 
						||
 | 
						||
   The ordering operation determines how two strings are ordered.  It
 | 
						||
   MUST be reflexive.  For valid input, it MUST be transitive and
 | 
						||
   trichotomous.
 | 
						||
 | 
						||
   Ordering returns "less" if the first string is listed before the
 | 
						||
   second string, according to the collation; "greater", if the second
 | 
						||
   string is listed before the first string; and "equal", if the two
 | 
						||
   strings are equal, as defined by the collation's equality operation.
 | 
						||
   If one or both strings are invalid, the result of ordering is
 | 
						||
   "undefined".
 | 
						||
 | 
						||
   When the collation is used with a "+" prefix, the behavior is the
 | 
						||
   same as when used with no prefix.  When the collation is used with a
 | 
						||
   "-" prefix, the result of the ordering operation of the collation
 | 
						||
   MUST be reversed.
 | 
						||
 | 
						||
   The return values of the ordering operation are called "less",
 | 
						||
   "equal", "greater", and "undefined" in this document.
 | 
						||
 | 
						||
4.3.  Sort Keys
 | 
						||
 | 
						||
   A collation specification SHOULD describe the internal transformation
 | 
						||
   algorithm to generate sort keys.  This algorithm can be applied to
 | 
						||
   individual strings, and the result can be stored to potentially
 | 
						||
   optimize future comparison operations.  A collation MAY specify that
 | 
						||
   the sort key is generated by the identity function.  The sort key may
 | 
						||
   have no meaning to a human.  The sort key may not be valid input to
 | 
						||
   the collation.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 10]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
4.4.  Use of Lookup Tables
 | 
						||
 | 
						||
   Some collations use customizable lookup tables, e.g., because the
 | 
						||
   tables depend on locale, and may be modified after shipping the
 | 
						||
   software.  Collations that use more than one customizable lookup
 | 
						||
   table in a documented format MUST assign numbers to the tables they
 | 
						||
   use.  This permits an application protocol command to access the
 | 
						||
   tables used by a server collation, so that clients and servers use
 | 
						||
   the same tables.
 | 
						||
 | 
						||
5.  Application Protocol Requirements
 | 
						||
 | 
						||
   This section describes the requirements and issues that an
 | 
						||
   application protocol needs to consider if it offers searching,
 | 
						||
   substring matching and/or sorting, and permits the use of characters
 | 
						||
   outside the US-ASCII charset.
 | 
						||
 | 
						||
5.1.  Character Encoding
 | 
						||
 | 
						||
   The protocol specification has to make sure that it is clear on which
 | 
						||
   characters (rather than just octets) the collations are used.  This
 | 
						||
   can be done by specifying the protocol itself in terms of characters
 | 
						||
   (e.g., in the case of a query language), by specifying a single
 | 
						||
   character encoding for the protocol (e.g., UTF-8 [3]), or by
 | 
						||
   carefully describing the relevant issues of character encoding
 | 
						||
   labeling and conversion.  In the later case, details to consider
 | 
						||
   include how to handle unknown charsets, any charsets that are
 | 
						||
   mandatory-to-implement, any issues with byte-order that might apply,
 | 
						||
   and any transfer encodings that need to be supported.
 | 
						||
 | 
						||
5.2.  Operations
 | 
						||
 | 
						||
   The protocol must specify which of the operations defined in this
 | 
						||
   specification (equality matching, substring matching, and ordering)
 | 
						||
   can be invoked in the protocol, and how they are invoked.  There may
 | 
						||
   be more than one way to invoke an operation.
 | 
						||
 | 
						||
   The protocol MUST provide a mechanism for the client to select the
 | 
						||
   collation to use with equality matching, substring matching, and
 | 
						||
   ordering.
 | 
						||
 | 
						||
   If a protocol needs a total ordering and the collation chosen does
 | 
						||
   not provide it because the ordering operation returns "undefined" at
 | 
						||
   least once, the recommended fallback is to sort all invalid strings
 | 
						||
   after the valid ones, and use i;octet to order the invalid strings.
 | 
						||
 | 
						||
   Although the collation's substring function provides a list of
 | 
						||
   matches, a protocol need not provide all that to the client.  It may
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 11]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   provide only the first matching substring, or even just the
 | 
						||
   information that the substring search matched.  In this way,
 | 
						||
   collations can be used with protocols that are defined such that "x
 | 
						||
   is a substring of y" returns true-false.
 | 
						||
 | 
						||
   If the protocol provides positional information for the results of a
 | 
						||
   substring match, that positional information SHOULD fully specify the
 | 
						||
   substring(s) in the result that matches, independent of the length of
 | 
						||
   the search string.  For example, returning both the starting and
 | 
						||
   ending offset of the match would suffice, as would the starting
 | 
						||
   offset and a length.  Returning just the starting offset is not
 | 
						||
   acceptable.  This rule is necessary because advanced collations can
 | 
						||
   treat strings of different lengths as equal (for example, pre-
 | 
						||
   composed and decomposed accented characters).
 | 
						||
 | 
						||
5.3.  Wildcards
 | 
						||
 | 
						||
   The protocol MUST specify whether it allows the use of wildcards in
 | 
						||
   collation identifiers.  If the protocol allows wildcards, then:
 | 
						||
      The protocol MUST specify how comparisons behave in the absence of
 | 
						||
      explicit collation negotiation, or when a collation of "default"
 | 
						||
      is requested.  The protocol MAY specify that the default collation
 | 
						||
      used in such circumstances is sensitive to server configuration.
 | 
						||
 | 
						||
      The protocol SHOULD provide a way to list available collations
 | 
						||
      matching a given wildcard pattern, or patterns.
 | 
						||
 | 
						||
5.4.  String Comparison
 | 
						||
 | 
						||
   If a protocol compares strings in any nontrivial way, using a
 | 
						||
   collation may be appropriate.  As an example, many protocols use
 | 
						||
   case-independent strings.  In many cases, a simple ASCII mapping to
 | 
						||
   upper/lower case works well.  In other cases, it may be better to use
 | 
						||
   a specifiable collation; for example, so that a server can treat "i"
 | 
						||
   and "I" as equivalent in Italy, and different in Turkey (Turkish also
 | 
						||
   has a dotted upper-case" I" and a dotless lower-case "i").
 | 
						||
 | 
						||
   Protocol designers should consider, in each case, whether to use a
 | 
						||
   specifiable collation.  Keywords often have other needs than user
 | 
						||
   variables, and search arguments may be different again.
 | 
						||
 | 
						||
5.5.  Disconnected Clients
 | 
						||
 | 
						||
   If the protocol supports disconnected clients, and a collation is
 | 
						||
   used that can use configurable tables (e.g., to support
 | 
						||
   locale-specific extensions), then the client may not be able to
 | 
						||
   reproduce the server's collation operations while offline.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 12]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   A mechanism to download such tables has been discussed.  Such a
 | 
						||
   mechanism is not included in the present specification, since the
 | 
						||
   problem is not yet well understood.
 | 
						||
 | 
						||
5.6.  Error Codes
 | 
						||
 | 
						||
   The protocol specification should consider assigning protocol error
 | 
						||
   codes for the following circumstances:
 | 
						||
 | 
						||
   o  The client requests the use of a collation by identifier or
 | 
						||
      pattern, but no implemented collation matches that pattern.
 | 
						||
 | 
						||
   o  The client attempts to use a collation for an operation that is
 | 
						||
      not supported by that collation -- for example, attempting to use
 | 
						||
      the "i;ascii-numeric" collation for substring matching.
 | 
						||
 | 
						||
   o  The client uses an equality or substring matching collation, and
 | 
						||
      the result is an error.  It may be appropriate to distinguish
 | 
						||
      between the two input strings, particularly when one is supplied
 | 
						||
      by the client and the other is stored by the server.  It might
 | 
						||
      also be appropriate to distinguish the specific case of an invalid
 | 
						||
      UTF-8 string.
 | 
						||
 | 
						||
5.7.  Octet Collation
 | 
						||
 | 
						||
   The i;octet (Section 9.3) collation is only usable with protocols
 | 
						||
   based on octet-strings.  Clients and servers MUST NOT use i;octet
 | 
						||
   with other protocols.
 | 
						||
 | 
						||
   If the protocol permits the use of collations with data structures
 | 
						||
   other than strings, the protocol MUST describe the default behavior
 | 
						||
   for a collation with those data structures.
 | 
						||
 | 
						||
6.  Use by Existing Protocols
 | 
						||
 | 
						||
   This section is informative.
 | 
						||
 | 
						||
   Both ACAP [11] and Sieve [14] are standards track specifications that
 | 
						||
   used collations prior to the creation of this specification and
 | 
						||
   registry.  Those standards do not meet all the application protocol
 | 
						||
   requirements described in Section 5.
 | 
						||
 | 
						||
   These protocols allow the use of the i;octet (Section 9.3) collation
 | 
						||
   working directly on UTF-8 data, as used in these protocols.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 13]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   In Sieve, all matches are either true or false.  Accordingly, Sieve
 | 
						||
   servers must treat "undefined" and "no-match" results of the equality
 | 
						||
   and substring operations as false, and only "match" as true.
 | 
						||
 | 
						||
   In ACAP and Sieve, there are no invalid strings.  In this document's
 | 
						||
   terms, invalid strings sort after valid strings.
 | 
						||
 | 
						||
   IMAP [15] also collates, although that is explicit only when the
 | 
						||
   COMPARATOR [17] extension is used.  The built-in IMAP substring
 | 
						||
   operation and the ordering provided by the SORT [16] extension may
 | 
						||
   not meet the requirements made in this document.
 | 
						||
 | 
						||
   Other protocols may be in a similar position.
 | 
						||
 | 
						||
   In IMAP, the default collation is i;ascii-casemap, because its
 | 
						||
   operations are understood to match IMAP's built-in operations.
 | 
						||
 | 
						||
7.  Collation Registration
 | 
						||
 | 
						||
7.1.  Collation Registration Procedure
 | 
						||
 | 
						||
   The IETF will create a mailing list, collation@ietf.org, which can be
 | 
						||
   used for public discussion of collation proposals prior to
 | 
						||
   registration.  Use of the mailing list is strongly encouraged.  The
 | 
						||
   IESG will appoint a designated expert who will monitor the
 | 
						||
   collation@ietf.org mailing list and review registrations.
 | 
						||
 | 
						||
   The registration procedure begins when a completed registration
 | 
						||
   template is sent to iana@iana.org and collation@ietf.org.  The
 | 
						||
   designated expert is expected to tell IANA and the submitter of the
 | 
						||
   registration within two weeks whether the registration is approved,
 | 
						||
   approved with minor changes, or rejected with cause.  When a
 | 
						||
   registration is rejected with cause, it can be re-submitted if the
 | 
						||
   concerns listed in the cause are addressed.  Decisions made by the
 | 
						||
   designated expert can be appealed to the IESG Applications Area
 | 
						||
   Director, then to the IESG.  They follow the normal appeals procedure
 | 
						||
   for IESG decisions.
 | 
						||
 | 
						||
   Collation registrations in a standards track, BCP, or IESG-approved
 | 
						||
   experimental RFC are owned by the IETF, and changes to the
 | 
						||
   registration follow normal procedures for updating such documents.
 | 
						||
   Collation registrations in other RFCs are owned by the RFC author(s).
 | 
						||
   Other collation registrations are owned by the individual(s) listed
 | 
						||
   in the contact field of the registration, and IANA will preserve this
 | 
						||
   information.
 | 
						||
 | 
						||
   If the registration is a change of an existing collation, it MUST be
 | 
						||
   approved by the owner.  In the event the owner cannot be contacted
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 14]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   for a period of one month, and the designated expert deems the change
 | 
						||
   necessary, the IESG MAY re-assign ownership to an appropriate party.
 | 
						||
 | 
						||
7.2.  Collation Registration Format
 | 
						||
 | 
						||
   Registration of a collation is done by sending a well-formed XML
 | 
						||
   document to collation@ietf.org and iana@iana.org.
 | 
						||
 | 
						||
7.2.1.  Registration Template
 | 
						||
 | 
						||
   Here is a template for the registration:
 | 
						||
 | 
						||
   <?xml version='1.0'?>
 | 
						||
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
 | 
						||
   <collation rfc="YYYY" scope="global" intendedUse="common">
 | 
						||
     <identifier>collation identifier</identifier>
 | 
						||
     <title>technical title for collation</title>
 | 
						||
     <operations>equality order substring</operations>
 | 
						||
     <specification>specification reference</specification>
 | 
						||
     <owner>email address of owner or IETF</owner>
 | 
						||
     <submitter>email address of submitter</submitter>
 | 
						||
     <version>1</version>
 | 
						||
   </collation>
 | 
						||
 | 
						||
7.2.2.  The Collation Element
 | 
						||
 | 
						||
   The root of the registration document MUST be a <collation> element.
 | 
						||
   The collation element contains the other elements in the
 | 
						||
   registration, which are described in the following sub-subsections,
 | 
						||
   in the order given here.
 | 
						||
 | 
						||
   The <collation> element MAY include an "rfc=" attribute if the
 | 
						||
   specification is in an RFC.  The "rfc=" attribute gives only the
 | 
						||
   number of the RFC, without any prefix, such as "RFC", or suffix, such
 | 
						||
   as ".txt".
 | 
						||
 | 
						||
   The <collation> element MUST include a "scope=" attribute, which MUST
 | 
						||
   have one of the values "global", "local", or "other".
 | 
						||
 | 
						||
   The <collation> element MUST include an "intendedUse=" attribute,
 | 
						||
   which must have one of the values "common", "limited", "vendor", or
 | 
						||
   "deprecated".  Collation specifications intended for "common" use are
 | 
						||
   expected to reference standards from standards bodies with
 | 
						||
   significant experience dealing with the details of international
 | 
						||
   character sets.
 | 
						||
 | 
						||
   Be aware that future revisions of this specification may add
 | 
						||
   additional function types, as well as additional XML attributes,
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 15]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   values, and elements.  Any system that automatically parses these XML
 | 
						||
   documents MUST take this into account to preserve future
 | 
						||
   compatibility.
 | 
						||
 | 
						||
7.2.3.  The Identifier Element
 | 
						||
 | 
						||
   The <identifier> element gives the precise identifier of the
 | 
						||
   collation, e.g., i;ascii-casemap.  The <identifier> element is
 | 
						||
   mandatory.
 | 
						||
 | 
						||
7.2.4.  The Title Element
 | 
						||
 | 
						||
   The <title> element gives the title of the collation.  The <title>
 | 
						||
   element is mandatory.
 | 
						||
 | 
						||
7.2.5.  The Operations Element
 | 
						||
 | 
						||
   The <operations> element lists which of the three operations
 | 
						||
   ("equality", "order" or "substring") the collation provides,
 | 
						||
   separated by single spaces.  The <operations> element is mandatory.
 | 
						||
 | 
						||
7.2.6.  The Specification Element
 | 
						||
 | 
						||
   The <specification> element describes where to find the
 | 
						||
   specification.  The <specification> element is mandatory.  It MAY
 | 
						||
   have a URI attribute.  There may be more than one <specification>
 | 
						||
   element, in which case, they together form the specification.
 | 
						||
 | 
						||
   If it is discovered that parts of a collation specification conflict,
 | 
						||
   a new revision of the collation is necessary, and the
 | 
						||
   collation@ietf.org mailing list should be notified.
 | 
						||
 | 
						||
7.2.7.  The Submitter Element
 | 
						||
 | 
						||
   The <submitter> element provides an RFC 2822 [12] email address for
 | 
						||
   the person who submitted the registration.  It is optional if the
 | 
						||
   <owner> element contains an email address.
 | 
						||
 | 
						||
   There may be more than one <submitter> element.
 | 
						||
 | 
						||
7.2.8.  The Owner Element
 | 
						||
 | 
						||
   The <owner> element contains either the four letters "IETF" or an
 | 
						||
   email address of the owner of the registration.  The <owner> element
 | 
						||
   is mandatory.  There may be more than one <owner> element.  If so,
 | 
						||
   all owners are equal.  Each owner can speak for all.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 16]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
7.2.9.  The Version Element
 | 
						||
 | 
						||
   The <version> element MUST be included when the registration is
 | 
						||
   likely to be revised, or has been revised in such a way that the
 | 
						||
   results change for one or more input strings.  The <version> element
 | 
						||
   is optional.
 | 
						||
 | 
						||
7.2.10.  The Variable Element
 | 
						||
 | 
						||
   The <variable> element specifies an optional variable to control the
 | 
						||
   collation's behaviour, for example whether it is case sensitive.  The
 | 
						||
   <variable> element is optional.  When <variable> is used, it must
 | 
						||
   contain <name> and <default> elements, and it may contain one or more
 | 
						||
   <value> elements.
 | 
						||
 | 
						||
7.2.10.1.  The Name Element
 | 
						||
 | 
						||
   The <name> element specifies the name value of a variable.  The
 | 
						||
   <name> element is mandatory.
 | 
						||
 | 
						||
7.2.10.2.  The Default Element
 | 
						||
 | 
						||
   The <default> element specifies the default value of a variable.  The
 | 
						||
   <default> element is mandatory.
 | 
						||
 | 
						||
7.2.10.3.  The Value Element
 | 
						||
 | 
						||
   The <value> element specifies a legal value of a variable.  The
 | 
						||
   <value> element is optional.  If one or more <value> elements are
 | 
						||
   present, only those values are legal.  If none are, then the
 | 
						||
   variable's legal values do not form an enumerated set, and the rules
 | 
						||
   MUST be specified in an RFC accompanying the registration.
 | 
						||
 | 
						||
7.3.  Structure of Collation Registry
 | 
						||
 | 
						||
   Once the registration is approved, IANA will store each XML
 | 
						||
   registration document in a URL of the form
 | 
						||
   http://www.iana.org/assignments/collation/collation-id.xml, where
 | 
						||
   collation-id is the content of the identifier element in the
 | 
						||
   registration.  Both the submitter and the designated expert are
 | 
						||
   responsible for verifying that the XML is well-formed.  The
 | 
						||
   registration document should avoid using new elements.  If any are
 | 
						||
   necessary, it is important to be consistent with other registrations.
 | 
						||
 | 
						||
   IANA will also maintain a text summary of the registry under the name
 | 
						||
   http://www.iana.org/assignments/collation/collation-index.html.  This
 | 
						||
   summary is divided into four sections.  The first section is for
 | 
						||
   collations intended for common use.  This section is intended for
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 17]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   collation registrations published in IESG-approved RFCs, or for
 | 
						||
   locally scoped collations from the primary standards body for that
 | 
						||
   locale.  The designated expert is encouraged to reject collation
 | 
						||
   registrations with an intended use of "common" if the expert believes
 | 
						||
   it should be "limited", as it is desirable to keep the number of
 | 
						||
   "common" registrations small and of high quality.  The second section
 | 
						||
   is reserved for limited-use collations.  The third section is
 | 
						||
   reserved for registered vendor-specific collations.  The final
 | 
						||
   section is reserved for deprecated collations.
 | 
						||
 | 
						||
7.4.  Example Initial Registry Summary
 | 
						||
 | 
						||
   The following is an example of how IANA might structure the initial
 | 
						||
   registry summary.html file:
 | 
						||
 | 
						||
     Collation                              Functions Scope Reference
 | 
						||
     ---------                              --------- ----- ---------
 | 
						||
   Common Use Collations:
 | 
						||
     i;ascii-casemap                        e, o, s   Local [RFC 4790]
 | 
						||
 | 
						||
   Limited Use Collations:
 | 
						||
     i;octet                                e, o, s   Other [RFC 4790]
 | 
						||
     i;ascii-numeric                        e, o      Other [RFC 4790]
 | 
						||
 | 
						||
   Vendor Collations:
 | 
						||
 | 
						||
   Deprecated Collations:
 | 
						||
 | 
						||
 | 
						||
   References
 | 
						||
   ----------
 | 
						||
   [RFC 4790]  Newman, C., Duerst, M., Gulbrandsen, A., "Internet
 | 
						||
               Application Protocol Collation Registry", RFC 4790,
 | 
						||
               Sun Microsystems, March 2007.
 | 
						||
 | 
						||
8.  Guidelines for Expert Reviewer
 | 
						||
 | 
						||
   The expert reviewer appointed by the IESG has fairly broad latitude
 | 
						||
   for this registry.  While a number of collations are expected
 | 
						||
   (particularly customizations of the UCA for localized use), an
 | 
						||
   explosion of collations (particularly common-use collations) is not
 | 
						||
   desirable for widespread interoperability.  However, it is important
 | 
						||
   for the expert reviewer to provide cause when rejecting a
 | 
						||
   registration, and, when possible, to describe corrective action to
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 18]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   permit the registration to proceed.  The following table includes
 | 
						||
   some example reasons to reject a registration with cause:
 | 
						||
 | 
						||
   o  The registration is not a well-formed XML document.
 | 
						||
 | 
						||
   o  The registration has an intended use of "common", but there is no
 | 
						||
      evidence the collation will be widely deployed, so it should be
 | 
						||
      listed as "limited".
 | 
						||
 | 
						||
   o  The registration has an intended use of "common", but it is
 | 
						||
      redundant with the functionality of a previously registered
 | 
						||
      "common" collation.
 | 
						||
 | 
						||
   o  The registration has an intended use of "common", but the
 | 
						||
      specification is not detailed enough to allow interoperable
 | 
						||
      implementations by others.
 | 
						||
 | 
						||
   o  The collation identifier fails to precisely identify the version
 | 
						||
      numbers of relevant tables to use.
 | 
						||
 | 
						||
   o  The registration fails to meet one of the "MUST" requirements in
 | 
						||
      Section 4.
 | 
						||
 | 
						||
   o  The collation identifier fails to meet the syntax in Section 3.
 | 
						||
 | 
						||
   o  The collation specification referenced in the registration is
 | 
						||
      vague or has optional features without a clear behavior specified.
 | 
						||
 | 
						||
   o  The referenced specification does not adequately address security
 | 
						||
      considerations specific to that collation.
 | 
						||
 | 
						||
   o  The registration's operations are needlessly different from those
 | 
						||
      of traditional operations.
 | 
						||
 | 
						||
   o  The registration's XML is needlessly different from that of
 | 
						||
      already registered collations.
 | 
						||
 | 
						||
9.  Initial Collations
 | 
						||
 | 
						||
   This section registers the three collations that were originally
 | 
						||
   defined in [11], and are implemented in most [14] engines.  Some of
 | 
						||
   the behavior of these collations is perhaps not ideal, such as
 | 
						||
   i;ascii-casemap accepting non-ASCII input.  Compatibility with widely
 | 
						||
   deployed code was judged more important than fixing the collations.
 | 
						||
   Some of the aspects of these collations are necessary to maintain
 | 
						||
   compatibility with widely deployed code.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 19]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
9.1.  ASCII Numeric Collation
 | 
						||
 | 
						||
9.1.1.  ASCII Numeric Collation Description
 | 
						||
 | 
						||
   The "i;ascii-numeric" collation is a simple collation intended for
 | 
						||
   use with arbitrarily-sized, unsigned decimal integer numbers stored
 | 
						||
   as octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
 | 
						||
   the numbers.  Before converting from string to integer, the input
 | 
						||
   string is truncated at the first non-digit character.  All input is
 | 
						||
   valid; strings that do not start with a digit represent positive
 | 
						||
   infinity.
 | 
						||
 | 
						||
   The collation supports equality and ordering, but does not support
 | 
						||
   the substring operation.
 | 
						||
 | 
						||
   The equality operation returns "match" if the two strings represent
 | 
						||
   the same number (i.e., leading zeroes and trailing non-digits are
 | 
						||
   disregarded), and "no-match" if the two strings represent different
 | 
						||
   numbers.
 | 
						||
 | 
						||
   The ordering operation returns "less" if the first string represents
 | 
						||
   a smaller number than the second, "equal" if they represent the same
 | 
						||
   number, and "greater" if the first string represents a larger number
 | 
						||
   than the second.
 | 
						||
 | 
						||
   Some examples: "0" is less than "1", and "1" is less than
 | 
						||
   "4294967298". "4294967298", "04294967298", and "4294967298b" are all
 | 
						||
   equal. "04294967298" is less than "". "", "x", and "y" are equal.
 | 
						||
 | 
						||
9.1.2.  ASCII Numeric Collation Registration
 | 
						||
 | 
						||
   <?xml version='1.0'?>
 | 
						||
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
 | 
						||
   <collation rfc="4790" scope="other" intendedUse="limited">
 | 
						||
     <identifier>i;ascii-numeric</identifier>
 | 
						||
     <title>ASCII Numeric</title>
 | 
						||
     <operations>equality order</operations>
 | 
						||
     <specification>RFC 4790</specification>
 | 
						||
     <owner>IETF</owner>
 | 
						||
     <submitter>chris.newman@sun.com</submitter>
 | 
						||
   </collation>
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 20]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
9.2.  ASCII Casemap Collation
 | 
						||
 | 
						||
9.2.1.  ASCII Casemap Collation Description
 | 
						||
 | 
						||
   The "i;ascii-casemap" collation is a simple collation that operates
 | 
						||
   on octet strings and treats US-ASCII letters case-insensitively.  It
 | 
						||
   provides equality, substring, and ordering operations.  All input is
 | 
						||
   valid.  Note that letters outside ASCII are not treated case-
 | 
						||
   insensitively.
 | 
						||
 | 
						||
   Its equality, ordering, and substring operations are as for i;octet,
 | 
						||
   except that at first, the lower-case letters (octet values 97-122) in
 | 
						||
   each input string are changed to upper case (octet values 65-90).
 | 
						||
 | 
						||
   Care should be taken when using OS-supplied functions to implement
 | 
						||
   this collation, as it is not locale sensitive.  Functions, such as
 | 
						||
   strcasecmp and toupper, are sometimes locale sensitive, and may
 | 
						||
   inappropriately map lower-case letters other than a-z to upper case.
 | 
						||
 | 
						||
   The i;ascii-casemap collation is well-suited for use with many
 | 
						||
   Internet protocols and computer languages.  Use with natural language
 | 
						||
   is often inappropriate; even though the collation apparently supports
 | 
						||
   languages such as Swahili and English, in real-world use, it tends to
 | 
						||
   mis-sort a number of types of string:
 | 
						||
 | 
						||
   o  people and place names containing non-ASCII,
 | 
						||
 | 
						||
   o  words such as "naive" (if spelled with an accent, the accented
 | 
						||
      character could push the word to the wrong spot in a sorted list),
 | 
						||
 | 
						||
   o  names such as "Lloyd" (which, in Welsh, sorts after "Lyon", unlike
 | 
						||
      in English),
 | 
						||
 | 
						||
   o  strings containing euro and pound sterling symbols, quotation
 | 
						||
      marks other than '"', dashes/hyphens, etc.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 21]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
9.2.2.  ASCII Casemap Collation Registration
 | 
						||
 | 
						||
   <?xml version='1.0'?>
 | 
						||
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
 | 
						||
   <collation rfc="4790" scope="local" intendedUse="common">
 | 
						||
     <identifier>i;ascii-casemap</identifier>
 | 
						||
     <title>ASCII Casemap</title>
 | 
						||
     <operations>equality order substring</operations>
 | 
						||
     <specification>RFC 4790</specification>
 | 
						||
     <owner>IETF</owner>
 | 
						||
     <submitter>chris.newman@sun.com</submitter>
 | 
						||
   </collation>
 | 
						||
 | 
						||
9.3.  Octet Collation
 | 
						||
 | 
						||
9.3.1.  Octet Collation Description
 | 
						||
 | 
						||
   The "i;octet" collation is a simple and fast collation intended for
 | 
						||
   use on binary octet strings rather than on character data.  Protocols
 | 
						||
   that want to make this collation available have to do so by
 | 
						||
   explicitly allowing it.  If not explicitly allowed, it MUST NOT be
 | 
						||
   used.  It never returns an "undefined" result.  It provides equality,
 | 
						||
   substring, and ordering operations.
 | 
						||
 | 
						||
   The ordering algorithm is as follows:
 | 
						||
 | 
						||
   1.  If both strings are the empty string, return the result "equal".
 | 
						||
 | 
						||
   2.  If the first string is empty and the second is not, return the
 | 
						||
       result "less".
 | 
						||
 | 
						||
   3.  If the second string is empty and the first is not, return the
 | 
						||
       result "greater".
 | 
						||
 | 
						||
   4.  If both strings begin with the same octet value, remove the first
 | 
						||
       octet from both strings and repeat this algorithm from step 1.
 | 
						||
 | 
						||
   5.  If the unsigned value (0 to 255) of the first octet of the first
 | 
						||
       string is less than the unsigned value of the first octet of the
 | 
						||
       second string, then return "less".
 | 
						||
 | 
						||
   6.  If this step is reached, return "greater".
 | 
						||
 | 
						||
   This algorithm is roughly equivalent to the C library function
 | 
						||
   memcmp, with appropriate length checks added.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 22]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   The matching operation returns "match" if the sorting algorithm would
 | 
						||
   return "equal".  Otherwise, the matching operation returns "no-
 | 
						||
   match".
 | 
						||
 | 
						||
   The substring operation returns "match" if the first string is the
 | 
						||
   empty string, or if there exists a substring of the second string of
 | 
						||
   length equal to the length of the first string, which would result in
 | 
						||
   a "match" result from the equality function.  Otherwise, the
 | 
						||
   substring operation returns "no-match".
 | 
						||
 | 
						||
9.3.2.  Octet Collation Registration
 | 
						||
 | 
						||
   This collation is defined with intendedUse="limited" because it can
 | 
						||
   only be used by protocols that explicitly allow it.
 | 
						||
 | 
						||
   <?xml version='1.0'?>
 | 
						||
   <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
 | 
						||
   <collation rfc="4790" scope="global" intendedUse="limited">
 | 
						||
     <identifier>i;octet</identifier>
 | 
						||
     <title>Octet</title>
 | 
						||
     <operations>equality order substring</operations>
 | 
						||
     <specification>RFC 4790</specification>
 | 
						||
     <owner>IETF</owner>
 | 
						||
     <submitter>chris.newman@sun.com</submitter>
 | 
						||
   </collation>
 | 
						||
 | 
						||
10.  IANA Considerations
 | 
						||
 | 
						||
   Section 7 defines how to register collations with IANA.  Section 9
 | 
						||
   defines a list of predefined collations that have been registered
 | 
						||
   with IANA.
 | 
						||
 | 
						||
11.  Security Considerations
 | 
						||
 | 
						||
   Collations will normally be used with UTF-8 strings.  Thus, the
 | 
						||
   security considerations for UTF-8 [3], stringprep [6], and Unicode
 | 
						||
   TR-36 [8] also apply, and are normative to this specification.
 | 
						||
 | 
						||
12.  Acknowledgements
 | 
						||
 | 
						||
   The authors want to thank all who have contributed to this document,
 | 
						||
   including Brian Carpenter, John Cowan, Dave Cridland, Mark Davis,
 | 
						||
   Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann, Philip
 | 
						||
   Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim Homme,
 | 
						||
   Michael Kay, John Klensin, Alexey Melnikov, Jim Melton, and Abhijit
 | 
						||
   Menon-Sen.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 23]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
13.  References
 | 
						||
 | 
						||
13.1.  Normative References
 | 
						||
 | 
						||
   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
 | 
						||
         Levels", BCP 14, RFC 2119, March 1997.
 | 
						||
 | 
						||
   [2]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
 | 
						||
         Specifications: ABNF", RFC 4234, October 2005.
 | 
						||
 | 
						||
   [3]   Yergeau, F., "UTF-8, a transformation format of ISO 10646",
 | 
						||
         STD 63, RFC 3629, November 2003.
 | 
						||
 | 
						||
   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
 | 
						||
         Resource Identifier (URI): Generic Syntax", RFC 3986,
 | 
						||
         January 2005.
 | 
						||
 | 
						||
   [5]   Phillips, A. and M. Davis, "Tags for Identifying Languages",
 | 
						||
         BCP 47, RFC 4646, September 2006.
 | 
						||
 | 
						||
   [6]   Hoffman, P. and M. Blanchet, "Preparation of Internationalized
 | 
						||
         Strings ("stringprep")", RFC 3454, December 2002.
 | 
						||
 | 
						||
   [7]   Davis, M. and K. Whistler, "Unicode Collation Algorithm version
 | 
						||
         14", May 2005,
 | 
						||
         <http://www.unicode.org/reports/tr10/tr10-14.html>.
 | 
						||
 | 
						||
   [8]   Davis, M. and M. Suignard, "Unicode Security Considerations",
 | 
						||
         February 2006, <http://www.unicode.org/reports/tr36/>.
 | 
						||
 | 
						||
13.2.  Informative References
 | 
						||
 | 
						||
   [9]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
 | 
						||
         Extensions (MIME) Part One: Format of Internet Message Bodies",
 | 
						||
         RFC 2045, November 1996.
 | 
						||
 | 
						||
   [10]  Melnikov, A., "Simple Authentication and Security Layer
 | 
						||
         (SASL)", RFC 4422, June 2006.
 | 
						||
 | 
						||
   [11]  Newman, C. and J. Myers, "ACAP -- Application Configuration
 | 
						||
         Access Protocol", RFC 2244, November 1997.
 | 
						||
 | 
						||
   [12]  Resnick, P., "Internet Message Format", RFC 2822, April 2001.
 | 
						||
 | 
						||
   [13]  Freed, N. and J. Postel, "IANA Charset Registration
 | 
						||
         Procedures", BCP 19, RFC 2978, October 2000.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 24]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
   [14]  Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
 | 
						||
         January 2001.
 | 
						||
 | 
						||
   [15]  Crispin, M., "Internet Message Access Protocol - Version
 | 
						||
         4rev1", RFC 3501, March 2003.
 | 
						||
 | 
						||
   [16]  Crispin, M. and K. Murchison, "Internet Message Access Protocol
 | 
						||
         - Sort and Thread Extensions", Work in Progress, May 2004.
 | 
						||
 | 
						||
   [17]  Newman, C. and A. Gulbrandsen, "Internet Message Access
 | 
						||
         Protocol Internationalization", Work in Progress, January 2006.
 | 
						||
 | 
						||
Authors' Addresses
 | 
						||
 | 
						||
   Chris Newman
 | 
						||
   Sun Microsystems
 | 
						||
   1050 Lakes Drive
 | 
						||
   West Covina, CA  91790
 | 
						||
   USA
 | 
						||
 | 
						||
   EMail: chris.newman@sun.com
 | 
						||
 | 
						||
 | 
						||
   Martin Duerst
 | 
						||
   Aoyama Gakuin University
 | 
						||
   5-10-1 Fuchinobe
 | 
						||
   Sagamihara, Kanagawa  229-8558
 | 
						||
   Japan
 | 
						||
 | 
						||
   Phone: +81 42 759 6329
 | 
						||
   Fax:   +81 42 759 6495
 | 
						||
   EMail: duerst@it.aoyama.ac.jp
 | 
						||
   URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
 | 
						||
 | 
						||
   Note: Please write "Duerst" with u-umlaut wherever possible, for
 | 
						||
   example as "Dürst" in XML and HTML.
 | 
						||
 | 
						||
 | 
						||
   Arnt Gulbrandsen
 | 
						||
   Oryx Mail Systems GmbH
 | 
						||
   Schweppermannstr. 8
 | 
						||
   81671 Munich
 | 
						||
   Germany
 | 
						||
 | 
						||
   Fax:   +49 89 4502 9758
 | 
						||
   EMail: arnt@oryx.com
 | 
						||
   URI:   http://www.oryx.com/arnt/
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 25]
 | 
						||
 | 
						||
RFC 4790                   Collation Registry                 March 2007
 | 
						||
 | 
						||
 | 
						||
Full Copyright Statement
 | 
						||
 | 
						||
   Copyright (C) The IETF Trust (2007).
 | 
						||
 | 
						||
   This document is subject to the rights, licenses and restrictions
 | 
						||
   contained in BCP 78, and except as set forth therein, the authors
 | 
						||
   retain all their rights.
 | 
						||
 | 
						||
   This document and the information contained herein are provided on an
 | 
						||
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
 | 
						||
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
 | 
						||
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
 | 
						||
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
 | 
						||
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
 | 
						||
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
 | 
						||
 | 
						||
Intellectual Property
 | 
						||
 | 
						||
   The IETF takes no position regarding the validity or scope of any
 | 
						||
   Intellectual Property Rights or other rights that might be claimed to
 | 
						||
   pertain to the implementation or use of the technology described in
 | 
						||
   this document or the extent to which any license under such rights
 | 
						||
   might or might not be available; nor does it represent that it has
 | 
						||
   made any independent effort to identify any such rights.  Information
 | 
						||
   on the procedures with respect to rights in RFC documents can be
 | 
						||
   found in BCP 78 and BCP 79.
 | 
						||
 | 
						||
   Copies of IPR disclosures made to the IETF Secretariat and any
 | 
						||
   assurances of licenses to be made available, or the result of an
 | 
						||
   attempt made to obtain a general license or permission for the use of
 | 
						||
   such proprietary rights by implementers or users of this
 | 
						||
   specification can be obtained from the IETF on-line IPR repository at
 | 
						||
   http://www.ietf.org/ipr.
 | 
						||
 | 
						||
   The IETF invites any interested party to bring to its attention any
 | 
						||
   copyrights, patents or patent applications, or other proprietary
 | 
						||
   rights that may cover technology that may be required to implement
 | 
						||
   this standard.  Please address the information to the IETF at
 | 
						||
   ietf-ipr@ietf.org.
 | 
						||
 | 
						||
Acknowledgement
 | 
						||
 | 
						||
   Funding for the RFC Editor function is currently provided by the
 | 
						||
   Internet Society.
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
 | 
						||
Newman, et al.              Standards Track                    [Page 26]
 | 
						||
 |