Network Working Group P. Resnick, Ed.
Request for Comments: 5322 Qualcomm Incorporated
Obsoletes: 2822 October 2008
Updates: 4021
Category: Standards Track
Internet Message Format
Status of This Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Abstract
This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is a
revision of Request For Comments (RFC) 2822, which itself superseded
Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and
incorporating incremental changes that were specified in other RFCs.
Resnick Standards Track [Page 1]
RFC 5322 Internet Message Format October 2008
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Notational Conventions . . . . . . . . . . . . . . . . . . 5
1.2.1. Requirements Notation . . . . . . . . . . . . . . . . 5
1.2.2. Syntactic Notation . . . . . . . . . . . . . . . . . . 5
1.2.3. Structure of This Document . . . . . . . . . . . . . . 5
2. Lexical Analysis of Messages . . . . . . . . . . . . . . . . . 6
2.1. General Description . . . . . . . . . . . . . . . . . . . 6
2.1.1. Line Length Limits . . . . . . . . . . . . . . . . . . 7
2.2. Header Fields . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1. Unstructured Header Field Bodies . . . . . . . . . . . 8
2.2.2. Structured Header Field Bodies . . . . . . . . . . . . 8
2.2.3. Long Header Fields . . . . . . . . . . . . . . . . . . 8
2.3. Body . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 10
3.2. Lexical Tokens . . . . . . . . . . . . . . . . . . . . . . 10
3.2.1. Quoted characters . . . . . . . . . . . . . . . . . . 10
3.2.2. Folding White Space and Comments . . . . . . . . . . . 11
3.2.3. Atom . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.4. Quoted Strings . . . . . . . . . . . . . . . . . . . . 13
3.2.5. Miscellaneous Tokens . . . . . . . . . . . . . . . . . 14
3.3. Date and Time Specification . . . . . . . . . . . . . . . 14
3.4. Address Specification . . . . . . . . . . . . . . . . . . 16
3.4.1. Addr-Spec Specification . . . . . . . . . . . . . . . 17
3.5. Overall Message Syntax . . . . . . . . . . . . . . . . . . 18
3.6. Field Definitions . . . . . . . . . . . . . . . . . . . . 19
3.6.1. The Origination Date Field . . . . . . . . . . . . . . 22
3.6.2. Originator Fields . . . . . . . . . . . . . . . . . . 22
3.6.3. Destination Address Fields . . . . . . . . . . . . . . 23
3.6.4. Identification Fields . . . . . . . . . . . . . . . . 25
3.6.5. Informational Fields . . . . . . . . . . . . . . . . . 27
3.6.6. Resent Fields . . . . . . . . . . . . . . . . . . . . 28
3.6.7. Trace Fields . . . . . . . . . . . . . . . . . . . . . 30
3.6.8. Optional Fields . . . . . . . . . . . . . . . . . . . 30
4. Obsolete Syntax . . . . . . . . . . . . . . . . . . . . . . . 31
4.1. Miscellaneous Obsolete Tokens . . . . . . . . . . . . . . 32
4.2. Obsolete Folding White Space . . . . . . . . . . . . . . . 33
4.3. Obsolete Date and Time . . . . . . . . . . . . . . . . . . 33
4.4. Obsolete Addressing . . . . . . . . . . . . . . . . . . . 35
4.5. Obsolete Header Fields . . . . . . . . . . . . . . . . . . 35
4.5.1. Obsolete Origination Date Field . . . . . . . . . . . 36
4.5.2. Obsolete Originator Fields . . . . . . . . . . . . . . 36
4.5.3. Obsolete Destination Address Fields . . . . . . . . . 37
4.5.4. Obsolete Identification Fields . . . . . . . . . . . . 37
4.5.5. Obsolete Informational Fields . . . . . . . . . . . . 37
4.5.6. Obsolete Resent Fields . . . . . . . . . . . . . . . . 38
4.5.7. Obsolete Trace Fields . . . . . . . . . . . . . . . . 38
4.5.8. Obsolete optional fields . . . . . . . . . . . . . . . 38
5. Security Considerations . . . . . . . . . . . . . . . . . . . 38
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 39
Appendix A. Example Messages . . . . . . . . . . . . . . . . . 43
Appendix A.1. Addressing Examples . . . . . . . . . . . . . . . 44
Appendix A.1.1. A Message from One Person to Another with
Simple Addressing . . . . . . . . . . . . . . . . 44
Appendix A.1.2. Different Types of Mailboxes . . . . . . . . . . . 45
Appendix A.1.3. Group Addresses . . . . . . . . . . . . . . . . . 45
Appendix A.2. Reply Messages . . . . . . . . . . . . . . . . . . 46
Appendix A.3. Resent Messages . . . . . . . . . . . . . . . . . 47
Appendix A.4. Messages with Trace Fields . . . . . . . . . . . . 48
Appendix A.5. White Space, Comments, and Other Oddities . . . . 49
Appendix A.6. Obsoleted Forms . . . . . . . . . . . . . . . . . 50
Appendix A.6.1. Obsolete Addressing . . . . . . . . . . . . . . . 50
Appendix A.6.2. Obsolete Dates . . . . . . . . . . . . . . . . . . 50
Appendix A.6.3. Obsolete White Space and Comments . . . . . . . . 51
Appendix B. Differences from Earlier Specifications . . . . . 52
Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . 53
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.1. Normative References . . . . . . . . . . . . . . . . . . . 55
7.2. Informative References . . . . . . . . . . . . . . . . . . 55
1. Introduction
1.1. Scope
This document specifies the Internet Message Format (IMF), a syntax
for text messages that are sent between computer users, within the
framework of "electronic mail" messages. This specification is an
update to [RFC2822], which itself superseded [RFC0822], updating it
to reflect current practice and incorporating incremental changes
that were specified in other RFCs such as [RFC1123].
This document specifies a syntax only for text messages. In
particular, it makes no provision for the transmission of images,
audio, or other sorts of structured data in electronic mail messages.
There are several extensions published, such as the MIME document
series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms
for the transmission of such data through electronic mail, either by
extending the syntax provided here or by structuring such messages to
conform to this syntax. Those mechanisms are outside of the scope of
this specification.
In the context of electronic mail, messages are viewed as having an
envelope and contents. The envelope contains whatever information is
needed to accomplish transmission and delivery. (See [RFC5321] for a
discussion of the envelope.) The contents comprise the object to be
delivered to the recipient. This specification applies only to the
format and some of the semantics of message contents. It contains no
specification of the information in the envelope.
However, some message systems may use information from the contents
to create the envelope. It is intended that this specification
facilitate the acquisition of such information by programs.
This specification is intended as a definition of what message
content format is to be passed between systems. Though some message
systems locally store messages in this format (which eliminates the
need for translation between formats) and others use formats that
differ from the one specified in this specification, local storage is
outside of the scope of this specification.
Note: This specification is not intended to dictate the internal
formats used by sites, the specific message system features that
they are expected to support, or any of the characteristics of
user interface programs that create or read messages. In
addition, this document does not specify an encoding of the
characters for either transport or storage; that is, it does not
specify the number of bits used or how those bits are specifically
transferred over the wire or stored on disk.
1.2. Notational Conventions
1.2.1. Requirements Notation
This document occasionally uses terms that appear in capital letters.
When the terms "MUST", "SHOULD", "RECOMMENDED", "MUST NOT", "SHOULD
NOT", and "MAY" appear capitalized, they are being used to indicate
particular requirements of this specification. A discussion of the
meanings of these terms appears in [RFC2119].
1.2.2. Syntactic Notation
This specification uses the Augmented Backus-Naur Form (ABNF)
[RFC5234] notation for the formal definitions of the syntax of
messages. Characters will be specified either by a decimal value
(e.g., the value %d65 for uppercase A and %d97 for lowercase A) or by
a case-insensitive literal value enclosed in quotation marks (e.g.,
"A" for either uppercase or lowercase A).
1.2.3. Structure of This Document
This document is divided into several sections.
This section, section 1, is a short introduction to the document.
Section 2 lays out the general description of a message and its
constituent parts. This is an overview to help the reader understand
some of the general principles used in the later portions of this
document. Any examples in this section MUST NOT be taken as
specification of the formal syntax of any part of a message.
Section 3 specifies formal ABNF rules for the structure of each part
of a message (the syntax) and describes the parts, the relationships
between them, and instructions for their interpretation in the
context of a message (the semantics).
This includes analysis of the syntax and semantics of subparts of
messages that have specific structure. The syntax included in
section 3 represents messages as they MUST be created. There are
also notes in section 3 to indicate if any of the options specified
in the syntax SHOULD be used over any of the others.
Both sections 2 and 3 describe messages that are legal to generate
for purposes of this specification.
Section 4 of this document specifies an "obsolete" syntax. There are
references in section 3 to these obsolete syntactic elements. The
rules of the obsolete syntax are elements that have appeared in
earlier versions of this specification or have previously been widely
used in Internet messages. As such, these elements MUST be
interpreted by parsers of messages in order to be conformant to this
specification. However, since items in this syntax have been
determined to be non-interoperable or to cause significant problems
for recipients of messages, they MUST NOT be generated by creators of
conformant messages.
Section 5 details security considerations to take into account when
implementing this specification.
Appendix A lists examples of different sorts of messages. These
examples are not exhaustive of the types of messages that appear on
the Internet, but give a broad overview of certain syntactic forms.
Appendix B lists the differences between this specification and
earlier specifications for Internet messages.
Appendix C contains acknowledgements.
2. Lexical Analysis of Messages
2.1. General Description
At the most basic level, a message is a series of characters. A
message that is conformant with this specification is composed of
characters with values in the range of 1 through 127 and interpreted
as US-ASCII [ANSI.X3-4.1986] characters. For brevity, this document
sometimes refers to this range of characters as simply "US-ASCII
characters".
Note: This document specifies that messages are made up of
characters in the US-ASCII range of 1 through 127. There are
other documents, specifically the MIME document series ([RFC2045],
[RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that
extend this specification to allow for values outside of that
range. Discussion of those mechanisms is not within the scope of
this specification.
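As an illustration of the character range above, the following
minimal Python sketch tests whether every octet of a raw message
falls in the range 1 through 127. The function name
is_us_ascii_message is hypothetical and is not defined by this
specification.

```python
def is_us_ascii_message(raw: bytes) -> bool:
    """Return True if every octet has a value from 1 through 127."""
    # Iterating over a bytes object yields integers, so each octet can
    # be compared directly against the permitted range.
    return all(1 <= octet <= 127 for octet in raw)
```

A message containing a NUL octet (value 0) or any octet above 127
fails this check; such content needs the MIME extensions mentioned in
the note above.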
Messages are divided into lines of characters. A line is a series of
characters that is delimited with the two characters carriage-return
and line-feed; that is, the carriage return (CR) character (ASCII
value 13) followed immediately by the line feed (LF) character (ASCII
value 10). (The carriage return/line feed pair is usually written in
this document as "CRLF".)
A message consists of header fields (collectively called "the header
section of the message") followed, optionally, by a body. The header
section is a sequence of lines of characters with special syntax as
defined in this specification. The body is simply a sequence of
characters that follows the header section and is separated from the
header section by an empty line (i.e., a line with nothing preceding
the CRLF).
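The division into a header section and a body can be sketched as
follows. This is a simplified illustration, not a complete parser:
the function name split_message is hypothetical, the input is assumed
to use CRLF line endings throughout, and the header section is
assumed to be non-empty.

```python
def split_message(raw: bytes):
    """Split a raw message into (header_lines, body).

    body is None when no empty line (and therefore no body) is present.
    """
    # The first empty line shows up as two consecutive CRLF pairs: the
    # CRLF ending the last header line and the CRLF of the empty line.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        # No empty line: everything is header section, there is no body.
        body = None
        if head.endswith(b"\r\n"):
            head = head[:-2]
    return head.split(b"\r\n"), body
```

The header lines returned here are still in folded form; continuation
lines have not been joined to the field they belong to.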
Note: Common parlance and earlier versions of this specification
use the term "header" to either refer to the entire header section
or to refer to an individual header field. To avoid ambiguity,
this document does not use the terms "header" or "headers" in
isolation, but instead always uses "header field" to refer to the
individual field and "header section" to refer to the entire
collection.
2.1.1. Line Length Limits
There are two limits that this specification places on the number of
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.
The 998 character limit is due to limitations in many implementations
that send, receive, or store IMF messages which simply cannot handle
more than 998 characters on a line. Receiving implementations would
do well to handle an arbitrarily large number of characters in a line
for robustness' sake. However, because so many implementations (in
compliance with the transport requirements of [RFC5321]) do not
accept messages containing more than 1000 characters per line,
including the CR and LF, it is important for implementations not to
create such messages.
The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line, in spite of the fact that such
implementations are non-conformant to the intent of this
specification (and that of [RFC5321] if they actually cause
information to be lost). Again, even though this limitation is put
on messages, it is incumbent upon implementations that display
messages to handle an arbitrarily large number of characters in a
line (certainly at least up to the 998 character limit) for the sake
of robustness.
2.2. Header Fields
Header fields are lines beginning with a field name, followed by a
colon (":"), followed by a field body, and terminated by CRLF. A
field name MUST be composed of printable US-ASCII characters (i.e.,
characters that have values between 33 and 126, inclusive), except
colon. A field body may be composed of printable US-ASCII characters
as well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
ASCII value 9) characters (together known as the white space
characters, WSP). A field body MUST NOT include CR and LF except
when used in "folding" and "unfolding", as described in section
2.2.3. All field bodies MUST conform to the syntax described in
sections 3 and 4 of this specification.
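The field name rule above can be illustrated with a short sketch. The
function names is_valid_field_name and split_field are hypothetical.
The input is assumed to be a single unfolded header field with its
terminating CRLF already removed, and the obsolete syntax of section
4 (which tolerates white space before the colon) is deliberately not
handled.

```python
def is_valid_field_name(name: str) -> bool:
    """True if name is non-empty printable US-ASCII without a colon."""
    return len(name) > 0 and all(
        33 <= ord(ch) <= 126 and ch != ":" for ch in name
    )


def split_field(line: str):
    """Split an unfolded header field into (field_name, field_body)."""
    # The field name ends at the first colon; any later colon belongs
    # to the field body.
    name, colon, body = line.partition(":")
    if not colon or not is_valid_field_name(name):
        raise ValueError(f"not a header field: {line!r}")
    return name, body
```

Note that the field body is returned as is, including any leading
white space after the colon.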
2.2.1. Unstructured Header Field Bodies
Some field bodies in this specification are defined simply as
"unstructured" (which is specified in section 3.2.5 as any printable
US-ASCII characters plus white space characters) with no further
restrictions. These are referred to as unstructured field bodies.
Semantically, unstructured field bodies are simply to be treated as a
single line of characters with no further processing (except for
"folding" and "unfolding" as described in section 2.2.3).
2.2.2. Structured Header Field Bodies
Some field bodies in this specification have a syntax that is more
restrictive than the unstructured field bodies described above.
These are referred to as "structured" field bodies. Structured field
bodies are sequences of specific lexical tokens as described in
sections 3 and 4 of this specification. Many of these tokens are
allowed (according to their syntax) to begin or end with
comments (as described in section 3.2.2) as well as the white space
characters, and those white space characters are subject to "folding"
and "unfolding" as described in section 2.2.3. Semantic analysis of
structured field bodies is given along with their syntax.
2.2.3. Long Header Fields
Each header field is logically a single line of characters comprising
the field name, the colon, and the field body. For convenience,
however, and to deal with the 998/78 character limitations per line,
the field body portion of a header field can be split into a
multiple-line representation; this is called "folding". The general
rule is that wherever this specification allows for folding white
space (not simply WSP characters), a CRLF may be inserted before any
WSP.
For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
is a test
Note: Though structured field bodies are defined in such a way
that folding can take place between many of the lexical tokens
(and even within some of the lexical tokens), folding SHOULD be
limited to placing the CRLF at higher-level syntactic breaks. For
instance, if a field body is defined as comma-separated values, it
is recommended that folding occur after the comma separating the
structured items in preference to other places where the field
could be folded, even if it is allowed elsewhere.
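A simple folding routine for unstructured fields might look like the
sketch below. It is only an illustration under stated assumptions:
the function name fold is hypothetical, the input is an unfolded
field without its terminating CRLF, words are separated by single
spaces or tabs, and there is no trailing white space. It does not
attempt the higher-level break selection recommended for structured
fields.

```python
def fold(field: str, limit: int = 78) -> str:
    """Fold a header field so that lines stay within limit if possible."""
    lines = []
    rest = field
    while len(rest) > limit:
        # Find the last space or tab at an index from 1 through limit;
        # index 0 is skipped because it is the WSP that begins a
        # continuation line.
        cut = max(rest.rfind(" ", 1, limit + 1),
                  rest.rfind("\t", 1, limit + 1))
        # Stop if there is no break point, or if breaking here would
        # leave a line consisting only of white space.
        if cut <= 0 or not rest[:cut].strip():
            break
        lines.append(rest[:cut])
        rest = rest[cut:]   # the WSP stays at the start of the next line
    lines.append(rest)
    # A CRLF is inserted before the WSP that begins each continuation.
    return "\r\n".join(lines)
```

With a limit of 13, the field "Subject: This is a test" is folded
into the two-line form shown in the example above. A word longer than
the limit is left unbroken, since a CRLF may only be inserted before
WSP.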
The process of moving from this folded multiple-line representation
of a header field to its single line representation is called
"unfolding". Unfolding is accomplished by simply removing any CRLF
that is immediately followed by WSP. Each header field should be
treated in its unfolded form for further syntactic and semantic
evaluation. An unfolded header field has no length restriction and
therefore may be indeterminately long.
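Unfolding as described above amounts to a single substitution. The
following sketch uses a regular expression with a lookahead so that
the WSP following the CRLF is kept; the function name unfold is
hypothetical.

```python
import re

# A CRLF that is immediately followed by a space or a horizontal tab.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(field: str) -> str:
    """Remove every CRLF that is immediately followed by WSP."""
    return _FOLD.sub("", field)
```

Applied to the folded form "Subject: This" CRLF " is a test" CRLF,
this yields "Subject: This is a test" CRLF. The terminating CRLF is
left in place because it is not followed by WSP.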
2.3. Body
The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:
o CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.
o Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.
Note: As was stated earlier, there are other documents,
specifically the MIME documents ([RFC2045], [RFC2046], [RFC2049],
[RFC4288], [RFC4289]), that extend (and limit) this specification
to allow for different sorts of message bodies. Again, these
mechanisms are beyond the scope of this document.
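The first body limitation listed above can be checked with a regular
expression that looks for a CR not followed by LF, or an LF not
preceded by CR. This is a minimal sketch; the function name
has_bare_cr_or_lf is hypothetical.

```python
import re

# Either a CR that is not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def has_bare_cr_or_lf(body: bytes) -> bool:
    """True if CR or LF appears anywhere other than as part of CRLF."""
    return _BARE_CR_OR_LF.search(body) is not None
```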
3. Syntax
3.1. Introduction
The syntax as given in this section defines the legal syntax of
Internet messages. Messages that are conformant to this
specification MUST conform to the syntax in this section. If there
are options in this section where one option SHOULD be generated,
that is indicated either in the prose or in a comment next to the
syntax.
For the defined expressions, a short description of the syntax and
use is given, followed by the syntax in ABNF, followed by a semantic
analysis. The following primitive tokens that are used but otherwise
unspecified are taken from the "Core Rules" of [RFC5234], Appendix
B.1: CR, LF, CRLF, HTAB, SP, WSP, DQUOTE, DIGIT, ALPHA, and VCHAR.
In some of the definitions, there will be non-terminals whose names
start with "obs-". These "obs-" elements refer to tokens defined in
the obsolete syntax in section 4. In all cases, these productions
are to be ignored for the purposes of generating legal Internet
messages and MUST NOT be used as part of such a message. However,
when interpreting messages, these tokens MUST be honored as part of
the legal syntax. In this sense, section 3 defines a grammar for the
generation of messages, with "obs-" elements that are to be ignored,
while section 4 adds grammar for the interpretation of messages.
3.2. Lexical Tokens
The following rules are used to define an underlying lexical
analyzer, which feeds tokens to the higher-level parsers. This
section defines the tokens used in structured header field bodies.
Note: Readers of this specification need to pay special attention
to how these lexical tokens are used in both the lower-level and
higher-level syntax later in the document. Particularly, the
white space tokens and the comment tokens defined in section 3.2.2
get used in the lower-level tokens defined here, and those lower-
level tokens are in turn used as parts of the higher-level tokens
defined later. Therefore, white space and comments may be allowed
in the higher-level tokens even though they may not explicitly
appear in a particular definition.
3.2.1. Quoted characters
Some characters are reserved for special interpretation, such as
delimiting lexical tokens. To permit use of these characters as
uninterpreted data, a quoting mechanism is provided.
quoted-pair = ("\" (VCHAR / WSP)) / obs-qp
Where any quoted-pair appears, it is to be interpreted as the
character alone. That is to say, the "\" character that appears as
part of a quoted-pair is semantically "invisible".
Note: The "\" character may appear in a message where it is not
part of a quoted-pair. A "\" character that does not appear in a
quoted-pair is not semantically invisible. The only places in
this specification where quoted-pair currently appears are
ccontent, qcontent, and in obs-dtext in section 4.
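The interpretation of quoted-pair can be sketched as a substitution
that drops the backslash and keeps the character after it. The
function name unquote_pairs is hypothetical. The sketch covers only
the non-obsolete form ("\" followed by VCHAR or WSP) and assumes it
is applied to text that is already known to be the content of a
comment or quoted-string, since a "\" elsewhere is not part of a
quoted-pair.

```python
import re

# "\" followed by VCHAR (%d33-126) or WSP (space or horizontal tab).
_QUOTED_PAIR = re.compile(r"\\([\x21-\x7e \t])")


def unquote_pairs(text: str) -> str:
    """Replace each quoted-pair with the character it quotes."""
    return _QUOTED_PAIR.sub(r"\1", text)
```

Because the scan proceeds left to right, a doubled backslash is
reduced to a single backslash and is not combined with the character
that follows it.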
3.2.2. Folding White Space and Comments
White space characters, including white space used in folding
(described in section 2.2.3), may appear between many elements in
header field bodies. Also, strings of characters that are treated as
comments may be included in structured field bodies as characters
enclosed in parentheses. The following defines the folding white
space (FWS) and comment constructs.
Strings of characters enclosed in parentheses are considered comments
so long as they do not appear within a "quoted-string", as defined in
section 3.2.4. Comments may nest.
There are several places in this specification where comments and FWS
may be freely inserted. To accommodate that syntax, an additional
token for "CFWS" is defined for places where comments and/or FWS can
occur. However, where CFWS occurs in this specification, it MUST NOT
be inserted in such a way that any line of a folded header field is
made up entirely of WSP characters and nothing else.
FWS = ([*WSP CRLF] 1*WSP) / obs-FWS
; Folding white space
ctext = %d33-39 / ; Printable US-ASCII
%d42-91 / ; characters not including
%d93-126 / ; "(", ")", or "\"
obs-ctext
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = (1*([FWS] comment) [FWS]) / FWS
Throughout this specification, where FWS (the folding white space
token) appears, it indicates a place where folding, as discussed in
section 2.2.3, may take place. Wherever folding appears in a message
(that is, a header field body containing a CRLF followed by any WSP),
unfolding (removal of the CRLF) is performed before any further
semantic analysis is performed on that header field according to this
specification. That is to say, any CRLF that appears in FWS is
semantically "invisible".
A comment is normally used in a structured field body to provide some
human-readable informational text. Since a comment is allowed to
contain FWS, folding is permitted within the comment. Also note that
since quoted-pair is allowed in a comment, the parentheses and
backslash characters may appear in a comment, so long as they appear
as a quoted-pair. Semantically, the enclosing parentheses are not
part of the comment; the comment is what is contained between the two
parentheses. As stated earlier, the "\" in any quoted-pair and the
CRLF in any FWS that appears within the comment are semantically
"invisible" and therefore not part of the comment either.
Runs of FWS, comment, or CFWS that occur between lexical tokens in a
structured header field are semantically interpreted as a single
space character.
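Because comments may nest and may contain quoted-pairs, and because
parentheses inside a quoted-string do not start a comment, removing
comments needs a small scanner rather than a single pattern. The
sketch below is one way to do it; the function name strip_comments is
hypothetical, the input is assumed to be an unfolded structured field
body, and obsolete syntax is not considered. Each top-level comment
is replaced by a single space, in line with the interpretation given
above.

```python
def strip_comments(body: str) -> str:
    """Replace each top-level comment in a field body with one space."""
    out = []
    depth = 0           # current comment nesting depth
    in_quotes = False   # True while inside a quoted-string
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body) and (depth or in_quotes):
            # quoted-pair: kept verbatim in a quoted-string, skipped
            # entirely when it is part of a comment.
            if in_quotes:
                out.append(body[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif depth:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    out.append(" ")
        elif ch == "(":
            depth = 1
        else:
            out.append(ch)
            if ch == '"':
                in_quotes = True
        i += 1
    return "".join(out)
```

The result may contain runs of several spaces where comments were
adjacent to existing white space; a caller that needs the single
space interpretation can collapse such runs outside quoted-strings.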
3.2.3. Atom
Several productions in structured header field bodies are simply
strings of certain basic characters. Such productions are called
atoms.
Some of the structured header field bodies also allow the period
character (".", ASCII value 46) within runs of atext. An additional
"dot-atom" token is defined for those purposes.
Note: The "specials" token does not appear anywhere else in this
specification. It is simply the visible (i.e., non-control, non-
white space) characters that do not appear in atext. It is
provided only because it is useful for implementers who use tools
that lexically analyze messages. Each of the characters in
specials can be used to indicate a tokenization point in lexical
analysis.
atext = ALPHA / DIGIT / ; Printable US-ASCII
"!" / "#" / ; characters not including
"$" / "%" / ; specials. Used for atoms.
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"
atom = [CFWS] 1*atext [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
dot-atom = [CFWS] dot-atom-text [CFWS]
specials = "(" / ")" / ; Special characters that do
"<" / ">" / ; not appear in atext
"[" / "]" /
":" / ";" /
"@" / "\" /
"," / "." /
DQUOTE
Both atom and dot-atom are interpreted as a single unit, comprising
the string of characters that make it up. Semantically, the optional
comments and FWS surrounding the rest of the characters are not part
of the atom; the atom is only the run of atext characters in an atom,
or the atext and "." characters in a dot-atom.
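The atext and dot-atom-text rules translate directly into a regular
expression. The following sketch is illustrative only; the names
ATEXT, DOT_ATOM_TEXT, and is_dot_atom_text are hypothetical, and the
optional surrounding CFWS of atom and dot-atom is assumed to have
been removed already.

```python
import re

# One atext character: ALPHA, DIGIT, or one of the listed symbols.
# The hyphen is placed last so that it is taken literally.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_text(text: str) -> bool:
    """True if the whole string matches dot-atom-text."""
    return DOT_ATOM_TEXT.fullmatch(text) is not None
```

Strings with a leading, trailing, or doubled period do not match,
because each "." has to be surrounded by atext characters.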
3.2.4. Quoted Strings
Strings of characters that include characters other than those
allowed in atoms can be represented in a quoted string format, where
the characters are surrounded by quote (DQUOTE, ASCII value 34)
characters.
qtext = %d33 / ; Printable US-ASCII
%d35-91 / ; characters not including
%d93-126 / ; "\" or the quote character
obs-qtext
qcontent = qtext / quoted-pair
quoted-string = [CFWS]
DQUOTE *([FWS] qcontent) [FWS] DQUOTE
[CFWS]
A quoted-string is treated as a unit. That is, quoted-string is
identical to atom, semantically. Since a quoted-string is allowed to
contain FWS, folding is permitted. Also note that since quoted-pair
is allowed in a quoted-string, the quote and backslash characters may
appear in a quoted-string so long as they appear as a quoted-pair.
Semantically, neither the optional CFWS outside of the quote
characters nor the quote characters themselves are part of the
quoted-string; the quoted-string is what is contained between the two
quote characters. As stated earlier, the "\" in any quoted-pair and
the CRLF in any FWS/CFWS that appears within the quoted-string are
semantically "invisible" and therefore not part of the quoted-string
either.
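The semantic content of a quoted-string can be recovered by removing
the enclosing quote characters, the CRLF of any folding, and the "\"
of each quoted-pair. The sketch below assumes the token has already
been isolated by a tokenizer and that any CFWS outside the quote
characters has been stripped; the function name quoted_string_value
is hypothetical.

```python
def quoted_string_value(token: str) -> str:
    """Return the characters a quoted-string stands for."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError(f"not a quoted-string: {token!r}")
    # The CRLF of any FWS inside the quotes is semantically invisible.
    inner = token[1:-1].replace("\r\n", "")
    out = []
    i = 0
    while i < len(inner):
        if inner[i] == "\\" and i + 1 < len(inner):
            # quoted-pair: keep only the quoted character.
            out.append(inner[i + 1])
            i += 2
        else:
            out.append(inner[i])
            i += 1
    return "".join(out)
```

White space inside the quote characters, other than the CRLF of
folding, is kept as part of the value.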
3.2.5. Miscellaneous Tokens
Three additional tokens are defined: word and phrase for combinations
of atoms and/or quoted-strings, and unstructured for use in
unstructured header fields and in some places within structured
header fields.
word = atom / quoted-string
phrase = 1*word / obs-phrase
unstructured = (*([FWS] VCHAR) *WSP) / obs-unstruct
3.3. Date and Time Specification
Date and time values occur in several header fields. This section
specifies the syntax for a full date and time specification. Though
folding white space is permitted throughout the date-time
specification, it is RECOMMENDED that a single space be used in each
place that FWS appears (whether it is required or optional); some
older implementations will not interpret longer sequences of folding
white space correctly.
date-time = [ day-of-week "," ] date time [CFWS]
day-of-week = ([FWS] day-name) / obs-day-of-week
day-name = "Mon" / "Tue" / "Wed" / "Thu" /
"Fri" / "Sat" / "Sun"
date = day month year
day = ([FWS] 1*2DIGIT FWS) / obs-day
month = "Jan" / "Feb" / "Mar" / "Apr" /
"May" / "Jun" / "Jul" / "Aug" /
"Sep" / "Oct" / "Nov" / "Dec"
year = (FWS 4*DIGIT FWS) / obs-year
time = time-of-day zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT / obs-hour
minute = 2DIGIT / obs-minute
second = 2DIGIT / obs-second
zone = (FWS ( "+" / "-" ) 4DIGIT) / obs-zone
The day is the numeric day of the month. The year is any numeric
year 1900 or later.
The time-of-day specifies the number of hours, minutes, and
optionally seconds since midnight of the date indicated.
The date and time-of-day SHOULD express local time.
The zone specifies the offset from Coordinated Universal Time (UTC,
formerly referred to as "Greenwich Mean Time") that the date and
time-of-day represent. The "+" or "-" indicates whether the time-of-
day is ahead of (i.e., east of) or behind (i.e., west of) Universal
Time. The first two digits indicate the number of hours difference
from Universal Time, and the last two digits indicate the number of
additional minutes difference from Universal Time. (Hence, +hhmm
means +(hh * 60 + mm) minutes, and -hhmm means -(hh * 60 + mm)
minutes). The form "+0000" SHOULD be used to indicate a time zone at
Universal Time. Though "-0000" also indicates Universal Time, it is
used to indicate that the time was generated on a system that may be
in a local time zone other than Universal Time and that the date-time
contains no information about the local time zone.
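The zone arithmetic described above can be sketched as follows. The
function name zone_to_minutes is hypothetical, only the non-obsolete
"+hhmm" / "-hhmm" form is handled, and the leading FWS is assumed to
have been removed.

```python
def zone_to_minutes(zone: str) -> int:
    """Convert "+hhmm" or "-hhmm" to a signed offset in minutes."""
    digits = zone[1:]
    if (len(zone) != 5 or zone[0] not in "+-"
            or not (digits.isascii() and digits.isdigit())):
        raise ValueError(f"not a zone: {zone!r}")
    hours, minutes = int(digits[:2]), int(digits[2:])
    if minutes > 59:
        raise ValueError("last two digits of zone must be 00 through 59")
    total = hours * 60 + minutes
    return -total if zone[0] == "-" else total
```

Both "+0000" and "-0000" map to an offset of zero here, so a caller
that cares about the "no information about the local time zone"
meaning of "-0000" has to inspect the sign separately.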
A date-time specification MUST be semantically valid. That is, the
day-of-week (if included) MUST be the day implied by the date, the
numeric day-of-month MUST be between 1 and the number of days allowed
for the specified month (in the specified year), the time-of-day MUST
be in the range 00:00:00 through 23:59:60 (the number of seconds
allowing for a leap second; see [RFC1305]), and the last two digits
of the zone MUST be within the range 00 through 59.
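Part of the semantic validation above can be delegated to an existing
library. The sketch below uses parsedate_to_datetime from the Python
standard library module email.utils and adds the day-of-week check on
top. The names DAY_NAMES and parse_checked are hypothetical, the
input is assumed to contain no comments, and the library may not
accept every value this specification permits (for example, a seconds
value of 60).

```python
from email.utils import parsedate_to_datetime

# Index 0 is Monday, matching datetime.weekday().
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def parse_checked(value: str):
    """Parse a date-time and verify the day-of-week, if one is given."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError) as exc:
        # Depending on the Python version, either exception may signal
        # an unparsable or impossible date (such as 30 Feb).
        raise ValueError(f"not a valid date-time: {value!r}") from exc
    if "," in value:
        # ABNF string literals are case-insensitive, so normalize case.
        claimed = value.split(",", 1)[0].strip().capitalize()
        if claimed != DAY_NAMES[parsed.weekday()]:
            raise ValueError(f"day-of-week does not match date: {value!r}")
    return parsed
```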
3.4. Address Specification
Addresses occur in several message header fields to indicate senders
and recipients of messages. An address may either be an individual
mailbox, or a group of mailboxes.
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] /
obs-angle-addr
group = display-name ":" [group-list] ";" [CFWS]
display-name = phrase
mailbox-list = (mailbox *("," mailbox)) / obs-mbox-list
address-list = (address *("," address)) / obs-addr-list
group-list = mailbox-list / CFWS / obs-group-list
A mailbox receives mail. It is a conceptual entity that does not
necessarily pertain to file storage. For example, some sites may
choose to print mail on a printer and deliver the output to the
addressee's desk.
Normally, a mailbox is composed of two parts: (1) an optional display
name that indicates the name of the recipient (which can be a person
or a system) that could be displayed to the user of a mail
application, and (2) an addr-spec address enclosed in angle brackets
("<" and ">"). There is an alternate simple form of a mailbox where
the addr-spec address appears alone, without the recipient's name or
the angle brackets. The Internet addr-spec address is described in
section 3.4.1.
Note: Some legacy implementations used the simple form where the
addr-spec appears without the angle brackets, but included the
name of the recipient in parentheses as a comment following the
addr-spec. Since the meaning of the information in a comment is
unspecified, implementations SHOULD use the full name-addr form of
the mailbox, instead of the legacy form, to specify the display
name associated with a mailbox. Also, because some legacy
implementations interpret the comment, comments generally SHOULD
NOT be used in address fields to avoid confusing such
implementations.
When it is desirable to treat several mailboxes as a single unit
(i.e., in a distribution list), the group construct can be used. The
group construct allows the sender to indicate a named group of
recipients. This is done by giving a display name for the group,
followed by a colon, followed by a comma-separated list of any number
of mailboxes (including zero and one), and ending with a semicolon.
Because the list of mailboxes can be empty, using the group construct
is also a simple way to communicate to recipients that the message
was sent to one or more named sets of recipients, without actually
providing the individual mailbox address for any of those recipients.
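As an illustration of the mailbox and group forms described above,
the following Python sketch assembles them from their parts. The
helper names are hypothetical, and the sketch assumes that the
display name is already a valid phrase and that each addr-spec is
already well formed; it performs no quoting or validation.

```python
from typing import Optional, Sequence


def format_mailbox(addr_spec: str, display_name: Optional[str] = None) -> str:
    """Return the name-addr form if a display name is given, else addr-spec."""
    if display_name:
        return f"{display_name} <{addr_spec}>"
    return addr_spec


def format_group(display_name: str, mailboxes: Sequence[str] = ()) -> str:
    """Return display-name ":" [mailbox-list] ";" for zero or more mailboxes."""
    if not mailboxes:
        # An empty group names a set of recipients without listing them.
        return f"{display_name}:;"
    return f"{display_name}: {', '.join(mailboxes)};"
```

Calling format_group("Undisclosed recipients") produces the empty
group "Undisclosed recipients:;".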
3.4.1. Addr-Spec Specification
An addr-spec is a specific Internet identifier that contains a
locally interpreted string followed by the at-sign character ("@",
ASCII value 64) followed by an Internet domain. The locally
interpreted string is either a quoted-string or a dot-atom. If the
string can be represented as a dot-atom (that is, it contains no
characters other than atext characters or "." surrounded by atext
characters), then the dot-atom form SHOULD be used and the quoted-
string form SHOULD NOT be used. Comments and folding white space
SHOULD NOT be used around the "@" in the addr-spec.
Note: A liberal syntax for the domain portion of addr-spec is
given here. However, the domain portion contains addressing
information specified by and used in other protocols (e.g.,
[RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore
incumbent upon implementations to conform to the syntax of
addresses for the context in which they are used.
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
dtext = %d33-90 / ; Printable US-ASCII
%d94-126 / ; characters not including
obs-dtext ; "[", "]", or "\"
The domain portion identifies the point to which the mail is
delivered. In the dot-atom form, this is interpreted as an Internet
domain name (either a host name or a mail exchanger name) as
described in [RFC1034], [RFC1035], and [RFC1123]. In the domain-
literal form, the domain is interpreted as the literal Internet
address of the particular host. In both cases, how addressing is
used and how messages are transported to a particular host is covered
in separate documents, such as [RFC5321]. These mechanisms are
outside of the scope of this document.
The local-part portion is a domain-dependent string. In addresses,
it is simply interpreted on the particular host as a name of a
particular mailbox.
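The preference for the dot-atom form over the quoted-string form of
the local-part might be implemented along the following lines. This
is a sketch under stated assumptions: the function names are
hypothetical, the input is assumed to consist of printable US-ASCII
characters and spaces only, and the obsolete forms are ignored.

```python
import re

# atext: letters, digits, and the printable specials permitted in an atom.
_ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# dot-atom-text = 1*atext *("." 1*atext)
_DOT_ATOM_TEXT_RE = re.compile(rf"[{_ATEXT}]+(?:\.[{_ATEXT}]+)*")


def format_local_part(local: str) -> str:
    """Use the dot-atom form when possible, else fall back to quoted-string."""
    if _DOT_ATOM_TEXT_RE.fullmatch(local):
        return local
    # Inside a quoted-string, backslash and double quote are escaped.
    escaped = local.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def format_addr_spec(local: str, domain: str) -> str:
    """Join local-part and domain with no CFWS around the at-sign."""
    return f"{format_local_part(local)}@{domain}"
```

A local-part such as "john.doe" is emitted unchanged, whereas
"john doe" or "a..b" cannot be a dot-atom and is therefore quoted.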
3.5. Overall Message Syntax
A message consists of header fields, optionally followed by a message
body. Lines in a message MUST be a maximum of 998 characters
excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
characters excluding the CRLF. (See section 2.1.1 for explanation.)
In a message body, though all of the characters listed in the text
rule MAY be used, the use of US-ASCII control characters (values 1
through 8, 11, 12, and 14 through 31) is discouraged since their
interpretation by receivers for display is not guaranteed.
message = (fields / obs-fields)
[CRLF body]
body = (*(*998text CRLF) *998text) / obs-body
text = %d1-9 / ; Characters excluding CR
%d11 / ; and LF
%d12 /
%d14-127
The header fields carry most of the semantic information and are
defined in section 3.6. The body is simply a series of lines of text
that are uninterpreted for the purposes of this specification.
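The line length limits and the discouraged control characters can be
checked mechanically. The following Python sketch is illustrative
only; the function names and the "error" / "warning" labels are
assumptions, and the input is assumed to use CRLF line endings.

```python
MAX_LINE = 998          # MUST NOT be exceeded, excluding the CRLF
RECOMMENDED_LINE = 78   # RECOMMENDED limit, excluding the CRLF

# US-ASCII control characters whose use in a body is discouraged.
DISCOURAGED_CONTROLS = set(range(1, 9)) | {11, 12} | set(range(14, 32))


def check_line_lengths(message: bytes):
    """Return (line_number, length, severity) for each over-long line."""
    findings = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        length = len(line)
        if length > MAX_LINE:
            findings.append((number, length, "error"))
        elif length > RECOMMENDED_LINE:
            findings.append((number, length, "warning"))
    return findings


def discouraged_controls_in(body: bytes):
    """Return the sorted discouraged control character values found in body."""
    return sorted(set(body) & DISCOURAGED_CONTROLS)
```

A line of 80 characters is reported as a warning, and a line of 999
characters as an error.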
3.6. Field Definitions
The header fields of a message are defined here. All header fields
have the same general syntactic structure: a field name, followed by
a colon, followed by the field body. The specific syntax for each
header field is defined in the subsequent sections.
Note: In the ABNF syntax for each field in subsequent sections,
each field name is followed by the required colon. However, for
brevity, sometimes the colon is not referred to in the textual
description of the syntax. It is, nonetheless, required.
It is important to note that the header fields are not guaranteed to
be in a particular order. They may appear in any order, and they
have been known to be reordered occasionally when transported over
the Internet. However, for the purposes of this specification,
header fields SHOULD NOT be reordered when a message is transported
or transformed. More importantly, the trace header fields and resent
header fields MUST NOT be reordered, and SHOULD be kept in blocks
prepended to the message. See sections 3.6.6 and 3.6.7 for more
information.
The only required header fields are the origination date field and
the originator address field(s). All other header fields are
syntactically optional. More information is contained in the table
following this definition.
fields          =   *(trace
                      *optional-field /
                    *(resent-date /
                      resent-from /
                      resent-sender /
                      resent-to /
                      resent-cc /
                      resent-bcc /
                      resent-msg-id))
                    *(orig-date /
                      from /
                      sender /
                      reply-to /
                      to /
                      cc /
                      bcc /
                      message-id /
                      in-reply-to /
                      references /
                      subject /
                      comments /
                      keywords /
                      optional-field)
The following table indicates limits on the number of times each
field may occur in the header section of a message as well as any
special limitations on the use of those fields. An asterisk ("*")
next to a value in the minimum or maximum column indicates that a
special restriction appears in the Notes column.
+----------------+--------+------------+----------------------------+
| Field | Min | Max number | Notes |
| | number | | |
+----------------+--------+------------+----------------------------+
| trace | 0 | unlimited | Block prepended - see |
| | | | 3.6.7 |
| resent-date | 0* | unlimited* | One per block, required if |
| | | | other resent fields are |
| | | | present - see 3.6.6 |
| resent-from | 0 | unlimited* | One per block - see 3.6.6 |
| resent-sender | 0* | unlimited* | One per block, MUST occur |
| | | | with multi-address |
| | | | resent-from - see 3.6.6 |
| resent-to | 0 | unlimited* | One per block - see 3.6.6 |
| resent-cc | 0 | unlimited* | One per block - see 3.6.6 |
| resent-bcc | 0 | unlimited* | One per block - see 3.6.6 |
| resent-msg-id | 0 | unlimited* | One per block - see 3.6.6 |
| orig-date | 1 | 1 | |
| from | 1 | 1 | See sender and 3.6.2 |
| sender | 0* | 1 | MUST occur with |
| | | | multi-address from - see |
| | | | 3.6.2 |
| reply-to | 0 | 1 | |
| to | 0 | 1 | |
| cc | 0 | 1 | |
| bcc | 0 | 1 | |
| message-id | 0* | 1 | SHOULD be present - see |
| | | | 3.6.4 |
| in-reply-to | 0* | 1 | SHOULD occur in some |
| | | | replies - see 3.6.4 |
| references | 0* | 1 | SHOULD occur in some |
| | | | replies - see 3.6.4 |
| subject | 0 | 1 | |
| comments | 0 | unlimited | |
| keywords | 0 | unlimited | |
| optional-field | 0 | unlimited | |
+----------------+--------+------------+----------------------------+
The exact interpretation of each field is described in subsequent
sections.
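The occurrence limits in the table lend themselves to a simple
mechanical check. The Python sketch below is one possible approach,
not a definitive validator: the names are hypothetical, field names
are compared case-insensitively, and the per-block rules for trace
and resent fields as well as the conditional requirements marked
with an asterisk are deliberately left out.

```python
from collections import Counter

# (minimum, maximum) occurrences per header section; None means unlimited.
FIELD_LIMITS = {
    "date": (1, 1),
    "from": (1, 1),
    "sender": (0, 1),
    "reply-to": (0, 1),
    "to": (0, 1),
    "cc": (0, 1),
    "bcc": (0, 1),
    "message-id": (0, 1),
    "in-reply-to": (0, 1),
    "references": (0, 1),
    "subject": (0, 1),
    "comments": (0, None),
    "keywords": (0, None),
}


def check_field_counts(field_names):
    """Return a list of problems for the given sequence of field names."""
    counts = Counter(name.lower() for name in field_names)
    problems = []
    for name, (minimum, maximum) in FIELD_LIMITS.items():
        seen = counts.get(name, 0)
        if seen < minimum:
            problems.append(f"{name}: required but missing")
        elif maximum is not None and seen > maximum:
            problems.append(f"{name}: appears {seen} times, maximum is {maximum}")
    return problems
```

A header section with two "To:" fields, for example, is reported as
exceeding the maximum of one.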
3.6.1. The Origination Date Field
The origination date field consists of the field name "Date" followed
by a date-time specification.
orig-date = "Date:" date-time CRLF
The origination date specifies the date and time at which the creator
of the message indicated that the message was complete and ready to
enter the mail delivery system. For instance, this might be the time
that a user pushes the "send" or "submit" button in an application
program. In any case, it is specifically not intended to convey the
time that the message is actually transported, but rather the time at
which the human or other creator of the message has put the message
into its final form, ready for transport. (For example, a portable
computer user who is not connected to a network might queue a message
for delivery. The origination date is intended to contain the date
and time that the user queued the message, not the time when the user
connected to the network to send the message.)
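To illustrate, a mail application might record the time at which the
user queues the message and format the field from that value, rather
than from the time of transmission. The sketch below uses
format_datetime from the Python standard library's email.utils
module; the function name origination_date is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def origination_date(queued_at: datetime) -> str:
    """Build the Date field from the moment the message was finalized.

    An aware datetime yields its numeric zone (for example "+0000").
    A naive datetime is formatted with "-0000", that is, Universal
    Time with no information about the local time zone.
    """
    return "Date: " + format_datetime(queued_at)


# The timestamp is captured when the user queues the message.
queued = datetime(2008, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
field = origination_date(queued)
```

The value is computed once when the message is queued and is not
regenerated when the message is later transmitted.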
3.6.2. Originator Fields
The originator fields of a message consist of the from field, the
sender field (when applicable), and optionally the reply-to field.
The from field consists of the field name "From" and a comma-
separated list of one or more mailbox specifications. If the from
field contains more than one mailbox specification in the mailbox-
list, then the sender field, containing the field name "Sender" and a
single mailbox specification, MUST appear in the message. In either
case, an optional reply-to field MAY also be included, which contains
the field name "Reply-To" and a comma-separated list of one or more
addresses.
from = "From:" mailbox-list CRLF
sender = "Sender:" mailbox CRLF
reply-to = "Reply-To:" address-list CRLF
The originator fields indicate the mailbox(es) of the source of the
message. The "From:" field specifies the author(s) of the message,
that is, the mailbox(es) of the person(s) or system(s) responsible
for the writing of the message. The "Sender:" field specifies the
mailbox of the agent responsible for the actual transmission of the
message. For example, if a secretary were to send a message for
another person, the mailbox of the secretary would appear in the
"Sender:" field and the mailbox of the actual author would appear in
the "From:" field. If the originator of the message can be indicated
by a single mailbox and the author and transmitter are identical, the
"Sender:" field SHOULD NOT be used. Otherwise, both fields SHOULD
appear.
Note: The transmitter information is always present. The absence
of the "Sender:" field is sometimes mistakenly taken to mean that
the agent responsible for transmission of the message has not been
specified. This absence merely means that the transmitter is
identical to the author and is therefore not redundantly placed
into the "Sender:" field.
The originator fields also provide the information required when
replying to a message. When the "Reply-To:" field is present, it
indicates the address(es) to which the author of the message suggests
that replies be sent. In the absence of the "Reply-To:" field,
replies SHOULD by default be sent to the mailbox(es) specified in the
"From:" field unless otherwise specified by the person composing the
reply.
In all cases, the "From:" field SHOULD NOT contain any mailbox that
does not belong to the author(s) of the message. See also section
3.6.3 for more information on forming the destination addresses for a
reply.
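The relationship between the "From:" and "Sender:" fields can be
expressed as a small check. The following Python sketch is
illustrative only: the function name is hypothetical, and mailboxes
are compared as plain addr-spec strings, which is a simplification.

```python
from typing import List, Optional, Sequence


def originator_problems(from_mailboxes: Sequence[str],
                        sender: Optional[str] = None) -> List[str]:
    """Return the originator field rules that the given values violate."""
    problems = []
    if not from_mailboxes:
        problems.append("From: needs at least one mailbox")
    elif len(from_mailboxes) > 1 and sender is None:
        problems.append(
            "Sender: MUST appear when From: lists more than one mailbox")
    elif len(from_mailboxes) == 1 and sender == from_mailboxes[0]:
        problems.append(
            "Sender: SHOULD NOT be used when it equals the single author")
    return problems
```

A message written by one person and transmitted by another passes
the check, since both fields are then expected to appear.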
3.6.3. Destination Address Fields
The destination fields of a message consist of three possible fields,
each of the same form: the field name, which is either "To", "Cc", or
"Bcc", followed by a comma-separated list of one or more addresses
(either mailbox or group syntax). As the syntax below shows, the
"Bcc" field alone may also appear with no addresses.
to = "To:" address-list CRLF
cc = "Cc:" address-list CRLF
bcc = "Bcc:" [address-list / CFWS] CRLF
The destination fields specify the recipients of the message. Each
destination field may have one or more addresses, and the addresses
indicate the intended recipients of the message. The only difference
between the three fields is how each is used.
The "To:" field contains the address(es) of the primary recipient(s)
of the message.
The "Cc:" field (where the "Cc" means "Carbon Copy" in the sense of
making a copy on a typewriter using carbon paper) contains the
addresses of others who are to receive the message, though the
content of the message may not be directed at them.
The "Bcc:" field (where the "Bcc" means "Blind Carbon Copy") contains
addresses of recipients of the message whose addresses are not to be
revealed to other recipients of the message. There are three ways in
which the "Bcc:" field is used. In the first case, when a message
containing a "Bcc:" field is prepared to be sent, the "Bcc:" line is
removed even though all of the recipients (including those specified
in the "Bcc:" field) are sent a copy of the message. In the second
case, recipients specified in the "To:" and "Cc:" lines each are sent
a copy of the message with the "Bcc:" line removed as above, but the
recipients on the "Bcc:" line get a separate copy of the message
containing a "Bcc:" line. (When there are multiple recipient
addresses in the "Bcc:" field, some implementations actually send a
separate copy of the message to each recipient with a "Bcc:"
containing only the address of that particular recipient.) Finally,
since a "Bcc:" field may contain no addresses, a "Bcc:" field can be
sent without any addresses indicating to the recipients that blind
copies were sent to someone. Which method to use with "Bcc:" fields
is implementation dependent, but refer to the "Security
Considerations" section of this document for a discussion of each.
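The three ways of handling "Bcc:" described above can be sketched as
follows. This Python illustration is an assumption-laden model, not a
prescribed implementation: the method names "strip", "separate", and
"empty" and the Copy type are hypothetical, and the "separate" method
follows the variant in which each blind recipient receives a copy
listing only that recipient's own address.

```python
from typing import List, NamedTuple, Optional, Sequence


class Copy(NamedTuple):
    recipients: List[str]        # who receives this copy
    bcc_header: Optional[str]    # None: no Bcc: line; "": empty Bcc: line


def bcc_copies(to: Sequence[str], cc: Sequence[str],
               bcc: Sequence[str], method: str) -> List[Copy]:
    """Return the copies to send for the chosen Bcc handling method."""
    visible = list(to) + list(cc)
    if method == "strip":
        # One copy for everyone, with the Bcc: line removed.
        return [Copy(visible + list(bcc), None)]
    if method == "separate":
        # Visible recipients get no Bcc: line; each blind recipient
        # gets a separate copy naming only that recipient.
        copies = [Copy(visible, None)] if visible else []
        copies += [Copy([addr], addr) for addr in bcc]
        return copies
    if method == "empty":
        # Everyone sees an empty Bcc: line, which reveals only that
        # blind copies were sent to someone.
        return [Copy(visible + list(bcc), "")]
    raise ValueError(f"unknown method: {method!r}")
```

The choice among the three has privacy consequences, which is why
the selection is left to the implementation.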
When a message is a reply to another message, the mailboxes of the
authors of the original message (the mailboxes in the "From:" field)
or mailboxes specified in the "Reply-To:" field (if it exists) MAY
appear in the "To:" field of the reply since these would normally be
the primary recipients of the reply. If a reply is sent to a message
that has destination fields, it is often desirable to send a copy of
the reply to all of the recipients of the message, in addition to the
author. When such a reply is formed, addresses in the "To:" and
"Cc:" fields of the original message MAY appear in the "Cc:" field of
the reply, since these are normally secondary recipients of the
reply. If a "Bcc:" field is present in the original message,
addresses in that field MAY appear in the "Bcc:" field of the reply,
but they SHOULD NOT appear in the "To:" or "Cc:" fields.
Note: Some mail applications have automatic reply commands that
include the destination addresses of the original message in the
destination addresses of the reply. How those reply commands
behave is implementation dependent and is beyond the scope of this
document. In particular, whether or not to include the original
destination addresses when the original message had a "Reply-To:"
field is not addressed here.
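One possible way to form the destination addresses of a reply to all
recipients is sketched below in Python. The function name is
hypothetical. The removal of duplicates and of the replier's own
address, and the decision to include the original destinations even
when a "Reply-To:" field is present, are assumptions of this sketch;
this document does not mandate them.

```python
from typing import List, Optional, Sequence, Tuple


def reply_all_destinations(
    original_from: Sequence[str],
    original_reply_to: Sequence[str],
    original_to: Sequence[str],
    original_cc: Sequence[str],
    own_address: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return the (To, Cc) address lists for a reply to all recipients."""
    # Primary recipients: Reply-To if present, otherwise From.
    to = list(original_reply_to) if original_reply_to else list(original_from)
    # Secondary recipients: the original To and Cc addresses.
    cc: List[str] = []
    for addr in list(original_to) + list(original_cc):
        if addr not in to and addr not in cc and addr != own_address:
            cc.append(addr)
    return to, cc
```

Addresses from an original "Bcc:" field are intentionally not
handled here, since they should not appear in "To:" or "Cc:".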
3.6.4. Identification Fields
Though listed as optional in the table in section 3.6, every message
SHOULD have a "Message-ID:" field. Furthermore, reply messages
SHOULD have "In-Reply-To:" and "References:" fields as appropriate
and as described below.
The "Message-ID:" field contains a single unique message identifier.
The "References:" and "In-Reply-To:" fields each contain one or more
unique message identifiers, optionally separated by CFWS.
The message identifier (msg-id) syntax is a limited version of the
addr-spec construct enclosed in the angle bracket characters, "<" and
">". Unlike addr-spec, this syntax only permits the dot-atom-text
form on the left-hand side of the "@" and does not have internal CFWS
anywhere in the message identifier.
Note: As with addr-spec, a liberal syntax is given for the right-
hand side of the "@" in a msg-id. However, later in this section,
the use of a domain for the right-hand side of the "@" is
RECOMMENDED. Again, the syntax of domain constructs is specified
by and used in other protocols (e.g., [RFC1034], [RFC1035],
[RFC1123], [RFC5321]). It is therefore incumbent upon
implementations to conform to the syntax of addresses for the
context in which they are used.
message-id = "Message-ID:" msg-id CRLF
in-reply-to = "In-Reply-To:" 1*msg-id CRLF
references = "References:" 1*msg-id CRLF
msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left = dot-atom-text / obs-id-left
id-right = dot-atom-text / no-fold-literal / obs-id-right
no-fold-literal = "[" *dtext "]"
The "Message-ID:" field provides a unique message identifier that
refers to a particular version of a particular message. The
uniqueness of the message identifier is guaranteed by the host that
generates it (see below). This message identifier is intended to be
machine readable and not necessarily meaningful to humans. A message
identifier pertains to exactly one version of a particular message;
subsequent revisions to the message each receive new message
identifiers.
Note: There are many instances when messages are "changed", but
those changes do not constitute a new instantiation of that
message, and therefore the message would not get a new message
identifier. For example, when messages are introduced into the
transport system, they are often prepended with additional header
fields such as trace fields (described in section 3.6.7) and
resent fields (described in section 3.6.6). The addition of such
header fields does not change the identity of the message and
therefore the original "Message-ID:" field is retained. In all
cases, it is the meaning that the sender of the message wishes to
convey (i.e., whether this is the same message or a different
message) that determines whether or not the "Message-ID:" field
changes, not any particular syntactic difference that appears (or
does not appear) in the message.
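The msg-id syntax given above is restrictive enough to be checked
with a regular expression. The Python sketch below is a minimal
illustration under stated assumptions: the names are hypothetical,
the optional CFWS surrounding the angle brackets is not accepted,
and the obsolete forms (obs-id-left, obs-id-right, obs-dtext) are
ignored.

```python
import re

# atext: letters, digits, and the printable specials permitted in an atom.
_ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
# dot-atom-text = 1*atext *("." 1*atext)
_DOT_ATOM_TEXT = rf"[{_ATEXT}]+(?:\.[{_ATEXT}]+)*"
# dtext = %d33-90 / %d94-126 (printable, excluding "[", "]", and "\")
_DTEXT = r"[\x21-\x5A\x5E-\x7E]"
# no-fold-literal = "[" *dtext "]"
_NO_FOLD_LITERAL = rf"\[{_DTEXT}*\]"

MSG_ID_RE = re.compile(
    rf"<{_DOT_ATOM_TEXT}@(?:{_DOT_ATOM_TEXT}|{_NO_FOLD_LITERAL})>")


def is_msg_id(value: str) -> bool:
    """Return True if value matches the non-obsolete msg-id syntax."""
    return MSG_ID_RE.fullmatch(value) is not None
```

Such a check concerns syntax only; it cannot establish that an
identifier is unique.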
The "In-Reply-To:" and "References:" fields are used when creating a
reply to a messa