7 Lexical Conventions
The source text of an ECMAScript program is first converted into a sequence of input elements, which are
tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly
taking the longest possible sequence of characters as the next input element.
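As an illustration of the longest-match rule, the following sketch (the variable names are hypothetical) shows that the characters +++ are scanned as ++ followed by +, and not as + followed by ++.

```javascript
// Hypothetical variables, chosen only to make the tokenization visible.
var a = 1, b = 2;

// The scanner takes the longest possible punctuator first, so "a+++b"
// becomes the tokens  a  ++  +  b, that is (a++) + b.
var sum = a+++b;

console.log(sum, a, b); // 3 2 2
```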
There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic
grammar contexts where a leading division (/) or division-assignment (/=) operator is permitted. The
InputElementRegExp symbol is used in other syntactic grammar contexts.
NOTE There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading
RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 7.9); in examples such as the
following:
a = b
/ hi / g.exec(c).map(d);
where the first non-whitespace, non-comment character after a LineTerminator is slash (/) and the syntactic context
allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, the above example is
interpreted in the same way as:
a = b / hi / g.exec(c).map(d);
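The note can be checked with a small runnable sketch. The values of b, hi, c, d and g below are hypothetical stand-ins, chosen only so that the statement from the note evaluates to a visible number.

```javascript
// Hypothetical stand-ins so that the example from the note can run.
var b = 100, hi = 5, c = "input", d = function (v) { return v; };
var g = { exec: function () { return { map: function () { return 2; } }; } };
var a;

// No semicolon is inserted after "b": the next line begins with "/" in a
// context that permits division, so both slashes are division operators.
a = b
/ hi / g.exec(c).map(d);

console.log(a); // 10, that is (100 / 5) / 2
```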
© Ecma International 2009
Syntax
- InputElementDiv ::
- WhiteSpace
LineTerminator
Comment
Token
DivPunctuator
- InputElementRegExp ::
- WhiteSpace
LineTerminator
Comment
Token
RegularExpressionLiteral
7.1 Unicode Format-Control Characters
The Unicode format-control characters (i.e., the characters in category "Cf" in the Unicode Character
Database such as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting
of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control
characters may be used within comments, and within string literals and regular expression literals.
<ZWNJ> and <ZWJ> are format-control characters that are used to make necessary distinctions when forming
words or phrases in certain languages. In ECMAScript source text, <ZWNJ> and <ZWJ> may also be used in
an identifier after the first character.
<BOM> is a format-control character used primarily at the start of a text to mark it as Unicode and to allow
detection of the text's encoding and byte order. <BOM> characters intended for this purpose can sometimes
also appear after the start of a text, for example as a result of concatenating files. <BOM> characters are
treated as white space characters (see 7.2).
The special treatment of certain format-control characters outside of comments, string literals, and regular
expression literals is summarized in Table 1.
Table 1 — Format-Control Character Usage
| Code Unit Value | Name | Formal Name | Usage |
| \u200C | Zero width non-joiner | <ZWNJ> | IdentifierPart |
| \u200D | Zero width joiner | <ZWJ> | IdentifierPart |
| \uFEFF | Byte Order Mark | <BOM> | Whitespace |
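The identifier rows of Table 1 can be exercised with a short sketch. It assumes an engine that provides the Function constructor; the identifier names are hypothetical. The format-control characters are written as escapes in the strings below so that they appear literally in the source text being compiled.

```javascript
// new Function compiles the string as a function body, so the characters
// below reach the lexical grammar as literal source characters.
var zwnj = "\u200C", zwj = "\u200D";

// <ZWNJ> and <ZWJ> after the first character are part of the identifier.
var viaZwnj = new Function("var a" + zwnj + "b = 1; return a" + zwnj + "b + 1;")();
var viaZwj = new Function("var a" + zwj + "b = 2; return a" + zwj + "b + 1;")();

// <ZWNJ> as the first character is not an IdentifierStart.
var leadingRejected = false;
try {
  new Function("var " + zwnj + "x = 1;");
} catch (e) {
  leadingRejected = e instanceof SyntaxError;
}

console.log(viaZwnj, viaZwj, leadingRejected); // 2 3 true
```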
7.2 White Space
White space characters are used to improve source text readability and to separate tokens (indivisible lexical
units) from each other, but are otherwise insignificant. White space characters may occur between any two
tokens and at the start or end of input. White space characters may also occur within a
StringLiteral or a
RegularExpressionLiteral (where they are considered significant characters forming part of the literal value) or
within a Comment, but cannot appear within any other kind of token.
The ECMAScript white space characters are listed in Table 2.
Table 2 — Whitespace Characters
| Code Unit Value | Name | Formal Name |
| \u0009 | Tab | <TAB> |
| \u000B | Vertical Tab | <VT> |
| \u000C | Form Feed | <FF> |
| \u0020 | Space | <SP> |
| \u00A0 | No-break space | <NBSP> |
| \uFEFF | Byte Order Mark | <BOM> |
| Other category "Zs" | Any other Unicode "space separator" | <USP> |
ECMAScript implementations must recognize all of the white space characters defined in Unicode 3.0. Later
editions of the Unicode Standard may define other white space characters. ECMAScript implementations may
recognize white space characters from later editions of the Unicode Standard.
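A minimal sketch of the rule that any white space character may separate tokens. It assumes the Function constructor is available, and it uses EM SPACE (\u2003) as one arbitrary representative of category "Zs"; both the object and its property names are hypothetical.

```javascript
// One entry per row of Table 2; USP uses EM SPACE as a sample "Zs" character.
var whiteSpace = {
  TAB: "\u0009", VT: "\u000B", FF: "\u000C", SP: "\u0020",
  NBSP: "\u00A0", BOM: "\uFEFF", USP: "\u2003"
};

// Each character is placed between the tokens of "return 1 + 2;".
var results = {};
for (var name in whiteSpace) {
  var ws = whiteSpace[name];
  results[name] = new Function("return" + ws + "1" + ws + "+" + ws + "2;")();
}

console.log(results); // every property holds 3
```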
Syntax
- WhiteSpace ::
- <TAB>
<VT>
<FF>
<SP>
<NBSP>
<BOM>
<USP>
7.3 Line Terminators
Like white space characters, line terminator characters are used to improve source text readability and to
separate tokens (indivisible lexical units) from each other. However, unlike white space characters, line
terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators
may occur between any two tokens, but there are a few places where they are forbidden by the syntactic
grammar. Line terminators also affect the process of automatic semicolon insertion (7.9). A line terminator
cannot occur within any token except a StringLiteral. Line terminators may only occur within a StringLiteral
token as part of a LineContinuation.
A line terminator can occur within a MultiLineComment (7.4) but cannot occur within a SingleLineComment.
Line terminators are included in the set of white space characters that are matched by the \s class in regular
expressions.
The ECMAScript line terminator characters are listed in Table 3.
Table 3 — Line Terminator Characters
| Code Unit Value | Name | Formal Name |
| \u000A | Line Feed | <LF> |
| \u000D | Carriage Return | <CR> |
| \u2028 | Line separator | <LS> |
| \u2029 | Paragraph separator | <PS> |
Only the characters in Table 3 are treated as line terminators. Other new line or line breaking characters are
treated as white space but not as line terminators. The character sequence <CR><LF> is commonly used as
a line terminator. It should be considered a single character for the purpose of reporting line numbers.
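The following sketch probes the four line terminator characters in two ways: each one ends a single-line comment, and each one is matched by \s. It assumes the Function constructor is available; the object and variable names are hypothetical. Form feed is included as a contrast, since it is white space but not a line terminator.

```javascript
var lineTerminators = { LF: "\u000A", CR: "\u000D", LS: "\u2028", PS: "\u2029" };
var endsComment = {}, matchedByS = {};

for (var name in lineTerminators) {
  var lt = lineTerminators[name];
  // If lt did not end the comment, "return 5;" would be commented out.
  endsComment[name] = new Function("// comment" + lt + "return 5;")() === 5;
  matchedByS[name] = /^\s$/.test(lt);
}

// <FF> does not end the comment, so the return statement is swallowed.
var formFeedResult = new Function("// comment\u000Creturn 5;")();

console.log(endsComment, matchedByS, formFeedResult);
```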
Syntax
- LineTerminator ::
- <LF>
<CR>
<LS>
<PS>
- LineTerminatorSequence ::
- <LF>
<CR> [lookahead ∉ <LF>]
<LS>
<PS>
<CR> <LF>
7.4 Comments
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any character except a LineTerminator character, and because of
the general rule that a token is always as long as possible, a single-line comment always consists of all
characters from the // marker to the end of the line. However, the LineTerminator at the end of the line is not
considered to be part of the single-line comment; it is recognized separately by the lexical grammar and
becomes part of the stream of input elements for the syntactic grammar. This point is very important, because
it implies that the presence or absence of single-line comments does not affect the process of automatic
semicolon insertion (see 7.9).
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line
terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by
the syntactic grammar.
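The effect of a line terminator inside a MultiLineComment can be observed through the restricted return production (see 7.9.1). In this sketch the function names are hypothetical.

```javascript
function sameLine() {
  // The comment behaves like white space, so 42 is the returned expression.
  return /* no line terminator inside */ 42;
}

function acrossLines() {
  // The comment contains a line terminator, so it counts as a LineTerminator
  // after "return" and a semicolon is inserted before 42.
  return /* this comment contains
            a line terminator */ 42;
}

console.log(sameLine(), acrossLines()); // 42 undefined
```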
Syntax
- MultiLineComment ::
- /* MultiLineCommentCharsopt */
- MultiLineCommentChars ::
- MultiLineNotAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
- PostAsteriskCommentChars ::
- MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
- MultiLineNotAsteriskChar ::
- SourceCharacter but not asterisk *
- MultiLineNotForwardSlashOrAsteriskChar ::
- SourceCharacter but not forward-slash / or asterisk *
- SingleLineComment ::
- // SingleLineCommentCharsopt
- SingleLineCommentChars ::
- SingleLineCommentChar SingleLineCommentCharsopt
- SingleLineCommentChar ::
- SourceCharacter but not LineTerminator
7.5   Tokens
Syntax
- Token ::
- IdentifierName
Punctuator
NumericLiteral
StringLiteral
NOTE     The DivPunctuator and RegularExpressionLiteral productions define tokens, but are not included in the Token
production.
7.6   Identifier Names and Identifiers
Identifier Names are tokens that are interpreted according to the grammar given in the "Identifiers" section of
chapter 5 of the Unicode standard, with some small modifications. An Identifier is an IdentifierName that is not
a ReservedWord (see 7.6.1). The Unicode identifier grammar is based on both normative and informative
character categories specified by the Unicode Standard. The characters in the specified categories in version
3.0 of the Unicode standard must be treated as in those categories by all conforming ECMAScript
implementations.
This standard specifies specific character additions: The dollar sign ($) and the underscore (_) are permitted
anywhere in an IdentifierName.
Unicode escape sequences are also permitted in an IdentifierName, where they contribute a single character to
the IdentifierName, as computed by the CV of the UnicodeEscapeSequence (see 7.8.4). The \ preceding the
UnicodeEscapeSequence does not contribute a character to the IdentifierName. A UnicodeEscapeSequence cannot
be used to put a character into an IdentifierName that would otherwise be illegal. In other words, if a
\ UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be
a valid IdentifierName that has the exact same sequence of characters as the original IdentifierName. All
interpretations of identifiers within this specification are based upon their actual characters regardless of
whether or not an escape sequence was used to contribute any particular characters.
Two IdentifierNames that are canonically equivalent according to the Unicode standard are not equal unless
they are represented by the exact same sequence of code units (in other words, conforming ECMAScript
implementations are only required to do bitwise comparison on IdentifierName values). The intent is that the
incoming source text has been converted to normalized form C before it reaches the compiler.
ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode
Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode
3.0.
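The rules above can be illustrated with a short sketch. All identifier names are hypothetical, and the Function constructor is assumed to be available for the two cases that must be compiled from a string.

```javascript
// A Unicode escape contributes the character it denotes: \u0061 is "a",
// so this declares the same binding that is later read as abc.
var \u0061bc = 1;
var $total = 2, _tmp = 3; // $ and _ are permitted anywhere
var sameName = (abc === 1);

// An escape cannot smuggle in an otherwise illegal character:
// \u002D is "-", which is not an IdentifierPart.
var illegalRejected = false;
try {
  new Function("var a\\u002Db = 1;");
} catch (e) {
  illegalRejected = e instanceof SyntaxError;
}

// Canonically equivalent spellings are different identifiers:
// U+00E9 versus "e" followed by U+0301 (a combining mark).
var decomposedType = new Function("var \u00E9 = 1; return typeof e\u0301;")();

console.log(sameName, illegalRejected, decomposedType); // true true undefined
```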
Syntax
- Identifier ::
- IdentifierName but not ReservedWord
- IdentifierName ::
- IdentifierStart
IdentifierName IdentifierPart
- IdentifierStart ::
- UnicodeLetter
$
_
\ UnicodeEscapeSequence
- IdentifierPart ::
- IdentifierStart
UnicodeCombiningMark
UnicodeDigit
UnicodeConnectorPunctuation
<ZWNJ>
<ZWJ>
- UnicodeLetter
- any character in the Unicode categories "Uppercase letter (Lu)", "Lowercase letter (Ll)", "Titlecase letter (Lt)",
"Modifier letter (Lm)", "Other letter (Lo)", or "Letter number (Nl)".
- UnicodeCombiningMark
- any character in the Unicode categories "Non-spacing mark (Mn)" or "Combining spacing mark (Mc)"
- UnicodeDigit
- any character in the Unicode category "Decimal number (Nd)"
- UnicodeConnectorPunctuation
- any character in the Unicode category "Connector punctuation (Pc)"
- UnicodeEscapeSequence
- see 7.8.4.
7.6.1   Reserved Words
A reserved word is an IdentifierName that cannot be used as an Identifier.
Syntax
- ReservedWord ::
- Keyword
FutureReservedWord
NullLiteral
BooleanLiteral
7.6.1.1   Keywords
The following tokens are ECMAScript keywords and may not be used as Identifiers in ECMAScript programs.
Syntax
- Keyword :: one of
| break | do | instanceof | typeof |
| case | else | new | var |
| catch | finally | return | void |
| continue | for | switch | while |
| debugger | function | this | with |
| default | if | throw | |
| delete | in | try | |
7.6.1.2   Future Reserved Words
The following words are used as keywords in proposed extensions and are therefore reserved to allow for the
possibility of future adoption of those extensions.
Syntax
- FutureReservedWord :: one of
| class | enum | extends | super |
| const | export | import | |
The following tokens are also considered to be FutureReservedWords when they occur within strict mode code
(see 10.1.1). The occurrence of any of these tokens within strict mode code in any context where the
occurrence of a FutureReservedWord would produce an error must also produce an equivalent error:
| implements | let | private | public |
| interface | package | protected | static |
| yield | | | |
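A minimal sketch of how the reserved word lists behave in practice. The helper parses is hypothetical and relies on the Function constructor, whose body is non-strict code unless it begins with a Use Strict Directive.

```javascript
// Returns true when `body` parses as a function body, false on SyntaxError.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

var results = {
  keyword: parses("var while = 1;"),                              // false
  futureReserved: parses("var class = 1;"),                       // false
  strictOnlyInSloppy: parses("var implements = 1;"),              // true
  strictOnlyInStrict: parses('"use strict"; var implements = 1;'), // false
  // Only a whole IdentifierName can be reserved; a longer name that merely
  // starts with a reserved word is an ordinary Identifier.
  longerNames: parses("var classy = 1, typeofx = 2;")             // true
};

console.log(results);
```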
7.7   Punctuators
Syntax
- Punctuator :: one of
{ } ( ) [ ]
. ; , < > <=
>= == != === !==
+ - * % ++ --
<< >> >>> & | ^
! ~ && || ? :
= += -= *= %= <<=
>>= >>>= &= |= ^=
- DivPunctuator :: one of
/ /=
7.8   Literals
Syntax
- Literal ::
- NullLiteral
BooleanLiteral
NumericLiteral
StringLiteral
RegularExpressionLiteral
7.8.1   Null Literals
Syntax
- NullLiteral ::
null
Semantics
The value of the null literal null is the sole value of the Null type, namely null.
7.8.2   Boolean Literals
Syntax
- BooleanLiteral ::
true
false
Semantics
The value of the Boolean literal true is a value of the Boolean type, namely true.
The value of the Boolean literal false is a value of the Boolean type, namely false.
7.8.3   Numeric Literals
Syntax
- NumericLiteral ::
- DecimalLiteral
HexIntegerLiteral
- DecimalLiteral ::
- DecimalIntegerLiteral . DecimalDigitsopt ExponentPartopt
. DecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
- DecimalIntegerLiteral ::
- 0
NonZeroDigit DecimalDigitsopt
- DecimalDigits ::
- DecimalDigit
DecimalDigits DecimalDigit
- DecimalDigit ::one of
0 1 2 3 4 5 6 7 8 9
- NonZeroDigit ::one of
1 2 3 4 5 6 7 8 9
- ExponentPart ::
- ExponentIndicator SignedInteger
- ExponentIndicator ::one of
e E
- SignedInteger ::
- DecimalDigits
+ DecimalDigits
- DecimalDigits
- HexIntegerLiteral ::
0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
- HexDigit ::one of
0 1 2 3 4 5 6 7 8 9
a b c d e f A B C D E F
The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.
NOTE     For example:
3in
is an error and not the two input elements 3 and in.
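The restriction in the note can be checked as follows. This is a sketch that assumes the Function constructor is available; the array contents are arbitrary.

```javascript
// With white space, "3" and "in" are two input elements: index 3 exists.
var spaced = 3 in [10, 20, 30, 40];

// Without it, the IdentifierStart "i" immediately follows the NumericLiteral.
var rejected = false;
try {
  new Function("return 3in [10, 20, 30, 40];");
} catch (e) {
  rejected = e instanceof SyntaxError;
}

console.log(spaced, rejected); // true true
```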
Semantics
A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a
mathematical value (MV) is derived from the literal; second, this mathematical value is rounded as described
below.
- The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral.
- The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus
(the MV of DecimalDigits times 10^(-n)), where n is the number of characters in DecimalDigits.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral times 10^e, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of
DecimalIntegerLiteral plus (the MV of DecimalDigits times 10^(-n))) times 10^e, where n is the number of
characters in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits times 10^(-n), where n is the number of characters in DecimalDigits.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits times 10^(e-n), where n is
the number of characters in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral times 10^e, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit times 10^n) plus
the MV of DecimalDigits, where n is the number of characters in DecimalDigits.
- The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit.
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits times 10) plus the MV of DecimalDigit.
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
- The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of HexIntegerLiteral :: 0x HexDigit is the MV of HexDigit.
- The MV of HexIntegerLiteral :: 0X HexDigit is the MV of HexDigit.
- The MV of HexIntegerLiteral :: HexIntegerLiteral HexDigit is (the MV of HexIntegerLiteral times 16) plus the MV of HexDigit.
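Two of the MV rules above can be sketched as code. The function names are hypothetical, and the arithmetic uses Number values rather than exact mathematical values, so the decimal helper is only an approximation of the rule for inputs that are not exactly representable.

```javascript
// MV of a single HexDigit (0 to 15).
function mvOfHexDigit(ch) {
  return "0123456789abcdef".indexOf(ch.toLowerCase());
}

// MV of HexIntegerLiteral: each further HexDigit multiplies the MV so far
// by 16 and adds the MV of that digit.
function mvOfHexIntegerLiteral(literal) {
  var digits = literal.slice(2); // drop the leading 0x or 0X
  var mv = 0;
  for (var i = 0; i < digits.length; i++) {
    mv = mv * 16 + mvOfHexDigit(digits.charAt(i));
  }
  return mv;
}

// MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart,
// with n the number of characters in DecimalDigits and e the exponent MV.
function mvOfDecimalLiteral(integerPart, decimalDigits, e) {
  var n = decimalDigits.length;
  return (Number(integerPart) + Number(decimalDigits) / Math.pow(10, n)) * Math.pow(10, e);
}

console.log(mvOfHexIntegerLiteral("0x1F"));    // 31
console.log(mvOfDecimalLiteral("1", "5", 3)); // 1500
```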
Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type.
If the MV is 0, then the rounded value is +0; otherwise, the rounded value must be the Number value for the
MV (as specified in 8.5), unless the literal is a DecimalLiteral and the literal has more than 20 significant digits,
in which case the Number value may be either the Number value for the MV of a literal produced by replacing
each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by
replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th
significant digit position. A digit is significant if it is not part of an ExponentPart and
- it is not 0; or
- there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart, to its right.
A conforming implementation, when processing strict mode code (see 10.1.1), must not extend the syntax of
NumericLiteral to include OctalIntegerLiteral as described in B.1.1.
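The strict mode restriction can be observed with the sketch below, which assumes the Function constructor is available. Whether the same literal is accepted in non-strict code depends on whether the implementation provides the B.1.1 extension, so the sketch makes no claim about that case.

```javascript
// In strict mode code a literal such as 010 must not be accepted as an
// OctalIntegerLiteral, so compiling this body is expected to fail.
var strictRejected = false;
try {
  new Function('"use strict"; return 010;');
} catch (e) {
  strictRejected = e instanceof SyntaxError;
}

console.log(strictRejected); // true
```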
7.8.4   String Literals
A string literal is zero or more characters enclosed in single or double quotes. Each character may be
represented by an escape sequence. All characters may appear literally in a string literal except for the closing
quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character
may appear in the form of an escape sequence.
Syntax
- StringLiteral ::
- " DoubleStringCharactersopt "
' SingleStringCharactersopt '
- DoubleStringCharacters ::
- DoubleStringCharacter DoubleStringCharactersopt
- SingleStringCharacters ::
- SingleStringCharacter SingleStringCharactersopt
- DoubleStringCharacter ::
- SourceCharacter but not double-quote " or backslash \ or LineTerminator
\ EscapeSequence
LineContinuation
- SingleStringCharacter ::
- SourceCharacter but not single-quote ' or backslash \ or LineTerminator
\ EscapeSequence
LineContinuation
- LineContinuation ::
- \ LineTerminatorSequence
- EscapeSequence ::
- CharacterEscapeSequence
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
UnicodeEscapeSequence
- CharacterEscapeSequence ::
- SingleEscapeCharacter
NonEscapeCharacter
- SingleEscapeCharacter :: one of
' " \ b f n r t v
- NonEscapeCharacter ::
- SourceCharacter but not EscapeCharacter or LineTerminator
- EscapeCharacter ::
- SingleEscapeCharacter
DecimalDigit
x
u
- HexEscapeSequence ::
- x HexDigit HexDigit
- UnicodeEscapeSequence ::
- u HexDigit HexDigit HexDigit HexDigit
The definition of the nonterminal HexDigit is given in 7.8.3. SourceCharacter is defined in clause 6.
A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of
character values (CV) contributed by the various parts of the string literal. As part of this process, some
characters within the string literal are interpreted as having a mathematical value (MV), as described below or
in 7.8.3.
Semantics
- The SV of StringLiteral :: " " is the empty character sequence.
- The SV of StringLiteral :: ' ' is the empty character sequence.
- The SV of StringLiteral :: " DoubleStringCharacters " is the SV of DoubleStringCharacters.
- The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of SingleStringCharacters.
- The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one character, the CV of DoubleStringCharacter.
- The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a sequence of the CV
of DoubleStringCharacter followed by all the characters in the SV of DoubleStringCharacters in order.
- The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one character, the CV of SingleStringCharacter.
- The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of the CV
of SingleStringCharacter followed by all the characters in the SV of SingleStringCharacters in order.
- The SV of LineContinuation :: \ LineTerminatorSequence is the empty character sequence.
- The CV of DoubleStringCharacter :: SourceCharacter but not double-quote " or backslash \ or LineTerminator
is the SourceCharacter character itself.
- The CV of DoubleStringCharacter :: \ EscapeSequence is the CV of the EscapeSequence.
- The CV of SingleStringCharacter :: SourceCharacter but not single-quote ' or backslash \ or LineTerminator
is the SourceCharacter character itself.
- The CV of SingleStringCharacter :: \ EscapeSequence is the CV of the EscapeSequence.
- The CV of EscapeSequence :: CharacterEscapeSequence is the CV of the CharacterEscapeSequence.
- The CV of EscapeSequence :: 0 [lookahead ∉ DecimalDigit] is a <NUL> character (Unicode value 0000).
- The CV of EscapeSequence :: HexEscapeSequence is the CV of the HexEscapeSequence.
- The CV of EscapeSequence :: UnicodeEscapeSequence is the CV of the UnicodeEscapeSequence.
- The CV of CharacterEscapeSequence :: SingleEscapeCharacter is the character whose code unit value is
determined by the SingleEscapeCharacter according to Table 4:
Table 4 — String Single Character Escape Sequences
| Escape Sequence | Code Unit Value | Name | Symbol |
| \b | \u0008 | backspace | <BS> |
| \t | \u0009 | horizontal tab | <HT> |
| \n | \u000A | line feed (new line) | <LF> |
| \v | \u000B | vertical tab | <VT> |
| \f | \u000C | form feed | <FF> |
| \r | \u000D | carriage return | <CR> |
| \" | \u0022 | double quote | " |
| \' | \u0027 | single quote | ' |
| \\ | \u005C | backslash | \ |
- The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
- The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator is the SourceCharacter character itself.
- The CV of HexEscapeSequence :: x HexDigit HexDigit is the character whose code unit value is (16 times the MV of
the first HexDigit) plus the MV of the second HexDigit.
- The CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit is the character whose code unit value is
(4096 times the MV of the first HexDigit) plus (256 times the MV of the second HexDigit) plus (16 times the MV of
the third HexDigit) plus the MV of the fourth HexDigit.
A conforming implementation, when processing strict mode code (see 10.1.1), may not extend the syntax of
EscapeSequence to include OctalEscapeSequence as described in B.1.2.
NOTE     A line terminator character cannot appear in a string literal, except as part of
a LineContinuation to produce the empty character sequence. The correct way to cause a line terminator character
to be part of the String value of a string literal is to use an escape sequence such as \n or \u000A.
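The CV rules and Table 4 can be checked against an implementation with the sketch below. The helper names are hypothetical; cvOfUnicodeEscape simply restates the UnicodeEscapeSequence rule as arithmetic on the four HexDigits.

```javascript
function mvOfHexDigit(ch) {
  return parseInt(ch, 16);
}

// CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit
function cvOfUnicodeEscape(hex4) {
  var code = 4096 * mvOfHexDigit(hex4.charAt(0)) +
             256 * mvOfHexDigit(hex4.charAt(1)) +
             16 * mvOfHexDigit(hex4.charAt(2)) +
             mvOfHexDigit(hex4.charAt(3));
  return String.fromCharCode(code);
}

var table4Codes = "\b\t\n\v\f\r".split("").map(function (ch) {
  return ch.charCodeAt(0);
});

var checks = {
  unicode: cvOfUnicodeEscape("0041") === "\u0041",
  hex: "\x41" === "A",
  nul: "\0".charCodeAt(0) === 0,
  table4: table4Codes.join(",") === "8,9,10,11,12,13",
  nonEscape: "\q" === "q", // NonEscapeCharacter stands for itself
  // A LineContinuation contributes the empty character sequence.
  continuation: "ab\
cd" === "abcd"
};

console.log(checks); // every property is true
```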
7.8.5 Regular Expression Literals
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the
literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that
never compare as === to each other even if the two literals' contents are identical. A RegExp object may also
be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).
The productions below describe the syntax for a regular expression literal and are used by the input element
scanner to find the end of the regular expression literal. The Strings of characters comprising the
RegularExpressionBody and the RegularExpressionFlags are passed uninterpreted to the regular expression
constructor, which interprets them according to its own, more stringent grammar. An implementation may
extend the regular expression constructor's grammar, but it must not extend the RegularExpressionBody and
RegularExpressionFlags productions or the productions used by these productions.
Syntax
- RegularExpressionLiteral ::
- / RegularExpressionBody / RegularExpressionFlags
- RegularExpressionBody ::
- RegularExpressionFirstChar RegularExpressionChars
- RegularExpressionChars ::
- [empty]
RegularExpressionChars RegularExpressionChar
- RegularExpressionFirstChar ::
- RegularExpressionNonTerminator but not * or \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
- RegularExpressionChar ::
- RegularExpressionNonTerminator but not \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
- RegularExpressionBackslashSequence ::
- \ RegularExpressionNonTerminator
- RegularExpressionNonTerminator ::
- SourceCharacter but not LineTerminator
- RegularExpressionClass ::
- [ RegularExpressionClassChars ]
- RegularExpressionClassChars ::
- [empty]
RegularExpressionClassChars RegularExpressionClassChar
- RegularExpressionClassChar ::
- RegularExpressionNonTerminator but not ] or \
RegularExpressionBackslashSequence
- RegularExpressionFlags ::
- [empty]
RegularExpressionFlags IdentifierPart
NOTE     Regular expression literals may not be empty; instead of representing an empty regular expression literal, the
characters // start a single-line comment. To specify an empty regular expression, use: /(?:)/.
Semantics
A regular expression literal evaluates to a value of the Object type that is an instance of the standard built-in
constructor RegExp. This value is determined in two steps: first, the characters comprising the regular
expression's RegularExpressionBody and RegularExpressionFlags production expansions are collected
uninterpreted into two Strings Pattern and Flags, respectively. Then each time the literal is evaluated, a new
object is created as if by the expression new RegExp(Pattern, Flags) where RegExp is the standard
built-in constructor with that name. The newly constructed object becomes the value of the
RegularExpressionLiteral. If the call to new RegExp would generate an error as specified in 15.10.4.1, the error
must be treated as an early error (Clause 16).
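The semantics above have three observable consequences, sketched here with hypothetical names: each evaluation of a literal yields a new object, /(?:)/ is a usable empty pattern, and a literal whose flags the RegExp constructor would reject fails when the code is compiled rather than when it runs.

```javascript
function makeLiteral() {
  return /ab+c/g; // evaluated anew on every call
}

var first = makeLiteral();
var second = makeLiteral();
var distinctPerEvaluation = first !== second;
var distinctLiterals = /a/ !== /a/;
var emptyPatternMatches = /(?:)/.test("");

// "gg" repeats a flag, which new RegExp would reject; because this is an
// early error, the Function constructor itself throws and the body never runs.
var earlyError = false;
try {
  new Function("return /a/gg;");
} catch (e) {
  earlyError = e instanceof SyntaxError;
}

console.log(distinctPerEvaluation, distinctLiterals, emptyPatternMatches, earlyError);
// true true true true
```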
7.9 Automatic Semicolon Insertion
Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while
statement, continue statement, break statement, return statement, and throw statement) must be
terminated with semicolons. Such semicolons may always appear explicitly in the source text. For
convenience, however, such semicolons may be omitted from the source text in certain situations. These
situations are described by saying that semicolons are automatically inserted into the source code token
stream in those situations.
7.9.1 Rules of Automatic Semicolon Insertion
There are three basic rules of semicolon insertion:
- When, as the program is parsed from left to right, a token (called the offending token) is encountered that
is not allowed by any production of the grammar, then a semicolon is automatically inserted before the
offending token if one or more of the following conditions is true:
- The offending token is separated from the previous token by at least one LineTerminator.
- The offending token is }.
- When, as the program is parsed from left to right, the end of the input stream of tokens is encountered
and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then
a semicolon is automatically inserted at the end of the input stream.
- When, as the program is parsed from left to right, a token is encountered that is allowed by some
production of the grammar, but the production is a restricted production and the token would be the first
token for a terminal or nonterminal immediately following the annotation "[no LineTerminator here]" within the
restricted production (and therefore such a token is called a restricted token), and the restricted token is
separated from the previous token by at least one LineTerminator, then a semicolon is automatically
inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted
automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become
one of the two semicolons in the header of a for statement (see 12.6.3).
NOTE     The following are the only restricted productions in the grammar:
- PostfixExpression :
- LeftHandSideExpression [no LineTerminator here] ++
LeftHandSideExpression [no LineTerminator here] --
- ContinueStatement :
- continue [no LineTerminator here] Identifieropt ;
- BreakStatement :
- break [no LineTerminator here] Identifieropt ;
- ReturnStatement :
- return [no LineTerminator here] Expressionopt ;
- ThrowStatement :
- throw [no LineTerminator here] Expression ;
The practical effect of these restricted productions is as follows:
- When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one
LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically
inserted before the ++ or -- token.
- When a continue, break, return, or throw token is encountered and a LineTerminator is encountered before the
next token, a semicolon is automatically inserted after the continue, break, return, or throw token.
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should appear on the same line as its operand.
- An Expression in a return or throw statement should start on the same line as the return or throw token.
- An Identifier in a break or continue statement should be on the same line as the break or continue token.
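The restricted productions for ++ and throw can be observed with the sketch below. The variable names and the parses helper are hypothetical, and the helper assumes the Function constructor is available.

```javascript
var a = 1, b = 1;

// A LineTerminator separates "a" from "++", so "++" cannot be a postfix
// operator on a. The three lines are parsed as:  a; ++b;
a
++
b

// Returns true when `body` parses as a function body, false on SyntaxError.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// A throw statement requires its Expression on the same line as "throw".
var throwSameLine = parses("throw new Error('x');");
var throwSplit = parses("throw\nnew Error('x');");

console.log(a, b, throwSameLine, throwSplit); // 1 2 true false
```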
7.9.2 Examples of Automatic Semicolon Insertion
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In
contrast, the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the
semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of
the two semicolons in the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
NOTE     The expression a + b is not treated as a value to be returned by the return statement, because a
LineTerminator separates it from the token return.
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
NOTE     The token ++ is not treated as a postfix operator applying to the variable b,
because a LineTerminator occurs between b and ++.
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token,
even though no production of the grammar applies at that point, because an automatically inserted semicolon
would then be parsed as an empty statement.
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesized expression that begins the
second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the
programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on
automatic semicolon insertion.
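The examples in this subclause can be reproduced with the sketch below. The parses helper and all variable names are hypothetical; the helper assumes the Function constructor is available, and the for example is given an empty block as a body so that only the header is under test.

```javascript
// Returns true when `body` parses as a function body, false on SyntaxError.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

var syntax = {
  braceSameLine: parses("{ 1 2 } 3"),       // false
  braceNewLine: parses("{ 1\n2 } 3"),       // true
  forHeader: parses("for (a; b\n) {}"),     // false
  ifElse: parses("if (a > b)\nelse c = d")  // false
};

function sum(x, y) {
  return
  x + y
}

var b = 1, c = 1, a;
a = b
++c

// f plays the role of c in the last example of the text.
var d = 2, e = 3, r;
var f = function (x) { return { print: function () { return x * 10; } }; };
r = b + f
(d + e).print()

console.log(syntax, sum(1, 2), a, c, r); // sum is undefined, a is 1, c is 2, r is 51
```

The final statement shows why an explicit semicolon is advisable before a line that starts with a left parenthesis: r receives b + f(d + e).print(), not b + f.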