URL-encoding
Question: How do I convert a string to URL-encoding?
Answer: You can convert a string to URL-encoded form (suitable for transmission as a query string or, more generally, as part of a URL) using the JavaScript functions escape, encodeURI, and encodeURIComponent. Below is a detailed discussion of these functions.
escape
The escape function is available in all browsers that support JavaScript, although the ECMAScript standard now classifies it as a legacy feature. It works as follows: digits, Latin letters, and the characters + - * / . _ @ remain unchanged; all other characters with codes up to 255 are replaced by escape sequences of the form %XX, where XX is the two-digit hexadecimal code of the original character. Example:
escape("It's me!") // result: It%27s%20me%21
For Unicode input strings, escape behaves differently: characters with codes above 255 are converted to Unicode escape sequences of the form %uXXXX. For example, escape encodes the capital Cyrillic letter A as %u0410.
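The rules above can be checked with a short script. This is a minimal sketch that assumes an environment where the legacy escape function is still available as a global (current browsers and Node.js provide it); the sample strings are chosen here purely for illustration.

```javascript
// Sample inputs chosen for illustration only.
const samples = [
  "It's me!",   // apostrophe, space and exclamation mark are encoded
  "a+b/c@d.e",  // + / @ . are left unchanged
  "\u00FC",     // u-umlaut (code 252): a single %XX sequence
  "\u0410",     // Cyrillic capital A (code 0x0410): a %uXXXX sequence
];

// Apply the legacy escape function to every sample.
const escaped = samples.map((s) => escape(s));

samples.forEach((s, i) => console.log(JSON.stringify(s), "->", escaped[i]));
// "It's me!"  -> It%27s%20me%21
// "a+b/c@d.e" -> a+b/c@d.e
// "ü"         -> %FC
// "А"         -> %u0410
```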
encodeURI and encodeURIComponent
In addition to escape, modern browsers support two more functions for URL-encoding: encodeURI and encodeURIComponent. These functions are similar to escape, except that they leave intact some characters that escape encodes (e.g. apostrophe, tilde, parentheses); moreover, encodeURIComponent encodes some characters (+ / @) that escape leaves intact. Unlike escape, encodeURI and encodeURIComponent never produce %uXXXX for Unicode input; instead, they percent-encode each byte of the character's UTF-8 representation, which gives %XX%XX for a two-byte character. For example, encodeURI and encodeURIComponent encode the capital Cyrillic letter A as %D0%90.
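In practice, the difference between the two functions matters most when assembling a URL with a query string. The sketch below is one possible approach, not a prescribed one: the helper name buildUrl, the example.com address, and the parameter names are all assumptions made up for this illustration. It uses encodeURIComponent for each parameter name and value (so that + and & inside a value cannot be mistaken for separators) and encodeURI for the base address (so that : and / keep their meaning).

```javascript
// Hypothetical helper: joins a base address and a plain object of parameters.
function buildUrl(base, params) {
  const query = Object.entries(params)
    // Names and values are encoded individually, so reserved characters
    // such as + & = inside them are escaped.
    .map(([name, value]) => `${encodeURIComponent(name)}=${encodeURIComponent(value)}`)
    .join("&");
  // encodeURI keeps : / ? & = intact, which is what a whole address needs.
  return query ? `${encodeURI(base)}?${query}` : encodeURI(base);
}

// Illustrative input (example.com is a placeholder domain).
const url = buildUrl("http://example.com/my page", { q: "a+b & c", name: "\u0410" });
console.log(url);
// http://example.com/my%20page?q=a%2Bb%20%26%20c&name=%D0%90

// For comparison: encodeURI alone would leave + and & in the value untouched.
const valueViaEncodeURI = encodeURI("a+b & c");
console.log(valueViaEncodeURI); // a+b%20&%20c
```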
The following tables illustrate the differences between escape, encodeURI, and encodeURIComponent for lower ASCII characters, upper ASCII characters, and Unicode (non-ASCII) characters.
Differences between encodeURI, encodeURIComponent, and escape: lower ASCII characters (codes 1-127)
chr escape(chr) encodeURI(chr) encodeURIComponent(chr)
_ _ _ _
- - - -
. . . .
* * * *
+ + + %2B
/ / / %2F
@ @ @ %40
~ %7E ~ ~
! %21 ! !
' %27 ' '
( %28 ( (
) %29 ) )
# %23 # %23
$ %24 $ %24
& %26 & %26
, %2C , %2C
: %3A : %3A
; %3B ; %3B
= %3D = %3D
? %3F ? %3F
all other lower-ASCII characters produce identical results:
space %20 %20 %20
" %22 %22 %22
% %25 %25 %25
< %3C %3C %3C
> %3E %3E %3E
[ %5B %5B %5B
\ %5C %5C %5C
] %5D %5D %5D
^ %5E %5E %5E
{ %7B %7B %7B
| %7C %7C %7C
} %7D %7D %7D
... ... ... ...
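The first part of the table above (the rows where the three functions disagree) can be reproduced with a short loop. This is a sketch that assumes the legacy escape global is available; the variable names are illustrative.

```javascript
// Collect the characters with codes 1-127 for which the three functions
// do not all return the same result.
const differing = [];
for (let code = 1; code <= 127; code++) {
  const chr = String.fromCharCode(code);
  const results = [escape(chr), encodeURI(chr), encodeURIComponent(chr)];
  // A Set keeps only distinct values, so size > 1 means a disagreement.
  if (new Set(results).size > 1) {
    differing.push(chr);
    console.log(chr, ...results);
  }
}

console.log(differing.join("")); // !#$&'()+,/:;=?@~
```

The sixteen characters printed on the last line are the same sixteen listed in the table, in code order.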
Differences between encodeURI, encodeURIComponent, and escape: upper ASCII characters (codes 128-255)
As shown in the table below, encodeURI, encodeURIComponent, and escape produce different results for upper ASCII characters (strictly speaking, these are the Latin-1 characters with codes 128-255, since ASCII itself ends at 127). For example, the non-breaking space character (code 0xA0, or 160) is encoded as %A0 if you use escape, and as %C2%A0 if you use encodeURI or encodeURIComponent. The small letter ü, or u-umlaut (code 0xFC, or 252), is encoded as %FC if you use escape, and as %C3%BC if you use encodeURI or encodeURIComponent.
chr escape(chr) encodeURI(chr) encodeURIComponent(chr)
nbsp %A0 %C2%A0 %C2%A0
¡ %A1 %C2%A1 %C2%A1
¢ %A2 %C2%A2 %C2%A2
£ %A3 %C2%A3 %C2%A3
¤ %A4 %C2%A4 %C2%A4
¥ %A5 %C2%A5 %C2%A5
¦ %A6 %C2%A6 %C2%A6
§ %A7 %C2%A7 %C2%A7
¨ %A8 %C2%A8 %C2%A8
© %A9 %C2%A9 %C2%A9
ª %AA %C2%AA %C2%AA
« %AB %C2%AB %C2%AB
¬ %AC %C2%AC %C2%AC
soft-hyphen %AD %C2%AD %C2%AD
® %AE %C2%AE %C2%AE
¯ %AF %C2%AF %C2%AF
° %B0 %C2%B0 %C2%B0
± %B1 %C2%B1 %C2%B1
² %B2 %C2%B2 %C2%B2
³ %B3 %C2%B3 %C2%B3
´ %B4 %C2%B4 %C2%B4
µ %B5 %C2%B5 %C2%B5
¶ %B6 %C2%B6 %C2%B6
· %B7 %C2%B7 %C2%B7
¸ %B8 %C2%B8 %C2%B8
¹ %B9 %C2%B9 %C2%B9
º %BA %C2%BA %C2%BA
» %BB %C2%BB %C2%BB
¼ %BC %C2%BC %C2%BC
½ %BD %C2%BD %C2%BD
¾ %BE %C2%BE %C2%BE
¿ %BF %C2%BF %C2%BF
À %C0 %C3%80 %C3%80
Á %C1 %C3%81 %C3%81
 %C2 %C3%82 %C3%82
à %C3 %C3%83 %C3%83
Ä %C4 %C3%84 %C3%84
Å %C5 %C3%85 %C3%85
Æ %C6 %C3%86 %C3%86
Ç %C7 %C3%87 %C3%87
È %C8 %C3%88 %C3%88
É %C9 %C3%89 %C3%89
Ê %CA %C3%8A %C3%8A
Ë %CB %C3%8B %C3%8B
Ì %CC %C3%8C %C3%8C
Í %CD %C3%8D %C3%8D
Î %CE %C3%8E %C3%8E
Ï %CF %C3%8F %C3%8F
Ð %D0 %C3%90 %C3%90
Ñ %D1 %C3%91 %C3%91
Ò %D2 %C3%92 %C3%92
Ó %D3 %C3%93 %C3%93
Ô %D4 %C3%94 %C3%94
Õ %D5 %C3%95 %C3%95
Ö %D6 %C3%96 %C3%96
× %D7 %C3%97 %C3%97
Ø %D8 %C3%98 %C3%98
Ù %D9 %C3%99 %C3%99
Ú %DA %C3%9A %C3%9A
Û %DB %C3%9B %C3%9B
Ü %DC %C3%9C %C3%9C
Ý %DD %C3%9D %C3%9D
Þ %DE %C3%9E %C3%9E
ß %DF %C3%9F %C3%9F
à %E0 %C3%A0 %C3%A0
á %E1 %C3%A1 %C3%A1
â %E2 %C3%A2 %C3%A2
ã %E3 %C3%A3 %C3%A3
ä %E4 %C3%A4 %C3%A4
å %E5 %C3%A5 %C3%A5
æ %E6 %C3%A6 %C3%A6
ç %E7 %C3%A7 %C3%A7
è %E8 %C3%A8 %C3%A8
é %E9 %C3%A9 %C3%A9
ê %EA %C3%AA %C3%AA
ë %EB %C3%AB %C3%AB
ì %EC %C3%AC %C3%AC
í %ED %C3%AD %C3%AD
î %EE %C3%AE %C3%AE
ï %EF %C3%AF %C3%AF
ð %F0 %C3%B0 %C3%B0
ñ %F1 %C3%B1 %C3%B1
ò %F2 %C3%B2 %C3%B2
ó %F3 %C3%B3 %C3%B3
ô %F4 %C3%B4 %C3%B4
õ %F5 %C3%B5 %C3%B5
ö %F6 %C3%B6 %C3%B6
÷ %F7 %C3%B7 %C3%B7
ø %F8 %C3%B8 %C3%B8
ù %F9 %C3%B9 %C3%B9
ú %FA %C3%BA %C3%BA
û %FB %C3%BB %C3%BB
ü %FC %C3%BC %C3%BC
ý %FD %C3%BD %C3%BD
þ %FE %C3%BE %C3%BE
ÿ %FF %C3%BF %C3%BF
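The two-sequence results in the last two columns follow from the UTF-8 encoding, in which every character with a code from 0x80 to 0x7FF occupies two bytes. As a worked illustration, the sketch below performs that arithmetic by hand; the function name twoByteUtf8Escape is made up for this example and is not a built-in.

```javascript
// Hypothetical helper: percent-encodes one code point in the two-byte
// UTF-8 range (0x80..0x7FF) using explicit bit arithmetic.
function twoByteUtf8Escape(codePoint) {
  if (codePoint < 0x80 || codePoint > 0x7ff) {
    throw new RangeError("only code points 0x80..0x7FF use two UTF-8 bytes");
  }
  const lead = 0xc0 | (codePoint >> 6);    // 110xxxxx: the upper 5 bits
  const trail = 0x80 | (codePoint & 0x3f); // 10xxxxxx: the lower 6 bits
  const hex = (byte) => "%" + byte.toString(16).toUpperCase();
  return hex(lead) + hex(trail);
}

console.log(twoByteUtf8Escape(0xa0));   // %C2%A0 (non-breaking space)
console.log(twoByteUtf8Escape(0xfc));   // %C3%BC (u-umlaut)
console.log(twoByteUtf8Escape(0x0410)); // %D0%90 (Cyrillic capital A)
```

For every code in this range the result matches what encodeURIComponent returns for the corresponding character, whereas escape simply writes the code itself as %XX (up to 255) or %uXXXX (above 255).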
Differences between encodeURI, encodeURIComponent, and escape: Unicode (non-ASCII) characters
As a simple example of Unicode (non-ASCII) characters, the table below shows the URL encodings for part of the Cyrillic subset of the Unicode character set (\u0410 through \u042F). Note that the same Unicode character produces an encoding of the form %uXXXX if you use escape and an encoding of the form %XX%XX if you use encodeURI or encodeURIComponent. Importantly, escape, encodeURI, and encodeURIComponent behave the same way for other parts of the Unicode character set as well: only the escape function may return an encoding of the form %uXXXX.
chr escape(chr) encodeURI(chr) encodeURIComponent(chr)
А %u0410 %D0%90 %D0%90
Б %u0411 %D0%91 %D0%91
В %u0412 %D0%92 %D0%92
Г %u0413 %D0%93 %D0%93
Д %u0414 %D0%94 %D0%94
Е %u0415 %D0%95 %D0%95
Ж %u0416 %D0%96 %D0%96
З %u0417 %D0%97 %D0%97
И %u0418 %D0%98 %D0%98
Й %u0419 %D0%99 %D0%99
К %u041A %D0%9A %D0%9A
Л %u041B %D0%9B %D0%9B
М %u041C %D0%9C %D0%9C
Н %u041D %D0%9D %D0%9D
О %u041E %D0%9E %D0%9E
П %u041F %D0%9F %D0%9F
Р %u0420 %D0%A0 %D0%A0
С %u0421 %D0%A1 %D0%A1
Т %u0422 %D0%A2 %D0%A2
У %u0423 %D0%A3 %D0%A3
Ф %u0424 %D0%A4 %D0%A4
Х %u0425 %D0%A5 %D0%A5
Ц %u0426 %D0%A6 %D0%A6
Ч %u0427 %D0%A7 %D0%A7
Ш %u0428 %D0%A8 %D0%A8
Щ %u0429 %D0%A9 %D0%A9
Ъ %u042A %D0%AA %D0%AA
Ы %u042B %D0%AB %D0%AB
Ь %u042C %D0%AC %D0%AC
Э %u042D %D0%AD %D0%AD
Ю %u042E %D0%AE %D0%AE
Я %u042F %D0%AF %D0%AF
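A table like the one above can be generated for any range of characters. The following sketch rebuilds the Cyrillic rows; it assumes the legacy escape global is available, and the variable names are illustrative. Changing the loop bounds produces the corresponding rows for another range.

```javascript
// Build one space-separated row per character from \u0410 through \u042F.
const rows = [];
for (let code = 0x0410; code <= 0x042f; code++) {
  const chr = String.fromCharCode(code);
  rows.push([chr, escape(chr), encodeURI(chr), encodeURIComponent(chr)].join(" "));
}

console.log("chr escape(chr) encodeURI(chr) encodeURIComponent(chr)");
console.log(rows.join("\n"));
```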