Wednesday, February 2, 2011

URL-encoding


Question: How do I convert a string to URL-encoding?

Answer: You can convert a string to the URL-encoded form (suitable for transmission as a query string or, generally speaking, as part of a URL) using the JavaScript functions escape, encodeURI and encodeURIComponent. Below is a detailed discussion of these functions.

escape   In all browsers that support JavaScript, you can use the escape function (it is now deprecated, but still widely supported). This function works as follows: digits, Latin letters and the characters + - * / . _ @ remain unchanged; every other character with a code below 256 is replaced by an escape sequence %XX, where XX is the character's code written as two hexadecimal digits. Example:

escape("It's me!") // result: It%27s%20me%21

For Unicode input strings, the behavior of escape depends on the character code: non-ASCII characters with codes 128 through 255 still become %XX, while characters with codes of 256 and above are converted to Unicode escape sequences %uXXXX (four hexadecimal digits). For example, escape will encode the capital Cyrillic letter A as %u0410.
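A short sketch of these cases follows. It assumes an environment where the legacy escape function is still available (current browsers and Node.js provide it); the sample strings are only illustrative.

```javascript
// escape() leaves digits, Latin letters and + - * / . _ @ unchanged.
const unchanged = escape("a-Z_0.9*+/@");   // "a-Z_0.9*+/@"

// Other characters with codes below 256 become %XX (two hex digits).
const ascii = escape("It's me!");          // "It%27s%20me%21"
const latin1 = escape("\u00FC");           // "%FC" (small u-umlaut)

// Characters with codes of 256 and above become %uXXXX (four hex digits).
const cyrillic = escape("\u0410");         // "%u0410" (capital Cyrillic A)

console.log(unchanged, ascii, latin1, cyrillic);
```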

encodeURI and encodeURIComponent   In addition to escape, modern browsers support two more functions for URL-encoding: encodeURI and encodeURIComponent. These functions are similar to escape, except that they leave intact some characters that escape encodes (e.g. apostrophe, tilde, exclamation mark, parentheses); moreover, encodeURIComponent encodes some characters (+ / @) that escape leaves intact. encodeURI is meant for a complete URL, so it also leaves intact the separators # $ & , : ; = ? that encodeURIComponent encodes. Unlike escape, the functions encodeURI and encodeURIComponent never produce %uXXXX for Unicode input; instead, they percent-encode the UTF-8 bytes of each character, which gives %XX%XX for a two-byte character. For example, encodeURI and encodeURIComponent will encode the capital Cyrillic letter A as %D0%90.
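As a rough illustration of when each function fits: encodeURI suits a complete URL whose separators must survive, while encodeURIComponent suits a single value placed inside a query string. The helper buildQuery and the example.com URL below are made up for this sketch and are not part of any library.

```javascript
// Hypothetical helper: joins key/value pairs into a query string,
// encoding each key and value separately with encodeURIComponent.
function buildQuery(params) {
  return Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    })
    .join("&");
}

// encodeURI keeps URL separators such as : / ? & = intact.
const wholeUrl = encodeURI("http://example.com/my page?x=1&y=2");
// "http://example.com/my%20page?x=1&y=2"

// encodeURIComponent encodes them, so a value cannot break the query string.
const query = buildQuery({ q: "a+b/c@d", name: "Tom & Jerry" });
// "q=a%2Bb%2Fc%40d&name=Tom%20%26%20Jerry"

// Non-ASCII input is encoded as UTF-8 bytes, never as %uXXXX.
const cyrillicA = encodeURIComponent("\u0410"); // "%D0%90"

console.log(wholeUrl, query, cyrillicA);
```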

The following tables illustrate the differences between escape, encodeURI, and encodeURIComponent for

  • lower ASCII characters (codes 1 through 127),
  • upper ASCII characters (codes 128 through 255, i.e. the Latin-1 range), and
  • Unicode characters.


    Differences between encodeURI, encodeURIComponent, and escape:
    lower ASCII characters (codes 1-127)

    chr     escape(chr)   encodeURI(chr)  encodeURIComponent(chr)
    _ _ _ _
    - - - -
    . . . .
    * * * *
    + + + %2B
    / / / %2F
    @ @ @ %40
    ~ %7E ~ ~
    ! %21 ! !
    ' %27 ' '
    ( %28 ( (
    ) %29 ) )
    # %23 # %23
    $ %24 $ %24
    & %26 & %26
    , %2C , %2C
    : %3A : %3A
    ; %3B ; %3B
    = %3D = %3D
    ? %3F ? %3F

    All other lower ASCII characters produce identical results in all three functions, for example:

    space %20 %20 %20
    " %22 %22 %22
    % %25 %25 %25
    < %3C %3C %3C
    > %3E %3E %3E
    [ %5B %5B %5B
    \ %5C %5C %5C
    ] %5D %5D %5D
    ^ %5E %5E %5E
    { %7B %7B %7B
    | %7C %7C %7C
    } %7D %7D %7D
    ... ... ... ...
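The rows where the functions disagree can be regenerated with a short loop. In the sketch below (the function name findDifferences is an assumption of this example), every character with a code from 1 to 127 is passed to the three functions, and a row is kept only when they do not all return the same string.

```javascript
// Returns one row per character in the code range [from, to] for which
// escape, encodeURI and encodeURIComponent do not all agree.
function findDifferences(from, to) {
  const rows = [];
  for (let code = from; code <= to; code++) {
    const chr = String.fromCharCode(code);
    const e = escape(chr);
    const u = encodeURI(chr);
    const c = encodeURIComponent(chr);
    if (e !== u || u !== c) {
      rows.push({ chr: chr, escape: e, encodeURI: u, encodeURIComponent: c });
    }
  }
  return rows;
}

const lowerAsciiDiffs = findDifferences(1, 127);
lowerAsciiDiffs.forEach(function (row) {
  console.log(row.chr, row.escape, row.encodeURI, row.encodeURIComponent);
});
```

Characters on which all three functions agree, such as _ - . * and the space, are skipped by the loop.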

     

    Differences between encodeURI, encodeURIComponent, and escape:
    upper ASCII characters (codes 128-255)

    As shown in the table below, escape produces different results from encodeURI and encodeURIComponent for upper ASCII characters: escape emits a single %XX holding the character code, while the other two functions emit the two UTF-8 bytes of the character. For example, the non-breaking space character (code 0xA0, or 160) will be encoded as %A0 if you use escape, and %C2%A0 if you use encodeURI or encodeURIComponent. The small ü or u-umlaut letter (code 0xFC, or 252) will be encoded as %FC if you use escape, and %C3%BC if you use encodeURI or encodeURIComponent.
    chr    escape(chr)   encodeURI(chr)  encodeURIComponent(chr)
    (non-breaking space) %A0 %C2%A0 %C2%A0
    ¡ %A1 %C2%A1 %C2%A1
    ¢ %A2 %C2%A2 %C2%A2
    £ %A3 %C2%A3 %C2%A3
    ¤ %A4 %C2%A4 %C2%A4
    ¥ %A5 %C2%A5 %C2%A5
    ¦ %A6 %C2%A6 %C2%A6
    § %A7 %C2%A7 %C2%A7
    ¨ %A8 %C2%A8 %C2%A8
    © %A9 %C2%A9 %C2%A9
    ª %AA %C2%AA %C2%AA
    « %AB %C2%AB %C2%AB
    ¬ %AC %C2%AC %C2%AC
    (soft hyphen) %AD %C2%AD %C2%AD
    ® %AE %C2%AE %C2%AE
    ¯ %AF %C2%AF %C2%AF
    ° %B0 %C2%B0 %C2%B0
    ± %B1 %C2%B1 %C2%B1
    ² %B2 %C2%B2 %C2%B2
    ³ %B3 %C2%B3 %C2%B3
    ´ %B4 %C2%B4 %C2%B4
    µ %B5 %C2%B5 %C2%B5
    ¶ %B6 %C2%B6 %C2%B6
    · %B7 %C2%B7 %C2%B7
    ¸ %B8 %C2%B8 %C2%B8
    ¹ %B9 %C2%B9 %C2%B9
    º %BA %C2%BA %C2%BA
    » %BB %C2%BB %C2%BB
    ¼ %BC %C2%BC %C2%BC
    ½ %BD %C2%BD %C2%BD
    ¾ %BE %C2%BE %C2%BE
    ¿ %BF %C2%BF %C2%BF
    À %C0 %C3%80 %C3%80
    Á %C1 %C3%81 %C3%81
    Â %C2 %C3%82 %C3%82
    Ã %C3 %C3%83 %C3%83
    Ä %C4 %C3%84 %C3%84
    Å %C5 %C3%85 %C3%85
    Æ %C6 %C3%86 %C3%86
    Ç %C7 %C3%87 %C3%87
    È %C8 %C3%88 %C3%88
    É %C9 %C3%89 %C3%89
    Ê %CA %C3%8A %C3%8A
    Ë %CB %C3%8B %C3%8B
    Ì %CC %C3%8C %C3%8C
    Í %CD %C3%8D %C3%8D
    Î %CE %C3%8E %C3%8E
    Ï %CF %C3%8F %C3%8F
    Ð %D0 %C3%90 %C3%90
    Ñ %D1 %C3%91 %C3%91
    Ò %D2 %C3%92 %C3%92
    Ó %D3 %C3%93 %C3%93
    Ô %D4 %C3%94 %C3%94
    Õ %D5 %C3%95 %C3%95
    Ö %D6 %C3%96 %C3%96
    × %D7 %C3%97 %C3%97
    Ø %D8 %C3%98 %C3%98
    Ù %D9 %C3%99 %C3%99
    Ú %DA %C3%9A %C3%9A
    Û %DB %C3%9B %C3%9B
    Ü %DC %C3%9C %C3%9C
    Ý %DD %C3%9D %C3%9D
    Þ %DE %C3%9E %C3%9E
    ß %DF %C3%9F %C3%9F
    à %E0 %C3%A0 %C3%A0
    á %E1 %C3%A1 %C3%A1
    â %E2 %C3%A2 %C3%A2
    ã %E3 %C3%A3 %C3%A3
    ä %E4 %C3%A4 %C3%A4
    å %E5 %C3%A5 %C3%A5
    æ %E6 %C3%A6 %C3%A6
    ç %E7 %C3%A7 %C3%A7
    è %E8 %C3%A8 %C3%A8
    é %E9 %C3%A9 %C3%A9
    ê %EA %C3%AA %C3%AA
    ë %EB %C3%AB %C3%AB
    ì %EC %C3%AC %C3%AC
    í %ED %C3%AD %C3%AD
    î %EE %C3%AE %C3%AE
    ï %EF %C3%AF %C3%AF
    ð %F0 %C3%B0 %C3%B0
    ñ %F1 %C3%B1 %C3%B1
    ò %F2 %C3%B2 %C3%B2
    ó %F3 %C3%B3 %C3%B3
    ô %F4 %C3%B4 %C3%B4
    õ %F5 %C3%B5 %C3%B5
    ö %F6 %C3%B6 %C3%B6
    ÷ %F7 %C3%B7 %C3%B7
    ø %F8 %C3%B8 %C3%B8
    ù %F9 %C3%B9 %C3%B9
    ú %FA %C3%BA %C3%BA
    û %FB %C3%BB %C3%BB
    ü %FC %C3%BC %C3%BC
    ý %FD %C3%BD %C3%BD
    þ %FE %C3%BE %C3%BE
    ÿ %FF %C3%BF %C3%BF
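The %C2 and %C3 prefixes in this table come from UTF-8: encodeURI and encodeURIComponent percent-encode the UTF-8 bytes of the character, and every code from 128 through 2047 takes two bytes in UTF-8. The sketch below reproduces that arithmetic by hand; the function name twoByteUtf8 is an assumption of this example, and it deliberately handles only the two-byte range.

```javascript
// Percent-encodes a single character whose code is in the range 0x80-0x7FF
// by computing its two UTF-8 bytes, which have the bit layout
// 110xxxxx 10xxxxxx.
function twoByteUtf8(chr) {
  const code = chr.charCodeAt(0);
  if (code < 0x80 || code > 0x7FF) {
    throw new RangeError("only codes 0x80-0x7FF are handled in this sketch");
  }
  const lead = 0xC0 | (code >> 6);     // upper 5 bits of the code
  const trail = 0x80 | (code & 0x3F);  // lower 6 bits of the code
  return "%" + lead.toString(16).toUpperCase() +
         "%" + trail.toString(16).toUpperCase();
}

const nbsp = twoByteUtf8("\u00A0");    // "%C2%A0"
const uUmlaut = twoByteUtf8("\u00FC"); // "%C3%BC"
console.log(nbsp, uUmlaut);
```

For 0xFC, for instance, the upper bits are 0xFC >> 6 = 3, which gives the lead byte 0xC3, and the lower six bits are 0x3C, which gives the trail byte 0xBC.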

     

    Differences between encodeURI, encodeURIComponent, and escape:
    Unicode (non-ASCII) characters

    As a simple example of Unicode (non-ASCII) characters, the table below shows the URL encodings for part of the Cyrillic subset of the Unicode character set (\u0410 through \u042F). Note that the same Unicode character produces the encoding %uXXXX if you use escape and the percent-encoded UTF-8 bytes (%XX%XX for these Cyrillic letters) if you use encodeURI or encodeURIComponent. The three functions behave the same way for other parts of the Unicode character set, except that characters with codes of 0x800 and above take three or four UTF-8 bytes (for example, the euro sign \u20AC becomes %E2%82%AC). Only the escape function ever returns an encoding of the form %uXXXX.
    chr    escape(chr)   encodeURI(chr)  encodeURIComponent(chr)
    А %u0410 %D0%90 %D0%90
    Б %u0411 %D0%91 %D0%91
    В %u0412 %D0%92 %D0%92
    Г %u0413 %D0%93 %D0%93
    Д %u0414 %D0%94 %D0%94
    Е %u0415 %D0%95 %D0%95
    Ж %u0416 %D0%96 %D0%96
    З %u0417 %D0%97 %D0%97
    И %u0418 %D0%98 %D0%98
    Й %u0419 %D0%99 %D0%99
    К %u041A %D0%9A %D0%9A
    Л %u041B %D0%9B %D0%9B
    М %u041C %D0%9C %D0%9C
    Н %u041D %D0%9D %D0%9D
    О %u041E %D0%9E %D0%9E
    П %u041F %D0%9F %D0%9F
    Р %u0420 %D0%A0 %D0%A0
    С %u0421 %D0%A1 %D0%A1
    Т %u0422 %D0%A2 %D0%A2
    У %u0423 %D0%A3 %D0%A3
    Ф %u0424 %D0%A4 %D0%A4
    Х %u0425 %D0%A5 %D0%A5
    Ц %u0426 %D0%A6 %D0%A6
    Ч %u0427 %D0%A7 %D0%A7
    Ш %u0428 %D0%A8 %D0%A8
    Щ %u0429 %D0%A9 %D0%A9
    Ъ %u042A %D0%AA %D0%AA
    Ы %u042B %D0%AB %D0%AB
    Ь %u042C %D0%AC %D0%AC
    Э %u042D %D0%AD %D0%AD
    Ю %u042E %D0%AE %D0%AE
    Я %u042F %D0%AF %D0%AF
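To check the same pattern outside the Cyrillic block, the sketch below percent-encodes the UTF-8 bytes of a string with the standard TextEncoder API and prints the result next to escape and encodeURIComponent. The helper name percentEncodeUtf8 and the sample characters are assumptions of this example. Note that the helper encodes every byte, so it matches encodeURIComponent only for characters that function does not leave intact.

```javascript
// Percent-encodes every UTF-8 byte of the input string.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, function (b) {
    return "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

// Capital Cyrillic A, capital Cyrillic Ya, and the euro sign.
const samples = ["\u0410", "\u042F", "\u20AC"];
samples.forEach(function (chr) {
  console.log(escape(chr), encodeURIComponent(chr), percentEncodeUtf8(chr));
});
// escape gives %u0410, %u042F, %u20AC;
// the other two columns give %D0%90, %D0%AF, %E2%82%AC.
```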
