UTF-8 Encoding
UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding that can encode all 1,112,064 valid Unicode code points using one to four 8-bit bytes. The "8" in the name refers to the 8-bit units (bytes) that UTF-8 uses to represent a character.
Since 2009, UTF-8 has been the most widely used encoding on the World Wide Web.
For code points up to 127 (hex 0x7F), the UTF-8 representation is a single byte whose value is identical to the ASCII code.
For code points from 128 (hex 0x80) to 2047 (hex 0x07FF), the UTF-8 representation takes two bytes.
For code points from 2048 (hex 0x0800) to 65535 (hex 0xFFFF), the UTF-8 representation takes three bytes. Code points above 65535, up to 1114111 (hex 0x10FFFF), take four bytes.
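The byte-length rules above can be checked with a short Python sketch. The helper name `utf8_length` is hypothetical and used only for illustration. It also covers the four-byte case (code points above 0xFFFF) and rejects surrogate code points, which are not valid in UTF-8.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:      # 0-127: same byte as ASCII
        return 1
    if code_point <= 0x7FF:     # 128-2047
        return 2
    if code_point <= 0xFFFF:    # 2048-65535
        return 3
    return 4                    # 65536-1114111


# Compare the rule-based length with what Python's built-in encoder produces.
for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ord(ch))} byte(s), hex {encoded.hex()}")
```

The loop prints one line per sample character, so the predicted length can be compared with the actual encoded bytes.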
The table below lists some of the Unicode character ranges supported by HTML5, with their code points in decimal and hexadecimal:
Character Block | Decimal | Hexadecimal |
---|---|---|
C0 Controls and Basic Latin | 0-127 | 0000-007F |
C1 Controls and Latin-1 Supplement | 128-255 | 0080-00FF |
Latin Extended-A | 256-383 | 0100-017F |
Latin Extended-B | 384-591 | 0180-024F |
Spacing Modifiers | 688-767 | 02B0-02FF |
Diacritical Marks | 768-879 | 0300-036F |
Greek and Coptic | 880-1023 | 0370-03FF |
Cyrillic Basic | 1024-1279 | 0400-04FF |
Cyrillic Supplement | 1280-1327 | 0500-052F |
General Punctuation | 8192-8303 | 2000-206F |
Currency Symbols | 8352-8399 | 20A0-20CF |
Letterlike Symbols | 8448-8527 | 2100-214F |
Arrows | 8592-8703 | 2190-21FF |
Mathematical Operators | 8704-8959 | 2200-22FF |
Box Drawings | 9472-9599 | 2500-257F |
Block Elements | 9600-9631 | 2580-259F |
Geometric Shapes | 9632-9727 | 25A0-25FF |
Miscellaneous Symbols | 9728-9983 | 2600-26FF |
Dingbats | 9984-10175 | 2700-27BF |
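
As a rough illustration of how the table can be read, the Python sketch below copies a few of its rows and looks up which range a character falls into. The names `BLOCKS` and `find_block` are hypothetical, and only a subset of the rows is included, so characters outside those rows return `None`.

```python
from typing import Optional

# A few rows copied from the table: (name, first code point, last code point).
BLOCKS = [
    ("C0 Controls and Basic Latin", 0x0000, 0x007F),
    ("C1 Controls and Latin-1 Supplement", 0x0080, 0x00FF),
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Currency Symbols", 0x20A0, 0x20CF),
    ("Arrows", 0x2190, 0x21FF),
]


def find_block(char: str) -> Optional[str]:
    """Return the name of the listed range containing char, or None."""
    code_point = ord(char)
    for name, first, last in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


# The decimal and hexadecimal columns are two spellings of the same number.
print(int("20A0", 16))   # hexadecimal 20A0 as decimal
print(f"{8399:04X}")     # decimal 8399 as four hexadecimal digits

for ch in ["A", "π", "€", "→"]:
    print(ch, ord(ch), f"{ord(ch):04X}", find_block(ch))
```

A linear scan is enough for a handful of ranges. A full block list would more likely be searched with a sorted table and binary search.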