HTML5 CHARSET CHAPTER 21
21 TUTORIAL ON HTML CHARACTER SETS
HTML Encoding (Character Sets)
To display an HTML page correctly, a web browser must know which character set to use.
What is Character Encoding?
ASCII was the first widely adopted character encoding standard (a character encoding is also loosely called a character set). ASCII defines 128 characters: 95 printable ones, including the digits (0-9), English letters (A-Z and a-z), and special characters like ! $ + - ( ) @ < > . as well as 33 control characters.
ISO-8859-1 was the default character set for HTML 4. This character set supports 256 different character codes.
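ANSI (Windows-1252) was the original Windows character set. ANSI is identical to ISO-8859-1, except for the 32 codes from 128 to 159: ISO-8859-1 has no printable characters there, while ANSI uses those codes for extra characters such as the euro sign and curly quotation marks.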
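The difference between ANSI and ISO-8859-1 can be checked by decoding the same bytes with both character sets. The following is a minimal sketch that assumes Python's built-in `cp1252` and `latin-1` codecs as stand-ins for ANSI and ISO-8859-1; the helper `is_assigned` is a hypothetical name introduced only for this illustration.

```python
# Bytes 0x80, 0x93 and 0x94 fall in the 128-159 range where the two sets differ.
raw = bytes([0x80, 0x93, 0x94])

as_ansi = raw.decode("cp1252")     # Windows-1252: euro sign and curly quotes
as_latin1 = raw.decode("latin-1")  # ISO-8859-1: invisible control codes

print([f"U+{ord(c):04X}" for c in as_ansi])    # ['U+20AC', 'U+201C', 'U+201D']
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+0080', 'U+0093', 'U+0094']


def is_assigned(byte_value):
    """Return True if Python's cp1252 codec maps this byte to a character."""
    try:
        bytes([byte_value]).decode("cp1252")
        return True
    except UnicodeDecodeError:
        return False


# Of the 32 codes in the 128-159 range, count how many carry a character.
assigned = [b for b in range(128, 160) if is_assigned(b)]
print(len(assigned))  # 27
```

In this codec, 27 of the 32 codes carry a character and the remaining five are left unassigned.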
Because ANSI and ISO-8859-1 were so limited, HTML 4 also supported UTF-8.
UTF-8 is an encoding of Unicode, which covers almost all of the characters and symbols in the world.
The default character set for HTML5 is UTF-8.
The HTML charset Attribute
To display an HTML page correctly, a web browser must know the character set used in the page.
This is specified in the <meta> tag:
<meta charset="UTF-8">
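To illustrate why the declaration matters, here is a rough sketch of how a program might read the declared character set before decoding a page. It is not how any particular browser works: the function name `sniff_meta_charset`, the regular expression, and the sample page are assumptions made for this example. The 1024-byte limit reflects the HTML rule that the declaration must appear within the first 1024 bytes of the document.

```python
import re

# Simplified pattern: matches <meta charset="..."> with or without quotes.
META_CHARSET = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def sniff_meta_charset(html_bytes, default="utf-8"):
    """Return the charset named in a <meta charset> tag, or the default."""
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


# A tiny sample page containing a non-ASCII character (e with acute accent).
page = (
    '<!DOCTYPE html><html><head><meta charset="UTF-8">'
    "<title>Caf\u00e9</title></head><body>Caf\u00e9</body></html>"
).encode("utf-8")

encoding = sniff_meta_charset(page)
text = page.decode(encoding)     # decoded with the declared charset
garbled = page.decode("cp1252")  # decoded with the wrong charset

print(encoding)
print("Caf\u00e9" in text, "Caf\u00c3\u00a9" in garbled)
```

Decoding the same bytes with the wrong character set does not fail here; it silently turns the accented letter into two unrelated characters, which is the familiar garbled text seen on pages with a missing or wrong declaration.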
Differences Between Character Sets
The following table displays the differences between the character sets described above:
| Code | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
|------|-------|------|------------|-------|-------------|
| 32 |   |   |   |   | space |
| 33 | ! | ! | ! | ! | exclamation mark |
| 34 | " | " | " | " | quotation mark |
| 35 | # | # | # | # | number sign |
| 36 | $ | $ | $ | $ | dollar sign |
| 37 | % | % | % | % | percent sign |
| 38 | & | & | & | & | ampersand |
The ASCII Character Set
ASCII uses the values from 0 to 31 (and 127) for control characters.
ASCII uses the values from 32 to 126 for letters, digits, and symbols.
ASCII does not use the values from 128 to 255.
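The three ranges above can be reproduced in a few lines. This is a small sketch using Python's built-in `ascii` codec; the helper name `fits_in_ascii` is made up for the example.

```python
# Split the 128 ASCII codes into the two ranges described above.
control_codes = [c for c in range(128) if c < 32 or c == 127]
printable_codes = [c for c in range(128) if 32 <= c <= 126]

print(len(control_codes), len(printable_codes))  # 33 95
print("".join(chr(c) for c in printable_codes))  # space through tilde


def fits_in_ascii(text):
    """Return True if every character of text has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("Hello!"))     # True
print(fits_in_ascii("Caf\u00e9"))  # False: e with acute accent is code 233
```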
The ANSI Character Set (Windows-1252)
ANSI is identical to ASCII for the values from 0 to 127.
ANSI has a proprietary set of characters for the values from 128 to 159.
ANSI is identical to ISO-8859-1 for the values from 160 to 255; these values also match the Unicode code points that UTF-8 encodes.
The ISO-8859-1 Character Set
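ISO-8859-1 is identical to ASCII for the values from 0 to 127.
ISO-8859-1 defines no printable characters for the values from 128 to 159.
ISO-8859-1 matches the Unicode code points from 160 to 255. UTF-8 encodes each of those code points as two bytes, so the stored bytes differ even though the characters are the same.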
The UTF-8 Character Set
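UTF-8 is identical to ASCII for the values from 0 to 127 and stores each of them in a single byte.
Unicode has no printable characters at the code points from 128 to 159; they are control codes.
Unicode code points 160 to 255 carry the same characters as ANSI and ISO-8859-1, but UTF-8 stores each of them as two bytes.
UTF-8 continues from the value 256 with more than 100 000 different characters, using two to four bytes for each.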
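The relation between character codes and stored bytes is easier to see with a few concrete characters. The sketch below assumes Python's built-in `utf-8` and `latin-1` codecs; the sample characters are arbitrary picks for illustration.

```python
# A, e with acute accent, euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]

# Code 233 is the same character in ISO-8859-1 and in Unicode,
# but the bytes written to disk are different.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")  # one byte: e9
utf8_bytes = e_acute.encode("utf-8")      # two bytes: c3 a9
print(ord(e_acute), latin1_bytes.hex(), utf8_bytes.hex(" "))  # 233 e9 c3 a9
```

The character code stays the same across the sets in the 160 to 255 range; only the byte representation changes, which is why a page saved as UTF-8 but read as ISO-8859-1 shows two characters where one was intended.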
The @charset CSS Rule
You can use the CSS @charset rule to specify the character encoding used in a style sheet:
Example
Set the encoding of the style sheet to Unicode UTF-8:
@charset "UTF-8";
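As a rough illustration of how a tool might read this rule, the sketch below looks for `@charset` at the very start of a style sheet's bytes, since the rule is only honored when it is the first thing in the file. The function name `sniff_css_charset` and the sample style sheets are assumptions for this example, not part of any CSS library.

```python
import re

# The rule must be the first bytes of the file, written exactly in this form.
CSS_CHARSET = re.compile(rb'^@charset "([^"]*)";')


def sniff_css_charset(css_bytes, default=None):
    """Return the encoding named by a leading @charset rule, or the default."""
    match = CSS_CHARSET.match(css_bytes)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


good = b'@charset "UTF-8";\nbody { color: black; }'
misplaced = b'body { color: black; }\n@charset "UTF-8";'

print(sniff_css_charset(good))       # utf-8
print(sniff_css_charset(misplaced))  # None
```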