Character Set Differences
Between ANSI, ISO-8859-1 and MacRoman


Of the three main 8-bit character sets, only ISO-8859-1 is produced by a standards organization. The three sets are identical for the 95 characters from 32 to 126, the printable ASCII character set. The ANSI character set, also known as Windows-1252, has become a Microsoft proprietary character set; it is a superset of ISO-8859-1 with the addition of 27 characters in locations that ISO designates for control codes. Apple's proprietary MacRoman character set contains a similar variety of characters from 128 to 255, but with very few of them assigned the same numbers, and it also assigns characters to the control-code positions.
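The relationship described above can be checked with a short Python sketch. It assumes that Python's built-in codecs `cp1252`, `latin-1` and `mac_roman` are acceptable stand-ins for ANSI, ISO-8859-1 and MacRoman; the helper name `decode_byte` is made up for this illustration.

```python
# Assumption: Python's cp1252, latin-1 and mac_roman codecs stand in for
# the ANSI, ISO-8859-1 and MacRoman sets discussed in the text.
CODECS = ("cp1252", "latin-1", "mac_roman")


def decode_byte(value: int, codec: str) -> str:
    """Return the character one byte value maps to, or '' if unassigned."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return ""


# Positions 32 to 126 should decode to the same character in all three sets.
ascii_agrees = all(
    len({decode_byte(v, c) for c in CODECS}) == 1 for v in range(32, 127)
)

# Count the positions from 128 to 159 that ANSI fills with characters.
ansi_extras = sum(1 for v in range(0x80, 0xA0) if decode_byte(v, "cp1252"))

# The same byte value means something different in each set above 127.
for codec in CODECS:
    print(codec, repr(decode_byte(0x80, codec)))
```

Byte 0x80 is a convenient probe: ANSI places the euro sign there, ISO-8859-1 reserves the position for a control code, and MacRoman assigns a capital A with diaeresis.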

The characters that appear in the first column of the following tables are generated from Unicode numeric character references, so they should appear correctly, regardless of the operating system, in any Web browser that supports Unicode, has suitable fonts available, is set to view Western European encoding and has its Unicode options set correctly.
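As a small illustration of what a numeric character reference looks like, the following Python sketch builds one from a character's Unicode number and converts it back with the standard library's `html.unescape`. The function name `numeric_reference` is an assumption made for this example and is not taken from the page.

```python
import html


def numeric_reference(ch: str, hexadecimal: bool = False) -> str:
    """Build an HTML numeric character reference for a single character."""
    code_point = ord(ch)
    if hexadecimal:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"


euro = "\u20ac"  # euro sign, Unicode number 8364 (U+20AC)
decimal_ref = numeric_reference(euro)
hex_ref = numeric_reference(euro, hexadecimal=True)

# html.unescape turns either form back into the character itself.
round_trip = html.unescape(decimal_ref)
```

The decimal form uses the value listed in the UNum column of the tables, and the hexadecimal form uses the digits from the UHex column.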

  1. ANSI characters not present in ISO-8859-1
  2. ANSI characters not present in MacRoman
  3. ISO-8859-1 characters not present in ANSI
  4. ISO-8859-1 characters not present in MacRoman
  5. MacRoman characters not present in ANSI
  6. MacRoman characters not present in ISO-8859-1
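The six comparisons listed above are set differences between character repertoires, so they can be reproduced approximately in code. The sketch below is one way to do it, assuming Python's `cp1252`, `latin-1` and `mac_roman` codecs match the three sets; current codec tables may not agree with every row of a page written in 2001. The names `repertoire` and `SETS` are hypothetical.

```python
import unicodedata

# Assumption: these Python codecs correspond to the three sets compared here.
SETS = {"ANSI": "cp1252", "ISO": "latin-1", "MacRoman": "mac_roman"}


def repertoire(codec: str) -> set:
    """Collect the non-control characters a codec assigns to bytes 32-255."""
    chars = set()
    for value in range(32, 256):
        try:
            ch = bytes([value]).decode(codec)
        except UnicodeDecodeError:
            continue  # position is unassigned in this set
        if unicodedata.category(ch) != "Cc":  # skip control codes
            chars.add(ch)
    return chars


reps = {name: repertoire(codec) for name, codec in SETS.items()}

# One entry per ordered pair, e.g. ("ANSI", "ISO") holds the characters
# that are present in ANSI but not present in ISO-8859-1.
differences = {
    (a, b): reps[a] - reps[b] for a in SETS for b in SETS if a != b
}

for (a, b), chars in differences.items():
    print(f"{a} characters not present in {b}: {len(chars)}")
```

Sorting each difference by `ord` gives rows in Unicode order; sorting by the byte value in the first set would instead follow the ordering used in the tables.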

Legend of table abbreviations:

  • Ch - Character
  • ANum - ANSI Number
  • INum - ISO Number
  • MNum - Macintosh Number
  • UNum - Unicode Number
  • AHex - ANSI Hex
  • IHex - ISO Hex
  • MHex - Macintosh Hex
  • UHex - Unicode Hex
  • HTML - HTML Entity
  • U. name - Unicode name
  • U. range - Unicode range
  • C.l. - Capital letter
  • L.c. - Latin capital
  • L.s. - Latin small
  • L.c.l. - Latin capital letter
  • L.s.l. - Latin small letter
  • Ext. - Extended
  • l.-p. - left-pointing
  • r.-p. - right-pointing
  • m.l. - modifier letter(s)
  • A.P. - Alphabetic Presentation
  • Math.Ops. - Mathematical Operators
  • Gen.Pun. - General Punctuation
  • Lat.-1 Supp. - Latin-1 Supplement
  • Lett. - Letterlike
  • Cur. - Currency
  • qt. m. - quotation mark
  • gr.-t. - Greater-than
  • sn. - Single

1. ANSI characters not present in ISO-8859-1

Ch ANum UNum AHex UHex HTML U. name U. range
€ 128 8364 0x80 U+20AC € euro sign Cur. Symbols
‚ 130 8218 0x82 U+201A ‚ sn. low-9 qt. m. Gen.Pun.
ƒ 131 402 0x83 U+0192 ƒ L.s.l. f with hook Latin Ext.-B
„ 132 8222 0x84 U+201E „ double low-9 qt. m. Gen.Pun.
… 133 8230 0x85 U+2026 … horizontal ellipsis - " -
† 134 8224 0x86 U+2020 † dagger - " -
‡ 135 8225 0x87 U+2021 ‡ double dagger - " -
ˆ 136 710 0x88 U+02C6 ˆ m.l. circumflex accent Spacing m.l.
‰ 137 8240 0x89 U+2030 ‰ per mille sign Gen.Pun.
Š 138 352 0x8A U+0160 Š L.c.l. S with caron Latin Ext.-A
‹ 139 8249 0x8B U+2039 ‹ sn. l.-p. angle qt. m. Gen.Pun.
Œ 140 338 0x8C U+0152 Œ L.c. ligature OE Latin Ext.-A
Ž 142 381 0x8E U+017D   L.c.l. Z with caron - " -
‘ 145 8216 0x91 U+2018 ‘ left sn. qt. m. Gen.Pun.
’ 146 8217 0x92 U+2019 ’ right sn. qt. m. - " -
“ 147 8220 0x93 U+201C “ left double qt. m. - " -
” 148 8221 0x94 U+201D ” right double qt. m. - " -
• 149 8226 0x95 U+2022 • bullet - " -
– 150 8211 0x96 U+2013 – en dash - " -
— 151 8212 0x97 U+2014 — em dash - " -
˜ 152 732 0x98 U+02DC ˜ small tilde Spacing m.l.
™ 153 8482 0x99 U+2122 ™ trade mark sign Lett. Symbols
š 154 353 0x9A U+0161 š L.s.l. s with caron Latin Ext.-A
› 155 8250 0x9B U+203A › sn. r.-p. angle qt. m. Gen.Pun.
œ 156 339 0x9C U+0153 œ L.s. ligature oe Latin Ext.-A
ž 158 382 0x9E U+017E   L.s.l. z with caron - " -
Ÿ 159 376 0x9F U+0178 Ÿ L.c.l. Y with diaeresis - " -

2. ANSI Characters not present in MacRoman

Ch ANum UNum AHex UHex HTML U. name U. range
Š 138 352 0x8A U+0160 Š L.c.l. S with caron Latin Ext.-A
Ž 142 381 0x8E U+017D   L.c.l. Z with caron - " -
š 154 353 0x9A U+0161 š L.s.l. s with caron - " -
ž 158 382 0x9E U+017E   L.s.l. z with caron - " -
¤ 164 164 0xA4 U+00A4 ¤ Cur. sign Lat.-1 Supp.
¦ 166 166 0xA6 U+00A6 ¦ broken bar - " -
­ 173 173 0xAD U+00AD ­ soft hyphen - " -
² 178 178 0xB2 U+00B2 ² superscript two - " -
³ 179 179 0xB3 U+00B3 ³ superscript three - " -
¹ 185 185 0xB9 U+00B9 ¹ superscript one - " -
¼ 188 188 0xBC U+00BC ¼ vulgar fraction 1 quarter - " -
½ 189 189 0xBD U+00BD ½ vulgar fraction 1 half - " -
¾ 190 190 0xBE U+00BE ¾ vulgar fraction 3 quarters - " -
Ð 208 208 0xD0 U+00D0 Ð L.c.l. Eth - " -
× 215 215 0xD7 U+00D7 × multiplication sign - " -
Ý 221 221 0xDD U+00DD Ý L.c.l. Y with acute - " -
Þ 222 222 0xDE U+00DE Þ L.c.l. Thorn - " -
ð 240 240 0xF0 U+00F0 ð L.s.l. eth - " -
ý 253 253 0xFD U+00FD ý L.s.l. y with acute - " -
þ 254 254 0xFE U+00FE þ L.s.l. thorn - " -

3. ISO-8859-1 characters not present in ANSI

ANSI is a superset of ISO-8859-1, and so there are no characters in this category.
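The superset claim also holds position by position: the shared characters keep the same numbers in both sets. A minimal check, again assuming that Python's `cp1252` and `latin-1` codecs represent ANSI and the ISO set, could look like this; `shared_positions` and `same_numbers` are names chosen for the example.

```python
# Positions that carry printable characters in the ISO set:
# 32 to 126 (ASCII) and 160 to 255.
shared_positions = list(range(32, 127)) + list(range(160, 256))

# Assumption: cp1252 models ANSI and latin-1 models the ISO set.
same_numbers = all(
    bytes([value]).decode("cp1252") == bytes([value]).decode("latin-1")
    for value in shared_positions
)
```

If `same_numbers` is true, text that uses only these positions reads identically under either set, and differences can only arise in the range 128 to 159.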

4. ISO-8859-1 characters not present in MacRoman

Ch INum UNum IHex UHex HTML U. name U. range
¤ 164 164 0xA4 U+00A4 ¤ Cur. sign Lat.-1 Supp.
¦ 166 166 0xA6 U+00A6 ¦ broken bar - " -
­ 173 173 0xAD U+00AD ­ soft hyphen - " -
² 178 178 0xB2 U+00B2 ² superscript two - " -
³ 179 179 0xB3 U+00B3 ³ superscript three - " -
¹ 185 185 0xB9 U+00B9 ¹ superscript one - " -
¼ 188 188 0xBC U+00BC ¼ vulgar fraction 1 quarter - " -
½ 189 189 0xBD U+00BD ½ vulgar fraction 1 half - " -
¾ 190 190 0xBE U+00BE ¾ vulgar fraction 3 quarters - " -
Ð 208 208 0xD0 U+00D0 Ð L.c.l. Eth - " -
× 215 215 0xD7 U+00D7 × multiplication sign - " -
Ý 221 221 0xDD U+00DD Ý L.c.l. Y with acute - " -
Þ 222 222 0xDE U+00DE Þ L.c.l. Thorn - " -
ð 240 240 0xF0 U+00F0 ð L.s.l. eth - " -
ý 253 253 0xFD U+00FD ý L.s.l. y with acute - " -
þ 254 254 0xFE U+00FE þ L.s.l. thorn - " -

5. MacRoman characters not present in ANSI

Ch MNum UNum MHex UHex HTML U. name U. range
≠ 173 8800 0xAD U+2260 ≠ not equal to Math.Ops.
∞ 176 8734 0xB0 U+221E ∞ infinity - " -
≤ 178 8804 0xB2 U+2264 ≤ less-than or equal to - " -
≥ 179 8805 0xB3 U+2265 ≥ gr.-t. or equal to - " -
∂ 182 8706 0xB6 U+2202 ∂ partial differential - " -
∑ 183 8721 0xB7 U+2211 ∑ n-ary summation - " -
∏ 184 8719 0xB8 U+220F ∏ n-ary product - " -
π 185 960 0xB9 U+03C0 π Greek small letter pi Greek
∫ 186 8747 0xBA U+222B ∫ integral Math.Ops.
Ω 189 937 0xBD U+03A9 Ω Greek C.l. Omega Greek
√ 195 8730 0xC3 U+221A √ square root Math.Ops.
≈ 197 8776 0xC5 U+2248 ≈ almost equal to - " -
∆ 198 8710 0xC6 U+2206   increment - " -
◊ 215 9674 0xD7 U+25CA ◊ lozenge Geometric Shapes
⁄ 218 8260 0xDA U+2044 ⁄ fraction slash Gen.Pun.
ﬁ 222 64257 0xDE U+FB01   L.s. ligature fi A.P. Forms
ﬂ 223 64258 0xDF U+FB02   L.s. ligature fl - " -
 240 63743 0xF0 U+F8FF   Apple logo Private Use Area
ı 245 305 0xF5 U+0131   L.s.l. dotless i Latin Ext.-A
˘ 249 728 0xF9 U+02D8   breve Spacing m.l.
˙ 250 729 0xFA U+02D9   dot above - " -
˚ 251 730 0xFB U+02DA   ring above - " -
˝ 253 733 0xFD U+02DD   double acute accent - " -
˛ 254 731 0xFE U+02DB   ogonek - " -
ˇ 255 711 0xFF U+02C7   caron - " -

6. MacRoman characters not present in ISO-8859-1

Ch MNum UNum MHex UHex HTML U. name U. range
† 160 8224 0xA0 U+2020 † dagger Gen.Pun.
• 165 8226 0xA5 U+2022 • bullet - " -
™ 170 8482 0xAA U+2122 ™ trade mark sign Lett. Symbols
≠ 173 8800 0xAD U+2260 ≠ not equal to Math.Ops.
∞ 176 8734 0xB0 U+221E ∞ infinity - " -
≤ 178 8804 0xB2 U+2264 ≤ less-than or equal to - " -
≥ 179 8805 0xB3 U+2265 ≥ gr.-t. or equal to - " -
∂ 182 8706 0xB6 U+2202 ∂ partial differential - " -
∑ 183 8721 0xB7 U+2211 ∑ n-ary summation - " -
∏ 184 8719 0xB8 U+220F ∏ n-ary product - " -
π 185 960 0xB9 U+03C0 π Greek small letter pi Greek
∫ 186 8747 0xBA U+222B ∫ integral Math.Ops.
Ω 189 937 0xBD U+03A9 Ω Greek C.l. Omega Greek
√ 195 8730 0xC3 U+221A √ square root Math.Ops.
≈ 197 8776 0xC5 U+2248 ≈ almost equal to - " -
∆ 198 8710 0xC6 U+2206   increment - " -
… 201 8230 0xC9 U+2026 … horizontal ellipsis Gen.Pun.
Œ 206 338 0xCE U+0152 Œ L.c. ligature OE Latin Ext.-A
œ 207 339 0xCF U+0153 œ L.s. ligature oe - " -
– 208 8211 0xD0 U+2013 – en dash Gen.Pun.
— 209 8212 0xD1 U+2014 — em dash - " -
“ 210 8220 0xD2 U+201C “ left double qt. m. - " -
” 211 8221 0xD3 U+201D ” right double qt. m. - " -
‘ 212 8216 0xD4 U+2018 ‘ left sn. qt. m. - " -
’ 213 8217 0xD5 U+2019 ’ right sn. qt. m. - " -
◊ 215 9674 0xD7 U+25CA ◊ lozenge Geometric Shapes
Ÿ 217 376 0xD9 U+0178 Ÿ L.c.l. Y with diaeresis Latin Ext.-A
⁄ 218 8260 0xDA U+2044 ⁄ fraction slash Gen.Pun.
€ 219 8364 0xDB U+20AC € euro sign Cur. Symbols
‹ 220 8249 0xDC U+2039 ‹ sn. l.-p. angle qt. m. Gen.Pun.
› 221 8250 0xDD U+203A › sn. r.-p. angle qt. m. - " -
ﬁ 222 64257 0xDE U+FB01   L.s. ligature fi A.P. Forms
ﬂ 223 64258 0xDF U+FB02   L.s. ligature fl - " -
‡ 224 8225 0xE0 U+2021 ‡ double dagger Gen.Pun.
‚ 226 8218 0xE2 U+201A ‚ sn. low-9 qt. m. - " -
„ 227 8222 0xE3 U+201E „ double low-9 qt. m. - " -
‰ 228 8240 0xE4 U+2030 ‰ per mille sign - " -
 240 63743 0xF0 U+F8FF   Apple logo Private Use Area
ı 245 305 0xF5 U+0131   L.s.l. dotless i Latin Ext.-A
ˆ 246 710 0xF6 U+02C6 ˆ m.l. circumflex accent Spacing m.l.
˜ 247 732 0xF7 U+02DC ˜ small tilde - " -
˘ 249 728 0xF9 U+02D8   breve - " -
˙ 250 729 0xFA U+02D9   dot above - " -
˚ 251 730 0xFB U+02DA   ring above - " -
˝ 253 733 0xFD U+02DD   double acute accent - " -
˛ 254 731 0xFE U+02DB   ogonek - " -
ˇ 255 711 0xFF U+02C7   caron - " -

________
Copyright © 1997 - 2001 Alan Wood
URL: http://www.hclrss.demon.co.uk/demos/