UTF-8 & LaTeX Encodings of ISO-8859-1 (Latin-1) Character Set

Manuel J. Fernández Iglesias
atlanTTic - University of Vigo
https://desire.webs.uvigo.gal

Technical Report · March 2022
DOI: 10.13140/RG.2.2.18402.61121

Abstract
The ISO-8859-1 character set, also known as Latin-1, is an 8-bit character set that includes the characters used in most Western European languages based on the Latin alphabet. UTF-8 is a variable-length character encoding that has become the dominant encoding for internet technologies and most computing platforms. Traditional 7-bit ASCII characters (ISO-8859-1 codes 0x00 to 0x7F) are encoded in UTF-8 as single bytes identical to their ASCII codes, while the remaining ISO-8859-1 codes (0x80 to 0xFF) are encoded using two bytes. This document discusses the UTF-8 encoding of the ISO-8859-1 set and lists the LaTeX commands needed to obtain every character in it.

1 Introduction
The ISO-8859-1 (Latin-1) character set is an 8-bit character set of 256 codes, endorsed by the International Organization for Standardization (ISO), that includes the characters used in most Western European languages based on the Latin alphabet. As its name implies, it is part 1 of the ISO-8859 series, whose other parts address writing systems such as Cyrillic, Hebrew, or Arabic. Until UTF-8 became popular, it was the encoding used by most Unix systems, while Microsoft Windows (in Western European locales) used Windows-1252, which extends ISO-8859-1 with printable characters in the 0x80-0x9F range. This character set is also known as extended ASCII because its first 128 characters are the same as those of the ASCII standard developed by the American Standards Association, now the American National Standards Institute (ANSI).
UTF-8 (8-bit Unicode Transformation Format) is a variable-width character encoding defined by the Unicode standard. The Internet Engineering Task Force (IETF) adopted it in RFC 2277 (BCP 18) for future internet standards work, replacing single-byte character sets such as ISO-8859-1. UTF-8 is the dominant encoding in present-day computing environments and internet technologies: at the time of writing it accounted for about 98% of all web pages, and up to 100% for some languages and computing scenarios.
UTF-8 encodes the traditional 7-bit ASCII characters (ISO-8859-1 codes 0x00 to 0x7F) directly, so any ASCII message or document is also a valid UTF-8 document with exactly the same bytes. UTF-8 can encode practically every commonly used symbol worldwide, such as the characters of any script (e.g., Latin, Cyrillic, Chinese, Japanese, or Korean) or mathematical symbols; characters outside ASCII take 2, 3, or 4 bytes. The remaining 128 Western European characters in ISO-8859-1 (codes 0x80 to 0xFF) are all encoded with two bytes, according to the model outlined in Table 1.
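The byte counts mentioned above depend only on the code point value. As a minimal sketch, the Python function below (the name `utf8_length` is ours, for illustration only) encodes the standard UTF-8 length thresholds and cross-checks them against Python's built-in `utf-8` codec; it assumes the input is a valid code point and does not reject surrogates.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses to encode a Unicode code point.

    Surrogates (U+D800-U+DFFF) are not valid in UTF-8 and are not rejected here.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:        # 7-bit ASCII, identical to ISO-8859-1 0x00-0x7F
        return 1
    if code_point < 0x800:       # includes the whole ISO-8859-1 range 0x80-0xFF
        return 2
    if code_point < 0x10000:     # rest of the Basic Multilingual Plane, e.g. CJK
        return 3
    return 4                     # supplementary planes, e.g. emoji


if __name__ == "__main__":
    for ch in ["A", "ñ", "€", "中", "😀"]:
        cp = ord(ch)
        # Cross-check against Python's built-in UTF-8 codec
        assert utf8_length(cp) == len(ch.encode("utf-8"))
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```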
This document collects the glyphs in the ISO-8859-1 character set and details their UTF-8 encoding, as well as the LaTeX commands needed to obtain each of them. To use any of these characters directly in a LaTeX document, the selected input and output encodings must be taken into account.

Licensed under an Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You are free to copy and redistribute this document in any medium or format, but you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may not use this content for commercial purposes (https://creativecommons.org/licenses/by-nc/4.0).

Table 1. UTF-8 encoding of ISO-8859-1 characters (codes 0x00 – 0xFF).

Range UTF-8 (binary) Comments

0x00-0x7F 0b0xxxxxxx 7-bit ASCII characters.
0x80-0xFF 0b110yyyyy 0b10xxxxxx Accented characters and other commonly used symbols.
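Table 1 translates into a few lines of bit manipulation. The following Python sketch (the helper names `latin1_to_utf8` and `utf8_to_latin1` are hypothetical) builds the lead byte from the two most significant bits of the ISO-8859-1 code and the continuation byte from its six least significant bits; the result can be compared with Python's own `utf-8` codec.

```python
def latin1_to_utf8(code: int) -> bytes:
    """Encode an ISO-8859-1 code (0x00-0xFF) as UTF-8, following Table 1."""
    if not 0 <= code <= 0xFF:
        raise ValueError("ISO-8859-1 codes range from 0x00 to 0xFF")
    if code < 0x80:
        # 0b0xxxxxxx: ASCII is copied unchanged as a single byte
        return bytes([code])
    lead = 0b1100_0000 | (code >> 6)            # 0b110yyyyy, here always 0xC2 or 0xC3
    cont = 0b1000_0000 | (code & 0b0011_1111)   # 0b10xxxxxx: the low six bits
    return bytes([lead, cont])


def utf8_to_latin1(data: bytes) -> int:
    """Decode a 1- or 2-byte UTF-8 sequence back to its ISO-8859-1 code."""
    if len(data) == 1 and data[0] < 0x80:
        return data[0]
    if len(data) == 2 and data[0] in (0xC2, 0xC3) and data[1] & 0xC0 == 0x80:
        return ((data[0] & 0b0001_1111) << 6) | (data[1] & 0b0011_1111)
    raise ValueError("not the UTF-8 encoding of an ISO-8859-1 character")


if __name__ == "__main__":
    print(latin1_to_utf8(0xF1).hex().upper())   # prints C3B1 (letter ñ)
```

Because the code is at most 0xFF, `code >> 6` is 2 or 3, which is why every two-byte sequence in this range starts with 0xC2 or 0xC3.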

• If UTF-8 is used as the input encoding (i.e., the encoding in which the .tex source files are saved), which is practically the standard in any modern computing scenario, no encoding needs to be specified, since LaTeX has used UTF-8 as its default input encoding since its April 2018 release. If another input encoding is used, such as ISO-8859-1, it must be loaded explicitly. For example, to use ISO-8859-1 we would include \usepackage[latin1]{inputenc} in the preamble of the LaTeX document.

• If pdflatex or latex is used, the default output (font) encoding, which determines the glyphs placed in the .dvi or .pdf files, is OT1, so the font encoding should be set to T1 to obtain the glyphs of the characters discussed here. The T1 font encoding, also known as Cork encoding, provides 256 glyph slots, including all glyphs in ISO-8859-1, and allows hyphenation for most Western European languages. In addition to lacking many symbols, OT1 does not include accented letters as single glyphs, so accented letters are constructed with the \accent primitive instead of being taken from the corresponding font. As a consequence, words containing such constructed letters cannot be hyphenated automatically. To load the T1 output encoding, use \usepackage[T1]{fontenc} in the preamble of your document.
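The effect of an input encoding mismatch can be reproduced outside LaTeX. This Python sketch uses only the standard `latin-1` and `utf-8` codecs (the function names are illustrative) to show what happens when bytes saved in one encoding are read in the other; it illustrates the kind of garbled text a wrong inputenc option tends to produce, not the exact messages LaTeX prints.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Bytes saved as UTF-8 but interpreted as ISO-8859-1."""
    # Latin-1 decoding never fails: every byte maps to some character
    return text.encode("utf-8").decode("latin-1")


def latin1_bytes_valid_as_utf8(text: str) -> bool:
    """Can bytes saved as ISO-8859-1 be read as UTF-8 at all?"""
    try:
        text.encode("latin-1").decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(misread_utf8_as_latin1("España"))       # prints EspaÃ±a
    print(latin1_bytes_valid_as_utf8("España"))   # prints False
```

The asymmetry is worth noting: reading UTF-8 as ISO-8859-1 never fails, so the error passes silently as pairs such as Ã±, whereas reading ISO-8859-1 as UTF-8 usually fails outright because a lone byte of 0x80 or above is not a valid UTF-8 sequence.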

Once our LaTeX document is configured to use UTF-8 as input encoding and T1 as output font encoding, we may use any glyph from the tables below directly (e.g., by copying it from another UTF-8-encoded document). Alternatively, when the original glyphs cannot be typed or copied directly, we can use the LaTeX commands provided in the tables. Many of these commands are defined in the file utf8enc.dfu of the base LaTeX distribution. The tables include LaTeX commands only for those glyphs that do not appear in the original ASCII character set or that cannot be used directly in a LaTeX document because they serve as special characters.
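When the glyphs cannot be typed directly, the substitution by commands can be automated. The Python sketch below is a hypothetical helper (`to_latex` and `LATEX_COMMANDS` are ours, not part of any LaTeX distribution) that covers only a handful of rows from Tables 4, 5 and 7; the remaining rows can be added to the dictionary in the same way.

```python
# A small, hand-picked subset of the commands listed in Tables 4, 5 and 7.
LATEX_COMMANDS = {
    "#": r"\#", "$": r"\$", "%": r"\%", "&": r"\&",
    "_": r"\char`\_", "{": r"\char`\{", "}": r"\char`\}", "\\": r"\char`\\",
    "^": r"\^{}", "~": r"\~{}",
    "á": r"\'{a}", "é": r"\'{e}", "ñ": r"\~{n}", "Ñ": r"\~{N}",
    "ç": r"\c{c}", "ß": r"{\ss}", "æ": r"{\ae}", "ø": r"{\o}",
}


def to_latex(text: str) -> str:
    """Replace each mapped character with its LaTeX command; keep the rest as-is."""
    return "".join(LATEX_COMMANDS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_latex("Señor #1"))   # prints Se\~{n}or \#1
```

Since each character is looked up exactly once in a single pass, the backslashes introduced by one replacement are never escaped again. Commands made only of letters, such as \ss, are wrapped in braces so that a following letter does not become part of the command name.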

2 Encoding of the ASCII set (0x00 to 0x7F)

The characters in the first group (cf. Tables 4 and 5) correspond to the original 7-bit ASCII characters. They are therefore single-byte symbols whose most significant bit is 0. As collected in Table 2, they are grouped into control characters, digits and punctuation marks, uppercase letters and special characters, and lowercase letters and special characters.
Table 2. ISO-8859-1/ASCII character grouping (0x00 – 0x7F).

Encoding Group
0b000X XXXX Control characters.
0b001X XXXX Digits and punctuation marks.
0b010X XXXX Uppercase letters and special characters.
0b011X XXXX Lowercase letters and special characters.
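Since each group in Table 2 spans 32 codes, the group is determined by the two bits that follow the leading 0. A minimal Python sketch (`ascii_group` is a hypothetical name) selects the group with a five-bit shift; note that DEL (0x7F) is a control code that nevertheless falls in the last block, a quirk this simple rule does not capture.

```python
ASCII_GROUPS = (
    "Control characters",                         # 0b000xxxxx
    "Digits and punctuation marks",               # 0b001xxxxx
    "Uppercase letters and special characters",   # 0b010xxxxx
    "Lowercase letters and special characters",   # 0b011xxxxx
)


def ascii_group(code: int) -> str:
    """Classify a 7-bit code by the two bits after its leading 0 (Table 2)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    return ASCII_GROUPS[code >> 5]   # bits 6 and 5 select one of four 32-code blocks
```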

The symbols listed in Table 4 for control characters are those included in the ascii LaTeX package, which provides glyphs and commands for the C0 graphics of the IBM PC Code Page 437. We include them here because such glyphs are commonly used to represent ASCII control characters in programming environments and interface documentation. To make them available, \usepackage{ascii} must be included in the preamble. This package requires the T1 font encoding discussed above.

3 Encoding of the extended set (0x80 to 0xFF)

The symbols in the second group (cf. Tables 6 and 7) used to be defined as extended ASCII characters. They were encoded in one byte with the most significant bit set to 1 (i.e., codes from 0x80 to 0xFF), and are encoded in UTF-8 using two bytes. The UTF-8 two-byte encoding is performed as follows, where ABCDEFGH are the eight bits of the ISO-8859-1 code:

0b0000 0000 ABCD EFGH → 0b1100 00AB 10CD EFGH

For example, the letter “ñ”, with ISO-8859-1 code 0xF1, is encoded as follows (cf. Table 7):

0b0000 0000 1111 0001 → 0b1100 0011 1011 0001

That is, as 0xC3B1. This layout guarantees an important property of UTF-8: it is non-overlapping and self-synchronizing. Character boundaries can be identified by scanning for well-defined bit patterns in either direction: single-byte characters start with 0, the first byte of a multi-byte character starts with 11, and continuation bytes start with 10. Consequently, a one-byte UTF-8 symbol can never be confused with the first or second byte of a two-byte symbol. Moreover, a receiver that starts reading a UTF-8 stream at an arbitrary point can locate the start of the next character without restarting the transmission, and byte-oriented string-searching algorithms can be applied directly.
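The self-synchronization property can be illustrated on raw bytes. In this Python sketch (helper names are hypothetical), a reader positioned at an arbitrary byte finds the start of the current character by stepping back over continuation bytes, or the start of the next one by stepping forward, without decoding anything else; validation of the lead bytes is deliberately omitted.

```python
def is_continuation(b: int) -> bool:
    """Continuation bytes have the form 0b10xxxxxx and never start a character."""
    return b & 0b1100_0000 == 0b1000_0000


def char_start(data: bytes, i: int) -> int:
    """Step back from position i to the first byte of the character containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


def next_char_start(data: bytes, i: int) -> int:
    """Step forward from position i to the first byte of the next whole character."""
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


if __name__ == "__main__":
    data = "año".encode("utf-8")      # b'a\xc3\xb1o': 'ñ' occupies bytes 1 and 2
    print(char_start(data, 2))        # prints 1: byte 2 is a continuation byte
    print(next_char_start(data, 2))   # prints 3: resynchronizes at 'o'
```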
The distribution of codes and glyphs in Tables 6 and 7 is outlined in Table 3. Note that uppercase and lowercase letters differ in a single bit (bit 5, value 0x20), as do the ASCII letters in Table 5.
Table 3. ISO-8859-1/UTF-8 character grouping (0x80 – 0xFF).

Encoding (lead byte, continuation byte) Group
0xC2 0b100X XXXX C1 controls (Table 6 shows the Windows-1252 glyphs for these codes).
0xC2 0b101X XXXX Symbols.
0xC3 0b100X XXXX Additional uppercase letters.
0xC3 0b101X XXXX Additional lowercase letters.
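The single-bit difference between the two letter blocks is bit 5 (value 0x20), the same bit that separates the ASCII uppercase and lowercase letters in Table 5. A minimal Python sketch (the helper `swap_case_latin1` and the exception set are ours) toggles case by flipping that bit, leaving alone the four codes in 0xC0-0xFF whose counterpart is not a case partner: × and ÷ are not letters, and ß and ÿ have no uppercase form in ISO-8859-1.

```python
CASE_BIT = 0x20  # bit 5: set for lowercase, clear for uppercase

# Codes in 0xC0-0xFF whose partner at +/-0x20 is not the other case of the same letter
NO_CASE_PAIR = {0xD7, 0xF7, 0xDF, 0xFF}   # ×, ÷, ß, ÿ


def swap_case_latin1(ch: str) -> str:
    """Toggle case of ASCII and Latin-1 letters by flipping bit 5 of the code."""
    code = ord(ch)
    is_ascii_letter = ch.isascii() and ch.isalpha()
    is_latin1_letter = 0xC0 <= code <= 0xFF and code not in NO_CASE_PAIR
    if is_ascii_letter or is_latin1_letter:
        return chr(code ^ CASE_BIT)
    return ch   # digits, symbols and unpaired codes are returned unchanged
```

The same bit appears in the continuation byte of the UTF-8 form, so "Ñ" (0xC391) and "ñ" (0xC3B1) also differ only in bit 5 of their second byte.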

Table 4. UTF-8 encodings from 0x00 to 0x3F. Control characters, digits, and punctuation marks. The original IBM PC CP437 code page assigned the printable glyphs shown in the table to the control codes from (soh) to (us). The glyph shown for (nul) was not part of the original CP437 code page.

Dec Hex S Ctl LaTeX Dec Hex S LaTeX

0 0x00 ␀ (nul) \NUL 32 0x20 \textvisiblespace
1 0x01 ␁ (soh) \SOH 33 0x21 !
2 0x02 ␂ (stx) \STX 34 0x22 "
3 0x03 ␃ (etx) \ETX 35 0x23 # \#
4 0x04 ␄ (eot) \EOT 36 0x24 $ \$
5 0x05 ␅ (enq) \ENQ 37 0x25 % \%
6 0x06 ␆ (ack) \ACK 38 0x26 & \&
7 0x07 ␇ (bel) \BEL 39 0x27 ' \textquotesingle
8 0x08 ␈ (bs) \BS 40 0x28 (
9 0x09 (tab) 41 0x29 )
10 0x0A ␊ (lf) \LF 42 0x2A *
11 0x0B ␋ (vt) \VT 43 0x2B +
12 0x0C (ff) 44 0x2C ,
13 0x0D ␍ (cr) \CR 45 0x2D -
14 0x0E ␎ (so) \SO 46 0x2E .
15 0x0F ␏ (si) \SI 47 0x2F /
16 0x10 ␐ (dle) \DLE 48 0x30 0
17 0x11 ␑ (dc1) \DCa 49 0x31 1
18 0x12 ␒ (dc2) \DCb 50 0x32 2
19 0x13 ␓ (dc3) \DCc 51 0x33 3
20 0x14 ␔ (dc4) \DCd 52 0x34 4
21 0x15 ␕ (nak) \NAK 53 0x35 5
22 0x16 ␖ (syn) \SYN 54 0x36 6
23 0x17 ␗ (etb) \ETB 55 0x37 7
24 0x18 ␘ (can) \CAN 56 0x38 8
25 0x19 ␙ (em) \EM 57 0x39 9
26 0x1A (eof) 58 0x3A :
27 0x1B ␛ (esc) \ESC 59 0x3B ;
28 0x1C ␜ (fs) \FS 60 0x3C <
29 0x1D ␝ (gs) \GS 61 0x3D =
30 0x1E ␞ (rs) \RS 62 0x3E >
31 0x1F ␟ (us) \US 63 0x3F ?

Table 5. UTF-8 encodings from 0x40 to 0x7F. Uppercase letters, lowercase letters, and special characters. The command \char returns the glyph corresponding to the numeric code passed as an argument. The TeX backquote operator ` returns the character code of a glyph regardless of whether it has a special meaning in LaTeX. Thus, \char`C returns the glyph corresponding to character C. \DEL is provided by the ascii package.

Dec Hex S LaTeX Dec Hex S LaTeX

64 0x40 @ 96 0x60 ‘ \textquoteleft
65 0x41 A 97 0x61 a
66 0x42 B 98 0x62 b
67 0x43 C 99 0x63 c
68 0x44 D 100 0x64 d
69 0x45 E 101 0x65 e
70 0x46 F 102 0x66 f
71 0x47 G 103 0x67 g
72 0x48 H 104 0x68 h
73 0x49 I 105 0x69 i
74 0x4A J 106 0x6A j
75 0x4B K 107 0x6B k
76 0x4C L 108 0x6C l
77 0x4D M 109 0x6D m
78 0x4E N 110 0x6E n
79 0x4F O 111 0x6F o
80 0x50 P 112 0x70 p
81 0x51 Q 113 0x71 q
82 0x52 R 114 0x72 r
83 0x53 S 115 0x73 s
84 0x54 T 116 0x74 t
85 0x55 U 117 0x75 u
86 0x56 V 118 0x76 v
87 0x57 W 119 0x77 w
88 0x58 X 120 0x78 x
89 0x59 Y 121 0x79 y
90 0x5A Z 122 0x7A z
91 0x5B [ 123 0x7B { \char`\{
92 0x5C \ \char`\\ 124 0x7C |
93 0x5D ] 125 0x7D } \char`\}
94 0x5E ^ \^{} 126 0x7E ~ \~{}
95 0x5F _ \char`\_ 127 0x7F ␡ \DEL

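Table 6. UTF-8 encodings from 0xC280 to 0xC2BF. Miscellaneous symbols. The first 32 codes of the extended ISO-8859-1 range (0x80 to 0x9F) are non-printable control characters known as C1 controls (column Ctl), and they are also C1 controls in Unicode (U+0080 to U+009F). For these codes, column S shows the printable glyphs that the Windows-1252 code page assigns to the same byte values; in UTF-8 those glyphs have other code points and encodings (e.g., € is U+20AC, encoded as 0xE282AC). For the last 32 characters, column LaTeX lists the commands to obtain the corresponding symbols as defined in the LaTeX distribution file utf8enc.dfu.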

ISO UTF-8 Ctl S LaTeX ISO UTF-8 S LaTeX

0x80 0xC280 (pad) € \texteuro 0xA0 0xC2A0 \nobreakspace
0x81 0xC281 (hop) 0xA1 0xC2A1 ¡ \textexclamdown, !`
0x82 0xC282 (bph) ‚ \quotesinglbase 0xA2 0xC2A2 ¢ \textcent
0x83 0xC283 (nbh) ƒ \textit{f} 0xA3 0xC2A3 £ \textsterling, \pounds
0x84 0xC284 (ind) „ \quotedblbase 0xA4 0xC2A4 ¤ \textcurrency
0x85 0xC285 (nel) … \dots 0xA5 0xC2A5 ¥ \textyen
0x86 0xC286 (ssa) † \dag 0xA6 0xC2A6 ¦ \textbrokenbar
0x87 0xC287 (esa) ‡ \ddag 0xA7 0xC2A7 § \textsection, \S
0x88 0xC288 (hts) ˆ \textasciicircum 0xA8 0xC2A8 ¨ \textasciidieresis
0x89 0xC289 (htj) ‰ \textperthousand 0xA9 0xC2A9 © \textcopyright
0x8A 0xC28A (lts) Š \v{S} 0xAA 0xC2AA ª \textordfeminine
0x8B 0xC28B (pld) ‹ \guilsinglleft 0xAB 0xC2AB « \guillemotleft
0x8C 0xC28C (plu) Œ \OE 0xAC 0xC2AC ¬ \textlnot
0x8D 0xC28D (ri) 0xAD 0xC2AD \-
0x8E 0xC28E (ss2) Ž \v{Z} 0xAE 0xC2AE ® \textregistered
0x8F 0xC28F (ss3) 0xAF 0xC2AF ¯ \textasciimacron
0x90 0xC290 (dcs) 0xB0 0xC2B0 ° \textdegree
0x91 0xC291 (pu1) ‘ 0xB1 0xC2B1 ± \textpm
0x92 0xC292 (pu2) ’ 0xB2 0xC2B2 ² \texttwosuperior
0x93 0xC293 (sts) “ `` 0xB3 0xC2B3 ³ \textthreesuperior
0x94 0xC294 (cch) ” '' 0xB4 0xC2B4 ´ \textasciiacute
0x95 0xC295 (mw) • \textbullet 0xB5 0xC2B5 µ \textmu
0x96 0xC296 (spa) – -- 0xB6 0xC2B6 ¶ \textparagraph, \P
0x97 0xC297 (epa) — --- 0xB7 0xC2B7 · \textperiodcentered
0x98 0xC298 (sos) ˜ \textasciitilde 0xB8 0xC2B8 ¸ \c{}, \c\
0x99 0xC299 (sgci) ™ \texttrademark 0xB9 0xC2B9 ¹ \textonesuperior
0x9A 0xC29A (sci) š \v{s} 0xBA 0xC2BA º \textordmasculine
0x9B 0xC29B (csi) › \guilsinglright 0xBB 0xC2BB » \guillemotright
0x9C 0xC29C (st) œ \oe 0xBC 0xC2BC ¼ \textonequarter
0x9D 0xC29D (osc) 0xBD 0xC2BD ½ \textonehalf
0x9E 0xC29E (pm) ž \v{z} 0xBE 0xC2BE ¾ \textthreequarters
0x9F 0xC29F (apc) Ÿ \"{Y} 0xBF 0xC2BF ¿ \textquestiondown, ?`
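The distinction between the C1 controls and the glyphs in column S can be checked with Python's standard `latin-1` and `cp1252` codecs. In this sketch (the function name is hypothetical), byte 0x80 decodes to the control U+0080 under ISO-8859-1, whose UTF-8 form is 0xC280, but to € under Windows-1252, which UTF-8 encodes with three bytes. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D, the rows above without a glyph, are unassigned in Python's `cp1252` codec, so decoding them raises `UnicodeDecodeError`.

```python
def compare_0x80_block(code: int) -> dict:
    """Contrast the ISO-8859-1 and Windows-1252 readings of a byte in 0x80-0x9F."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError("expected a code in the C1 range 0x80-0x9F")
    raw = bytes([code])
    latin1_char = raw.decode("latin-1")   # always succeeds: a C1 control character
    cp1252_char = raw.decode("cp1252")    # raises for 0x81, 0x8D, 0x8F, 0x90, 0x9D
    return {
        "latin-1 code point": f"U+{ord(latin1_char):04X}",
        "latin-1 as UTF-8": latin1_char.encode("utf-8").hex().upper(),
        "cp1252 glyph": cp1252_char,
        "cp1252 code point": f"U+{ord(cp1252_char):04X}",
        "cp1252 glyph as UTF-8": cp1252_char.encode("utf-8").hex().upper(),
    }


if __name__ == "__main__":
    for key, value in compare_0x80_block(0x80).items():
        print(f"{key}: {value}")
```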

Table 7. UTF-8 encodings from 0xC380 to 0xC3BF. Accented letters in the Latin-1 supplement. Uppercase and lowercase letters differ in a single bit, as do the ASCII letters in block 0x40 to 0x7F.

ISO UTF-8 Char LaTeX ISO UTF-8 Char LaTeX

0xC0 0xC380 À \`{A} 0xE0 0xC3A0 à \`{a}
0xC1 0xC381 Á \'{A} 0xE1 0xC3A1 á \'{a}
0xC2 0xC382 Â \^{A} 0xE2 0xC3A2 â \^{a}
0xC3 0xC383 Ã \~{A} 0xE3 0xC3A3 ã \~{a}
0xC4 0xC384 Ä \"{A} 0xE4 0xC3A4 ä \"{a}
0xC5 0xC385 Å \AA 0xE5 0xC3A5 å \aa
0xC6 0xC386 Æ \AE 0xE6 0xC3A6 æ \ae
0xC7 0xC387 Ç \c{C} 0xE7 0xC3A7 ç \c{c}
0xC8 0xC388 È \`{E} 0xE8 0xC3A8 è \`{e}
0xC9 0xC389 É \'{E} 0xE9 0xC3A9 é \'{e}
0xCA 0xC38A Ê \^{E} 0xEA 0xC3AA ê \^{e}
0xCB 0xC38B Ë \"{E} 0xEB 0xC3AB ë \"{e}
0xCC 0xC38C Ì \`{I} 0xEC 0xC3AC ì \`{i}
0xCD 0xC38D Í \'{I} 0xED 0xC3AD í \'{i}
0xCE 0xC38E Î \^{I} 0xEE 0xC3AE î \^{i}
0xCF 0xC38F Ï \"{I} 0xEF 0xC3AF ï \"{i}
0xD0 0xC390 Ð \DH 0xF0 0xC3B0 ð \dh
0xD1 0xC391 Ñ \~{N} 0xF1 0xC3B1 ñ \~{n}
0xD2 0xC392 Ò \`{O} 0xF2 0xC3B2 ò \`{o}
0xD3 0xC393 Ó \'{O} 0xF3 0xC3B3 ó \'{o}
0xD4 0xC394 Ô \^{O} 0xF4 0xC3B4 ô \^{o}
0xD5 0xC395 Õ \~{O} 0xF5 0xC3B5 õ \~{o}
0xD6 0xC396 Ö \"{O} 0xF6 0xC3B6 ö \"{o}
0xD7 0xC397 × \texttimes 0xF7 0xC3B7 ÷ \textdiv
0xD8 0xC398 Ø \O 0xF8 0xC3B8 ø \o
0xD9 0xC399 Ù \`{U} 0xF9 0xC3B9 ù \`{u}
0xDA 0xC39A Ú \'{U} 0xFA 0xC3BA ú \'{u}
0xDB 0xC39B Û \^{U} 0xFB 0xC3BB û \^{u}
0xDC 0xC39C Ü \"{U} 0xFC 0xC3BC ü \"{u}
0xDD 0xC39D Ý \'{Y} 0xFD 0xC3BD ý \'{y}
0xDE 0xC39E Þ \TH 0xFE 0xC3BE þ \th
0xDF 0xC39F ß \ss 0xFF 0xC3BF ÿ \"{y}
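The ISO and UTF-8 columns of Table 7 can be regenerated mechanically. This Python sketch (the function name `table7_row` is hypothetical) formats one row from an ISO-8859-1 code using the standard codecs; the LaTeX column still has to be taken from the table itself.

```python
def table7_row(code: int) -> str:
    """Format one row of Table 7: ISO-8859-1 code, UTF-8 bytes, character."""
    ch = bytes([code]).decode("latin-1")        # ISO-8859-1 code -> character
    utf8_hex = ch.encode("utf-8").hex().upper()  # character -> UTF-8 bytes
    return f"0x{code:02X} 0x{utf8_hex} {ch}"


if __name__ == "__main__":
    # Uppercase half and lowercase half of the table, side by side
    for upper in range(0xC0, 0xE0):
        print(table7_row(upper), "  ", table7_row(upper + 0x20))
```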
