UTF-8 1 and 2 byte codes

UTF-8 as it is since 2003 (the x characters are replaced by the bits of the Unicode code point):

Num of bytes Code point bits First code point Last code point Byte 1 Byte 2 Byte 3 Byte 4 Hex
1 7 U+0000 U+007F 0xxxxxxx 0-7f ASCII
2 11 U+0080 U+07FF 110xxxxx 10xxxxxx cx/dx xx
3 16 U+0800 U+FFFF 1110xxxx 10xxxxxx 10xxxxxx ex xx xx
4 21 U+10000 U+10FFFF 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx fx xx xx xx
++++Backward compatibility: One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that ASCII text is valid UTF-8, and UTF-8 can be used for parsers expecting 8-bit extended ASCII even if they are not designed for UTF-8.
++++Clear distinction between multi-byte and single-byte characters: Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while continuation bytes all have ’10’ in the high-order position. Thus, no bytes representing ASCII characters appear in multi-byte sequences.
++++Self synchronization: The high order bits of every byte determine the type of byte; single bytes (0xxxxxxx), leading bytes (11xxxxxx), and continuation bytes (10xxxxxx) do not share values. This makes the scheme self-synchronizing, allowing the start of a character to be found by backing up at most three bytes. The first byte of a valid character sequence will be either a single byte or leading byte.
++++Clear indication of byte sequence length: The number of high-order 1s in the leading byte of a multi-byte sequence indicates the number of bytes in the sequence. When reading from a stream, a reader can process all fully received sequences without first having to wait for either the leading byte of a next sequence or an end-of-stream indication.

++++The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This covers the remainder of almost all Latin alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N’Ko alphabets, as well as Combining Diacritical Marks. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean characters. Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols). – wikipedia

1-byte codes (= ASCII)

20: ! ” # $ % & ‘ ( ) * + , – . /
30: 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
40: @ A B C D E F G H I J K L M N O
50: P Q R S T U V W X Y Z [ \ ] ^ _
60: ` a b c d e f g h i j k l m n o
70: p q r s t u v w x y z { | } ~ 

2-byte codes

Latin-1 Supplement
c2 a0:   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
c2 b0: ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
c3 80: À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
c3 90: Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
c3 a0: à á â ã ä å æ ç è é ê ë ì í î ï
c3 b0: ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
Latin Extended-A
c4 80: Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď
c4 90: Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ
c4 a0: Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į
c4 b0: İ ı IJ ij Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ
c5 80: ŀ Ł ł Ń ń Ņ ņ Ň ň ʼn Ŋ ŋ Ō ō Ŏ ŏ
c5 90: Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş
c5 a0: Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů
c5 b0: Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ
Latin Extended-B
c6 80: ƀ Ɓ Ƃ ƃ Ƅ ƅ Ɔ Ƈ ƈ Ɖ Ɗ Ƌ ƌ ƍ Ǝ Ə
c6 90: Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ɵ
c6 a0: Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư
c6 b0: ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ
c7 80: ǀ ǁ ǂ ǃ DŽ Dž dž LJ Lj lj NJ Nj nj Ǎ ǎ Ǐ
c7 90: ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ
c7 a0: Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ
c7 b0: ǰ DZ Dz dz Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ
c8 80: Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ
c8 90: Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ
c8 a0: Ƞ ȡ Ȣ ȣ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ ȯ
c8 b0: Ȱ ȱ Ȳ ȳ ȴ ȵ ȶ ȷ ȸ ȹ Ⱥ Ȼ ȼ Ƚ Ⱦ ȿ
c9 80: ɀ Ɂ ɂ Ƀ Ʉ Ʌ Ɇ ɇ Ɉ ɉ Ɋ ɋ Ɍ ɍ Ɏ ɏ
IPA Extensions
c9 90: ɐ ɑ ɒ ɓ ɔ ɕ ɖ ɗ ɘ ə ɚ ɛ ɜ ɝ ɞ ɟ
c9 a0: ɠ ɡ ɢ ɣ ɤ ɥ ɦ ɧ ɨ ɩ ɪ ɫ ɬ ɭ ɮ ɯ
c9 b0: ɰ ɱ ɲ ɳ ɴ ɵ ɶ ɷ ɸ ɹ ɺ ɻ ɼ ɽ ɾ ɿ
ca 80: ʀ ʁ ʂ ʃ ʄ ʅ ʆ ʇ ʈ ʉ ʊ ʋ ʌ ʍ ʎ ʏ
ca 90: ʐ ʑ ʒ ʓ ʔ ʕ ʖ ʗ ʘ ʙ ʚ ʛ ʜ ʝ ʞ ʟ
ca a0: ʠ ʡ ʢ ʣ ʤ ʥ ʦ ʧ ʨ ʩ ʪ ʫ ʬ ʭ ʮ ʯ
Spacing Modifier Letters
ca b0: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ʸ ʹ ʺ ʻ ʼ ʽ ʾ ʿ
cb 80: ˀ ˁ ˂ ˃ ˄ ˅ ˆ ˇ ˈ ˉ ˊ ˋ ˌ ˍ ˎ ˏ
cb 90: ː ˑ ˒ ˓ ˔ ˕ ˖ ˗ ˘ ˙ ˚ ˛ ˜ ˝ ˞ ˟
cb a0: ˠ ˡ ˢ ˣ ˤ ˥ ˦ ˧ ˨ ˩ ˪ ˫ ˬ ˭ ˮ ˯
cb b0: ˰ ˱ ˲ ˳ ˴ ˵ ˶ ˷ ˸ ˹ ˺ ˻ ˼ ˽ ˾ ˿
Combining Diacritical Marks
cc 80: ̀ ́ ̂ ̃ ̄ ̅ ̆ ̇ ̈ ̉ ̊ ̋ ̌ ̍ ̎ ̏
cc 90: ̐ ̑ ̒ ̓ ̔ ̕ ̖ ̗ ̘ ̙ ̚ ̛ ̜ ̝ ̞ ̟
cc a0: ̠ ̡ ̢ ̣ ̤ ̥ ̦ ̧ ̨ ̩ ̪ ̫ ̬ ̭ ̮ ̯
cc b0: ̰ ̱ ̲ ̳ ̴ ̵ ̶ ̷ ̸ ̹ ̺ ̻ ̼ ̽ ̾ ̿
cd 80: ̀ ́ ͂ ̓ ̈́ ͅ ͆ ͇ ͈ ͉ ͊ ͋ ͌ ͍ ͎ ͏
cd 90: ͐ ͑ ͒ ͓ ͔ ͕ ͖ ͗ ͘ ͙ ͚ ͛ ͜ ͝ ͞ ͟
cd a0: ͠ ͡ ͢ ͣ ͤ ͥ ͦ ͧ ͨ ͩ ͪ ͫ ͬ ͭ ͮ ͯ
Greek and Coptic
cd b0: Ͱ ͱ Ͳ ͳ ʹ ͵ Ͷ ͷ ͸ ͹ ͺ ͻ ͼ ͽ ; Ϳ
ce 80: ΀ ΁ ΂ ΃ ΄ ΅ Ά · Έ Ή Ί ΋ Ό ΍ Ύ Ώ
ce 90: ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο
ce a0: Π Ρ ΢ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί
ce b0: ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο
cf 80: π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ Ϗ
cf 90: ϐ ϑ ϒ ϓ ϔ ϕ ϖ ϗ Ϙ ϙ Ϛ ϛ Ϝ ϝ Ϟ ϟ
cf a0: Ϡ ϡ Ϣ ϣ Ϥ ϥ Ϧ ϧ Ϩ ϩ Ϫ ϫ Ϭ ϭ Ϯ ϯ
cf b0: ϰ ϱ ϲ ϳ ϴ ϵ ϶ Ϸ ϸ Ϲ Ϻ ϻ ϼ Ͻ Ͼ Ͽ
Cyrillic
d0 80: Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ
d0 90: А Б В Г Д Е Ж З И Й К Л М Н О П
d0 a0: Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
d0 b0: а б в г д е ж з и й к л м н о п
d1 80: р с т у ф х ц ч ш щ ъ ы ь э ю я
d1 90: ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ
d1 a0: Ѡ ѡ Ѣ ѣ Ѥ ѥ Ѧ ѧ Ѩ ѩ Ѫ ѫ Ѭ ѭ Ѯ ѯ
d1 b0: Ѱ ѱ Ѳ ѳ Ѵ ѵ Ѷ ѷ Ѹ ѹ Ѻ ѻ Ѽ ѽ Ѿ ѿ
d2 80: Ҁ ҁ ҂ ҃ ҄ ҅ ҆ ҇ ҈ ҉ Ҋ ҋ Ҍ ҍ Ҏ ҏ
d2 90: Ґ ґ Ғ ғ Ҕ ҕ Җ җ Ҙ ҙ Қ қ Ҝ ҝ Ҟ ҟ
d2 a0: Ҡ ҡ Ң ң Ҥ ҥ Ҧ ҧ Ҩ ҩ Ҫ ҫ Ҭ ҭ Ү ү
d2 b0: Ұ ұ Ҳ ҳ Ҵ ҵ Ҷ ҷ Ҹ ҹ Һ һ Ҽ ҽ Ҿ ҿ
d3 80: Ӏ Ӂ ӂ Ӄ ӄ Ӆ ӆ Ӈ ӈ Ӊ ӊ Ӌ ӌ Ӎ ӎ ӏ
d3 90: Ӑ ӑ Ӓ ӓ Ӕ ӕ Ӗ ӗ Ә ә Ӛ ӛ Ӝ ӝ Ӟ ӟ
d3 a0: Ӡ ӡ Ӣ ӣ Ӥ ӥ Ӧ ӧ Ө ө Ӫ ӫ Ӭ ӭ Ӯ ӯ
d3 b0: Ӱ ӱ Ӳ ӳ Ӵ ӵ Ӷ ӷ Ӹ ӹ Ӻ ӻ Ӽ ӽ Ӿ ӿ
Cyrillic Supplement
d4 80: Ԁ ԁ Ԃ ԃ Ԅ ԅ Ԇ ԇ Ԉ ԉ Ԋ ԋ Ԍ ԍ Ԏ ԏ
d4 90: Ԑ ԑ Ԓ ԓ Ԕ ԕ Ԗ ԗ Ԙ ԙ Ԛ ԛ Ԝ ԝ Ԟ ԟ
d4 a0: Ԡ ԡ Ԣ ԣ Ԥ ԥ Ԧ ԧ Ԩ ԩ Ԫ ԫ Ԭ ԭ Ԯ ԯ
Armenian
d4 b0: ԰ Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ
d5 80: Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ
d5 90: Ր Ց Ւ Փ Ք Օ Ֆ ՗ ՘ ՙ ՚ ՛ ՜ ՝ ՞ ՟
d5 a0: ՠ ա բ գ դ ե զ է ը թ ժ ի լ խ ծ կ
d5 b0: հ ձ ղ ճ մ յ ն շ ո չ պ ջ ռ ս վ տ
d6 80: ր ց ւ փ ք օ ֆ և ֈ ։ ֊ ֋ ֌ ֍ ֎ ֏
Hebrew
d6 90: ֐ ֑ ֒ ֓ ֔ ֕ ֖ ֗ ֘ ֙ ֚ ֛ ֜ ֝ ֞ ֟
d6 a0: ֠ ֡ ֢ ֣ ֤ ֥ ֦ ֧ ֨ ֩ ֪ ֫ ֬ ֭ ֮ ֯
d6 b0: ְ ֱ ֲ ֳ ִ ֵ ֶ ַ ָ ֹ ֺ ֻ ּ ֽ ־ ֿ
d7 80: ׀ ׁ ׂ ׃ ׄ ׅ ׆ ׇ ׈ ׉ ׊ ׋ ׌ ׍ ׎ ׏
d7 90: א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן
d7 a0: נ ס ע ף פ ץ צ ק ר ש ת ׫ ׬ ׭ ׮ ׯ
d7 b0: װ ױ ײ ׳ ״ ׵ ׶ ׷ ׸ ׹ ׺ ׻ ׼ ׽ ׾ ׿
Arabic
d8 80: ؀ ؁ ؂ ؃ ؄ ؅ ؆ ؇ ؈ ؉ ؊ ؋ ، ؍ ؎ ؏
d8 90: ؐ ؑ ؒ ؓ ؔ ؕ ؖ ؗ ؘ ؙ ؚ ؛ ؜ ؝ ؞ ؟
d8 a0: ؠ ء آ أ ؤ إ ئ ا ب ة ت ث ج ح خ د
d8 b0: ذ ر ز س ش ص ض ط ظ ع غ ػ ؼ ؽ ؾ ؿ
d9 80: ـ ف ق ك ل م ن ه و ى ي ً ٌ ٍ َ ُ
d9 90: ِ ّ ْ ٓ ٔ ٕ ٖ ٗ ٘ ٙ ٚ ٛ ٜ ٝ ٞ ٟ
d9 a0: ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ ٪ ٫ ٬ ٭ ٮ ٯ
d9 b0: ٰ ٱ ٲ ٳ ٴ ٵ ٶ ٷ ٸ ٹ ٺ ٻ ټ ٽ پ ٿ
da 80: ڀ ځ ڂ ڃ ڄ څ چ ڇ ڈ ډ ڊ ڋ ڌ ڍ ڎ ڏ
da 90: ڐ ڑ ڒ ړ ڔ ڕ ږ ڗ ژ ڙ ښ ڛ ڜ ڝ ڞ ڟ
da a0: ڠ ڡ ڢ ڣ ڤ ڥ ڦ ڧ ڨ ک ڪ ګ ڬ ڭ ڮ گ
da b0: ڰ ڱ ڲ ڳ ڴ ڵ ڶ ڷ ڸ ڹ ں ڻ ڼ ڽ ھ ڿ
db 80: ۀ ہ ۂ ۃ ۄ ۅ ۆ ۇ ۈ ۉ ۊ ۋ ی ۍ ێ ۏ
db 90: ې ۑ ے ۓ ۔ ە ۖ ۗ ۘ ۙ ۚ ۛ ۜ ۝ ۞ ۟
db a0: ۠ ۡ ۢ ۣ ۤ ۥ ۦ ۧ ۨ ۩ ۪ ۫ ۬ ۭ ۮ ۯ
db b0: ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۺ ۻ ۼ ۽ ۾ ۿ

Syriac
dc 80:
Thaana:
de 80:

see page of 3-byte codes

UTF-8 = Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.

++++UTF-8 was designed, in front of my eyes, on a placemat in a New Jersey diner one night in September or so 1992.
++++What happened was this. We had used the original UTF from ISO 10646
to make Plan 9 support 16-bit characters, but we hated it. We were close to shipping the system when, late one afternoon, I received a call from some folks, I think at IBM – I remember them being in Austin – who were in an X/Open committee meeting. They wanted Ken [Thompson] and me to vet their FSS/UTF design. We understood why they were introducing a new design, and Ken and I suddenly realized there was an opportunity to use our experience to design a really good standard and get the X/Open guys to push it out. We suggested this and the deal was, if we could do it fast, OK. So we went to dinner, Ken figured out the bit-packing, and when we came back to the lab after dinner we called the X/Open guys and explained our scheme. We mailed them an outline of our spec, and they replied saying that it was better than theirs (I don’t believe I ever actually saw their proposal; I know I don’t remember it) and how fast could we implement it? I think this was a Wednesday night and we promised a complete running system by Monday, which I think was when their big vote was.
++++So that night Ken wrote packing and unpacking code and I started tearing into the C and graphics libraries. The next day all the code was done and we started converting the text files on the system itself. By Friday some time Plan 9 was running, and only running, what would be called UTF-8. We called X/Open and the rest, as they say, is slightly rewritten history.
++++– Rob Pike, 2003 (link)