Thursday, February 13, 2014

COMP6021 Class 12

the one with the UTF-8






We reviewed the data from our experiments in the lab when we saved English & Japanese text using different formats.




I reviewed the UTF-8 variable-length encoding scheme. (screen capture failed)
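
As a rough illustration of the variable-length scheme, here is a minimal Python sketch that packs a single code point into 1 to 4 bytes using the standard UTF-8 bit layouts. The function name `utf8_encode` is made up for this example, and the sketch assumes a valid code point (it does not reject surrogates or values above U+10FFFF). It is not what was shown in class.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point with the UTF-8 bit layouts.

    Hypothetical helper for illustration; assumes a valid code point.
    """
    if code_point < 0x80:
        # 1 byte:  0xxxxxxx  (plain ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's built-in encoder for one character of each length.
for ch in "I\u00e9\u2665\U0001F600":
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ').upper()}")
```

The leading byte tells the reader how many bytes the character occupies, and every continuation byte starts with 10, which is why a decoder can always find the start of the next character.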

Exercise
I gave students a UTF-8 message to decode. 49 E2 99 A5 E6 97 A5 E6 9C AC
Students were given copies of the relevant sections of the Unicode code charts:
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml
http://www.alanwood.net/unicode/miscellaneous_symbols.html


Although they got off to a slow start, I was very pleased that by the end all students had figured it out. This is the kind of thing that would make for a good exam question.

Solution
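Write each byte in binary (colons separate the bytes that belong to one character):
01001001 11100010:10011001:10100101 11100110:10010111:10100101 11100110:10011100:10101100

The first byte starts with 0, so it is a single-byte ASCII character: 49 = I. The other three groups each start with 1110, which marks a three-byte sequence:
11100010:10011001:10100101 11100110:10010111:10100101 11100110:10011100:10101100

Remove the markers: 1110 from each leading byte and 10 from each continuation byte. That leaves 4 + 6 + 6 = 16 bits per character:
I 0010011001100101 0110010111100101 0110011100101100

Convert each 16-bit value to hexadecimal to get the code points:
I 2665 65E5 672C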

I ♥日本
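
The manual steps above can be checked with a short Python sketch. It reads the leading byte to find the sequence length, strips the marker bits, and joins the remaining payload bits into a code point. The name `decode_utf8` is invented for this example, and the sketch assumes well-formed input apart from a few basic checks (it does not reject overlong encodings, for instance).

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of code points, step by step.

    Hypothetical helper for illustration; only basic validation is done.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: single byte
            length, value = 1, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: two bytes
            length, value = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: three bytes
            length, value = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: four bytes
            length, value = 4, lead & 0x07
        else:
            raise ValueError(f"unexpected leading byte {lead:02X} at offset {i}")

        tail = data[i + 1:i + length]
        if len(tail) != length - 1:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in tail:
            if b >> 6 != 0b10:        # continuation bytes must be 10xxxxxx
                raise ValueError(f"bad continuation byte {b:02X}")
            value = (value << 6) | (b & 0x3F)  # append the 6 payload bits

        code_points.append(value)
        i += length
    return code_points


message = bytes.fromhex("49 E2 99 A5 E6 97 A5 E6 9C AC")
code_points = decode_utf8(message)
text = "".join(chr(cp) for cp in code_points)

print(" ".join(f"U+{cp:04X}" for cp in code_points))  # U+0049 U+2665 U+65E5 U+672C
print(text)
assert text == message.decode("utf-8")  # agrees with the built-in decoder
```

Note that the message contains no space byte (20), so the decoded string has no space between the I and the heart.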
