Wednesday, February 12, 2014

COMP6021 Class 11 Unicode Lab

the one with the text files

I asked students to download an English-language book, paste it into Notepad, and save it using each of the four formats available: ANSI, Unicode, Unicode big endian, UTF-8.
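If you want to try the same comparison without Notepad, here is a minimal Python sketch. It rests on a few assumptions that are mine rather than anything Notepad documents here: "ANSI" is taken to be Windows-1252 (as on a Western-locale machine), "Unicode" is taken to be UTF-16 little endian, and the three Unicode formats are assumed to start with a byte-order mark. The name `encoded_sizes` is made up for this example.

```python
# Notepad's four "Save As" formats mapped to Python codec names.
# Assumption: "ANSI" is Windows-1252 and "Unicode" is UTF-16 little endian.
NOTEPAD_FORMATS = {
    "ANSI": "cp1252",
    "Unicode": "utf-16-le",
    "Unicode big endian": "utf-16-be",
    "UTF-8": "utf-8",
}

# Byte-order marks assumed to be written at the start of each file.
BOMS = {
    "ANSI": b"",
    "Unicode": b"\xff\xfe",
    "Unicode big endian": b"\xfe\xff",
    "UTF-8": b"\xef\xbb\xbf",
}


def encoded_sizes(text):
    """Return the size in bytes of `text` under each of the four formats."""
    sizes = {}
    for name, codec in NOTEPAD_FORMATS.items():
        # errors="replace" swaps characters the codec cannot represent for "?"
        data = BOMS[name] + text.encode(codec, errors="replace")
        sizes[name] = len(data)
    return sizes


# Five lines of 20 characters, each ending in a Windows line break (CR LF).
sample = "The quick brown fox.\r\n" * 5
print(encoded_sizes(sample))
```

Paste in your own text in place of `sample` to compare against the sizes Windows reports for your saved files.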

I then asked them to do the same with a Japanese book. But some students' machines lost the plot.

I asked students to generate a 1,000-character English-language document and save it with the four formats.

I then asked them to replace 200 of the characters with Japanese and do the same. Again some machines freaked out.
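One thing to watch for with the mixed text is what a single-byte format does to the Japanese characters. The sketch below assumes "ANSI" means Windows-1252 and uses Python's `errors="replace"` as a stand-in for whatever Notepad actually does, so treat it as an illustration rather than a description of Notepad. On a Japanese-locale machine, ANSI would normally be code page 932 instead and the result would differ.

```python
# A short mixed English and Japanese string.
text = "Hello 世界"

# Assumption: ANSI is Windows-1252, which has no Japanese characters.
# errors="replace" substitutes "?" for anything it cannot encode.
ansi_bytes = text.encode("cp1252", errors="replace")
ansi_round_trip = ansi_bytes.decode("cp1252")

# UTF-8 can represent every character, so the round trip is lossless.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

print(ansi_round_trip)          # Hello ??
print(utf8_round_trip == text)  # True
```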




The file sizes (in bytes) Colin got are shown in the table below. What conclusions can we draw? In particular, consider what happened in the mixed English & Japanese text.

If you have another text editor on your own machine, try the same experiments with different formats.

Document                         ANSI      Unicode   Unicode BE   UTF-8
The Hound of the Baskervilles    320,692   641,386   641,386      326,691
Rashomon                         6,315*    12,632    12,632       18,222
Thousand English                 1,080     2,162     2,162        1,083
Thousand Mixed                   1,080*    2,124     2,124        1,274



* Japanese text not preserved. Displayed gibberish.
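As a starting point for the conclusions, here is a rough sketch of the arithmetic behind the table. It assumes the Unicode files carry a 2-byte byte-order mark and the UTF-8 files a 3-byte one, and that every character is either plain ASCII (1 byte in UTF-8) or a character that takes 3 bytes in UTF-8, as Japanese kana and kanji typically do. Characters needing 2 or 4 bytes would throw the estimate off. The function name `split_counts` is invented for this example.

```python
BOM_UTF16 = 2  # assumed size of the byte-order mark in the Unicode files
BOM_UTF8 = 3   # assumed size of the byte-order mark in the UTF-8 file


def split_counts(utf16_size, utf8_size):
    """Estimate (one_byte_chars, three_byte_chars) from two file sizes.

    Assumes every character is 2 bytes in UTF-16 and either 1 or 3 bytes
    in UTF-8.
    """
    chars = (utf16_size - BOM_UTF16) // 2  # 2 bytes per character in UTF-16
    payload = utf8_size - BOM_UTF8         # UTF-8 bytes excluding the mark
    three_byte = (payload - chars) // 2    # each such char adds 2 extra bytes
    one_byte = chars - three_byte
    return one_byte, three_byte


print(split_counts(2162, 1083))    # Thousand English -> (1080, 0)
print(split_counts(12632, 18222))  # Rashomon         -> (363, 5952)
print(split_counts(2124, 1274))    # Thousand Mixed   -> (956, 105)
```

Under these assumptions the English file doubles in size as Unicode and barely changes as UTF-8, while the Japanese book is about one and a half times larger in UTF-8 than in Unicode. For the mixed file, UTF-8 comes out smaller than Unicode because most of its characters are still single-byte.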


Feel free to leave comments below


At the end of class I got each student to download the three images needed for the assignment. More about that another time.
