the one with the text files
I asked students to download an English-language book, paste it into Notepad, and save it in each of the four available formats: ANSI, Unicode, Unicode big endian, and UTF-8.
I then asked them to do the same with a Japanese book. But some students' machines lost the plot.
I asked students to generate a 1000-character English-language document and save it in the four formats.
I then asked them to replace 200 of the characters with Japanese and do the same. Again some machines freaked out.
The values Colin got are shown in the table below. What conclusions can we draw? In particular, consider what happened in the mixed English & Japanese text.
If you have another text editor on your own machine, try the same experiments with different formats.
Document | ANSI | Unicode | Unicode BE | UTF-8
Hounds of the Baskervilles | 320,692 | 641,386 | 641,386 | 326,691
Rashomon | 6,315* | 12,632 | 12,632 | 18,222
Thousand English | 1,080 | 2,162 | 2,162 | 1,083
Thousand Mixed | 1,080* | 2,124 | 2,124 | 1,274
* Japanese text not preserved; it displayed as gibberish.
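If you would rather repeat the experiment without Notepad, here is a minimal Python sketch. It rests on a few assumptions that are mine, not something the class established: the sizes are counted in bytes, "ANSI" means Windows-1252 (on a Japanese-locale machine it would more likely be code page 932), unsupported characters are replaced by "?", and the Unicode and UTF-8 formats each start with a byte order mark. The function name notepad_sizes is made up for this example.

```python
import codecs


def notepad_sizes(text):
    """Return the encoded size in bytes of text for four Notepad-style formats.

    Assumptions: ANSI is Windows-1252 with "?" substituted for anything it
    cannot represent, and the other three formats begin with a BOM.
    """
    return {
        # Characters outside Windows-1252 become a single "?" byte.
        "ANSI": len(text.encode("cp1252", errors="replace")),
        # UTF-16 little endian, preceded by a 2-byte BOM.
        "Unicode": len(codecs.BOM_UTF16_LE + text.encode("utf-16-le")),
        # UTF-16 big endian, preceded by a 2-byte BOM.
        "Unicode BE": len(codecs.BOM_UTF16_BE + text.encode("utf-16-be")),
        # "utf-8-sig" writes a 3-byte BOM before the text.
        "UTF-8": len(text.encode("utf-8-sig")),
    }


english = "a" * 1000               # 1000 plain ASCII characters
mixed = "a" * 800 + "あ" * 200      # 200 of them swapped for Japanese

print(notepad_sizes(english))
print(notepad_sizes(mixed))
```

Under these assumptions the 1000 ASCII characters come out as 1000, 2002, 2002 and 1003 bytes. That is the same pattern as the Thousand English row, where Unicode is twice ANSI plus 2 and UTF-8 is ANSI plus 3. In the mixed case the two Unicode sizes stay the same while UTF-8 grows to 1403 bytes, because each Japanese character here takes three bytes in UTF-8.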
Feel free to leave comments below
At the end of class I got each student to download the three images needed for the assignment. More about that another time.