The five unavoidable Facts of Life:
- All input and output of your program is bytes.
- The world needs more than 256 symbols to communicate text.
- Your program has to deal with both bytes and Unicode.
- A stream of bytes can't tell you its encoding; that information has to come from somewhere else.
- Encoding specifications can be wrong.
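As a minimal Python 3 sketch of the last two facts: the same bytes decode without complaint under more than one encoding, and a declared encoding may simply be wrong. The helper name `decode_declared` and its Latin-1 fallback are assumptions made for illustration, not a recommended policy.

```python
# The same bytes produce different text depending on the encoding you assume.
raw = "café".encode("utf-8")        # b'caf\xc3\xa9'

as_utf8 = raw.decode("utf-8")       # 'café'
as_latin1 = raw.decode("latin-1")   # 'cafÃ©': wrong text, but no error is raised


def decode_declared(data: bytes, declared: str, fallback: str = "latin-1") -> str:
    """Hypothetical helper: try the declared encoding, then a fallback.

    Latin-1 maps every byte to a character, so the fallback never raises,
    but the text it returns may still be wrong.
    """
    try:
        return data.decode(declared)
    except UnicodeDecodeError:
        return data.decode(fallback)


# These bytes are really Latin-1, but they arrive labeled as UTF-8.
mislabeled = "café".encode("latin-1")   # b'caf\xe9'
recovered = decode_declared(mislabeled, "utf-8")
```

Here a strict UTF-8 decode of `mislabeled` fails, which at least reveals the bad label. The Latin-1 decode of `raw` is the more dangerous case, because it succeeds silently with the wrong characters.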
To keep your code Unicode-clean:
- Unicode sandwich: keep all text in your program as Unicode, and convert as close to the edges as possible.
- Know what your strings are: you should be able to explain which of your strings are Unicode, which are bytes, and for your byte strings, what encoding they use.
- Test your Unicode support. Use exotic (non-ASCII) strings throughout your test suites to be sure you're covering all the cases.
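The three habits above can be sketched together in Python 3. This is only an illustration: the function names (`read_text`, `write_text`, `add_greeting`, `handle_request`), the sample names, and the choice of UTF-8 at both edges are assumptions.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input edge: bytes come in and are decoded to str immediately."""
    return raw.decode(encoding)


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Output edge: str is encoded to bytes as late as possible."""
    return text.encode(encoding)


def add_greeting(name: str) -> str:
    """The middle of the sandwich: works on str only, never on bytes."""
    return f"Hello, {name}!"


def handle_request(raw_name: bytes) -> bytes:
    # bytes -> str at the edge, str in the middle, str -> bytes at the edge
    name = read_text(raw_name)
    return write_text(add_greeting(name))


# Exotic test data: accented Latin, CJK, circled letters, and an emoji
# outside the Basic Multilingual Plane.
EXOTIC_NAMES = ["Zoë", "日本語", "ⓤⓝⓘⓒⓞⓓⓔ", "😀"]

for name in EXOTIC_NAMES:
    reply = handle_request(name.encode("utf-8"))
    assert isinstance(reply, bytes)
    assert reply.decode("utf-8") == f"Hello, {name}!"
```

The type annotations double as the "know what your strings are" record: every function states whether it takes bytes or text, and the encoding is named only at the two edges.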