The Python Apprentice

Strings with Unicode

Strings are fully Unicode capable, so you can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF-8. For example, if you have access to Norwegian characters, you can simply enter this:

>>> "Vi er så glad for å høre og lære om Python!"
'Vi er så glad for å høre og lære om Python!'

Alternatively, you can write a Unicode code point as an escape sequence: \u followed by exactly four hexadecimal digits:

>>> "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"
'Vi er så glad for å høre og lære om Python!'

We're sure you'll agree, though, that this is somewhat more unwieldy.
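
To see how a \u escape relates to the character it stands for, here is a minimal sketch. It relies on the built-in ord() and chr() functions, which the text above does not introduce, and the variable names are purely illustrative:

```python
# A directly typed literal and its escaped form produce the same string.
direct = "så"
escaped = "s\u00e5"
assert direct == escaped

# ord() returns the integer code point of a one-character string;
# chr() goes the other way.
code_point = ord("å")
print(hex(code_point))          # 0xe5
print(chr(0x00e5))              # å

# Build the escape text for a code point (four hex digits, zero padded).
print(f"\\u{code_point:04x}")   # \u00e5
```

The escape is only a way of spelling the character in source code. Once the literal has been evaluated, the string holds the same code points either way.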

Similarly, you can use the \x escape sequence followed by exactly two hexadecimal digits to include any code point in the range U+0000 to U+00FF in a string literal:

>>> '\xe5'
'å'
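
Because \x takes two hexadecimal digits, it can only reach code points up to U+00FF. The following sketch, with illustrative variable names and the euro sign chosen as an arbitrary example, shows where \x stops and \u has to take over:

```python
# Two hex digits after \x and four after \u name the same code point here.
small = "\xe5"
same = "\u00e5"
assert small == same
assert len(small) == 1

# Code points above U+00FF need \u. The euro sign is U+20AC.
euro = "\u20ac"
print(euro)                     # €
assert ord(euro) > 0xff

# \x consumes exactly two digits, so "\x20ac" is NOT the euro sign:
# it is a space (U+0020) followed by the letters "ac".
not_euro = "\x20ac"
assert not_euro == " ac"
```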

You can even use an escaped octal string: a single backslash followed by up to three digits in the range zero to seven. We confess we've never seen this used in practice, except inadvertently as a bug:

>>> '\345'
'å'
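
As a rough illustration of how the octal form turns up as a bug, consider a Windows-style path whose file name happens to start with digits. The path below is made up for this example, and the two fixes shown (doubling the backslash, or using a raw string) are suggestions of ours rather than part of the text above:

```python
# All three escape forms spell the same character: octal 345 is 0xe5.
assert int("345", 8) == 0xe5
assert "\345" == "\xe5" == "\u00e5"

# The accidental case: we meant a backslash followed by "101.txt",
# but \101 is read as an octal escape for "A" (octal 101 is 65).
accidental = "data\101.txt"
print(accidental)               # dataA.txt

# Either escape the backslash or use a raw string to keep it literal.
doubled = "data\\101.txt"
raw = r"data\101.txt"
assert doubled == raw
print(raw)                      # data\101.txt
```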

There are no such Unicode capabilities in the otherwise similar bytes type, which we'll look at next.