
Strings with Unicode

Strings are fully Unicode capable, so you can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF-8. For example, if you have access to Norwegian characters, you can simply enter this:

>>> "Vi er så glad for å høre og lære om Python!"
'Vi er så glad for å høre og lære om Python!'

Alternatively, you can write a Unicode code point as an escape sequence: \u followed by exactly four hexadecimal digits:

>>> "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"
'Vi er så glad for å høre og lære om Python!'

We're sure you'll agree, though, that this is somewhat more unwieldy.
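If you do need the escaped form, you don't have to look up code points by hand. The following is a minimal sketch, assuming a helper of our own called to_u_escape (a hypothetical name, not part of Python) that only handles characters whose code point fits in four hexadecimal digits:

```python
# Hypothetical helper (our own name, not a built-in): build the \uXXXX
# escape for a single character from its code point. It assumes the code
# point fits in four hex digits (the Basic Multilingual Plane).
def to_u_escape(ch):
    return "\\u{:04x}".format(ord(ch))

literal = "Vi er så glad for å høre og lære om Python!"
escaped = "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"

# Both spellings produce the same str object contents.
print(literal == escaped)   # True

for ch in "åøæ":
    print(ch, to_u_escape(ch))
```

The built-in ord() returns the integer code point of a one-character string, and the 04x format pads its hexadecimal form to four digits.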

Similarly, you can use the \x escape sequence followed by exactly two hexadecimal digits to include any code point up to U+00FF, that is, one whose value fits in a single byte, in a string literal:

>>> '\xe5'
'å'

You can even use an escaped octal string, written as a single backslash followed by up to three digits in the range zero to seven, although we confess we've never seen this used in practice, except inadvertently as a bug:

>>> '\345'
'å'
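To tie the escape forms together, here is a short sketch showing that they all yield the same one-character string, followed by one way the octal form might sneak in by accident. The path-like string "data\2023" is an example we made up for illustration:

```python
a_ring = "å"

# Hex, octal, and \u escapes, plus chr(), all give the same character.
print("\xe5" == a_ring)     # True
print("\345" == a_ring)     # True
print("\u00e5" == a_ring)   # True
print(chr(0xE5) == a_ring)  # True

# An accidental octal escape: \202 is consumed as one character,
# leaving only the trailing "3" from the intended "2023".
buggy = "data\2023"
safe = r"data\2023"         # a raw string keeps the backslash literally

print(len(buggy))           # 6
print(len(safe))            # 9
```

Using a raw string, or doubling the backslash, avoids the unintended escape.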

There are no such Unicode capabilities in the otherwise similar bytes type, which we'll look at next.
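As a brief preview, and only as a sketch, the following contrasts a str holding å with some bytes objects. The choice of UTF-8 for encoding is our assumption here, and the variable names are ours:

```python
text = "å"
print(len(text))             # 1: a str counts code points

# Encoding to UTF-8 (our choice of encoding) yields two bytes.
encoded = text.encode("utf-8")
print(encoded)               # b'\xc3\xa5'
print(len(encoded))          # 2

# In a bytes literal, \xe5 is just the byte value 229, not a character.
single = b"\xe5"
print(single[0])             # 229

# A bytes literal may contain only ASCII characters, so compiling
# the source text b'å' is rejected with a SyntaxError.
try:
    compile("b'å'", "<demo>", "eval")
except SyntaxError:
    non_ascii_rejected = True
else:
    non_ascii_rejected = False
print(non_ascii_rejected)    # True
```

Indexing a bytes object gives an integer, which is one hint that bytes are sequences of raw byte values rather than of characters.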