
Strings with Unicode
Strings are fully Unicode capable, so you can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF-8. For example, if you have access to Norwegian characters, you can simply enter this:
>>> "Vi er så glad for å høre og lære om Python!"
'Vi er så glad for å høre og lære om Python!'
Alternatively, you can write any of these characters as an escape sequence: \u followed by the four-digit hexadecimal representation of its Unicode code point:
>>> "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"
'Vi er så glad for å høre og lære om Python!'
We're sure you'll agree, though, that this is somewhat more unwieldy.
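To convince yourself that an escape is only another spelling of the same character, you can compare the different forms directly. The short sketch below is ours rather than part of the original session, and the variable names are purely illustrative. It also uses two related escapes not shown above: \U, which takes eight hexadecimal digits, and \N{...}, which takes the official Unicode character name.

```python
# Four spellings of the same one-character string (code point U+00E5).
# All variable names here are illustrative.
literal = "å"
short_escape = "\u00e5"       # \u takes exactly four hex digits
long_escape = "\U000000e5"    # \U takes exactly eight hex digits
named_escape = "\N{LATIN SMALL LETTER A WITH RING ABOVE}"

# Escapes are resolved when the literal is compiled, so all four
# names refer to equal strings.
print(literal == short_escape == long_escape == named_escape)  # True
print(hex(ord(literal)))                                       # 0xe5
```

Whichever spelling you choose, the resulting string object is the same; the choice only affects how readable your source code is.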
Similarly, you can use the \x escape sequence followed by exactly two hexadecimal digits to include any code point in the range U+0000 to U+00FF in a string literal:
>>> '\xe5'
'å'
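It is worth stressing that in a str literal \x names a code point, not a byte of storage. The following sketch, which is our own illustration with a hypothetical variable name, shows that '\xe5' is the same one-character string as '\u00e5', even though that character happens to occupy two bytes once it is encoded as UTF-8.

```python
# In a str literal, \xe5 is just a compact spelling of code point U+00E5.
a_ring = "\xe5"   # illustrative name

print(a_ring == "\u00e5")            # True: same code point, same string
print(len(a_ring))                   # 1: one character

# Encoding is a separate step. In UTF-8 this character needs two bytes.
print(len(a_ring.encode("utf-8")))   # 2
```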
You can even use an escaped octal string: a single backslash followed by up to three digits in the range zero to seven. We confess we've never seen this used in practice, except inadvertently as a bug:
>>> '\345'
'å'
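As a sketch of how such a bug can creep in, consider a backslash that happens to be followed by digits, as might occur in a Windows-style path. The path below is invented for illustration, as are the variable names. A raw string, written with an r prefix, is one way to keep the backslash literal.

```python
# Octal 345 is decimal 229, which is hexadecimal e5.
print("\345" == "\xe5" == chr(0o345))   # True

# A hypothetical path fragment: the author meant backslash + "2024",
# but "\202" is read as an octal escape, leaving a stray "4" behind.
accidental = "reports\2024"
intended = r"reports\2024"   # raw string: the backslash is kept as-is

print(len(accidental))   # 9: "reports" + one control character + "4"
print(len(intended))     # 12: "reports" + backslash + "2024"
print(accidental == intended)   # False
```

Doubling the backslash, as in "reports\\2024", would work equally well; which form you prefer is largely a matter of taste.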
There are no such Unicode capabilities in the otherwise similar bytes type, which we'll look at next.