String concatenation
The fact that Python strings are immutable poses a problem when multiple string instances need to be joined together. As we stated previously, concatenating immutable sequences results in the creation of a new sequence object. Consider a new string built by repeated concatenation of multiple strings, as follows:
substrings = ["These ", "are ", "strings ", "to ", "concatenate."]
s = ""
for substring in substrings:
    s += substring
In the worst case, this results in a runtime cost that is quadratic in the total string length, because each concatenation may copy everything accumulated so far into a new string object. In other words, it is highly inefficient. For handling such situations, the str.join() method is available. It accepts an iterable of strings as its argument and returns a single joined string. The call to join() of the str type can be done in two forms:
# using empty literal
s = "".join(substrings)
# using "unbound" method call
str.join("", substrings)
The first form of the join() call is the most common idiom. The string that provides this method will be used as a separator between concatenated substrings. Consider the following example:
>>> ','.join(['some', 'comma', 'separated', 'values'])
'some,comma,separated,values'
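The following is a minimal sketch comparing the approaches discussed so far. The helper names concat_loop and concat_join, the many_parts list, and the repetition counts are illustrative assumptions. The timings depend on the interpreter and the machine (CPython in particular can sometimes optimize in-place += on strings), so no specific numbers are claimed here:

```python
import timeit

substrings = ["These ", "are ", "strings ", "to ", "concatenate."]


def concat_loop(parts):
    """Build the result by repeated += concatenation (hypothetical helper)."""
    s = ""
    for part in parts:
        s += part
    return s


def concat_join(parts):
    """Build the result with a single str.join() call (hypothetical helper)."""
    return "".join(parts)


# All three spellings produce the same string.
loop_result = concat_loop(substrings)
join_result = concat_join(substrings)
unbound_result = str.join("", substrings)

# Rough timing on a larger input; the sizes are arbitrary choices.
many_parts = ["x"] * 10_000
loop_time = timeit.timeit(lambda: concat_loop(many_parts), number=20)
join_time = timeit.timeit(lambda: concat_join(many_parts), number=20)
print(f"loop: {loop_time:.4f}s, join: {join_time:.4f}s")
```

Measuring on your own interpreter and with realistic input sizes is the only reliable way to decide whether the difference matters for a given piece of code.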
It is worth remembering that the fact that join() is faster (especially for large lists) does not mean that it should be used in every situation where two strings need to be concatenated. Despite being a widely recognized idiom, it does not improve code readability. And readability counts! There are also some situations where join() may not perform as well as ordinary concatenation with the + operator. Here are some examples:
- If the number of substrings is very small and they are not already contained in an existing iterable (a list or tuple of strings): in some cases, the overhead of creating a new sequence just to perform the concatenation can outweigh the gain of using join().
- When concatenating short literals: thanks to some interpreter-level optimizations, such as constant folding in CPython (see the following subsection), some complex literal expressions (not only strings), such as 'a' + 'b' + 'c', can be translated into a shorter form at compile time (here 'abc'). Of course, this is enabled only for constants (literals) that are relatively short.
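The constant folding mentioned in the second point can be observed directly. The sketch below assumes CPython; other interpreters may compile the expression differently. It compiles a one-line snippet and checks whether the already joined string appears among the constants of the resulting code object:

```python
import dis

# A snippet that concatenates three short string literals.
source = "s = 'a' + 'b' + 'c'"
code = compile(source, "<example>", "exec")

# On CPython, the folded literal is stored as a single constant.
folded = "abc" in code.co_consts
print(folded)

# The disassembly shows which constants the bytecode actually loads.
dis.dis(code)
```

If the expression involved variables rather than literals, the compiler would have nothing to fold, and the concatenation would happen at runtime.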
Ultimately, if the number of strings to concatenate is known beforehand, the best readability is ensured by proper string formatting, using either the str.format() method, the % operator, or f-strings. In code sections where performance is not critical, or where the gain from optimizing string concatenation is very small, string formatting is recommended as the best alternative to concatenation.
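As a short illustration of the formatting options named above, the following sketch builds the same message in each style. The variable names and the message text are made up for the example:

```python
name = "World"
count = 3

# f-string formatting (Python 3.6+)
via_fstring = f"Hello, {name}! You have {count} new messages."

# str.format() method
via_format = "Hello, {}! You have {} new messages.".format(name, count)

# % operator
via_percent = "Hello, %s! You have %d new messages." % (name, count)

# Plain concatenation, for comparison; note the explicit str() conversion.
via_concat = "Hello, " + name + "! You have " + str(count) + " new messages."

print(via_fstring)
```

All four variables hold the same string, but the formatted versions keep the shape of the final message visible in one place.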