Expert Python Programming (Third Edition)

Generators and yield statements

Generators provide an elegant way to write simple and efficient code for functions that return a sequence of elements. Based on the yield statement, they allow you to pause a function and return an intermediate result. The function saves its execution context and can be resumed later, if necessary.

For instance, the function that returns consecutive numbers of the Fibonacci sequence can be written using a generator syntax. The following code is an example that was taken from the PEP 255 (Simple Generators) document:

def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

You can retrieve new values from generators as if they were iterators, using either the next() function or a for loop:

>>> fib = fibonacci()
>>> next(fib)
1
>>> next(fib)
1
>>> next(fib)
2
>>> [next(fib) for i in range(10)]
[3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

Our fibonacci() function returns a generator object, a special iterator that knows how to save its execution context. It can be iterated indefinitely, yielding the next element of the sequence each time. The syntax is concise, and the infinite nature of the algorithm does not hurt the readability of the code: the function does not have to provide a way to be stopped. In fact, it looks similar to how the sequence-generating function would be designed in pseudocode.

In many cases, the resources required to process one element are smaller than the resources required to store a whole sequence, so memory usage can be kept low, making the program more efficient. For instance, the Fibonacci sequence is infinite, and yet the generator that produces it does not require an infinite amount of memory to provide the values one by one and, theoretically, could work ad infinitum. A common use case is to stream data buffers with generators (for example, from files). They can be paused, resumed, and stopped whenever necessary at any stage of the data processing pipeline, without any need to load whole datasets into the program's memory.
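As a minimal sketch of that idea, a generator can stream a file-like object piece by piece without ever holding the whole content in memory (the read_chunks() helper here is hypothetical, not part of the standard library):

```python
import io


def read_chunks(file_obj, chunk_size=1024):
    """Yield successive chunks read from a file-like object."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            # End of stream: returning ends the generator cleanly.
            return
        yield chunk


# Works the same with open('somefile') as with this in-memory stream.
stream = io.StringIO("some example payload " * 100)
total = sum(len(chunk) for chunk in read_chunks(stream, chunk_size=256))
```

At any moment, at most one chunk lives in memory; the consumer pulls data only as fast as it can process it.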

The tokenize module from the standard library, for instance, generates tokens out of a stream of text, working on it in a line-by-line fashion:

>>> import io
>>> import tokenize
>>> code = io.StringIO("""
... if __name__ == "__main__":
...     print("hello world!")
... """)
>>> tokens = tokenize.generate_tokens(code.readline)
>>> next(tokens)
TokenInfo(type=56 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
>>> next(tokens)
TokenInfo(type=1 (NAME), string='if', start=(2, 0), end=(2, 2), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=1 (NAME), string='__name__', start=(2, 3), end=(2, 11), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=53 (OP), string='==', start=(2, 12), end=(2, 14), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=3 (STRING), string='"__main__"', start=(2, 15), end=(2, 25), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=53 (OP), string=':', start=(2, 25), end=(2, 26), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 26), end=(2, 27), line='if __name__ == "__main__":\n')
>>> next(tokens)
TokenInfo(type=5 (INDENT), string='    ', start=(3, 0), end=(3, 4), line='    print("hello world!")\n')

Here, we can see that code.readline iterates over the lines of the stream while generate_tokens processes them in a pipeline, doing some additional work. Generators can also help to break down the complexity of your code and increase the efficiency of some data transformation algorithms, provided they can be divided into separate processing steps. Thinking of each processing step as an iterator and then combining them in a high-level function is a great way to avoid big, ugly, and unreadable functions. Moreover, this can provide live feedback to the whole processing chain.

In the following example, each function defines a transformation over a sequence. They are then chained and applied together. Each call processes one element and returns its result:

def capitalize(values):
    for value in values:
        yield value.upper()


def hyphenate(values):
    for value in values:
        yield f"-{value}-"


def leetspeak(values):
    for value in values:
        if value in {'t', 'T'}:
            yield '7'
        elif value in {'e', 'E'}:
            yield '3'
        else:
            yield value


def join(values):
    return "".join(values)

Once you split your data processing pipeline into several independent steps, you can combine them in different ways:

>>> join(capitalize("This will be uppercase text"))
'THIS WILL BE UPPERCASE TEXT'
>>> join(leetspeak("This isn't a leetspeak"))
"7his isn'7 a l337sp3ak"
>>> join(hyphenate("Will be hyphenated by words".split()))
'-Will--be--hyphenated--by--words-'
>>> join(hyphenate("Will be hyphenated by character"))
'-W--i--l--l-- --b--e-- --h--y--p--h--e--n--a--t--e--d-- --b--y-- --c--h--a--r--a--c--t--e--r-'
Keep the code simple, not the data
It is better to have a lot of simple iterable functions that work over sequences of values than a complex function that computes the result for one value at a time.

Another important feature of Python generators is the ability to interact with the code that consumes them. The yield statement becomes an expression, and a value can be passed through it to the generator with a new generator method, named send():

def psychologist():
    print('Please tell me your problems')
    while True:
        answer = (yield)
        if answer is not None:
            if answer.endswith('?'):
                print("Don't ask yourself too much questions")
            elif 'good' in answer:
                print("Ahh that's good, go on")
            elif 'bad' in answer:
                print("Don't be so negative") 

Here is an example session with our psychologist() function:

>>> free = psychologist()
>>> next(free)
Please tell me your problems
>>> free.send('I feel bad')
Don't be so negative
>>> free.send("Why I shouldn't ?")
Don't ask yourself too much questions
>>> free.send("ok then i should find what is good for me")
Ahh that's good, go on

The send() method acts similarly to the next() function, but makes the yield expression return the value passed to it inside the function body. The function can, therefore, change its behavior depending on the client code. Two other methods complete this behavior: throw() and close(). They allow you to inject exceptions into the generator:

  • throw(): This allows the client code to send any kind of exception to be raised inside the generator, at the point where it is paused.
  • close(): This acts in a similar way, but raises a specific exception, GeneratorExit. In that case, the generator function must either re-raise GeneratorExit or stop iterating (that is, raise StopIteration by returning).
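As a short sketch of both methods (the guarded() function here is a hypothetical example, not from the standard library), a generator can catch an exception injected with throw() and perform cleanup when close() is called:

```python
def guarded():
    try:
        while True:
            try:
                yield "working"
            except ValueError:
                # An exception injected with throw() is raised at the
                # paused yield; we recover and keep yielding.
                print("recovered from ValueError")
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield; re-raising
        # it signals that the generator terminated cleanly.
        print("cleaning up")
        raise


gen = guarded()
first = next(gen)                # advance to the first yield
resumed = gen.throw(ValueError)  # prints "recovered from ValueError"
gen.close()                      # prints "cleaning up"
```

Because the generator swallows the ValueError and loops back to yield, throw() returns the next yielded value just as next() would.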
Generators are the basis of other concepts that are available in Python, such as coroutines and asynchronous concurrency, which are covered in Chapter 15, Concurrency.