
Advanced syntax

It is hard to tell objectively which elements of a language's syntax are advanced. For the purpose of this chapter on advanced syntax elements, we will consider the elements that do not directly relate to any specific built-in datatypes and that are relatively hard to grasp at the beginning. The most common Python features that may be hard to understand are:

  • Iterators
  • Generators
  • Decorators
  • Context managers

Iterators

An iterator is nothing more than a container object that implements the iterator protocol. It is based on two methods:

  • __next__: This returns the next item of the container
  • __iter__: This returns the iterator itself

Iterators can be created from a sequence using the iter built-in function. Consider the following example:

>>> i = iter('abc')
>>> next(i)
'a'
>>> next(i)
'b'
>>> next(i)
'c'
>>> next(i)
Traceback (most recent call last):
 File "<input>", line 1, in <module>
StopIteration

When the sequence is exhausted, a StopIteration exception is raised. This makes iterators compatible with loops, since loops catch this exception to stop cycling. To create a custom iterator, you can write a class with a __next__ method, as long as it also provides the special method __iter__ that returns an instance of the iterator:

class CountDown:
    def __init__(self, step):
        self.step = step

    def __next__(self):
        """Return the next element."""
        if self.step <= 0:
            raise StopIteration
        self.step -= 1
        return self.step

    def __iter__(self):
        """Return the iterator itself."""
        return self

Here is an example usage of such an iterator:

>>> for element in CountDown(4):
...     print(element)
... 
3
2
1
0

Iterators themselves are a low-level feature and concept, and a program can live without them. But they provide the base for a much more interesting feature, generators.

The yield statement

Generators provide an elegant way to write simple and efficient code for functions that return a sequence of elements. Based on the yield statement, they allow you to pause a function and return an intermediate result. The function saves its execution context and can be resumed later, if necessary.

For instance, the Fibonacci series can be written with a generator (this is the example provided in PEP 255, the document that introduced generators):

def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

You can retrieve new values from a generator as if it were an iterator, using the next() function or for loops:

>>> fib = fibonacci()
>>> next(fib)
1
>>> next(fib)
1
>>> next(fib)
2
>>> [next(fib) for i in range(10)]
[3, 5, 8, 13, 21, 34, 55, 89, 144, 233]

This function returns a generator object, a special kind of iterator that knows how to save its execution context. It can be called indefinitely, yielding the next element of the series each time. The syntax is concise, and the infinite nature of the algorithm no longer disturbs the readability of the code. It does not have to provide a way to make the function stoppable. In fact, it looks similar to how the series would be designed in pseudocode.

In the community, generators are not used that often because developers are not used to thinking this way; they have been working with plain functions for years. Generators should be considered every time you deal with a function that returns a sequence or works in a loop. Returning the elements one at a time can improve the overall performance when they are passed to another function for further work.

In that case, the resources required to work out one element are usually far smaller than the resources required for the whole process, so they can be kept low, making the program more efficient. For instance, the Fibonacci sequence is infinite, and yet the generator that produces it does not require an infinite amount of memory to provide the values one at a time. A common use case is to stream data buffers with generators. They can be paused, resumed, and stopped by the third-party code that consumes the data, and all the data does not need to be loaded before starting the process.
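The following is a minimal sketch of such streaming (the read_in_chunks helper is made up for this example, and a BytesIO object stands in for a large file opened in binary mode). The generator lazily yields the content in fixed-size chunks, so the consumer can start working before everything has been read and can stop early at any time:

from io import BytesIO


def read_in_chunks(file_object, chunk_size=1024):
    """Lazily read a file-like object chunk by chunk."""
    while True:
        chunk = file_object.read(chunk_size)
        if not chunk:
            # end of the stream reached, stop the generator
            return
        yield chunk


stream = BytesIO(b'x' * 3000)  # stands in for a real file object
for chunk in read_in_chunks(stream):
    print(len(chunk))  # prints 1024, 1024, 952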

The tokenize module from the standard library, for instance, generates tokens out of a stream of text and returns them through an iterator that can be passed along to some processing:

>>> import tokenize
>>> reader = open('hello.py').readline
>>> tokens = tokenize.generate_tokens(reader)
>>> next(tokens)
TokenInfo(type=57 (COMMENT), string='# -*- coding: utf-8 -*-', start=(1, 0), end=(1, 23), line='# -*- coding: utf-8 -*-\n')
>>> next(tokens)
TokenInfo(type=58 (NL), string='\n', start=(1, 23), end=(1, 24), line='# -*- coding: utf-8 -*-\n')
>>> next(tokens)
TokenInfo(type=1 (NAME), string='def', start=(2, 0), end=(2, 3), line='def hello_world():\n')

Here, we can see that generate_tokens repeatedly calls the readline function of the open file and processes the lines in a pipeline, doing additional work. Generators can also help in breaking the complexity and raising the efficiency of some data transformation algorithms that are based on several sequences. Thinking of each sequence as an iterator, and then combining them into a high-level function, is a great way to avoid a big, ugly, and unreadable function. Moreover, this can provide live feedback to the whole processing chain.

In the following example, each function defines a transformation over a sequence. They are then chained and applied. Each function call processes one element and returns its result:

def power(values):
    for value in values:
        print('powering %s' % value)
        yield value


def adder(values):
    for value in values:
        print('adding to %s' % value)
        if value % 2 == 0:
            yield value + 3
        else:
            yield value + 2

Here is the possible result of using these generators together:

>>> elements = [1, 4, 7, 9, 12, 19]
>>> results = adder(power(elements))
>>> next(results)
powering 1
adding to 1
3
>>> next(results)
powering 4
adding to 4
7
>>> next(results)
powering 7
adding to 7
9

Tip

Keep the code simple, not the data

It is better to have a lot of simple iterable functions that work over sequences of values than a complex function that computes the result for the entire collection at once.

Another important feature available in Python regarding generators is the ability to interact with the code that calls them. yield becomes an expression, and a value can be passed into the generator with a new method called send():

def psychologist():
    print('Please tell me your problems')
    while True:
        answer = (yield)
        if answer is not None:
            if answer.endswith('?'):
                print("Don't ask yourself too much questions")
            elif 'good' in answer:
                print("Ahh that's good, go on")
            elif 'bad' in answer:
                print("Don't be so negative")

Here is an example session with our psychologist() function:

>>> free = psychologist()
>>> next(free)
Please tell me your problems
>>> free.send('I feel bad')
Don't be so negative
>>> free.send("Why I shouldn't ?")
Don't ask yourself too many questions
>>> free.send("ok then i should find what is good for me")
Ahh that's good, go on

send() acts like next(), but makes the yield expression inside the function definition return the value passed to it. The function can, therefore, change its behavior depending on the client code. Two other methods were added to complete this behavior: throw and close. They raise an exception inside the generator:

  • throw: This allows the client code to send any kind of exception to be raised.
  • close: This acts in the same way, but raises a specific exception, GeneratorExit. In that case, the generator function must raise GeneratorExit again, or StopIteration. Both methods are shown in the sketch after this list.
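Here is a minimal sketch of both methods in action (the with_cleanup generator is made up for illustration). close() triggers GeneratorExit inside the generator, which gives it a chance to clean up, while throw() delivers an arbitrary exception to the paused yield expression:

def with_cleanup():
    try:
        while True:
            yield
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield; run the
        # cleanup code and re-raise it (returning would also be valid)
        print('cleaning up')
        raise


gen = with_cleanup()
next(gen)    # advance the generator to its first yield
gen.close()  # prints 'cleaning up' and terminates the generator

gen = with_cleanup()
next(gen)
try:
    # the exception is raised inside the generator; since it is not
    # handled there, it propagates back to the caller
    gen.throw(ValueError('sent from the caller'))
except ValueError as error:
    print('caller got back:', error)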

Note

Generators are the basis of other concepts available in Python—coroutines and asynchronous concurrency, which are covered in Chapter 13, Concurrency.

Decorators

Decorators were added in Python to make function and method wrapping (a function that receives a function and returns an enhanced one) easier to read and understand. The original use case was to be able to define methods as class methods or static methods at the head of their definition. Without the decorator syntax, this would require a rather sparse and repetitive definition:

class WithoutDecorators:
    def some_static_method():
        print("this is static method")
    some_static_method = staticmethod(some_static_method)
    
    def some_class_method(cls):
        print("this is class method")
    some_class_method = classmethod(some_class_method)

If the decorator syntax is used for the same purpose, the code is shorter and easier to understand:

class WithDecorators:
    @staticmethod
    def some_static_method():
        print("this is static method")
    
    @classmethod
    def some_class_method(cls):
        print("this is class method")

General syntax and possible implementations

The decorator is generally a named object (lambda expressions are not allowed) that accepts a single argument when called (it will be the decorated function) and returns another callable object. "Callable" is used here instead of "function" deliberately. While decorators are often discussed in the scope of methods and functions, they are not limited to them. In fact, anything that is callable (any object that implements the __call__ method is considered callable) can be used as a decorator, and often the objects returned by decorators are not simple functions but instances of more complex classes that implement their own __call__ method.

The decorator syntax is only syntactic sugar. Consider the following decorator usage:

@some_decorator
def decorated_function():
    pass

This can always be replaced by an explicit decorator call and function reassignment:

def decorated_function():
    pass
decorated_function = some_decorator(decorated_function)

However, the latter is less readable and also very hard to understand if multiple decorators are used on a single function.
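For instance, with two hypothetical decorators (outer_decorator and inner_decorator are placeholder names), the stacked form:

@outer_decorator
@inner_decorator
def decorated_function():
    pass

is just a more readable spelling of the following nested calls, which are applied bottom-up:

def decorated_function():
    pass
decorated_function = outer_decorator(inner_decorator(decorated_function))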

Tip

A decorator does not even need to return a callable!

As a matter of fact, any function can be used as a decorator because Python does not enforce the return type of decorators. So, using some function that accepts a single argument but does not return a callable (let's say it returns a str) as a decorator is completely valid in terms of syntax. This will fail only if the user tries to call the object decorated this way. Anyway, this part of the decorator syntax creates a field for some interesting experimentation.
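A contrived sketch of this (not_a_decorator and do_nothing are invented names) replaces the function with its docstring; the decoration itself works, but calling the result fails:

def not_a_decorator(function):
    # returns a plain string instead of a callable
    return function.__doc__ or ''


@not_a_decorator
def do_nothing():
    """I am just a docstring now."""


print(do_nothing)    # prints: I am just a docstring now.
# do_nothing()       # would raise TypeError: 'str' object is not callable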

As a function

There are many ways to write custom decorators, but the simplest way is to write a function that returns a subfunction that wraps the original function call.

The generic pattern is as follows:

def mydecorator(function):
    def wrapped(*args, **kwargs):     
        # do some stuff before the original
        # function gets called
        result = function(*args, **kwargs)
        # do some stuff after function call and
        # return the result
        return result
    # return wrapped as the decorated function
    return wrapped
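Applied to a function, this pattern is used as follows (greet is a made-up function for illustration; since the wrapper above simply forwards the call, the decorated function behaves like the original):

@mydecorator
def greet(name):
    return 'hello, %s' % name


print(greet('world'))  # the wrapped function runs and returns 'hello, world'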

As a class

While decorators almost always can be implemented using functions, there are some situations when using user-defined classes is a better option. This is often true when the decorator needs complex parametrization or it depends on a specific state.

The generic pattern for a nonparametrized decorator as a class is as follows:

class DecoratorAsClass:
    def __init__(self, function):
        self.function = function

    def __call__(self, *args, **kwargs):
        # do some stuff before the original
        # function gets called
        result = self.function(*args, **kwargs)
        # do some stuff after function call and
        # return the result
        return result
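A usage sketch for the class-based variant (multiply is made up for illustration); decorating instantiates the class with the function, and every subsequent call goes through __call__:

@DecoratorAsClass
def multiply(a, b):
    return a * b


print(multiply(2, 3))  # __call__ forwards the call and returns 6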

Parametrizing decorators

In real code, there is often a need to use decorators that can be parametrized. When the function is used as a decorator, then the solution is simple—a second level of wrapping has to be used. Here is a simple example of the decorator that repeats the execution of a decorated function the specified number of times every time it is called:

def repeat(number=3):
    """Cause decorated function to be repeated a number of times.
    
    Last value of original function call is returned as a result
    :param number: number of repetitions, 3 if not specified
    """
    def actual_decorator(function):
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(number):
                result = function(*args, **kwargs)
            return result
        return wrapper
    return actual_decorator

The decorator defined this way can accept parameters:

>>> @repeat(2)
... def foo():
... print("foo")
... 
>>> foo()
foo
foo

Note that even if the parametrized decorator has default values for its arguments, the parentheses after its name are required. The correct way to use the preceding decorator with default arguments is as follows:

>>> @repeat()
... def bar():
... print("bar")
... 
>>> bar()
bar
bar
bar

Missing these parentheses will result in the following error when the decorated function is called:

>>> @repeat
... def bar():
...     pass
... 
>>> bar()
Traceback (most recent call last):
 File "<input>", line 1, in <module>
TypeError: actual_decorator() missing 1 required positional
argument: 'function'

Introspection preserving decorators

A common pitfall of using decorators is not preserving function metadata (mostly the docstring and the original name). All the previous examples have this issue. They create a new function by composition and return a new object without any respect for the identity of the original one. This makes debugging functions decorated that way harder and will also break most of the auto-documentation tools that may be used, because the original docstrings and function signatures are no longer accessible.

But let's see this in detail. Assume that we have a dummy decorator that does nothing more than decorating, and a function decorated with it:

def dummy_decorator(function):
    def wrapped(*args, **kwargs):
        """Internal wrapped function documentation."""
        return function(*args, **kwargs)
    return wrapped


@dummy_decorator
def function_with_important_docstring():
    """This is important docstring we do not want to lose."""

If we inspect function_with_important_docstring() in a Python interactive session, we can notice that it has lost its original name and docstring:

>>> function_with_important_docstring.__name__
'wrapped'
>>> function_with_important_docstring.__doc__
'Internal wrapped function documentation.'

A proper solution to this problem is to use the built-in wraps() decorator provided by the functools module:

from functools import wraps


def preserving_decorator(function):
    @wraps(function)
    def wrapped(*args, **kwargs):
        """Internal wrapped function documentation."""
        return function(*args, **kwargs)
    return wrapped


@preserving_decorator
def function_with_important_docstring():
    """This is important docstring we do not want to lose."""

With the decorator defined in such a way, the important function metadata is preserved:

>>> function_with_important_docstring.__name__
'function_with_important_docstring'
>>> function_with_important_docstring.__doc__
'This is important docstring we do not want to lose.'

Usage and useful examples

Since decorators are loaded by the interpreter when the module is first read, their usage should be limited to wrappers that can be generically applied. If a decorator is tied to the method's class or to the function's signature it enhances, it should be refactored into a regular callable to avoid complexity. In any case, when the decorators are dealing with APIs, a good practice is to group them in a module that is easy to maintain.

The common patterns for decorators are:

  • Argument checking
  • Caching
  • Proxy
  • Context provider

Argument checking

Checking the arguments that a function receives or returns can be useful when it is executed in a specific context. For example, if a function is to be called through XML-RPC, Python will not be able to directly provide its full signature as in the statically-typed languages. This feature is needed to provide introspection capabilities, when the XML-RPC client asks for the function signatures.

Tip

The XML-RPC protocol

The XML-RPC protocol is a lightweight Remote Procedure Call protocol that uses XML over HTTP to encode its calls. It is often used instead of SOAP for simple client-server exchanges. Unlike SOAP, which provides a page that lists all callable functions (WSDL), XML-RPC does not have a directory of available functions. An extension of the protocol that allows discovering the server API was proposed, and Python's xmlrpc module implements it (refer to https://docs.python.org/3/library/xmlrpc.server.html).

A custom decorator can provide this type of signature. It can also make sure that what goes in and comes out respects the defined signature parameters:

rpc_info = {}


def xmlrpc(in_=(), out=(type(None),)):
    def _xmlrpc(function):
        # registering the signature
        func_name = function.__name__
        rpc_info[func_name] = (in_, out)
        def _check_types(elements, types):
            """Subfunction that checks the types."""
            if len(elements) != len(types):
                raise TypeError('argument count is wrong')
            typed = enumerate(zip(elements, types))
            for index, couple in typed:
                arg, of_the_right_type = couple
                if isinstance(arg, of_the_right_type):
                    continue
                raise TypeError(
                    'arg #%d should be %s' % (index, of_the_right_type))

        # wrapped function
        def __xmlrpc(*args):  # no keywords allowed
            # checking what goes in
            checkable_args = args[1:]  # removing self
            _check_types(checkable_args, in_)
            # running the function
            res = function(*args)
            # checking what goes out
            if not type(res) in (tuple, list):
                checkable_res = (res,)
            else:
                checkable_res = res
            _check_types(checkable_res, out)

            # the function and the type
            # checking succeeded
            return res
        return __xmlrpc
    return _xmlrpc

The decorator registers the function into a global dictionary and keeps a list of the types for its arguments and for the returned values. Note that the example was highly simplified to demonstrate argument-checking decorators.

A usage example is as follows:

class RPCView:
    @xmlrpc((int, int))  # two int -> None
    def meth1(self, int1, int2):
        print('received %d and %d' % (int1, int2))

    @xmlrpc((str,), (int,))  # string -> int
    def meth2(self, phrase):
        print('received %s' % phrase)
        return 12

When it is read, this class definition populates the rpc_info dictionary and can be used in a specific environment, where the argument types are checked:

>>> rpc_info
{'meth2': ((<class 'str'>,), (<class 'int'>,)), 'meth1': ((<class 'int'>, <class 'int'>), (<class 'NoneType'>,))}
>>> my = RPCView()
>>> my.meth1(1, 2)
received 1 and 2
>>> my.meth2(2)
Traceback (most recent call last):
 File "<input>", line 1, in <module>
 File "<input>", line 26, in __xmlrpc
 File "<input>", line 20, in _check_types
TypeError: arg #0 should be <class 'str'>

Caching

The caching decorator is quite similar to argument checking, but focuses on functions whose output depends only on their arguments and not on any internal state. Each set of arguments can be linked to a unique result. This style of programming is characteristic of functional programming (refer to http://en.wikipedia.org/wiki/Functional_programming) and can be used when the set of input values is finite.

Therefore, a caching decorator can keep the output together with the arguments that were needed to compute it, and return it directly on subsequent calls. This behavior is called memoizing (refer to http://en.wikipedia.org/wiki/Memoizing) and is quite simple to implement as a decorator:

import time
import hashlib
import pickle

cache = {}


def is_obsolete(entry, duration):
    return time.time() - entry['time'] > duration


def compute_key(function, args, kw):
    key = pickle.dumps((function.__name__, args, kw))
    return hashlib.sha1(key).hexdigest()


def memoize(duration=10):
    def _memoize(function):
        def __memoize(*args, **kw):
            key = compute_key(function, args, kw)

            # do we have it already ?
            if (key in cache and
                not is_obsolete(cache[key], duration)):
                print('we got a winner')
                return cache[key]['value']

            # computing
            result = function(*args, **kw)
            # storing the result
            cache[key] = {
                'value': result,
                'time': time.time()
            }
            return result
        return __memoize
    return _memoize

A SHA hash key is built using the ordered argument values, and the result is stored in a global dictionary. The hash is made using a pickle, which is a bit of a shortcut to freeze the state of all objects passed as arguments, ensuring that all arguments are good candidates. If a thread or a socket is used as an argument, for instance, a PicklingError will occur. (Refer to https://docs.python.org/3/library/pickle.html.) The duration parameter is used to invalidate the cached value when too much time has passed since it was stored.

Here's an example of the usage:

>>> @memoize()
... def very_very_very_complex_stuff(a, b):
...     # if your computer gets too hot on this calculation
...     # consider stopping it
...     return a + b
...
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> @memoize(1) # invalidates the cache after 1 second
... def very_very_very_complex_stuff(a, b):
...     return a + b
...
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> cache
{'c2727f43c6e39b3694649ee0883234cf': {'value': 4, 'time':
1199734132.7102251}}
>>> time.sleep(2)
>>> very_very_very_complex_stuff(2, 2)
4

Caching expensive functions can dramatically increase the overall performance of a program, but it has to be used with care. The cached value could also be tied to the function itself to manage its scope and life cycle, instead of a centralized dictionary. But in any case, a more efficient decorator would use a specialized cache library based on an advanced caching algorithm.
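For many simple cases, the standard library already ships such a tool: the functools.lru_cache decorator memoizes results per function with a bounded cache size. Here is a minimal sketch (the fibonacci_number function is only an example):

from functools import lru_cache


@lru_cache(maxsize=128)
def fibonacci_number(n):
    """Naive recursive Fibonacci, made fast by memoization."""
    if n < 2:
        return n
    return fibonacci_number(n - 1) + fibonacci_number(n - 2)


print(fibonacci_number(100))  # returns instantly instead of recursing exponentially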

Note

Chapter 12, Optimization – Some Powerful Techniques, provides detailed information and techniques on caching.

Proxy

Proxy decorators are used to tag and register functions with a global mechanism. For instance, a security layer that protects access to the code, depending on the current user, can be implemented using a centralized checker with an associated permission required by the callable:

class User(object):
    def __init__(self, roles):
        self.roles = roles


class Unauthorized(Exception):
    pass


def protect(role):
    def _protect(function):
        def __protect(*args, **kw):
            user = globals().get('user')
            if user is None or role not in user.roles:
                raise Unauthorized("I won't tell you")
            return function(*args, **kw)
        return __protect
    return _protect

This model is often used in Python web frameworks to define the security over publishable classes. For instance, Django provides decorators to secure function access.

Here's an example, where the current user is kept in a global variable. The decorator checks that user's roles when the method is accessed:

>>> tarek = User(('admin', 'user'))
>>> bill = User(('user',))
>>> class MySecrets(object):
...     @protect('admin')
...     def waffle_recipe(self):
...         print('use tons of butter!')
...
>>> these_are = MySecrets()
>>> user = tarek
>>> these_are.waffle_recipe()
use tons of butter!
>>> user = bill
>>> these_are.waffle_recipe()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 7, in wrap
__main__.Unauthorized: I won't tell you

Context provider

A context decorator makes sure that the function can run in the correct context, or run some code before and after the function. In other words, it sets and unsets a specific execution environment. For example, when a data item has to be shared among several threads, a lock has to be used to ensure that it is protected from multiple access. This lock can be coded in a decorator as follows:

from threading import RLock
lock = RLock()


def synchronized(function):
    def _synchronized(*args, **kw):
        lock.acquire()
        try:
            return function(*args, **kw)
        finally:
            lock.release()
    return _synchronized


@synchronized
def thread_safe():  # make sure it locks the resource
    pass

Context decorators are more and more often replaced by the usage of context managers (the with statement), which are also described later in this chapter.
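For comparison, lock objects from the threading module are themselves context managers, so the same protection can be sketched with the with statement instead of a decorator (thread_safe is again just a placeholder function):

from threading import RLock

lock = RLock()


def thread_safe():
    # the lock is acquired before the block starts and released when it
    # ends, even if an exception is raised inside it
    with lock:
        pass  # work with the shared resource here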

Context managers – the with statement

The try...finally statement is useful to ensure some cleanup code is run even if an error is raised. There are many use cases for this, such as:

  • Closing a file
  • Releasing a lock
  • Making a temporary code patch
  • Running protected code in a special environment

The with statement factors out these use cases by providing a simple way to wrap a block of code. This allows you to call some code before and after block execution even if this block raises an exception. For example, working with a file is usually done like this:

>>> hosts = open('/etc/hosts')
>>> try:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print(line.strip())
... finally:
...     hosts.close()
...
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost

Note

This example is specific to Linux since it reads the hosts file located in /etc, but any text file could have been used here in the same way.

By using the with statement, it can be rewritten like this:

>>> with open('/etc/hosts') as hosts:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print(line.strip())
...
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost

In the preceding example, open used as a context manager ensures that the file will be closed after executing the for loop, even if an exception occurs.

Some other items that are compatible with this statement are classes from the threading module:

  • threading.Lock
  • threading.RLock
  • threading.Condition
  • threading.Semaphore
  • threading.BoundedSemaphore

General syntax and possible implementations

The general syntax for the with statement in the simplest form is:

with context_manager:
    # block of code
    ...

Additionally, if the context manager provides a context variable, it can be stored locally using the as clause:

with context_manager as context:
    # block of code
    ...

Note that multiple context managers can be used at once, as follows:

with A() as a, B() as b:
    ...

This is equivalent to nesting them, as follows:

with A() as a:
    with B() as b:
        ...

As a class

Any object that implements the context manager protocol can be used as a context manager. This protocol consists of two special methods:

  • __enter__(self): This is invoked when entering the with block and may return a context variable
  • __exit__(self, exc_type, exc_value, traceback): This is invoked when leaving the with block and is responsible for any cleanup

In short, the execution of the with statement proceeds as follows:

  1. The __enter__ method is invoked. Any return value is bound to the target specified in the as clause.
  2. The inner block of code is executed.
  3. The __exit__ method is invoked.

__exit__ receives three arguments that are filled when an error occurs within the code block. If no error occurs, all three arguments are set to None. When an error occurs, __exit__ should not re-raise it, as this is the responsibility of the caller. It can prevent the exception being raised though, by returning True. This is provided to implement some specific use cases, such as the contextmanager decorator that we will see in the next section. But for most use cases, the right behavior for this method is to do some cleaning, like what would be done by the finally clause; no matter what happens in the block, it does not return anything.
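To illustrate that suppression capability, here is a minimal sketch of a context manager that swallows a chosen exception type by returning True from __exit__ (ErrorSilencer is an invented name):

class ErrorSilencer:
    def __init__(self, exception_type):
        self.exception_type = exception_type

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # returning True tells Python not to re-raise the exception
        return exc_type is not None and issubclass(exc_type, self.exception_type)


with ErrorSilencer(ZeroDivisionError):
    1 / 0
print('still running')  # the ZeroDivisionError was swallowed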

The following is an example of some context manager that implements this protocol to better illustrate how it works:

class ContextIllustration:
    def __enter__(self):
        print('entering context')

    def __exit__(self, exc_type, exc_value, traceback):
        print('leaving context')

        if exc_type is None:
            print('with no error')
        else:
            print('with an error (%s)' % exc_value)

When run without exceptions raised, the output is as follows:

>>> with ContextIllustration():
... print("inside")
... 
entering context
inside
leaving context
with no error

When the exception is raised, the output is as follows:

>>> with ContextIllustration():
...     raise RuntimeError("raised within 'with'")
... 
entering context
leaving context
with an error (raised within 'with')
Traceback (most recent call last):
 File "<input>", line 2, in <module>
RuntimeError: raised within 'with'

As a function – the contextlib module

Using classes seems to be the most flexible way to implement any protocol provided in the Python language, but it may be too much boilerplate for many use cases. The contextlib module was added to the standard library to provide helpers that can be used with context managers. The most useful part of it is the contextmanager decorator. It allows you to provide both the __enter__ and __exit__ parts in a single function, separated by a yield statement (note that this makes the function a generator). The previous example written with this decorator would look like the following code:

from contextlib import contextmanager

@contextmanager
def context_illustration():
    print('entering context')

    try:
        yield
    except Exception as e:
        print('leaving context')
        print('with an error (%s)' % e)
        # exception needs to be reraised
        raise
    else:
        print('leaving context')
        print('with no error')

If any exception occurs, the function needs to re-raise it in order to pass it along. Note that context_illustration could take some arguments if needed, as long as they are provided in the call. This small helper simplifies the normal class-based context manager API exactly as generators do with the class-based iterator API.
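For instance, here is a minimal sketch of a parametrized function-based context manager (the tag name and its argument are invented for this example):

from contextlib import contextmanager


@contextmanager
def tag(name):
    # everything before yield plays the role of __enter__
    print('<%s>' % name)
    try:
        yield
    finally:
        # everything after yield plays the role of __exit__
        print('</%s>' % name)


with tag('div'):
    print('some content')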

The three other helpers provided by this module are:

  • closing(element): This returns a context manager that calls the element's close method on exit. This is useful for classes that deal with streams, for instance.
  • suppress(*exceptions): This suppresses any of the specified exceptions if they occur in the body of the with statement (both closing() and suppress() are shown in the sketch after this list).
  • redirect_stdout(new_target) and redirect_stderr(new_target): These redirect the sys.stdout or sys.stderr output of any code within the block to another file or file-like object.
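Here is a short sketch of the first two helpers in use (the Stream class and the missing file name are made up for the example):

from contextlib import closing, suppress


class Stream:
    """A made-up class that has a close() method but no context manager protocol."""
    def read(self):
        return 'data'

    def close(self):
        print('stream closed')


# closing() guarantees that close() is called when the block ends
with closing(Stream()) as stream:
    print(stream.read())

# suppress() silently ignores the listed exceptions raised inside the block
with suppress(FileNotFoundError):
    open('this_file_does_not_exist.txt')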