Advanced syntax
It is hard to say objectively which elements of language syntax are advanced. For the purpose of this chapter, we will consider as advanced the elements that do not relate directly to any specific built-in datatype and that are relatively hard to grasp at first. The most common Python features that may be hard to understand are:
- Iterators
- Generators
- Decorators
- Context managers
Iterators
An iterator is nothing more than a container object that implements the iterator protocol. It is based on two methods:
- __next__: This returns the next item of the container
- __iter__: This returns the iterator itself
Iterators can be created from a sequence using the iter() built-in function. Consider the following example:
>>> i = iter('abc')
>>> next(i)
'a'
>>> next(i)
'b'
>>> next(i)
'c'
>>> next(i)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
StopIteration
When the sequence is exhausted, a StopIteration exception is raised. This makes iterators compatible with loops, which catch this exception as the signal to stop iterating. To create a custom iterator, you can write a class with a __next__ method, as long as it also provides the special method __iter__ that returns an instance of the iterator:
class CountDown:
    def __init__(self, step):
        self.step = step

    def __next__(self):
        """Return the next element."""
        if self.step <= 0:
            raise StopIteration
        self.step -= 1
        return self.step

    def __iter__(self):
        """Return the iterator itself."""
        return self
Here is an example usage of such an iterator:
>>> for element in CountDown(4):
...     print(element)
... 
3
2
1
0
Iterators themselves are a low-level feature and concept, and a program can live without them. But they provide the basis for a much more interesting feature: generators.
The yield statement
Generators provide an elegant way to write simple and efficient code for functions that return a sequence of elements. Based on the yield statement, they allow you to pause a function and return an intermediate result. The function saves its execution context and can be resumed later, if necessary.
For instance, the Fibonacci series can be written with a generator function (this is the example provided in PEP 255, the document that introduced generators):
def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b
You can retrieve new values from a generator as if it were an iterator, that is, with the next() function or a for loop:
>>> fib = fibonacci()
>>> next(fib)
1
>>> next(fib)
1
>>> next(fib)
2
>>> [next(fib) for i in range(10)]
[3, 5, 8, 13, 21, 34, 55, 89, 144, 233]
This function returns a generator object, a special kind of iterator that knows how to save its execution context. It can be called indefinitely, yielding the next element of the sequence each time. The syntax is concise, and the infinite nature of the algorithm no longer disturbs the readability of the code. The function does not have to provide a way to make itself stoppable. In fact, it looks similar to how the series would be designed in pseudocode.
Generators are not used as often as they deserve, because many developers are not used to thinking this way; they have been working with plain functions for years. Generators should be considered every time you deal with a function that returns a sequence or works in a loop. Returning the elements one at a time can improve the overall performance when they are passed to another function for further processing.
In that case, the resources needed to work out one element are usually less significant than the resources needed for the whole process, so they can be kept low, making the program more efficient. For instance, the Fibonacci sequence is infinite, and yet the generator that produces it does not require an infinite amount of memory to provide the values one at a time. A common use case is to stream data buffers with generators. The processing can be paused, resumed, and stopped by the third-party code that consumes the data, and all the data does not need to be loaded before starting the process.
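To illustrate, here is a minimal sketch, with an invented helper name and chunk size, of a generator that streams a file in fixed-size chunks so that the consumer never holds the whole stream in memory:

def read_in_chunks(file_object, chunk_size=4096):
    """Yield successive chunks read from a file-like object."""
    while True:
        chunk = file_object.read(chunk_size)
        if not chunk:
            # end of the stream was reached
            return
        yield chunk

# the consumer pulls one chunk at a time and can stop at any point
with open('hello.py') as source:
    for chunk in read_in_chunks(source):
        print(len(chunk))

The 'hello.py' file name here is only a placeholder for any readable file.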
The tokenize module from the standard library, for instance, generates tokens out of a stream of text and returns them through an iterator that can be passed along to some processing:
>>> import tokenize
>>> reader = open('hello.py').readline
>>> tokens = tokenize.generate_tokens(reader)
>>> next(tokens)
TokenInfo(type=57 (COMMENT), string='# -*- coding: utf-8 -*-', start=(1, 0), end=(1, 23), line='# -*- coding: utf-8 -*-\n')
>>> next(tokens)
TokenInfo(type=58 (NL), string='\n', start=(1, 23), end=(1, 24), line='# -*- coding: utf-8 -*-\n')
>>> next(tokens)
TokenInfo(type=1 (NAME), string='def', start=(2, 0), end=(2, 3), line='def hello_world():\n')
Here, we can see that open() iterates over the lines of the file and generate_tokens() iterates over them in a pipeline, doing additional work. Generators can also help in breaking the complexity and raising the efficiency of data transformation algorithms that are based on several sequences. Thinking of each sequence as an iterator, and then combining them into a high-level function, is a great way to avoid one big, ugly, and unreadable function. Moreover, this can provide live feedback to the whole processing chain.
In the following example, each function defines a transformation over a sequence. They are then chained and applied. Each function call processes one element and returns its result:
def power(values):
    for value in values:
        print('powering %s' % value)
        yield value

def adder(values):
    for value in values:
        print('adding to %s' % value)
        if value % 2 == 0:
            yield value + 3
        else:
            yield value + 2
Here is the possible result of using these generators together:
>>> elements = [1, 4, 7, 9, 12, 19]
>>> results = adder(power(elements))
>>> next(results)
powering 1
adding to 1
3
>>> next(results)
powering 4
adding to 4
7
>>> next(results)
powering 7
adding to 7
9
Another important feature of generators is the ability to interact with the code that drives them through the next() function. yield becomes an expression, and a value can be passed to the generator through a method called send():
def psychologist():
    print('Please tell me your problems')
    while True:
        answer = (yield)
        if answer is not None:
            if answer.endswith('?'):
                print("Don't ask yourself too much questions")
            elif 'good' in answer:
                print("Ahh that's good, go on")
            elif 'bad' in answer:
                print("Don't be so negative")
Here is an example session with our psychologist() function:
>>> free = psychologist()
>>> next(free)
Please tell me your problems
>>> free.send('I feel bad')
Don't be so negative
>>> free.send("Why I shouldn't ?")
Don't ask yourself too much questions
>>> free.send("ok then i should find what is good for me")
Ahh that's good, go on
send() acts like next(), but also makes the yield expression inside the function return the value passed to it. The function can, therefore, change its behavior depending on the client code. Two other methods complete this behavior: throw() and close(). They raise an exception inside the generator:
- throw(): This allows the client code to send any kind of exception to be raised inside the generator.
- close(): This acts in the same way, but raises a specific exception, GeneratorExit. In that case, the generator function must raise GeneratorExit again, or StopIteration (see the sketch after this list).
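Here is a minimal sketch of how close() works, assuming an invented ticker() generator; the GeneratorExit exception is injected at the point where the generator is paused, so the generator can clean up before re-raising it:

def ticker():
    try:
        while True:
            yield 'tick'
    except GeneratorExit:
        # cleanup code would go here; re-raising GeneratorExit
        # (or simply returning) lets close() finish cleanly
        print('ticker closed')
        raise

Example session:

>>> t = ticker()
>>> next(t)
'tick'
>>> t.close()
ticker closed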
Note
Generators are the basis of other concepts available in Python—coroutines and asynchronous concurrency, which are covered in Chapter 13, Concurrency.
Decorators
Decorators were added in Python to make function and method wrapping (a function that receives a function and returns an enhanced one) easier to read and understand. The original use case was to be able to mark methods as class methods or static methods at the head of their definition. Without the decorator syntax, this would require a rather sparse and repetitive definition:
class WithoutDecorators:
    def some_static_method():
        print("this is static method")
    some_static_method = staticmethod(some_static_method)

    def some_class_method(cls):
        print("this is class method")
    some_class_method = classmethod(some_class_method)
If the decorator syntax is used for the same purpose, the code is shorter and easier to understand:
class WithDecorators:
    @staticmethod
    def some_static_method():
        print("this is static method")

    @classmethod
    def some_class_method(cls):
        print("this is class method")
A decorator is generally a named object (lambda expressions are not allowed) that accepts a single argument when called (it will be the decorated function) and returns another callable object. "Callable" is used here instead of "function" deliberately. While decorators are often discussed in the scope of methods and functions, they are not limited to them. In fact, anything that is callable (any object that implements the __call__ method is considered callable) can be used as a decorator, and often the objects they return are not simple functions but instances of more complex classes that implement their own __call__ method.
The decorator syntax is simply syntactic sugar. Consider the following decorator usage:
@some_decorator
def decorated_function():
    pass
This can always be replaced by an explicit decorator call and function reassignment:
def decorated_function():
    pass
decorated_function = some_decorator(decorated_function)
However, the latter is less readable and also very hard to understand if multiple decorators are used on a single function.
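To illustrate, here is a short sketch with invented decorator names showing how stacked decorators expand; they are applied bottom-up, which is easy to miss in the explicit form:

@outer_decorator
@inner_decorator
def decorated_function():
    pass

The preceding stacked form is equivalent to the following:

def decorated_function():
    pass
decorated_function = outer_decorator(inner_decorator(decorated_function))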
Tip
A decorator does not even need to return a callable!
As a matter of fact, any function can be used as a decorator because Python does not enforce the return type of decorators. So, using as a decorator some function that accepts a single argument but does not return a callable, let's say a str, is completely valid in terms of syntax. This will eventually fail if the user tries to call an object decorated this way. Anyway, this part of the decorator syntax creates a field for some interesting experimentation.
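As an illustration of such an experiment, here is a sketch with an invented as_name() decorator that returns a plain string instead of a callable:

>>> def as_name(function):
...     # returns the function's name (a str), not a callable
...     return function.__name__
... 
>>> @as_name
... def hello():
...     pass
... 
>>> hello
'hello'
>>> hello()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: 'str' object is not callable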
There are many ways to write custom decorators, but the simplest way is to write a function that returns a subfunction that wraps the original function call.
The generic pattern is as follows:
def mydecorator(function):
    def wrapped(*args, **kwargs):
        # do some stuff before the original
        # function gets called
        result = function(*args, **kwargs)
        # do some stuff after function call and
        # return the result
        return result
    # return wrapper as a decorated function
    return wrapped
While decorators can almost always be implemented using functions, there are some situations when using user-defined classes is a better option. This is often true when the decorator needs complex parametrization or when it depends on a specific state.
The generic pattern for a nonparametrized decorator as a class is as follows:
class DecoratorAsClass:
    def __init__(self, function):
        self.function = function

    def __call__(self, *args, **kwargs):
        # do some stuff before the original
        # function gets called
        result = self.function(*args, **kwargs)
        # do some stuff after function call and
        # return the result
        return result
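Such a class is applied exactly like a function-based decorator. Here is a brief sketch with an invented function name:

@DecoratorAsClass
def function_to_decorate():
    # every call is now routed through DecoratorAsClass.__call__
    return 42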
In real code, there is often a need to use decorators that can be parametrized. When a function is used as a decorator, the solution is simple: a second level of wrapping has to be used. Here is a simple example of a decorator that repeats the execution of the decorated function a given number of times every time it is called:
def repeat(number=3):
    """Cause decorated function to be repeated a number of times.

    Last value of original function call is returned as a result.

    :param number: number of repetitions, 3 if not specified
    """
    def actual_decorator(function):
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(number):
                result = function(*args, **kwargs)
            return result
        return wrapper
    return actual_decorator
The decorator defined this way can accept parameters:
>>> @repeat(2)
... def foo():
...     print("foo")
... 
>>> foo()
foo
foo
Note that even if the parametrized decorator has default values for its arguments, the parentheses after its name are required. The correct way to use the preceding decorator with default arguments is as follows:
>>> @repeat()
... def bar():
...     print("bar")
... 
>>> bar()
bar
bar
bar
Missing these parentheses will result in the following error when the decorated function is called:
>>> @repeat
... def bar():
...     pass
... 
>>> bar()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: actual_decorator() missing 1 required positional argument: 'function'
A common pitfall when using decorators is not preserving the function metadata (mostly the docstring and original name). All the previous examples have this issue. They create a new function by composition and return a new object without any regard for the identity of the original one. This makes debugging functions decorated this way harder and will also break most auto-documentation tools, because the original docstrings and function signatures are no longer accessible.
But let's see this in detail. Assume that we have a dummy decorator that does nothing more than decorate, and a function decorated with it:
def dummy_decorator(function):
    def wrapped(*args, **kwargs):
        """Internal wrapped function documentation."""
        return function(*args, **kwargs)
    return wrapped

@dummy_decorator
def function_with_important_docstring():
    """This is important docstring we do not want to lose."""
If we inspect function_with_important_docstring() in a Python interactive session, we can notice that it has lost its original name and docstring:
>>> function_with_important_docstring.__name__
'wrapped'
>>> function_with_important_docstring.__doc__
'Internal wrapped function documentation.'
A proper solution to this problem is to use the built-in wraps() decorator provided by the functools module:
from functools import wraps

def preserving_decorator(function):
    @wraps(function)
    def wrapped(*args, **kwargs):
        """Internal wrapped function documentation."""
        return function(*args, **kwargs)
    return wrapped

@preserving_decorator
def function_with_important_docstring():
    """This is important docstring we do not want to lose."""
With the decorator defined in such a way, the important function metadata is preserved:
>>> function_with_important_docstring.__name__
'function_with_important_docstring'
>>> function_with_important_docstring.__doc__
'This is important docstring we do not want to lose.'
Since decorators are loaded by the interpreter when the module is first read, their usage should be limited to wrappers that can be generically applied. If a decorator is tied to the class of the method or to the signature of the function it enhances, it should be refactored into a regular callable to avoid complexity. In any case, when decorators are dealing with APIs, a good practice is to group them in a module that is easy to maintain.
The common patterns for decorators are:
- Argument checking
- Caching
- Proxy
- Context provider
Checking the arguments that a function receives or returns can be useful when it is executed in a specific context. For example, if a function is to be called through XML-RPC, Python will not be able to directly provide its full signature as statically typed languages can. This feature is needed to provide introspection capabilities when the XML-RPC client asks for function signatures.
Tip
The XML-RPC protocol
The XML-RPC protocol is a lightweight Remote Procedure Call protocol that uses XML over HTTP to encode its calls. It is often used instead of SOAP for simple client-server exchanges. Unlike SOAP, which provides a document that lists all callable functions (WSDL), XML-RPC does not have a directory of available functions. An extension of the protocol that allows discovering the server API was proposed, and Python's xmlrpc module implements it (refer to https://docs.python.org/3/library/xmlrpc.server.html).
A custom decorator can provide this type of signature. It can also make sure that what goes in and comes out respects the defined signature parameters:
rpc_info = {}

def xmlrpc(in_=(), out=(type(None),)):
    def _xmlrpc(function):
        # registering the signature
        func_name = function.__name__
        rpc_info[func_name] = (in_, out)

        def _check_types(elements, types):
            """Subfunction that checks the types."""
            if len(elements) != len(types):
                raise TypeError('argument count is wrong')
            typed = enumerate(zip(elements, types))
            for index, couple in typed:
                arg, of_the_right_type = couple
                if isinstance(arg, of_the_right_type):
                    continue
                raise TypeError(
                    'arg #%d should be %s' % (index, of_the_right_type))

        # wrapped function
        def __xmlrpc(*args):  # no keywords allowed
            # checking what goes in
            checkable_args = args[1:]  # removing self
            _check_types(checkable_args, in_)
            # running the function
            res = function(*args)
            # checking what goes out
            if not type(res) in (tuple, list):
                checkable_res = (res,)
            else:
                checkable_res = res
            _check_types(checkable_res, out)
            # the function and the type
            # checking succeeded
            return res
        return __xmlrpc
    return _xmlrpc
The decorator registers the function into a global dictionary and keeps a list of the types for its arguments and for the returned values. Note that the example was highly simplified to demonstrate argument-checking decorators.
A usage example is as follows:
class RPCView:
    @xmlrpc((int, int))  # two int -> None
    def meth1(self, int1, int2):
        print('received %d and %d' % (int1, int2))

    @xmlrpc((str,), (int,))  # string -> int
    def meth2(self, phrase):
        print('received %s' % phrase)
        return 12
When it is read, this class definition populates the rpc_info dictionary and can be used in a specific environment, where the argument types are checked:
>>> rpc_info
{'meth2': ((<class 'str'>,), (<class 'int'>,)), 'meth1': ((<class 'int'>, <class 'int'>), (<class 'NoneType'>,))}
>>> my = RPCView()
>>> my.meth1(1, 2)
received 1 and 2
>>> my.meth2(2)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 26, in __xmlrpc
  File "<input>", line 20, in _check_types
TypeError: arg #0 should be <class 'str'>
The caching decorator is quite similar to argument checking, but focuses on functions whose internal state does not affect the output. Each set of arguments can be linked to a unique result. This style of programming is characteristic of functional programming (refer to http://en.wikipedia.org/wiki/Functional_programming) and can be used when the set of input values is finite.
Therefore, a caching decorator can keep the output together with the arguments that were needed to compute it, and return it directly on subsequent calls. This behavior is called memoizing (refer to http://en.wikipedia.org/wiki/Memoizing) and is quite simple to implement as a decorator:
import time
import hashlib
import pickle

cache = {}

def is_obsolete(entry, duration):
    return time.time() - entry['time'] > duration

def compute_key(function, args, kw):
    key = pickle.dumps((function.__name__, args, kw))
    return hashlib.sha1(key).hexdigest()

def memoize(duration=10):
    def _memoize(function):
        def __memoize(*args, **kw):
            key = compute_key(function, args, kw)

            # do we have it already?
            if (key in cache and
                    not is_obsolete(cache[key], duration)):
                print('we got a winner')
                return cache[key]['value']

            # computing
            result = function(*args, **kw)

            # storing the result
            cache[key] = {
                'value': result,
                'time': time.time()
            }
            return result
        return __memoize
    return _memoize
A SHA hash key is built using the ordered argument values, and the result is stored in a global dictionary. The hash is made using a pickle, which is a bit of a shortcut to freeze the state of all objects passed as arguments, ensuring that all arguments are good candidates. If a thread or a socket is used as an argument, for instance, a PicklingError will occur. (Refer to https://docs.python.org/3/library/pickle.html.) The duration parameter is used to invalidate the cached value when too much time has passed since the last function call.
Here's an example of the usage:
>>> @memoize()
... def very_very_very_complex_stuff(a, b):
...     # if your computer gets too hot on this calculation
...     # consider stopping it
...     return a + b
... 
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> @memoize(1)  # invalidates the cache after 1 second
... def very_very_very_complex_stuff(a, b):
...     return a + b
... 
>>> very_very_very_complex_stuff(2, 2)
4
>>> very_very_very_complex_stuff(2, 2)
we got a winner
4
>>> cache
{'c2727f43c6e39b3694649ee0883234cf': {'value': 4, 'time': 1199734132.7102251}}
>>> time.sleep(2)
>>> very_very_very_complex_stuff(2, 2)
4
Caching expensive functions can dramatically increase the overall performance of a program, but it has to be used with care. The cached value could also be tied to the function itself to manage its scope and life cycle, instead of a centralized dictionary. But in any case, a more efficient decorator would use a specialized cache library based on an advanced caching algorithm.
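For instance, the standard library already provides such a decorator: functools.lru_cache. Here is a sketch of how the preceding example could rely on it instead of the handwritten memoize(); note that lru_cache evicts entries by least-recent use rather than by elapsed time:

from functools import lru_cache

@lru_cache(maxsize=128)
def very_very_very_complex_stuff(a, b):
    # results are cached per distinct (a, b) argument pair
    return a + b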
Note
Chapter 12, Optimization – Some Powerful Techniques, provides detailed information and techniques on caching.
Proxy decorators are used to tag and register functions with a global mechanism. For instance, a security layer that protects access to the code depending on the current user can be implemented using a centralized checker with an associated permission required by the callable:
class User(object):
    def __init__(self, roles):
        self.roles = roles

class Unauthorized(Exception):
    pass

def protect(role):
    def _protect(function):
        def __protect(*args, **kw):
            user = globals().get('user')
            if user is None or role not in user.roles:
                raise Unauthorized("I won't tell you")
            return function(*args, **kw)
        return __protect
    return _protect
This model is often used in Python web frameworks to define the security over publishable classes. For instance, Django provides decorators to secure function access.
Here's an example, where the current user is kept in a global variable. The decorator checks the user's roles when the method is accessed:
>>> tarek = User(('admin', 'user'))
>>> bill = User(('user',))
>>> class MySecrets(object):
...     @protect('admin')
...     def waffle_recipe(self):
...         print('use tons of butter!')
... 
>>> these_are = MySecrets()
>>> user = tarek
>>> these_are.waffle_recipe()
use tons of butter!
>>> user = bill
>>> these_are.waffle_recipe()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __protect
__main__.Unauthorized: I won't tell you
A context decorator makes sure that the function can run in the correct context, or run some code before and after the function. In other words, it sets and unsets a specific execution environment. For example, when a data item has to be shared among several threads, a lock has to be used to ensure that it is protected from multiple access. This lock can be coded in a decorator as follows:
from threading import RLock

lock = RLock()

def synchronized(function):
    def _synchronized(*args, **kw):
        lock.acquire()
        try:
            return function(*args, **kw)
        finally:
            lock.release()
    return _synchronized

@synchronized
def thread_safe():
    # make sure it locks the resource
    pass
Context decorators are more and more often replaced by the use of context managers (the with statement), which are also described later in this chapter.
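As a preview, here is a sketch of the same synchronization written with the lock used directly as a context manager; threading locks support the context manager protocol:

from threading import RLock

lock = RLock()

def thread_safe():
    # the lock is acquired before the block starts and
    # released on exit, even if an exception is raised
    with lock:
        pass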
Context managers – the with statement
The try...finally statement is useful to ensure some cleanup code is run even if an error is raised. There are many use cases for this, such as:
- Closing a file
- Releasing a lock
- Making a temporary code patch
- Running protected code in a special environment
The with statement factors out these use cases by providing a simple way to wrap a block of code. This allows you to call some code before and after the block's execution, even if the block raises an exception. For example, working with a file is usually done like this:
>>> hosts = open('/etc/hosts')
>>> try:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print(line.strip())
... finally:
...     hosts.close()
... 
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
By using the with statement, it can be rewritten like this:
>>> with open('/etc/hosts') as hosts:
...     for line in hosts:
...         if line.startswith('#'):
...             continue
...         print(line.strip())
... 
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
In the preceding example, open() used as a context manager ensures that the file will be closed after executing the for loop, even if an exception occurs.
Some other items that are compatible with this statement are classes from the threading module:
- threading.Lock
- threading.RLock
- threading.Condition
- threading.Semaphore
- threading.BoundedSemaphore
The general syntax for the with statement in its simplest form is:

with context_manager:
    # block of code
    ...
Additionally, if the context manager provides a context variable, it can be stored locally using the as clause:

with context_manager as context:
    # block of code
    ...
Note that multiple context managers can be used at once, as follows:
with A() as a, B() as b:
    ...
This is equivalent to nesting them, as follows:
with A() as a:
    with B() as b:
        ...
Any object that implements the context manager protocol can be used as a context manager. This protocol consists of two special methods:
- __enter__(self): More on this can be found at https://docs.python.org/3.3/reference/datamodel.html#object.__enter__
- __exit__(self, exc_type, exc_value, traceback): More on this can be found at https://docs.python.org/3.3/reference/datamodel.html#object.__exit__
In short, the execution of the with statement proceeds as follows:
- The __enter__ method is invoked. Any return value is bound to the target specified in the as clause.
- The inner block of code is executed.
- The __exit__ method is invoked.
__exit__ receives three arguments that are filled when an error occurs within the code block. If no error occurs, all three arguments are set to None. When an error occurs, __exit__ should not re-raise it, as this is the responsibility of the caller. It can prevent the exception from being raised, though, by returning True. This is provided to implement some specific use cases, such as the contextmanager decorator that we will see in the next section. But for most use cases, the right behavior for this method is to do some cleanup, as would be done by the finally clause; no matter what happens in the block, it does not return anything.
The following is an example of a context manager that implements this protocol, to better illustrate how it works:
class ContextIllustration:
    def __enter__(self):
        print('entering context')

    def __exit__(self, exc_type, exc_value, traceback):
        print('leaving context')
        if exc_type is None:
            print('with no error')
        else:
            print('with an error (%s)' % exc_value)
When run without exceptions raised, the output is as follows:
>>> with ContextIllustration():
...     print("inside")
... 
entering context
inside
leaving context
with no error
When an exception is raised, the output is as follows:
>>> with ContextIllustration():
...     raise RuntimeError("raised within 'with'")
... 
entering context
leaving context
with an error (raised within 'with')
Traceback (most recent call last):
  File "<input>", line 2, in <module>
RuntimeError: raised within 'with'
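To illustrate the exception-suppressing behavior mentioned earlier, here is a minimal sketch with an invented Suppressor class whose __exit__ returns True:

class Suppressor:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # returning True prevents the exception from propagating
        return True

Used in a with statement, the raised exception simply disappears:

>>> with Suppressor():
...     raise RuntimeError("this will not propagate")
... 
>>>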
Using classes seems to be the most flexible way to implement any protocol provided in the Python language, but it may be too much boilerplate for many use cases. The contextlib module was added to the standard library to provide helpers for working with context managers. The most useful part of it is the contextmanager decorator. It allows you to provide both the __enter__ and __exit__ parts in a single function, separated by a yield statement (note that this makes the function a generator). The previous example written with this decorator would look like the following code:
from contextlib import contextmanager

@contextmanager
def context_illustration():
    print('entering context')
    try:
        yield
    except Exception as e:
        print('leaving context')
        print('with an error (%s)' % e)
        # exception needs to be reraised
        raise
    else:
        print('leaving context')
        print('with no error')
If any exception occurs, the function needs to re-raise it in order to pass it along. Note that context_illustration could take some arguments, if needed, as long as they are provided in the call. This small helper simplifies the normal class-based context API exactly as generators do with the class-based iterator API.
The three other helpers provided by this module are:
- closing(element): This returns a context manager that calls the element's close() method on exit. This is useful for classes that deal with streams, for instance (see the sketch after this list).
- suppress(*exceptions): This suppresses any of the specified exceptions if they occur in the body of the with statement.
- redirect_stdout(new_target) and redirect_stderr(new_target): These redirect the sys.stdout or sys.stderr output of any code within the block to another file or file-like object.
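As a short illustration of the first two helpers, here is a sketch; the URL and file name are only placeholders:

import os
from contextlib import closing, suppress
from urllib.request import urlopen

# closing() guarantees that page.close() is called on exit
with closing(urlopen('http://www.python.org')) as page:
    content = page.read()

# suppress() silently ignores a missing file
with suppress(FileNotFoundError):
    os.remove('somefile.tmp')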