Application-level isolation of Python environments
Nowadays, many operating systems ship Python as a standard component. Most Linux distributions and Unix-based systems such as FreeBSD, NetBSD, OpenBSD, or OS X either come with Python installed by default or make it available through system package repositories. Many of them even use it in core components: Python powers the installers of Ubuntu (Ubiquity), Red Hat Linux (Anaconda), and Fedora (Anaconda again).
Because of this, many packages from PyPI are also available as native packages managed by the system's package management tools, such as apt-get (Debian, Ubuntu), rpm (Red Hat Linux), or emerge (Gentoo). Keep in mind, however, that the list of libraries available this way is very limited and that they are mostly outdated compared to PyPI. This is why PyPA (Python Packaging Authority) recommends always using pip to obtain new packages in their latest versions. Although pip is an independent package, it has been bundled by default with every CPython release starting from versions 2.7.9 and 3.4. Installing a new package is as simple as this:
pip install <package-name>
Among other features, pip allows forcing specific versions of packages (using the pip install package-name==version syntax) and upgrading to the latest version available (using the --upgrade switch). The full usage description for most of the command-line tools presented in the book can be obtained by running the command with the -h or --help switch, but here is an example session that demonstrates the most commonly used options:
$ pip show pip
---
Metadata-Version: 2.0
Name: pip
Version: 7.1.2
Summary: The PyPA recommended tool for installing Python packages.
Home-page: https://pip.pypa.io/
Author: The pip developers
Author-email: python-virtualenv@groups.google.com
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires:

$ pip install 'pip<7.0.0'
Collecting pip<7.0.0
  Downloading pip-6.1.1-py2.py3-none-any.whl (1.1MB)
    100% |████████████████████████████████| 1.1MB 242kB/s
Installing collected packages: pip
  Found existing installation: pip 7.1.2
    Uninstalling pip-7.1.2:
      Successfully uninstalled pip-7.1.2
Successfully installed pip-6.1.1
You are using pip version 6.1.1, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

$ pip install --upgrade pip
You are using pip version 6.1.1, however version 7.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Collecting pip
  Using cached pip-7.1.2-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 6.1.1
    Uninstalling pip-6.1.1:
      Successfully uninstalled pip-6.1.1
Successfully installed pip-7.1.2
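When pip has to be driven from a script rather than typed by hand, the same switches can be assembled programmatically. The following is only a minimal sketch: pip_install_command is a hypothetical helper introduced for illustration (it is not part of pip), and it merely builds the argument list without running anything.

```python
import sys


def pip_install_command(package, version=None, upgrade=False):
    """Build the argument list for a `pip install` call.

    Hypothetical helper for illustration; it is not part of pip.
    """
    # Running pip as a module ties it to the current interpreter.
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    # `name==version` forces an exact release; a bare name lets pip choose.
    requirement = "{}=={}".format(package, version) if version else package
    command.append(requirement)
    return command


if __name__ == "__main__":
    print(pip_install_command("pip", upgrade=True))
    print(pip_install_command("falcon", version="0.3.0"))
```

The resulting list can be passed to subprocess.run(command, check=True) when the installation should actually be performed.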
In some cases, pip may not be available by default. From Python 3.4 onwards (and also in Python 2.7.9), it can always be bootstrapped using the ensurepip module:
$ python -m ensurepip
Ignoring indexes: https://pypi.python.org/simple
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/site-packages
Collecting pip
Installing collected packages: pip
Successfully installed pip-6.1.1
The most up-to-date information on how to install pip for older Python versions is available on the project's documentation page at https://pip.pypa.io/en/stable/installing/.
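Whether bootstrapping is needed at all can be checked from Python itself. The sketch below assumes Python 3.4 or newer; pip_is_available and ensurepip_command are hypothetical helper names, and the second one only builds the command line shown above instead of executing it.

```python
import importlib.util
import sys


def pip_is_available():
    """Return True if the current interpreter can import pip."""
    return importlib.util.find_spec("pip") is not None


def ensurepip_command(upgrade=False):
    """Build the command line that bootstraps pip with ensurepip."""
    command = [sys.executable, "-m", "ensurepip"]
    if upgrade:
        # --upgrade also replaces an already installed, older pip
        # with the version bundled in ensurepip.
        command.append("--upgrade")
    return command


if __name__ == "__main__":
    if pip_is_available():
        print("pip is already installed")
    else:
        print("bootstrap with:", " ".join(ensurepip_command()))
```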
Why isolation?
pip may be used to install system-wide packages. On Unix-based and Linux systems, this requires superuser privileges, so the actual invocation will be:
sudo pip install <package-name>
Note that this is not required on Windows since it does not provide the Python interpreter by default, and Python on Windows is usually installed manually by the user without superuser privileges.
In any case, installing system-wide packages directly from PyPI is not recommended and should be avoided. This may seem to contradict the previous statement that using pip is a PyPA recommendation, but there are serious reasons for it. As explained earlier, Python is very often an important part of many packages available through operating system package repositories and may power a lot of important services. System distribution maintainers put a lot of effort into selecting the correct versions of packages to match various package dependencies. Very often, Python packages that are available from the system's package repositories contain custom patches or are deliberately kept outdated to ensure compatibility with some other system components. Using pip to force an update of such a package to a version that breaks backwards compatibility might break some crucial system services.
Doing such things only on a local computer for development purposes is not a good excuse either. Recklessly using pip that way is almost always asking for trouble and will eventually lead to issues that are very hard to debug. This does not mean that installing packages from PyPI globally is strictly forbidden, but it should always be done consciously and with an understanding of the related risks.
Fortunately, there is an easy solution to this problem—environment isolation. There are various tools that allow the isolation of the Python runtime environment at different levels of system abstraction. The main idea is to isolate project dependencies from packages required by different projects and/or system services. The benefits of this approach are:
- It solves the "Project X depends on version 1.x but Project Y needs 4.x" dilemma. The developer can work on multiple projects with different dependencies that may even collide without the risk of affecting each other.
- The project is no longer constrained by the versions of packages that are provided in the developer's system distribution repositories.
- There is no risk of breaking other system services that depend on certain package versions because new package versions are only available inside such an environment.
- A list of packages that are project dependencies can be easily "frozen", so it is very easy to reproduce them.
The easiest and most lightweight approach to isolation is to use application-level virtual environments. They focus only on isolating the Python interpreter and packages available in it. They are very easy to set up and are very often just enough to ensure proper isolation during the development of small projects and packages.
Unfortunately, in some cases, this may not be sufficient to ensure consistency and reproducibility. For such cases, system-level isolation is a good addition to the workflow, and some available solutions are explained later in this chapter.
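A script can tell whether it runs inside an application-level virtual environment by comparing interpreter prefixes. The following is a minimal sketch of that check; in_virtual_environment is a hypothetical helper name, and the sys.real_prefix branch is there only to cover older virtualenv releases.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv keeps the original installation in sys.base_prefix,
    # while sys.prefix points at the environment directory.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("isolated:", in_virtual_environment())
    print("packages are installed under:", sys.prefix)
```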
Popular solutions
There are several ways to isolate Python at runtime. The simplest and most obvious, although the hardest to maintain, is to manually change the PATH and PYTHONPATH environment variables and/or move the Python binary to a different place. This affects the way the interpreter discovers available packages and points it to a custom location where we want to store our project's dependencies. Fortunately, there are several tools available that help in maintaining virtual environments and in managing how installed packages are stored in the system. These are mainly virtualenv, venv, and buildout. What they do under the hood is in fact the same as what we would do manually. The actual strategy depends on the specific tool implementation, but generally, they are more convenient to use and can provide additional benefits.
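The effect of the manual approach can be observed directly. The sketch below starts a child interpreter with PYTHONPATH pointing at a temporary directory and reports the resulting module search path; search_path_with is a hypothetical helper name, and the temporary directory merely stands in for a project's dependency directory.

```python
import json
import os
import subprocess
import sys
import tempfile


def search_path_with(extra_dir):
    """Return sys.path of a child interpreter started with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    output = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        universal_newlines=True,
    )
    return json.loads(output)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as deps_dir:
        # deps_dir shows up near the front of the child's search path.
        for entry in search_path_with(deps_dir):
            print(repr(entry))
```

Packages placed in such a directory take precedence over the ones in site-packages, which is exactly why this approach is easy to get wrong when maintained by hand.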
Virtualenv is by far the most popular tool in this list. Its name simply stands for Virtual Environment. It is not a part of the standard Python distribution, so it needs to be obtained using pip. It is one of the few packages that are worth installing system-wide (using sudo on Linux and Unix-based systems).
Once it is installed, a new virtual environment is created using the following command:
virtualenv ENV
Here, ENV should be replaced by the desired name for the new environment. This will create a new ENV directory in the current working directory. It will contain a few new directories inside:
- bin/: This is where the new Python executable and scripts/executables provided by other packages are stored.
- lib/ and include/: These directories contain the supporting library files for the new Python inside the virtual environment. The new packages will be installed in ENV/lib/pythonX.Y/site-packages/.
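The layout described above can be expressed as a small helper. This is a sketch under stated assumptions: environment_layout is a hypothetical function, it assumes a CPython environment created with the same Python version as the running interpreter, and it only computes paths without checking that they exist.

```python
import os
import sys
from pathlib import Path


def environment_layout(env_dir):
    """Return the conventional locations inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        # Windows uses Scripts/ and a flat Lib/site-packages/ directory.
        scripts = env_dir / "Scripts"
        site_packages = env_dir / "Lib" / "site-packages"
    else:
        # pythonX.Y is derived from the running interpreter (an assumption).
        version = "python{}.{}".format(*sys.version_info[:2])
        scripts = env_dir / "bin"
        site_packages = env_dir / "lib" / version / "site-packages"
    return {"scripts": scripts, "site_packages": site_packages}


if __name__ == "__main__":
    for name, path in environment_layout("ENV").items():
        print(name, "->", path)
```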
Once the new environment is created, it needs to be activated in the current shell session using Unix's source command:
source ENV/bin/activate
This changes the state of the current shell session by modifying its environment variables. To make the user aware that the virtual environment is active, it also changes the shell prompt by prepending the (ENV) string to it. Here is an example session that creates a new environment and activates it to illustrate this:
$ virtualenv example
New python executable in example/bin/python
Installing setuptools, pip, wheel...done.
$ source example/bin/activate
(example)$ deactivate
$
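The environment changes made by activation can be approximated in a few lines, which is handy when a subprocess should run inside an environment without sourcing anything. The following is only a sketch of the idea, not the actual activate script: activated_environ is a hypothetical helper, and it does not touch the shell prompt.

```python
import os


def activated_environ(env_dir, environ=None):
    """Return a copy of `environ` changed roughly the way `activate` does."""
    environ = dict(os.environ if environ is None else environ)
    env_dir = os.path.abspath(env_dir)
    bin_dir = os.path.join(env_dir, "Scripts" if os.name == "nt" else "bin")
    # The environment's executables must be found before the system ones.
    old_path = environ.get("PATH")
    environ["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    environ["VIRTUAL_ENV"] = env_dir
    # PYTHONHOME would redirect the interpreter away from the environment.
    environ.pop("PYTHONHOME", None)
    return environ


if __name__ == "__main__":
    env = activated_environ("example")
    print(env["VIRTUAL_ENV"])
    print(env["PATH"].split(os.pathsep)[0])
```

Passing the returned mapping as the env argument of subprocess.run makes a child process resolve python and pip from the environment first.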
The important thing to note about virtualenv is that it depends completely on its state stored on the filesystem. It does not provide any additional abilities to track what packages should be installed in it. These virtual environments are not portable and should not be moved to another system/machine. This means that a new virtual environment needs to be created from scratch for each new application deployment. Because of that, it is good practice among virtualenv users to store all project dependencies in a requirements.txt file (this is the naming convention), as shown in the following code:
# lines starting with a hash (#) are treated as comments

# strict version names are best for reproducibility
eventlet==0.17.4
graceful==0.1.1

# for projects that are well tested with different
# dependency versions the relative version specifiers
# are acceptable too
falcon>=0.3.0,<0.5.0

# packages without versions should be avoided unless
# latest release is always required/desired
pytz
With such files, all dependencies can be easily installed using pip because it accepts the requirements file as its input:
pip install -r requirements.txt
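The pinning advice from the example requirements file can be checked automatically. The sketch below is deliberately simplified and is not pip's own parser: classify_requirements is a hypothetical helper that handles only plain name-and-specifier lines and ignores options, URLs, and environment markers.

```python
SAMPLE = """\
# strict version names are best for reproducibility
eventlet==0.17.4
graceful==0.1.1
falcon>=0.3.0,<0.5.0
pytz
"""


def classify_requirements(text):
    """Split requirement lines into pinned, ranged, and unpinned groups."""
    groups = {"pinned": [], "ranged": [], "unpinned": []}
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace (simplified handling).
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if "==" in line:
            groups["pinned"].append(line)
        elif any(op in line for op in ("<", ">", "~=", "!=")):
            groups["ranged"].append(line)
        else:
            groups["unpinned"].append(line)
    return groups


if __name__ == "__main__":
    for group, lines in classify_requirements(SAMPLE).items():
        print(group, lines)
```

A check like this could run in continuous integration to flag entries such as pytz that carry no version constraint.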
Keep in mind that the requirements file is not always the ideal solution because it does not define the exact list of dependencies, only those that are to be installed. The whole project can therefore work without problems in a development environment but fail to start in others if the requirements file is outdated and does not reflect the actual state of the environment. There is, of course, the pip freeze command that prints all packages in the current environment, but it should not be used blindly: it will output everything, even packages that are not used in the project but were installed only for testing. The other tool mentioned in the book, buildout, addresses this issue, so it may be a better choice for some development teams.
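To see why a frozen list should be reviewed before it is committed, it helps to look at everything the current environment contains. The following sketch uses importlib.metadata (available in the standard library since Python 3.8) to produce output in a similar name==version form; installed_packages is a hypothetical helper and does not reproduce every detail of pip freeze.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_packages():
    """Return sorted `name==version` strings for the current environment."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            entries.add("{}=={}".format(name, version))
    return sorted(entries, key=str.lower)


if __name__ == "__main__":
    # Test runners, linters, and other tooling show up here as well.
    for entry in installed_packages():
        print(entry)
```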
Note
On Windows, virtualenv uses different names for its internal directory structure: Scripts/, Lib/, and Include/ instead of bin/, lib/, and include/, to better match development conventions on that operating system. The commands used for activating and deactivating the environment are also different: you need to run ENV/Scripts/activate.bat and ENV/Scripts/deactivate.bat instead of using source on the activate script and calling deactivate.
Virtual environments quickly became a well-established and popular tool within the community. Starting from Python 3.3, creating virtual environments is supported by the standard library. The usage is almost the same as with Virtualenv, although the command-line options follow quite a different naming convention. The new venv module provides a pyvenv script for creating a new virtual environment:
pyvenv ENV
Here, ENV should be replaced by the desired name for the new environment. The same result can be obtained with python -m venv ENV, which is the only form available on Python 3.8 and later because the pyvenv script was deprecated in Python 3.6 and removed in 3.8. Also, new environments can now be created directly from Python code because all functionality is exposed by the built-in venv module. The other usage and implementation details, like the structure of the environment directory and the activate/deactivate scripts, are mostly the same as in Virtualenv, so migration to this solution should be easy and painless.
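Creating an environment from Python code comes down to a single call to venv.create. The sketch below wraps it in a hypothetical create_environment helper and leaves pip out by default so that it runs quickly and offline; passing with_pip=True (Python 3.4+) additionally bootstraps pip through ensurepip.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir, with_pip=False):
    """Create a virtual environment and return the path of its config file."""
    venv.create(str(env_dir), with_pip=with_pip)
    # Every venv-created environment has a pyvenv.cfg file at its root.
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = create_environment(Path(tmp) / "ENV")
        print(config.read_text())
```

For finer control, the module also exposes the venv.EnvBuilder class, which can be subclassed to customize how environments are built.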
For developers using newer versions of Python, it is recommended to use venv instead of Virtualenv. For Python 3.3, switching to venv may require more effort because in this version, it does not install setuptools and pip by default in the new environment, so users need to install them manually. Fortunately, this changed in Python 3.4, and thanks to the customizability of venv, it is also possible to override its behavior. The details are explained in the Python documentation (refer to https://docs.python.org/3.5/library/venv.html), but some users might find it too tricky and will stay with Virtualenv for that specific version of Python.
Buildout is a powerful tool for bootstrapping and the deployment of applications written in Python. Some of its advanced features will also be explained later in the book. For a long time, it was also used as a tool to create isolated Python environments. Because Buildout requires a declarative configuration that must be changed every time there is a change in dependencies, instead of relying on the environment state, these environments were easier to reproduce and manage.
Unfortunately, this has changed. Since version 2.0.0, the buildout package no longer tries to provide any level of isolation from the system Python installation. Isolation handling is left to other tools such as Virtualenv, so it is still possible to have isolated Buildouts, but things become a bit more complicated. A Buildout must be initialized inside an isolated environment in order to be really isolated.
This has a major drawback as compared to the previous versions of Buildout, since it depends on other solutions for isolation. The developer working on this code can no longer be sure whether the dependencies description is complete because some packages can be installed by bypassing the declarative configuration. This issue can of course be solved using proper testing and release procedures, but it adds some more complexity to the whole workflow.
To summarize, Buildout is no longer a solution that provides environment isolation but its declarative configuration can improve maintainability and the reproducibility of virtual environments.
Which one to choose?
There is no best solution that will fit every use case. What is good in one organization may not fit the workflow of other teams. Also, every application has different needs. Small projects can easily rely solely on virtualenv or venv, but bigger ones may require the additional help of buildout to perform more complex assembly.
What was not described in detail earlier is that previous versions of Buildout (buildout<2.0.0) allowed the assembly of projects in an isolated environment, with results similar to those provided by Virtualenv. Unfortunately, the 1.x branch of this project is no longer maintained, so using it for that purpose is discouraged.
I recommend using the venv module instead of Virtualenv whenever possible. It should therefore be the default choice for projects targeting Python 3.4 and higher. Using venv in Python 3.3 may be a little inconvenient due to the lack of built-in support for setuptools and pip. For projects targeting a wider spectrum of Python runtimes (including alternative interpreters and the 2.x branch), Virtualenv seems to be the best choice.