Note: If you don’t want to read the blah-blah context and history stuff then you can jump straight to the recommendations.
The need for virtual Python environments becomes fairly obvious early in most Python developers’ careers when they switch between two projects and realise that they have incompatible dependencies (e.g. project1 needs scikit-learn-0.21 and project2 needs a different version).
When you run into this problem you have two choices: you can either play around with the libraries you have installed, and risk breaking things for one of your projects with no way of getting them back, or you can search “python incompatible dependencies install both?” with a hopeful glimmer in your eye as you hit enter.
Virtual Environments and Package Managers
Virtual environments are the community-approved way to manage this issue and likely the first hit you’ll get if you run the above search in your favourite search engine.
There are two related but distinct activities that we need to manage here:
- If I’m working on different projects, I want to be able to quickly switch between them and their supporting Python runtime environments without breaking my setup for other projects. This is what an environment manager does.
- I want to be able to quickly and easily install new Python libraries without worrying about their inter-dependencies. Furthermore, I’d like to be able to package up my project’s list of dependencies so that others can quickly and easily use my code. This is what a package manager does.
There are lots of options available to you for both tasks and some tools try to solve both of them for you. Unfortunately this means that there are a large number of partially compatible standards.
Virtual Environments: What are they?
A virtual environment is a copy of a Python interpreter, bundled away into a folder with project-specific libraries and dependencies. This allows you to keep your project runtimes logically separated and avoids inter-project dependency conflicts.
Simply: when you install a library into a virtual environment the files are literally in a separate folder to the dependencies of your other projects. Beautiful simplicity.
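To make the “separate folder” idea concrete, here is a minimal sketch using the standard library’s venv module. The temporary directory and the name demo-env are placeholders of mine, and with_pip=False is only there to keep the example quick; a real project would normally want pip available.

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway virtual environment in a temporary folder.
# "demo-env" is a placeholder name chosen for this sketch.
env_dir = Path(tempfile.mkdtemp()) / "demo-env"
venv.EnvBuilder(with_pip=False).create(env_dir)

# The environment is just a directory tree: a config file, an interpreter
# (or a link to one) and a private place for installed libraries.
config_file = env_dir / "pyvenv.cfg"
top_level = sorted(p.name for p in env_dir.iterdir())

print(config_file.exists())
print(top_level)
```

Deleting that folder removes the environment and everything installed into it, without touching any other project.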
So then, how do we manage which libraries are installed in the environment and make sure that they are compatible with each other and the software that we’re writing/using?
pip: The Original Python Package Manager
pip is the official Python Packaging Authority (PyPA) package management tool. It’s been a recognisable part of a Python developer’s arsenal for at least the last 10 years and has been bundled with the standard Python installers since v3.4 in 2014 (although most operating systems distributed it as standard long before then or, if not, as a very easily installable extra).
Whilst it’s the official option, pip is very bare bones. It doesn’t know or care which environment it is being run in so you have to make sure that you take care of that by using tools like venv or virtualenv. Furthermore pip doesn’t store its list of dependencies in a file by default (you have to manually call pip freeze > requirements.txt to store your pip environment state in a text file every time you install or uninstall stuff) so this is yet another overhead.
Another potential problem with pip is its lack of deterministic builds – simply put: if you don’t explicitly ask pip to install a particular version of a package or one of that package’s dependencies it will download the latest version of that package. That means that there might be a bug introduced because a dependency-of-a-dependency that I installed on my system last month is a different version to the same package for someone who just installed my software today. What a headache!
None of this is particularly ideal – more manual steps = more stuff you can forget about.
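As a rough illustration of the determinism problem, the sketch below flags requirement lines that do not pin an exact version. The function name unpinned and the example package versions are my own, chosen purely for illustration, and the check is deliberately naive (it just looks for ==).

```python
def unpinned(requirements):
    """Return requirement lines that do not pin an exact version with '=='."""
    loose = []
    for line in requirements:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            loose.append(line)
    return loose


example = [
    "scikit-learn==0.21.3",  # pinned: everyone gets the same release
    "pandas>=1.0",           # a range: resolves to whatever is newest today
    "requests",              # no version at all
]
print(unpinned(example))  # ['pandas>=1.0', 'requests']
```

Even a fully pinned list only protects the packages it names, so dependencies-of-dependencies need to be listed as well for an install to be repeatable.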
Pipenv: Environment + Package Management Swiss Army Knife
pipenv is a tool that tries to solve many of the shortcomings of pip above:
- pipenv generates a Pipfile in your project which is conceptually similar to a package.json file in Node.js land. That is, a manifest at the top level of your project that describes which dependencies and Python version it requires. This file is maintained as you add/remove packages (no more manual updates to a requirements file).
- Pipenv also maintains a Pipfile.lock file – this is a machine-readable list of all of your dependencies and subdependencies, allowing Pipenv to handle deterministic builds and avoid confusing dependency issues.
- Pipenv will transparently take care of your virtualenv management for you. You can run your commands as normal but prefixed with pipenv run and the tool will make sure you’re using the environment associated with whatever project you’re trying to use.
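Because the lock file is plain JSON, it is easy to inspect. The sketch below reads a trimmed-down, hand-written stand-in for a Pipfile.lock; the package names, versions and shortened hashes are illustrative placeholders rather than output from a real project.

```python
import json

# A hand-written stand-in for a Pipfile.lock (hashes shortened for the sketch).
lock_text = """
{
  "_meta": {"requires": {"python_version": "3.8"}},
  "default": {
    "requests": {"hashes": ["sha256:..."], "version": "==2.25.0"},
    "urllib3": {"hashes": ["sha256:..."], "version": "==1.26.2"}
  },
  "develop": {}
}
"""

lock = json.loads(lock_text)

# Every package, including sub-dependencies, carries an exact version pin.
pinned = {name: entry["version"] for name, entry in lock["default"].items()}
print(pinned)  # {'requests': '==2.25.0', 'urllib3': '==1.26.2'}
```

The key point is that sub-dependencies such as urllib3 appear with exact pins alongside the packages you asked for directly, which is what makes the install repeatable.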
Many people stopped using pipenv when they believed the project to have been abandoned in 2019. However, it turns out pipenv is still under active development. As of writing the most recent release was v2020.11.15.
pypoetry: A Challenger Environment + Package Management Option
Poetry is yet another all-in-one virtualenv and package manager which offers similar functionality to pipenv. It gained a lot of users during the pipenv project hiatus mentioned above and has similar performance and functionality.
The main reason I prefer poetry over pipenv today is its ability to natively generate “standard” Python packages (wheels, source distributions) that are fully PyPA compliant (you can do this with pipenv but it requires manual maintenance of setup.py and requirements.txt files, which is another moving part that could go wrong in a big project).
Poetry also stores its project information in a pyproject.toml file (the format introduced by PEP 518, with PEP 621 project metadata supported from Poetry 2.0), providing core metadata compatibility with other dependency management tools and indeed PyPA’s own setuptools toolkit.
Where do [Mini/Ana]conda Fit Into All of This?
Anaconda and its slimmed-down cousin Miniconda are Python distributions from Anaconda, Inc. (formerly Continuum Analytics), offered as alternatives to the standard CPython/PyPA distribution. Both use the conda package and environment management tool.
Conda is open source but not directly compatible with PyPI packages. However, almost every package you can think of is available on conda-forge – a community-driven conda-compatible package repository. Furthermore, if something is missing from conda you can run pip inside your conda virtual environment and get it the normal way from PyPI.
What About Deterministic Builds and Distributing Software Using Conda?
Well, conda environments and requirements can be stored in an environment.yml file and the file format allows you to specify both packages installed via conda and pip. Furthermore, using the conda env export command to generate an environment.yml file dumps all of the packages installed in your current environment including their version information for deterministic recreation. Happy days!
But Wait, There’s More!
One feature of conda that is both controversial and convenient – the latter especially if you’re a data scientist – is its management of system libraries and dependencies beyond Python. Conda can install C libraries that your Python packages depend on for you – including Nvidia’s CUDA runtime libraries needed for tensorflow and torch.
If you’ve ever had the pleasure of trying to manually configure Nvidia drivers and CUDA runtime libraries on Linux you’ll know how much of a pain this is. Even with pip/virtualenv environments, torch and tensorflow will try to link against and load whichever version of CUDA is installed system wide, and that means that switching between versions of these libraries for different projects could mean messing with which system libraries are installed. Assuming you even have permission to do that (you might be on a shared GPU cluster), we’re back at square one with tightly coupled inter-project dependencies – the very problem that virtualenv is supposed to fix for us but can’t because of the dependency on CUDA. As usual there are of course manual workarounds but to me this is another moving part that could fail or go wrong – especially in a team environment.
As for the controversy? Well, purists don’t tend to like the fact that conda also messes with system libraries – even if those libraries, as with pip/virtualenv-based environments, are copies isolated in folders.
If Conda is So Good Why Don’t You Marry It?
Conda is great but it has its downsides too:
- Its incompatibility with the pip/PyPA universe requires extra faff when building pip-compatible software (or you can accept that your software is doomed to only be run by conda users).
- environment.yml files can be too deterministic and this is exacerbated by the system libraries issue. If I generate an environment.yml file on my Linux desktop and create a conda environment from it on my Mac it will usually fail because the Linux libraries are not compatible with the Mac libraries.
- Running conda inside docker environments is a bit weird and, some might argue, controversial: you always have permission to install whichever libraries you need inside a container and there shouldn’t be any use cases where you’d need two conflicting environments/libraries inside a container. Again, it’s perfectly possible but, in my opinion, another weak link.
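The “too deterministic” problem largely comes from the build string that conda appends to each exported package, which is tied to a platform. As a rough sketch, the helper below (strip_build_tag is a name I made up, and the package specs are illustrative) shows what dropping that third field looks like. I believe conda env export also accepts a --no-builds flag that does this for you, but check the documentation for your conda version.

```python
def strip_build_tag(spec):
    """Turn 'name=version=build' into 'name=version'; leave shorter specs untouched."""
    return "=".join(spec.split("=")[:2])


exported = [
    "numpy=1.19.2=py38h54aff64_0",  # illustrative conda-style spec with a build string
    "python=3.8.5",
    "pip",
]
print([strip_build_tag(s) for s in exported])
# ['numpy=1.19.2', 'python=3.8.5', 'pip']
```

This only removes the platform-specific build identifier; packages that simply don’t exist on the other operating system will still need handling by hand.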
Best of Both Worlds: Conda + Pip-based Package Managers
Combining conda with a pip-based package manager, in my opinion, offers the best of both worlds: we can take the speed and ease-of-use of conda and team it up with the flexibility and compatibility offered by these pip-based package management offerings.
To use pipenv or poetry inside a conda-based environment you can simply activate the environment you want to use and then run pip install poetry or pip install pipenv – the tool of your choice will then be available for use whenever you have that environment active in the future.
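Before running pip install inside a conda environment it is worth double-checking which environment is actually active. Here is a small sketch for doing that from Python; describe_environment is a hypothetical helper of mine, relying on the CONDA_PREFIX variable that conda sets on activation and on the sys.prefix versus sys.base_prefix difference that venv-style environments introduce.

```python
import os
import sys


def describe_environment(environ, prefix, base_prefix):
    """Report whether a conda environment and/or a venv-style environment is active."""
    return {
        # conda sets CONDA_PREFIX when an environment is activated
        "conda_env": environ.get("CONDA_PREFIX"),
        # inside a venv/virtualenv, sys.prefix differs from sys.base_prefix
        "in_venv": prefix != base_prefix,
    }


print(describe_environment(os.environ, sys.prefix, sys.base_prefix))
```

If conda_env comes back as None, the tool you are about to install will end up somewhere other than the conda environment you intended.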
Recommended Setups for Various Use Cases
Some Principles for Use With The Recommendations Below
- K.I.S.S. (Keep it simple, stupid) – these suggestions get more complicated for more nuanced use cases. My general philosophy, as mentioned earlier in the post, is to minimise moving parts, so I definitely don’t think everyone should be maintaining an environment.yml file for Windows users and an environment.yml file for Linux users. You know your use case and you can judge for yourself what is appropriate.
- If I offer a choice of tools then it’s up to you. Pick one and be consistent. Quite a lot of the time pipenv and poetry offer very similar feature sets and which one you want to use is just a personal preference. They’re not directly compatible though, so if you pick poetry and your colleague picks pipenv you’re going to have a bad time.
I’m new to Python (Mac, Windows or Linux)
Firstly, if you’re really new to Python you might want to consider just getting familiar with the language without having to deal with virtual environments: most modern Linux distributions have Python 3.x pre-installed and if you’re on Mac you can get it trivially if you use brew. That said, virtualenvs are likely to be something that you’ll need once you get into intermediate Python development, so it might be better to dive in sooner rather than later.
- If you’re new to Python and you’re running Mac, Windows or Linux you might find Anaconda to be the most intuitive, lowest barrier to entry option for getting started.
- If you’re on Windows, Conda-based distributions definitely represent the lowest barrier to entry since you don’t have to worry about setting up compilers and libraries. That said, if you are running WSL you probably already have Python 3 installed and can make use of some excellent existing resources.
- If you’re new to deep learning, again conda-based distributions are probably the lowest barrier to entry since conda can handle installing CUDA and dependencies for you.
I’m an experienced Python developer and no one else needs to run my code
My suggestions assume that even though you’re not planning to share your code with others, you’re still interested in version controlling it and your dependencies in case your laptop breaks/gets stolen/spontaneously combusts and you need to re-create your project.
- If you’re on Linux or Mac and you don’t need CUDA then, assuming you have root permissions, you’ll probably find that a tool like poetry will work well for you. I’m not suggesting conda as a first stop since most of the time Python 3.X is already available in modern *Nix environments so you might not need to install anything (except your chosen package manager).
- If you’re on Linux or Mac and you need CUDA then conda is likely the lowest barrier to entry. If you’ve never done it, try installing and using Tensorflow/PyTorch without conda once, purely for your own edification. Then you’ll be able to feel the benefit.
- If you’re working on Windows outside of WSL my default suggestion would still be conda due to its management of compiler toolchains and external libraries. If you’re on Windows inside WSL then see above for Linux/Mac.
I’m writing private/proprietary Python code that friends/colleagues need to use
- If you all run the same OS (for example you’re all on the same analytics team in an organisation that uses Windows 10 company-wide) then K.I.S.S. and use conda. If everyone is using the same OS you can probably safely mix conda and pip install commands and version control your environment.yml file without worrying about cross-platform compatibility issues.
- If you are writing code that needs to work cross-platform but you don’t need CUDA (e.g. you run MacOS, your colleague runs Linux) then use a tool like poetry. This will allow you to provide cross-platform deterministic builds/dependency resolution. Keep the manifest (a Pipfile, or pyproject.toml if you use poetry) and respective lock files version controlled. If you or one of your colleagues runs Windows, they might find that the easiest way to interact with you is to install anaconda and then run poetry inside a conda-managed environment.
- If you are writing code that needs to work cross-platform and uses CUDA (e.g. you’re building a PyTorch model on Linux and your friend wants to run it on Windows) then you’re probably going to want to use conda to manage the environment (i.e. pull in specific versions of CUDA runtime libraries) and pipenv to manage pythonic dependencies. You could version control a hand-written environment.yml with the specific versions of the CUDA runtime that your model is expecting (but without OS-specific build tags) and you will definitely want to version control your Pipfile as above. Alternatively, document the conda install commands the user should run in the project readme.
- If you are writing code that you need to package as a wheel or egg for others to use (e.g. it’s a proprietary Python package you ship to customers) then I refer you to the section below, but use the --repository option of poetry publish to specify a private pip repository.
I’m writing Python code that I want to share with the community
- If you don’t need CUDA then my suggestion would be standalone poetry since it has build/distribution tools built in and you can produce wheels and source distributions from the command line and submit them to PyPI. Version control your project manifest and lock file.
- If you need CUDA then my suggestion is to use conda to create and manage your virtual environment and install CUDA, and then use poetry to manage packages and the PyPI build (or use standalone poetry and manually manage your CUDA libraries – you masochist you!). You might want to version control your environment.yml but this file won’t be needed for building or uploading your package to PyPI – it’s just for you (and other developers) to use to quickly spin up your project locally in a development context.
- If you want your package to be available in conda then you’ll need to use conda-build to generate conda-specific package files and metadata for your project.
PEP-582, PDM and the Future of Python Dependencies?
Without wishing to confuse matters further, I wanted to give PEP-582 an honourable mention.
This is a Python Enhancement Proposal that will allow the Python runtime to support npm-esque loading of dependencies from a folder in the project directory (like node_modules). There is already a package manager, PDM, in development for working with local directories.
This is an interesting and exciting paradigm shift that should simplify python packaging and remove the need completely for virtual environments. However, there are many issues to solve and the proposal is only for Python 3.8 with no plans to backport the functionality to earlier versions of the language runtime.
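As I understand the proposal, the project-local folder is called __pypackages__ and contains one sub-folder per Python version. The sketch below only computes where such a folder would sit; pypackages_lib and my_project are names I’ve made up, and nothing here changes how your interpreter actually resolves imports.

```python
import sys
from pathlib import Path


def pypackages_lib(project_dir, version_info=sys.version_info):
    """Path where, under the PEP 582 proposal, a project's local packages would live."""
    version = f"{version_info[0]}.{version_info[1]}"
    return Path(project_dir) / "__pypackages__" / version / "lib"


# For a project checked out in ./my_project and run with Python 3.8:
print(pypackages_lib("my_project", (3, 8)))
```

The appeal is that, like node_modules, the dependencies would travel with the project directory instead of living in a separately activated environment.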
Given how long it’s taken some users to make the jump from Python 2.X to Python 3.X, it is likely that virtual environments are going to be around for a few more years to come.