Contrary to most Python distros, currently the sys.path on SWAN has the user’s .local/lib/python3.x/site-packages as a lower priority than the distribution one:
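The ordering can be checked from any Python session with a short sketch like the one below. It uses only the standard library; the helper names `path_position` and `user_site_has_priority` are illustrative and not part of SWAN or the LCG tooling.

```python
import site
import sys
import sysconfig


def path_position(directory):
    """Return the index of `directory` in sys.path, or None if it is absent."""
    try:
        return sys.path.index(directory)
    except ValueError:
        return None


def user_site_has_priority():
    """True if the user site-packages precedes the distribution one on sys.path.

    Returns None when either directory is not on sys.path at all
    (for example when user site-packages is disabled or does not exist).
    """
    user_pos = path_position(site.getusersitepackages())
    dist_pos = path_position(sysconfig.get_path("purelib"))
    if user_pos is None or dist_pos is None:
        return None
    return user_pos < dist_pos


print("user site-packages first:", user_site_has_priority())
```

On a stock CPython install this usually reports `True` once the user directory exists; per the description above, a SWAN session would be expected to report `False`.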
Thank you for the feedback. The situation is indeed as you describe. We had many issues with users who installed things (including jupyter and other libraries) on their CERNBox, which interfered with the LCG releases to the point that they could not start their sessions. If the LCG release comes first, at least we gain some control in that regard.
That said, the next version of SWAN (based on JupyterLab), which will be out this year, will change the way software environments are set up. Instead of having one environment per session, you will be able to pick one per SWAN project. That environment will come from an LCG release, an experiment stack (e.g. CMSSW) and (hopefully also) a conda environment. Moreover, we will probably make it more explicit how the user installs packages on top of an LCG release for a given project, and give priority to those over the LCG release.
Great to hear this! Given my previous posts, and the fact that the decisions taken will have an impact on how people expect to interact with a Python environment across the CERN estate, I’d be very happy to be involved at the design/prototyping stage if at all possible.
It seems to me that a per-project environment (or even multiple environments) makes a lot of sense. PEP 582 (still in draft) seems like a reasonable way to go if pure Python packages are the objective, but when it comes to flexibility, a language-agnostic package manager such as conda is a really interesting prospect.
Absolutely, we appreciate that! We will prepare a test instance before we roll this out in production, and we’ll make a call here in the forum for people interested in trying it out.
thank you
Indeed, this is the very reason that user site-packages are a terrible idea, and should not be encouraged at all. From a SWAN perspective you have sufficient control of the environment to be able to work around the problem, but in other (non-SWAN) scenarios that luxury simply doesn’t exist / isn’t viable. In the community that I support, the message that user site-packages is bad for environment isolation is starting to be heard, and our operational deployment tooling disables user site-packages entirely, but I still regularly receive support requests to which the answer is “please remove your user site-packages, and perhaps use a virtual environment instead”.
I suspect it would have been better from a SWAN perspective to simply enforce the PYTHONNOUSERSITE environment variable to disable user site-packages at Python startup, rather than having to maintain something that isn’t quite Python behaviour. I believe this is a genuinely viable and workable option even today; it would require a special location to be put on the PYTHONPATH and people to be told to pip install --prefix=$PROJECT_ROOT/env <package-name>. In fact, this flag can even be pre-configured in pip config to do this automatically. From a user perspective they could just type pip install <package-name> to extend their environment.
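To illustrate the effect, here is a minimal sketch that starts a fresh interpreter with PYTHONNOUSERSITE set and a project directory on PYTHONPATH, then reports what that interpreter sees. The function name `probe_interpreter` is made up for this example, and the temporary directory is only a stand-in for the site-packages directory that would live under $PROJECT_ROOT/env.

```python
import json
import os
import subprocess
import sys
import tempfile

# Code run inside the child interpreter: report the user-site flag and sys.path.
PROBE = (
    "import json, site, sys; "
    "print(json.dumps({'user_site': site.ENABLE_USER_SITE, 'path': sys.path}))"
)


def probe_interpreter(project_env_dir):
    """Start Python with user site-packages disabled and a project dir on PYTHONPATH."""
    env = dict(os.environ)
    env["PYTHONNOUSERSITE"] = "1"          # disables ~/.local/... at startup
    env["PYTHONPATH"] = project_env_dir    # hypothetical per-project location
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    info = probe_interpreter(tempfile.mkdtemp())
    print("user site enabled:", info["user_site"])
    print("first sys.path entries:", info["path"][:3])
```

Because PYTHONPATH entries are placed ahead of the distribution's site-packages, anything installed into the project directory would take priority, which is standard Python behaviour rather than a SWAN-specific reordering.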
One thing to note though - I don’t think you can reasonably prevent an environment which is broken (e.g. has invalid dependencies) from failing to start a kernel (the fail-fast principle). Rather, I think it would be better to focus on improving the experience in such a situation, for example, providing a good error message, and possibly even some commands to type or a button to help people fix the problem with their environment. If you control the environment to which pip installs stuff (e.g. a special directory in a project) then this is quite viable I think.
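As a rough sketch of what "a good error message" could look like, the check below looks for a few required modules before a kernel would start and returns human-readable hints instead of a bare traceback. Everything here is hypothetical: `diagnose_environment`, the module list, and the suggested command are illustrative, not an existing SWAN feature.

```python
import importlib.util


def diagnose_environment(required_modules, project_env="$PROJECT_ROOT/env"):
    """Return a list of readable problems; an empty list means all modules were found.

    Only top-level module names are checked, and only their presence,
    not whether their versions are mutually compatible.
    """
    problems = []
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            problems.append(
                f"Module '{name}' could not be found. "
                f"Try: pip install --prefix={project_env} {name}"
            )
    return problems


if __name__ == "__main__":
    # 'json' ships with Python; the second name is deliberately bogus.
    for message in diagnose_environment(["json", "module_that_does_not_exist_xyz"]):
        print(message)
```

The kernel would still fail fast on a broken environment; the difference is that the user gets a message they can act on.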
Indeed, since we are targeting environments per project for the next version of SWAN, we can make use of what you propose and no longer rely on .local. In any case, installing packages on top of LCG releases can be quite messy, regardless of where you install them. If you install a new package A that updates package B, and B is also provided by the LCG release in an older version, you can run into trouble with an LCG package C that relies on that old version of B, depending on how you set up your environment.
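A minimal sketch of how such shadowing could be detected, assuming both the LCG release and the user's additions are ordinary directories containing installed distributions with standard metadata. The function names are illustrative, and the check only reports packages present in both places with different versions; it does not verify whether C's requirements are actually violated.

```python
from importlib import metadata


def installed_versions(directory):
    """Map lower-cased distribution name -> version for one install directory."""
    return {
        dist.metadata["Name"].lower(): dist.version
        for dist in metadata.distributions(path=[directory])
    }


def shadowed_packages(overlay_dir, base_dir):
    """Packages present in both directories with different versions.

    Returns {name: (base_version, overlay_version)}, e.g. the package B from
    the scenario above when the overlay carries a newer B than the base release.
    """
    overlay = installed_versions(overlay_dir)
    base = installed_versions(base_dir)
    return {
        name: (base[name], overlay[name])
        for name in overlay.keys() & base.keys()
        if overlay[name] != base[name]
    }
```

Calling `shadowed_packages(project_dir, lcg_dir)` with the two site-packages directories would list the candidates worth a warning before the project environment is activated.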
However, we would still like to keep that option, since it can be useful to extend LCG releases in some (controlled) cases, but it’s at your own risk. At least, since it will only affect one project, it won’t do any harm to your session (as can happen now).
On the other hand, these LCG extension issues are also the reason why we would like to provide project environments based just on conda, so that users have a more controlled option to use packages not present in the LCG releases.