Avoiding the use of ``--user`` and ``.local`` for pip installations

pelson · April 14, 2020, 8:52am

Hi all,

I’ve seen a lot of use of the .local directory in SWAN, mostly through pip --user, for user-installations. Indeed, it is mentioned in the SWAN help at https://github.com/swan-cern/help/blob/master/advanced/install_packages.md.

Given that the .local directory is automatically added to the Python path for all subsequent Python executions (not just the one that you did the installation with), it is fair to say that doing user installations in this way has a global effect for the user. This may not be obvious to the user, and can easily lead to confusion, a lack of reproducibility, and the situation where a notebook works one day but not the next (because another notebook modified the environment).
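
To check whether this applies to a given session, the user site directory can be inspected from a notebook cell or a terminal. This is a minimal sketch using only the standard library `site` module; the helper name `user_site_status` is made up for illustration.

```python
import site
import sys


def user_site_status():
    """Report where the per-user site-packages lives and whether it is active."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,                  # typically ~/.local/lib/pythonX.Y/site-packages
        "enabled": bool(site.ENABLE_USER_SITE),  # False e.g. with `python -s` or PYTHONNOUSERSITE
        "on_sys_path": user_site in sys.path,    # True only if this interpreter picked it up
    }


for key, value in user_site_status().items():
    print(f"{key}: {value}")
```

If `on_sys_path` is true, anything installed with `pip install --user` is importable from this interpreter, whichever notebook installed it.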

Python provides tools for creating virtual environments to avoid this global installation problem, but because of the nature of Jupyter, it is hard (but by no means impossible) for SWAN to pick up a user-installed Jupyter kernel from a venv. One solution is to create a venv and then add the venv’s site-packages to the running kernel’s Python path.

With this in mind, the following steps would be all that you need to create an isolated environment and be able to install whichever packages you need (without that pesky global effect!):

1. Create a venv (subprocess) in a particular location (I chose $HOME/python/environments).
2. Add the venv’s site-packages to sys.path.
3. pip install into the venv (subprocess).
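
The three steps can be sketched with the standard library alone. This is not the contents of run_in_venv.py, only an illustration under stated assumptions: a POSIX venv layout, a venv built from the same Python X.Y as the running kernel, and `venv.create` called in-process instead of through a subprocess. All function names are hypothetical.

```python
import subprocess
import sys
import venv
from pathlib import Path

# Location chosen in the post above; any writable directory would do.
ENV_ROOT = Path.home() / "python" / "environments"


def venv_python(env_dir):
    """Interpreter inside the venv (POSIX layout assumed; Windows uses Scripts/)."""
    return Path(env_dir) / "bin" / "python"


def venv_site_packages(env_dir):
    """site-packages of the venv, assuming it shares the kernel's Python X.Y."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return Path(env_dir) / "lib" / version / "site-packages"


def ensure_venv(env_dir, with_pip=True):
    """Step 1: create the venv unless it already exists."""
    if not venv_python(env_dir).exists():
        venv.create(env_dir, with_pip=with_pip)


def add_to_kernel(env_dir):
    """Step 2: put the venv's site-packages first on the running kernel's path."""
    path = str(venv_site_packages(env_dir))
    if path not in sys.path:
        sys.path.insert(0, path)


def pip_install(env_dir, *packages):
    """Step 3: run the venv's own pip in a subprocess."""
    subprocess.check_call([str(venv_python(env_dir)), "-m", "pip", "install", *packages])
```

In a notebook this would be used as `env = ENV_ROOT / "my-venv"`, then `ensure_venv(env)`, `add_to_kernel(env)` and `pip_install(env, "flake8")`. Inserting at the front of `sys.path` is a design choice so that versions installed in the venv take precedence over those already provided by the session.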

Of course, this isn’t quite as simple as !pip install --user --upgrade <my-chosen-package>. I have therefore wrapped it up into a simpler command at https://gitlab.cern.ch/pelson/swan-run-in-venv, which boils down to:

!curl -s https://gitlab.cern.ch/pelson/swan-run-in-venv/-/raw/master/run_in_venv.py -o .run_in_venv
%run .run_in_venv <my-venv> -m pip install <my-chosen-package>

It wouldn’t be hard to make this even more concise, for example %venv_install <my-venv> flake8, in the future (with SWAN pre-installation of said magic).
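
As a rough idea of what such a magic could look like, here is a sketch of an IPython extension module. The name `%venv_install` comes from the sentence above and does not exist in SWAN today; the function names are hypothetical, and the sketch assumes the venv already exists under $HOME/python/environments with a POSIX layout and has been added to the kernel's path.

```python
import shlex
import subprocess
from pathlib import Path

ENV_ROOT = Path.home() / "python" / "environments"  # location from the post above


def parse_line(line):
    """Split '<my-venv> pkg1 pkg2 ...' into (venv name, list of packages)."""
    parts = shlex.split(line)
    if len(parts) < 2:
        raise ValueError("usage: %venv_install <my-venv> <package> [<package> ...]")
    return parts[0], parts[1:]


def pip_command(env_name, packages):
    """Command line that runs pip with the named venv's interpreter."""
    python = ENV_ROOT / env_name / "bin" / "python"
    return [str(python), "-m", "pip", "install", *packages]


def venv_install(line):
    """Body of the hypothetical %venv_install line magic."""
    env_name, packages = parse_line(line)
    subprocess.check_call(pip_command(env_name, packages))


def load_ipython_extension(ipython):
    """Hook called by %load_ext; registers the function as a line magic."""
    ipython.register_magic_function(
        venv_install, magic_kind="line", magic_name="venv_install"
    )
```

Saved as an importable module and loaded with `%load_ext`, this would make `%venv_install <my-venv> flake8` available in the notebook.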

I’m wondering if there is appetite for encouraging this approach, rather than using the global .local directory as is the current recommendation? Of course, one major implication of this is that every notebook which has non-standard dependencies would need to have the command as the first (executed) code-cell as we would no longer be modifying the installed packages globally (for a user).

Cheers,

Phil

etejedor · April 14, 2020, 10:05am

Dear Phil,

Thank you for your message and your proposal.

Regarding the use of .local, I would like to make a small clarification: in SWAN, the user still needs to include that directory in the PYTHONPATH via an environment script for that SWAN session (and its notebooks and terminal) to pick it up automatically. This is specified here:

https://github.com/swan-cern/help/blob/master/advanced/install_packages.md

# Install packages in CERNBox

LCG releases on CVMFS incorporate new packages quite frequently, so if you think there is a missing package that can be 
potentially useful for a significant number of users, please [let the SWAN team know](mailto:swan-talk@cern.ch) and 
[contact the librarians directly](https://sft.its.cern.ch/jira/projects/SPI).

On the other hand, you can install packages on your CERNBox and, if necessary, configure your environment to pick them 
up in SWAN. 

## Python

A typical case is the installation of Python packages, which requires to run pip from a SWAN terminal:

   `pip install --user package_name`

If this fails because you are trying to install an updated version of a package that already exists in CVMFS, you will need to add the `--upgrade` flag.

Then, it would be necessary to add the local installation path to `PYTHONPATH`, by creating a bash startup script that configures that variable (don't forget to call this startup script in the session configuration menu):

    export PYTHONPATH=$CERNBOX_HOME/.local/lib/python3.5/site-packages:$PYTHONPATH

Note that the user can also choose to install in another directory (with pip’s --target) and include it in the PYTHONPATH. But I agree this is not an ideal solution either.
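
For illustration, the --target variant could look like the sketch below when driven from Python. The directory and package are placeholders and the function names are hypothetical; `--target` is pip's own option for installing into a specific directory.

```python
import subprocess
import sys
from pathlib import Path


def target_install_command(target_dir, *packages):
    """pip command that installs packages into target_dir instead of ~/.local."""
    return [sys.executable, "-m", "pip", "install", "--target", str(target_dir), *packages]


def install_into_target(target_dir, *packages):
    """Run the install in a subprocess (needs pip and network access)."""
    subprocess.check_call(target_install_command(target_dir, *packages))


def use_target(target_dir):
    """Make target_dir importable in the current session only."""
    path = str(target_dir)
    if path not in sys.path:
        sys.path.insert(0, path)


# Placeholder directory and package, for illustration only.
target = Path.home() / "python" / "targets" / "my-project"
print(" ".join(target_install_command(target, "flake8")))
```

Calling `use_target(target)` in a notebook has the same effect as adding the directory to PYTHONPATH, but only for that kernel.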

As a matter of fact, we have been working on another option (not yet in production) so that users can create conda environments in SWAN, which would be more similar to what you are proposing with venv. As a first approximation, those environments can be completely independent from the LCG releases (i.e. self-contained). Making them work alongside the Python packages in the LCG releases can easily cause conflicts between package versions.

But in the meantime, your solution can be very helpful for SWAN users that prefer to use venvs, thanks a lot for this contribution!

pelson · April 14, 2020, 11:16am

“Note that the user can also choose to install in another directory (with pip’s --target) and include it in the PYTHONPATH. But I agree this is not an ideal solution either.”

I noticed that. That must be something that is done explicitly for the SWAN notebook environment, because that isn’t the default Python behaviour. Indeed, if you open a terminal in SWAN and start Python, ~/.local is on the path.

“Making them work alongside the python packages in the LCG releases can easily cause conflicts between package versions.”

I agree. Are you proposing that users in the future might never install packages on top of LCG, and instead create a reproducible environment with conda(+pip) directly? From a compatibility perspective this is quite appealing.

On the conda front, I have a hacky approach to using conda in SWAN today… basically I have an environment script which installs conda on $SCRATCH and then installs a Jupyter kernel which can be picked up by the notebook interface. Apart from the slow-ish startup time (over a minute) it seems to work quite well. If we can provide something pre-prepared in SWAN to speed this up, this would be extremely interesting.
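
The environment script itself is not shown in this thread, so the following is not that script. It is only a sketch of the standard Jupyter kernel spec (a kernel.json file) that points a kernel at another environment's interpreter. It assumes ipykernel is installed in that environment; the function names are hypothetical. On Linux the per-user kernel directory is normally ~/.local/share/jupyter/kernels, though whether a given SWAN session scans it is not confirmed here.

```python
import json
from pathlib import Path


def kernel_spec(python_executable, display_name):
    """Kernel spec telling Jupyter to launch ipykernel with the given interpreter."""
    return {
        "argv": [
            str(python_executable),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",  # filled in by Jupyter when the kernel starts
        ],
        "display_name": display_name,
        "language": "python",
    }


def write_kernel_spec(kernels_dir, name, python_executable, display_name):
    """Write <kernels_dir>/<name>/kernel.json and return its path."""
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec_file = spec_dir / "kernel.json"
    spec = kernel_spec(python_executable, display_name)
    spec_file.write_text(json.dumps(spec, indent=2))
    return spec_file
```

The same kernel spec applies whether the interpreter comes from a conda environment or from a venv.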

I met with @diogo before Christmas regarding conda and SWAN and I would love to hear more about future plans and progress.

etejedor · April 14, 2020, 4:01pm

We will encourage the use of those environments if people want to work with packages/versions that are not present in the LCG releases. The downside is that you lose everything that is in the LCG release, so some people might still want to add on top using the “traditional” way, especially in the cases where the addition is minor.

So you already did something quite similar to our idea. But in our case, we will provide some means so that you only pay that startup time once (i.e. some caching of the environment packages on EOS). And we will tie an environment to a SWAN project.

Perhaps @dalvesde wants to add on what I said?

shelena · November 17, 2021, 4:19pm

Hi,

I hope this is the right thread for my question. The link swan-cern/help/blob/master/advanced/install_packages.md is not working.
I need to install a package that requires a Singularity image so that it can be writable. I was told that one possibility would be to set up a virtual environment with Anaconda, which would allow me to use SWAN.
I have no idea how to install it or how to make it work with SWAN. Are there recommendations? Can it run on any server, with CPUs or with GPUs?
Thanks a lot for your help.
Best wishes,
Helena Santos

ozapatam · November 18, 2021, 2:28pm

Hi @shelena
what is not working for you?

what is the name of the package, so I can take a look?
which LCG version are you using?
do you have any traceback?

Cheers
Omar.

shelena · November 18, 2021, 2:36pm

Hi Omar,

I want to install Umami (atlas-flavor-tagging-tools / algorithms / Umami on GitLab) in CERNBox following the instructions at https://umami.docs.cern.ch/installation/ but I get:
[shelena@lxplus702 umami]$ python setup.py install
running install
error: can't create or remove files in install directory
The following error occurred while trying to add or remove files in the installation directory:
[Errno 13] Permission denied: '/usr/lib/python2.7/site-packages/test-easy-install-3698.write-test'
etc, etc
I need a Singularity image, and I was told I could reconcile that with SWAN through Anaconda.
Cheers,
Helena

ozapatam · November 18, 2021, 4:27pm

Hi Helena,

I can see in the Umami installation documentation (https://umami.docs.cern.ch/installation/)
that there are 3 different ways to use the package.

1. installing it in your account with python setup.py install --user
2. using it in a docker container
3. using it in a singularity container.

And I can see from your prompt that you ran the procedure on lxplus and not in SWAN:
“[shelena@lxplus702 umami]$”

If you want to use option (1), installing it in your user home, please do the following.

In a SWAN session (for example, LCG_101), open a terminal and run the commands below. (The terminal has to be in the SWAN session, not on lxplus.)

git clone --recursive ssh://git@gitlab.cern.ch:7999/atlas-flavor-tagging-tools/algorithms/umami.git
cd umami
python setup.py install --user

Then it is necessary to add the local installation path to PYTHONPATH, by creating a bash startup script that configures that variable (don’t forget to call this startup script in the session configuration menu):
export PYTHONPATH=$CERNBOX_HOME/.local/lib/python3.9/site-packages:$PYTHONPATH
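
The Python version in that path (python3.9 here, python3.5 in the older help page) has to match the Python of the chosen LCG release. As a small sketch, the matching line can be printed from the session itself; the helper name is hypothetical, and the fallback to the home directory is an assumption for running it outside SWAN, where CERNBOX_HOME may not be defined.

```python
import os
import sys


def local_site_packages(base):
    """<base>/.local/lib/pythonX.Y/site-packages for the Python running this code."""
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return os.path.join(base, ".local", "lib", version, "site-packages")


# CERNBOX_HOME is the variable used in the export line above;
# falling back to the home directory is an assumption for non-SWAN machines.
base = os.environ.get("CERNBOX_HOME", os.path.expanduser("~"))
print(f"export PYTHONPATH={local_site_packages(base)}:$PYTHONPATH")
```

Running this in a terminal of the SWAN session prints the line to put in the startup script.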

let me know if this works for you.

Cheers.

shelena · November 19, 2021, 1:51pm

Hi Omar,
Excellent! Thanks!
Helena