How to propagate SWAN configuration to Spark nodes

Hi all, I’m trying to run pyspark code interactively through swan. In particular what I do is:

  • start SWAN on Analytix with some local packages configured, i.e. I provide a .sh environment script that appends the paths to my packages to the CVMFS PYTHONPATH:

export PYTHONPATH=$PYTHONPATH:$CERNBOX_HOME/SWAN_projects/opint-framework # custom user package
export PYTHONPATH=$PYTHONPATH:$CERNBOX_HOME/SWAN_projects/opint-venv/lib/python3.7/site-packages # custom python package

  • create a Spark session using nb_extension (star button at the top), ticking the box

Include PropagateUserPythonModules options

At this point, I have checked that the python path is correct


and I am able to import my custom modules in swan, e.g.:

import dash # python package
import opint_framework

However, when I run a spark job I get an error:

ModuleNotFoundError: No module named 'opint_framework'

which makes me think that either my modules are not propagated correctly to the executors, or the executors cannot see my CERNBox (unlikely, since on other occasions I managed to import modules by adding the path to the .py files to the PYTHONPATH).
If anybody has any suggestion, please let me know :slight_smile:

Thanks in advance,

Dear @lclissa, we do not have EOSUSER mounted on the Spark executor nodes, so the way to pass custom packages is to install them locally on your CERNBox (pip install --user) and select the PropagateUserPythonModules configuration bundle. We also have some instructions on shipping a virtual environment to the executors here: Knowledge Base - CERN Service Portal

Please check if that works for you
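
A quick way to check what the executors actually see is to probe the import from inside a Spark task. The sketch below is only an illustration: `probe_import` and `probe_executors` are hypothetical helper names, and it assumes `sc` is the SparkContext created by the Spark connector in the notebook.

```python
import importlib.util
import socket
import sys


def probe_import(module_name):
    """Report whether a top-level module is importable in this Python process."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    return {
        "host": socket.gethostname(),
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "sys_path": list(sys.path),
    }


def probe_executors(sc, module_name, n_tasks=4):
    """Run probe_import inside n_tasks Spark tasks and collect the reports.

    `sc` is assumed to be an active pyspark SparkContext.
    """
    return (
        sc.parallelize(range(n_tasks), n_tasks)
        .map(lambda _: probe_import(module_name))
        .collect()
    )


# Example use in the notebook (requires a running Spark session):
# for report in probe_executors(sc, "opint_framework"):
#     print(report["host"], report["found"], report["origin"])
```

Comparing the `sys_path` reported by the executors with the one in the notebook should show whether the CERNBox paths exist only on the driver side.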

Thank you,

Thanks @pkothuri :slight_smile: I’ve been playing around since yesterday and I managed to make it work by shipping a .zip file (option 3 from your link). However, I’d like to avoid this as I’d have to rebuild the zip every time I make a change in my code.
What I’d like is just to add a couple of paths to the PYTHONPATH of the executor, i.e.:

  • start from the cvmfs packages as a base
  • add on top of that the custom python packages like the ones in a virtual environment: path_to_env/lib/python3.7/site-packages
  • add a folder with my user modules

Is there a way to achieve that?
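
Regarding the zip route (option 3): one way to reduce the friction, offered as a sketch rather than a supported recipe, is to rebuild the archive from the notebook at the start of each session and hand it to `SparkContext.addPyFile`. `build_package_zip` is a hypothetical helper, and the layout (an `opint_framework/` package directory inside `SWAN_projects/opint-framework`) is an assumption based on the paths in the first post. This does not change the PYTHONPATH of the executors; it only automates the rebuild.

```python
import shutil
from pathlib import Path


def build_package_zip(project_dir, package_name, out_dir):
    """Zip project_dir/package_name so that the archive root contains package_name/.

    Returns the path of the created .zip file.
    """
    project_dir = Path(project_dir)
    if not (project_dir / package_name).is_dir():
        raise FileNotFoundError(
            f"{package_name!r} is not a directory inside {project_dir}"
        )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # root_dir/base_dir keep the package folder as the top-level zip entry,
    # which is the layout Python needs to import from a zip on sys.path.
    return shutil.make_archive(
        base_name=str(out_dir / package_name),
        format="zip",
        root_dir=str(project_dir),
        base_dir=package_name,
    )


# Example use in the notebook (paths and the `sc` name are assumptions):
# import os
# project = os.path.join(os.environ["CERNBOX_HOME"], "SWAN_projects", "opint-framework")
# zip_path = build_package_zip(project, "opint_framework", "/tmp/spark_deps")
# sc.addPyFile(zip_path)  # ships the archive to the executors for future tasks
```

As far as I understand, a file added this way is not refreshed if it is added again under the same name within a running session, so code changes would probably still need a new Spark session, just not a manual zip step.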