Hi all, I’m trying to run PySpark code interactively through SWAN. In particular, what I do is:
- start SWAN on the Analytix cluster, configuring some local packages, i.e. I provide a .sh environment script where I append the paths to my packages to the CVMFS PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:$CERNBOX_HOME/SWAN_projects/opint-framework # custom user package
export PYTHONPATH=$PYTHONPATH:$CERNBOX_HOME/SWAN_projects/opint-venv/lib/python3.7/site-packages # custom python package
- create the Spark session using the notebook extension (star button at the top), ticking the "Include PropagateUserPythonModules options" box (see also the small configuration check right after this list)
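As a quick sanity check on what the extension actually configures, this is roughly how I inspect the python-related settings of the session (just a sketch, assuming the extension exposes the session as spark in the notebook; I am not sure which exact keys the option sets, so I simply filter for anything python-related):

import pprint

# runs on the driver: dump the python-related Spark settings of the current session
conf = spark.sparkContext.getConf().getAll()
pprint.pprint([(k, v) for k, v in conf if "python" in k.lower()])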
At this point I have checked that the PYTHONPATH is correct:
/usr/local/lib/swan/extensions/:/cvmfs/sft.cern.ch/lcg/views/LCG_97apython3/x86_64-centos7-gcc8-opt/python:/cvmfs/sft.cern.ch/lcg/views/LCG_97apython3/x86_64-centos7-gcc8-opt/lib:/cvmfs/sft.cern.ch/lcg/views/LCG_97apython3/x86_64-centos7-gcc8-opt/lib/python3.7/site-packages:/eos/user/l/lclissa//SWAN_projects/opint-framework:/eos/user/l/lclissa//SWAN_projects/opint-venv/lib/python3.7/site-packages
and I am able to import my custom modules in SWAN, e.g.:
import dash # python package
import opint_framework
However, when I run a Spark job I get an error:
ModuleNotFoundError: No module named 'opint_framework'
which makes me think that either my modules are not propagated correctly to the executors, or the executors cannot see my CERNBox (unlikely, since on other occasions I managed to import modules by exporting the path to the .py files in the PYTHONPATH).
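For what it is worth, this is the kind of check I can run to see whether the executors actually get those paths (again just a sketch, assuming the session created by the extension is available as spark; check_executor is a throwaway helper I made up for this test):

def check_executor(_):
    # runs on an executor: report whether the import works there and what sys.path contains
    import sys
    try:
        import opint_framework
        found = opint_framework.__file__
    except ModuleNotFoundError:
        found = None
    return (found, sys.path)

print(spark.sparkContext.parallelize([0], numSlices=1).map(check_executor).collect())

If the paths are indeed missing on the executor side, I suppose I could zip the package and ship it explicitly with spark.sparkContext.addPyFile(), but I would rather understand why the PropagateUserPythonModules option is not enough.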
If anybody has any suggestions, please let me know.
Thanks in advance,
Luca