I would like to know if it is possible to connect to a spark cluster when I open swan without having to connect manually from the UI.
In particular I would like to be able to connect to the NXCALS cluster when starting my SWAN session. I configured my SWAN startup script to get a valid kerberos ticket.
Is there a way to skip this part?
Currently it is not possible to create the Spark session object in the startup script. Since the session is attached to the notebook, it has to be created from within the notebook, either with the connection icon in the UI or with a Python snippet that creates the Spark session from a cell.
Thanks for the info.
What is the snippet you’re referring to for starting the Spark session?
The following snippet should work; make sure you have a valid Kerberos ticket (which you said you were already obtaining in your startup script):
```python
# Stop any existing Spark session
try:
    spark.stop()
except NameError:
    pass

# Manual Spark configuration to execute a notebook outside of the SWAN service
import os
import random
import subprocess

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Use the ports provided by the environment, or pick three random ones
if "SPARK_PORTS" in os.environ:
    ports = os.getenv("SPARK_PORTS").split(",")
else:
    ports = [random.randrange(5001, 5300),
             random.randrange(5001, 5300),
             random.randrange(5001, 5300)]

# Build the classpath from the NXCALS jars shipped in the LCG view
nxcals_jars = subprocess.run(
    ['ls $LCG_VIEW/nxcals/nxcals_java/* | xargs | sed -e "s/ /:/g"'],
    shell=True, stdout=subprocess.PIPE, env=os.environ
).stdout.decode('utf-8')

# Spark configuration
conf = SparkConf()
conf.set('spark.master', 'yarn')
conf.set('spark.logConf', True)
conf.set('spark.driver.host', os.environ.get('SERVER_HOSTNAME'))
conf.set('spark.driver.port', ports[0])
conf.set('spark.blockManager.port', ports[1])
conf.set('spark.ui.port', ports[2])
conf.set('spark.executorEnv.PYTHONPATH', os.environ.get('PYTHONPATH'))
conf.set('spark.executorEnv.LD_LIBRARY_PATH', os.environ.get('LD_LIBRARY_PATH'))
conf.set('spark.executorEnv.JAVA_HOME', os.environ.get('JAVA_HOME'))
conf.set('spark.executorEnv.SPARK_HOME', os.environ.get('SPARK_HOME'))
conf.set('spark.executorEnv.SPARK_EXTRA_CLASSPATH', os.environ.get('SPARK_DIST_CLASSPATH'))
conf.set('spark.driver.extraClassPath', nxcals_jars)
conf.set('spark.executor.extraClassPath', nxcals_jars)
conf.set('spark.driver.extraJavaOptions',
         "-Dlog4j.configuration=file:/eos/project/s/swan/public/NXCals/log4j_conf"
         " -Dservice.url=https://cs-ccr-nxcals6.cern.ch:19093,https://cs-ccr-nxcals7.cern.ch:19093,https://cs-ccr-nxcals8.cern.ch:19093"
         " -Djavax.net.ssl.trustStore=/etc/pki/tls/certs/truststore.jks"
         " -Djavax.net.ssl.trustStorePassword=password")

sc = SparkContext(conf=conf)
spark = SparkSession(sc)
```
Hi Prasanth and Gianluca,
this topic is of interest to me too. I’m not sure how to get a Kerberos ticket with SWAN. Could you please share some insights on that?
Thanks in advance, Michał.
Thanks @pkothuri for the script.
@mmacieje I don’t know if it’s the best way to do it, but in my case I generated a keytab file (for example from lxplus with
cern-get-keytab --user --keytab grigolet.keytab) and saved it on my EOS. Then I made a simple startup script, called when SWAN starts, that contains this line:
kinit -kt /path/to/grigolet.keytab grigolet@CERN.CH
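To make the startup step a bit more robust, the kinit call can be wrapped so that a failure to obtain a ticket is caught early instead of surfacing later as a cryptic Spark connection error. A minimal Python sketch (the helper names are hypothetical, and the keytab path/principal are placeholders to adapt to your own setup):

```python
import shutil
import subprocess

def kinit_command(keytab, principal):
    """Build the kinit invocation for a given keytab and principal."""
    return ["kinit", "-kt", keytab, principal]

def obtain_ticket(keytab, principal):
    """Run kinit and return True if the cache then holds a valid ticket."""
    if shutil.which("kinit") is None:
        raise RuntimeError("Kerberos client tools are not available")
    # Raises CalledProcessError if kinit fails (e.g. bad keytab or principal)
    subprocess.run(kinit_command(keytab, principal), check=True)
    # 'klist -s' exits with 0 exactly when a valid, non-expired ticket exists
    return subprocess.run(["klist", "-s"]).returncode == 0
```

In the startup script this would be called as e.g. `obtain_ticket("/path/to/grigolet.keytab", "grigolet@CERN.CH")`, with the return value checked before the notebook tries to reach the cluster.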