Hello,
I would like to know if it is possible to connect to a spark cluster when I open swan without having to connect manually from the UI.
In particular I would like to be able to connect to the NXCALS cluster when starting my SWAN session. I configured my SWAN startup script to get a valid kerberos ticket.
Is there a way to skip this part?
currently it is not possible to generate spark session object in the startup script. As this is attached to the notebook, this has to be done from with in the notebook either with the icon or we also have a python snippet that create spark session from the cell
Thanks for the info.
What is the snippet you’re referring to start the spark session?
Dear @grigolet
Following snipped should work, make sure you have krb ticket (which you said you were getting in the startup script)
regards,
Prasanth
# Stop spark session
try:
spark.stop()
except NameError:
pass
# Manual spark configuration to execute notebook outside of SWAN service
import os
import random
import subprocess
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
if "SPARK_PORTS" in os.environ:
ports = os.getenv("SPARK_PORTS").split(",")
else:
ports = [random.randrange(5001,5300),random.randrange(5001,5300),random.randrange(5001,5300)]
# Spark Config
nxcals_jars=subprocess.run(['ls $LCG_VIEW/nxcals/nxcals_java/* | xargs | sed -e "s/ /:/g"'], shell=True, stdout=subprocess.PIPE, env=os.environ).stdout.decode('utf-8')
conf = SparkConf()
conf.set('spark.master', 'yarn')
conf.set("spark.logConf", True)
conf.set("spark.driver.host", os.environ.get('SERVER_HOSTNAME'))
conf.set("spark.driver.port", ports[0])
conf.set("spark.blockManager.port", ports[1])
conf.set("spark.ui.port", ports[2])
conf.set('spark.executorEnv.PYTHONPATH', os.environ.get('PYTHONPATH'))
conf.set('spark.executorEnv.LD_LIBRARY_PATH', os.environ.get('LD_LIBRARY_PATH'))
conf.set('spark.executorEnv.JAVA_HOME', os.environ.get('JAVA_HOME'))
conf.set('spark.executorEnv.SPARK_HOME', os.environ.get('SPARK_HOME'))
conf.set('spark.executorEnv.SPARK_EXTRA_CLASSPATH', os.environ.get('SPARK_DIST_CLASSPATH'))
conf.set('spark.driver.extraClassPath', nxcals_jars)
conf.set('spark.executor.extraClassPath', nxcals_jars)
conf.set('spark.driver.extraJavaOptions',"-Dlog4j.configuration=file:/eos/project/s/swan/public/NXCals/log4j_conf -Dservice.url=https://cs-ccr-nxcals6.cern.ch:19093,https://cs-ccr-nxcals7.cern.ch:19093,https://cs-ccr-nxcals8.cern.ch:19093 -Djavax.net.ssl.trustStore=/etc/pki/tls/certs/truststore.jks -Djavax.net.ssl.trustStorePassword=password")
sc = SparkContext(conf=conf)
spark = SparkSession(sc)
Hi Prasanth and Gianluca,
this topic is of interest for me too. I’m not sure how to get a krb ticket with SWAN. Could you please share some insights on that?
Thanks in advance, Michał.
Thanks @pkothuri for the script.
@mmacieje I don’t know if it’s the best way to do it but what I did in my case was to generate a keytab file (from example from lxplus with cern-get-keytab --user --keytab grigolet.keytab
and I saved it on my eos). Then I made a simple startup script that I’m calling when SWAN is starting that contains this line:
kinit -kt /path/to/grigolet.keytab grigolet@CERN.CH