I believe the reason you don’t get your credentials on the other side is that you started your session on one of our SWAN physical machines (swan004, swan005 or swan006), which require an extra kinit for this to work.
The easiest way to do what you want is to connect to https://swan-k8s.cern.ch, and then select the Cloud containers (k8s) Spark cluster when you start your session. With this, SWAN should automatically propagate your credentials to the cluster side. Please try it out and let me know how it goes.
On the other hand, I see that you are using an old version of the distributed RDataFrame library. It is no longer imported as PyRDF; please check out these links:
for examples. From SWAN, you would do something like:
import ROOT

# Point RDataFrame calls to the Spark-specific RDataFrame
RDataFrame = ROOT.RDF.Experimental.Distributed.Spark.RDataFrame

# The Spark RDataFrame constructor accepts an optional "sparkcontext" parameter
# and will distribute the application to the connected cluster.
# "sc" is provided by SWAN as a result of connecting to a cluster from your notebook.
df = RDataFrame("mytree", "myfile.root", sparkcontext=sc)
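Once constructed, the distributed RDataFrame is used just like a local one: operations are booked lazily and the event loop runs across the cluster when a result is accessed. A minimal sketch, assuming a SWAN notebook already connected to a Spark cluster (so `sc` exists) and a hypothetical branch name "x" in the tree:

```python
import ROOT

# Spark-backed RDataFrame; sc comes from the SWAN cluster connection
RDataFrame = ROOT.RDF.Experimental.Distributed.Spark.RDataFrame
df = RDataFrame("mytree", "myfile.root", sparkcontext=sc)

# Book operations lazily, exactly as with a local RDataFrame;
# "x" is an illustrative branch name, not from the original post
h = df.Histo1D("x")

# Accessing the result triggers the distributed event loop on the cluster
print(h.GetEntries())
```

Note that this sketch only runs inside a SWAN session with an active cluster connection; outside SWAN, `sc` would have to be created manually via pyspark.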
Thank you for the detailed instructions!
I followed the procedure last weekend and now the Spark clusters can access the eos/user directories via XRootD.
I couldn’t find any instructions about using swan-k8s with Spark clusters; are there any dedicated pages on this topic? (It would be nice to improve my understanding of the structure of the SWAN machines.)
I was looking at an old slide about distributed RDataFrame.
Thank you for pointing me to the latest page; I have modified my script to use the latest version.