I have been using RDataFrame with Spark clusters and currently have no issues when everything lives in a single Jupyter notebook. Now I want to run my existing local RDataFrame scripts on the cluster without converting all of them to Jupyter notebooks (that is, keep most of the scripts as Python files and execute them from a notebook).
For example, there is a script called “CutFlow.py” which uses RDataFrame with Spark clusters. I can execute this Python file from a notebook as
import subprocess
subprocess.call(["python", "CutFlow.py"])
The current problem is that the code below fails when the script is run this way, since it cannot find the SparkContext (sc) provided by SWAN.
import ROOT
RDataFrame = ROOT.RDF.Experimental.Distributed.Spark.RDataFrame
rdf = RDataFrame(tchain, sparkcontext=sc)
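My understanding of why this happens, as a minimal sketch: subprocess starts a separate Python interpreter, so names defined in the notebook kernel are not visible in the child process. Here sc is only a stand-in object, not a real SparkContext, and the child code is a hypothetical one-liner instead of CutFlow.py.

```python
import subprocess
import sys

# Stand-in for the SparkContext that SWAN defines in the notebook (assumption:
# any object works for this demonstration, no Spark is needed).
sc = object()

# The child interpreter starts with a fresh namespace, so it reports whether
# a name called `sc` exists there.
child_code = "print('sc' in globals())"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

# True only if the child process printed "True", i.e. it could see `sc`.
child_sees_sc = result.stdout.strip() == "True"
print("child process sees sc:", child_sees_sc)
```

If that is right, the script would need to receive the context in some other way than relying on the notebook's global variable.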
Is there a way to pass the SparkContext to executed scripts (or imported modules), so that I can keep the structure of my local scripts as much as possible?
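To make the question concrete, this is the kind of structure I have in mind, as a sketch only. The function name make_rdf and the idea of importing CutFlow as a module are my own assumptions, and I have not verified that this is the recommended way on SWAN.

```python
# Hypothetical contents of CutFlow.py: the script logic is wrapped in a
# function that receives the SparkContext instead of expecting a global `sc`.

def make_rdf(tchain, sc):
    """Build a distributed RDataFrame from a chain and a given SparkContext."""
    # ROOT is imported inside the function so that the module itself can be
    # imported even before the ROOT environment is needed.
    import ROOT

    RDataFrame = ROOT.RDF.Experimental.Distributed.Spark.RDataFrame
    return RDataFrame(tchain, sparkcontext=sc)


# Hypothetical usage from the notebook, where SWAN has already defined `sc`
# and `tchain` has been built by the user:
#
#   import CutFlow
#   rdf = CutFlow.make_rdf(tchain, sc)
```

Since an imported module runs in the same process as the notebook, the sc object could be handed over as an ordinary argument, which is not possible with subprocess.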