Open .root file with Spark

Dear experts,

I am trying to open a .root file from my local folder or from /eos within Spark. I have found several different approaches to this, but none of them work for me. I tried:
df = spark.read.format("org.dianahep.sparkroot.experimental").load("new_files/mytuple1.root")

which returns

Py4JJavaError: An error occurred while calling o197.load.
: java.lang.ClassNotFoundException: Failed to find data source: org.dianahep.sparkroot.experimental.

and
spark.read.load("root://eosuser.cern.ch//eos/user/n/nraab/SWAN_projects/MC_efficiency/new_files/mytuple1.root")

which returns

Path does not exist:

I have also tried to use a PyRDF.RDataFrame, into which I can load my data, but then I get errors saying that it has no attribute "Filter" or "Histo1D", even though these are used in the CERN examples.
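For reference, this is the kind of pattern I am trying to reproduce, a sketch based on the plain ROOT RDataFrame tutorials (the tree name "mytree" and the branch "pt" are only placeholders, not the actual names in my file):

import ROOT

# Sketch based on the RDataFrame tutorials; "mytree" and "pt" are placeholders
# for the actual tree and branch names in my tuple.
rdf = ROOT.RDataFrame("mytree", "new_files/mytuple1.root")
hist = rdf.Filter("pt > 20").Histo1D("pt")

# Drawing the histogram triggers the event loop.
canvas = ROOT.TCanvas()
hist.Draw()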

Is there something special that I need to do or import, or anything else that I am missing?

Any help is greatly appreciated!
Naomi

Hello Naomi,

Maybe @pkothuri can give us a hand :smiley:

Hi @nraab

Make sure you choose the 'Cloud Containers' cluster on the configuration page and create a Spark session using the 'star' button in the notebook. You can find examples of Spark and ROOT analysis in the gallery. If you have already done this, the way you are accessing the files looks correct; if the problem persists, contact EOS support through the Service Desk.
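The ClassNotFoundException you see usually means the spark-root data source is not on the Spark classpath. As a rough sketch only (the Maven coordinates and version below are an example, please check what is actually available; on SWAN the 'star' button configuration normally takes care of this for you), you could attach the package when creating the session and then read the file over xrootd:

from pyspark.sql import SparkSession

# Sketch only: the package coordinates/version are an example; adjust them to
# whatever the Cloud Containers cluster actually provides.
spark = (SparkSession.builder
         .appName("read-root-example")
         .config("spark.jars.packages", "org.diana-hep:spark-root_2.11:0.1.16")
         .getOrCreate())

# Read the tuple directly from EOS over xrootd with the spark-root data source.
df = (spark.read
      .format("org.dianahep.sparkroot.experimental")
      .load("root://eosuser.cern.ch//eos/user/n/nraab/SWAN_projects/MC_efficiency/new_files/mytuple1.root"))

df.printSchema()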

regards,
Prasanth