I am setting up a workflow to use Spark to analyse large ROOT TTrees.
What I'd like to achieve is exactly what is presented in one of the very useful SWAN examples/tutorials:
Swan > Examples Gallery (top right corner) > Apache Spark > An example using LHCb open data
When I run that example, I get the following error, which I cannot resolve:
: java.lang.ClassNotFoundException: Failed to find data source: org.dianahep.sparkroot. Please find packages at http://spark.apache.org/third-party-projects.html
which arises the first time a ROOT -> DataFrame conversion is invoked:
spark.read.format("org.dianahep.sparkroot").load(data_directory + "PhaseSpaceSimulation.root")
I have also tried moving to a later version:
spark = SparkSession.builder \
    .appName("LHCb Open Data with Spark")
(1.0.15 in place of the original 1.0.4)
I am new to Spark, and wonder whether anyone who has used https://diana-hep.org/pages/project_spark_root.html.html or set up the tutorial
is aware of possible issues with spark-root, and of any workarounds?
Thanks in advance,