
Multithread in SWAN and spark backends

Dear developers,

I use SWAN with RDataFrame, and I would like to know whether multithreading can be used, first in SWAN itself and second on top of Spark backends through the PyRDF module.
That is, does it work to call something like ROOT.EnableImplicitMT() in PyROOT, or to declare it through the interpreter with
ROOT.gInterpreter.Declare("ROOT::EnableImplicitMT();")?

My question also extends to Spark backends through PyRDF: can we request multithreading there by doing something like the following (with indentation…)?

def declare_var():
    import ROOT


Hi Brian,

Yes, you can use RDataFrame multi-threading from your SWAN session. You just need to add the line
ROOT.ROOT.EnableImplicitMT(numthreads) before you create your RDataFrame. Please set numthreads to a reasonable number (you can get up to 4 cores).
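
For illustration, here is a minimal sketch of that ordering in a SWAN notebook cell. The helper names pick_num_threads and make_dataframe and the constant MAX_SWAN_CORES are made up for this example; only ROOT.ROOT.EnableImplicitMT and ROOT.RDataFrame come from ROOT itself. The cap of 4 simply reflects the core limit mentioned above and may differ for your session.

```python
import os

# Assumption: upper limit taken from the "up to 4 cores" remark above.
MAX_SWAN_CORES = 4


def pick_num_threads(requested, available=None, cap=MAX_SWAN_CORES):
    """Clamp the requested thread count to the cores available and to the cap."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(requested, available, cap))


def make_dataframe(num_entries, requested_threads=4):
    """Enable implicit multi-threading first, then create the RDataFrame."""
    import ROOT  # imported here so pick_num_threads works without ROOT installed

    # The pool must be enabled BEFORE the RDataFrame is constructed.
    ROOT.ROOT.EnableImplicitMT(pick_num_threads(requested_threads))
    return ROOT.RDataFrame(num_entries)
```

In a notebook you would call, for example, make_dataframe(1000) and then chain Define/Filter calls on the result as usual.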

Regarding PyRDF distributed with Spark plus multi-threading, I would not advise doing that. In PyRDF we know how many cores we have on the cluster side and we already try to make the most of them. If you add an extra layer of multi-threading on top, we will be overcommitting the cores.
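
To make the overcommitting concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the distributed backend keeps one task running per available core; the function name and the numbers are hypothetical and not part of PyRDF.

```python
def threads_per_core(cluster_cores, concurrent_tasks, threads_per_task):
    """Runnable threads per core when each task starts its own thread pool."""
    if cluster_cores < 1 or concurrent_tasks < 0 or threads_per_task < 1:
        raise ValueError("cores and threads must be >= 1, tasks must be >= 0")
    return concurrent_tasks * threads_per_task / cluster_cores


# Hypothetical cluster: 16 cores, one task per core.
# Without implicit MT inside the tasks, every core runs exactly one thread.
baseline = threads_per_core(16, 16, 1)

# With EnableImplicitMT(4) inside each task, 64 threads compete for 16 cores.
overcommitted = threads_per_core(16, 16, 4)
```

A ratio above 1 means the threads are competing for the same cores rather than adding throughput.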