SWAN for physics analysis

Dear SWAN users,

The SWAN team is looking for users who are currently using SWAN to do physics analysis and can provide feedback about their experience – what works well, what doesn’t, what is missing, etc. If you are one of those, please reply to this message or send me a PM!

Also on this topic, we are currently working closely with the Batch service at CERN to integrate SWAN with HTCondor resources for both batch and interactive analysis (the latter via Dask or any framework that can use Dask underneath, such as RDataFrame or coffea). More news on this will be announced soon!

Best,

Enric

Hi. Not sure if this qualifies as physics analysis, but since I just saw the message:

I’m looking into some first Run 3 collisions for prompt feedback, using CMS MiniAOD through XRootD and uproot (RDF fails to load the CMSSW files even though the data types are not custom!).
And I must say that, even though I’m in exploratory mode, loading the individual branches over XRootD is very, very slow. Kudos that XRootD works at all, though!

Not sure if that is solvable at all, but I like using notebooks, in particular in the exploratory phase of any kind of analysis, and this slow file access is quite annoying.

When working with EOS there is of course no problem at all! (just hitting the memory limit quite often when working on many files).
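For anyone hitting the same memory limit, here is a minimal sketch of one way around it: read a single branch in chunks with `uproot.iterate` and sum a histogram chunk by chunk, so that only one chunk is in memory at a time. The helper names, the branch name and the step size are placeholders for illustration, not something taken from my actual notebook.

```python
import numpy as np


def accumulate_histogram(chunks, bins, value_range):
    """Sum per-chunk histograms so only one chunk is held in memory at a time."""
    counts = np.zeros(bins, dtype=np.int64)
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    for chunk in chunks:
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        counts += chunk_counts
    return counts, edges


def iterate_branch(files, branch, step_size="50 MB"):
    """Yield one NumPy array per chunk of `branch`, reading no other branch.

    `files` is whatever uproot.iterate accepts, e.g. a dict {file URL: tree name}.
    """
    import uproot  # imported here so accumulate_histogram works without uproot

    for arrays in uproot.iterate(files, [branch], step_size=step_size, library="np"):
        yield arrays[branch]
```

Usage would be something like `accumulate_histogram(iterate_branch(files, "pt"), bins=50, value_range=(0.0, 100.0))`, where `files` maps each file URL to its tree name and `"pt"` stands in for a real branch name. Requesting a single branch also keeps the amount of data pulled over XRootD small, although it will not by itself fix slow remote access.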

Hello Artur,

Thank you for your reply, and sorry for my late response!

RDF fails to load the CMSSW files even though the data types are not custom!

Could we follow this up in the ROOT forum? We’d be interested in knowing what the problem is (I am speaking with my ROOT hat on now :slight_smile: ).

And I must say that even though I’m in exploratory mode the loading of the individual branches over XRootD is very, very slow.

I see. Would you say that XRootD access to ROOT files is slower in SWAN than on, say, lxplus? If so, we need to find out why.

When working with EOS there is of course no problem at all! (just hitting the memory limit quite often when working on many files).

Good. You might know this already, but you can increase the memory limit of your session in the web form when starting it. Also, with RDataFrame you shouldn’t have memory problems (unless you run something like AsNumpy and the result does not fit into memory).

Would you by any chance be interested in trying distributed analysis? We are integrating SWAN with Dask and HTCondor, and both RDataFrame and coffea can work on top of Dask.

Hi Enric,

I’ve actually never really used RDF, so I can’t guarantee that my experience with it is useful. I’ll report any issues to ROOT if I encounter them again.

I haven’t tried accessing the XRootD files from lxplus, since I don’t like using lxplus for such exploratory things.

As for distributed analysis: is it really a good fit for exploratory studies? I thought of it more for production-level analysis, as an alternative to the batch system (HTCondor).

Hi Artur,

I haven’t tried accessing the XRootD files from lxplus, since I don’t like using lxplus for such exploratory things.

OK, so you are comparing SWAN with your own machine? Are you based at CERN or elsewhere? I understand you find XRootD access consistently slow(er) in SWAN, and not just sporadically? If this is the case, I’d be very much interested in investigating it.

As for distributed analysis: is it really a good fit for exploratory studies? I thought of it more for production-level analysis, as an alternative to the batch system (HTCondor).

It fits anything you can’t run interactively on your own machine because it would take too long. Perhaps now your exploratory work can be done in a local SWAN session and you don’t need any offloading, but we are also thinking of what will be necessary in a few years with more data. Anyway, if you are interested in being an early user of this, just let me know.

No, only in SWAN, and I have nothing else to compare it with. In general, I have not used non-EOS-based files for a very long time. Will get back to you if the problem reappears.

That’s definitely interesting! If you don’t need fast feedback, I’d be happy to beta test :slight_smile:

Seems that now I should get interested in learning Dask:
uproot.lazy will be deprecated in favour of uproot.dask
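
As far as I understand it, the replacement would look roughly like the sketch below. It assumes uproot 5 or later with dask and dask-awkward installed; the two helper names are mine and purely illustrative, and the file paths, tree name and branch names are whatever your own files contain.

```python
def build_file_spec(paths, tree_name):
    """Map each file path or URL to the name of the TTree inside it."""
    return {path: tree_name for path in paths}


def open_lazy(paths, tree_name, branches):
    """Return a lazy Dask collection restricted to the selected branches.

    Branch data is only read when .compute() is called on the result.
    """
    import uproot  # assumption: uproot >= 5 with dask and dask-awkward available

    return uproot.dask(build_file_spec(paths, tree_name), filter_name=branches)
```

With this, something like `open_lazy(paths, "Events", ["pt"])["pt"].compute()` would trigger the actual read, where `"Events"` and `"pt"` are placeholder names.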

And SWAN has everything you need to do that :slight_smile:

Hi @etejedor, please point me to the Dask-enabled SWAN installation whenever you have it running! Thanks :slight_smile:

Sure, will do!