GPU support in SWAN

Hi @shelena
You are in the egroup for GPUs

Go to https://swan-k8s.cern.ch and select the CUDA stack to get a GPU.
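
Once the session has started, a quick way to confirm that a GPU was actually attached is to ask a deep-learning framework from a notebook cell. The snippet below is only a minimal sketch: it assumes PyTorch is available in the selected CUDA stack, which this thread does not confirm, and the helper name `gpu_visible_to_torch` is made up for illustration.

```python
def gpu_visible_to_torch():
    """Return True/False if PyTorch can/cannot see a CUDA device,
    or None if PyTorch is not installed in this environment."""
    try:
        import torch  # assumption: PyTorch is part of the selected stack
    except ImportError:
        return None
    return torch.cuda.is_available()


status = gpu_visible_to_torch()
if status is None:
    print("PyTorch is not installed here; try another check such as nvidia-smi.")
elif status:
    print("A CUDA device is visible to PyTorch.")
else:
    print("No CUDA device is visible; the session may not have a GPU attached.")
```

If the last message appears, the session most likely started without a GPU, for example because a non-CUDA stack was selected.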

Cheers
Omar.

Thank you, Omar, for the fast response.
The same training runs a factor of 2 faster with a GPU. I am not sure whether this is expected, but it is encouraging. This is my first contact with GPUs.
Cheers,
Helena

Hello,

I’m not able to configure any server in
https://swan-k8s.cern.ch/hub/spawn/shelena
I get: Could not create required user credential.

There is no problem at https://swan004.cern.ch.
Thanks for your help.

Best,
Helena

Hi @shelena,

There was an issue with the central authentication service, which prevented us from contacting EOS (SSB).
Things should be back to normal now.

Sorry for the issue,
Diogo

Hi Diogo, yes, it is fixed. Thank you!
Helena

Hi,
I’m having some problems training on ATLAS b-tagging data on SWAN machines due to software version incompatibilities. As these problems seem to be quite persistent, I tried instead to use the 100 Cuda 10.1 Python3 machine on SWAN, https://swan-k8s.cern.ch/hub/spawn, but it appears that I can’t configure GPUs.
Since GPUs are of great help for the training processes in ATLAS, where I work, I would like to know whether I can get access to the GPU machines and how to request it.

Cheers,
João

Dear Joao,

I have granted you permission to use the GPUs on SWAN; it should take a few hours to become effective. After that, if you log in to https://swan-k8s.cern.ch and select the CUDA stack you mention, you should get a GPU.

Thanks a lot!

Best wishes,
Joao

Hi,

I don’t know if I’m the only one, but I lost permission to access the 101 Cuda 11.2 GPU machine. Is this a temporary problem? At the moment, I’m trying to run the lighter training jobs on the 101 machine, but it is becoming unsustainable.

Best regards,
Joao.

My understanding is that we can’t use it until the end of tomorrow because of the ongoing school of computing:

https://cern.service-now.com/service-portal?id=outage&n=OTG0070741

Cheers,

Pedro

Hello,

Yes, that is correct. Apologies for any inconvenience this may cause.

Thanks Pedro and Enric, I wasn’t aware of this ongoing activity.

Cheers,
Joao.

Hi again,

I’m still getting the same error when trying to access the 101 Cuda machine. I’ve been waiting since yesterday, hoping that the SWAN service would take care of the problem, but it seems to be taking longer than expected. When will the situation be resolved?

Best wishes,
Joao.

Hi,

I just restored the GPU access.

Cheers
Riccardo

Hi Riccardo,

Thanks.

Best regards,
Joao.

Would anyone be able to provide the hardware details for SWAN GPU acceleration?

Hi Thomas,

At the moment, if you start a SWAN session with a GPU attached you get a Tesla T4, just for you.

In the future we might rely on virtualization (either time sharing or physical sharing) depending on the demand and the available GPUs.
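
To see which GPU a given session actually received, the hardware can be queried from a notebook cell with `nvidia-smi`. This is a rough sketch rather than an official SWAN recipe: it assumes `nvidia-smi` is on the PATH inside a GPU session, and the helper names `parse_gpu_csv` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; each becomes a key in the result.
QUERY_FIELDS = ["name", "memory.total", "driver_version"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header) into one dict per GPU."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        values = [value.strip() for value in line.split(",")]
        gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Return details of the visible GPUs, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []  # assumption: no nvidia-smi means no GPU in this session
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


gpus = query_gpus()
if gpus:
    for gpu in gpus:
        print(gpu)
else:
    print("No GPU reported by nvidia-smi in this session.")
```

If GPU sharing is introduced later, the reported memory may no longer correspond to a full card, so it is worth checking rather than assuming.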

Cheers,

Enric

Thank you!

Hi Christian,
I am interested in using 102b Cuda 11.7.1 (GPU) for neural network training. However, every time I try to set up this software stack, the following error message appears: “Access to GPUs is not granted”. Thanks.

Dear Daniela,

Please see my answer in the SNOW ticket you opened about this topic.

Best,

Enric