Introduction to Apache Spark - CERN training online and self-paced

canali · November 22, 2022, 1:10pm

Dear All,

To help people who are getting started with the Apache Spark services and/or their integration in SWAN, we have been running short introductory classes every year.
Please note that the course “Introduction to Apache Spark APIs for data processing” is now also available online at any time as a self-paced course.
You can find more details and enrol (for free) on the CERN Learning Hub:

https://lms.cern.ch/ekp/servlet/ekp?TX=FORMAT1&LOTYPE=O&CID=EKP000044126

or you can go directly to the course website, if you prefer: https://sparktraining.web.cern.ch/

The course covers the main architectural components and key abstractions used by Spark, illustrated with a few examples of how to use the main Spark APIs: DataFrame API, Spark SQL, Streaming, Machine Learning.
The course also covers deploying Spark on CERN computing resources, notably by using the CERN SWAN service and its integrations with the CERN Hadoop clusters and the CERN cloud service.
Most tutorials and exercises are written in Python and run in Jupyter notebooks.

We hope you’ll find this useful.
Feel free to forward this message to your team members and/or to anyone who is getting started with data processing.
Best,
Luca for the Data Analytics services