Using OAuth for API access to SWAN

grigolet · December 5, 2020, 4:41pm

Hello,
I’ll quickly explain my use case first: I have data acquisition software running on a machine, and each time a “run” is performed I would like a Jupyter notebook to be created programmatically from a template, so that the user can run it, configure it, etc.

I wanted to test the JupyterHub APIs, but when I try to use CERN’s auth API access (see the link in step 2 below) for the swan-service, I get a response of the type:
{"error":"unknown_error"}.

This doesn’t happen with other registered services, so is there some configuration in the swan-service application that doesn’t allow this?

I don’t know if this is the correct way to authenticate to SWAN, so my flow may be wrong.

Steps to reproduce the issue:

1. Create a new application on https://application-portal.web.cern.ch/
2. Register an API client: https://auth.docs.cern.ch/user-documentation/oidc/api-access/
3. Try sending the following request with the obtained credentials:

curl --location --request POST "https://auth.cern.ch/auth/realms/cern/api-access/token" \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=client_credentials' \
--data-urlencode "client_id=api-client-id" \
--data-urlencode "client_secret=00000000-0000-0000-0000-000000000000" \
--data-urlencode "audience=swan-service"
# {"error":"unknown_error"}
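For reference, the same request can be built from Python. This is a minimal sketch using only the standard library; the client ID and secret are the placeholders from the curl call above, and `build_token_request` is a hypothetical helper name, not part of any CERN library. It only constructs the request; sending it needs valid credentials and network access.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://auth.cern.ch/auth/realms/cern/api-access/token"


def build_token_request(client_id, client_secret, audience):
    """Build (but do not send) the client_credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder credentials, identical to the ones in the curl example.
request = build_token_request(
    "api-client-id",
    "00000000-0000-0000-0000-000000000000",
    "swan-service",
)

# To actually send it (real credentials required):
#   with urllib.request.urlopen(request) as response:
#       print(response.read())
```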

dalvesde · December 7, 2020, 8:45am

Dear Gianluca,

We will need more details on your implementation to decide whether to grant this access or not. But, given that this request would grant you a lot of power over the whole service (and dependencies, like EOS), I can tell you now that, unfortunately, this will probably get rejected.

How do you plan to create the notebooks? Are you going to start a user session/container and call some API inside it? Or do you only need access to storage to save a file? If so, isn’t a shared storage space, like an EOS Project, enough?
We really need full details on what you’re planning to do.

If the only thing you need is access to the JupyterHub API (right now I’m not sure if this would allow you to do any action inside the user session itself), you can ask your users to generate a “User token” in the “Token” tab that appears when you are in the session configuration page.

Cheers,
Diogo

grigolet · December 7, 2020, 9:21am

Dear Diogo,
thanks a lot for clarifying. I thought I was making some mistake in the calls to the APIs.

Considering my use case: in our group we use SWAN to analyse data collected from some detectors (the data is already stored on an EOS project). For each run we create a notebook from a template notebook to run the analysis.
My first idea was to automatically create a notebook for each run so that we can easily access it in SWAN and run the analysis.
After finding out that JupyterHub has APIs to spawn processes and run notebooks, I also considered testing them to see if I could already run the notebooks, so that they would be ready to be presented to the end user.

After your reply I understand that access to the APIs must be motivated by a strong need. Since I don’t have one, I will either try to run a JupyterHub instance myself on OpenShift (a bit of a pity, since SWAN is working really well), skip implementing the feature, or work around the OAuth limitation by changing the authorization workflow and letting the user log in via a browser from my DAQ software (if this is compatible with your terms of usage).

Cheers,
Gianluca

dalvesde · December 8, 2020, 8:24am

But can you give us more detail on the APIs you call?
This requires a proper discussion, so I propose I introduce this issue internally (we have a meeting tomorrow) and then, maybe we can have a meeting together to get more details.
I might need some input from the security team as well before reaching a decision. But I would be interested at least in investigating a bit more, since this might become a use case in the future.

Diogo

grigolet · December 8, 2020, 4:52pm

Hi Diogo,
the APIs that I’ll be using:

1. If I got it right, I need to authenticate to SWAN using CERN’s OAuth. To do that I will use the CERN auth APIs mentioned in the first post. From this point I should be able to get a token by using the /authorizations/token endpoint.
2. Then, although I’m not 100% sure I need this, I would need to spawn a server under my username. This is the part that I’m still trying to figure out how to do via REST APIs.
3. Once I have figured out how to spawn a single-user server, I should be able to use the Jupyter APIs to create contents in my projects. This should be achieved in particular using the api/contents/{path} endpoint.

Perhaps the steps are not in the correct order or some pieces are missing but this is more or less the set of APIs I found out online that I would like to use.
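The last step could look roughly like the sketch below, which fills a run number into a template and builds a PUT request for the Jupyter contents API. Several things here are assumptions for illustration: the `RUN_NUMBER` placeholder convention, the one-cell template, the target path under `SWAN_projects`, and both helper names. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request


def notebook_for_run(template, run_number):
    """Return a copy of the template with every RUN_NUMBER placeholder filled in."""
    text = json.dumps(template).replace("RUN_NUMBER", str(run_number))
    return json.loads(text)


def build_create_request(base_url, user, path, notebook, api_token):
    """Build (but do not send) a PUT to api/contents/{path} on the user's server."""
    url = f"{base_url}/user/{user}/api/contents/{urllib.parse.quote(path)}"
    body = json.dumps({"type": "notebook", "format": "json", "content": notebook})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"token {api_token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Hypothetical one-cell template; a real one would be loaded from an .ipynb file.
template = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [{
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": "run = RUN_NUMBER",
    }],
}

notebook = notebook_for_run(template, 42)
request = build_create_request(
    "https://swan002.cern.ch",
    "grigolet",
    "SWAN_projects/runs/run_42.ipynb",  # hypothetical target path
    notebook,
    "<my-manually-requested-token>",
)
```

The single-user server has to be running before such a request can succeed, which is why the spawn step comes first.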

I don’t think access to the Jupyter APIs will be a problem: since I’m not an admin, I can’t access a lot of the admin endpoints (for example /groups, /users, /services).

The only thing I am missing for now is the ability to use the OAuth client_credentials workflow, to avoid opening a browser to authenticate a user account when running a script.

I am currently testing the feasibility of the notebook creation by manually creating a token, as you suggested.

Cheers,
Gianluca

dalvesde · December 9, 2020, 12:28pm

Hi Gianluca,

We discussed internally the possibility of allowing the exchange of tokens and our position is as follows: we will grant the permission to exchange tokens to any official service. If it’s not official, that won’t be possible. So, if your service becomes official, we will do it.

Having said that, as I told you before, you can use the JupyterHub token* to start a session in SWAN via the JupyterHub api. To have the environment just like the one you would have if using the UI, you need to pass in the options we expect, like the lcg stack, platform etc (please check the form HTML code for the names). If not, the spawn will fail.

*Even though I expect a JH token to work right now, that won’t be the case in the very near future, as we will start using the OAuth tokens to talk to EOS. So you will really have to authenticate to the SSO, otherwise there won’t be EOS and the session won’t start.

But… if your problem is avoiding the user having to authenticate to the SSO in a script, I’m not sure the token exchange would help, as you still have to log in to your service. I suggest you have a look at auth-get-sso-cookie (which can even be used as a library). This will allow your users to use their Kerberos tickets (I’m not sure you have those, but maybe you could) and automatically log in to the SSO. And you can specify the SWAN URL, which means that your users would be logged in to SWAN without us having to exchange tokens! They would have the SWAN token from the start and you would be able to call any API in JH and Jupyter.
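As a rough illustration of the cookie route: assuming auth-get-sso-cookie has already written its cookies to a Netscape-format file (the format curl reads; that, and the file name, are assumptions here), a script could load them with the Python standard library as sketched below. `opener_from_cookie_file` is a hypothetical helper, not part of auth-get-sso-cookie.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(path):
    """Return (opener, jar); the opener sends the cookies stored in `path`."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies too, and let the server decide whether they are still valid.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (file name is hypothetical):
#   opener, jar = opener_from_cookie_file("cookies.txt")
#   response = opener.open("https://swan002.cern.ch/")
```

Every request made through the returned opener then carries the SSO cookies, so no browser login is needed in the script.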

Hope this helps.
Let me know if this last option works.
Diogo

grigolet · December 9, 2020, 5:18pm

Hi Diogo,
Regarding the JupyterHub API token, you wrote:

Having said that, as I told you before, you can use the JupyterHub token* to start a session in SWAN via the JupyterHub api. To have the environment just like the one you would have if using the UI, you need to pass in the options we expect, like the lcg stack, platform etc (please check the form HTML code for the names). If not, the spawn will fail.

I tested this option and I’m having some issues with it. Let’s say I manually log in to SWAN, get a token, save it for myself and then log out. Consider the following snippet:

import requests

api_token = "<my-manually-requested-token>"
spawn_options = {
    'LCG-rel': 'LCG_97apython3',
    'platform': 'x86_64-centos7-gcc8-opt',
    'scriptenv': '',
    'ncores': '2',
    'memory': '8',
    'spark-cluster': 'none',
}
endpoint = 'https://swan002.cern.ch/hub/api/users/grigolet/server'
r = requests.post(endpoint, json=spawn_options, headers={'Authorization': f'token {api_token}'})

print(r.content)
# b'{"status": 500, "message": "error"}'

Since I don’t have the logs, I don’t know whether this is because the token is invalid or because the user configuration is wrong (I copied it from the form data sent by my browser).

I will check the auth-get-sso-cookie option with Kerberos and let you know if it works.

Cheers,
Gianluca

dalvesde · December 9, 2020, 7:05pm

'ncores': 2,
'memory': '8G',

will do the trick. And then you can call the Jupyter API without problems (e.g. GET https://swan002.cern.ch/user/xxxx/api/config/tree).
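Put together, the corrected spawn call might be sketched as follows, using only the standard library. The option names and values come from the snippet earlier in the thread, with `ncores` and `memory` changed as above; the helper names are hypothetical. The JupyterHub REST API documents 201 (server started) and 202 (spawn requested, not yet started) as responses to this POST, and `server_is_ready` assumes a JupyterHub version whose user model lists the default server under `servers[""]`.

```python
import json
import urllib.request

HUB_API = "https://swan002.cern.ch/hub/api"

spawn_options = {
    "LCG-rel": "LCG_97apython3",
    "platform": "x86_64-centos7-gcc8-opt",
    "scriptenv": "",
    "ncores": 2,      # integer, not the string '2'
    "memory": "8G",   # unit suffix included
    "spark-cluster": "none",
}


def build_spawn_request(user, options, api_token):
    """Build (but do not send) POST /users/{user}/server with the spawn options."""
    return urllib.request.Request(
        f"{HUB_API}/users/{user}/server",
        data=json.dumps(options).encode("utf-8"),
        headers={
            "Authorization": f"token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def server_is_ready(user_model):
    """Check the default server entry in a GET /users/{user} response body."""
    default_server = user_model.get("servers", {}).get("", {})
    return bool(default_server.get("ready"))


request = build_spawn_request("grigolet", spawn_options, "<my-manually-requested-token>")
```

After a 202 response, a script would poll GET /users/{user} until `server_is_ready` returns true before calling the Jupyter API on the single-user server.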

grigolet · December 10, 2020, 5:49pm

Hi Diogo,
so I played around with your suggestion of getting Kerberos tickets and using auth-get-sso-cookie to get the cookies.
I think this solution works fine for me. I got some random 500 errors when trying to start the server, but I’m sure I can find a way to work around them.

I’ll leave a snippet of what I did to interact with the APIs, with some examples such as starting/stopping a server, getting a folder’s content, creating tokens, etc.

Thanks for the support,
Gianluca