JupyterHub image needs to be updated for Kubernetes 1.16 and later

Hi,

I am trying to deploy boxed on my Kubernetes cluster. The Kubernetes version is as follows:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:56:40Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.2", GitCommit:"52c56ce7a8272c798dbc29846288d7cd9fbae032", GitTreeState:"clean", BuildDate:"2020-04-16T11:48:36Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

I had trouble deploying boxed on this version of Kubernetes because some of the APIs it uses were deprecated and are no longer served as of 1.16, as described here: https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/

For example, Deployments defined in “extensions/v1beta1” or “apps/v1beta1” must be migrated to “apps/v1”, where “spec.selector” is required, so most manifests need one added.
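The migration described above can be sketched as a small transformation on a parsed manifest. This is a minimal illustration, not the procedure used for boxed: it assumes the manifest is already loaded as a Python dict, handles only Deployments, and assumes the pod template labels are an acceptable selector (which is what the old API groups defaulted to when `spec.selector` was omitted). The `app: swan` label in the usage example is hypothetical.

```python
import copy

# The two deprecated API groups named in this post; Kubernetes 1.16 no longer
# serves Deployments from them by default.
DEPRECATED_DEPLOYMENT_APIS = {"extensions/v1beta1", "apps/v1beta1"}


def migrate_deployment(manifest: dict) -> dict:
    """Return a copy of a Deployment manifest rewritten for apps/v1.

    Assumption: the pod template labels are an acceptable selector.
    Other field-level differences between the API versions are not handled.
    """
    if manifest.get("kind") != "Deployment":
        raise ValueError("expected a Deployment manifest")
    migrated = copy.deepcopy(manifest)
    if migrated.get("apiVersion") not in DEPRECATED_DEPLOYMENT_APIS:
        return migrated  # already apps/v1 (or unknown): leave it untouched
    migrated["apiVersion"] = "apps/v1"
    spec = migrated.setdefault("spec", {})
    if "selector" not in spec:
        labels = spec.get("template", {}).get("metadata", {}).get("labels")
        if not labels:
            raise ValueError("no pod template labels to derive spec.selector from")
        # apps/v1 requires an explicit selector that matches the template labels.
        spec["selector"] = {"matchLabels": dict(labels)}
    return migrated


# Usage with a hypothetical, trimmed-down manifest.
old = {
    "apiVersion": "extensions/v1beta1",
    "kind": "Deployment",
    "metadata": {"name": "swan"},
    "spec": {"template": {"metadata": {"labels": {"app": "swan"}}}},
}
new = migrate_deployment(old)
print(new["apiVersion"], new["spec"]["selector"])  # apps/v1 {'matchLabels': {'app': 'swan'}}
```

The function works on a deep copy, so the original manifest is left unchanged and can be diffed against the migrated one before applying it.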

After making these changes, the deployment went smoothly. Below is the current status of the boxed deployment:

# kubectl get svc,pod,ep,deployment,daemonset -n boxed
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
service/cernbox        ClusterIP   None             <none>        80/TCP,443/TCP                        5h17m
service/cernboxmysql   ClusterIP   10.102.32.132    <none>        3306/TCP                              5h17m
service/cvmfssquid     ClusterIP   10.102.139.234   <none>        3128/TCP                              21m
service/eos-fst1       ClusterIP   None             <none>        1095/TCP,8001/TCP                     5h58m
service/eos-fst2       ClusterIP   None             <none>        1095/TCP,8001/TCP                     5h58m
service/eos-mgm        ClusterIP   None             <none>        1094/TCP,1096/TCP,1097/TCP,8000/TCP   5h58m
service/ldap           ClusterIP   10.105.233.119   <none>        389/TCP,636/TCP                       2d1h

NAME                                 READY   STATUS    RESTARTS   AGE
pod/cernbox-676c86cfc6-4dj96         1/1     Running   0          5h17m
pod/cernboxgateway-d7f97bc44-qtsbc   1/1     Running   0          5h17m
pod/cernboxmysql-8657b48b9c-8n4w8    1/1     Running   0          5h17m
pod/cvmfssquid-5c48bc97c9-vtvnf      1/1     Running   0          21m
pod/eos-fst1                         1/1     Running   0          5h58m
pod/eos-fst2                         1/1     Running   0          5h58m
pod/eos-mgm                          1/1     Running   0          5h58m
pod/ldap-7c6546b567-btgds            1/1     Running   0          2d1h
pod/swan-5f9584cbbc-6r4lh            1/1     Running   0          21m
pod/swan-daemons-r2mdg               3/3     Running   0          21m

NAME                     ENDPOINTS                                                  AGE
endpoints/cernbox        10.36.0.2:443,10.36.0.2:80                                 5h17m
endpoints/cernboxmysql   10.36.0.1:3306                                             5h17m
endpoints/cvmfssquid     10.44.0.1:3128                                             21m
endpoints/eos-fst1       10.45.0.1:8001,10.45.0.1:1095                              5h58m
endpoints/eos-fst2       10.39.0.3:8001,10.39.0.3:1095                              5h58m
endpoints/eos-mgm        10.40.0.1:1096,10.40.0.1:1094,10.40.0.1:8000 + 1 more...   5h58m
endpoints/ldap           10.42.0.1:636,10.42.0.1:389                                2d1h

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cernbox          1/1     1            1           5h17m
deployment.apps/cernboxgateway   1/1     1            1           5h17m
deployment.apps/cernboxmysql     1/1     1            1           5h17m
deployment.apps/cvmfssquid       1/1     1            1           21m
deployment.apps/ldap             1/1     1            1           2d1h
deployment.apps/swan             1/1     1            1           21m

NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR        AGE
daemonset.apps/swan-daemons   1         1         1       1            1           nodeApp=swan-users   21m

EOS and CERNBox are working as well. One minor issue is that files cannot be renamed in CERNBox: EOS reports that it cannot access the renamed file because it does not exist, which is very strange. For example:

200611 08:54:51 time=1591858491.517375 func=Emsg                     level=ERROR logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@eos-mgm.eos-mgm.boxed.svc.cluster.local:1094 tid=00007fb518efd700 source=XrdMgmOfs:826                  tident=<single-exec> sec=      uid=0 gid=0 name= geo="" Unable to access /eos/docker/user/u/user0/cernbox/test/new_textfile2_rename.txt; No such file or directory
200611 08:54:51 time=1591858491.517435 func=MakeResult               level=ERROR logid=static.............................. unit=mgm@eos-mgm.eos-mgm.boxed.svc.cluster.local:1094 tid=00007fb518efd700 source=ProcCommand:578                tident= sec=(null) uid=99 gid=99 name=- geo="" error: Unable to access /eos/docker/user/u/user0/cernbox/test/new_textfile2_rename.txt; No such file or directory (errno=2)

I expected EOS to handle a rename by copying the file to a new file with the new name and then deleting the original, but judging from these logs that is not what happens: it first tries to look up the new file, which does not exist yet. I wonder whether renaming works in CERN’s CERNBox. Maybe this has already been fixed in a newer release of CERNBox.

For SWAN, I can log in, but spawning a notebook fails with the following error:

'<' not supported between instances of 'datetime.datetime' and 'NoneType'

According to this thread: https://github.com/jupyterhub/kubespawner/issues/354, the fix was included in jupyterhub 0.9.0-beta3. Which version of JupyterHub does boxed use? If possible, I would very much appreciate a new JupyterHub image for boxed. The version currently included in boxed is v1.9, and its URL is gitlab-registry.cern.ch/swan/docker-images/jupyterhub:v1.9.
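The error message itself means Python was asked to order a `datetime` against `None`, which typically happens when sorting records in which some timestamp field is unset. The sketch below reproduces that failure on hypothetical records and shows one defensive pattern, skipping records without a timestamp. It is only an illustration of the error class; it is not kubespawner's code or the fix referenced in the linked issue, and the record layout and `latest_event` helper are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical records: one of them has its timestamp unset (None).
events = [
    {"reason": "Pulled", "timestamp": None},
    {"reason": "Scheduled", "timestamp": datetime(2020, 6, 11, 8, 54, tzinfo=timezone.utc)},
]

# Sorting on the raw field has to compare a datetime with None, which raises TypeError.
try:
    sorted(events, key=lambda e: e["timestamp"])
    error = None
except TypeError as exc:
    error = str(exc)
print(error)


def latest_event(records: List[dict]) -> Optional[dict]:
    """Return the record with the newest timestamp, skipping records without one."""
    timed = [r for r in records if r.get("timestamp") is not None]
    return max(timed, key=lambda r: r["timestamp"], default=None)


print(latest_event(events)["reason"])  # Scheduled
```

Filtering before comparing keeps the ordering well defined; the alternative of substituting a sentinel such as `datetime.min` would silently rank unset records as the oldest.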

Thank you.

Best regards,
Sang-Un

Hello Sang-Un

Maybe @ebocchi can give us a hand :slight_smile:

I will take a look at the JupyterHub issue.
Cheers
Omar.

Hello Sang-Un

Regarding the JupyterHub issue, we can see that the official Python client for Kubernetes supports versions only up to 1.15.


ScienceBox is validated only up to 1.15, and all the cloud providers still support 1.15 (in fact, versions from 1.13 onwards).
We suggest you use a supported version of Kubernetes until we develop support for newer versions.
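Checking a cluster against the validated range can be sketched by parsing the `GitVersion` string from the `kubectl version` output at the top of this thread (for example `v1.18.2`). This is a minimal sketch under that assumption; the only bound it encodes is the 1.15 upper limit stated in this reply, and the helper names are hypothetical.

```python
import re

# ScienceBox is validated up to Kubernetes 1.15, as stated in this thread.
MAX_VALIDATED = (1, 15)


def parse_git_version(git_version: str) -> tuple:
    """Extract (major, minor) from a GitVersion string such as 'v1.18.2'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def is_validated(git_version: str) -> bool:
    """True if the (major, minor) version does not exceed the validated maximum."""
    return parse_git_version(git_version) <= MAX_VALIDATED


print(is_validated("v1.18.2"))   # False
print(is_validated("v1.15.12"))  # True
```

Comparing `(major, minor)` tuples avoids the classic string-comparison trap where `"1.9"` would sort after `"1.15"`.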
Cheers
Omar.

Hello Sang-Un,

Thank you for reporting your issue.
As Omar correctly said, there are currently some incompatibilities with k8s 1.18, and we recommend using Kubernetes up to version 1.15.
We are working on fixes to support more recent versions as well, but these are not ready yet, and I cannot provide a precise timeline. If it does not disrupt other applications running in your cluster, I would suggest rolling back to k8s 1.15.

Cheers,
Enrico

Hi @ozapatam and @ebocchi,

Thank you so much for the reply. I will try downgrading Kubernetes to 1.15 and then deploying again. This is just a test-bed setup, so it is not a problem. I really enjoyed the deployment process :slight_smile:

By the way, could I get some advice on operating ScienceBox in production? For example, how does CERN operate the CERNBox and SWAN services in production? Do you run ScienceBox on Kubernetes clusters, or do you deploy CERNBox standalone on top of the existing EOS instances at CERN and integrate it with SWAN, which is also deployed independently? Is it possible to integrate CERNBox and/or SWAN with the latest EOS release (4.7.7) using the QuarkDB (QDB) backend?

In our case, I do not expect a large number of users: a few tens of users, with about 10 to 20 concurrent connections daily, once CERNBox and SWAN (JupyterHub) are available at the site. Do you think the basic ScienceBox setup can handle this load? For reference, our Kubernetes cluster currently has 9 nodes (each with 72 threads and 384 GB of memory), and we expect to add 2 more. About 500 TB of NAS storage is already in production.
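A back-of-the-envelope check of raw compute capacity follows directly from the node figures above. The per-session request of 2 threads and 8 GB used in the example is a hypothetical placeholder, not a SWAN default, and the calculation ignores the ScienceBox services themselves, system reservations and storage throughput, so it only bounds how many notebook sessions could fit.

```python
# Cluster figures from the post above.
NODES_NOW = 9
NODES_PLANNED = 2
THREADS_PER_NODE = 72
MEMORY_GB_PER_NODE = 384


def cluster_totals(nodes: int) -> tuple:
    """Total (threads, memory in GB) across the given number of nodes."""
    return nodes * THREADS_PER_NODE, nodes * MEMORY_GB_PER_NODE


def sessions_that_fit(nodes: int, cpu_per_session: int, mem_gb_per_session: int) -> int:
    """Upper bound on concurrent sessions: whichever resource runs out first."""
    threads, mem_gb = cluster_totals(nodes)
    return min(threads // cpu_per_session, mem_gb // mem_gb_per_session)


print(cluster_totals(NODES_NOW))                  # (648, 3456)
print(cluster_totals(NODES_NOW + NODES_PLANNED))  # (792, 4224)
# Hypothetical per-session request: 2 threads and 8 GB of memory.
print(sessions_that_fit(NODES_NOW, cpu_per_session=2, mem_gb_per_session=8))  # 324
```

The `nodes` argument should be the number of nodes actually available to user sessions. In the status output above, the `swan-daemons` DaemonSet has DESIRED 1 with node selector `nodeApp=swan-users`, which may mean only one node is currently labeled for that role.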

My last question is about performance. Even though this is a local setup, access to CERNBox is rather slow: specifically, login and logout take minutes. The same is true when logging in to SWAN. Other operations, such as uploading or deleting files and creating a text file or a directory, are instantaneous. Do you have any suggestions on where to look?
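Since only login and logout are slow, one hypothesis worth ruling out (an assumption, not a diagnosis) is a delay on the authentication path, for example slow DNS resolution or a slow connection to the `ldap` service listed above. The sketch below times name resolution and TCP connect separately; run from a pod inside the cluster, a resolution time of several seconds would point at DNS rather than at CERNBox itself. The `*.boxed.svc.cluster.local` names assume standard Kubernetes service DNS for the `boxed` namespace.

```python
import socket
import time


def time_lookup_and_connect(host: str, port: int, timeout: float = 10.0) -> tuple:
    """Return (dns_seconds, connect_seconds) for a TCP connection to host:port."""
    t0 = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    t1 = time.perf_counter()
    family, socktype, proto, _, addr = infos[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


# Services from the status output above (assumed in-cluster DNS names).
TARGETS = [
    ("ldap.boxed.svc.cluster.local", 389),
    ("cernboxmysql.boxed.svc.cluster.local", 3306),
]


def report(targets=TARGETS) -> None:
    """Print timings for each target; intended to be called from inside the cluster."""
    for host, port in targets:
        try:
            dns_s, connect_s = time_lookup_and_connect(host, port)
            print(f"{host}:{port} dns={dns_s:.3f}s connect={connect_s:.3f}s")
        except OSError as exc:
            print(f"{host}:{port} failed: {exc}")
```

This measures only the network path; if both numbers are small, the delay is more likely inside the login flow itself and the service logs are the next place to look.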

Thank you.

Best regards,
Sang-Un