Tensorflow_backend

shelena · February 5, 2021, 5:44pm

Hello, I’m a new user of SWAN. I get several warnings of the same kind when running my Jupyter notebook. Here is one as an example:
WARNING: Logging before flag parsing goes to stderr. W0205 17:08:25.477203 140255624709952 deprecation_wrapper.py:119] From /cvmfs/sft.cern.ch/lcg/views/LCG_97apython3/x86_64-centos7-gcc8-opt/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

Later on I also get errors that prevent the notebook from running:
KeyError Traceback (most recent call last)
in
16 batch_size=3000,
17 callbacks=callbacks,
—> 18 verbose=1
19 )

/cvmfs/sft.cern.ch/lcg/views/LCG_97apython3/x86_64-centos7-gcc8-opt/lib/python3.7/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
1037 initial_epoch=initial_epoch,
1038 steps_per_epoch=steps_per_epoch,
→ 1039 validation_steps=validation_steps)
1040
1041 def evaluate(self, x=None, y=None,

At the beginning of the notebook I have:
from keras.layers import BatchNormalization
from keras.layers import Dense, Activation, Input, add
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, Callback

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

Should I change something?
Thank you in advance.
Helena Santos

adavid · February 5, 2021, 8:10pm

Hi Helena,

I’m not an expert, but it seems that what you are seeing is not a SWAN issue, but rather a TF-related warning. From this SO thread it does not seem like an issue but YMMV.

Cheers,

André

shelena · February 5, 2021, 8:56pm

Hello André,
When running my notebook in SWAN it returns tensorflow version 1.14. At hub.cern.ch it is 2.0 and I do not have these warnings. So there must be a way to import version 2.0 when running in SWAN, right?
Thank you.
Best wishes,
Helena

mato · February 6, 2021, 2:19pm

It is true that the default software stack in normal SWAN is quite old (97a).
Use the Bleeding edge in this case:

Python 3.8.6 (default, Dec 11 2020, 21:39:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.3.0'

With 99 cuda 10.1 python3 in the GPU nodes:

bash-4.2$ python
Python 3.7.6 (default, Aug 12 2020, 09:46:40)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.3.0'

etejedor · February 8, 2021, 8:43am

Hi Helena,

We will deploy LCG 99 this week in SWAN, which has tensorflow 2.3.0. As Pere said, in the meantime you can select the Bleeding Edge software stack when you start your SWAN session.

Another possibility is that you connect to https://swan-k8s.cern.ch, where we are preparing the new production instance of SWAN. This new instance provides GPUs - you just need to select the 99 Cuda 10.1 Python3 software stack and a GPU will be attached to your SWAN session. This means that you will be able to run tensorflow much faster on that GPU. I’ve just added you to the list of users allowed to access that instance.

Cheers,
Enric

shelena · February 8, 2021, 10:57am

Dear all,
Thanks a lot. Either Bleeding Edge or https://swan-k8s.cern.ch fixes those WARNINGS.
However, in both cases the maximum memory allocated to the container is just 10 GB, whereas in 97c it is 16 GB. Unless I use the latter, I’m not able to run because the kernel dies when concatenating a large number of files. Is it possible to allocate more memory when running under those software stacks?
Thanks again.
Best wishes,
Helena

etejedor · February 8, 2021, 1:16pm

Yes that should be possible, I’ll get back to you on that.

etejedor · February 9, 2021, 9:02am

Hello Helena,
If you access https://swan006.cern.ch directly now, LCG 99 is already present and it allows you to select 16 GB of memory. You can use that machine specifically in the meantime, but we’ll propagate this change to the rest of the SWAN machines soon.

shelena · February 9, 2021, 12:08pm

Hi Enric,
Awesome! Thank you.

shelena · February 10, 2021, 4:58pm

I’m sorry to come back again. Now there is a backward-compatibility problem. If the error below is not trivial to you, do you know which community I should address it to?
The error is:

# Input layer

inputs = Input(shape=(X_train.shape[1],))
…

AttributeError Traceback (most recent call last)
in
1 # Input layer
----> 2 inputs = Input(shape=(X_train.shape[1],))
3 # number of nodes in the different hidden layers
4 l_units = [72, 57, 60, 48, 36, 24, 12, 6]
5 x = inputs

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/keras/engine/input_layer.py in Input(shape, batch_shape, name, dtype, sparse, tensor)
173 if not dtype:
174 dtype = K.floatx()
→ 175 input_layer = InputLayer(batch_input_shape=batch_shape,
176 name=name, dtype=dtype,
177 sparse=sparse,

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your ' + object_name + ' call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
—> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/keras/engine/input_layer.py in __init__(self, input_shape, batch_size, batch_input_shape, dtype, input_tensor, sparse, name)
37 if not name:
38 prefix = 'input'
---> 39 name = prefix + '_' + str(K.get_uid(prefix))
40 super(InputLayer, self).__init__(dtype=dtype, name=name)
41

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/keras/backend/tensorflow_backend.py in get_uid(prefix)
72 """
73 global _GRAPH_UID_DICTS
—> 74 graph = tf.get_default_graph()
75 if graph not in _GRAPH_UID_DICTS:
76 _GRAPH_UID_DICTS[graph] = defaultdict(int)

AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

The second cell has:
from keras.layers import BatchNormalization
from keras.layers import Dense, Activation, Input, add
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, Callback

Should I add something?

ozapatam · February 10, 2021, 6:54pm

Hello @shelena

Try changing the imports. Replace
from keras.layers import BatchNormalization
with
from tensorflow.keras.layers import BatchNormalization

and change all the other keras imports in the same way.
Cheers
Omar.
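
For reference, here is what that change does to the whole import cell from the question. The sketch applies the rewrite mechanically with a small helper, `to_tf_keras`, which is a hypothetical name introduced only for illustration; it assumes every Keras import in the cell is written as `from keras... import ...`.

```python
import re

# Import cell as posted in the question (standalone Keras).
OLD_IMPORTS = """\
from keras.layers import BatchNormalization
from keras.layers import Dense, Activation, Input, add
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, Callback
"""


def to_tf_keras(source):
    """Point `from keras...` lines at `tensorflow.keras` instead.

    Hypothetical helper for illustration: it only touches lines that
    start with `from keras`, and leaves everything else unchanged.
    """
    return re.sub(r"^from\s+keras\b", "from tensorflow.keras", source, flags=re.MULTILINE)


NEW_IMPORTS = to_tf_keras(OLD_IMPORTS)
print(NEW_IMPORTS)
```

All five lines are changed together because mixing objects from standalone `keras` and from `tensorflow.keras` in one model is a common source of errors.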

shelena · February 10, 2021, 7:30pm

That fixes it. Thank you!
Maybe you can help further. In a cell below I get:
model.fit(X_train, Y_train,
validation_data=[X_test[:], Y_test[:]],
epochs=num_of_epochs, # typically ~130 are necessary to converge
#batch_size=32,
batch_size=3000,
callbacks=callbacks,
verbose=1
)
Epoch 1/2
240/242 [============================>.] - ETA: 0s - loss: 0.7936 - accuracy: 0.6215

AttributeError Traceback (most recent call last)
in
10
11 #model.fit?
—> 12 model.fit(X_train, Y_train,
13 validation_data=[X_test[:], Y_test[:]],
14 epochs=num_of_epochs, # typically ~130 are necessary to converge

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
106 def _method_wrapper(self, *args, **kwargs):
107 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
→ 108 return method(self, *args, **kwargs)
109
110 # Running inside run_distribute_coordinator already.

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1135 epoch_logs.update(val_logs)
1136
→ 1137 callbacks.on_epoch_end(epoch, epoch_logs)
1138 training_logs = epoch_logs
1139 if self.stop_training:

/cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
414 if numpy_logs is None: # Only convert once.
415 numpy_logs = tf_utils.to_numpy_or_python_type(logs)
→ 416 callback.on_epoch_end(epoch, numpy_logs)
417
418 def on_train_batch_begin(self, batch, logs=None):

in on_epoch_end(self, epoch, logs)
46 dict_epoch = {
47 "epoch": epoch,
---> 48 "loss": logs['loss'].astype(np.float64),
49 "acc": logs['accuracy'].astype(np.float64),
50 "val_loss": logs['val_loss'],

AttributeError: 'float' object has no attribute 'astype'

If I comment out callbacks=callbacks, the error vanishes, but I believe that this is not the solution.
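
The traceback suggests that in this TensorFlow version the values in `logs` reach the callback as plain Python floats (note the `to_numpy_or_python_type(logs)` call just before the callback is invoked), and a Python float has no `.astype`. One possible fix, sketched here under that assumption, is to convert with `float()`, which accepts both Python floats and NumPy scalars. The function name `epoch_record` is hypothetical, and only the keys visible in the traceback are used, since the rest of the callback is not shown.

```python
def epoch_record(epoch, logs):
    """Build the per-epoch dictionary from the Keras logs.

    float() works for plain Python floats and for NumPy scalars alike,
    so it does not depend on which type the TensorFlow version passes in.
    """
    return {
        "epoch": epoch,
        "loss": float(logs["loss"]),
        "acc": float(logs["accuracy"]),
        "val_loss": float(logs["val_loss"]),
    }


# loss and accuracy are the values from the progress bar in this post;
# val_loss is a made-up value for illustration.
record = epoch_record(0, {"loss": 0.7936, "accuracy": 0.6215, "val_loss": 0.75})
print(record)
```

Inside the custom callback, `on_epoch_end` could then build its dictionary with `dict_epoch = epoch_record(epoch, logs)` instead of calling `.astype` on each value.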

etejedor · February 11, 2021, 8:22am

Hi Helena,

Do you think this is a problem of the tensorflow installation in the LCG releases or rather a migration problem between tensorflow versions? If it’s the latter, perhaps tensorflow-related forums would be of more help.

shelena · February 11, 2021, 10:09am

Hi Enric,
I think it is the latter. You are right, I’ll try another forum.
Thanks and cheers,

sandrean · February 11, 2021, 2:47pm

Hi Enric, I am also very interested in trying out instances with GPU mounted.
Can you please add me to the list if that’s possible?
Thanks!

etejedor · February 11, 2021, 3:22pm

Hi Yosse,

I just granted you access to https://swan-k8s.cern.ch (it may take a while to become effective, so please try later in the evening).

In the web form, start your session with “99 Cuda 10.1 Python3” as Software stack. That will attach a GPU to your session.

Once you are in your SWAN session, you can create projects and notebooks as usual in SWAN. The difference is that there are some libraries in your environment that have been compiled with GPU support. For example, if you use tensorflow from a Python notebook, it will offload computations to the GPU. You also have the nvcc compiler available which you can use e.g. from a terminal in SWAN.

shelena · February 11, 2021, 3:55pm

Dear colleagues,
I’m having trouble querying a dataframe, specifically in:
jets.query('(HadronConeExclTruthLabelID <= 5)', inplace=True)
…
TypeError: visit_Constant() got an unexpected keyword argument 'side'

There is a related post in a forum that says Python 3.8 is not supported by pandas 0.24. In fact, in LCG_99 we have:
python: 3.8.6.final.0
…
pandas: 0.24.2

I wonder if an upgrade of pandas could fix this query problem.

Thanks a lot!

Helena
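
Until pandas is upgraded, a possible workaround is to express the same cut with a boolean mask. This should not go through the expression parser that `query` relies on, which is where the `visit_Constant` error comes from. A minimal sketch, assuming `jets` is a pandas DataFrame with a `HadronConeExclTruthLabelID` column; the values below are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the real jets dataframe (values are illustrative only).
jets = pd.DataFrame({"HadronConeExclTruthLabelID": [0, 4, 5, 15]})

# Same selection as jets.query('(HadronConeExclTruthLabelID <= 5)', inplace=True),
# written as a boolean mask and reassigned instead of modified in place.
jets = jets[jets["HadronConeExclTruthLabelID"] <= 5]

print(jets["HadronConeExclTruthLabelID"].tolist())
```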

etejedor · February 11, 2021, 4:13pm

Hi Helena,

Could you check that upgrading pandas actually solves the issue? Please install a newer version of pandas on your CERNBox:

https://swan.docs.cern.ch/advanced/install_packages/

(the --upgrade flag will be necessary in your case because pandas exists already in the LCG releases).

If that fixes the issue, then please open a ticket to the SPI team to request an update of the pandas package in the LCG releases:

https://sft.its.cern.ch/jira/projects/SPI

shelena · February 11, 2021, 5:35pm

Hi Enric,
$pip install pandas --upgrade
Collecting pandas
Using cached https://files.pythonhosted.org/packages/31/a4/c10f07959fd58ffd066518e3f163f55d40bc033191184a8903e660d88c03/pandas-1.2.2-cp38-cp38-manylinux1_x86_64.whl
Requirement already satisfied, skipping upgrade: python-dateutil>=2.7.3 in /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages (from pandas) (2.8.0)
Requirement already satisfied, skipping upgrade: pytz>=2017.3 in /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages (from pandas) (2019.1)
Requirement already satisfied, skipping upgrade: numpy>=1.16.5 in /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages (from pandas) (1.18.2)
Requirement already satisfied, skipping upgrade: six>=1.5 in /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.12.0)
Installing collected packages: pandas
Found existing installation: pandas 0.24.2
Uninstalling pandas-0.24.2:
ERROR: Could not install packages due to an EnvironmentError: [Errno 30] Read-only file system: 'conftest.py'

etejedor · February 12, 2021, 8:00am

Hi,

You also need --user (see docs link above).
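
Putting the two replies together: the install needs `--upgrade` because pandas already exists in the LCG release, and `--user` because the LCG area under /cvmfs is read-only, which is what the `[Errno 30]` error reports. The sketch below builds that command and reports which version the interpreter currently sees. The helper names are hypothetical, and the SWAN documentation linked above remains the reference for where user packages end up.

```python
import sys
from importlib import metadata


def pip_user_upgrade(package):
    """Return the pip command for a per-user upgrade of `package`."""
    return [sys.executable, "-m", "pip", "install", "--user", "--upgrade", package]


def installed_version(package):
    """Return the installed version of `package`, or None if it is not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the command rather than running it, so nothing is installed by accident.
print(" ".join(pip_user_upgrade("pandas")))
print("pandas version seen by this interpreter:", installed_version("pandas"))
```

After the install, a kernel that has already imported pandas needs to be restarted before the new version is picked up.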
