Demo 0: Example and usage

To keep things simple, the following rules have been followed during development:

  • deel-lip follows the keras package structure.

  • All elements (layers, activations, initializers, …) are compatible with the standard keras elements.

  • When a k-Lipschitz layer overrides a standard keras layer, it uses the same interface and the same parameters. The only difference is an extra parameter that controls the Lipschitz constant of the layer (see the sketch below).
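For instance, here is a minimal sketch (hypothetical, not part of the original demo) contrasting a standard Dense layer with its deel-lip counterpart; the only new argument is k_coef_lip:

from tensorflow.keras.layers import Dense
from deel.lip.layers import SpectralDense

# same interface and parameters as the keras Dense layer...
dense = Dense(64, activation=None, use_bias=True)
# ...plus k_coef_lip, which controls the Lipschitz constant of the layer
lip_dense = SpectralDense(64, activation=None, use_bias=True, k_coef_lip=1.0)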

Which layers are safe to use?

The following table indicates which layers are safe to use in a Lipschitz network, and which are not.

| layer | 1-lip? | deel-lip equivalent | comments |
| --- | --- | --- | --- |
| Dense | no | SpectralDense, FrobeniusDense | SpectralDense and FrobeniusDense are similar when there is a single output. |
| Conv2D | no | SpectralConv2D, FrobeniusConv2D | SpectralConv2D also implements Björck normalization. |
| MaxPooling, GlobalMaxPooling | yes | n/a | |
| AveragePooling2D, GlobalAveragePooling2D | no | ScaledAveragePooling2D, ScaledGlobalAveragePooling2D | The Lipschitz constant is bounded by sqrt(pool_h * pool_w). |
| Flatten | yes | n/a | |
| Dropout | no | None | The Lipschitz constant is bounded by the dropout factor. |
| BatchNorm | no | None | We suspect that layer normalization already limits internal covariate shift. |
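These properties can also be checked empirically. Below is a minimal sketch (hypothetical, with random untrained weights) that estimates the Lipschitz ratio of a SpectralDense layer on random pairs of points; by construction the ratio should not exceed 1:

import tensorflow as tf
from deel.lip.layers import SpectralDense

layer = SpectralDense(16)
x1 = tf.random.normal((256, 32))
x2 = tf.random.normal((256, 32))
# the ratio ||f(x1) - f(x2)|| / ||x1 - x2|| is bounded by the Lipschitz constant
ratios = tf.norm(layer(x1) - layer(x2), axis=-1) / tf.norm(x1 - x2, axis=-1)
print(float(tf.reduce_max(ratios)))  # expected to stay <= 1, up to numerical tolerance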

Design tips

Designing Lipschitz networks requires care in order to avoid the vanishing/exploding gradient problem.

Choosing pooling layers:

| layer | advantages | disadvantages |
| --- | --- | --- |
| ScaledAveragePooling2D and MaxPooling2D | very similar to the original implementations (just adds a scaling factor for avg) | neither norm preserving nor gradient norm preserving |
| InvertibleDownSampling | norm preserving and gradient norm preserving | increases the number of channels (and the number of parameters of the next layer) |
| ScaledL2NormPooling2D (sqrt(avgpool(x**2))) | norm preserving | lower numerical stability of the gradient when inputs are close to zero |
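To illustrate the norm-preserving behavior, here is a small sketch (hypothetical, not from the original demo) comparing the global L2 norm of a random input before and after ScaledL2NormPooling2D:

import tensorflow as tf
from deel.lip.layers import ScaledL2NormPooling2D

pool = ScaledL2NormPooling2D(pool_size=(2, 2), data_format="channels_last")
x = tf.random.normal((1, 28, 28, 16))
# sqrt(avgpool(x**2)), with its scaling factor, keeps the global L2 norm
print(float(tf.norm(x)), float(tf.norm(pool(x))))  # expected to be nearly equal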

Choosing activations:

| layer | advantages | disadvantages |
| --- | --- | --- |
| ReLU | | creates a strong vanishing gradient effect. If you manage to learn with it, please call 911. |
| MaxMin (stack([ReLU(x), ReLU(-x)])) | has properties similar to ReLU, but is norm and gradient norm preserving | doubles the number of outputs |
| GroupSort | input and gradient norm preserving; also limits the need for biases (as it is shift invariant) | more computationally expensive when its parameter n is large |
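As a quick illustration (hypothetical snippet, not from the demo), GroupSort with n=2 sorts each consecutive group of two activations; since it only reorders values, the norm of the input is preserved:

import tensorflow as tf
from deel.lip.activations import GroupSort

act = GroupSort(2)
x = tf.constant([[3.0, -1.0, 0.5, 2.0]])
print(act(x).numpy())  # each group of two values comes out sorted
print(float(tf.norm(x)), float(tf.norm(act(x))))  # norms are identical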

Please note that when learning with the HKR and MulticlassHKR losses (from deel.lip.losses), no activation is required on the last layer.

How to use it?


Here is an example of a 1-Lipschitz network trained on MNIST:

from deel.lip.layers import (
    SpectralDense,
    SpectralConv2D,
    ScaledL2NormPooling2D,
    FrobeniusDense,
)
from deel.lip.model import Sequential
from deel.lip.activations import GroupSort
from deel.lip.losses import MulticlassHKR, MulticlassKR
from tensorflow.keras.layers import Input, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
import numpy as np

# Sequential (resp. Model) from deel.lip.model has the same properties as any
# Lipschitz model. It acts only as a container, with features specific to
# Lipschitz functions (condensation, vanilla export...), but the layers are
# fully compatible with tf.keras.Sequential/Model
model = Sequential(
    [
        Input(shape=(28, 28, 1)),
        # Lipschitz layers preserve the API of their superclass (here Conv2D).
        # An optional parameter is available: k_coef_lip, which controls the
        # Lipschitz constant of the layer
        SpectralConv2D(
            filters=16,
            kernel_size=(3, 3),
            activation=GroupSort(2),
            use_bias=True,
            kernel_initializer="orthogonal",
        ),
        # usual pooling layers are implemented (avg, max...), but new layers are also available
        ScaledL2NormPooling2D(pool_size=(2, 2), data_format="channels_last"),
        SpectralConv2D(
            filters=16,
            kernel_size=(3, 3),
            activation=GroupSort(2),
            use_bias=True,
            kernel_initializer="orthogonal",
        ),
        ScaledL2NormPooling2D(pool_size=(2, 2), data_format="channels_last"),
        # our layers are fully interoperable with existing keras layers
        Flatten(),
        SpectralDense(
            32,
            activation=GroupSort(2),
            use_bias=True,
            kernel_initializer="orthogonal",
        ),
        FrobeniusDense(
            10, activation=None, use_bias=False, kernel_initializer="orthogonal"
        ),
    ],
    # similarly, the model has a parameter to set the Lipschitz constant;
    # it automatically sets the constant of each layer
    k_coef_lip=1.0,
    name="hkr_model",
)

# HKR (Hinge-Kantorovich-Rubinstein) optimizes robustness along with accuracy
model.compile(
    # decreasing alpha and increasing min_margin improve robustness (at the cost of accuracy)
    # note also that, for Lipschitz networks, more robustness requires more parameters
    loss=MulticlassHKR(alpha=50, min_margin=0.05),
    optimizer=Adam(1e-3),
    metrics=["accuracy", MulticlassKR()],
)

model.summary()

# load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# standardize and reshape the data
x_train = np.expand_dims(x_train, -1)
mean = x_train.mean()
std = x_train.std()
x_train = (x_train - mean) / std
x_test = np.expand_dims(x_test, -1)
x_test = (x_test - mean) / std
# one hot encode the labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# fit the model
model.fit(
    x_train,
    y_train,
    batch_size=2048,
    epochs=30,
    validation_data=(x_test, y_test),
    shuffle=True,
)

# once training is finished, you can convert
# SpectralDense layers into Dense layers and SpectralConv2D into Conv2D,
# which optimizes performance for inference
vanilla_model = model.vanilla_export()
/home/thibaut.boissin/projects/repo_github/deel-lip/deel/lip/model.py:56: UserWarning: Sequential model contains a layer wich is not a Lipschitz layer: flatten
  layer.name
Model: "hkr_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
spectral_conv2d (SpectralCon (None, 28, 28, 16)        321
_________________________________________________________________
scaled_l2norm_pooling2d (Sca (None, 14, 14, 16)        0
_________________________________________________________________
spectral_conv2d_1 (SpectralC (None, 14, 14, 16)        4641
_________________________________________________________________
scaled_l2norm_pooling2d_1 (S (None, 7, 7, 16)          0
_________________________________________________________________
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
spectral_dense (SpectralDens (None, 32)                50241
_________________________________________________________________
frobenius_dense (FrobeniusDe (None, 10)                640
=================================================================
Total params: 55,843
Trainable params: 27,920
Non-trainable params: 27,923
_________________________________________________________________
Epoch 1/30
30/30 [==============================] - 4s 52ms/step - loss: 5.7859 - accuracy: 0.3059 - MulticlassKR: 0.0994 - val_loss: 0.7743 - val_accuracy: 0.8195 - val_MulticlassKR: 0.3336
Epoch 2/30
30/30 [==============================] - 1s 35ms/step - loss: 0.5617 - accuracy: 0.8488 - MulticlassKR: 0.3664 - val_loss: 0.2028 - val_accuracy: 0.8998 - val_MulticlassKR: 0.4562
Epoch 3/30
30/30 [==============================] - 1s 40ms/step - loss: 0.1443 - accuracy: 0.9037 - MulticlassKR: 0.4800 - val_loss: -0.0439 - val_accuracy: 0.9243 - val_MulticlassKR: 0.5668
Epoch 4/30
30/30 [==============================] - 1s 39ms/step - loss: -0.0865 - accuracy: 0.9233 - MulticlassKR: 0.6017 - val_loss: -0.2614 - val_accuracy: 0.9352 - val_MulticlassKR: 0.7281
Epoch 5/30
30/30 [==============================] - 1s 44ms/step - loss: -0.3090 - accuracy: 0.9345 - MulticlassKR: 0.7771 - val_loss: -0.5085 - val_accuracy: 0.9448 - val_MulticlassKR: 0.9635
Epoch 6/30
30/30 [==============================] - 1s 35ms/step - loss: -0.5742 - accuracy: 0.9418 - MulticlassKR: 1.0413 - val_loss: -0.8245 - val_accuracy: 0.9469 - val_MulticlassKR: 1.3165
Epoch 7/30
30/30 [==============================] - 1s 36ms/step - loss: -0.8896 - accuracy: 0.9426 - MulticlassKR: 1.4164 - val_loss: -1.2121 - val_accuracy: 0.9464 - val_MulticlassKR: 1.7998
Epoch 8/30
30/30 [==============================] - 1s 35ms/step - loss: -1.3101 - accuracy: 0.9430 - MulticlassKR: 1.9421 - val_loss: -1.7661 - val_accuracy: 0.9515 - val_MulticlassKR: 2.4609
Epoch 9/30
30/30 [==============================] - 1s 47ms/step - loss: -1.8807 - accuracy: 0.9425 - MulticlassKR: 2.6451 - val_loss: -2.4294 - val_accuracy: 0.9480 - val_MulticlassKR: 3.2977
Epoch 10/30
30/30 [==============================] - 1s 43ms/step - loss: -2.5482 - accuracy: 0.9444 - MulticlassKR: 3.4797 - val_loss: -3.0506 - val_accuracy: 0.9478 - val_MulticlassKR: 4.1679
Epoch 11/30
30/30 [==============================] - 1s 38ms/step - loss: -3.1723 - accuracy: 0.9439 - MulticlassKR: 4.3124 - val_loss: -3.6976 - val_accuracy: 0.9475 - val_MulticlassKR: 4.9445
Epoch 12/30
30/30 [==============================] - 1s 34ms/step - loss: -3.7133 - accuracy: 0.9441 - MulticlassKR: 5.0248 - val_loss: -4.2211 - val_accuracy: 0.9525 - val_MulticlassKR: 5.5240
Epoch 13/30
30/30 [==============================] - 1s 37ms/step - loss: -4.1847 - accuracy: 0.9456 - MulticlassKR: 5.5629 - val_loss: -4.5868 - val_accuracy: 0.9538 - val_MulticlassKR: 5.9152
Epoch 14/30
30/30 [==============================] - 1s 46ms/step - loss: -4.4194 - accuracy: 0.9447 - MulticlassKR: 5.9083 - val_loss: -4.8092 - val_accuracy: 0.9530 - val_MulticlassKR: 6.2309
Epoch 15/30
30/30 [==============================] - 1s 42ms/step - loss: -4.6380 - accuracy: 0.9473 - MulticlassKR: 6.1855 - val_loss: -4.9103 - val_accuracy: 0.9499 - val_MulticlassKR: 6.4634
Epoch 16/30
30/30 [==============================] - 1s 36ms/step - loss: -4.8019 - accuracy: 0.9476 - MulticlassKR: 6.3995 - val_loss: -5.1251 - val_accuracy: 0.9541 - val_MulticlassKR: 6.6381
Epoch 17/30
30/30 [==============================] - 1s 40ms/step - loss: -4.9292 - accuracy: 0.9503 - MulticlassKR: 6.5580 - val_loss: -5.2763 - val_accuracy: 0.9563 - val_MulticlassKR: 6.7558
Epoch 18/30
30/30 [==============================] - 1s 35ms/step - loss: -5.0473 - accuracy: 0.9504 - MulticlassKR: 6.6735 - val_loss: -5.3574 - val_accuracy: 0.9554 - val_MulticlassKR: 6.8654
Epoch 19/30
30/30 [==============================] - 1s 41ms/step - loss: -5.1484 - accuracy: 0.9503 - MulticlassKR: 6.7765 - val_loss: -5.4485 - val_accuracy: 0.9561 - val_MulticlassKR: 6.9638
Epoch 20/30
30/30 [==============================] - 1s 47ms/step - loss: -5.2245 - accuracy: 0.9506 - MulticlassKR: 6.8670 - val_loss: -5.5184 - val_accuracy: 0.9558 - val_MulticlassKR: 7.0767
Epoch 21/30
30/30 [==============================] - 1s 35ms/step - loss: -5.3259 - accuracy: 0.9507 - MulticlassKR: 6.9613 - val_loss: -5.5777 - val_accuracy: 0.9573 - val_MulticlassKR: 7.1658
Epoch 22/30
30/30 [==============================] - 1s 35ms/step - loss: -5.4587 - accuracy: 0.9519 - MulticlassKR: 7.0682 - val_loss: -5.7211 - val_accuracy: 0.9595 - val_MulticlassKR: 7.2207
Epoch 23/30
30/30 [==============================] - 1s 37ms/step - loss: -5.5685 - accuracy: 0.9534 - MulticlassKR: 7.1410 - val_loss: -5.7894 - val_accuracy: 0.9618 - val_MulticlassKR: 7.2921
Epoch 24/30
30/30 [==============================] - 1s 35ms/step - loss: -5.4871 - accuracy: 0.9533 - MulticlassKR: 7.1789 - val_loss: -5.8136 - val_accuracy: 0.9606 - val_MulticlassKR: 7.3730
Epoch 25/30
30/30 [==============================] - 1s 46ms/step - loss: -5.6827 - accuracy: 0.9551 - MulticlassKR: 7.2730 - val_loss: -5.9069 - val_accuracy: 0.9588 - val_MulticlassKR: 7.4427
Epoch 26/30
30/30 [==============================] - 1s 34ms/step - loss: -5.7042 - accuracy: 0.9556 - MulticlassKR: 7.3001 - val_loss: -5.9921 - val_accuracy: 0.9606 - val_MulticlassKR: 7.4756
Epoch 27/30
30/30 [==============================] - 1s 48ms/step - loss: -5.7871 - accuracy: 0.9549 - MulticlassKR: 7.3868 - val_loss: -6.0014 - val_accuracy: 0.9609 - val_MulticlassKR: 7.5259
Epoch 28/30
30/30 [==============================] - 1s 38ms/step - loss: -5.8166 - accuracy: 0.9548 - MulticlassKR: 7.3946 - val_loss: -5.9561 - val_accuracy: 0.9573 - val_MulticlassKR: 7.5932
Epoch 29/30
30/30 [==============================] - 1s 36ms/step - loss: -5.8229 - accuracy: 0.9551 - MulticlassKR: 7.4779 - val_loss: -6.1211 - val_accuracy: 0.9593 - val_MulticlassKR: 7.6141
Epoch 30/30
30/30 [==============================] - 1s 34ms/step - loss: -5.9549 - accuracy: 0.9559 - MulticlassKR: 7.5246 - val_loss: -6.2155 - val_accuracy: 0.9606 - val_MulticlassKR: 7.6790
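Once training is finished, the exported vanilla_model behaves like a plain Keras model. As a hypothetical follow-up (not part of the original output), it can be used for inference or saved as usual:

# the exported model is a regular tf.keras model: no spectral normalization
# is performed at inference time, so prediction is faster
preds = vanilla_model.predict(x_test[:10])
print(preds.shape)  # (10, 10): one score per class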