Demo 3: HKR classifier on MNIST dataset

This notebook demonstrates learning a binary classification task on MNIST, restricted to the digits 0 and 8.

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import ReLU
from tensorflow.keras.optimizers import Adam

from deel.lip.layers import SpectralConv2D, SpectralDense, FrobeniusDense
from deel.lip.activations import MaxMin, GroupSort, GroupSort2, FullSort
from deel.lip.utils import load_model
from deel.lip.losses import HKR_loss, KR_loss, hinge_margin_loss

from model_samples.model_samples import get_lipMLP, get_lipVGG_model

Data preparation

For this task we select two classes: 0 and 8. Labels are changed to {-1, 1}, which is compatible with the hinge term used in the loss.

from tensorflow.keras.datasets import mnist

# first we select the two classes
selected_classes = [0, 8]  # must be two classes as we perform binary classification

def prepare_data(x, y, class_a=0, class_b=8):
    """
    Convert the MNIST data to make it suitable for our binary classification setup.
    """
    # select items from the two chosen classes
    mask = (y == class_a) + (y == class_b)  # mask selecting only items from class_a or class_b
    x = x[mask]
    y = y[mask]
    x = x.astype('float32')
    y = y.astype('float32')
    # convert pixel values from int [0, 255] to float32 [0, 1]
    x /= 255
    x = x.reshape((-1, 28, 28, 1))
    # change labels to binary classification targets in {-1, 1}
    y[y == class_a] = 1.0
    y[y == class_b] = -1.0
    return x, y

# now we load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# prepare the data
x_train, y_train = prepare_data(x_train, y_train, selected_classes[0], selected_classes[1])
x_test, y_test = prepare_data(x_test, y_test, selected_classes[0], selected_classes[1])

# display infos about dataset
print("train set size: %i samples, classes proportions: %.3f percent" %
      (y_train.shape[0], 100*y_train[y_train==1].sum()/y_train.shape[0]))
print("test set size: %i samples, classes proportions: %.3f percent" %
      (y_test.shape[0], 100*y_test[y_test==1].sum()/y_test.shape[0]))
train set size: 11774 samples, classes proportions: 50.306 percent
test set size: 1954 samples, classes proportions: 50.154 percent

Build the Lipschitz model

Let's first make the parameters of this experiment explicit.

# training parameters
epochs=5
batch_size=128

# network parameters
hidden_layers_size = [128,64,32]
activation = GroupSort  # other possible choices: ReLU, MaxMin, GroupSort2

# loss parameters
min_margin=1
alpha = 10

Now we can build the network. Here the experiment is done with an MLP, but Deel-lip also provides state-of-the-art 1-Lipschitz convolutions (e.g. SpectralConv2D, used in the commented-out VGG helper below). A rough hand-built equivalent of the MLP helper is sketched just after this paragraph.
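
The sketch below is only an illustration of what such a 1-Lipschitz MLP looks like when stacking deel-lip layers directly; it is an assumption, not the actual implementation of get_lipMLP (the exact arguments may differ, and GroupSort2 is used here in place of the configurable activation).

# minimal hand-built sketch of a 1-Lipschitz MLP made of deel-lip layers
# (illustrative only; not the actual implementation of get_lipMLP)
from tensorflow.keras.layers import Flatten, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(28, 28, 1))
net = Flatten()(inputs)
for width in hidden_layers_size:
    net = SpectralDense(width)(net)  # spectrally normalized dense layer (1-Lipschitz)
    net = GroupSort2()(net)          # 1-Lipschitz activation
outputs = FrobeniusDense(1)(net)     # Frobenius-normalized single-unit output layer
wass_sketch = Model(inputs, outputs)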

K.clear_session()
# helper function to build the 1-Lipschitz MLP
wass = get_lipMLP((28, 28, 1), hidden_layers_size=hidden_layers_size, activation=activation, nb_classes=1, kCoefLip=1.0)
# another helper function exists to build a VGG model
# wass = get_lipVGG_model((28,28,1), layers_conv=[32,64], layers_dense=[128], activation_conv=GroupSort2, activation_dense=FullSort, use_bias=True, nb_classes=1, last_activ=None)
wass.summary()
128
64
32
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 28, 28, 1)]       0
_________________________________________________________________
flatten (Flatten)            (None, 784)               0
_________________________________________________________________
spectral_dense (SpectralDens (None, 128)               100609
_________________________________________________________________
group_sort (GroupSort)       (None, 128)               0
_________________________________________________________________
spectral_dense_1 (SpectralDe (None, 64)                8321
_________________________________________________________________
group_sort_1 (GroupSort)     (None, 64)                0
_________________________________________________________________
spectral_dense_2 (SpectralDe (None, 32)                2113
_________________________________________________________________
group_sort_2 (GroupSort)     (None, 32)                0
_________________________________________________________________
frobenius_dense (FrobeniusDe (None, 1)                 33
=================================================================
Total params: 111,076
Trainable params: 110,849
Non-trainable params: 227
_________________________________________________________________
optimizer = Adam(lr=0.01)

# as the output of our classifier is in the real range [-1, 1], binary accuracy must be redefined
from tensorflow.keras.metrics import binary_accuracy

def HKR_binary_accuracy(y_true, y_pred):
    # threshold both targets and predictions at 0 before computing the standard binary accuracy
    S_true = tf.dtypes.cast(tf.greater_equal(y_true[:, 0], 0), dtype=tf.float32)
    S_pred = tf.dtypes.cast(tf.greater_equal(y_pred[:, 0], 0), dtype=tf.float32)
    return binary_accuracy(S_true, S_pred)
wass.compile(
    loss=HKR_loss(alpha=alpha,min_margin=min_margin),  # HKR stands for the hinge regularized KR loss
    metrics=[
        KR_loss((-1,1)),  # shows the KR term of the loss
        hinge_margin_loss(min_margin=min_margin),  # shows the hinge term of the loss
        HKR_binary_accuracy  # shows the classification accuracy
    ],
    optimizer=optimizer
)

Learn classification on MNIST

Now that the model is built, we can train it on the task.

wass.fit(
    x=x_train, y=y_train,
    validation_data=(x_test, y_test),
    batch_size=batch_size,
    shuffle=True,
    epochs=epochs,
    verbose=1
)
Train on 11774 samples, validate on 1954 samples
Epoch 1/5
11774/11774 [==============================] - 5s 426us/sample - loss: -3.8264 - KR_loss_fct: -5.2401 - hinge_margin_fct: 0.1413 - HKR_binary_accuracy: 0.9546 - val_loss: -6.3826 - val_KR_loss_fct: -6.6289 - val_hinge_margin_fct: 0.0269 - val_HKR_binary_accuracy: 0.9889
Epoch 2/5
11774/11774 [==============================] - 2s 194us/sample - loss: -6.5813 - KR_loss_fct: -6.8297 - hinge_margin_fct: 0.0248 - HKR_binary_accuracy: 0.9906 - val_loss: -6.8006 - val_KR_loss_fct: -6.9829 - val_hinge_margin_fct: 0.0202 - val_HKR_binary_accuracy: 0.9908
Epoch 3/5
11774/11774 [==============================] - 2s 206us/sample - loss: -6.8227 - KR_loss_fct: -7.0366 - hinge_margin_fct: 0.0214 - HKR_binary_accuracy: 0.9929 - val_loss: -6.8027 - val_KR_loss_fct: -7.0636 - val_hinge_margin_fct: 0.0270 - val_HKR_binary_accuracy: 0.9893
Epoch 4/5
11774/11774 [==============================] - 2s 206us/sample - loss: -6.9042 - KR_loss_fct: -7.1081 - hinge_margin_fct: 0.0204 - HKR_binary_accuracy: 0.9929 - val_loss: -6.9615 - val_KR_loss_fct: -7.1755 - val_hinge_margin_fct: 0.0233 - val_HKR_binary_accuracy: 0.9913
Epoch 5/5
11774/11774 [==============================] - 2s 207us/sample - loss: -6.9774 - KR_loss_fct: -7.1707 - hinge_margin_fct: 0.0193 - HKR_binary_accuracy: 0.9927 - val_loss: -6.9884 - val_KR_loss_fct: -7.1752 - val_hinge_margin_fct: 0.0215 - val_HKR_binary_accuracy: 0.9918
<tensorflow.python.keras.callbacks.History at 0x1fd64b2a048>
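
Note on the logs above: the reported loss is, up to rounding and batch averaging, the KR term plus alpha times the hinge term. For example, on epoch 1, -5.2401 + 10 * 0.1413 ≈ -3.83, close to the printed loss of -3.8264. (This is an observation read off the metrics, not a statement about the library internals.)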

As we can see, the model reaches a very decent accuracy on this task (above 99% validation accuracy after 5 epochs).
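
As a final sanity check, the compiled metrics can be recomputed on the held-out test set. This is a minimal sketch; the metric names returned by metrics_names depend on the Keras version.

# re-evaluate the trained model on the test set with the metrics passed to compile()
results = wass.evaluate(x_test, y_test, batch_size=batch_size, verbose=0)
for name, value in zip(wass.metrics_names, results):
    print("%s: %.4f" % (name, value))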