How to use keras embedding layer with 3D tensor input?

Abdul Karim Khan

I am having difficulty using the Keras Embedding layer with one-hot encoded input data.

Here is the toy code.

Import packages

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Embedding
from keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
import openpyxl
import pandas as pd
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau

The input data is text-based (SMILES strings), as follows.

Train and Test data

X_train_orignal= np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
       'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
       'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])

X_test_orignal=np.array(['OC(=O)C1=CC=C(Cl)C=C1Cl', 'CCOC(N)=O',
       'OC1=C(Cl)C(=C(Cl)C=C1Cl)Cl'])

Y_train=np.array(([[2.33],
       [2.59],
       [2.59],
       [2.54],
       [4.06]]))

Y_test=np.array([[2.20],
   [2.81],
   [2.00]])

Creating dictionaries

Now I create two dictionaries: one mapping characters to indices and one mapping indices back to characters. The number of unique characters is len(charset), and the maximum string length plus 5 additional positions is stored in embed. Each string is prefixed with the start character ! and padded at the end with E.

charset = set("".join(list(X_train_orignal))+"!E")
char_to_int = dict((c,i) for i,c in enumerate(charset))
int_to_char = dict((i,c) for i,c in enumerate(charset))
embed = max([len(smile) for smile in X_train_orignal]) + 5
print (str(charset))
print(len(charset), embed)
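Because `charset` is a Python `set`, the index each character receives can change between interpreter runs (string hashing is randomized), so weights saved in one run may not line up with dictionaries rebuilt in another. A minimal sketch, assuming stable indices are wanted, sorts the characters first; `build_vocab` is a hypothetical helper, not part of Keras.

```python
import numpy as np


def build_vocab(strings, extra="!E", extra_positions=5):
    """Hypothetical helper: map each character to a stable integer index."""
    # Sorting makes the index assignment reproducible across runs
    charset = sorted(set("".join(strings) + extra))
    char_to_int = {c: i for i, c in enumerate(charset)}
    int_to_char = {i: c for i, c in enumerate(charset)}
    # Longest string plus extra positions, mirroring `embed` in the question
    embed = max(len(s) for s in strings) + extra_positions
    return charset, char_to_int, int_to_char, embed


X_train_orignal = np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
                            'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
                            'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])

charset, char_to_int, int_to_char, embed = build_vocab(X_train_orignal)
print(len(charset), embed)  # 14 45
```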

One hot encoding

I convert the train and test data into one-hot encoding as follows.

def vectorize(smiles):
        one_hot =  np.zeros((smiles.shape[0], embed , len(charset)),dtype=np.int8)
        for i,smile in enumerate(smiles):
            #encode the startchar
            one_hot[i,0,char_to_int["!"]] = 1
            #encode the rest of the chars
            for j,c in enumerate(smile):
                one_hot[i,j+1,char_to_int[c]] = 1
            #Encode endchar
            one_hot[i,len(smile)+1:,char_to_int["E"]] = 1

        return one_hot[:,0:-1,:]

X_train = vectorize(X_train_orignal)
print(X_train.shape)
X_test = vectorize(X_test_orignal)
print(X_test.shape)

After one-hot encoding, the data has shape (5, 44, 14) for train and (3, 44, 14) for test. For train, there are 5 examples, 44 is the padded sequence length (embed minus the last position that vectorize drops), and 14 is the number of unique characters. Strings shorter than the maximum length are padded with E.
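Keras' `Embedding` layer looks up rows of a weight table by integer index, so it expects one integer per position, shape (samples, sequence_length), rather than one-hot vectors. A minimal sketch, assuming the same `!`/`E` scheme as `vectorize`, encodes each string directly as indices; `vectorize_indices` is a hypothetical name, and the check at the end shows that `argmax` over the one-hot array gives the same result.

```python
import numpy as np

X_train_orignal = np.array(['OC(=O)C1=C(Cl)C=CC=C1Cl', 'OC(=O)C1=C(Cl)C=C(Cl)C=C1Cl',
                            'OC(=O)C1=CC=CC(=C1Cl)Cl', 'OC(=O)C1=CC(=CC=C1Cl)Cl',
                            'OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=O'])
charset = sorted(set("".join(X_train_orignal) + "!E"))
char_to_int = {c: i for i, c in enumerate(charset)}
embed = max(len(s) for s in X_train_orignal) + 5


def vectorize(smiles):
    """One-hot encoding, same logic as the question's vectorize."""
    one_hot = np.zeros((len(smiles), embed, len(charset)), dtype=np.int8)
    for i, smile in enumerate(smiles):
        one_hot[i, 0, char_to_int["!"]] = 1                # start character
        for j, c in enumerate(smile):
            one_hot[i, j + 1, char_to_int[c]] = 1
        one_hot[i, len(smile) + 1:, char_to_int["E"]] = 1  # end padding
    return one_hot[:, 0:-1, :]


def vectorize_indices(smiles):
    """Hypothetical helper: same scheme, but one integer index per position."""
    idx = np.full((len(smiles), embed), char_to_int["E"], dtype=np.int32)  # pre-fill with E
    for i, smile in enumerate(smiles):
        idx[i, 0] = char_to_int["!"]                                      # start character
        idx[i, 1:len(smile) + 1] = [char_to_int[c] for c in smile]
    return idx[:, 0:-1]                                                   # drop last position, like vectorize


X_train_onehot = vectorize(X_train_orignal)
X_train_idx = vectorize_indices(X_train_orignal)
print(X_train_onehot.shape, X_train_idx.shape)  # (5, 44, 14) (5, 44)
print(np.array_equal(X_train_idx, X_train_onehot.argmax(axis=-1)))  # True
```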

Verifying the correct padding

The following code checks that the padding was done correctly.

mol_str_train=[]
mol_str_test=[]
for x in range(5):

    mol_str_train.append("".join([int_to_char[idx] for idx in np.argmax(X_train[x,:,:], axis=1)]))

for x in range(3):
    mol_str_test.append("".join([int_to_char[idx] for idx in np.argmax(X_test[x,:,:], axis=1)]))

Let's see what the train set looks like.

mol_str_train

['!OC(=O)C1=C(Cl)C=CC=C1ClEEEEEEEEEEEEEEEEEEEE',
 '!OC(=O)C1=C(Cl)C=C(Cl)C=C1ClEEEEEEEEEEEEEEEE',
 '!OC(=O)C1=CC=CC(=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
 '!OC(=O)C1=CC(=CC=C1Cl)ClEEEEEEEEEEEEEEEEEEEE',
 '!OC1=C(C=C(C=C1)[N+]([O-])=O)[N+]([O-])=OEEE']
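Instead of checking the decoded strings by eye, the round trip can be asserted. A minimal sketch, assuming no real string ends in the padding character `E` (true for these SMILES), decodes with `argmax`, strips the `!` and the trailing `E`s, and compares with the input; it builds its own small one-hot array rather than calling the question's `vectorize`, and the padded length of 30 is an arbitrary choice.

```python
import numpy as np

# Two strings taken from X_test_orignal; length 30 is an assumption for this sketch
smiles = ["CCOC(N)=O", "OC(=O)C1=CC=C(Cl)C=C1Cl"]
length = 30
charset = sorted(set("".join(smiles) + "!E"))
char_to_int = {c: i for i, c in enumerate(charset)}
int_to_char = {i: c for i, c in enumerate(charset)}

# Build "!" + string + "E" padding as one-hot rows (same scheme as vectorize)
one_hot = np.zeros((len(smiles), length, len(charset)), dtype=np.int8)
for i, s in enumerate(smiles):
    padded = ("!" + s).ljust(length, "E")
    one_hot[i, np.arange(length), [char_to_int[c] for c in padded]] = 1

# Decode with argmax, then strip the start character and trailing E padding
decoded = ["".join(int_to_char[k] for k in row.argmax(axis=-1)) for row in one_hot]
recovered = [d[1:].rstrip("E") for d in decoded]
print(recovered == smiles)  # True
```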

Now it is time to build the model.

Model

model = Sequential()
model.add(Embedding(len(charset), 10, input_length=embed))
model.add(Flatten())
model.add(Dense(1, activation='linear'))

def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res =  K.sum(K.square( y_true-y_pred ))
    SS_tot = K.sum(K.square( y_true - K.mean(y_true) ) )
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr


optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])



callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto',cooldown=0),
    ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]


history =model.fit(x=X_train, y=Y_train,
                              batch_size=1,
                              epochs=10,
                              validation_data=(X_test,Y_test),
                              callbacks=callbacks_list)
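`coeff_determination` computes the coefficient of determination, R² = 1 - SS_res / SS_tot, with `K.epsilon()` added to avoid division by zero. Keras evaluates such function metrics batch by batch and averages the results, so with `batch_size=1` every batch holds a single target, SS_tot is 0, and the reported value is a large negative number rather than a meaningful R². The NumPy sketch below (`r2_numpy` is a hypothetical name) applies the same formula to a whole array, for example to the output of `model.predict(X_test)` after training.

```python
import numpy as np


def r2_numpy(y_true, y_pred, eps=1e-7):
    """Hypothetical helper mirroring coeff_determination on whole arrays.

    eps=1e-7 matches the default value of keras.backend.epsilon().
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / (ss_tot + eps)


Y_train = np.array([[2.33], [2.59], [2.59], [2.54], [4.06]])
print(r2_numpy(Y_train, Y_train))                                  # perfect predictions: 1.0
print(r2_numpy(Y_train, np.full_like(Y_train, Y_train.mean())))    # predicting the mean: close to 0
print(r2_numpy([2.33], [2.5]))  # one-sample "batch": SS_tot is 0, result is hugely negative
```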

Error

ValueError: Error when checking input: expected embedding_3_input to have 2 dimensions, but got array with shape (5, 44, 14)

The Embedding layer expects a two-dimensional array. How can I deal with this so that it accepts the one-hot encoded data?

All the above code can be run.
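The error follows from what the layer does: an `Embedding(input_dim, output_dim)` layer holds a weight table of shape (input_dim, output_dim) and replaces each integer in its input with the matching row, so a (samples, sequence_length) integer array becomes (samples, sequence_length, output_dim). The NumPy sketch below imitates that lookup with a random table standing in for the learned weights (an assumption for illustration) and shows that multiplying one-hot rows by the same table gives the same result, i.e. an embedding lookup on indices equals a bias-free linear layer on one-hot vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, output_dim = 14, 10                     # len(charset) and the embedding size used above
table = rng.normal(size=(input_dim, output_dim))   # stand-in for the learned Embedding weights

indices = rng.integers(0, input_dim, size=(5, 44))  # (samples, sequence_length) integer input
embedded = table[indices]                           # row lookup, like Embedding's forward pass
print(embedded.shape)                               # (5, 44, 10)

# Multiplying a one-hot row by the table selects the same row
one_hot = np.eye(input_dim)[indices]                # (5, 44, 14)
print(np.allclose(one_hot @ table, embedded))       # True
```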

Nomiluks

Your input shape was not defined properly in the Embedding layer. The following code works for me: rather than adding steps to convert your data to 2D, you can pass the 3D input directly to the Embedding layer by setting input_shape.

#THE MISSING STUFF
#_________________________________________
Y_train = Y_train.reshape(5) # flatten the targets from shape (5, 1) to (5,)
max_len = len(charset) # 14: width of each one-hot row
max_features = embed-1 # 44: number of positions per sample
inputshape = (max_features, max_len) # input shape was not defined; with input_shape, the Embedding layer accepts the 3D one-hot batches
#__________________________________________

model = Sequential()
model.add(Embedding(max_features, 10, input_shape=inputshape)) # input_dim=max_features, output_dim=10
model.add(Flatten())
model.add(Dense(1, activation='linear'))
model.summary()

optimizer = Adam(lr=0.00025)
lr_metric = get_lr_metric(optimizer)
model.compile(loss="mse", optimizer=optimizer, metrics=[coeff_determination, lr_metric])


callbacks_list = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-15, verbose=1, mode='auto',cooldown=0),
    ModelCheckpoint(filepath="weights.best.hdf5", monitor='val_loss', save_best_only=True, verbose=1, mode='auto')]

history =model.fit(x=X_train, y=Y_train,
                              batch_size=10,
                              epochs=10,
                              validation_data=(X_test,Y_test),
                              callbacks=callbacks_list)
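With `input_shape=(44, 14)`, the Embedding layer treats every 0/1 entry of the one-hot array as an index, so each entry is replaced by row 0 or row 1 of its weight table: the output has shape (samples, 44, 14, 10), and `Flatten` yields 44 × 14 × 10 = 6160 features. Only those two rows are ever read, even though `input_dim` is `max_features = 44`. The NumPy sketch below uses random weights and a random stand-in for `X_train` (both assumptions) to reproduce these shapes, and contrasts them with converting the one-hot data to indices first via `argmax`, which in Keras 2.x would pair with `Embedding(len(charset), 10, input_length=embed - 1)`; note that the question's `input_length=embed` is 45, one more than the 44 positions `vectorize` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
max_features, output_dim = 44, 10                    # input_dim and output_dim used in the answer
table = rng.normal(size=(max_features, output_dim))  # stand-in for the Embedding weights

# Stand-in for X_train: 5 samples, 44 positions, one 1 per 14-wide one-hot row
X_onehot = np.eye(14, dtype=np.int8)[rng.integers(0, 14, size=(5, 44))]

# Answer's version: every 0/1 entry is used as an index into the table
out_3d = table[X_onehot]
print(out_3d.shape)                 # (5, 44, 14, 10)
print(out_3d.reshape(5, -1).shape)  # (5, 6160) after Flatten
print(np.unique(X_onehot))          # [0 1]: only table rows 0 and 1 are read

# Alternative: collapse the one-hot axis to indices first
X_idx = X_onehot.argmax(axis=-1)                 # (5, 44)
table_vocab = rng.normal(size=(14, output_dim))  # input_dim = len(charset) = 14
out_2d = table_vocab[X_idx]
print(out_2d.shape)                 # (5, 44, 10)
print(out_2d.reshape(5, -1).shape)  # (5, 440) after Flatten
```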

Edited 2021-06-7
