Objective
- learn about word embeddings
- use them to classify the sentiment of movie reviews
There are several ways to vectorize text so that we can feed it to a model; the most common one is TF-IDF. Word embeddings are another way to encode text as vectors. Put simply, a word embedding maps each word to a vector, and the mapping is learned from the data.
For example, if we had a corpus with 3 sentences:
- This movie is nice
- This movie is bad
- This movie sucked
Then our vocabulary would be:
["this", "movie", "is", "nice", "bad", "sucked"]
Let’s map each word in our vocabulary to an integer:
word_to_id = {
    "this": 1,
    "movie": 2,
    "is": 3,
    "nice": 4,
    "bad": 5,
    "sucked": 6
}
Now let’s say we want the “embedding dimension” to be 5, i.e. each word is represented by a vector of size 5. Our word embeddings might look something like:
embeddings = {
    "1": [4.3, 4.6, 85, 34, 43],
    "2": [34, 6.4, 34, 45, 46],
    ...
}
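In practice, the embedding table is just a matrix with one row per word id, and looking up a word means indexing into that matrix. Here is a minimal sketch with toy, random (not learned) values, reusing the word_to_id mapping above:

import numpy as np

embedding_dim = 5
vocab_size = len(word_to_id)

# one row per word id; row 0 is unused because our ids start at 1
embedding_matrix = np.random.randn(vocab_size + 1, embedding_dim)

sentence = ["this", "movie", "is", "nice"]
ids = [word_to_id[w] for w in sentence]   # [1, 2, 3, 4]
vectors = embedding_matrix[ids]           # shape (4, 5): one 5-dim vector per word
print(vectors.shape)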
Dataset
Let’s load the data. We’ll use the IMDB movie review dataset that ships with Keras.
from keras.datasets import imdb
from keras import preprocessing

# keep only the 5000 most frequent words so the indices fit the embedding layer we build later
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000)
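As a quick sanity check on what load_data returns (lists of word ids and 0/1 labels; the standard IMDB split has 25,000 training and 25,000 test reviews):

print(len(x_train), len(x_test))   # 25000 25000
print(x_train[0][:10])             # first 10 word ids of the first review
print(y_train[0])                  # 1 = positive, 0 = negative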
The reviews are encoded as sequences of integers, like the toy example above. Let’s get the word-to-index dictionary from Keras and build the inverse index-to-word dictionary, then print the first review after mapping each index back to its word.
# word to index mapping
w_id = imdb.get_word_index()
# index to word mapping
id_w = {i: w for w, i in w_id.items()}

for idx in x_train[0]:
    print(id_w[idx], end=" ")
the as you with out themselves powerful lets loves their becomes reaching had journalist of lot from anyone to have after out atmosphere never more room titillate it so heart shows to years of every never going villaronga help moments or of every chest visual movie except her was several of enough more with is now current film as you of mine potentially unfortunately of you than him that with out themselves her get for was camp of you movie sometimes movie that with scary but pratfalls to story wonderful that in seeing in character to of 70s musicians with heart had shadows they of here that with her serious to have does when from why what have critics they is you that isn't one will very to as itself with other tricky in of seen over landed for anyone of gilmore's br show's to whether from than out themselves history he name half some br of 'n odd was two most of mean for 1 any an boat she he should is thought frog but of script you not while history he heart to real at barrel but when from one bit then have two of script their with her nobody most that with wasn't to with armed acting watch an for with heartfelt film want an

(The decoded text reads oddly because load_data offsets every index by 3 to reserve ids for padding, start, and out-of-vocabulary markers, while get_word_index is not offset; it still illustrates the word-to-id mapping.)
Similarly, we can encode words into integers using the word-to-index dictionary we created above.
for w in ["this", "movie", "is", "bad"]:
    print(w_id[w], end=" ")
There are a couple of things we need to define: the maximum vocabulary size and the maximum number of words in a review. We’ll choose 5,000 (matching the num_words cap we passed to load_data above) and 400, respectively.
max_vocabulary = 5000
max_sentence_length = 400
Since each review has a different number of words, we need to make sure all reviews end up the same length. We’ll use the pad_sequences function provided by Keras. By default it adds 0s in front of a review that has fewer than max_sentence_length words, and removes words from the beginning of a review that has more.
x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=max_sentence_length)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=max_sentence_length)
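To see the default pre-padding and pre-truncating behaviour concretely, here is a small toy example (values made up for illustration):

from keras import preprocessing

seqs = [[5, 8, 9], [1, 2, 3, 4, 5, 6, 7]]
padded = preprocessing.sequence.pad_sequences(seqs, maxlen=5)
print(padded)
# [[0 0 5 8 9]      <- 0s added in front of the short sequence
#  [3 4 5 6 7]]     <- words dropped from the beginning of the long sequence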
Model
We’ll use a 1D convolution instead of an RNN since it trains faster. We’ll also use GlobalMaxPool1D to collapse the 2D output of the convolution layer (timesteps × filters) into a 1D vector so that we can feed it to the Dense layers.
from keras.models import Model
from keras.layers import Input, Dense, Conv1D, Embedding, GlobalMaxPool1D, Dropout, Activation

input_layer = Input(shape=(max_sentence_length, ))
# map each word id to a 50-dimensional vector, learned during training
_ = Embedding(max_vocabulary, output_dim=50)(input_layer)
_ = Dropout(0.2)(_)
# slide 250 filters of width 3 over the sequence of word vectors
_ = Conv1D(filters=250, kernel_size=3, activation="relu", padding="valid", strides=1)(_)
# take the maximum over the time axis: (timesteps, filters) -> (filters,)
_ = GlobalMaxPool1D()(_)
_ = Dense(units=256)(_)
_ = Dropout(0.2)(_)
_ = Activation("relu")(_)
# single sigmoid unit for the positive/negative probability
_ = Dense(units=1)(_)
_ = Activation("sigmoid")(_)

model = Model(input_layer, _)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_22 (InputLayer)        (None, 400)               0
_________________________________________________________________
embedding_22 (Embedding)     (None, 400, 50)           250000
_________________________________________________________________
dropout_42 (Dropout)         (None, 400, 50)           0
_________________________________________________________________
conv1d_30 (Conv1D)           (None, 398, 250)          37750
_________________________________________________________________
global_max_pooling1d_21 (Glo (None, 250)               0
_________________________________________________________________
dense_41 (Dense)             (None, 256)               64256
_________________________________________________________________
dropout_43 (Dropout)         (None, 256)               0
_________________________________________________________________
activation_7 (Activation)    (None, 256)               0
_________________________________________________________________
dense_42 (Dense)             (None, 1)                 257
_________________________________________________________________
activation_8 (Activation)    (None, 1)                 0
=================================================================
Total params: 352,263
Trainable params: 352,263
Non-trainable params: 0
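The summary shows GlobalMaxPool1D turning the (None, 398, 250) convolution output into (None, 250). Here is a toy NumPy equivalent of that pooling step (illustrative only):

import numpy as np

conv_out = np.random.randn(2, 398, 250)   # (batch, timesteps, filters), as in the summary above
pooled = conv_out.max(axis=1)             # max over the time axis -> (2, 250)
print(pooled.shape)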
model.fit(x_train, y_train, epochs=4, batch_size=32)
model.evaluate(x_test, y_test, batch_size=256)
Epoch 1/4
25000/25000 [==============================] - 17s 692us/step - loss: 0.6609 - acc: 0.6015
Epoch 2/4
25000/25000 [==============================] - 15s 605us/step - loss: 0.5518 - acc: 0.7228
Epoch 3/4
25000/25000 [==============================] - 15s 605us/step - loss: 0.4379 - acc: 0.8002
Epoch 4/4
25000/25000 [==============================] - 15s 612us/step - loss: 0.3653 - acc: 0.8398
25000/25000 [==============================] - 3s 128us/step
[0.3153888236236572, 0.86172]
model.evaluate returns [test loss, test accuracy], so we managed to achieve about 86% accuracy on the test set. Cool!
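Since the whole point was learning word embeddings, note that the learned embedding table can be pulled out of the trained model. A minimal sketch, assuming the layer order defined above:

# the Embedding layer is the second layer (right after the Input layer)
embedding_matrix = model.layers[1].get_weights()[0]
print(embedding_matrix.shape)   # (5000, 50): one 50-dim vector per word id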