RNN in TensorFlow to classify the sentiment of movie reviews as positive or negative
Here is an example of how you can use a simple recurrent neural network (RNN) in TensorFlow to classify the sentiment of movie reviews as positive or negative. This example assumes that you have already pre-processed the movie review data and split it into training and testing sets.
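First, you will need to install TensorFlow and import the necessary modules. The leading ! in the install command is notebook syntax (Jupyter or Colab); omit it when running pip from a terminal: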
!pip install tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
Next, you will need to define the hyperparameters for the model:
vocab_size = 10000 # Size of the vocabulary
max_length = 200 # Maximum length of a review
embedding_dim = 16 # Dimension of the embedding layer
batch_size = 128 # Batch size for training
num_epochs = 10 # Number of epochs to train for
Then, you can define the model using the Sequential API:
model = tf.keras.Sequential()
# Add an embedding layer
model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length))
# Add a simple RNN layer
model.add(tf.keras.layers.SimpleRNN(32))
# Add a dense layer with sigmoid activation for classification
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
# Compile the model with an Adam optimizer and binary cross-entropy loss
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
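To get a feel for the size of this model, the trainable parameters of each layer can be counted by hand. The sketch below is plain Python and does not call TensorFlow; the name rnn_units is introduced here for illustration and stands for the 32 passed to SimpleRNN. It assumes the standard layout of one bias per unit, and the totals should agree with what model.summary() reports once the model is built.

```python
# Hyperparameters copied from the text above.
vocab_size = 10000
embedding_dim = 16
rnn_units = 32  # illustrative name for the argument passed to SimpleRNN

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# SimpleRNN: input weights, recurrent weights, and one bias per unit.
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)

# Dense(1): one weight per RNN unit plus a single bias.
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# prints: 160000 1568 33 161601
```

Almost all of the parameters sit in the embedding layer, so vocab_size and embedding_dim are the settings that change the model size the most.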
Next, you can convert the training reviews into padded sequences of integers and fit the model to them using the fit method:
# Convert the training data into sequences of integers
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)
sequences = tokenizer.texts_to_sequences(train_texts)
# Pad the sequences to the same length
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
# Fit the model to the training data
model.fit(padded_sequences, train_labels, batch_size=batch_size, epochs=num_epochs)
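To make the tokenizing and padding steps less of a black box, here is a simplified pure-Python sketch of the idea. It is not the Keras implementation (for instance, it splits on whitespace only and does not strip punctuation), and the helper names fit_word_index, to_sequences, and pad_post are made up for this illustration.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<OOV>"):
    """Map each word to an integer; more frequent words get smaller indices."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}  # index 0 is left free for padding
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index

def to_sequences(texts, word_index, vocab_size, oov_token="<OOV>"):
    """Replace each word by its index; unknown or rare words become OOV."""
    oov = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            idx = word_index.get(word, oov)
            seq.append(idx if idx < vocab_size else oov)
        sequences.append(seq)
    return sequences

def pad_post(sequences, max_length):
    """Cut each sequence at max_length and fill the end with zeros."""
    return [seq[:max_length] + [0] * (max_length - len(seq)) for seq in sequences]

# Toy data, invented for this example.
toy_texts = ["great movie great cast", "bad movie"]
toy_index = fit_word_index(toy_texts)
toy_sequences = to_sequences(toy_texts, toy_index, vocab_size=10000)
toy_padded = pad_post(toy_sequences, max_length=3)
print(toy_sequences)  # [[2, 3, 2, 4], [5, 3]]
print(toy_padded)     # [[2, 3, 2], [5, 3, 0]]

# A word never seen during fitting ("film") maps to the OOV index 1.
print(to_sequences(["great bad film"], toy_index, vocab_size=10000))  # [[2, 5, 1]]
```

One thing to keep in mind: with padding='post' the zeros land at the end of each review, so the RNN reads them last. Creating the Embedding layer with mask_zero=True, or padding with 'pre' instead, are two common ways to keep the padding from influencing the final state.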
Finally, you can evaluate the model on the testing data using the evaluate method:
# Convert the testing data into sequences of integers
test_sequences = tokenizer.texts_to_sequences(test_texts)
# Pad the sequences to the same length
test_padded_sequences = pad_sequences(test_sequences, maxlen=max_length, padding='post', truncating='post')
# Evaluate the model on the testing data
loss, accuracy = model.evaluate(test_padded_sequences, test_labels)
print('Test loss:', loss)
print('Test accuracy:', accuracy)
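The two numbers printed above are the binary cross-entropy loss and the accuracy. As a rough sketch of how they relate to the sigmoid outputs of the model, the following plain-Python functions compute both from a list of labels and predicted probabilities. The function names and the toy values are made up for illustration, and the clipping constant eps is an assumption, not necessarily the value TensorFlow uses internally.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def binary_accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum(1 for y, p in zip(labels, probs) if int(p > threshold) == y)
    return correct / len(labels)

# Toy labels and sigmoid outputs, invented for this example.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.6]

toy_loss = binary_cross_entropy(toy_labels, toy_probs)
toy_accuracy = binary_accuracy(toy_labels, toy_probs)
print(round(toy_loss, 4), toy_accuracy)  # 0.5403 0.5
```

The loss rewards confident correct predictions and punishes confident wrong ones, while the accuracy only checks which side of 0.5 each prediction falls on.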
Here is the full code for using a simple recurrent neural network (RNN) in TensorFlow to classify the sentiment of movie reviews as positive or negative:
# Install TensorFlow and import the necessary modules
!pip install tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Define the hyperparameters for the model
vocab_size = 10000 # Size of the vocabulary
max_length = 200 # Maximum length of a review
embedding_dim = 16 # Dimension of the embedding layer
batch_size = 128 # Batch size for training
num_epochs = 10 # Number of epochs to train for
# Define the model using the Sequential API
model = tf.keras.Sequential()
# Add an embedding layer
model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length))
# Add a simple RNN layer
model.add(tf.keras.layers.SimpleRNN(32))
# Add a dense layer with sigmoid activation for classification
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
# Compile the model with an Adam optimizer and binary cross-entropy loss
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Convert the training data into sequences of integers
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)
sequences = tokenizer.texts_to_sequences(train_texts)
# Pad the sequences to the same length
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
# Fit the model to the training data
model.fit(padded_sequences, train_labels, batch_size=batch_size, epochs=num_epochs)
# Convert the testing data into sequences of integers
test_sequences = tokenizer.texts_to_sequences(test_texts)
# Pad the sequences to the same length
test_padded_sequences = pad_sequences(test_sequences, maxlen=max_length, padding='post', truncating='post')
# Evaluate the model on the testing data
loss, accuracy = model.evaluate(test_padded_sequences, test_labels)
print('Test loss:', loss)
print('Test accuracy:', accuracy)
This code assumes that you have already loaded the movie review data into the variables train_texts, train_labels, test_texts, and test_labels. The train_texts and test_texts variables should contain the movie review text, and the train_labels and test_labels variables should contain the corresponding labels (0 for negative reviews and 1 for positive reviews).
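If you are unsure what these four variables should look like, here is a minimal sketch that builds them from a handful of made-up reviews. The reviews, the split_reviews helper, and its test_fraction and seed parameters are all hypothetical; a real project would load an actual dataset instead.

```python
import random

# Hypothetical labeled reviews: (text, label), 0 = negative, 1 = positive.
labeled_reviews = [
    ("A wonderful film with a moving story", 1),
    ("Terrible acting and a dull plot", 0),
    ("I loved every minute of it", 1),
    ("A complete waste of time", 0),
    ("Brilliant direction and a great cast", 1),
    ("Boring, predictable and far too long", 0),
    ("One of the best movies this year", 1),
    ("I would not recommend this to anyone", 0),
]

def split_reviews(pairs, test_fraction=0.25, seed=0):
    """Shuffle the (text, label) pairs and split them into train and test lists."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    train_texts = [text for text, _ in train]
    train_labels = [label for _, label in train]
    test_texts = [text for text, _ in test]
    test_labels = [label for _, label in test]
    return train_texts, train_labels, test_texts, test_labels

train_texts, train_labels, test_texts, test_labels = split_reviews(labeled_reviews)
print(len(train_texts), len(test_texts))  # 6 2
```

The texts are plain Python strings and the labels are integers. Converting the label lists with numpy.array before passing them to fit and evaluate is a common precaution, since some TensorFlow versions are picky about plain lists.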