RNN in TensorFlow to classify the sentiment of movie reviews as positive or negative

Here is an example of how you can use a simple recurrent neural network (RNN) in TensorFlow to classify the sentiment of movie reviews as positive or negative. This example assumes that you have already pre-processed the movie review data and split it into training and testing sets.

First, you will need to install TensorFlow and import the necessary modules:

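    # Install TensorFlow. The leading "!" runs pip from a notebook cell;
    # in a terminal, run "pip install tensorflow" instead.
    !pip install tensorflow

    import tensorflow as tf
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences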

Next, you will need to define the hyperparameters for the model:

    vocab_size = 10000   # Size of the vocabulary
    max_length = 200     # Maximum length of a review
    embedding_dim = 16   # Dimension of the embedding layer
    batch_size = 128     # Batch size for training
    num_epochs = 10      # Number of epochs to train for

Then, you can define the model using the Sequential API:

    model = tf.keras.Sequential()

    # Add an embedding layer
    model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length))

    # Add a simple RNN layer
    model.add(tf.keras.layers.SimpleRNN(32))

    # Add a dense layer with sigmoid activation for classification
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

    # Compile the model with an Adam optimizer and binary cross-entropy loss
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
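To get a feel for how large this model is, the number of trainable parameters can be worked out by hand. The sketch below is plain Python arithmetic based on the standard layer formulas. The name rnn_units is introduced here only for illustration and stands for the 32 passed to SimpleRNN.

```python
# Hyperparameters copied from the example above.
vocab_size = 10000
embedding_dim = 16
rnn_units = 32  # illustrative name for the argument passed to SimpleRNN

# Embedding: one embedding_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embedding_dim

# SimpleRNN: input-to-hidden weights, hidden-to-hidden weights, and a bias.
rnn_params = rnn_units * (embedding_dim + rnn_units + 1)

# Dense(1): one weight per RNN unit plus a single bias.
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# prints: 160000 1568 33 161601
```

Almost all of the parameters sit in the embedding layer, so vocab_size and embedding_dim are the settings that most affect model size.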

Next, you can convert the training texts into padded sequences of integers and fit the model to them using the fit method:

    # Convert the training data into sequences of integers
    tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
    tokenizer.fit_on_texts(train_texts)
    sequences = tokenizer.texts_to_sequences(train_texts)

    # Pad the sequences to the same length
    padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')

    # Fit the model to the training data
    model.fit(padded_sequences, train_labels, batch_size=batch_size, epochs=num_epochs)
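To make the padding step concrete, here is a small pure-Python sketch of what padding='post' and truncating='post' do to a single sequence. The helper pad_post is hypothetical and written only for illustration. It is not part of TensorFlow, and pad_sequences itself works on whole batches and returns a NumPy array.

```python
def pad_post(sequence, maxlen, pad_value=0):
    """Hypothetical helper: keep the first maxlen items, then pad at the end."""
    truncated = sequence[:maxlen]  # truncating='post' drops items from the end
    padding = [pad_value] * (maxlen - len(truncated))
    return truncated + padding     # padding='post' appends the pad value

short_review = [5, 12, 7]
long_review = [3, 9, 4, 8, 2, 6]

print(pad_post(short_review, maxlen=5))  # prints: [5, 12, 7, 0, 0]
print(pad_post(long_review, maxlen=5))   # prints: [3, 9, 4, 8, 2]
```

Every review ends up with exactly maxlen integers, which is what lets the reviews be stacked into a single input array for training.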

Finally, you can evaluate the model on the testing data using the evaluate method:

    # Convert the testing data into sequences of integers
    test_sequences = tokenizer.texts_to_sequences(test_texts)

    # Pad the sequences to the same length
    test_padded_sequences = pad_sequences(test_sequences, maxlen=max_length, padding='post', truncating='post')

    # Evaluate the model on the testing data
    loss, accuracy = model.evaluate(test_padded_sequences, test_labels)
    print('Test loss:', loss)
    print('Test accuracy:', accuracy)
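
The two numbers returned by evaluate can be illustrated by hand. The sketch below computes binary cross-entropy and accuracy in plain Python for a few made-up labels and predicted probabilities. The function names and example values are assumptions for illustration only, and the 0.5 threshold is assumed to match the default used by the accuracy metric.

```python
import math

def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross-entropy over all examples (illustrative helper)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def accuracy_at_threshold(labels, probabilities, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(pred == y) for pred, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up values: the last two predictions fall on the wrong side of 0.5.
example_labels = [1, 0, 1, 0]
example_probabilities = [0.9, 0.2, 0.4, 0.6]

example_loss = binary_cross_entropy(example_labels, example_probabilities)
example_accuracy = accuracy_at_threshold(example_labels, example_probabilities)
print('Loss:', round(example_loss, 4))
print('Accuracy:', example_accuracy)
```

The loss penalizes confident wrong predictions more heavily than hesitant ones, while accuracy only counts whether each prediction lands on the correct side of the threshold.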

Here is the full code for using a simple recurrent neural network (RNN) in TensorFlow to classify the sentiment of movie reviews as positive or negative:

    # Install TensorFlow and import the necessary modules
    !pip install tensorflow

    import tensorflow as tf
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Define the hyperparameters for the model
    vocab_size = 10000   # Size of the vocabulary
    max_length = 200     # Maximum length of a review
    embedding_dim = 16   # Dimension of the embedding layer
    batch_size = 128     # Batch size for training
    num_epochs = 10      # Number of epochs to train for

    # Define the model using the Sequential API
    model = tf.keras.Sequential()

    # Add an embedding layer
    model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length))

    # Add a simple RNN layer
    model.add(tf.keras.layers.SimpleRNN(32))

    # Add a dense layer with sigmoid activation for classification
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

    # Compile the model with an Adam optimizer and binary cross-entropy loss
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    # Convert the training data into sequences of integers
    tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")
    tokenizer.fit_on_texts(train_texts)
    sequences = tokenizer.texts_to_sequences(train_texts)

    # Pad the sequences to the same length
    padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')

    # Fit the model to the training data
    model.fit(padded_sequences, train_labels, batch_size=batch_size, epochs=num_epochs)

    # Convert the testing data into sequences of integers
    test_sequences = tokenizer.texts_to_sequences(test_texts)

    # Pad the sequences to the same length
    test_padded_sequences = pad_sequences(test_sequences, maxlen=max_length, padding='post', truncating='post')

    # Evaluate the model on the testing data
    loss, accuracy = model.evaluate(test_padded_sequences, test_labels)
    print('Test loss:', loss)
    print('Test accuracy:', accuracy)

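This code assumes that you have already loaded the movie review data into the variables train_texts, train_labels, test_texts, and test_labels. The train_texts and test_texts variables should contain the movie review text as lists of strings, and the train_labels and test_labels variables should contain the corresponding labels (0 for negative reviews and 1 for positive reviews). If the labels are stored in plain Python lists, converting them to NumPy arrays before calling fit and evaluate is a safe precaution, since some TensorFlow versions reject list labels.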
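
As a rough illustration of the expected shape of these four variables, the sketch below builds them from a handful of made-up reviews using an 80/20 split. The review texts, the split ratio, and the random seed are all hypothetical. A real project would load an actual dataset instead.

```python
import random

# Hypothetical toy data: (review text, label) with 1 = positive, 0 = negative.
reviews = [
    ("A moving story with wonderful acting", 1),
    ("Dull plot and wooden performances", 0),
    ("I loved every minute of it", 1),
    ("A complete waste of two hours", 0),
    ("Beautifully shot and well paced", 1),
]

# Shuffle a copy with a fixed seed so the split is reproducible.
rng = random.Random(42)
shuffled = reviews[:]
rng.shuffle(shuffled)

# Use the first 80% for training and the rest for testing.
split = int(0.8 * len(shuffled))
train_pairs, test_pairs = shuffled[:split], shuffled[split:]

train_texts = [text for text, _ in train_pairs]
train_labels = [label for _, label in train_pairs]
test_texts = [text for text, _ in test_pairs]
test_labels = [label for _, label in test_pairs]

print(len(train_texts), len(test_texts))  # prints: 4 1
```

The texts are plain strings and the labels are integers in the same order, so the review at position i in train_texts is described by the label at position i in train_labels.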

