Text Generation Model using Python


A Text Generation Model is a type of Natural Language Processing (NLP) model that automatically generates human-like text. It can produce coherent and contextually relevant output based on an input prompt. So, if you want to learn how to build a Text Generation Model, this article is for you. In this article, I’ll take you through the task of building a Text Generation Model with Deep Learning using the Python programming language.

Text Generation Model: Process We Can Follow

Text Generation Models have various applications, such as content creation, chatbots, automated story writing, and more. They often utilize advanced Machine Learning techniques, particularly Deep Learning models like Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTMs), and Transformer models like GPT (Generative Pre-trained Transformer).

Below is the process we can follow for the task of building a Text Generation Model:

  1. Understand what you want to achieve with the text generation model (e.g., chatbot responses, creative writing, code generation).
  2. Consider the style, complexity, and length of the text to be generated.
  3. Collect a large dataset of text that’s representative of the style and content you want to generate.
  4. Clean the text data (remove unwanted characters, correct spellings), and preprocess it (tokenization, lowercasing, removing stop words if necessary).
  5. Choose a deep neural network architecture to handle sequences for text generation.
  6. Frame the problem as a sequence modelling task where the model learns to predict the next words in a sequence.
  7. Use your text data to train the model.
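
Steps 3 and 4 above can be sketched in plain Python before any Deep Learning comes in. Below is a minimal, hypothetical example of cleaning a raw string and tokenizing it at the character level; the character set kept by the regular expression is an assumption for illustration, not part of the pipeline later in this article:

```python
import re

def preprocess(text):
    # keep letters, digits, basic punctuation and whitespace; drop everything else
    text = re.sub(r"[^a-zA-Z0-9 .,:;!?'\n-]", "", text)
    # collapse runs of spaces left behind by the cleaning step
    text = re.sub(r" {2,}", " ", text)
    return text

raw = "To be,   or not to be?"
clean = preprocess(raw)

# character-level tokenization: map each unique character to an integer
vocab = sorted(set(clean))
char2idx = {ch: i for i, ch in enumerate(vocab)}
encoded = [char2idx[ch] for ch in clean]

# the mapping is reversible, so the encoded text decodes back exactly
decoded = "".join(vocab[i] for i in encoded)
```

This is the same character-to-index scheme we will apply to the full dataset with TensorFlow below.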

For this task, we can use the Tiny Shakespeare dataset for two reasons:

  1. It’s formatted as dialogue, so you will learn how to generate text in the form of dialogues.
  2. Building text generation models usually requires huge textual datasets. The Tiny Shakespeare dataset is already available in TensorFlow Datasets, so we don’t need to download anything externally.

Text Generation Model using Python

So, let’s understand how to build a Text Generation Model with Deep Learning using Python. I’ll start this task by importing the necessary Python libraries and the dataset:

import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np

# load the Tiny Shakespeare dataset
dataset, info = tfds.load('tiny_shakespeare', with_info=True, as_supervised=False)

Our dataset contains data in a textual format. Language models need numerical data, so we’ll convert the text to sequences of integers. We’ll also create sequences for training:

# get the text from the dataset
text = next(iter(dataset['train']))['text'].numpy().decode('utf-8')

# create a mapping from unique characters to indices
vocab = sorted(set(text))
char2idx = {char: idx for idx, char in enumerate(vocab)}
idx2char = np.array(vocab)

# numerically represent the characters
text_as_int = np.array([char2idx[c] for c in text])

# create training examples and targets
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)

# create training sequences
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
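
The batching with drop_remainder=True simply chops the integer stream into fixed-length chunks and discards the incomplete tail. A rough pure-Python equivalent (a sketch of the behaviour, not of the tf.data internals) looks like this:

```python
# a stand-in for text_as_int: ten integer-encoded characters
text_as_int = list(range(10))
seq_length = 3

# chop into chunks of seq_length + 1, dropping the incomplete tail
sequences = [text_as_int[i:i + seq_length + 1]
             for i in range(0, len(text_as_int) - seq_length, seq_length + 1)]
# -> [[0, 1, 2, 3], [4, 5, 6, 7]]; the leftover [8, 9] is dropped
```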

For each sequence, we’ll now duplicate and shift it to form the input and target text, using the map method to apply a simple function to each sequence:

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)
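
To see what this shift does, here is split_input_target applied to a short string (a toy example; the real pipeline works on integer tensors of length 101):

```python
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

# a toy "sequence" of 9 characters (so seq_length = 8 here)
inp, tgt = split_input_target("Shakespea")
# inp = "Shakespe", tgt = "hakespea":
# at every position, the target is the character that follows the input character
```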

Now, we’ll shuffle the dataset and pack it into training batches:

# batch size and buffer size
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = (
    dataset
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.experimental.AUTOTUNE)
)

Now, we’ll build the model as a simple Recurrent Neural Network with a few layers:

# length of the vocabulary
vocab_size = len(vocab)

# the embedding dimension
embedding_dim = 256

# number of RNN units
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]),
        tf.keras.layers.LSTM(rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)
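
As a sanity check on the architecture, the parameter count can be worked out by hand. Tiny Shakespeare has 65 unique characters, and a Keras LSTM layer holds weights for 4 gates (input weights, recurrent weights, and a bias each), so, assuming vocab_size = 65:

```python
vocab_size, embedding_dim, rnn_units = 65, 256, 1024

embedding_params = vocab_size * embedding_dim  # 16,640
# 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * (rnn_units * (embedding_dim + rnn_units) + rnn_units)  # 5,246,976
dense_params = rnn_units * vocab_size + vocab_size  # 66,625

total_params = embedding_params + lstm_params + dense_params
# about 5.3 million parameters, which should match model.summary()
```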

We’ll now choose an optimizer and a loss function to compile the model:

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)
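
A useful sanity check before training: an untrained model should behave roughly like a uniform guess over the vocabulary, so the initial loss should sit near ln(vocab_size). With 65 unique characters:

```python
import math

vocab_size = 65  # unique characters in Tiny Shakespeare
uniform_loss = math.log(vocab_size)
# about 4.17; if the first training losses are far above this bound,
# something is likely wrong (e.g. from_logits not set correctly)
```

The end-of-epoch-1 loss of about 2.66 in the training log below is already well under this bound, which suggests the model is learning from the start.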

We’ll now train the model:

import os

# directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'

# name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)

# train the model
EPOCHS = 10
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
Epoch 1/10
155/155 [==============================] - 37s 211ms/step - loss: 2.6553
Epoch 2/10
155/155 [==============================] - 33s 212ms/step - loss: 1.9554
Epoch 3/10
155/155 [==============================] - 32s 204ms/step - loss: 1.6952
Epoch 4/10
155/155 [==============================] - 32s 205ms/step - loss: 1.5474
Epoch 5/10
155/155 [==============================] - 32s 203ms/step - loss: 1.4565
Epoch 6/10
155/155 [==============================] - 32s 205ms/step - loss: 1.3947
Epoch 7/10
155/155 [==============================] - 33s 208ms/step - loss: 1.3468
Epoch 8/10
155/155 [==============================] - 32s 205ms/step - loss: 1.3059
Epoch 9/10
155/155 [==============================] - 32s 205ms/step - loss: 1.2686
Epoch 10/10
155/155 [==============================] - 33s 212ms/step - loss: 1.2339

After training, we can now use the model to generate text. First, we will restore the latest checkpoint and rebuild the model with a batch size of 1:

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))

Now, to generate text, we’ll input a seed string, predict the next character, and then add it back to the input, continuing this process to generate longer text:

def generate_text(model, start_string):
    num_generate = 1000

    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []

    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)

        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)

        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"QUEEN: So, lets end this"))
QUEEN: So, lets end this less
Than the pointain to prison? 'tis the place
That thought not down? even charping on peace
By souther to corrupt the tears--O' mine!

JULIET:
If it both my subject for succare,
As said that speaks that did in this young pleasant field go 'll.

MENEPIUS:
No, my good lord, it was preserves at my book,
With silstity take in advice,
Sected yort;
And more bring a screporature and his bones,
A heart of the prince and mind for herself:
He this the devilining Past you gave in that
proclaim than takes not so light;
Since showing with the eagle blembours' more, but it shall prove
These tongue fasting of a horse shall be in
The aspergo and lawful rightly father,
As I shall say that deneral prince,
That ever marning sweet Tybalt of York,
And made them weal an traitor, prevails, I will live
All consceers and in a fielder head;
And, if you move her signif live was shear--hearted mean's end:
He did Sell's daughter that full us arrigabe,
And patience discover'd; as the malmerer
should say 'show t

The generate_text function in the above code uses the trained Recurrent Neural Network to generate a sequence of text, starting from a given seed phrase (start_string). It converts the seed phrase into a sequence of numeric indices, feeds these indices into the model, and then iteratively generates new characters, each time feeding the character it just sampled back in as the input for the next step. This process continues for a fixed number of iterations (num_generate), resulting in a stream of text that extends from the initial seed.

The function employs randomness in character selection to ensure variability in the generated text, and the final output is a concatenation of the seed phrase with the newly generated characters, typically reflecting the style and content of the training data used for the model.
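
The amount of randomness can be controlled with a temperature parameter, a common extension not used in the code above: dividing the logits by a temperature before sampling makes the output more conservative (low temperature) or more surprising (high temperature). A minimal NumPy sketch:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    # temperature < 1 sharpens the distribution, temperature > 1 flattens it
    rng = rng or np.random.default_rng(0)
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# at a very low temperature, sampling collapses to the most likely character
sample_with_temperature([1.0, 3.0, 2.0], temperature=0.01)  # -> 1
```

In generate_text, the same idea would amount to dividing predictions by a temperature just before the tf.random.categorical call.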

Summary

So, this is how you can build a Text Generation Model with Deep Learning using Python. As we saw, such models have applications in content creation, chatbots, automated story writing, and more, and they typically rely on Deep Learning architectures such as RNNs, LSTMs, and Transformers.

I hope you liked this article on building a Text Generation Model with Deep Learning using Python. Feel free to ask valuable questions in the comments section below. You can follow me on Instagram for many more resources.