
Introduction to minGPT


minGPT is a minimal PyTorch re-implementation of OpenAI's GPT (Generative Pretrained Transformer), including the code needed to train it. It is designed as a simple, educational resource for readers who want to learn how GPT works and perhaps train their own version of the model.

The library is composed of three main files:

  1. mingpt/ Contains the GPT model itself, which is a transformer-based neural network.
  2. mingpt/ Provides a training loop that can be used to train the GPT model.
  3. mingpt/ Implements byte pair encoding, a form of tokenization used for processing text for input into the model.
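To make the idea of byte pair encoding concrete, here is a toy sketch of how merges are learned: the most frequent adjacent pair of tokens is repeatedly fused into a single new token. This is not minGPT's code. The function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) are made up for illustration, and the sketch works on characters of a short string rather than on bytes with a pre-trained merge table.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(text, num_merges):
    """Start from single characters and apply up to `num_merges` greedy merges."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges


tokens, merges = learn_bpe("aaabdaaabac", num_merges=3)
print(tokens)  # ['aaab', 'd', 'aaab', 'a', 'c']
```

After three merges the eleven input characters are represented by five tokens, which is the compression effect a tokenizer relies on before text is fed to the model.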

Here's a basic overview of how you might use minGPT to instantiate and train a GPT model.


To use minGPT, you would typically clone the repository from GitHub and then install it in editable mode, which also pulls in its dependencies:

git clone
cd minGPT
pip install -e .

Instantiating a GPT Model

Here's an example of how you might instantiate a model with the configuration of the smallest GPT-2 (12 layers, 12 heads, 768-dimensional embeddings) using minGPT:

from mingpt.model import GPT, GPTConfig

# Define the configuration for the GPT model (here, using GPT-2's configuration)
model_config = GPTConfig(
    vocab_size=50257,  # OpenAI's model vocabulary
    block_size=1024,   # OpenAI's model block_size (i.e., input context length)
    n_layer=12,        # number of layers
    n_head=12,         # number of heads
    n_embd=768,        # embedding dimension
)

# Instantiate the model
model = GPT(model_config)
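As a sanity check on these configuration values, the parameter count can be estimated by hand. The sketch below assumes the standard GPT-2 layout (a fused query/key/value projection, a 4x-wide MLP, biases on every linear layer, two LayerNorms per block plus a final one) and leaves out the output head, which GPT-2 ties to the token embedding. The helper `gpt_param_count` is hypothetical and not part of minGPT.

```python
def gpt_param_count(vocab_size, block_size, n_layer, n_embd):
    """Count parameters of a GPT-2-style decoder, excluding a separate output head."""
    token_emb = vocab_size * n_embd        # token embedding table
    pos_emb = block_size * n_embd          # learned position embeddings
    layer_norm = 2 * n_embd                # scale and shift vectors
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)  # qkv + output projection
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)  # expand + contract
    block = 2 * layer_norm + attn + mlp
    return token_emb + pos_emb + n_layer * block + layer_norm  # final LayerNorm at the end


n_params = gpt_param_count(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768)
print(f"{n_params:,}")  # 124,439,808
```

Under these assumptions the total is about 124M, the figure usually quoted for the smallest GPT-2. Note that `n_head` does not appear: splitting the embedding across more heads changes how attention is computed, not how many weights there are.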

Training the GPT Model

To train the model, you would need to create a dataset and use the Trainer class provided by minGPT:

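from mingpt.trainer import Trainer, TrainerConfig
from torch.utils.data import Dataset

# Define a PyTorch dataset
class YourDataset(Dataset):
    # Implement the dataset methods (__len__ and __getitem__)
    def __len__(self):
        raise NotImplementedError  # number of training examples

    def __getitem__(self, idx):
        raise NotImplementedError  # one training example at index idx

# Instantiate the dataset
train_dataset = YourDataset()

# Define the training configuration
train_config = TrainerConfig(
    max_epochs=1,   # number of epochs
    batch_size=64,  # batch size
)

# Instantiate the trainer (the third argument is the optional test dataset)
trainer = Trainer(model, train_dataset, None, train_config)

# Start the training
trainer.train()  # assumes the TrainerConfig-era minGPT API used above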
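The dataset is the part left open above, so here is one way its two methods could look for next-token prediction. This is a sketch under assumptions: the class name `CharWindowDataset` is hypothetical, it tokenizes at the character level rather than with BPE, and it returns plain Python lists so that it runs without PyTorch installed. A version meant for the Trainer would subclass `torch.utils.data.Dataset` and wrap `x` and `y` in `torch.tensor(..., dtype=torch.long)`.

```python
class CharWindowDataset:
    """Hypothetical map-style dataset of fixed-length windows over character ids."""

    def __init__(self, text, block_size):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
        self.vocab_size = len(chars)
        self.block_size = block_size
        self.ids = [self.stoi[ch] for ch in text]

    def __len__(self):
        # Each example needs block_size inputs plus one extra id for the last target.
        return len(self.ids) - self.block_size

    def __getitem__(self, idx):
        chunk = self.ids[idx : idx + self.block_size + 1]
        x = chunk[:-1]  # model input
        y = chunk[1:]   # targets: the same window shifted one position to the left
        return x, y


dataset = CharWindowDataset("hello world", block_size=4)
x, y = dataset[0]
print(x, y)  # [3, 2, 4, 4] [2, 4, 4, 5]
```

Each target sequence is the input shifted by one position, so every position in the window contributes a next-token prediction to the loss.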

Additional Notes

  • A real training run involves more detail than shown here, such as data preprocessing, tokenization, iteration handling, and logging.
  • The repository also includes Jupyter notebooks (demo.ipynb and generate.ipynb) that likely contain more detailed examples and usage scenarios.