GPT-Simple¶
A clean, readable framework for pretraining language models from scratch. GPT-Simple handles the full LLM pretraining workflow — tokenization, streaming data loading, multi-GPU training, deterministic stop/resume, and inference — through a single YAML config and a small CLI.
Install¶
pip install gpt-simple-lm
The distribution is named gpt-simple-lm; you import it as gpt_simple and run
the gpt-simple CLI. For source installs and optional extras, see the
project README.
Quick start¶
gpt-simple init -o config.yaml # write a commented config template
gpt-simple tokenize --input_dir ./raw_data \
--output_dir ./data/tokenized --tokenizer_path gpt2
gpt-simple train --config config.yaml # add --nproc_per_node N for multi-GPU
gpt-simple generate --output-dir ./outputs --prompt "Once upon a time"
New here? Start with the Training guide; every config field is documented in the Configuration reference.
Guides¶
These pages go deeper than the quick start above — each owns a single concern.
| Guide | What it covers |
|---|---|
| Architecture | The built-in model: decoder block, RoPE, normalization, MLP variants, attention backends, weight tying, KV-cache. |
| Configuration | Every config field — meaning and valid values — for the model / data / optimizer / training sections. |
| Data pipeline | Tokenization, the .bin/.idx format, pretokenized vs JSONL, sequence packing, document windowing, curriculum buckets. |
| Training | Running single- and multi-GPU training, mixed precision, torch.compile, gradient checkpointing, W&B. |
| Checkpointing & resume | On-disk layout, the deterministic stop/resume model, walltime budgets, signals, topology-agnostic data cursors. |
| Orchestration | Running long, chained jobs under any orchestrator (SLURM, Kubernetes, a local loop). |
| Inference | generate and batch-generate, sampling parameters, the batch JSONL schema, dry-run validation. |
| Hardware tuning | Getting peak throughput from your GPUs: precision, attention backend selection, batch size, dataloader workers. |
| Performance | Measured throughput on a real 2.8B run, MFU/HFU methodology, how to measure FLOPs, and how it compares to other libraries. |
Source of truth¶
These guides describe behavior and intent. When a default value or an exact field list matters, the authoritative source is always the code:
- Config fields and defaults:
src/gpt_simple/config.py(or rungpt-simple initfor a commented template). - Model architecture:
src/gpt_simple/model.py. - Public Python API:
src/gpt_simple/__init__.py.