Appendix A Automatic differentiation made easy (9.8 MB)
Appendix A A typical training loop (16.76 MB)
Appendix A Exercise answers (2.99 MB)
Appendix A Further reading (4.51 MB)
Appendix A Implementing multilayer neural networks (19.08 MB)
Appendix A Introduction to PyTorch (33.01 MB)
Appendix A Optimizing training performance with GPUs (38.68 MB)
Appendix A Saving and loading models (3.63 MB)
Appendix A Seeing models as computation graphs (5.71 MB)
Appendix A Setting up efficient data loaders (19.08 MB)
Appendix A Summary (4.48 MB)
Appendix A Understanding tensors (14 MB)
Appendix D Adding Bells and Whistles to the Training Loop (5.74 MB)
Appendix D Cosine decay (3.99 MB)
Appendix D Gradient clipping (5.83 MB)
Appendix D The modified training function (3.57 MB)
Appendix E Initializing the model (2.82 MB)
Appendix E Parameter-efficient finetuning with LoRA (1) (20.14 MB)
Appendix E Parameter-efficient Finetuning with LoRA (10.13 MB)
Appendix E Preparing the dataset (3.09 MB)
Chapter 1 Applications of LLMs (4.94 MB)
Chapter 1 A closer look at the GPT architecture (13.74 MB)
Chapter 1 Building a large language model (3.99 MB)
Chapter 1 Introducing the transformer architecture (20.58 MB)
Chapter 1 Stages of building and using LLMs (10.97 MB)
Chapter 1 Summary (3.61 MB)
Chapter 1 Understanding Large Language Models (21.23 MB)
Chapter 1 Utilizing large datasets (8.86 MB)
Chapter 2 Adding special context tokens (14.78 MB)
Chapter 2 Byte pair encoding (9.53 MB)
Chapter 2 Converting tokens into token IDs (12.67 MB)
Chapter 2 Creating token embeddings (11.93 MB)
Chapter 2 Data sampling with a sliding window (21.35 MB)
Chapter 2 Encoding word positions (14.05 MB)
Chapter 2 Summary (6.54 MB)
Chapter 2 Tokenizing text (11.56 MB)
Chapter 2 Working with Text Data (21.67 MB)
Chapter 3 Attending to different parts of the input with self-attention (34.74 MB)
Chapter 3 Capturing data dependencies with attention mechanisms (7.02 MB)
Chapter 3 Coding Attention Mechanisms (15.89 MB)
Chapter 3 Extending single-head attention to multi-head attention (32.23 MB)
Chapter 3 Hiding future words with causal attention (27.14 MB)
Chapter 3 Implementing self-attention with trainable weights (38.89 MB)
Chapter 3 Summary (6 MB)
Chapter 4 Adding shortcut connections (14.61 MB)
Chapter 4 Coding the GPT model (20.64 MB)
Chapter 4 Connecting attention and linear layers in a transformer block (13.91 MB)
Chapter 4 Generating text (19.76 MB)
Chapter 4 Implementing a feed forward network with GELU activations (17.29 MB)
Chapter 4 Implementing a GPT model from Scratch To Generate Text (30.36 MB)
Chapter 4 Normalizing activations with layer normalization (29.93 MB)
Chapter 4 Summary (5.61 MB)
Chapter 5 Decoding strategies to control randomness (26.08 MB)
Chapter 5 Loading and saving model weights in PyTorch (7.18 MB)
Chapter 5 Loading pretrained weights from OpenAI (18.73 MB)
Chapter 5 Pretraining on Unlabeled Data (63.37 MB)
Chapter 5 Summary (4.29 MB)
Chapter 5 Training an LLM (18.33 MB)
Chapter 6 Adding a classification head (21.65 MB)
Chapter 6 Calculating the classification loss and accuracy (12.04 MB)
Chapter 6 Creating data loaders (14.7 MB)
Chapter 6 Finetuning for Classification (11.72 MB)
Chapter 6 Finetuning the model on supervised data (15.78 MB)
Chapter 6 Initializing a model with pretrained weights (5.08 MB)
Chapter 6 Preparing the dataset (9.65 MB)
Chapter 6 Summary (5.37 MB)
Chapter 6 Using the LLM as a spam classifier (4.13 MB)
Chapter 7 Conclusions (8.12 MB)
Chapter 7 Creating data loaders for an instruction dataset (9.42 MB)
Chapter 7 Evaluating the finetuned LLM (23.67 MB)
Chapter 7 Extracting and saving responses (14.25 MB)
Chapter 7 Finetuning the LLM on instruction data (17.46 MB)
Chapter 7 Finetuning to Follow Instructions (8.6 MB)
Chapter 7 Loading a pretrained LLM (11.09 MB)
Chapter 7 Organizing data into training batches (33.77 MB)
Chapter 7 Preparing a dataset for supervised instruction finetuning (10.82 MB)
Chapter 7 Summary (4.25 MB)